Saturday, October 24, 2015

Bayes 10: independence and relevance

Since the Bayesian logic by its nature works only with mutually exclusive hypotheses, another way to apply it to independent hypotheses is to split the single probability table into multiple tables, one per hypothesis. Each hypothesis gets its own table that deals with only two outcomes: H and ~H. This way the overlap of multiple hypotheses in the same case gets represented accurately.

For example, let's look again at the training data from the previous installment:

# tab09_01.txt
              evA evB evC
1 * hyp1,hyp2 1   1   0
1 * hyp2,hyp3 0   1   1
1 * hyp1,hyp3 1   0   1

But this time let's compute the independent tables. I'll show only one table, for hyp2; the other tables get computed in the same way:

# tab10_01.txt
!,,evA,evB,evC
hyp2,0.66667,0.5,1,0.5
~hyp2,0.33333,1,0,1

The total number of cases is 3, and the hypothesis hyp2 is true in two of them, so the prior probability P(hyp2) = 0.66667, and P(~hyp2) = 1 - P(hyp2) = 0.33333. The event evA is true in 1 of the 2 cases where hyp2 is true, so P(evA|hyp2) = 0.5. The event evA is true in the single case where hyp2 is false, so P(evA|~hyp2) = 1. And so on.
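
To make this bookkeeping concrete, here is a small standalone Perl sketch (my own toy code, not one of the example scripts from this series) that computes the same hyp2 table directly from the training cases:

use strict;
use warnings;

# each training case: [ {set of true hypotheses}, {event values} ]
my @cases = (
    [ { hyp1 => 1, hyp2 => 1 }, { evA => 1, evB => 1, evC => 0 } ],
    [ { hyp2 => 1, hyp3 => 1 }, { evA => 0, evB => 1, evC => 1 } ],
    [ { hyp1 => 1, hyp3 => 1 }, { evA => 1, evB => 0, evC => 1 } ],
);
my @events = qw(evA evB evC);
my $hyp = "hyp2";

my ($nH, $nNotH) = (0, 0); # case counts for H and ~H
my (%evH, %evNotH);        # per-event counts of E being true

for my $c (@cases) {
    my ($hyps, $evs) = @$c;
    # pick the counters for either the H or the ~H row
    my ($n, $ev) = $hyps->{$hyp} ? (\$nH, \%evH) : (\$nNotH, \%evNotH);
    $$n++;
    $ev->{$_} += $evs->{$_} for @events;
}

printf "P(%s) = %.5f, P(~%s) = %.5f\n",
    $hyp, $nH / @cases, $hyp, $nNotH / @cases;
for my $e (@events) {
    printf "P(%s|%s) = %g, P(%s|~%s) = %g\n",
        $e, $hyp, $evH{$e} / $nH, $e, $hyp, $evNotH{$e} / $nNotH;
}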

By the way, if you think that you can compute P(E|~H) for the independent tables by deriving it from the table for the mutually-exclusive hypotheses with a formula like the following, you're wrong:

P(E|~H) = count(E & ~H) / count(~H)
        = ( count(E) - count(E & H) ) / count(~H)
        = ( P(E)*count(all) - P(E|H)*count(H) ) / ( count(all) - count(H) )
        = ( P(E) - P(E|H)*P(H) ) / ( 1 - P(H) )   [dividing through by count(all)]

This formula describes the dependency correctly only for the mutually-exclusive hypotheses. But we had to distort the data to make it fit the mutually-exclusive approach, so if you use this formula, you'll just get the other form of the distorted data. Instead you need to go back to the training cases and compute the probabilities directly from them to get the benefit of the independent-hypothesis approach.
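
To see the distortion in numbers, take evA and hyp2: P(evA) = 0.66667 (evA is true in 2 of the 3 training cases), and the mutually-exclusive table gives P(evA|hyp2) = 0.5 and P(hyp2) = 0.33333. The formula then produces P(evA|~hyp2) = (0.66667 - 0.5*0.33333) / (1 - 0.33333) = 0.75, while the direct count over the training cases gives P(evA|~hyp2) = 1, since evA is true in the one case that doesn't contain hyp2.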

Let's try it with the input that points to all 3 hypotheses being true:

# in10_01_02.txt
evA,1
evB,1
evC,1

$ perl ex06_01run.pl -v tab10_01.txt in10_01_02.txt
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.66667,0.50000,1.00000,0.50000,
~hyp2  ,0.33333,1.00000,0.00000,1.00000,
--- Applying event evA, c=1.000000
P(E)=0.666665
H hyp2 *0.500000/0.666665
H ~hyp2 *1.000000/0.666665
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.50000,0.50000,1.00000,0.50000,
~hyp2  ,0.50000,1.00000,0.00000,1.00000,
--- Applying event evB, c=1.000000
P(E)=0.500004
H hyp2 *1.000000/0.500004
H ~hyp2 *0.000000/0.500004
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,1.00000,0.50000,1.00000,0.50000,
~hyp2  ,0.00000,1.00000,0.00000,1.00000,
--- Applying event evC, c=1.000000
P(E)=0.500000
H hyp2 *0.500000/0.500000
H ~hyp2 *1.000000/0.500000
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,1.00000,0.50000,1.00000,0.50000,
~hyp2  ,0.00000,1.00000,0.00000,1.00000,

It shows very clearly that hyp2 is true. Another input, this time pointing to a different hypothesis:

# in10_01_03.txt
evA,1
evB,0
evC,0

$ perl ex06_01run.pl -v -c 0.01 tab10_01.txt in10_01_03.txt
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.66667,0.50000,1.00000,0.50000,
~hyp2  ,0.33333,1.00000,0.00000,1.00000,
--- Applying event evA, c=0.990000
P(E)=0.666665
H hyp2 *0.500000/0.663332
H ~hyp2 *0.990000/0.663332
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.50252,0.50000,1.00000,0.50000,
~hyp2  ,0.49748,1.00000,0.00000,1.00000,
--- Applying event evB, c=0.010000
P(E)=0.502516
H hyp2 *0.010000/0.497534
H ~hyp2 *0.990000/0.497534
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.01010,0.50000,1.00000,0.50000,
~hyp2  ,0.98990,1.00000,0.00000,1.00000,
--- Applying event evC, c=0.010000
P(E)=0.994950
H hyp2 *0.500000/0.014949
H ~hyp2 *0.010000/0.014949
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.33782,0.50000,1.00000,0.50000,
~hyp2  ,0.66218,1.00000,0.00000,1.00000,

This shows fairly clearly that hyp2 is unlikely in this case.

Now the other input, with all 3 hypotheses being false (and also an "impossible input" from the standpoint of the training data, so we'll use the capping option):

# in10_01_01.txt
evA,0
evB,0
evC,0

$ perl ex06_01run.pl -v -c 0.01 tab10_01.txt in10_01_01.txt
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.66667,0.50000,1.00000,0.50000,
~hyp2  ,0.33333,1.00000,0.00000,1.00000,
--- Applying event evA, c=0.010000
P(E)=0.666665
H hyp2 *0.500000/0.336668
H ~hyp2 *0.010000/0.336668
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.99010,0.50000,1.00000,0.50000,
~hyp2  ,0.00990,1.00000,0.00000,1.00000,
--- Applying event evB, c=0.010000
P(E)=0.990099
H hyp2 *0.010000/0.019703
H ~hyp2 *0.990000/0.019703
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.50252,0.50000,1.00000,0.50000,
~hyp2  ,0.49748,1.00000,0.00000,1.00000,
--- Applying event evC, c=0.010000
P(E)=0.748742
H hyp2 *0.500000/0.256233
H ~hyp2 *0.010000/0.256233
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.98058,0.50000,1.00000,0.50000,
~hyp2  ,0.01942,1.00000,0.00000,1.00000,

Whoops, it shows that hyp2 is very likely, even though it shouldn't be. I was very enthusiastic about the approach with the independent tables until I discovered this effect: the "impossible data" can easily drive the computation way wrong.

In most cases this issue can be resolved by using both the mutually-exclusive and the independent approaches. First run the mutually-exclusive table and find the candidate hypotheses. Then run the independent computation for these candidates, and accept them only if their probability gets over the acceptable limit (the limits may be chosen differently for the mutually-exclusive and the independent cases). This approach generally works well to pick the relevant hypotheses when the mutually-exclusive approach marks a large number of them as probable.
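
Here is a self-contained Perl sketch of this two-pass logic (again my own toy code, not one of the ex* scripts; the tables are hard-coded, there is no capping, and the 0.2 and 0.9 limits are made-up examples that would have to be tuned):

use strict;
use warnings;

# one run of the chain of Bayes updates: start from the priors, apply
# every event, return the posterior probabilities
sub run_table {
    my ($priors, $cond, $events) = @_;
    my %p = %$priors;
    for my $ev (sort keys %$events) {
        my $sum = 0;
        for my $h (keys %p) {
            # P(E|H) for a positive event, 1-P(E|H) for a negative one
            my $pe = $events->{$ev} ? $cond->{$h}{$ev} : 1 - $cond->{$h}{$ev};
            $p{$h} *= $pe;
            $sum += $p{$h};
        }
        $p{$_} /= $sum for keys %p; # renormalize; assumes $sum > 0
    }
    return \%p;
}

# the mutually-exclusive table (tab09_02.txt)
my %mePriors = (hyp1 => 1/3, hyp2 => 1/3, hyp3 => 1/3);
my %meCond = (
    hyp1 => { evA => 1,   evB => 0.5, evC => 0.5 },
    hyp2 => { evA => 0.5, evB => 1,   evC => 0.5 },
    hyp3 => { evA => 0.5, evB => 0.5, evC => 1   },
);
# the per-hypothesis independent tables; only hyp2's (tab10_01.txt)
# is filled in here, hyp1's and hyp3's would be built the same way
my %indep = (
    hyp2 => {
        priors => { hyp2 => 2/3, '~hyp2' => 1/3 },
        cond   => {
            hyp2    => { evA => 0.5, evB => 1, evC => 0.5 },
            '~hyp2' => { evA => 1,   evB => 0, evC => 1   },
        },
    },
);

my %input = (evA => 1, evB => 1, evC => 1);
my $pass1 = run_table(\%mePriors, \%meCond, \%input);
# pass 1: keep the hypotheses that look probable enough
my @candidates = grep { $pass1->{$_} >= 0.2 } sort keys %$pass1;
for my $h (@candidates) {
    next unless $indep{$h}; # only hyp2's table is defined above
    # pass 2: confirm each candidate against its own H/~H table
    my $p2 = run_table($indep{$h}{priors}, $indep{$h}{cond}, \%input);
    printf "%s: pass1 %.5f, pass2 %.5f => %s\n", $h, $pass1->{$h},
        $p2->{$h}, $p2->{$h} >= 0.9 ? "accept" : "reject";
}
# with this input it prints: hyp2: pass1 0.33333, pass2 1.00000 => accept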

In this particular case though the result of the mutually-exclusive computation is:

$ perl ex09_02run.pl -v -c 0.01  tab09_02.txt in10_01_01.txt
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,0.33333,1.00000,0.50000,0.50000,
hyp2   ,0.33333,0.50000,1.00000,0.50000,
hyp3   ,0.33333,0.50000,0.50000,1.00000,
--- Applying event evA, c=0.010000
P(E)=0.666660
H hyp1 *0.010000/0.336673
H hyp2 *0.500000/0.336673
H hyp3 *0.500000/0.336673
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,0.00990,1.00000,0.50000,0.50000,
hyp2   ,0.49503,0.50000,1.00000,0.50000,
hyp3   ,0.49503,0.50000,0.50000,1.00000,
--- Applying event evB, c=0.010000
P(E)=0.747503
H hyp1 *0.500000/0.257447
H hyp2 *0.010000/0.257447
H hyp3 *0.500000/0.257447
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,0.01923,1.00000,0.50000,0.50000,
hyp2   ,0.01923,0.50000,1.00000,0.50000,
hyp3   ,0.96143,0.50000,0.50000,1.00000,
--- Applying event evC, c=0.010000
P(E)=0.980658
H hyp1 *0.500000/0.028955
H hyp2 *0.500000/0.028955
H hyp3 *0.010000/0.028955
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,0.33204,1.00000,0.50000,0.50000,
hyp2   ,0.33204,0.50000,1.00000,0.50000,
hyp3   ,0.33204,0.50000,0.50000,1.00000,
--- Result:
hyp1   ,0.33204,
hyp2   ,0.33204,
hyp3   ,0.33204,

It also thinks that all three hypotheses are true! Well, it's one of those cases where the addition of the hypothesis "ok", as described before, would help a lot.

The other approach to resolving this issue comes from analyzing why the probabilities got driven like this. Looking at the log of the independent computation for hyp2, we can see that P(hyp2) gets driven up a lot after applying evA=0 and evC=0. What does the table of probabilities contain for these events? Let's look at it again:

# tab10_01.txt
!,,evA,evB,evC
hyp2,0.66667,0.5,1,0.5
~hyp2,0.33333,1,0,1

The probabilities P(evA|hyp2) and P(evC|hyp2) are equal to 0.5, which means "these events don't matter for this hypothesis"! However, they turn out to matter a lot for ~hyp2, where P(evA|~hyp2) and P(evC|~hyp2) are equal to 1. We can try to doctor the probability table, saying that if these events are irrelevant for hyp2, they should also be irrelevant for ~hyp2.
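
Indeed, that's exactly what happened in the run above: the event evA=0 multiplied P(hyp2) by (1 - 0.5) = 0.5 but P(~hyp2) by (1 - 1) = 0, capped to 0.01, and after normalization that pushed P(hyp2) from 0.66667 up to 0.99010. The event evC=0 did the same thing once more at the end.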

To generalize, we can say that if P(E|H) falls within a certain range around 0.5 (say, 0.25..0.75, or fine-tune this range to some other width), we'll consider the event irrelevant for the hypothesis and replace its probability with 0.5 for both H and ~H. We could just as well remove the event from the table altogether, but then the program would complain about an unknown event name, so disabling it by altering its probabilities is more convenient. We'll get the table for hyp2 that looks like this (a small sketch of the doctoring step follows the table):

# tab10_02.txt
!,,evA,evB,evC
hyp2,0.66667,0.5,1,0.5
~hyp2,0.33333,0.5,0,0.5
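
For illustration, here is a minimal Perl sketch of the doctoring step itself (my own toy code, not one of the ex* scripts; the 0.25..0.75 range and the table rows are hard-coded):

use strict;
use warnings;

my ($LO, $HI) = (0.25, 0.75); # the "irrelevant" range, tunable

# the rows of tab10_01.txt: P(E|hyp2) and P(E|~hyp2) per event
my %condH    = (evA => 0.5, evB => 1, evC => 0.5);
my %condNotH = (evA => 1,   evB => 0, evC => 1);

for my $ev (sort keys %condH) {
    if ($condH{$ev} >= $LO && $condH{$ev} <= $HI) {
        # the event is irrelevant for hyp2, so disable it for ~hyp2 too
        $condH{$ev} = $condNotH{$ev} = 0.5;
    }
    printf "%s: P(E|hyp2)=%g P(E|~hyp2)=%g\n",
        $ev, $condH{$ev}, $condNotH{$ev};
}
# prints evA and evC neutralized to 0.5 in both rows and evB kept
# as-is, which is exactly the doctored table tab10_02.txt

And now the run with the doctored table: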

$ perl ex06_01run.pl -v -c 0.01 tab10_02.txt in10_01_01.txt
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.66667,0.50000,1.00000,0.50000,
~hyp2  ,0.33333,0.50000,0.00000,0.50000,
--- Applying event evA, c=0.010000
P(E)=0.500000
H hyp2 *0.500000/0.500000
H ~hyp2 *0.500000/0.500000
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.66667,0.50000,1.00000,0.50000,
~hyp2  ,0.33333,0.50000,0.00000,0.50000,
--- Applying event evB, c=0.010000
P(E)=0.666670
H hyp2 *0.010000/0.336663
H ~hyp2 *0.990000/0.336663
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.01980,0.50000,1.00000,0.50000,
~hyp2  ,0.98020,0.50000,0.00000,0.50000,
--- Applying event evC, c=0.010000
P(E)=0.500000
H hyp2 *0.500000/0.500000
H ~hyp2 *0.500000/0.500000
!      ,       ,evA    ,evB    ,evC    ,
hyp2   ,0.01980,0.50000,1.00000,0.50000,
~hyp2  ,0.98020,0.50000,0.00000,0.50000,

The result is much better! The model has picked up on the idea of which events are relevant for which hypothesis.

This approach is not perfect, since the relevance computation depends a lot on the random mix of the cases in the training data, but it's definitely a big improvement. I'll talk more about the ways of finding the relevance in the future installments.
