Tuesday, December 8, 2015

Bayes 18: finding the relevance

I've previously promised to describe the ideas I have for computing the relevance values, so here we go. I haven't tested these ideas on a set of production data yet; they're just ideas, but they look like a reasonable start.

The general idea is to tell whether the value of an event differentiates this hypothesis/case from the other hypotheses/cases. If it does differentiate, the event is relevant; otherwise it's irrelevant.

Taking any two cases, we should be able to differentiate them based only on the relevant events in each of them (and obviously a relevant event in one case can only be used to make a decision for that case, not for the other one).

For example, suppose that we have two cases with the outcomes of two different hypotheses, where the event values are represented as a bit string:

hyp1 0 0 0 1 0 0
hyp2 0 1 0 0 0 0

We can mark some events irrelevant by "-" (how we do this is a separate question; for now let's suppose that we've done it somehow) and still be able to differentiate these cases:

hyp1 - - - 1 - -
hyp2 - 1 - - - -

The bit string "0 0 0 1 0 0" will still match only the case for hyp1, and "0 1 0 0 0 0" will still match only the case for hyp2. Obviously, when working with the full training set, we should be able to differentiate between any two cases in the training set.
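As a minimal sketch of what "matching" means here, assuming events are kept as lists of 0/1 values with irrelevant positions as None (the "-" above), something like this would do; all the names are made up for illustration:

def matches(case_events, input_events):
    """True if the input agrees with the case on every relevant event."""
    return all(c is None or c == i
               for c, i in zip(case_events, input_events))

cases = {
    "hyp1": [None, None, None, 1, None, None],  # - - - 1 - -
    "hyp2": [None, 1, None, None, None, None],  # - 1 - - - -
}

print([h for h, ev in cases.items() if matches(ev, [0, 0, 0, 1, 0, 0])])  # ['hyp1']
print([h for h, ev in cases.items() if matches(ev, [0, 1, 0, 0, 0, 0])])  # ['hyp2']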

As it turns out, this problem of finding the smallest set of differentiating bits is already known. It's the same problem as the optimization of a set of boolean functions in hardware engineering. The classic way to approach it is with Karnaugh maps, though they don't scale well to a large number of bits. But there have got to be newer and better ways to do this optimization.
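I haven't settled on a specific minimization algorithm, but just to illustrate the idea, here is a rough greedy sketch (not the Karnaugh-map method, just a set-cover-style approximation): keep picking the event that tells apart the most remaining pairs of cases with different hypotheses, until every such pair is covered. It's not guaranteed to be minimal, but it's cheap.

from itertools import combinations

def differentiating_events(cases):
    """cases: list of (hypothesis, [0/1, ...]) pairs."""
    n_events = len(cases[0][1])
    # Pairs of cases with different hypotheses that still need to be told apart.
    uncovered = [(a, b) for a, b in combinations(cases, 2) if a[0] != b[0]]
    chosen = set()
    while uncovered:
        # How many of the remaining pairs each event would tell apart.
        counts = [sum(a[1][e] != b[1][e] for a, b in uncovered)
                  for e in range(n_events)]
        best = max(range(n_events), key=lambda e: counts[e])
        if counts[best] == 0:
            break  # identical cases with different hypotheses; nothing can split them
        chosen.add(best)
        uncovered = [(a, b) for a, b in uncovered if a[1][best] == b[1][best]]
    return chosen

cases = [("hyp1", [0, 0, 0, 1, 0, 0]),
         ("hyp2", [0, 1, 0, 0, 0, 0])]
print(differentiating_events(cases))  # {1}: one event alone tells the two cases apart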

This approach has its pitfalls too. For another example, suppose that we have the cases:

hyp3 0 0 0 1 0 0
hyp4 0 1 0 1 0 0

Then we can differentiate them based purely on one event:

hyp3 - 0 - - - -
hyp4 - 1 - - - -

Of course, this works only as long as we have only these two cases. And it would likely result in spurious false positives: if all the events in the input are at 0, they would still be taken to mean hyp3.

This can be resolved, or at least improved, by having the training set include the cases where nothing is wrong: what is known as the "null hypothesis", and what I've been marking as the "ok" hypothesis in the previous examples. The events that differ in value from the null hypothesis should always be relevant. If we arrange the event polarity so that the null hypothesis has all the events at 0, this rule translates into "if an event is at 1, it must be relevant", and only the events at 0 need to be considered for possible irrelevance.
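Here is a small sketch of that polarity rule, assuming the training set contains an "ok" case; the XOR trick and all the names are just for illustration:

def normalize_polarity(ok_events, case_events):
    """XOR each event with its 'ok' value, making the null hypothesis all-zero."""
    return [e ^ ok for e, ok in zip(case_events, ok_events)]

ok   = [1, 0, 0, 1, 0, 0]            # hypothetical "ok" (null hypothesis) case
case = [1, 1, 0, 0, 0, 0]            # some failure case
norm = normalize_polarity(ok, case)  # -> [0, 1, 0, 1, 0, 0]
relevant = [i for i, e in enumerate(norm) if e == 1]  # events 1 and 3 must be relevant
print(norm, relevant)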

Things get more complicated when we have some cases resulting in multiple hypotheses. Splitting their symptoms into single-hypothesis cases is not obvious. It can be attempted by subtracting the other single-hypothesis cases and looking at what is left. If some hypothesis has no single-hypothesis cases at all, things become even stranger. Perhaps such cases should be left as-is, because they express the special interactions between multiple hypotheses. Or we can try to find the commonality between all the cases that include this particular hypothesis and discard the differences stemming from the other hypotheses included in those cases.

Let's look at the highly pathological example I've shown before and try to do these manipulations with it. The example was:

# tab09_01.txt
!             evA evB evC
1 * hyp1,hyp2 1   1   0
1 * hyp2,hyp3 0   1   1
1 * hyp1,hyp3 1   0   1

Previously I've calculated the probabilities by hypothesis:

# tab09_01.txt
!,,evA,evB,evC
hyp1,0.66667,1,0.5,0.5
hyp2,0.66667,0.5,1,0.5
hyp3,0.66667,0.5,0.5,1

Those probabilities P(E|H) can be used as guidance for relevance. If a probability is close to 0 or 1 (say, below 0.1 or above 0.9), we'll say that this event is relevant for this hypothesis; if it's somewhere in the middle, we'll say that it's irrelevant. Then we can use this knowledge to split the cases:

!             evA evB evC
1 * hyp1      1   -   -
1 *      hyp2 -   1   -
1 * hyp2      -   1   -
1 *      hyp3 -   -   1
1 * hyp1      1   -   -
1 *      hyp3 -   -   1

Then we can merge the cases that are the same:

!        evA evB evC
2 * hyp1 1   -   -
2 * hyp2 -   1   -
2 * hyp3 -   -   1
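Putting this walkthrough into a rough sketch (the 0.1/0.9 thresholds are the ones from above, the probability table is copied from the earlier computation, and the helper names are made up):

from collections import Counter

P_E_H = {  # P(E|H) for (evA, evB, evC), copied from the table above
    "hyp1": [1.0, 0.5, 0.5],
    "hyp2": [0.5, 1.0, 0.5],
    "hyp3": [0.5, 0.5, 1.0],
}
CASES = [  # (count, hypotheses, event values for evA, evB, evC)
    (1, ["hyp1", "hyp2"], [1, 1, 0]),
    (1, ["hyp2", "hyp3"], [0, 1, 1]),
    (1, ["hyp1", "hyp3"], [1, 0, 1]),
]

def relevance_mask(hyp, events):
    """Keep an event's value only if P(E|H) is near 0 or 1 for this hypothesis."""
    return tuple(e if (p <= 0.1 or p >= 0.9) else None
                 for e, p in zip(events, P_E_H[hyp]))

# Split each multi-hypothesis case into one masked single-hypothesis case per
# hypothesis, then merge the identical results, adding up the counts.
merged = Counter()
for count, hyps, events in CASES:
    for hyp in hyps:
        merged[(hyp, relevance_mask(hyp, events))] += count

for (hyp, mask), count in sorted(merged.items(), key=lambda kv: kv[0][0]):
    print(count, "*", hyp, " ".join("-" if m is None else str(m) for m in mask))

Running it prints the same three merged lines as in the table above.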

It worked well for this special example but might not be so great in the bigger world. It needs more thinking and experimentation.
