Sunday, March 5, 2017

Bayes 25 & AdaBoost: the partial confidence problem

The code in the previous example works only with the full-confidence events, those that have a confidence of 0 or 1. Unfortunately I haven't found a good way to handle partial confidences yet. I've found a way, but it has problems. Let me show you.

Previously I was treating a partial confidence as a superposition of 0 and 1. That approach cannot be used directly here because it doesn't cover the connections between the events. To give an example, in the previous post the code had correctly found that in tab24_01.txt the events E1 and E2 are the same, so after E1 is applied, E2 becomes ignored (its AdaBoost probabilities become 0.5). So if we now do a computation and find that E1 is unknown (i.e. its confidence is 0.5) but E2 is known, E2 will still be ignored, because its table entry says to always ignore it.

The correct way should handle this situation properly: if E1 is unknown, the table values for E2 should be brought up to compensate, so that E2 gets treated the way E1 was treated before.

The obvious way to do it is to treat the partial confidence as a superposition of "the event is unknown" and "the event is known". So if the event is known, we process it as before. If the event is unknown, we follow the unknown branch and then apply all the following events as if this event wasn't there. And if the event is partially known, we split the weights of all the outcomes proportionally and handle one part with the known branch and the other part with the unknown branch.

So a confidence of 0.7 would mean a superposition of 0.6 of unknown and 0.4 of the known value "1". A confidence of 0.1 would mean a superposition of 0.2 of unknown and 0.8 of the known value "0". So if we denote C(E) as the event confidence, K(E) as the known weight in the superposition, and U(E) as the unknown weight in the superposition, we can write them as:

K(E) = abs(C(E) - 0.5) * 2
U(E) = 1 - K(E)
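
Just to show the arithmetic, here is this split written out as a tiny Python function (this isn't code from the examples in this series, and the name split_confidence is made up for illustration):

def split_confidence(c):
    # K = abs(C - 0.5) * 2, U = 1 - K, per the formulas above
    k = abs(c - 0.5) * 2
    return k, 1.0 - k

print(split_confidence(0.7))  # roughly (0.4, 0.6): known 0.4, unknown 0.6
print(split_confidence(0.1))  # roughly (0.8, 0.2): known 0.8, unknown 0.2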

The downside of this obvious approach is that the table and the computation become more complex. As we go through more events, two branches become four, then eight, then sixteen, and so on. The formerly quadratic complexity O(Noutcomes * Nevents) becomes exponential, O(Noutcomes * Nevents * 2^Nevents). I don't know yet if there is a better way.
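
To show where the blow-up comes from, here is a rough sketch of the branching evaluation in Python. It is not the code from the previous examples: it uses a plain static table ptab[i][h] = P(Ei=1|H) in place of the real AdaBoost-derived tables, whose entries depend on which previous events were applied (which is exactly why the two branches genuinely diverge). The point here is only how the number of branches doubles with every partially known event:

def eval_branches(ptab, confidences, prior):
    # Sketch: every event forks the computation into an "unknown" and a
    # "known" branch; the final probabilities are the weighted sum over
    # all the branches.
    result = {h: 0.0 for h in prior}

    def recurse(i, weight, posterior):
        if weight == 0.0:
            return
        if i == len(confidences):
            for h in result:
                result[h] += weight * posterior[h]
            return
        c = confidences[i]
        k = abs(c - 0.5) * 2    # known weight K(E)
        u = 1.0 - k             # unknown weight U(E)
        # Unknown branch: apply the following events as if this one wasn't there.
        recurse(i + 1, weight * u, posterior)
        # Known branch: apply the event as 0 or 1 and renormalize.
        val = 1 if c > 0.5 else 0
        upd = {}
        for h, p in posterior.items():
            pe = ptab[i][h]
            upd[h] = p * (pe if val == 1 else 1.0 - pe)
        total = sum(upd.values()) or 1.0
        recurse(i + 1, weight * k, {h: p / total for h, p in upd.items()})

    recurse(0, 1.0, dict(prior))
    return result

# Example with made-up numbers: two hypotheses, two identical events,
# the first known with confidence 0.7, the second fully known.
# eval_branches([{"H1": 0.9, "H2": 0.1}, {"H1": 0.9, "H2": 0.1}],
#               [0.7, 1.0], {"H1": 0.5, "H2": 0.5})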

Maybe it would be possible to build a table of "gradients" between the events: i.e. if the confidence of event A changes, how the probabilities of another event B are affected. The problem though is that this gradient depends on all the events in the sequence between A and B, so this idea doesn't seem to really fly.

Maybe another idea is workable, one that would reduce the exponential cost to a cubic one: after splitting the superposition, instead of following two branches, just take the probabilities for each following event from both branches and mix them by the weights of the branches:

P(H|Ei) = K(Ej) * P(H|KjEi) + U(Ej) * P(H|UjEi)

It's better than exponential but still not great, with a run time of O(Noutcomes * Nevents * Nevents). And the table of these pre-computed values would still seem to be exponential, with each event depending on all the possible combinations of the previous events.
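
As a rough transcription of that formula into Python (again not the code from the examples; the names are hypothetical, with p_known and p_unknown standing for the lists of table probabilities P(H|Ei) of the following events as they would come out of the known and the unknown branch of Ej):

def mix_branch_tables(k_j, p_known, p_unknown):
    # Mix each following event's table entries by the weights
    # K(Ej) and U(Ej) = 1 - K(Ej) of the two branches of Ej.
    u_j = 1.0 - k_j
    return [
        {h: k_j * row_k[h] + u_j * row_u[h] for h in row_k}
        for row_k, row_u in zip(p_known, p_unknown)
    ]

Doing this for every partially known event Ej, over all the events Ei that follow it, is where the Nevents * Nevents factor in the run time comes from.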
