Wednesday, October 7, 2015

Bayes 4: the fuzzy logic

Before moving on to the other aspects, I want to wrap up with the basic formula, this time in its fuzzy-logic version. As I've already mentioned, sometimes it's hard to tell if the event is true or not. It would be nice to have some way to express the idea of the event being very likely, slightly likely, slightly unlikely, or very unlikely. It would be even nicer to represent our confidence in the event happening as a continuous range, with 1 meaning "the event is definitely true", 0 meaning "the event is definitely false", 0.5 meaning "we can't tell at all if the event is true or false", and the points in between representing the values in between. Let's use the letter C to denote this confidence, or C(E) to show which event's confidence we're talking about.

There is a way to generalize the Bayes formula for this confidence range. Way back when, I read it in that British book without explanations and marveled at how people come up with these formulas, but more recently I was able to re-create it from scratch from logical considerations. So I'll show you where this marvelous formula comes from (and a little later I'll show where the Bayes formula itself comes from too).

First, let's look at the boundary conditions. Kind of obviously, with C(E)=1 the formula must give the same result as the basic Bayes formula with E being true, with C(E)=0 it must give the same result as the basic formula for E being false, and for C(E)=0.5 it must leave P(H) unchanged.

Let's look at the basic formulas again:

P(H|E) = P(H) * ( P(E|H) / P(E) )
P(H|~E) = P(H) * ( P(~E|H) / P(~E) )

If we substitute

P(~E|H) = 1 - P(E|H)
P(~E) = 1 - P(E)

then the formulas become:

P(H|E) = P(H) * ( P(E|H) / P(E) )
P(H|~E) = P(H) * ( (1-P(E|H)) / (1-P(E)) )

In both cases we multiply the same P(H) by some coefficient, and in both cases the coefficient is a ratio: something computed from P(E|H), divided by the same thing computed from P(E). For symmetry we can write in the first formula P(E|H) as 0+P(E|H) and P(E) as 0+P(E), then the formulas become even more like each other:

P(H|E)  = P(H) * ( (0+P(E|H)) / (0+P(E)) )
P(H|~E) = P(H) * ( (1-P(E|H)) / (1-P(E)) )
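These two basic updates can be sketched as a couple of one-liners in Python (the function names and the sample probabilities here are my own, made up purely for illustration):

```python
def bayes_true(p_h, p_e_h, p_e):
    # P(H|E) = P(H) * P(E|H) / P(E): the event E was observed as true.
    return p_h * p_e_h / p_e

def bayes_false(p_h, p_e_h, p_e):
    # P(H|~E) = P(H) * (1 - P(E|H)) / (1 - P(E)): E was observed as false.
    return p_h * (1.0 - p_e_h) / (1.0 - p_e)

# Made-up example values: P(H)=0.3, P(E|H)=0.8, P(E)=0.5.
print(bayes_true(0.3, 0.8, 0.5))   # 0.3 * 0.8 / 0.5 = 0.48
print(bayes_false(0.3, 0.8, 0.5))  # 0.3 * 0.2 / 0.5 = 0.12
```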

If we represent the points 0+P(E|H) and 1-P(E|H) graphically as a line section, we can see that in both cases they are located at the distance P(E|H), only one is to the right of 0, and the other is to the left of 1:

     0   P(E|H)         1-P(E|H)  1

And the same works for P(E). This way it looks even more symmetrical.

The natural follow-up from here is that we can use the confidence value C(E) to split the sections [ 0+P(E|H), 1-P(E|H) ] and [ 0+P(E), 1-P(E) ] proportionally. For example, if C(E) = 0.75, it will look like this:

        split point for C(E)=0.75
     0   P(E|H) |       1-P(E|H)  1
              0.25   0.75

The (0+P(E|H)) gets taken with the weight C(E), the (1-P(E|H)) gets taken with the weight 1-C(E), and they get added together. If C(E) is 0.75, the split point will be located closer to (0+P(E|H)), dividing the original range at its quarter-point.
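The proportional split can be checked numerically. This small sketch uses a made-up value P(E|H)=0.2 with C(E)=0.75, and confirms that the weighted mix lands on the quarter-point of the range [P(E|H), 1-P(E|H)]:

```python
p, c = 0.2, 0.75  # made-up P(E|H) and C(E)

# Weighted mix of (0+p) and (1-p), as described above:
split = c * p + (1.0 - c) * (1.0 - p)  # 0.75*0.2 + 0.25*0.8 = 0.35

# The same point, seen as the quarter-point of the range [0.2, 0.8]:
quarter = p + 0.25 * ((1.0 - p) - p)   # 0.2 + 0.25*0.6 = 0.35

print(split, quarter)
```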

In the formula form this can be expressed as:

P(H|C(E)) = P(H) * ( C(E)*(0+P(E|H)) + (1-C(E))*(1-P(E|H)) )
 / ( C(E)*(0+P(E)) + (1-C(E))*(1-P(E)) )

Let's check that this formula conforms to the boundary conditions formulated above:

P(H|C(E)=1) = P(H) * ( 1*(0+P(E|H)) + (1-1)*(1-P(E|H)) )
 / ( 1*(0+P(E)) + (1-1)*(1-P(E)) )
 = P(H) * P(E|H) / P(E)

P(H|C(E)=0) = P(H) * ( 0*(0+P(E|H)) + (1-0)*(1-P(E|H)) )
 / ( 0*(0+P(E)) + (1-0)*(1-P(E)) )
 = P(H) * (1-P(E|H)) / (1-P(E))

P(H|C(E)=0.5) = P(H) * ( 0.5*(0+P(E|H)) + (1-0.5)*(1-P(E|H)) )
 / ( 0.5*(0+P(E)) + (1-0.5)*(1-P(E)) )
 = P(H) * ( 0.5*P(E|H) + 0.5*(1-P(E|H)) )
 / ( 0.5*P(E) + 0.5*(1-P(E)) )
 = P(H) * ( 0.5*P(E|H) + 0.5 - 0.5*P(E|H) )
 / ( 0.5*P(E) + 0.5 - 0.5*P(E) )
 = P(H) * 0.5 / 0.5
 = P(H)

They all match. So this is the right formula, even if a bit complicated. We can make it a bit simpler by splitting it into two formulas: one for the proportional-splitting function, and another that computes the probability using that function:

Pc(P, C) = P*C + (1-P)*(1-C)
P(H|C(E)) = P(H) * Pc(P(E|H), C(E))/Pc(P(E), C(E))
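The pair of formulas translates into Python almost literally. Here's a sketch (the function names and the sample values are mine) that also verifies the three boundary conditions from above:

```python
def pc(p, c):
    # Pc(P, C) = P*C + (1-P)*(1-C): the proportional split between
    # the points 0+P and 1-P, weighted by the confidence C.
    return p * c + (1.0 - p) * (1.0 - c)

def fuzzy_bayes(p_h, p_e_h, p_e, c_e):
    # P(H|C(E)) = P(H) * Pc(P(E|H), C(E)) / Pc(P(E), C(E))
    return p_h * pc(p_e_h, c_e) / pc(p_e, c_e)

# Made-up example values: P(H)=0.3, P(E|H)=0.8, P(E)=0.5.
print(fuzzy_bayes(0.3, 0.8, 0.5, 1.0))  # same as plain Bayes with E true: 0.48
print(fuzzy_bayes(0.3, 0.8, 0.5, 0.0))  # same as Bayes with E false: 0.12
print(fuzzy_bayes(0.3, 0.8, 0.5, 0.5))  # leaves P(H) unchanged: 0.3
```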

Now you might complain: "But I don't like the shape of this dependency; I feel that the confidence should be represented in some more non-linear way." Well, if it's too linear for you, you don't have to use it directly. You can define your own confidence function Cmy(E), map its values into C(E) in any non-linear form you like, and then use the resulting C(E) values for the computation.

In fact, I like a variation of that trick myself: limiting the values of C(E) to something like 0.00001 and 0.99999 instead of 0 and 1. It helps the model recover from situations that it would otherwise consider impossible, and provides a way to handle the problem of overfitting. But more on that later.
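That limiting trick is just a clamp. Here's a minimal sketch (the function name is my own) of what it might look like:

```python
def clamp_confidence(c, low=0.00001, high=0.99999):
    # Keep C(E) away from the absolute 0 and 1, so that no event
    # ever gets treated as strictly impossible or strictly certain.
    return max(low, min(high, c))

print(clamp_confidence(1.0))   # 0.99999
print(clamp_confidence(0.0))   # 1e-05
print(clamp_confidence(0.7))   # 0.7
```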
