## Sunday, October 4, 2015

### Bayes 2: hypotheses and events

A Bayesian expert system deals with the probabilities of hypotheses. It tries to compute these probabilities from experiments on the black box, and then chooses the most probable hypothesis (or maybe more than one).

The probability of a hypothesis H is commonly written as P(H). Naturally, it's a real number in the range between 0 ("definitely false") and 1 ("definitely true"). If you haven't dealt with probabilities before, now you know this.

Each hypothesis H has an opposite hypothesis that I'll write as ~H (in the mathematical texts it's written as H with a line above it but it's inconvenient to type, so I chose a notation like the C language). If the hypothesis Hlupus is "the patient has lupus" then the opposite ~Hlupus will be "the patient doesn't have lupus". I'll also call ~H the "negative hypothesis", as opposed to the "positive" H.

The probabilities of positive and negative hypotheses are connected:

P(H) + P(~H) = 1

Which means that whatever happens, exactly one of H and ~H is true. We might not know for sure which one is true but if we think that H has the probability 0.7 (or in other words 70%) of being true, this means that it has the probability 0.3 of being false, while the opposite ~H must have the probability 0.3 of being true and 0.7 of being false. This is a very rigid connection, and we can use it for various deductions.
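The 0.7 and 0.3 example above can be written down as a tiny sketch. The function name `complement` is my own choice for illustration, and I use exact fractions instead of floating point only to keep the arithmetic free of rounding noise.

```python
from fractions import Fraction


def complement(p):
    """Return P(~H) given P(H), using the rule P(H) + P(~H) = 1."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must be between 0 and 1")
    return 1 - p


# The example from the text: H is believed to be true with probability 0.7.
p_h = Fraction(7, 10)
p_not_h = complement(p_h)

print(p_h, p_not_h)   # the two values always add up to 1
```

Taking the complement twice gets back the original value, which is another way to see how rigid the connection is.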

An event is not just an experiment but an experiment with a particular outcome. Not just "look at the patient's throat" but "I've looked at the patient's throat and found that it's sore". Just like the hypotheses, the events have their opposites too. If the event E is "I've looked at the patient's throat and found that it's sore", its opposite ~E will be "I've looked at the patient's throat and found that it's NOT sore". The experiment is the same but its outcome is the opposite for the negative event.

It's really important to note that the case "I haven't looked at the patient's throat yet" is different: until we look at the throat, neither E nor ~E has happened. But once we look at the throat, we can tell that either E is true and ~E is false for this particular patient or the other way around, E is false and ~E is true. A typical mistake (one found in Wikipedia for example) is to mix up the case when we haven't looked at the throat with the case when we have looked at the throat and found that it's not sore. If you mix them up, your calculations will all be way wrong. Beware of it.

A clever reader might say "what if I looked at the patient's throat and can't say for sure whether it's sore or not, it's slightly reddish but not totally so"? Glad you asked, that's the department of fuzzy logic, and I'll get to it later. But for now let's assume that we can tell whether it's one or the other. If we can't tell then we'll just treat it the same way as if we hadn't done the experiment at all (which is, by the way, consistent with how fuzzy logic treats it).

We can't know the outcome of an experiment before we do the experiment, whether E or ~E will be true. But we can estimate the probability of E in advance, based on the other information we have collected so far. This advance estimation is called the probability of an event, P(E). For example, if we know that 30% of patients visiting a doctor have a sore throat, we can estimate P(Esorethroat)=0.3 based on our knowledge that this person came to see a doctor.
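As a minimal sketch of where such an advance estimate might come from, here is the sore throat example computed from visit counts. The counts (300 out of 1000) are made up to match the 30% figure above, and `estimate_probability` is a hypothetical helper name, not something from a library.

```python
from fractions import Fraction


def estimate_probability(event_count, total_count):
    """Estimate P(E) as the fraction of past cases in which E was observed."""
    if total_count <= 0:
        raise ValueError("need at least one past case")
    if not 0 <= event_count <= total_count:
        raise ValueError("event_count must be between 0 and total_count")
    return Fraction(event_count, total_count)


# Hypothetical records: 300 of 1000 patients who came to the doctor
# turned out to have a sore throat.
p_sore_throat = estimate_probability(300, 1000)

print(float(p_sore_throat))
```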

Just as with hypotheses, the probabilities of the complementary events are connected:

P(E) + P(~E) = 1

And it's really not any different from a hypothesis. It really is a hypothesis about the outcome of an experiment we haven't done yet, which then collapses to true or false after the experiment is done and the result is observed.

Some experiments may have more than two possible outcomes. For example, suppose our black box under investigation contains some colored balls in it, and we can stick a hand in it and pull out a ball that might be of one of three colors: blue, yellow or green. This can be treated as three separate events:

Eblue: the ball is blue
Eyellow: the ball is yellow
Egreen: the ball is green

The total of all three probabilities will still be 1:

P(Eblue) + P(Eyellow) + P(Egreen) = 1

The connection between the probabilities of events means that these events are not independent. Observing any of these three events changes the probabilities of the other two events.

But the sum of probabilities in each pair will also be 1:

P(Eblue) + P(~Eblue) = 1
P(Eyellow) + P(~Eyellow) = 1
P(Egreen) + P(~Egreen) = 1
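Both kinds of sums can be checked with a short sketch. The probabilities below (1/2, 1/3, 1/6) are an assumption picked only for illustration, deliberately unequal to show that nothing here depends on the colors being equally likely.

```python
from fractions import Fraction

# Hypothetical probabilities of the three mutually exclusive outcomes.
p = {
    "blue": Fraction(1, 2),
    "yellow": Fraction(1, 3),
    "green": Fraction(1, 6),
}

# The three positive events together cover every possible outcome.
total = sum(p.values())

# Each negative event is the complement of its positive event.
p_not = {color: 1 - prob for color, prob in p.items()}

# Within each pair, the positive and negative probabilities add up to 1.
pair_sums = {color: p[color] + p_not[color] for color in p}

print(total, pair_sums)
```

With exactly three possible colors, P(~Eblue) works out to be the same as P(Eyellow) + P(Egreen): "not blue" means "yellow or green".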

The connected events can be processed sequentially. After each step of processing, we can predict the probabilities of the events that haven't been processed yet. We can achieve this in the real world by having an assistant pull the ball out of the box and then asking him yes-or-no questions. Suppose we know for some reason that there is an equal probability for each color. Thus initially:

```
P(Eblue) = 1/3
P(~Eblue) = 2/3

P(Eyellow) = 1/3
P(~Eyellow) = 2/3

P(Egreen) = 1/3
P(~Egreen) = 2/3
```

Then we have the assistant pull out the ball and ask him: "Is the ball blue?" If the answer is "No" then the event probabilities change:

```
P(Eblue) = 0
P(~Eblue) = 1

P(Eyellow) = 1/2
P(~Eyellow) = 1/2

P(Egreen) = 1/2
P(~Egreen) = 1/2
```

The probabilities of Eblue and ~Eblue have collapsed to 0 and 1, and the remaining two outcomes are now equally probable at 1/2 each, because the whole probability is now split between two outcomes, not three.
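This update can be sketched in a few lines. The sketch assumes that the outcomes are mutually exclusive and that a "No" answer leaves the relative proportions of the remaining outcomes unchanged, which is plain conditional probability: each remaining probability gets divided by P(~Eblue). The name `rule_out` is my own choice for illustration.

```python
from fractions import Fraction


def rule_out(probs, outcome):
    """Return the probabilities after learning that `outcome` did NOT happen.

    The ruled-out outcome drops to 0 and every other probability is
    divided by the probability mass that is left.
    """
    remaining = 1 - probs[outcome]
    if remaining == 0:
        raise ValueError("cannot rule out an outcome that had probability 1")
    return {
        name: Fraction(0) if name == outcome else p / remaining
        for name, p in probs.items()
    }


# Equal initial probabilities, as in the text.
prior = {
    "blue": Fraction(1, 3),
    "yellow": Fraction(1, 3),
    "green": Fraction(1, 3),
}

# The assistant answered "No" to "Is the ball blue?"
after_blue = rule_out(prior, "blue")

for color, p in after_blue.items():
    print(f"P(E{color}) = {p}, P(~E{color}) = {1 - p}")
```

The function returns a new dictionary rather than changing the old one, so the same call can be repeated on its own result for each further "No" answer.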

Now we can ask the next question: "Is the ball yellow?" If the answer is again "No" then the probabilities change like this:

```
P(Eblue) = 0
P(~Eblue) = 1

P(Eyellow) = 0
P(~Eyellow) = 1

P(Egreen) = 1
P(~Egreen) = 0
```

At this point we also become quite sure that the ball must be green, and we might as well skip asking about it. Not in the real world though. In the real world something might have changed since we made our original estimates, and more colors might have become available to the box-builder. Or maybe a fourth outcome is extremely rare and our training data didn't contain any example of it. Thus it could happen that we ask "Is the ball green?" expecting to hear "Yes" and get a "No". Obviously, this would be a curveball that disturbs the model, but I'll show a possible way of dealing with it.