Tuesday, November 17, 2015

Bayes 17: confidence subtraction

One of the ways I've come up with for differentiating the valid hypotheses from noise is the idea of subtracting the competing hypotheses.

To recall what the problem is, consider a simple training table:

# tab17_01.txt
!,,evA,evB,evC
hyp1,1,1,0,0
hyp2,1,0,1,0
hyp3,1,0,0,1

One event indicates one hypothesis. Let's ignore the idea of relevance for now, since at this point it's not obvious how to compute it; I actually plan to work up to the computation of the relevance from the subtraction logic.

If we feed an input with all the events present (and with capping), all hypotheses end up with equal probability:

# in17_01_01.txt 
evA,1
evB,1
evC,1

$ perl ex16_01run.pl -c 0.01 tab17_01.txt in17_01_01.txt 
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,1.00000,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,1.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evA, c=0.990000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.99000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.01000,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.01000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.990000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00990,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00990,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.00010,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.990000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00010,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00010,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.00010,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

If we feed an input with all the events absent (and with capping), all hypotheses also end up with equal probability:

# in17_01_02.txt 
evA,0
evB,0
evC,0

$ perl ex16_01run.pl -c 0.01 tab17_01.txt in17_01_02.txt 
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,1.00000,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,1.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evA, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.01000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.99000,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.99000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00990,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00990,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.98010,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00980,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00980,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.00980,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

How do we know that for the first input the right answer is "all three hypotheses are true", while for the second input it is "all three hypotheses are false"? Note that if we go by the weights, the weights are actually much higher for the second input.

The idea I've come up with is that we can take the set of the highly probable hypotheses (all three hypotheses in the examples above) and try to subtract the effects of all but one hypothesis in the set from the input. Then we run the modified input through the table again and see if that one remaining hypothesis pops up above all the others. If it does, it should be accepted; if it doesn't, it should be rejected. This computation is repeated for every hypothesis in the set.
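
In code, the overall acceptance loop could look something like this minimal sketch (it's not part of ex16_01run.pl, and the helpers subtract_confidence() and run_table() are just hypothetical stand-ins for the subtraction logic worked out below and for a table run like the one the script does):

# A hypothetical sketch of the acceptance loop: for each top hypothesis,
# subtract the other top hypotheses from the input, re-run the table,
# and accept the hypothesis only if it now clearly stands above the rest.
sub accept_hypotheses($$) # (\%conf, \@top) -> list of accepted hypothesis names
{
  my ($conf, $top) = @_;
  my @accepted;
  foreach my $one (@$top) {
    my @rest = grep { $_ ne $one } @$top;
    my $csub = subtract_confidence($conf, $one, \@rest); # discussed below
    my %p = run_table($csub); # a run like ex16_01run.pl, returning P(H)
    push @accepted, $one
      if $p{$one} >= 0.9 # the same kind of boundary as the -b option
        && !grep { $_ ne $one && $p{$_} >= 0.9 } keys %p;
  }
  return @accepted;
}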

To do that, we need to decide what "subtract" means here.

It seems reasonable to make the decision based on what probability this event has for this one hypothesis and what probability it has for all the other top hypotheses.

This can be interpreted in two ways, depending on which case weights we use for this computation: those from the original table or those from the result of the first computation. Using the weights from the result of the first computation seems to make more sense, since it favors the cases that have actually matched the input.

OK, suppose we get these two probability values; how do we subtract the effects? Let's look at some examples of what results would make sense.

Let's call Pone the probability of the event in the situation where the one chosen hypothesis is true (it can also be called P(E|H), fairly consistently with what we designated by it before, or TC(E|H)), and Prest its probability in the situation where all the other top hypotheses are true. Let's call the actual event confidence C, and the confidence after subtraction Csub.

Some obvious cases would be if Pone and Prest are opposite:

Pone=0, Prest=1, C=1 => Csub=0
Pone=1, Prest=0, C=1 => Csub=1
Pone=0, Prest=1, C=0 => Csub=0
Pone=1, Prest=0, C=0 => Csub=1

Basically, if Prest and C are opposite, C stays as it was, if Prest and C are the same, C flips. The other way to say it is that Csub ends up matching Pone.

The less obvious cases are where both Pone and Prest point the same way. Should C stay? Should it move towards 0.5? One thing that can be said for sure is that C shouldn't flip in this situation. There are arguments for both staying and for moving towards 0.5. This situation means that all the remaining cases match the state of this event, so staying means that there is no reason to penalize one case just because the other cases match it. Moving towards 0.5 means that we say that the rest of the hypotheses can account well for this event by themselves, so let's try to eliminate the "also ran" hypothesis. Staying seems to make more sense to me.

The goal of the subtraction is that with the subtracted confidences applied, a valid hypothesis should be strongly boosted above all others. If it doesn't get boosted (i.e. if it still gets tangled with other hypotheses), it's probably not a valid hypothesis but just some random noise.

The only time I've used the subtraction approach with the real data, I did it in a simple way, and it still worked quite well. That implementation can be expressed as:

Csub = C * TCone/(TCone + TCrest)

Here TCone and TCrest are similar to Pone and Prest but represent the sums of weighted training confidences instead of probabilities:

TCone(E) = sum(TC(E|Hone)*W(Hone))
TCrest(E) = sum(TC(E|Hrest)*W(Hrest))

That implementation was asymmetric: if C is 1, Csub may become less than C, but if C is 0, Csub will stay at 0. It handles reasonably well the situations where the event is mostly positive for the top hypotheses but not the situations where the event is mostly negative for the top hypotheses.
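
Here is a minimal Perl sketch of this asymmetric computation (it's separate from ex16_01run.pl), run against tab17_01.txt; in this particular example all the case weights are equal, both in the original table and after the all-ones run:

#!/usr/bin/perl
# A sketch of the asymmetric subtraction Csub = C * TCone/(TCone + TCrest).
use strict;

my @case = ( # tab17_01.txt as (hypothesis, weight, [TC(E|I) for evA, evB, evC])
  [ "hyp1", 1, [1, 0, 0] ],
  [ "hyp2", 1, [0, 1, 0] ],
  [ "hyp3", 1, [0, 0, 1] ],
);
my @conf = (1, 1, 1); # the all-ones input C(E)

sub subtract_asym($) # (chosen hypothesis) -> list of Csub, one per event
{
  my $one = shift;
  my @csub;
  for (my $e = 0; $e <= $#conf; $e++) {
    my ($tcone, $tcrest) = (0., 0.);
    foreach my $c (@case) {
      if ($c->[0] eq $one) { $tcone += $c->[1] * $c->[2][$e]; }
      else { $tcrest += $c->[1] * $c->[2][$e]; }
    }
    # if no case votes for this event at all, leave the confidence unchanged
    push @csub, ($tcone + $tcrest == 0.) ? $conf[$e]
      : $conf[$e] * $tcone / ($tcone + $tcrest);
  }
  return @csub;
}

foreach my $h (qw(hyp1 hyp2 hyp3)) {
  printf "%s: %s\n", $h, join(",", subtract_asym($h));
}
# prints 1,0,0 for hyp1; 0,1,0 for hyp2; 0,0,1 for hyp3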

If we compute the values of Csub in this way for the first example above (C(evA)=1, C(evB)=1, C(evC)=1), we will get:

  • For hyp1: Csub(evA)=1, Csub(evB)=0, Csub(evC)=0
  • For hyp2: Csub(evA)=0, Csub(evB)=1, Csub(evC)=0
  • For hyp3: Csub(evA)=0, Csub(evB)=0, Csub(evC)=1

These exactly match the training cases, so all three hypotheses will be accepted, with each hypothesis going to the probability 1 on its run.

If we compute the values of Csub in this way for the second example above (C(evA)=0, C(evB)=0, C(evC)=0), we will get:

  • For hyp1: Csub(evA)=0, Csub(evB)=0, Csub(evC)=0
  • For hyp2: Csub(evA)=0, Csub(evB)=0, Csub(evC)=0
  • For hyp3: Csub(evA)=0, Csub(evB)=0, Csub(evC)=0

They stay the same as the original input, thus the results won't change, the probabilities of all hypotheses will stay at 0.33 on each run, and all the hypotheses will be rejected.

The defect of this formula shows itself when the events are negative, their absence pointing towards the hypotheses, as in the following table:

# tab17_02.txt
!,,evA,evB,evC
hyp1,1,0,1,1
hyp2,1,1,0,1
hyp3,1,1,1,0

In this case the all-0 input should produce the result saying that all the hypotheses are true, and the all-1 input should say that all the hypotheses are false.

For all-one C(evA)=1, C(evB)=1, C(evC)=1, we will get:

  • For hyp1: Csub(evA)=0, Csub(evB)=0.5, Csub(evC)=0.5
  • For hyp2: Csub(evA)=0.5, Csub(evB)=0, Csub(evC)=0.5
  • For hyp3: Csub(evA)=0.5, Csub(evB)=0.5, Csub(evC)=0

Let's try applying the computed values for hyp1:

# in17_02_03.txt
evA,0
evB,0.5
evC,0.5

$ perl ex16_01run.pl -c 0.01 tab17_02.txt in17_02_03.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp2   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp3   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evA, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.99000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp2   ,0.01000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp3   ,0.01000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
--- Applying event evB, c=0.500000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.49500,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp2   ,0.00500,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp3   ,0.00500,1.00000/1.00,1.00000/1.00,0.00000/1.00,
--- Applying event evC, c=0.500000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.24750,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp2   ,0.00250,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp3   ,0.00250,1.00000/1.00,1.00000/1.00,0.00000/1.00,
--- Probabilities
hyp1    0.98020
hyp2    0.00990
hyp3    0.00990

The hyp1 still comes out as true! This is the problem caused by the asymmetry of the formula.

My next idea for a better formula was this: instead of subtracting, just either leave Csub=C or "flip" it: Csub=(1-C). The idea of the flipping is that the direction of C, whether it's less or greater than 0.5, shows the "value" of the event while the distance between C and 0.5 shows the confidence as such. The operation of flipping keeps the confidence (i.e. the distance between C and 0.5) the same while changing the direction. And if C was 0.5, the flipping will have no effect.

C would be flipped in the situation where it points against this hypothesis but for another top hypothesis. This situation likely means that this event is a symptom of another hypothesis but not really relevant for this one.

The logic will be like this:

If TCone and C point in the same direction (i.e. both >0.5 or both <0.5)
then Csub = C;
else if there exists another top hypothesis with TCother pointing in the
 same direction as C
then Csub = (1-C);
else Csub = C;

And instead of hypotheses, we can work with the individual training cases. Instead of picking the top hypotheses, pick the top cases. And then do the subtraction/flipping logic case-by-case. Except perhaps exclude the other cases of the same hypothesis from consideration of "exists another" for flipping.

Let's work this logic through our examples.

The first example is the table

# tab17_01.txt
!,,evA,evB,evC
hyp1,1,1,0,0
hyp2,1,0,1,0
hyp3,1,0,0,1

and the all-one input: C(evA)=1, C(evB)=1, C(evC)=1. From the previous computation we know that all three hypotheses are the top probable hypotheses.

Let's check hyp1:

  • For evA, TC=1 and C=1, point the same way, so Csub=1.

  • For evB, TC=0 and C=1, point opposite, and there is TC(evB|hyp2)=1, so flip to Csub=0.

  • For evC, TC=0 and C=1, point opposite, and there is TC(evC|hyp3)=1, so flip to Csub=0.

We've got the subtracted values, let's run them through the processing:

# in17_01_03.txt
evA,1
evB,0
evC,0

$ perl ex16_01run.pl -c 0.01 tab17_01.txt in17_01_03.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,1.00000,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,1.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evA, c=0.990000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.99000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.01000,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.01000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.98010,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00010,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.00990,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.97030,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00010,0.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.00010,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.99980
hyp2    0.00010
hyp3    0.00010

It says that hyp1 is quite true. Similarly, hyp2 and hyp3 would show themselves as true.

For the second example, let's look at the same table and the all-0 input: C(evA)=0, C(evB)=0, C(evC)=0. From the previous computation we know that all three hypotheses are again the top probable hypotheses.

Let's check hyp1:

  • For evA, TC=1 and C=0, point opposite, and there are two other hypotheses with TC=0, so flip to Csub=1.
  • For evB, TC=0 and C=0, point the same, so Csub=0.
  • For evC, TC=0 and C=0, point the same, so Csub=0.

It again produced (1, 0, 0), and hyp1 would also show as true! But that's not a good result, it's a false positive. This idea didn't work out well.

The problem is that we need to differentiate between the states of the event that say "there is nothing wrong" and "there is something wrong", and flip the event only if it was pointing in the direction of "there is something wrong". That's what my first asymmetric logic did: it always assumed that C=0 meant "there is nothing wrong".

If we have the additional information about which state of the event is "normal" and which is "wrong", that would solve this problem. If we don't have this information, we can try to deduce it. A simple assumption could be that if a symptom is specific to some cases, then in most of the training cases it will be in the "normal" state, and only in a few of these specific cases it will be in the "wrong" state.

Of course, there will be exceptions, for example if a medical diagnostic system has an event with the question "is the patient feeling unwell?" then the answer for this question in most cases will be true, even though this is not the "normal" state. But it doesn't seem to cause problems: for most patients and most hypotheses TC and C on this event will be pointing the same way, and there will be no need for flipping anyway.

So, let's update the logic rules:

If TCone and C point in the same direction (i.e. both >0.5 or both <0.5)
then Csub = C;
else if there exists another top hypothesis with TCother pointing in the
 same direction as C and most cases in the original training data were
 pointing opposite C
then Csub = (1-C);
else Csub = C;

With this revised logic the revisited case of the all-0 inputs (C(evA)=0, C(evB)=0, C(evC)=0) for hyp1 will be:

  • For evA, TC=1 and C=0, point opposite, but most training cases (2 of 3) also point to C=0, so leave Csub=0.
  • For evB, TC=0 and C=0, point the same, so Csub=0.
  • For evC, TC=0 and C=0, point the same, so Csub=0.

With this unchanged input, hyp1 will still finish with the probability of 0.33, and it won't make the cut. Neither will hyp2 nor hyp3 when processed in the same way.

Let's look at the example with the opposite table

# tab17_02.txt
!,,evA,evB,evC
hyp1,1,0,1,1
hyp2,1,1,0,1
hyp3,1,1,1,0

and again the all-0 input. In it the handling of hyp1 will be:

  • For evA, TC=0 and C=0, point the same, so Csub=0.
  • For evB, TC=1 and C=0, point opposite, most training cases (2 of 3) point to C=1, and there is hyp2 with TC(evB|hyp2) = 0, so flip to Csub=1.
  • For evC, TC=1 and C=0, point opposite, most training cases (2 of 3) point to C=1, and there is hyp3 with TC(evC|hyp3) = 0, so flip to Csub=1.

This result of (C(evA)=0, C(evB)=1, C(evC)=1) will match the training case for hyp1 exactly, and drive its probability all the way up, just as we wanted to.

This last logic has managed to handle all the examples fairly decently.
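
To make this final rule concrete, here is a minimal Perl sketch of it (again separate from ex16_01run.pl); it reproduces the tab17_02.txt all-0 walkthrough above:

#!/usr/bin/perl
# A sketch of the final subtraction-by-flipping rule, checked against the
# tab17_02.txt table with the all-0 input. It assumes one training case
# per hypothesis, as in these small examples.
use strict;

my @case = ( # tab17_02.txt as (hypothesis, weight, [TC(E|I) for evA, evB, evC])
  [ "hyp1", 1, [0, 1, 1] ],
  [ "hyp2", 1, [1, 0, 1] ],
  [ "hyp3", 1, [1, 1, 0] ],
);

sub majority_tc($) # (event index) -> weighted average TC over the whole table
{
  my $e = shift;
  my ($sum, $wt) = (0., 0.);
  foreach my $c (@case) { $sum += $c->[1] * $c->[2][$e]; $wt += $c->[1]; }
  return $sum / $wt;
}

sub subtract_flip($$$) # (chosen hyp, \@top hyps, \@conf) -> list of Csub
{
  my ($one, $top, $conf) = @_;
  my %istop; $istop{$_} = 1 foreach @$top;
  my ($cone) = grep { $_->[0] eq $one } @case; # the case of the chosen hypothesis
  my @csub;
  for (my $e = 0; $e <= $#$conf; $e++) {
    my $c = $conf->[$e];
    if ( ($cone->[2][$e] > 0.5) == ($c > 0.5) ) {
      push @csub, $c; # TCone and C point the same way, keep C
      next;
    }
    # is there another top hypothesis pointing the same way as C?
    my $other = grep { $_->[0] ne $one && $istop{$_->[0]}
        && (($_->[2][$e] > 0.5) == ($c > 0.5)) } @case;
    # does most of the original training data point opposite to C?
    my $wrong = (majority_tc($e) > 0.5) != ($c > 0.5);
    push @csub, ($other && $wrong) ? (1. - $c) : $c;
  }
  return @csub;
}

my @top = qw(hyp1 hyp2 hyp3); # all three are the top hypotheses in the all-0 run
foreach my $h (@top) {
  printf "%s: %s\n", $h, join(",", subtract_flip($h, \@top, [0, 0, 0]));
}
# prints 0,1,1 for hyp1; 1,0,1 for hyp2; 1,1,0 for hyp3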

Monday, November 9, 2015

Bayes 16: code for hypothesis composition

This time I want to show the code that composes the separate cases into the merged hypotheses. I've shown before how to do it manually; now here is the code that does it.

The short summary of changes is:

  • The %hyphash went away since it turned out to be unnecessary.
  • Option "-compose N" added. N is the fraction in range [0..1] of what part of the weights needs to be composed into the per-hypothesis merged cases. 0 is the default and means that nothing will be composed. 1 means that all the data will be composed per hypothesis.
  • Option "-msplit" added. It enables the splitting of the multi-hypothesis cases into multiple single-hypothesis cases (just as shown before, the same data gets copied to every hypothesis separately).
  • The table loading code has been extended to allow reading the multi-hypothesis cases that are formatted on multiple lines. This allows saving the result of the composition and then loading it for the diagnosis runs.
  • The new logic is in the function compose_hypotheses().

Initially I wanted to show only the changed code but then decided that since I'm not uploading the files anywhere yet, it would be easier for the people who want to play with the code to copy-and-paste it than to assemble it from bits and pieces. So here we go, the whole new code:

# ex16_01run.pl
#!/usr/bin/perl
#
# Running of a Bayes expert system on a table of training cases.
# Includes the code to compose the training cases by hypothesis.
# And can parse back the multi-hypothesis cases printed in multi-line.

use strict;
use Carp;

our @evname; # the event names, in the table order
our %evhash; # hash of event names to indexes
our @case; # the table of training cases
  # each case is represented as a hash with elements:
  # "hyp" - array of hypotheis names that were diagnosed in this case
  # "wt" - weight of the case
  # "origwt" - original weight of the case as loaded form the table
  # "tc" - array of training confidence of events TC(E|I)
  # "r" - array of relevance of events R(E|I)
our %phyp; # will be used to store the computed probabilities

# options
our $verbose = 0; # verbose printing during the computation
our $cap = 0; # cap on the confidence, factor adding fuzziness to compensate
  # for overfitting in the training data;
  # limits the confidence to the range [$cap..1-$cap]
our $boundary = 0.9; # boundary for accepting a hypothesis as a probable outcome
our $composite = 0.; # fraction [0..1] of the cases to compose by hypothesis
our $mhyp_split = 0; # split the multi-hypothesis cases when computing the composites

# print formatting values
our $pf_w = 7; # width of every field
our $pf_tfmt = "%-${pf_w}.${pf_w}s"; # format of one text field
our $pf_nw = $pf_w-2; # width after the dot in the numeric fields
our $pf_nfmt = "%-${pf_w}.${pf_nw}f"; # format of one numeric field (does the better rounding)
our $pf_rw = 4; # width of the field R(E|I)
our $pf_rtfmt = "%-${pf_rw}.${pf_rw}s"; # format of text field of the same width as R(E|I)
our $pf_rnw = $pf_rw-2; # width after the dot for R(E|I)
our $pf_rnfmt = "%-${pf_rw}.${pf_rnw}f"; # format of the field for R(E|I)

sub load_table($) # (filename)
{
  my $filename = shift;

  @evname = ();
  %evhash = ();
  @case = ();

  my $nev = undef; # number of events minus 1

  confess "Failed to open '$filename': $!\n"
    unless open(INPUT, "<", $filename);

  my $prepend;
  while(<INPUT>) {
    chomp;
    s/,\s*$//; # remove the trailing comma if any
    if (/^\#/ || /^\s*$/) {
      # a comment line
    } elsif (/^\!/) {
      # row with event names
      @evname = split(/,/); # CSV format, the first 2 elements get skipped
      shift @evname;
      shift @evname;
    } elsif (/\+\s*$/) {
      # a split-line of hypothesis names
      $prepend .= $_;
    } else {
      $_ = $prepend . $_;
      $prepend = undef;
      my @s = split(/,/); # CSV format for a training case
      # Each line contains:
      # - list of hypotheses, separated by "+"
      # - weight (in this position it's compatible with the format of probability tables)
      # - list of event data that might be either of:
      #   - one number - the event's training confidence TC(E|I), implying R(E|I)=1
      #   - a dash "-" - the event is irrelevant, meaning R(E|I)=0
      #   - two numbers separated by a "/": TC(E|I)/R(E|I)

      my $c = { };
      
      my @hyps = split(/\+/, shift @s);
      for (my $i = 0; $i <= $#hyps; $i++) {
        $hyps[$i] =~ s/^\s+//;
        $hyps[$i] =~ s/\s+$//;
      }

      $c->{hyp} = \@hyps;
      $c->{origwt} = $c->{wt} = shift(@s) + 0.;

      if (defined $nev) {
        if ($nev != $#s) {
          close(INPUT);
          my $msg = sprintf("Wrong number of events, expected %d, got %d in line: %s\n",
            $nev+1, $#s+1, $_);
          confess $msg;
        }
      } else {
        $nev = $#s;
      }

      # the rest of fields are the events in this case
      foreach my $e (@s) {
        if ($e =~ /^\s*-\s*$/) {
          push @{$c->{r}}, 0.;
          push @{$c->{tc}}, 0.;
        } else {
          my @edata = split(/\//, $e);
          push @{$c->{tc}}, ($edata[0] + 0.);
          if ($#edata <= 0) {
            push @{$c->{r}}, 1.;
          } else {
            push @{$c->{r}}, ($edata[1] + 0.);
          }
        }
      }

      push @case, $c;
    }
  }
  close(INPUT);
  if ($prepend) {
    confess "The input contained the hanging hypothese names: $prepend\n";
  }

  if ($#evname >= 0) {
    if ($#evname != $nev) {
      my $msg = sprintf("Wrong number of event names, %d events in the table, %d names\n",
        $nev+1, $#evname+1);
      confess $msg;
    }
  } else {
    for (my $i = 0; $i <= $nev; $i++) {
      push @evname, ($i+1)."";
    }
  }

  for (my $i = 0; $i <= $#evname; $i++) {
    $evname[$i] =~ s/^\s+//;
    $evname[$i] =~ s/\s+$//;
    $evhash{$evname[$i]} = $i;
  }
}

sub print_table()
{
  # the title row
  printf($pf_tfmt . ",", "!");
  printf($pf_tfmt . ",", "");
  foreach my $e (@evname) {
    printf($pf_tfmt . " " . $pf_rtfmt . ",", $e, "");
  }
  print("\n");
  # the cases
  for (my $i = 0; $i <= $#case; $i++) {
    my $c = $case[$i];
    # if more than one hypothesis, print each of them on a separate line
    for (my $j = 0; $j < $#{$c->{hyp}}; $j++) {
      printf($pf_tfmt . "+\n", $c->{hyp}[$j]);
    }

    printf($pf_tfmt . ",", $c->{hyp}[ $#{$c->{hyp}} ]);
    printf($pf_nfmt . ",", $c->{wt});
    for (my $j = 0; $j <= $#evname; $j++) {
      printf($pf_nfmt . "/" . $pf_rnfmt . ",", $c->{tc}[$j], $c->{r}[$j]);
    }
    print("\n");
  }
}

# Compute the hypothesis probabilities from weights
sub compute_phyp()
{
  %phyp = ();
  my $h;

  # start by getting the weights
  my $sum = 0.;
  for (my $i = 0; $i <= $#case; $i++) {
    my $w = $case[$i]->{wt};
    $sum += $w;

    foreach $h (@{$case[$i]->{hyp}}) {
      $phyp{$h} += $w;
    }
  }

  if ($sum != 0.) { # if 0 then all the weights are 0, leave them alone
    for $h (keys %phyp) {
      $phyp{$h} /= $sum;
    }
  }
}


# Print the probabilities of the hypotheses
sub print_phyp()
{
  printf("--- Probabilities\n");
  for my $h (sort keys %phyp) {
    printf($pf_tfmt . " " . $pf_nfmt . "\n", $h, $phyp{$h});
  }
}

# Apply one event
# evi - event index in the array
# conf - event confidence [0..1]
sub apply_event($$) # (evi, conf)
{
  my ($evi, $conf) = @_;

  # update the weights
  for (my $i = 0; $i <= $#case; $i++) {
    my $w = $case[$i]->{wt};
    my $r = $case[$i]->{r}[$evi];
    my $tc = $case[$i]->{tc}[$evi];

    $case[$i]->{wt} = $w * (1. - $r)
      + $w*$r*( $tc*$conf + (1. - $tc)*(1. - $conf) );
  }
}


# Apply an input file
sub apply_input($) # (filename)
{
  my $filename = shift;

  confess "Failed to open the input '$filename': $!\n"
    unless open(INPUT, "<", $filename);
  while(<INPUT>) {
    chomp;
    next if (/^\#/ || /^\s*$/); # a comment

    my @s = split(/,/);
    $s[0] =~ s/^\s+//;
    $s[0] =~ s/\s+$//;

    confess ("Unknown event name '" . $s[0] . "' in the input\n")
      unless exists $evhash{$s[0]};
    my $evi = $evhash{$s[0]};

    my $conf = $s[1] + 0.;
    if ($conf < $cap) {
      $conf = $cap;
    } elsif ($conf > 1.-$cap) {
      $conf = 1. - $cap;
    }
    printf("--- Applying event %s, c=%f\n", $s[0], $conf);
    &apply_event($evi, $conf);
    &print_table;
  }
  close(INPUT);
}

# compose the training cases by hypothesis
sub compose_hypotheses()
{
  if ($mhyp_split) {
    # the number of cases will grow, remember the index of last original case
    my $lastcase = $#case;
    for (my $i = 0; $i <= $lastcase; $i++) {
      my $c = $case[$i];
      while ($#{$c->{hyp}} > 0) {
        # split a copy of the case for each resulting hypothesis in it
        my $hname = pop @{$c->{hyp}};
        push @case, {
          hyp => [$hname],
          wt => $c->{wt},
          origwt => $c->{origwt},
          tc => $c->{tc},
          r => $c->{r},
        };
      }
    }
  }

  if ($composite <= 0.) {
    return; # nothing to do
  }

  # newly-generated composite hypotheses
  my %hyps; # keyed by the hypothesis names
  for (my $i = 0; $i <= $#case; $i++) {
    my $c = $case[$i];
    my $key = join("+", sort @{$c->{hyp}});
    if (!exists $hyps{$key}) {
      $hyps{$key} = {
        hyp => $c->{hyp},
        wt => 0.,
        wtrel => [], # weight of relevant part, by event
        tc => [], # initially will contain the sum
        r => [], # will be filled later
      };
    }
    my $hyp = $hyps{$key};
    my $wt = $c->{wt};

    $hyp->{wt} += $wt;

    for (my $e = 0; $e <= $#evname; $e++) {
      my $r = $c->{r}[$e];
      $hyp->{wtrel}[$e] += $wt * $r;
      $hyp->{tc}[$e] += $wt * $r * $c->{tc}[$e];
    }
  }
  if ($composite >= 1.) {
    # throw away the raw cases, since the hypotheses will replace them
    @case = ();
  } else {
    # 2nd pass: adjust the weight of the raw cases,
    # unless it's the only case of the hypothesis (then throw
    # away that hypothesis)
    for (my $i = 0; $i <= $#case; $i++) {
      my $c = $case[$i];
      my $key = join("+", sort @{$c->{hyp}});
      if ($hyps{$key}->{wt} == $c->{wt}) {
        delete $hyps{$key}; # hypothesis is a copy of the case
      } else {
        $c->{wt} *= (1. - $composite);
      }
    }
  }

  foreach my $h (sort keys %hyps) {
    my $hyp = $hyps{$h};
    my $wt = $hyp->{wt};

    for (my $e = 0; $e <= $#evname; $e++) {
      $hyp->{r}[$e] = $hyp->{wtrel}[$e] / $wt;
      if ($hyp->{wtrel}[$e] == 0.) {
        $hyp->{tc}[$e] = 0.;
      } else {
        $hyp->{tc}[$e] /= $hyp->{wtrel}[$e];
      }
    }

    # scale down the weight
    $hyp->{wt} *= $composite;
    $hyp->{origwt} = $hyp->{wt};
    delete $hyp->{wtrel};
    push @case, $hyp;
  }
}

# main()
while ($ARGV[0] =~ /^-(.*)/) {
  if ($1 eq "v") {
    $verbose = 1;
  } elsif ($1 eq "c") {
    shift @ARGV;
    $cap = $ARGV[0]+0.;
  } elsif ($1 eq "b") {
    shift @ARGV;
    $boundary = $ARGV[0]+0.;
  } elsif ($1 eq "compose") {
    shift @ARGV;
    $composite = $ARGV[0]+0.;
  } elsif ($1 eq "msplit") {
    $mhyp_split = 1;
  } else {
    confess "Unknown switch -$1";
  }
  shift @ARGV;
}
&load_table($ARGV[0]);
&print_table;
if ($composite > 0. || $mhyp_split) {
  &compose_hypotheses;
  printf "--- Composite ${pf_nfmt}:\n", $composite;
  &print_table;
}
&compute_phyp;
&print_phyp;
if ($#ARGV >= 1) {
  &apply_input($ARGV[1]);
  &compute_phyp;
  &print_phyp;
}

The basic example, loading of a multi-line case and splitting of the multi-hypothesis cases:

#tab16_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,

$ perl ex16_01run.pl -msplit tab16_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Composite 0.00000:
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

The same but also fully composed by hypotheses:

$ perl ex16_01run.pl -msplit -compose 1 tab16_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Composite 1.00000:
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,2.00000,1.00000/1.00,0.50000/1.00,0.50000/1.00,
hyp2   ,2.00000,0.50000/1.00,1.00000/1.00,0.50000/1.00,
hyp3   ,2.00000,0.50000/1.00,0.50000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

The same with only 0.5 of the weights composed:

$ perl ex16_01run.pl -msplit -compose 0.5 tab16_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Composite 0.50000:
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.50000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   ,0.50000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,0.50000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp2   ,0.50000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp3   ,0.50000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp3   ,0.50000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
hyp1   ,1.00000,1.00000/1.00,0.50000/1.00,0.50000/1.00,
hyp2   ,1.00000,0.50000/1.00,1.00000/1.00,0.50000/1.00,
hyp3   ,1.00000,0.50000/1.00,0.50000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

It's a combination of the two examples above. The original split cases are left with half their weight, and the other half of the weight has been composed.

What if we try the composition without splitting?

$ perl ex16_01run.pl -compose 0.5 tab16_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Composite 0.50000:
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.66667
hyp3    0.66667

The result is exactly the same as the original. This is because the original had only one case for each hypothesis combination, so there is nothing to compose: the code detects that each composite would be an exact copy of its single case and leaves the original cases unchanged.

Now a demo of the relevance computation:

# tab16_02.txt
!,,evA,evB,evC
hyp1,1,1,0,-
hyp1,1,0,1,-
hyp1,6,-,-,1

$ perl ex16_01run.pl -compose 1 tab16_02.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/1.00,0.00000/0.00,
hyp1   ,1.00000,0.00000/1.00,1.00000/1.00,0.00000/0.00,
hyp1   ,6.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Composite 1.00000:
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,8.00000,0.50000/0.25,0.50000/0.25,1.00000/0.75,
--- Probabilities
hyp1    1.00000

According to the logic from the previous installment, the relevance of evA and evB gets set to 0.25 because 2/8=1/4 of the weight for each of them comes from cases where they are relevant, and for evC the relevance is set to 0.75 because 6/8=3/4 of the weight for it is relevant.
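
Spelled out, the composition arithmetic is:

W(hyp1) = 1 + 1 + 6 = 8
R(evA|hyp1) = (1*1 + 1*1 + 6*0) / 8 = 2/8 = 0.25
TC(evA|hyp1) = (1*1*1 + 1*1*0) / 2 = 0.5
R(evC|hyp1) = (6*1) / 8 = 6/8 = 0.75
TC(evC|hyp1) = (6*1*1) / 6 = 1

(evB works out the same as evA.)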

Wednesday, November 4, 2015

Bayes 15: relevance combination

I've set out to write the code that would do the combination of the cases into the likeness of the probability tables and I've realized that since I've introduced the concept of relevance, now I'd need to deal with it too. What would it mean to combine two cases that have the relevance values?

First, what does it mean to combine two cases without the relevance values? We've been through this computation many times when building the probability tables but I don't think I've given the formula yet.

The weights obviously add up (there were that many cases uncombined, and they remain the same number of cases, now in a single combination):

W(I) = W(I1) + W(I2)

The training confidences get added up according to their weights:

TC(E|I) = ( W(I1)*TC(E|I1) + W(I2)*TC(E|I2) ) / ( W(I1) + W(I2) )
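
In Perl terms this combination is just a couple of lines; here is a small sketch, using the same wt/tc case representation as the scripts in this series:

# A sketch of combining two cases (no relevance yet): the weights add up
# and the training confidences average by weight.
sub combine2($$) # (\%case1, \%case2) -> \%combined
{
  my ($i1, $i2) = @_;
  my $wt = $i1->{wt} + $i2->{wt}; # W(I) = W(I1) + W(I2)
  my @tc;
  for (my $e = 0; $e <= $#{$i1->{tc}}; $e++) {
    push @tc, ($i1->{wt} * $i1->{tc}[$e] + $i2->{wt} * $i2->{tc}[$e]) / $wt;
  }
  return { wt => $wt, tc => \@tc };
}

For example, combining a case of weight 4 with TC=(1,1,1) and a case of weight 2 with TC=(1,0,0) produces the weight 6 and TC=(1, 0.66667, 0.66667).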

To check that it's right let's compute the application of one event and see that the combined weights work out the same either way.

Separately they get computed as follows (without including the relevance yet):

W(I1|E) = W(I1) * ( TC(E|I1)*C(E) + (1 - TC(E|I1))*(1 - C(E)) )
    = W(I1) * ( TC(E|I1)*C(E) + 1 - TC(E|I1) - C(E) + TC(E|I1)*C(E) )
    = W(I1) * ( 1 - TC(E|I1) + (2*TC(E|I1) - 1)*C(E) )
    = W(I1) - W(I1)*TC(E|I1) + (W(I1)*2*TC(E|I1) - W(I1))*C(E)

W(I2|E) follows the same kind of formula. After combination they compute as:

W(I|E) = W(I) * ( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
    = ( W(I1) + W(I2) )*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )

Let's express TC(E|I) as a ratio:

TC(E|I) = TCup(E|I) / TClow(E|I)
TCup(E|I) = W(I1)*TC(E|I1) + W(I2)*TC(E|I2)
TClow(E|I) = W(I1) + W(I2)
    = W(I)

Then substituting it into the formula for weights we get

W(I|E) = W(I) * ( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
    = W(I) * ( TCup(E|I)*C(E)/TClow(E|I) + (1 - TCup(E|I)/TClow(E|I))*(1 - C(E)) )
    = W(I) * ( TCup(E|I)*C(E)/TClow(E|I) + (TClow(E|I) - TCup(E|I))/TClow(E|I) *(1 - C(E)) )
    = W(I) * ( TClow(E|I) - TCup(E|I) + (TCup(E|I) - TClow(E|I) + TCup(E|I))*C(E) ) / TClow(E|I)
    = W(I) * ( W(I) - TCup(E|I) + (2*TCup(E|I) - W(I))*C(E) ) / W(I)
    = W(I) - TCup(E|I) + (2*TCup(E|I) - W(I))*C(E)
    = W(I1) + W(I2) - W(I1)*TC(E|I1) - W(I2)*TC(E|I2) +
        (2*W(I1)*TC(E|I1) + 2*W(I2)*TC(E|I2) - W(I1) - W(I2))*C(E)
    = ( W(I1) - W(I1)*TC(E|I1) + (2*W(I1)*TC(E|I1) - W(I1))*C(E) ) +
        ( W(I2) - W(I2)*TC(E|I2) + (2*W(I2)*TC(E|I2) - W(I2))*C(E) )
    = W(I1|E) + W(I2|E)

It all computes. The result is the same either way. The difference is of course that as we apply multiple events, the effect of the previous events on the following ones will manifest differently. But that is to be expected.

That taken care of, let's look at the relevance. If we have a case with a fractional R(E|I), we can simulate it by splitting the case into two cases, the fully relevant part and the fully irrelevant part. The case

W(I) * ...,TC/R,...

gets split into two cases, relevant one with the weight W(Ir) and irrelevant one with the weight W(Ii):

W(Ii)=W(I)*(1-R) * ...,0/0,...
W(Ir)=W(I)*R * ...,TC,...

To show that it's equivalent, the original formula for the posterior weight is:

W(I|E) = W(I)*(1 - R(E|I)) + W(I)*R(E|I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )

If we add up the weights of the two split cases, we get:

W(Ii|E) = W(I)*(1-R(E|I))
W(Ir|E) = W(I)*R(E|I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
W(Ii|E) + W(Ir|E) = W(I)*(1 - R(E|I)) + W(I)*R(E|I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )

Both give the exact same result, the split is correct.

This split can also be undone. The formulas for the weights can be expanded as:

W(Ii) = W(I)*(1 - R)  = W(I) - W(I)*R
W(Ir) = W(I)*R

And from there it follows that:

W(Ii) + W(Ir) = W(I) - W(I)*R + W(I)*R = W(I)

Not surprisingly, the original weight can be found as the sum of two split weights. And from there R can be found as:

R = W(Ir) / W(I)

If we want to combine multiple cases with fractional relevances, we can split each of them into the relevant and irrelevant parts, combine the relevant parts together, combine the irrelevant parts together (this is easy: just add up their weights), and then undo the split.
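
A small sketch of this procedure for two cases and one event, again in the wt/tc/r representation of the scripts:

# A sketch of combining two cases with fractional relevances, one event at
# a time: split into relevant and irrelevant parts, combine them separately,
# then undo the split.
sub combine2_rel($$$) # (\%case1, \%case2, event index) -> (wt, tc, r)
{
  my ($i1, $i2, $e) = @_;
  # the relevant parts, W(Ir) = W(I)*R
  my $wr1 = $i1->{wt} * $i1->{r}[$e];
  my $wr2 = $i2->{wt} * $i2->{r}[$e];
  # the irrelevant parts just add up their weights
  my $wi = ($i1->{wt} - $wr1) + ($i2->{wt} - $wr2);
  # combine the relevant parts as usual
  my $wr = $wr1 + $wr2;
  my $tc = ($wr == 0.) ? 0.
    : ($wr1 * $i1->{tc}[$e] + $wr2 * $i2->{tc}[$e]) / $wr;
  # undo the split: R is the relevant share of the total weight
  my $wt = $wr + $wi;
  return ($wt, $tc, $wr / $wt);
}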

Incidentally, the whole relevance thing can also be expressed purely by manipulating the case weights and training confidences.

The point of the relevance concept is to leave the weight of a case unchanged if the event is irrelevant to it. The closest effect that can be achieved with the training confidences alone is TC=0.5, with which the weight gets halved no matter what C(E) is:

W(I|E) = W(I)*( TC(E)*C(E) + (1 - TC(E))*(1 - C(E)) )
    = W(I)*( 0.5*C(E) + (1 - 0.5)*(1 - C(E)) )
    = W(I)*( 0.5*(C(E) + 1 - C(E)) )
    = W(I)*0.5

Which means that we can simulate an irrelevant event purely through the training confidence, by setting TC(E|I)=0.5 and doubling the weight of the case. For example, if we had a case with weight 15 and an irrelevant event

15 * ...,0/R=0,...

We can replace it with the case of weight 30 and TC=0.5.

30 * ...,0.5,...

The same can also be seen in another way: create a second case, exactly the same as the first one and with the same weight, except put TC(E|I)=0 in one case and TC(E|I)=1 in the other. No matter what C(E) is, one case will get through and the other one will be thrown away (or the matching fractions of both cases will be thrown away, still leaving one original weight). For example, taking the same case of weight 15 and an irrelevant event, we can convert it to two cases:

15 * ...,0,...
15 * ...,1,...

Either way, doubling the weight of one case, or splitting into two cases, the total weight gets doubled, and the average TC(E|I) gets set equal to 0.5.
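
As a quick check of the two-case variant, applying the event with any confidence C(E) gives:

15 * ( 0*C(E) + 1*(1 - C(E)) ) = 15*(1 - C(E))
15 * ( 1*C(E) + 0*(1 - C(E)) ) = 15*C(E)
total: 15*(1 - C(E)) + 15*C(E) = 15

which is exactly the unchanged weight of the original case.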

Monday, November 2, 2015

Bayes 14: code for weight-based computation

As promised, here is the code that performs the computation directly from the table of the training cases, using the weights:

# ex14_01run.pl
#!/usr/bin/perl
#
# Running of a Bayes expert system on a table of training cases.

use strict;
use Carp;

our @evname; # the event names, in the table order
our %evhash; # hash of event names to indexes
our %hyphash; # the hypothesis names translation to the arrays of
  # references to all cases involving this hypothesis
our @case; # the table of training cases
  # each case is represented as a hash with elements:
  # "hyp" - array of hypotheis names that were diagnosed in this case
  # "wt" - weight of the case
  # "origwt" - original weight of the case as loaded form the table
  # "tc" - array of training confidence of events TC(E|I)
  # "r" - array of relevance of events R(E|I)
our %phyp; # will be used to store the computed probabilities

# options
our $verbose = 0; # verbose printing during the computation
our $cap = 0; # cap on the confidence, factor adding fuzziness to compensate
  # for overfitting in the training data;
  # limits the confidence to the range [$cap..1-$cap]
our $boundary = 0.9; # boundary for accepting a hypothesis as a probable outcome

# print formatting values
our $pf_w = 7; # width of every field
our $pf_tfmt = "%-${pf_w}.${pf_w}s"; # format of one text field
our $pf_nw = $pf_w-2; # width after the dot in the numeric fields
our $pf_nfmt = "%-${pf_w}.${pf_nw}f"; # format of one numeric field (does the better rounding)
our $pf_rw = 4; # width of the field R(E|I)
our $pf_rtfmt = "%-${pf_rw}.${pf_rw}s"; # format of text field of the same width as R(E|I)
our $pf_rnw = $pf_rw-2; # width after the dot for R(E|I)
our $pf_rnfmt = "%-${pf_rw}.${pf_rnw}f"; # format of the field for R(E|I)

sub load_table($) # (filename)
{
  my $filename = shift;

  @evname = ();
  %evhash = ();
  %hyphash = ();
  @case = ();

  my $nev = undef; # number of events minus 1

  confess "Failed to open '$filename': $!\n"
    unless open(INPUT, "<", $filename);
  while(<INPUT>) {
    chomp;
    s/,\s*$//; # remove the trailing comma if any
    if (/^\#/ || /^\s*$/) {
      # a comment line
    } elsif (/^\!/) {
      # row with event names
      @evname = split(/,/); # CSV format, the first 2 elements get skipped
      shift @evname;
      shift @evname;
    } else {
      my @s = split(/,/); # CSV format for a training case
      # Each line contains:
      # - list of hypotheses, separated by "+"
      # - weight (in this position it's compatible with the format of probability tables)
      # - list of event data that might be either of:
      #   - one number - the event's training confidence TC(E|I), implying R(E|I)=1
      #   - a dash "-" - the event is irrelevant, meaning R(E|I)=0
      #   - two numbers separated by a "/": TC(E|I)/R(E|I)

      my $c = { };
      
      my @hyps = split(/\+/, shift @s);
      my %hypuniq;
      for (my $i = 0; $i <= $#hyps; $i++) {
        $hyps[$i] =~ s/^\s+//;
        $hyps[$i] =~ s/\s+$//;
        $hypuniq{$hyps[$i]} = 1;
      }
      foreach my $h (keys %hypuniq) {
        push @{$hyphash{$h}}, $c;
      }

      $c->{hyp} = \@hyps;
      $c->{origwt} = $c->{wt} = shift(@s) + 0.;

      if (defined $nev) {
        if ($nev != $#s) {
          close(INPUT);
          my $msg = sprintf("Wrong number of events, expected %d, got %d in line: %s\n",
            $nev+1, $#s+1, $_);
          confess $msg;
        }
      } else {
        $nev = $#s;
      }

      # the rest of fields are the events in this case
      foreach my $e (@s) {
        if ($e =~ /^\s*-\s*$/) {
          push @{$c->{r}}, 0.;
          push @{$c->{tc}}, 0.;
        } else {
          my @edata = split(/\//, $e);
          push @{$c->{tc}}, ($edata[0] + 0.);
          if ($#edata <= 0) {
            push @{$c->{r}}, 1.;
          } else {
            push @{$c->{r}}, ($edata[1] + 0.);
          }
        }
      }

      push @case, $c;
    }
  }
  close(INPUT);

  if ($#evname >= 0) {
    if ($#evname != $nev) {
      my $msg = sprintf("Wrong number of event names, %d events in the table, %d names\n",
        $nev+1, $#evname+1);
      confess $msg;
    }
  } else {
    for (my $i = 0; $i <= $nev; $i++) {
      push @evname, ($i+1)."";
    }
  }

  for (my $i = 0; $i <= $#evname; $i++) {
    $evname[$i] =~ s/^\s+//;
    $evname[$i] =~ s/\s+$//;
    $evhash{$evname[$i]} = $i;
  }
}

sub print_table()
{
  # the title row
  printf($pf_tfmt . ",", "!");
  printf($pf_tfmt . ",", "");
  foreach my $e (@evname) {
    printf($pf_tfmt . " " . $pf_rtfmt . ",", $e, "");
  }
  print("\n");
  # the cases
  for (my $i = 0; $i <= $#case; $i++) {
    my $c = $case[$i];
    # if more than one hypothesis, print each of them on a separate line
    for (my $j = 0; $j < $#{$c->{hyp}}; $j++) {
      printf($pf_tfmt . "+\n", $c->{hyp}[$j]);
    }

    printf($pf_tfmt . ",", $c->{hyp}[ $#{$c->{hyp}} ]);
    printf($pf_nfmt . ",", $c->{wt});
    for (my $j = 0; $j <= $#evname; $j++) {
      printf($pf_nfmt . "/" . $pf_rnfmt . ",", $c->{tc}[$j], $c->{r}[$j]);
    }
    print("\n");
  }
}

# Compute the hypothesis probabilities from weights
sub compute_phyp()
{
  %phyp = ();
  my $h;

  # start by getting the weights
  my $sum = 0.;
  for (my $i = 0; $i <= $#case; $i++) {
    my $w = $case[$i]->{wt};
    $sum += $w;

    foreach $h (@{$case[$i]->{hyp}}) {
      $phyp{$h} += $w;
    }
  }

  if ($sum != 0.) { # if 0 then all the weights are 0, leave them alone
    for $h (keys %phyp) {
      $phyp{$h} /= $sum;
    }
  }
}


# Print the probabilities of the hypotheses
sub print_phyp()
{
  printf("--- Probabilities\n");
  for my $h (sort keys %phyp) {
    printf($pf_tfmt . " " . $pf_nfmt . "\n", $h, $phyp{$h});
  }
}

# Apply one event
# evi - event index in the array
# conf - event confidence [0..1]
sub apply_event($$) # (evi, conf)
{
  my ($evi, $conf) = @_;

  # update the weights
  for (my $i = 0; $i <= $#case; $i++) {
    my $w = $case[$i]->{wt};
    my $r = $case[$i]->{r}[$evi];
    my $tc = $case[$i]->{tc}[$evi];

    $case[$i]->{wt} = $w * (1. - $r)
      + $w*$r*( $tc*$conf + (1. - $tc)*(1. - $conf) );
  }
}


# Apply an input file
sub apply_input($) # (filename)
{
  my $filename = shift;

  confess "Failed to open the input '$filename': $!\n"
    unless open(INPUT, "<", $filename);
  while(<INPUT>) {
    chomp;
    next if (/^\#/ || /^\s*$/); # a comment

    my @s = split(/,/);
    $s[0] =~ s/^\s+//;
    $s[0] =~ s/\s+$//;

    confess ("Unknown event name '" . $s[0] . "' in the input\n")
      unless exists $evhash{$s[0]};
    my $evi = $evhash{$s[0]};

    my $conf = $s[1] + 0.;
    if ($conf < $cap) {
      $conf = $cap;
    } elsif ($conf > 1.-$cap) {
      $conf = 1. - $cap;
    }
    printf("--- Applying event %s, c=%f\n", $s[0], $conf);
    &apply_event($evi, $conf);
    &print_table;
  }
  close(INPUT);
}

# main()
while ($ARGV[0] =~ /^-(.*)/) {
  if ($1 eq "v") {
    $verbose = 1;
  } elsif ($1 eq "c") {
    shift @ARGV;
    $cap = $ARGV[0]+0.;
  } elsif ($1 eq "b") {
    shift @ARGV;
    $boundary = $ARGV[0]+0.;
  } else {
    confess "Unknown switch -$1";
  }
  shift @ARGV;
}
&load_table($ARGV[0]);
&print_table;
&compute_phyp;
&print_phyp;
if ($#ARGV >= 1) {
  &apply_input($ARGV[1]);
  &compute_phyp;
  &print_phyp;
}

As you can see, the function apply_event() became much simpler, and there is no chance of division by 0 any more. The function apply_input() stayed exactly the same, just the functions called by it have changed.

The inputs are backwards-compatible with the previously shown examples, although what was formerly the field for hypothesis probabilities now becomes the field for case weights. The new code just works with weights and converts them to hypothesis probabilities at the end for display.

There are a couple of new features supported in the inputs. First, the field for hypothesis names is now allowed to contain multiple hypotheses, separated by the plus sign. That allows entering the cases with multi-hypothesis results.

Second, what used to be the conditional probabilities of the events now can contain both the training confidence values and the relevance values of the events. These fields may have one of three formats:

  • A single number: the confidence value TC(E|I), similar to the previous P(E|H). The relevance R(E|I) is assumed to be 1.
  • The character "-": means that this event is not relevant for this case. Which means that the relevance value R(E|I) is 0, and TC(E|I) doesn't matter.
  • Two numbers separated by a "/": the first one is TC(E|I) and the second one is R(E|I).

Let's look at some examples.

The very first example I've shown was this:

# tab06_01_01.txt
!,,evA,evB,evC
hyp1,0.66667,1,0.66667,0.66667
hyp2,0.33333,0,0,1

The very first input I've shown was this:

# in06_01_01_01.txt
evA,1
evB,0
evC,0

Let's compare the old and the new results. Old:

$ perl ex06_01run.pl tab06_01_01.txt in06_01_01_01.txt
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,0.66667,1.00000,0.66667,0.66667,
hyp2   ,0.33333,0.00000,0.00000,1.00000,
--- Applying event evA, c=1.000000
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,1.00000,1.00000,0.66667,0.66667,
hyp2   ,0.00000,0.00000,0.00000,1.00000,
--- Applying event evB, c=0.000000
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,1.00000,1.00000,0.66667,0.66667,
hyp2   ,0.00000,0.00000,0.00000,1.00000,
--- Applying event evC, c=0.000000
!      ,       ,evA    ,evB    ,evC    ,
hyp1   ,1.00000,1.00000,0.66667,0.66667,
hyp2   ,0.00000,0.00000,0.00000,1.00000,

New:

$ perl ex14_01run.pl tab06_01_01.txt in06_01_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.66667,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.33333,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.33333
--- Applying event evA, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.66667,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.22222,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.07407,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    1.00000
hyp2    0.00000

The result is the same, although the intermediate data is printed as weights, not probabilities, and the printed table contains the relevance information (all the relevance values are at 1 here).

This table of probabilities was produced from 6 cases for hyp1 and 3 cases for hyp2:

         evA evB evC
4 * hyp1 1   1   1
2 * hyp1 1   0   0
3 * hyp2 0   0   1
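
The folded row for hyp1 follows from the weighted averages of these cases:

W(hyp1) = 4 + 2 = 6
P(evA|hyp1) = (4*1 + 2*1) / 6 = 1
P(evB|hyp1) = (4*1 + 2*0) / 6 = 0.66667
P(evC|hyp1) = (4*1 + 2*0) / 6 = 0.66667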

Before entering the raw cases, let's look at the same combined probability table with weights entered directly instead of probabilities:

# tab14_01a.txt
!,,evA,evB,evC
hyp1,6,1,0.66667,0.66667
hyp2,3,0,0,1

$ perl ex14_01run.pl tab14_01a.txt in06_01_01_01.txt 
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,6.00000,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,3.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.33333
--- Applying event evA, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,6.00000,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.99998,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.66665,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    1.00000
hyp2    0.00000

The end result is the same, and the intermediate weights are simply scaled up proportionally. Note that even though the case matches one of the training cases one-to-one, the weight of hyp1 ends up at only 0.66665. That's because the table entry is produced by folding two kinds of cases, and this case matched only one-third of the cases. Since originally this particular case had the weight of 2, the result is 2*1/3 = 0.67 (approximately).
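
Spelled out, the weight of that line goes through these steps:

start:            6
after evA (C=1):  6 * 1.0                   = 6
after evB (C=0):  6 * (1 - 0.66667)         = 1.99998
after evC (C=0):  1.99998 * (1 - 0.66667)   = 0.66665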

If we enter the raw training cases into the table, the result changes:

# tab14_01b.txt
!,,evA,evB,evC
hyp1,4,1,1,1
hyp1,2,1,0,0
hyp2,3,0,0,1

$ perl ex14_01run.pl tab14_01b.txt in06_01_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,4.00000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,2.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,3.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.33333
--- Applying event evA, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,4.00000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,2.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,2.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,2.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    1.00000
hyp2    0.00000

Now the resulting probability is the same but the weight 2 matches what was in the training table.

Let's see how it handles an impossible input:

# in08_01_01.txt
evC,0
evA,0
evB,0

$ perl ex14_01run.pl tab14_01a.txt in08_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,6.00000,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,3.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.33333
--- Applying event evC, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.99998,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evA, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.00000
hyp2    0.00000

Since this case has no match in the training table, the weights come out as 0. The computation of the probabilities would have required dividing the weights by the sum of weights, but since the sum is 0, the code just leaves the probabilities at 0 to avoid the division by 0.

It's interesting to compare the results with capping for the two different kinds of tables:

$ perl ex14_01run.pl -c 0.01 tab14_01a.txt in08_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,6.00000,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,3.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.33333
--- Applying event evC, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,2.01998,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.03000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evA, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.02020,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.02970,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00680,1.00000/1.00,0.66667/1.00,0.66667/1.00,
hyp2   ,0.02940,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.18784
hyp2    0.81216

$ perl ex14_01run.pl -c 0.01 tab14_01b.txt in08_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,4.00000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,2.00000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,3.00000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.33333
--- Applying event evC, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.04000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,1.98000,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.03000,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evA, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00040,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,0.01980,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.02970,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   ,0.01960,1.00000/1.00,0.00000/1.00,0.00000/1.00,
hyp2   ,0.02940,0.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.40005
hyp2    0.59995

They came out different. Why? In the second run, the first training case for hyp1 mismatches the values of all 3 input events. Its weight gets multiplied by 0.01 thrice and becomes very close to 0. The second training case for hyp1 and the training case for hyp2 mismatch only one input event, so their weights get multiplied by 0.01 only once, and their relative weights decide the resulting probabilities. They were 2:3 to start with and that stayed as 2:3 (the little deviation is contributed by the first training case for hyp1).
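
In numbers, the second run works out to (the factors are for evC, evA, evB in the input order):

hyp1, case 1:  4 * 0.01 * 0.01 * 0.01 = 0.000004
hyp1, case 2:  2 * 0.99 * 0.01 * 0.99 = 0.019602
hyp2:          3 * 0.01 * 0.99 * 0.99 = 0.029403
P(hyp1) = (0.000004 + 0.019602) / 0.049009 = 0.40005
P(hyp2) = 0.029403 / 0.049009 = 0.59995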

On the other hand, in the first run all the cases for hyp1 were lumped into one line, and the average content of that line was seriously tilted towards the case that mismatched all 3 events. Thus hyp1 ended up with a much lower weight but hyp2 ended up with exactly the same weight as in the second run, so it outweighed the hyp1 much more seriously.

To look at the effects of the relevance values and of the training cases with multi-hypothesis results let's revisit the example from the 9th and 10th parts. First, a training table with multiple hypotheses, as in the part 9:

# tab14_02a.txt
!,,evA,evB,evC
hyp1+hyp2,1,1,1,0
hyp2+hyp3,1,0,1,1
hyp1+hyp3,1,1,0,1

With the input data:

# in09_01_01.txt
evA,0
evB,0
evC,1

$ perl ex14_01run.pl -c 0.01 tab14_02a.txt in09_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,1.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,1.00000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,1.00000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.66667
hyp2    0.66667
hyp3    0.66667
--- Applying event evA, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,0.01000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,0.99000,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,0.01000,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evB, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,0.00010,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,0.00990,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,0.00990,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Applying event evC, c=0.990000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   +
hyp2   ,0.00000,1.00000/1.00,1.00000/1.00,0.00000/1.00,
hyp2   +
hyp3   ,0.00980,0.00000/1.00,1.00000/1.00,1.00000/1.00,
hyp1   +
hyp3   ,0.00980,1.00000/1.00,0.00000/1.00,1.00000/1.00,
--- Probabilities
hyp1    0.50003
hyp2    0.50003
hyp3    0.99995

The capping was needed since the input doesn't fully match any of the training cases, but in the end the result points pretty conclusively to the hypothesis hyp3. The multi-hypothesis cases are printed in the intermediate results with one hypothesis per line (this format cannot be read back as input).
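
These final probabilities are consistent with each hypothesis getting the sum of the weights of the cases that include it, divided by the total of all the case weights. A small sketch redoing that arithmetic by hand on the final weights above (the variable names are made up):

use strict;
use warnings;
use List::Util qw(sum);

# Final weights of the three training cases from the run above,
# and the hypotheses listed in each case.
my @cases = (
    { hyps => [ "hyp1", "hyp2" ], weight => 0.000001 },  # prints as 0.00000
    { hyps => [ "hyp2", "hyp3" ], weight => 0.009801 },
    { hyps => [ "hyp1", "hyp3" ], weight => 0.009801 },
);

my $total = sum(map { $_->{weight} } @cases);
my %prob;
for my $case (@cases) {
    $prob{$_} += $case->{weight} / $total for @{$case->{hyps}};
}
printf "%s %.5f\n", $_, $prob{$_} for sort keys %prob;
# hyp1 0.50003, hyp2 0.50003, hyp3 0.99995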

Now a variation of the example from part 10, where I've manually decided which events should be relevant to which hypotheses, using the same input as above. Only this time the table covers all 3 hypotheses at once, not one hypothesis at a time:

# tab14_02b.txt
!,,evA,evB,evC
hyp1,1,1,-,-
hyp2,1,-,1,-
hyp3,1,-,-,1

$ perl ex14_01run.pl tab14_02b.txt in09_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evA, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Applying event evB, c=0.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,0.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Applying event evC, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,0.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Probabilities
hyp1    0.00000
hyp2    0.00000
hyp3    1.00000

You can see in the printout that some of the relevance values have been set to 1 and some to 0. This time it picked hyp3 with full certainty, without even needing capping.
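
The reason the irrelevant events have no effect on the weights is the relevance factor in the per-event update: with relevance R the multiplier becomes R*(TC*C + (1-TC)*(1-C)) + (1-R), which degenerates to 1 at R=0. A hedged sketch of that update, reproducing the hyp1 and hyp3 rows of this run (my reconstruction, not the script's code):

use strict;
use warnings;

# One weight-update step with relevance: $tc is the training value of the
# event, $r its relevance, $c the confidence of the input event.
sub apply {
    my ($w, $tc, $r, $c) = @_;
    return $w * ($r * ($tc*$c + (1 - $tc)*(1 - $c)) + (1 - $r));
}

# hyp3 against the input (evA=0, evB=0, evC=1): only evC is relevant.
my $w3 = 1;
$w3 = apply($w3, 0, 0, 0);    # evA: irrelevant, multiplier 1
$w3 = apply($w3, 0, 0, 0);    # evB: irrelevant, multiplier 1
$w3 = apply($w3, 1, 1, 1);    # evC: relevant and matching, multiplier 1
printf "hyp3: %.5f\n", $w3;   # stays at 1.00000

# hyp1 against the same input: only evA is relevant, and it mismatches.
my $w1 = 1;
$w1 = apply($w1, 1, 1, 0);    # evA: relevant but mismatched, multiplier 0
$w1 = apply($w1, 0, 0, 0);    # evB: irrelevant
$w1 = apply($w1, 0, 0, 1);    # evC: irrelevant
printf "hyp1: %.5f\n", $w1;   # drops to 0.00000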

Now the same table but the input with all events true:

# in10_01_02.txt
evA,1
evB,1
evC,1

$ perl ex14_01run.pl tab14_02b.txt in10_01_02.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evA, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Applying event evB, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Applying event evC, c=1.000000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

As far as the weights are concerned, all the hypotheses got a full match. But the probabilities have naturally been split three-way. Let's contrast it with the result of another input:

# in08_01_01.txt
evC,0
evA,0
evB,0

$ perl ex14_01run.pl -c 0.01 tab14_02b.txt in08_01_01.txt
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,1.00000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333
--- Applying event evC, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,1.00000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,0.01000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Applying event evA, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.01000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,1.00000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,0.01000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Applying event evB, c=0.010000
!      ,       ,evA         ,evB         ,evC         ,
hyp1   ,0.01000,1.00000/1.00,0.00000/0.00,0.00000/0.00,
hyp2   ,0.01000,0.00000/0.00,1.00000/1.00,0.00000/0.00,
hyp3   ,0.01000,0.00000/0.00,0.00000/0.00,1.00000/1.00,
--- Probabilities
hyp1    0.33333
hyp2    0.33333
hyp3    0.33333

In this run the probabilities also split three-way. But how do we know that for the first input (1,1,1) we should pick all 3 hypotheses and for the second input (0,0,0) we should pick none? Both are split evenly, so we can't decide based on the rule of even splitting discussed in part 9; both inputs would fit it. One indication is that the second input required capping, or it would have produced probabilities of 0. For another indication, we can compare the final weights of the cases with their initial weights. In the first run they stayed the same. In the second run they got multiplied by 0.01 by the capping. Thus we can say that none of the cases really matched the input. Since only one multiplication by 0.01 was done, each case had only one mismatch. Is one mismatch that bad? Since each hypothesis has only one relevant event, yes: a mismatch in that one event is really bad, and all these hypotheses should be considered false.
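
This comparison can be mechanized: since every mismatch of a fully-relevant event multiplies the case weight by approximately the capping value, the number of mismatches can be estimated from the ratio of the final and initial weights. A rough sketch of such an estimate (a made-up helper, not part of the scripts):

use strict;
use warnings;

my $cap = 0.01;    # the capping value used in these runs

# Estimate how many relevant events a case has mismatched, from how far
# its weight dropped: each mismatch multiplies the weight by roughly $cap
# (the matches multiply it by values close to 1, which we ignore here).
sub mismatches {
    my ($w_initial, $w_final) = @_;
    return 0 if $w_final >= $w_initial;
    return log($w_final / $w_initial) / log($cap);
}

printf "%.2f\n", mismatches(1.0, 1.0);    # first run: 0 mismatches
printf "%.2f\n", mismatches(1.0, 0.01);   # second run: ~1 mismatch per case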

With larger sets of training data, it would be possible to make decisions based on how many relevant events are available for each case and what fraction of them is allowed to be mismatched before we consider the case inapplicable. And if a hypothesis has no applicable cases, it would be considered false.
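
Here is a rough sketch of what such a decision rule could look like; the threshold value and the helper names are made up for illustration:

use strict;
use warnings;

# A case is considered applicable if no more than $max_mismatch_frac
# of its relevant events disagree with the input.
my $max_mismatch_frac = 0.25;    # a made-up threshold

sub case_applicable {
    my ($n_relevant, $n_mismatched) = @_;
    return 0 if $n_relevant == 0;
    return ($n_mismatched / $n_relevant) <= $max_mismatch_frac;
}

# A hypothesis is considered false if none of its cases are applicable.
sub hyp_true {
    my (@cases) = @_;    # each case is [n_relevant, n_mismatched]
    for my $case (@cases) {
        return 1 if case_applicable(@$case);
    }
    return 0;
}

# With one relevant event per case, a single mismatch rules the case out:
printf "1 mismatch of 1 relevant event: %s\n",
    hyp_true([1, 1]) ? "true" : "false";   # false
printf "0 mismatches of 1 relevant event: %s\n",
    hyp_true([1, 0]) ? "true" : "false";   # true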