First, what does it mean to combine two cases without the relevance values? We've been through this computation many times when building the probability tables, but I don't think I've given the formula yet.
The weights simply add up (there were that many cases uncombined, and now they become the same number of cases in a single combination):
W(I) = W(I1) + W(I2)
The training confidences get averaged according to their weights:
TC(E|I) = ( W(I1)*TC(E|I1) + W(I2)*TC(E|I2) ) / ( W(I1) + W(I2) )
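These two formulas can be sketched in code. A minimal example (the function name and sample values are made up for illustration):

```python
# Combine two cases without relevance: weights add up, training
# confidences get averaged according to their weights.
def combine(w1, tc1, w2, tc2):
    w = w1 + w2                    # W(I) = W(I1) + W(I2)
    tc = (w1*tc1 + w2*tc2) / w     # weighted average of TC(E|I1), TC(E|I2)
    return w, tc

w, tc = combine(10.0, 0.9, 5.0, 0.3)
print(w, tc)  # 15.0 and (10*0.9 + 5*0.3)/15 = 0.7
```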
To check that it's right let's compute the application of one event and see that the combined weights work out the same either way.
Separately they get computed as follows (without including the relevance yet):
W(I1|E) = W(I1) * ( TC(E|I1)*C(E) + (1 - TC(E|I1))*(1 - C(E)) )
  = W(I1) * ( TC(E|I1)*C(E) + 1 - TC(E|I1) - C(E) + TC(E|I1)*C(E) )
  = W(I1) * ( 1 - TC(E|I1) + (2*TC(E|I1) - 1)*C(E) )
  = W(I1) - W(I1)*TC(E|I1) + (2*W(I1)*TC(E|I1) - W(I1))*C(E)
W(I2|E) follows the same kind of formula. After combination they compute as:
W(I|E) = W(I) * ( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) ) = ( W(I1) + W(I2) )*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
Let's express TC(E|I) as a ratio:
TC(E|I) = TCup(E|I) / TClow(E|I)
TCup(E|I) = W(I1)*TC(E|I1) + W(I2)*TC(E|I2)
TClow(E|I) = W(I1) + W(I2) = W(I)
Then substituting it into the formula for weights we get:
W(I|E) = W(I) * ( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
  = W(I) * ( TCup(E|I)*C(E)/TClow(E|I) + (1 - TCup(E|I)/TClow(E|I))*(1 - C(E)) )
  = W(I) * ( TCup(E|I)*C(E)/TClow(E|I) + (TClow(E|I) - TCup(E|I))/TClow(E|I)*(1 - C(E)) )
  = W(I) * ( TClow(E|I) - TCup(E|I) + (2*TCup(E|I) - TClow(E|I))*C(E) ) / TClow(E|I)
  = W(I) * ( W(I) - TCup(E|I) + (2*TCup(E|I) - W(I))*C(E) ) / W(I)
  = W(I) - TCup(E|I) + (2*TCup(E|I) - W(I))*C(E)
  = W(I1) + W(I2) - ( W(I1)*TC(E|I1) + W(I2)*TC(E|I2) ) + (2*W(I1)*TC(E|I1) + 2*W(I2)*TC(E|I2) - W(I1) - W(I2))*C(E)
  = ( W(I1) - W(I1)*TC(E|I1) + (2*W(I1)*TC(E|I1) - W(I1))*C(E) ) + ( W(I2) - W(I2)*TC(E|I2) + (2*W(I2)*TC(E|I2) - W(I2))*C(E) )
  = W(I1|E) + W(I2|E)
It all computes. The result is the same either way. The difference is of course that as we apply multiple events, the effect of the previous events on the following ones will manifest differently. But that is to be expected.
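The check above can also be run numerically. A quick sketch (the sample weights, confidences, and C(E) value are arbitrary):

```python
# Verify that applying one event E to two cases separately, then adding
# the posterior weights, matches applying E to the combined case.
def posterior(w, tc, ce):
    # W(I|E) = W(I) * ( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
    return w * (tc*ce + (1 - tc)*(1 - ce))

w1, tc1 = 10.0, 0.9
w2, tc2 = 5.0, 0.3
ce = 0.8

separate = posterior(w1, tc1, ce) + posterior(w2, tc2, ce)
w = w1 + w2
tc = (w1*tc1 + w2*tc2) / w
combined = posterior(w, tc, ce)
print(separate, combined)  # the same value either way
```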
That taken care of, let's look at the relevance. If we have a case with a fractional R(E|I), we can simulate it by splitting the case into two cases, the fully relevant part and the fully irrelevant part. The case
W(I) * ...,TC/R,...
gets split into two cases, relevant one with the weight W(Ir) and irrelevant one with the weight W(Ii):
W(Ii) = W(I)*(1-R) * ...,0/0,...
W(Ir) = W(I)*R * ...,TC,...
To show that it's equivalent, the original formula for the posterior weight is:
W(I|E) = W(I)*(1 - R(E|I)) + W(I)*R(E|I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
If we add up the weights of the two split cases, we get:
W(Ii|E) = W(I)*(1 - R(E|I))
W(Ir|E) = W(I)*R(E|I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
W(Ii|E) + W(Ir|E) = W(I)*(1 - R(E|I)) + W(I)*R(E|I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
Both give the exact same result, so the split is correct.
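A small numeric sketch of this equivalence (sample values chosen arbitrarily):

```python
# Check that a case with fractional relevance R gives the same posterior
# weight as the sum of its fully relevant and fully irrelevant parts.
def posterior_with_relevance(w, tc, r, ce):
    # W(I|E) = W(I)*(1 - R) + W(I)*R*( TC*C(E) + (1 - TC)*(1 - C(E)) )
    return w*(1 - r) + w*r*(tc*ce + (1 - tc)*(1 - ce))

w, tc, r, ce = 15.0, 0.9, 0.4, 0.8
original = posterior_with_relevance(w, tc, r, ce)

w_irrelevant = w*(1 - r)                         # passes through unchanged
w_relevant = w*r*(tc*ce + (1 - tc)*(1 - ce))     # the usual update
print(original, w_irrelevant + w_relevant)  # equal
```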
This split can also be undone. The formulas for the weights can be expanded as:
W(Ii) = W(I)*(1 - R) = W(I) - W(I)*R
W(Ir) = W(I)*R
And from there it follows that:
W(Ii) + W(Ir) = W(I) - W(I)*R + W(I)*R = W(I)
Not surprisingly, the original weight can be found as the sum of the two split weights. And from there R can be found as:
R = W(Ir) / W(I)
If we want to combine multiple cases with fractional relevances, we can split each of them into the relevant and irrelevant parts, combine the relevant parts together, combine the irrelevant parts together (this is easy: just add up their weights), and then undo the split.
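This split-combine-unsplit procedure can be sketched as follows (the function name is made up; cases are represented as (weight, confidence, relevance) tuples):

```python
# Combine multiple cases with fractional relevances: split each into its
# relevant and irrelevant parts, combine each kind, then undo the split.
def combine_with_relevance(cases):
    # cases: list of (W(I), TC(E|I), R(E|I)) tuples
    w_rel = sum(w*r for w, tc, r in cases)           # total relevant weight
    w_irr = sum(w*(1 - r) for w, tc, r in cases)     # total irrelevant weight
    # the relevant parts combine by the weighted average of confidences
    tc = sum(w*r*tc for w, tc, r in cases) / w_rel if w_rel > 0 else 0.0
    w = w_rel + w_irr                                # undo the split
    return w, tc, w_rel / w                          # R = W(Ir) / W(I)

print(combine_with_relevance([(10.0, 0.9, 1.0), (5.0, 0.3, 0.2)]))
```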
Incidentally, the whole relevance thing can also be expressed purely by the manipulation with the case weights and training confidences.
The point of the relevance concept is to leave the weight of a case unchanged if the event is irrelevant to it. The closest effect that can be achieved with the training confidences alone is TC=0.5, with which the weight gets halved, no matter what C(E) is:
W(I|E) = W(I)*( TC(E|I)*C(E) + (1 - TC(E|I))*(1 - C(E)) )
  = W(I)*( 0.5*C(E) + (1 - 0.5)*(1 - C(E)) )
  = W(I)*( 0.5*(C(E) + 1 - C(E)) )
  = W(I)*0.5
Which means that we can simulate an irrelevant event purely through the training confidence, by setting TC(E|I)=0.5 and doubling the weight of the case. For example, if we had a case with weight 15 and an irrelevant event
15 * ...,0/R=0,...
we can replace it with a case of weight 30 and TC=0.5:
30 * ...,0.5,...
The same can also be seen in another way: create a second case, exactly the same as the first one and with the same weight, except that TC(E|I)=0 in one case and TC(E|I)=1 in the other. No matter what C(E) is, one case will get through and the other will be thrown away (or the matching fractions of both cases will be thrown away, still leaving one original weight in total). For example, taking the same case of weight 15 and an irrelevant event, we can convert it to two cases:
15 * ...,0,...
15 * ...,1,...
Either way, doubling the weight of one case, or splitting into two cases, the total weight gets doubled, and the average TC(E|I) gets set equal to 0.5.
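Both simulations can be checked numerically. A sketch, reusing the posterior-weight formula from above (the C(E) values are arbitrary samples):

```python
# Check both ways of simulating an irrelevant event: one case of doubled
# weight with TC=0.5, or two equal-weight cases with TC=0 and TC=1.
# Either way, the total posterior weight stays at the original 15.
def posterior(w, tc, ce):
    return w * (tc*ce + (1 - tc)*(1 - ce))

for ce in (0.0, 0.3, 0.8, 1.0):
    doubled = posterior(30.0, 0.5, ce)                      # 30 * ...,0.5,...
    split = posterior(15.0, 0.0, ce) + posterior(15.0, 1.0, ce)
    print(ce, doubled, split)  # both stay 15.0 for every ce
```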