Last time we've seen that the left outer join would be better for the logic. Not a problem, just change the join type option:
our $join = Triceps::JoinTwo->new( name => "join", leftTable => $tPosition, leftIdxPath => [ "currencyLookup" ], rightTable => $tToUsd, rightIdxPath => [ "primary" ], type => "left", ); # would die by itself on an error
Now the positions would pass through even if the currency translation is not available. The same input now produces a different result:
cur,OP_INSERT,20120310,USD,1 cur,OP_INSERT,20120310,GBP,2 cur,OP_INSERT,20120310,EUR,1.5 pos,OP_INSERT,20120310,one,AAA,100,15,USD join.leftLookup.out OP_INSERT date="20120310" customer="one" symbol="AAA" quantity="100" price="15" currency="USD" toUsd="1" pos,OP_INSERT,20120310,two,AAA,100,8,GBP join.leftLookup.out OP_INSERT date="20120310" customer="two" symbol="AAA" quantity="100" price="8" currency="GBP" toUsd="2"So far things are going the same as for the inner join.
pos,OP_INSERT,20120310,three,AAA,100,300,RUR join.leftLookup.out OP_INSERT date="20120310" customer="three" symbol="AAA" quantity="100" price="300" currency="RUR"Here comes the first difference: even though there is no translation for RUR, the row still passes through (with the field toUsd being NULL).
pos,OP_INSERT,20120310,three,BBB,200,80,GBP join.leftLookup.out OP_INSERT date="20120310" customer="three" symbol="BBB" quantity="200" price="80" currency="GBP" toUsd="2"This is also unchanged.
cur,OP_INSERT,20120310,RUR,0.04 join.rightLookup.out OP_DELETE date="20120310" customer="three" symbol="AAA" quantity="100" price="300" currency="RUR" join.rightLookup.out OP_INSERT date="20120310" customer="three" symbol="AAA" quantity="100" price="300" currency="RUR" toUsd="0.04"Now the second difference: since this row from the left side has already passed through, just sending another INSERT for it would make the data inconsistent. The original result without the translation must be deleted first, and then a new one, with translation, inserted. JoinTwo is smart enough to figure it out all by itself.
cur,OP_DELETE,20120310,GBP,2 join.rightLookup.out OP_DELETE date="20120310" customer="two" symbol="AAA" quantity="100" price="8" currency="GBP" toUsd="2" join.rightLookup.out OP_INSERT date="20120310" customer="two" symbol="AAA" quantity="100" price="8" currency="GBP" join.rightLookup.out OP_DELETE date="20120310" customer="three" symbol="BBB" quantity="200" price="80" currency="GBP" toUsd="2" join.rightLookup.out OP_INSERT date="20120310" customer="three" symbol="BBB" quantity="200" price="80" currency="GBP"The same logic works for the deletes, only backwards: when the translation for GBP is deleted, the result rows that used it change to the ones without the translation.
cur,OP_INSERT,20120310,GBP,2.2 join.rightLookup.out OP_DELETE date="20120310" customer="two" symbol="AAA" quantity="100" price="8" currency="GBP" join.rightLookup.out OP_INSERT date="20120310" customer="two" symbol="AAA" quantity="100" price="8" currency="GBP" toUsd="2.2" join.rightLookup.out OP_DELETE date="20120310" customer="three" symbol="BBB" quantity="200" price="80" currency="GBP" join.rightLookup.out OP_INSERT date="20120310" customer="three" symbol="BBB" quantity="200" price="80" currency="GBP" toUsd="2.2"And again, when the new translation for GBP comes in, the DELETE-INSERT sequence is done for each of the rows. As you can see, the update of the GBP translation worked in the not most efficient way. Fundamentally, if we knew that a DELETE of GBP will be immediately followed by an INSERT, we could skip inserting and then deleting the rows with the NULL in toUsd. But we don't know, and in Triceps there is no way to know that.
If you really, really want to avoid the propagation of these intermediate changes, put a Collapse after the join, and flush it only after the whole update has been processed. There will be more overhead in the Collapse itself, but all the logic below it will skip the intermediate changes. If this logic below is heavy-weight, that might be an overall win. A caveat though: a Collapse requires that the data has a primary key, a JoinTwo doesn't require its result (nor its inputs) to have a primary key. Because of this, the collapse might not work right with every possible join, you'd have to limit yourself to the joins that produce the data with a primary key.
pos,OP_DELETE,20120310,one,AAA,100,15,USD join.leftLookup.out OP_DELETE date="20120310" customer="one" symbol="AAA" quantity="100" price="15" currency="USD" toUsd="1" pos,OP_INSERT,20120310,one,AAA,200,16,USD join.leftLookup.out OP_INSERT date="20120310" customer="one" symbol="AAA" quantity="200" price="16" currency="USD" toUsd="1"And the rest is again the same as with an inner join.
No comments:
Post a Comment