Tuesday, February 28, 2012

The proper aggregation, part 8: call arguments

After all this talk, let's look at an example of the calls to the aggregation computation function. The "computation" here does nothing but print the call arguments:

sub computeAverage9 # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;

  print STDERR &Triceps::aggOpString($aggop), " ",
    &Triceps::opcodeString($opcode), " ",
    $context->groupSize(), " ",
    (!$rh->isNull()? $rh->getRow()->printP(): "NULL"), "\n";
}

It prints the aggregation operation, the result opcode, the row count in the group, and the argument row (or "NULL"). The aggregation is on a FIFO index with a size limit of 2.

And here is a printout example, with the input rows in italics as usual. To make it easier to follow, I've broken up the sequence into multiple pieces, with a comment paragraph after each piece.

OP_INSERT,1,AAA,10,10
AO_AFTER_INSERT OP_INSERT 1 id="1" symbol="AAA" price="10" size="10" 
OP_INSERT,2,BBB,100,100
AO_AFTER_INSERT OP_INSERT 1 id="2" symbol="BBB" price="100" size="100"

The insert of the first row in the group causes only one call. There is no previous value to delete, only a new one to insert. The call happens after the row has been inserted into the group.

OP_INSERT,3,AAA,20,20
AO_BEFORE_MOD OP_DELETE 1 NULL
AO_AFTER_INSERT OP_INSERT 2 id="3" symbol="AAA" price="20" size="20" 

Adding the second record in a group means that the aggregation result for this group is modified. So first the aggregator is called to delete the old result, then the new row gets inserted, and the aggregator is called the second time to produce its new result.

OP_INSERT,5,AAA,30,30
AO_BEFORE_MOD OP_DELETE 2 NULL
AO_AFTER_DELETE OP_NOP 2 id="1" symbol="AAA" price="10" size="10" 
AO_AFTER_INSERT OP_INSERT 2 id="5" symbol="AAA" price="30" size="30" 

The insertion of the third row triggers the replacement policy in the FIFO index. The replacement policy causes the row with id=1 to be deleted before the row with id=5 is inserted. For the aggregator result it's still a single delete-insert pair. First, before the modification, the old aggregation result is deleted. Then the contents of the group get modified with both the delete and the insert. And then the aggregator gets told what has been modified. The deletion of the row with id=1 is not the last step, so that call gets the opcode of OP_NOP. Note that the group size with it is 2, not 1. That's because the aggregator gets notified only after all the modifications are already done. So the additive part of the computation must never read the group size or do any kind of iteration through the group, because that would often cause an incorrect result: it has no way to tell what other modifications have already been done to the group. The last AO_AFTER_INSERT gets the opcode of OP_INSERT, which tells the computation to send the new result of the aggregation.
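To make the point about the additive computation concrete, here is a minimal sketch in plain Perl that replays the calls for the AAA group from the printouts above. It does not use Triceps at all: the names newState() and additiveStep() are made up for this illustration, the aggregation operation and opcode are passed as the strings from the printout, and only the price field of the argument row is passed in. The state is updated only from the rows given to the AO_AFTER_* calls, never from the group size.

```perl
use strict;
use warnings;

# Hypothetical per-group state: a running sum and count that are
# maintained only from the rows passed to the AO_AFTER_* calls.
sub newState
{
  return { sum => 0, count => 0 };
}

# Process one call. $aggop and $opcode are the names as printed by
# the example above, $price is the price of the argument row
# (undef when the row handle is NULL).
# Returns a description of what would be sent, or undef for nothing.
sub additiveStep # (state, aggop, opcode, price)
{
  my ($state, $aggop, $opcode, $price) = @_;

  if ($aggop eq "AO_AFTER_INSERT") {
    $state->{sum} += $price;
    $state->{count}++;
  } elsif ($aggop eq "AO_AFTER_DELETE") {
    $state->{sum} -= $price;
    $state->{count}--;
  }
  # AO_BEFORE_MOD changes nothing: the state still describes the old result

  # nothing to send on OP_NOP or when the group has become empty
  return undef if ($opcode eq "OP_NOP" || $state->{count} == 0);
  return sprintf("%s avg=%g", $opcode, $state->{sum} / $state->{count});
}

# The calls for the AAA group, as (aggop, opcode, price) copied from the printouts
my @calls = (
  [ "AO_AFTER_INSERT", "OP_INSERT", 10 ],
  [ "AO_BEFORE_MOD", "OP_DELETE", undef ],
  [ "AO_AFTER_INSERT", "OP_INSERT", 20 ],
  [ "AO_BEFORE_MOD", "OP_DELETE", undef ],
  [ "AO_AFTER_DELETE", "OP_NOP", 10 ],
  [ "AO_AFTER_INSERT", "OP_INSERT", 30 ],
);

my $state = newState();
foreach my $call (@calls) {
  my $res = additiveStep($state, @$call);
  print defined($res) ? $res : "(nothing sent)", "\n";
}
```

In this sketch the call with OP_NOP only adjusts the running sum and count, and the final AO_AFTER_INSERT reports the average of the two remaining rows, (20+30)/2 = 25. If the step for AO_AFTER_DELETE relied on the group size instead of its own count, it would see 2 rather than 1, which is exactly the trap described above.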

OP_INSERT,3,BBB,20,20
AO_BEFORE_MOD OP_DELETE 2 NULL
AO_BEFORE_MOD OP_DELETE 1 NULL
AO_AFTER_DELETE OP_INSERT 1 id="3" symbol="AAA" price="20" size="20" 
AO_AFTER_INSERT OP_INSERT 2 id="3" symbol="BBB" price="20" size="20" 

This insert is of a "dirty" kind, the one that replaces the row using the replacement policy of the hashed primary index, without deleting its old state first. It also moves the row from one aggregation group to another. So the table logic calls AO_BEFORE_MOD for each of the modified groups, then modifies the contents of the groups, then tells both groups about the modifications. In this case both calls with AO_AFTER_* have the opcode of OP_INSERT because each of them is the last and only change to a separate aggregation group.

OP_DELETE,5
AO_BEFORE_MOD OP_DELETE 1 NULL
AO_AFTER_DELETE OP_INSERT 0 id="5" symbol="AAA" price="30" size="30" 
AO_COLLAPSE OP_NOP 0 NULL

This operation removes the last row in a group. It starts as usual with deleting the old state. The next call, AO_AFTER_DELETE with OP_INSERT, is intended for the Coral8-style aggregators that produce only the rows with the INSERT opcodes, never DELETEs, to let them insert the NULL (or zero) values in all the non-key fields. For the normal aggregators, the work is already done after the OP_DELETE call. That's why all the examples shown so far check for $context->groupSize() == 0 and return if so. The group size will be zero in absolutely no other case than after the deletion of the last row. Finally, AO_COLLAPSE lets the aggregator clean up its group state, if it needs any cleaning. It has the opcode OP_NOP in Triceps version 1.0, or OP_DELETE in 0.99.
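The decision that a normal (not Coral8-style) aggregator makes on each call can be summarized in a small sketch. This is plain Perl, not the Triceps API: classifyCall() is a hypothetical helper invented for this illustration, and it takes the aggregation operation and opcode as the strings from the printout, together with the group size.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what a normal aggregator does with a call.
# Returns "cleanup", "skip" or "send".
sub classifyCall # (aggop, opcode, groupSize)
{
  my ($aggop, $opcode, $groupSize) = @_;

  # the group is going away, time to free any per-group state;
  # checked by the aggop because the opcode differs between versions
  return "cleanup" if ($aggop eq "AO_COLLAPSE");

  # an empty group has no result, and OP_NOP asks for no result
  return "skip" if ($groupSize == 0 || $opcode eq "OP_NOP");

  return "send";
}

# The three calls from the deletion of the last row
foreach my $call (
  [ "AO_BEFORE_MOD", "OP_DELETE", 1 ],
  [ "AO_AFTER_DELETE", "OP_INSERT", 0 ],
  [ "AO_COLLAPSE", "OP_NOP", 0 ],
) {
  print join(" ", @$call), " -> ", classifyCall(@$call), "\n";
}
```

For the three calls above it gives "send" for the deletion of the old result, "skip" for the AO_AFTER_DELETE on the now empty group, and "cleanup" for AO_COLLAPSE.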
