Wednesday, February 29, 2012

The proper aggregation, part 9: multiple indexes

I've mentioned before that the floating-point numbers are tricky to handle. Even without the additive aggregation the result depends on the rounding, which in turn depends on the order in which the operations are done. Let's look at a version of the aggregation code that highlights this issue.

sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;

  # don't send the NULL record after the group becomes empty
  return if ($context->groupSize()==0
    || $opcode != &Triceps::OP_INSERT);

  my $sum = 0;
  my $count = 0;
  for (my $rhi = $context->begin(); !$rhi->isNull(); 
      $rhi = $context->next($rhi)) {
    $count++;
    $sum += $rhi->getRow()->get("price");
  } 
  my $rLast = $context->last()->getRow() or die "$!";
  my $avg = $sum/$count;

  my $res = $context->resultType()->makeRowHash(
    symbol => $rLast->get("symbol"), 
    id => $rLast->get("id"), 
    price => $avg
  ) or die "$!";
  $context->send($opcode, $res) or die "$!";
}

my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("byId", 
    Triceps::IndexType->newHashed(key => [ "id" ])
  ) 
  ->addSubIndex("bySymbol", 
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("last4",
      Triceps::IndexType->newFifo(limit => 4)
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", undef, \&computeAverage)
      )     
    )   
  ) 
or die "$!";
...
my $lbAverage = $uTrades->makeLabel($rtAvgPrice, "lbAverage",
  undef, sub { # (label, rowop)
    printf("%.17g\n", $_[1]->getRow()->get("price"));
  }) or die "$!";
$tWindow->getAggregatorLabel("aggrAvgPrice")->chain($lbAverage)
  or die "$!";

The differences from the previously shown basic aggregation are:
  • the FIFO limit has been increased to 4
  • the only result value printed by the lbAverage handler is the price, and it's printed with a higher precision to make the difference visible
  • the aggregator computation only does the inserts, to reduce the clutter in the results.

And here is an example of how the order matters:

OP_INSERT,1,AAA,1,10
1
OP_INSERT,2,AAA,1,10
1
OP_INSERT,3,AAA,1,10
1
OP_INSERT,4,AAA,1e16,10
2500000000000001
OP_INSERT,5,BBB,1e16,10
10000000000000000
OP_INSERT,6,BBB,1,10
5000000000000000
OP_INSERT,7,BBB,1,10
3333333333333333.5
OP_INSERT,8,BBB,1,10
2500000000000000

Of course, the real prices won't vary so wildly. But the other values could. This example is specially stacked to demonstrate the point. The final results for AAA and BBB should be the same but aren't. Why? The precision of the 64-bit floating-point numbers is such that adding 1 to 1e16 makes this 1 fall beyond the precision, and the result is still 1e16. On the other hand, adding 3 to 1e16 makes at least a part of it stick: the 1 still falls off but the other 2 of the 3 sticks. Now look at the data sets: if you add 1e16+1+1+1, that's adding 1e16+1 repeated three times, and the result is still the unchanged 1e16. But if you add 1+1+1+1e16, that's adding 3+1e16, and now the result is different and more correct. When the averages are computed by dividing the sums by 4, the results still differ.
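The effect is easy to reproduce in plain Perl, which uses the same 64-bit doubles:

```perl
use strict;
use warnings;

# Adding the small values after the large one: each 1 falls beyond
# the precision of a 64-bit double and vanishes.
my $big_first = 1e16 + 1 + 1 + 1;

# Adding the small values first: they accumulate to 3, and a part of
# that 3 survives the addition to 1e16.
my $small_first = 1 + 1 + 1 + 1e16;

printf("%.17g\n", $big_first / 4);    # prints 2500000000000000
printf("%.17g\n", $small_first / 4);  # prints 2500000000000001
```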

Overall, the rule of thumb for adding floating-point numbers is this: add them up in the order from the smallest to the largest. (What if the numbers can be negative too? I don't know, that goes beyond my knowledge of floating-point calculations. My guess is that you still arrange them in the ascending order, only by the absolute value.) So let's do it in the aggregator.
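Here is what that rule looks like in plain Perl, outside of any aggregator (sum_ascending is a made-up helper, not a Triceps call):

```perl
use strict;
use warnings;

# Add the values in the order from the smallest to the largest absolute
# value, so that the small values accumulate before meeting the large ones.
sub sum_ascending {
    my $sum = 0;
    for my $v (sort { abs($a) <=> abs($b) } @_) {
        $sum += $v;
    }
    return $sum;
}

# The same numbers now add up identically in any arrival order.
printf("%.17g\n", sum_ascending(1e16, 1, 1, 1) / 4);  # prints 2500000000000001
printf("%.17g\n", sum_ascending(1, 1, 1, 1e16) / 4);  # prints 2500000000000001
```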

our $idxByPrice;

sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;
  our $idxByPrice;

  # don't send the NULL record after the group becomes empty
  return if ($context->groupSize()==0
    || $opcode != &Triceps::OP_INSERT);

  my $sum = 0;
  my $count = 0;
  my $end = $context->endIdx($idxByPrice);
  for (my $rhi = $context->beginIdx($idxByPrice); !$rhi->same($end);
      $rhi = $rhi->nextIdx($idxByPrice)) {
    $count++;
    $sum += $rhi->getRow()->get("price");
  }
  my $rLast = $context->last()->getRow() or die "$!";
  my $avg = $sum/$count;

  my $res = $context->resultType()->makeRowHash(
    symbol => $rLast->get("symbol"),
    id => $rLast->get("id"),
    price => $avg
  ) or die "$!";
  $context->send($opcode, $res) or die "$!";
}

my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("byId",
    Triceps::IndexType->newHashed(key => [ "id" ])
  )
  ->addSubIndex("bySymbol",
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("last4",
      Triceps::IndexType->newFifo(limit => 4)
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", undef, \&computeAverage)
      )
    )
    ->addSubIndex("byPrice",
      Triceps::SimpleOrderedIndex->new(price => "ASC",)
      ->addSubIndex("multi", Triceps::IndexType->newFifo())
    )
  )
or die "$!";

$idxByPrice = $ttWindow->findSubIndex("bySymbol")
  ->findSubIndex("byPrice") or die "$!";

Here another index type is added, ordered by price. It has to be non-leaf, with a FIFO index type nested in it, to allow for multiple rows having the same price in them. That would work out more efficiently if the ordered index could have a multimap mode, but that is not supported yet.

When the compute function does its iteration, it now goes by that index. The aggregator can't be simply moved to that new index type, because it still needs to get the last trade id in the order the rows are inserted into the group. Instead it has to work with two index types: the one on which the aggregator is defined, and the additional one. The calls for iteration on an additional index are different. $context->beginIdx() is similar to $context->begin() but the end condition and the next step are done differently. Perhaps the consistency in this department can be improved in the future.

And finally, the reference to that additional index type has to make it somehow into the compute function. It can't be given as an argument because it's not known yet at the time when the aggregator is constructed (and no, reordering the index types won't help because the index types are copied when connected to their parents, and we need the exact index type that ends up in the assembled table type). So a global variable $idxByPrice is used. The index type reference is found and placed there, and later the compute function takes the reference from the global variable.

The printout from this version on the same input is:

OP_INSERT,1,AAA,1,10
1
OP_INSERT,2,AAA,1,10
1
OP_INSERT,3,AAA,1,10
1
OP_INSERT,4,AAA,1e16,10
2500000000000001
OP_INSERT,5,BBB,1e16,10
10000000000000000
OP_INSERT,6,BBB,1,10
5000000000000000
OP_INSERT,7,BBB,1,10
3333333333333334
OP_INSERT,8,BBB,1,10
2500000000000001

Now, no matter in what order the rows arrive, the prices get added up in the same order, from the smallest to the largest, and produce the same correct (inasmuch as the floating-point precision allows) result.

Which index type the aggregator is placed on doesn't matter a whole lot. The computation can be turned around, with the ordered index used as the main one, and the last value from the FIFO index obtained with $context->lastIdx():

our $idxByOrder;

sub computeAverage12 # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;
  our $idxByOrder;

  # don't send the NULL record after the group becomes empty
  return if ($context->groupSize()==0
    || $opcode != &Triceps::OP_INSERT);

  my $sum = 0;
  my $count = 0;
  for (my $rhi = $context->begin(); !$rhi->isNull();
      $rhi = $context->next($rhi)) {
    $count++;
    $sum += $rhi->getRow()->get("price");
  }
  my $rLast = $context->lastIdx($idxByOrder)->getRow() or die "$!";
  my $avg = $sum/$count;

  my $res = $context->resultType()->makeRowHash(
    symbol => $rLast->get("symbol"),
    id => $rLast->get("id"),
    price => $avg
  ) or die "$!";
  $context->send($opcode, $res) or die "$!";
}

my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("byId",
    Triceps::IndexType->newHashed(key => [ "id" ])
  )
  ->addSubIndex("bySymbol",
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("last4",
      Triceps::IndexType->newFifo(limit => 4)
    )
    ->addSubIndex("byPrice",
      Triceps::SimpleOrderedIndex->new(price => "ASC",)
      ->addSubIndex("multi", Triceps::IndexType->newFifo())
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", undef, \&computeAverage12)
      )
    )
  )
or die "$!";

$idxByOrder = $ttWindow->findSubIndex("bySymbol")
  ->findSubIndex("last4") or die "$!";

The last important note: when aggregating with multiple indexes, always use the sibling index types forming the same group, or their nested sub-indexes (since the actual order is defined by the first leaf sub-index anyway). But don't use random, unrelated index types. If you do, the context will return unexpected values for them, and you may end up with endless loops.

Tuesday, February 28, 2012

The proper aggregation, part 8: call arguments

After all this talk let's look at an example of the calls to the aggregation computation function.  Just make a "computation" that prints the call arguments:

sub computeAverage9 # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;

  print STDERR &Triceps::aggOpString($aggop), " ",
    &Triceps::opcodeString($opcode), " ",
    $context->groupSize(), " ",
    (!$rh->isNull()? $rh->getRow()->printP(): "NULL"), "\n";
}

It prints the aggregation operation, the result opcode, row count in the group, and the argument row (or "NULL"). The aggregation is on a FIFO index with the size limit of 2.

And here is a printout example, with the input rows in italics as usual. To make keeping track of it easier, I broke up the sequence into multiple pieces, with a comment paragraph after each piece.

OP_INSERT,1,AAA,10,10
AO_AFTER_INSERT OP_INSERT 1 id="1" symbol="AAA" price="10" size="10" 
OP_INSERT,2,BBB,100,100
AO_AFTER_INSERT OP_INSERT 1 id="2" symbol="BBB" price="100" size="100"

The insert of the first row in the group causes only one call. There is no previous value to delete, only a new one to insert. The call happens after the row has been inserted into the group.

OP_INSERT,3,AAA,20,20
AO_BEFORE_MOD OP_DELETE 1 NULL
AO_AFTER_INSERT OP_INSERT 2 id="3" symbol="AAA" price="20" size="20" 

Adding the second record in a group means that the aggregation result for this group is modified. So first the aggregator is called to delete the old result, then the new row gets inserted, and the aggregator is called the second time to produce its new result.

OP_INSERT,5,AAA,30,30
AO_BEFORE_MOD OP_DELETE 2 NULL
AO_AFTER_DELETE OP_NOP 2 id="1" symbol="AAA" price="10" size="10" 
AO_AFTER_INSERT OP_INSERT 2 id="5" symbol="AAA" price="30" size="30" 

The insertion of the third row triggers the replacement policy in the FIFO index. The replacement policy causes the row with id=1 to be deleted before the row with id=5 is inserted. For the aggregator result it's still a single delete-insert pair: first, before the modification, the old aggregation result is deleted. Then the contents of the group get modified with both the delete and the insert. And then the aggregator gets told what has been modified. The deletion of the row with id=1 is not the last step, so that call gets the opcode of OP_NOP. Note that the group size with it is 2, not 1. That's because the aggregator gets notified only after all the modifications are already done. So the additive part of the computation must never read the group size or do any kind of iteration through the group, because that would often produce an incorrect result: it has no way to tell what other modifications have already been done to the group. The last AO_AFTER_INSERT gets the opcode of OP_INSERT, which tells the computation to send the new result of the aggregation.

OP_INSERT,3,BBB,20,20
AO_BEFORE_MOD OP_DELETE 2 NULL
AO_BEFORE_MOD OP_DELETE 1 NULL
AO_AFTER_DELETE OP_INSERT 1 id="3" symbol="AAA" price="20" size="20" 
AO_AFTER_INSERT OP_INSERT 2 id="3" symbol="BBB" price="20" size="20" 

This insert is of a "dirty" kind, the one that replaces the row using the replacement policy of the hashed primary index, without deleting its old state first. It also moves the row from one aggregation group to another. So the table logic calls AO_BEFORE_MOD for each of the modified groups, then modifies the contents of the groups, then tells both groups about the modifications. In this case both calls with AO_AFTER_* have the opcode of OP_INSERT because each of them is the last and only change to a separate aggregation group.

OP_DELETE,5
AO_BEFORE_MOD OP_DELETE 1 NULL
AO_AFTER_DELETE OP_INSERT 0 id="5" symbol="AAA" price="30" size="30" 
AO_COLLAPSE OP_NOP 0 NULL

This operation removes the last row in a group. It starts as usual with deleting the old state. The next AO_AFTER_DELETE with OP_INSERT is intended for the Coral8-style aggregators that produce only the rows with the INSERT opcodes, never DELETEs, to let them insert the NULL (or zero) values in all the non-key fields. For the normal aggregators the work is done after OP_DELETE. That's why all the shown examples were checking for $context->groupSize() == 0 and returning if so. The group size will be zero in absolutely no other case than after the deletion of the last row. Finally, AO_COLLAPSE allows the aggregator to clean up its group state if it needs any cleaning. It has the opcode OP_NOP in Triceps version 1.0, or OP_DELETE in 0.99.

The proper aggregation, part 7: additive

From the last example you can see that the additive aggregation contains enough information in its state to generate the result rows quickly without an iteration. This means that keeping the saved result row for DELETEs doesn't give a whole lot of advantage and adds at least a little memory overhead.  We can change the code and avoid keeping it:

sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
    my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;
    my $rowchg;

    if ($aggop == &Triceps::AO_COLLAPSE) {
        return
    } elsif ($aggop == &Triceps::AO_AFTER_DELETE) {
        $state->{price_sum} -= $rh->getRow()->get("price");
    } elsif ($aggop == &Triceps::AO_AFTER_INSERT) {
        $state->{price_sum} += $rh->getRow()->get("price");
    }
    # on AO_BEFORE_MOD do nothing

    return if ($context->groupSize()==0
        || $opcode == &Triceps::OP_NOP);

    my $rLast = $context->last()->getRow() or die "$!";
    my $count = $context->groupSize();

    $context->makeHashSend($opcode,
        symbol => $rLast->get("symbol"),
        id => $rLast->get("id"),
        price => $state->{price_sum}/$count,
    ) or die "$!";
}

sub initAverage #  (@args)
{
    return { price_sum => 0 };
}

The other change in this example is that the price gets directly added to or subtracted from the sum in AO_AFTER_* instead of computing the sign first. It's all pretty much self-explanatory.

The proper aggregation, part 6: additive

In some cases the aggregation values don't have to be calculated by going through all the rows from scratch every time. If you do a sum of a field, you can as well add the value of the field when a row is inserted and subtract when a row is deleted. Not surprisingly, this is called an "additive aggregation".

The averaging can also be done as an additive aggregation: it amounts to a sum divided by a count. The sum can obviously be done additive. The count is potentially additive too, but even better, we have the shortcut of $context->groupSize(). Well, at least for the same definition of count that has been used in the non-additive example. The SQL definition of count (and of average) includes only the non-NULL values, but here we go with the Perl approach where a NULL is taken to have the same meaning as 0. The proper SQL count could not use the shortcut but would still be additive.
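The difference between the two definitions of count is easy to show in plain Perl, where a NULL field arrives as undef:

```perl
use strict;
use warnings;

# A group of prices where one value is NULL (undef in Perl).
my @prices = (10, undef, 20);

# The Perl-style count used here: every row counts.
my $perl_count = scalar(@prices);

# The SQL-style COUNT(price): only the non-NULL values count.
my $sql_count = scalar(grep { defined $_ } @prices);

print "$perl_count $sql_count\n";  # prints "3 2"
```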

Triceps provides a way to implement the additive aggregation too. It calls the aggregation computation function for each changed row, giving it an opportunity to react. The argument $aggop indicates what has happened. Here is the same example rewritten in an additive way:

sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;
  my $rowchg;

  if ($aggop == &Triceps::AO_BEFORE_MOD) { 
    $context->send($opcode, $state->{lastrow}) or die "$!";
    return;
  } elsif ($aggop == &Triceps::AO_AFTER_DELETE) { 
    $rowchg = -1; 
  } elsif ($aggop == &Triceps::AO_AFTER_INSERT) { 
    $rowchg = 1;
  } else { # AO_COLLAPSE, also has opcode OP_DELETE
    return
  } 

  $state->{price_sum} += $rowchg * $rh->getRow()->get("price");

  return if ($context->groupSize()==0
    || $opcode == &Triceps::OP_NOP);

  my $rLast = $context->last()->getRow() or die "$!";
  my $count = $context->groupSize();
  my $avg = $state->{price_sum}/$count;
  my $res = $context->resultType()->makeRowHash(
    symbol => $rLast->get("symbol"), 
    id => $rLast->get("id"), 
    price => $avg
  ) or die "$!";
  $state->{lastrow} = $res;

  $context->send($opcode, $res) or die "$!";
}

sub initAverage #  (@args)
{
  return { lastrow => undef, price_sum => 0 };
}

Also, the trick of keeping an extra row can not be used with the additive aggregation. An additive aggregation relies on Triceps to tell it which rows are deleted and which are inserted, so it can not easily do any extra skipping. The index for the aggregation has to be defined with the correct limits. If we want an average of the last 2 rows, we set the limit to 2:

      Triceps::IndexType->newFifo(limit => 2)
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", \&initAverage, \&computeAverage)
      )

The aggregation state has grown: now it includes not only the last sent row but also the sum of the price, which is used for the aggregation, kept together in a hash. The last sent row doesn't really have to be kept, and I'll show another example without it, but for now let's look at how things are done when it is kept.

The argument $aggop describes why the computation is being called. Note that Triceps doesn't know whether the aggregation is additive or not; it makes the calls the same way in every case. It's just that in the previous examples we weren't interested in this information and didn't look at it. $aggop contains one of the constant values:

&Triceps::AO_BEFORE_MOD: the group is about to be modified, need to send a DELETE of the old aggregated row. The argument $opcode will always be OP_DELETE.

&Triceps::AO_AFTER_DELETE: the group has been modified by deleting a row from it. The argument $rh will refer to the row handle being deleted. The $opcode may be either OP_NOP or OP_INSERT. A single operation on a table may affect multiple rows: an insert may trigger the replacement policy in the indexes and cause one or more rows to be deleted. If there are multiple rows deleted or inserted in a group, the additive aggregator needs to know about all of them to keep its state correct but does not need (and even must not) send a new result until the last one of them has been processed. The call for the last modification will have the opcode of OP_INSERT. The preceding intermediate ones will have the opcode of OP_NOP. An important point: even though a row is being deleted from the group, the aggregator opcode is OP_INSERT, because it inserts the new aggregator state!

&Triceps::AO_AFTER_INSERT: the group has been modified by inserting a row into it. Same as for AO_AFTER_DELETE, $rh will refer to the row handle being inserted, and $opcode will be OP_NOP or OP_INSERT.

&Triceps::AO_COLLAPSE: called after the last row is deleted from the group, just before the whole group is collapsed and deleted. This allows the aggregator to destroy its state properly. For most of the aggregators there is nothing special to be done. The only case when you want to do something is if your state contains some circular references. Perl doesn't free the circular references until the whole interpreter exits, so you'd have to break the circle to let them be freed immediately. The aggregator should not produce any results on this call. In version 0.99 this aggregator operation carried the opcode of OP_INSERT, but after some thinking, this didn't make a whole lot of sense, so for 1.0 I've changed it to OP_NOP. It doesn't matter a whole lot because the aggregator computation doesn't produce any result and should not care. But for abstract aesthetic reasons OP_NOP looks better.
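For illustration, here is the kind of circular state that would need breaking, sketched in plain Perl (Node is a made-up class, not part of Triceps):

```perl
use strict;
use warnings;

# A tiny class counting its live instances, to make the leak visible.
package Node;
my $alive = 0;
sub new     { $alive++; return bless {}, shift }
sub DESTROY { $alive-- }
sub alive   { return $alive }

package main;

{
    my $state = Node->new();
    $state->{self} = $state;  # a circular reference in the group state
}
# the circle kept the first state alive past its scope
print Node::alive(), "\n";  # prints "1"

{
    my $state = Node->new();
    $state->{self} = $state;
    delete $state->{self};    # break the circle, as a collapse handler would
}
# the second state was freed properly, so the count did not grow
print Node::alive(), "\n";  # prints "1"
```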

The computation reacts accordingly: for the before-modification it re-sends the old result with the new opcode, for the collapse it does nothing, and for the after-modification it calculates the sign: whether the value from $rh needs to be added to or subtracted from the sum. I'm actually thinking that maybe this sign should be passed as a separate argument too, and then both the aggregation operation constants AO_AFTER_* could be merged into one. We'll see, maybe it will be changed in the future.

Then the addition/subtraction is done and the state updated.

After that, if the row does not need to be sent (the opcode is OP_NOP or the group size is 0), the function can as well return without constructing the new row.

If the row needs to be produced, continue with the same logic as the non-additive aggregator, only without iteration through the group. The field "id" in the result is produced by essentially the SQL LAST() operator. LAST() and FIRST() are not additive, they refer to the values in the last or first row in the group's order, and simply can not be calculated from looking at which rows are being inserted and deleted without knowing their order in the group. But they are fast as they are and do not require iteration. The same goes for the row count (as long as we don't care about excluding NULLs, violating the SQL semantics). And for averaging there is the last step to do after the additive part is done: divide the sum by the count.

All these non-additive steps are done in this last section, then the result row is constructed, remembered and sent.

Not all the aggregation operations can be expressed in an additive way. It may even vary by the data. For MAX(), the insertion of a row can be always done additively, just comparing the new value with the remembered maximum, and replacing it if the new value is greater. The deletion can also compare the deleted value with the remembered maximum. If the deleted value is less, then the maximum is unchanged. But if the deleted value is equal to the maximum, MAX() has to iterate through all the values and find the new maximum.
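Here is a sketch of such a mostly-additive MAX() over a plain Perl array, outside of the Triceps API (max_insert and max_delete are made-up helpers):

```perl
use strict;
use warnings;

my @group;  # the values currently in the group
my $max;    # the remembered maximum, undef when the group is empty

# Insertion is always additive: just compare with the remembered maximum.
sub max_insert {
    my ($v) = @_;
    push @group, $v;
    $max = $v if !defined($max) || $v > $max;
}

# Deletion is additive only while the deleted value is below the maximum;
# deleting the maximum itself forces a full rescan of the group.
sub max_delete {
    my ($v) = @_;
    for my $i (0 .. $#group) {
        if ($group[$i] == $v) { splice(@group, $i, 1); last; }
    }
    if (defined($max) && $v == $max) {
        $max = undef;
        for my $g (@group) {
            $max = $g if !defined($max) || $g > $max;
        }
    }
}

max_insert($_) for (3, 7, 5);
max_delete(5);  # below the maximum: no rescan, $max stays 7
max_delete(7);  # the maximum itself: the rescan finds 3
print "$max\n"; # prints "3"
```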

Some functions may be even trickier. The calculation of the standard deviation requires finding the mean (the same average, just named in a more scientific way) in an additive or iterative way, and after that iterating again to find the deviations from that mean.
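A plain-Perl sketch of the two passes (stddev is a made-up helper; this version computes the population standard deviation):

```perl
use strict;
use warnings;

# Pass 1 finds the mean, pass 2 sums the squared deviations from it.
sub stddev {
    my @values = @_;
    my $n = scalar(@values);
    return 0 if $n == 0;

    my $sum = 0;
    $sum += $_ for @values;       # first pass: the mean
    my $mean = $sum / $n;

    my $sqsum = 0;
    $sqsum += ($_ - $mean) ** 2 for @values;  # second pass: deviations
    return sqrt($sqsum / $n);     # population standard deviation
}

print stddev(2, 4, 4, 4, 5, 5, 7, 9), "\n";  # prints "2"
```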

There is also an issue with the floating point precision in the additive aggregation. It's not such a big issue if the rows are only added and never deleted from the group, but can get much worse with the deletion. Let me show it with a sample run of the additive code:

OP_INSERT,1,AAA,1,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="1" 
OP_INSERT,2,AAA,1e20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="1" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="2" price="5e+19" 
OP_INSERT,3,AAA,2,10
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="2" price="5e+19" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="5e+19" 
OP_INSERT,4,AAA,3,10
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="5e+19" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="4" price="1.5" 

Why is the last result 1.5 while it had to be (2+3)/2 = 2.5? Because adding together 1e20 and 2 had pushed the 2 beyond the precision of a floating-point number: 1e20+2 = 1e20. So when the row with 1e20 was deleted from the group and subtracted from the sum, that left 0, which then got averaged with 3.
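The same arithmetic can be replayed in plain Perl, following the additive sum step by step through the 2-row window:

```perl
use strict;
use warnings;

my $sum = 0;
$sum += 1;      # OP_INSERT id=1, price=1
$sum += 1e20;   # OP_INSERT id=2: 1 + 1e20 == 1e20, the 1 is lost
$sum -= 1;      # id=1 expires from the 2-row window
$sum += 2;      # OP_INSERT id=3: the 2 is lost in 1e20 too
$sum -= 1e20;   # id=2 expires: the subtraction leaves 0, not 2
$sum += 3;      # OP_INSERT id=4

print $sum / 2, "\n";  # prints "1.5" instead of the correct 2.5
```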

Of course, with the real stock prices there won't be that much variation. But the subtler errors will still accumulate over time, and you have to expect them and plan accordingly.

Saturday, February 25, 2012

The proper aggregation, part 5: optimized DELETEs

Previously I've mentioned that the deletes in an aggregator do not have to be recalculated every time. Instead the rows can be remembered from the insert time, and simply re-sent with the new opcode. That allows trading CPU time for extra memory. Of course, this works best when there are many rows per aggregation group. How many is "many"? It depends on the particular case. You'd have to measure. Anyway, here is how it's done:

sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;

  # don't send the NULL record after the group becomes empty
  return if ($context->groupSize()==0
    || $opcode == &Triceps::OP_NOP);
  if ($opcode == &Triceps::OP_DELETE) {
    $context->send($opcode, $$state) or die "$!";
    return;
  } 

  my $sum = 0;
  my $count = 0;
  for (my $rhi = $context->begin(); !$rhi->isNull(); 
      $rhi = $context->next($rhi)) {
    $count++;
    $sum += $rhi->getRow()->get("price");
  } 
  my $rLast = $context->last()->getRow() or die "$!";
  my $avg = $sum/$count;

  my $res = $context->resultType()->makeRowHash(
    symbol => $rLast->get("symbol"), 
    id => $rLast->get("id"), 
    price => $avg
  ) or die "$!";
  ${$state} = $res;
  $context->send($opcode, $res) or die "$!";
}

sub initRememberLast #  (@args)
{
  my $refvar;
  return \$refvar;
}

my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("byId", 
    Triceps::IndexType->newHashed(key => [ "id" ])
  ) 
  ->addSubIndex("bySymbol", 
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("last2",
      Triceps::IndexType->newFifo(limit => 2)
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", \&initRememberLast, \&computeAverage)
      )
    )
  )
or die "$!";

The rest of the example stays the same, so it's not shown. Even in the part that is shown, very little has changed.

The aggregator type now has an initialization function. (This function is NOT of the same kind as for the sorted index!) This function gets called every time a new aggregation group gets created, before the first row is inserted into it. It initializes the aggregator group's Perl state (the state is per aggregator type, so if there are two parallel index types, each with an aggregator, each aggregator will have its own group state).

The state is stored in the group as a single Perl variable. So it usually is a reference. In this case the value returned is a reference to a variable that would contain a Row reference. (Ironically, the simplest case looks a bit more confusing than if it were a reference to an array or hash). Returning a reference to a "my" variable is a way to create a reference to an anonymous value: each time "my" executes, it creates a new value, which is then kept alive by the reference after the initialization function returns. The next time the function executes, "my" creates another new value.
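This behavior of "my" is easy to check in standalone Perl, with the same initRememberLast as above:

```perl
use strict;
use warnings;

# Each call executes "my" anew and creates a new anonymous value,
# so each call returns a reference to a fresh, independent scalar.
sub initRememberLast {
    my $refvar;
    return \$refvar;
}

my $stateA = initRememberLast();
my $stateB = initRememberLast();

$$stateA = "row for group A";
$$stateB = "row for group B";

# The two states do not interfere with each other.
print $$stateA eq $$stateB ? "shared\n" : "independent\n";  # prints "independent"
```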

The computation function is passed that state as an argument and now makes use of it. It has two small additions. Before sending a new result row, that row gets remembered in the state reference. And before doing any computation the function checks whether the required opcode is DELETE, and if so, simply resends the last result with the new opcode. Remember, the rows are not copied but reference-counted, so this is fairly cheap.

The extra level of referencing is used because simply assigning to $state would only change the local variable and not the value kept in the group.

However, if you change the argument of the function directly, that changes the value kept in the group (similar to changing the loop variable in a foreach loop). So you can save a bit of overhead by eliminating the extra indirection. The changes are:

sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
...
  if ($opcode == &Triceps::OP_DELETE) {
    $context->send($opcode, $state) or die "$!";
    return;
  }
...
  $_[5] = $res;
  $context->send($opcode, $res) or die "$!";
}

sub initRememberLast #  (@args)
{
  return undef;
}
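The aliasing of @_ that makes this work can be seen in a standalone Perl example (setFirstArg is a made-up function):

```perl
use strict;
use warnings;

# The elements of @_ are aliases to the caller's arguments:
# assigning to $_[0] changes the caller's variable in place.
sub setFirstArg {
    $_[0] = "changed";
}

my $var = "original";
setFirstArg($var);
print "$var\n";  # prints "changed"
```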

Even though the initialization function returns undef, it still must be present. If it's not present, the state argument of the computation function will contain a special hardcoded and unmodifiable undef constant, and nothing could be remembered.

And here is an example of its work:

OP_INSERT,1,AAA,10,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,2,BBB,100,100
tWindow.aggrAvgPrice OP_INSERT symbol="BBB" id="2" price="100" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="10" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,4,BBB,200,200
tWindow.aggrAvgPrice OP_DELETE symbol="BBB" id="2" price="100" 
tWindow.aggrAvgPrice OP_INSERT symbol="BBB" id="4" price="150" 
OP_INSERT,5,AAA,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="15" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="30" 
OP_DELETE,5
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="30" 

Since the rows are grouped by the symbol, the symbols "AAA" and "BBB" will have separate aggregation states.

Thursday, February 23, 2012

The proper aggregation, part 4 (another cute trick)

Now, with a sorted index available, we can put it to the uses of the aggregation. Just change the table type definition in the last example to aggregate on the sorted index and it becomes able to handle the updates:

my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("byId",
    Triceps::IndexType->newHashed(key => [ "id" ])
  )
  ->addSubIndex("bySymbol",
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("orderById",
      Triceps::SimpleOrderedIndex->new(id => "ASC",)
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", undef, \&computeAverage)
      )
    )
    ->addSubIndex("last3",
      Triceps::IndexType->newFifo(limit => 3))
  )
or die "$!";

Here is a sample of its work:

OP_INSERT,1,AAA,10,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="10" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="15" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="20" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="20" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_INSERT,7,AAA,40,40
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="7" price="35" 

When the row with id=3 gets deleted, the average reverts to the rows 1 and 5.  When the row 3 gets inserted back, the average works on rows 3 and 5 again. Then when the row 7 is inserted, the aggregation moves up to the rows 5 and 7.

Note that the row expiration is controlled by the FIFO index. So after the row 3 is inserted back, the order of rows in the FIFO becomes

1, 5, 3

Then when the row 7 is inserted, it advances to

5, 3, 7

At this point, until the row 3 gets naturally popped out of the FIFO, it's best not to do any other deletions or updates, or the group contents may become incorrect.

The FIFO and sorted index types work in parallel on the same group, and the sorted index always keeps the right order:

1, 3, 5
3, 5, 7

As long as the records with the two highest ids are in the group at all, the sorted index will keep them in the right position at the end.

In this case we could even make a bit of optimization: turn the sorting order around, and have the sorted index arrange the rows in the descending order. Then instead of skipping the rows until the last two, just take the first two rows of the reverse order. They'll be iterated in the opposite direction but for the averaging it doesn't matter. And instead of the last row take the first row of the opposite order. This is a simple modification and is left as an exercise for the reader.
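The equivalence behind this optimization can be checked in plain Perl, outside of Triceps (the rows here are just hashes standing in for the real Rows): averaging the last two rows of an ascending sort gives the same result as averaging the first two rows of a descending sort, and the "last" row of one order is the "first" row of the other.

```perl
use strict;
use warnings;

# rows as plain hashes; ids and prices as in the AAA example above
my @rows = (
  { id => 3, price => 20 },
  { id => 5, price => 30 },
  { id => 7, price => 40 },
);

# ascending by id: skip to the last 2 rows, then average
my @asc = sort { $a->{id} <=> $b->{id} } @rows;
my $avgAsc = ($asc[-2]{price} + $asc[-1]{price}) / 2;

# descending by id: just take the first 2 rows; they get iterated
# in the opposite direction, but the average comes out the same
my @desc = sort { $b->{id} <=> $a->{id} } @rows;
my $avgDesc = ($desc[0]{price} + $desc[1]{price}) / 2;

# and the "last" row of the ascending order is the "first" of the descending
print "$avgAsc $avgDesc $desc[0]{id}\n"; # prints "35 35 7"
```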

Thinking further, the sensitivity to the ordering comes largely from the FIFO index. If the replacement policy could be done directly on the sorted index, it would become easier. Would be a good thing to add in the future. Also, if you keep all the day's trades anyway, you might not need to have a replacement policy at all: just pick the last 2 records for the aggregation. There is currently no way to iterate back from the end (another thing to add in the future) but the same trick with the opposite order would work.

For the next item, note that this table type indexes by id twice: once as a primary index, another time as a nested one. Are both of them really necessary, or would just the nested one be good enough? That depends on your input data. If you get the deletes like "OP_DELETE,3", with all the other fields NULL, then a separate primary index is definitely needed. But if the deletes come as exactly the same records that were inserted, only with a different opcode, like "OP_DELETE,3,AAA,20,20", then the primary index can be skipped, because the nested sorted index will be able to find the rows correctly and handle them. The bottom line: fully correct DELETE records are a good thing to have.

Sorted index initialization, a simple ordered index template

To specify the sorting order in a more SQL-like fashion, Triceps now has the class SimpleOrderedIndex. It's implemented entirely in Perl, on top of the sorted index. Besides being useful by itself, it shows off two concepts: the initialization function of the sorted index, and the template with code generation on the fly.

First, how to create the ordered indexes:

my $tabType = Triceps::TableType->new($rowType)
  ->addSubIndex("sorted", 
    Triceps::SimpleOrderedIndex->new(
      a => "ASC",
      b => "DESC",
    )
  ) or die "$!";

The constructor takes a list of pairs fieldName => order, where the order is either "ASC" for ascending or "DESC" for descending.

The comparison function gets generated automatically. It's smart enough to generate the string comparisons for the string and uint8 fields, and the numeric comparisons for the numeric fields. It's not smart enough to do the locale-specific comparisons for the strings and locale-agnostic for the uint8, it just uses whatever you have set up in cmp for both. It treats the NULL field values as numeric 0 or empty strings. It doesn't handle the array fields at all, but can at least detect such attempts and flag them as errors.
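The gist of that rule can be sketched in plain Perl (the function cmp_op here is hypothetical, a simplified illustration rather than the actual generated code, and it doesn't reject the array types like the real one does):

```perl
use strict;
use warnings;

# pick the comparison operator by the Triceps field type name;
# string and uint8 fields get "cmp", the numeric fields get "<=>"
sub cmp_op {
  my ($type) = @_;
  return "cmp" if $type eq "string" || $type =~ /^uint8/;
  return "<=>";
}

# NULL (undef) field values are treated as numeric 0 or empty strings
my $price = undef;
my $order = ($price // 0) <=> 5; # -1: a NULL price sorts before 5

print cmp_op("string"), " ", cmp_op("int64"), " $order\n"; # prints "cmp <=> -1"
```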

An interesting artifact of the boundary between C++ and Perl is that when you get the index type back from the table type like

$sortIdx = $tabType->findSubIndex("sorted") or die "$!";

the reference stored in $sortIdx will be of the base type Triceps::IndexType. That's because the C++ internals of the TableType object know nothing about any derived Perl types. But it's no big deal, since there are no other useful methods for SimpleOrderedIndex anyway.

If you call $sortIdx->print(), it will give you an idea of how it was constructed:

PerlSortedIndex(SimpleOrder a ASC, b DESC, )

I'm not sure if I mentioned it yet, but all the index types have the method getKey() that for the hashed index types returns an array of key field names, and for all the other index types returns nothing. This includes the sorted index, and the simple ordered index that is derived from it. In the future I plan to allow returning the key list from the sorted indexes too, but haven't gotten around to it yet.

The usage of the tables with these indexes is the same as with any other indexes. The PerlSortedIndex can be used in both leaf and non-leaf positions, and so can the SimpleOrderedIndex. Nothing special there.

Now the interesting part, the implementation of the sorted index. It's a little biggish for a blog post but not too huge:

package Triceps::SimpleOrderedIndex;
use Carp;

our @ISA = qw(Triceps::IndexType);

sub new # ($class, $fieldName => $direction...)
{
  my $class = shift;
  my @args = @_; # save a copy

  # build a descriptive sortName
  my $sortName = 'SimpleOrder ';
  while ($#_ >= 0) { 
    my $fld = shift;
    my $dir = shift;
    $sortName .= quotemeta($fld) . ' ' . quotemeta($dir) . ', ';
  } 

  my $self = Triceps::IndexType->newPerlSorted(
    $sortName, \&init, undef, @args
  ) or confess "$!";
  bless $self, $class;
  return $self;
}

sub init # ($tabt, $idxt, $rowt, @args)
{
  my ($tabt, $idxt, $rowt, @args) = @_;
  my %def = $rowt->getdef(); # the field definition
  my $errors; # collect as many errors as possible
  my $compare = "sub {\n"; # the generated comparison function
  my $connector = "return"; # what goes between the comparison operators

  while ($#args >= 0) {
    my $f = shift @args;
    my $dir = uc(shift @args);

    my ($left, $right); # order the operands depending on sorting direction
    if ($dir eq "ASC") {
      $left = 0; $right = 1;
    } elsif ($dir eq "DESC") {
      $left = 1; $right = 0;
    } else {
      $errors .= "unknown direction '$dir' for field '$f', use 'ASC' or 'DESC'\n";
      # keep going, may find more errors
    }

    my $type = $def{$f};
    if (!defined $type) {
      $errors .= "no field '$f' in the row type\n";
      next;
    }

    my $cmp = "<=>"; # the comparison operator
    if ($type eq "string"
    || $type =~ /^uint8.*/) {
      $cmp = "cmp"; # string version
    } elsif($type =~ /\]$/) {
      $errors .= "can not order by the field '$f', it has an array type '$type', not supported yet\n";
      next;
    }

    my $getter = "->get(\"" . quotemeta($f) . "\")";

    $compare .= "  $connector \$_[$left]$getter $cmp \$_[$right]$getter\n";

    $connector = "||";
  }

  $compare .= "  ;\n";
  $compare .= "}";

  if (defined $errors) {
    # help with diagnostics, append the row type to the error listing
    $errors .= "the row type is:\n";
    $errors .= $rowt->print();
  } else {
    # compile the comparison
    #print STDERR "DEBUG Triceps::SimpleOrderedIndex::init: comparison function:\n$compare\n";
    my $cmpfunc = eval $compare
      or return "Triceps::SimpleOrderedIndex::init: internal error when compiling the compare function:\n"
        . "$@\n"
        . "The generated comparator was:\n"
        . $compare;
    $idxt->setComparator($cmpfunc)
      or return "Triceps::SimpleOrderedIndex::init: internal error: can not set the compare function:\n"
      . "$!\n";
  }
  return $errors;
}

Sorry, but I'm too lazy to wrap the long lines manually, and the @#%^ blog engine doesn't wrap them automatically either. They should really use some less brain-damaged formatting.

The class constructor simply builds the sort name from the arguments and offloads the rest of the logic to the init function. It can't really do much more: when the index type object is constructed, it doesn't know yet where it will be used and what row type it will get. It tries to nicely quote the weird characters in the arguments when they go into the sort name. Not that there is much use in it at the moment: the C++ code that prints the table type information doesn't do the same, so there is still a chance of mismatched quotes in the result. But perhaps the C++ code will be fixed at some point too.

The init function is called at the table type initialization time. By this time all this extra information is known, and the function gets the references to the table type, the index type (itself, but with the class stripped back to Triceps::IndexType), the row type, and whatever extra arguments were passed through newPerlSorted(). Now the actual work can begin.

By the way, the sorted index type init function is NOT of the same kind as the aggregator type init function. The aggregator type could use an init function of this kind too, but at the time it looked like too much extra complexity. It probably will be added in the future. But more about aggregators later.

The init function's return value is kind of backwards to everything else: on success it returns undef, on error it returns the error message. It could die too, but simply returning an error message is somewhat nicer.

It goes through all the arguments, looks up the fields in the row type, and checks them for correctness. It tries to collect as much of the error information as possible. The returned error messages may contain multiple lines separated by "\n", and the ordered index makes use of it. The error messages get propagated back to the table type level, nicely indented and returned from the table initialization. If the init function finds any errors, it appends the printout of the row type too, to make finding what went wrong easier. A result of a particularly bad call to a table type initialization may look like this:

index error:
  nested index 1 'sorted':
    unknown direction 'XASC' for field 'z', use 'ASC' or 'DESC'
    no field 'z' in the row type
    can not order by the field 'd', it has an array type 'float64[]', not supported yet
    the row type is:
    row {
      uint8 a,
      uint8[] b,
      int64 c,
      float64[] d,
      string e,
    }

Also as the init goes through the arguments, it constructs the text of the compare function in the variable $compare. Here the use of quotemeta() for the user-supplied strings is important to avoid the syntax errors in the generated code. If no errors are found in the arguments, the compare function gets compiled with eval. There should not be any errors, but it's always better to check. Finally the compiled compare function is set in the sorted index with

$idxt->setComparator($cmpfunc)

This method works only on the PerlSorted index types (it knows how to check internally) and would fail on all others. It replaces any previous compare function set in newPerlSorted(), as well as the extra arguments for it. So really if you use an init function, you would always set the compare function in newPerlSorted() to undef because it will be replaced anyway. If you want to pass extra arguments, you do that as setComparator($cmpfunc, @args). But in this class all the information from the arguments is already compiled into the body of the comparator, and there is no more use for them. The init function absolutely must set the compare function. If the comparator is still undef after the init returns, the initialization will see it as an error.
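The compile-and-check step can be tried in miniature with plain Perl, with no Triceps involved: build the source of a sub in a string, eval it, and check $@ (or the undef result) before using the code reference.

```perl
use strict;
use warnings;

# source code of a trivial comparator, built as a string
my $src = 'sub { $_[0] <=> $_[1] }';

# compile it; on a syntax error, string eval returns undef and sets $@
my $cmpfunc = eval $src
  or die "internal error when compiling the compare function:\n$@";

# the compiled code reference can now be called like any sub
print $cmpfunc->(1, 2), "\n"; # prints "-1"
```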

If you uncomment the debugging printout line (and run "make", and maybe "make install" afterwards), you can see the auto-generated code printed on stderr when you use the simple ordered index. It will look somewhat like this:

sub {
  return $_[0]->get("a") cmp $_[1]->get("a")
  || $_[1]->get("c") <=> $_[0]->get("c")
  || $_[0]->get("b") cmp $_[1]->get("b")
  ;
}
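A comparator of that shape can be exercised standalone with a minimal mock row class (MockRow here is hypothetical, just for illustration; the real Triceps Rows are C++ objects that happen to have the same get() method):

```perl
use strict;
use warnings;

# a minimal stand-in for a row: a blessed hash with a get() method
package MockRow;
sub new { my ($class, %f) = @_; return bless { %f }, $class; }
sub get { my ($self, $name) = @_; return $self->{$name}; }

package main;

# the same shape as the generated code above: "a" ASC, "c" DESC, "b" ASC
my $cmp = sub {
  return $_[0]->get("a") cmp $_[1]->get("a")
    || $_[1]->get("c") <=> $_[0]->get("c")
    || $_[0]->get("b") cmp $_[1]->get("b")
  ;
};

my $r1 = MockRow->new(a => "x", b => "q", c => 1);
my $r2 = MockRow->new(a => "x", b => "q", c => 2);

# equal in "a" and "b", so the descending "c" decides: $r2 sorts first
print $cmp->($r1, $r2), "\n"; # prints "1"
```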

That's it! An entirely new piece of functionality added in a smallish Perl snippet. This is your typical Triceps template: collect the arguments, use them to build Perl code, and compile it. Of course, if you don't want to deal with the code generation and compilation, you can just call your class methods and whatnot to interpret the arguments. But if the code will be reused, the compilation is more efficient.

Sorted index

Let's take a short break from the aggregation. For a while I wanted to have an index type that would keep the records in a sorted order. It's convenient for both tests and examples.  But I really didn't want to create yet another version of the indexing code until the table navigation methods are worked out better.

Well, I've been thinking about it and figured out a way to not only reuse but even totally share that code between the hashed index and the new sorted index, pretty much unchanged. And added the sorted indexes for version 1.0.  A sorted index type is created with:

$it = Triceps::IndexType->newPerlSorted($sortName,
  \&initFunc, \&compareFunc, @args);

The "Perl" in "newPerlSorted" refers to the fact that the sorting order is specified as a Perl comparison function.

$sortName is just a symbolic name for printouts. It's used when you call $it->print() (directly or as a recursive call from the table type print) to let you know what kind of index type it is, since it can't print the compiled comparison function. It is also used in the error messages if something dies inside the comparison function: the comparison is executed from deep inside the C++ code, and by that time the sortName is the only way to identify the source of the problems. It's not the same name as used to connect the index type into the table type hierarchy with addSubIndex(). As usual, an index type may be reused in multiple hierarchies, with different names, but in all cases it will also keep the same sortName. This may be easier to show with an example:

$rt1 = Triceps::RowType->new(
  a => "int32",
  b => "string",
) or die "$!";

$it1 = Triceps::IndexType->newPerlSorted("basic", undef, \&compBasic
) or die "$!"; 

$tt1 = Triceps::TableType->new($rt1)
  ->addSubIndex("primary", $it1)
or die "$!"; 

$tt2 = Triceps::TableType->new($rt1)
  ->addSubIndex("first", $it1)
or die "$!";

print $tt1->print(), "\n";
print $tt2->print(), "\n";

will print:

table (
  row {
    int32 a,
    string b,
  }
) {
  index PerlSortedIndex(basic) primary,
}
table (
  row {
    int32 a,
    string b,
  }
) {
  index PerlSortedIndex(basic) first,
}

The initFunc and/or compareFunc references specify the sorting order. One of them may be left undef but not both. @args are the optional arguments that will be passed to both functions.

The easiest but least flexible way is to just use the compareFunc. It gets two Rows (not RowHandles!) as arguments, plus whatever is specified in @args. It returns the usual Perl-style "<=>" result. For example:

sub compBasic # ($row1, $row2)
{
  return $_[0]->get("a") <=> $_[1]->get("a");
}

Don't forget to use "<=>" for the numbers and "cmp" for the strings. The typical Perl idiom for sorting by more than one field is to connect them by "||".
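For instance, here is that idiom in a plain-Perl sort, unrelated to Triceps: the "||" falls through to the next field only when the preceding fields compare as equal.

```perl
use strict;
use warnings;

my @recs = (
  { a => 2, b => "x" },
  { a => 1, b => "z" },
  { a => 1, b => "y" },
);

# first by the numeric field "a" with <=>, then by the string field "b" with cmp
my @sorted = sort {
  $a->{a} <=> $b->{a}
    || $a->{b} cmp $b->{b}
} @recs;

print join(",", map { "$_->{a}$_->{b}" } @sorted), "\n"; # prints "1y,1z,2x"
```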

Or, if we want to specify the field names as arguments, we could define a sort function that sorts first by a numeric field in ascending order, then by a string field in descending order:

sub compAscDesc # ($row1, $row2, $numFldAsc, $strFldDesc)
{
  my ($row1, $row2, $numf, $strf) = @_;
  return $row1->get($numf) <=> $row2->get($numf)
    || $row2->get($strf) cmp $row1->get($strf); # backwards for descending
}

my $sit = Triceps::IndexType->newPerlSorted("by_a_b", undef,
  \&compAscDesc, "a", "b") or die "$!";

This assumes that the row type will have a numeric field "a" and a string field "b". The problem is that if it doesn't then this will not be discovered until you create a table and try to insert some rows into it, which will finally call the comparison function. Even then it won't be exactly obvious because the comparison function never checks "$!" after get(), and you'll see no failures but all the rows will be considered equal and will replace each other.

You could check that the arguments match the row type ($row1->getType()) in the comparison function but that would add extra overhead, and the Perl comparisons are slow enough as they are.
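That failure mode is easy to reproduce in plain Perl with a hypothetical stand-in for the rows: a get() on a missing field returns undef, undef compared to undef is 0, and so every pair of rows looks equal.

```perl
use strict;
use warnings;
no warnings 'uninitialized'; # undef in <=> would otherwise warn

# hashes standing in for rows whose type has no field "nosuch"
my %row1 = ( a => 1 );
my %row2 = ( a => 2 );

# comparing by a misspelled field: both lookups yield undef
my $r = $row1{nosuch} <=> $row2{nosuch};
print "$r\n"; # prints "0": the rows look equal and would replace each other
```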

The initFunc provides a way to do that check and more.

You might also wonder: why isn't there a way to just say, as in SQL, "sort by this ascending and that descending"? In short, there eventually will be, and it will also be done in C++ for better efficiency, but for now a comparison function both provides more flexibility and requires less effort to implement. The initFunc provides at least a way to sort of handle this issue for now too. The next post will show how.

Monday, February 20, 2012

The proper aggregation, part 3 (a cute trick)

Let's look again at the sample aggregation output with row deletion, from the last post:

OP_INSERT,1,AAA,10,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="10" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="15" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="30" 
OP_DELETE,5
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="30"

When the row with id=3 is deleted, the average price reverts to "30", which is the price of the trade with id=5, not the average of trades with id 1 and 5. This is because when the row with id=5 was inserted, it pushed out the row with id=1. Deleting the record with id=3 does not put that row with id=1 back (you can see the group contents in an even earlier printout with the manual aggregation). Like the toothpaste, once out of the tube, it's not easy to put back.

But for this particular kind of toothpaste there is a trick: keep more rows in the group just in case but use only the last ones for the actual aggregation. To allow an occasional deletion of a single row, we can keep 3 rows instead of 2.

So, change the table definition:

...
       Triceps::IndexType->newFifo(limit => 3)
... 

and modify the aggregator function to use only the last 2 rows from the group, even if more are available:

...
  my $skip = $context->groupSize()-2;
  my $sum = 0;
  my $count = 0;
  for (my $rhi = $context->begin(); !$rhi->isNull();
      $rhi = $context->next($rhi)) {
    if ($skip > 0) {
      $skip--;
      next;
    }
    $count++;
    $sum += $rhi->getRow()->get("price");
  }
  my $rLast = $context->last()->getRow() or die "$!";
  my $avg = $sum/$count;
...

The output from this version becomes:

OP_INSERT,1,AAA,10,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="10" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="15" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="20" 
OP_DELETE,5
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="20" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 

Now after "OP_DELETE,3" the average price becomes 20, the average of 10 and 30, because the row with id=1 comes into play again. Can you repeat that in the SQLy languages?

This version stores one extra row and thus can handle only one deletion (until the deleted row's spot gets pushed out of the window naturally, then it can handle another). It can not handle arbitrary modifications properly. If you insert another row with id=3 for the same symbol AAA, the new version will be placed again at the end of the window. If it was the last row anyway, that is fine. But if it was not the last, as in this example, that would be an incorrect order that would produce incorrect results.
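The skip arithmetic from the modified aggregator can be checked in isolation with plain Perl (avg_last2 here is a hypothetical helper; plain prices stand in for the rows retained in the FIFO group):

```perl
use strict;
use warnings;

# average the last 2 of up to 3 retained prices, as the aggregator does
sub avg_last2 {
  my @prices = @_;
  my $skip = @prices - 2; # same as $context->groupSize() - 2
  my ($sum, $count) = (0, 0);
  for my $p (@prices) {
    if ($skip > 0) { $skip--; next; }
    $count++;
    $sum += $p;
  }
  return $sum / $count;
}

# the group after OP_DELETE,3: rows 1 and 5 remain, prices 10 and 30
print avg_last2(10, 30), "\n"; # prints "20"
# a full group of 3: only the last 2 prices count
print avg_last2(10, 30, 40), "\n"; # prints "35"
```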

Sunday, February 19, 2012

The proper aggregation, part 2

And here is an example of the output from the last aggregation example (as usual, the input lines are in italics):

OP_INSERT,1,AAA,10,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="10" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="15" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="30" 
OP_DELETE,5
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="30" 

As you can see, it's exactly the same as from the manual aggregation example with the helper table, minus the debugging printout of the group contents. However here it's done without the helper table: instead the aggregation function is called before and after each update.

This presents a memory vs CPU compromise: a helper table uses more memory but requires less CPU for the aggregation computations (presumably, the insertion of the row into the table is less computationally intensive than the iteration through the original records).

The managed aggregators can be made to work with a helper table too: just chain a helper table to the aggregator's label, and in the aggregator computation add

return if ($opcode == &Triceps::OP_DELETE
  && $context->groupSize() != 1);

This would skip all the DELETEs except for the last one, before the group collapses.

There is also a way to optimize this logic right inside the aggregator: remember the last INSERT row sent, and on DELETE just resend the same row. This remembered last state can also be used for the other interesting optimizations that will be shown later.

Which approach is better depends on the particular case. If you need to store the results of aggregation in a table for future look-ups anyway, then that table is no extra overhead. That's what the Aleri system does internally: since each element in its model keeps a primary-indexed table ("materialized view") of the result, that table is used whenever possible to generate the DELETEs without involving any extra logic. Or the extra optimization inside the aggregator can seriously improve the performance on the large groups. Sometimes you may want both.

Now let's look at the example that went wrong with the manual aggregation:

OP_INSERT,1,AAA,10,10
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="1" price="10" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="15" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="5" price="25" 
OP_INSERT,5,BBB,30,30
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="5" price="25" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="3" price="20" 
tWindow.aggrAvgPrice OP_INSERT symbol="BBB" id="5" price="30" 
OP_INSERT,7,AAA,40,40
tWindow.aggrAvgPrice OP_DELETE symbol="AAA" id="3" price="20" 
tWindow.aggrAvgPrice OP_INSERT symbol="AAA" id="7" price="30"

Here it goes right. Triceps recognizes that the second INSERT with id=5 moves the row to another group. So it performs the aggregation logic for both groups. First for the group where the row gets removed, it updates the aggregator result with a DELETE and INSERT (note that id became 3, since it's now the last row left in that group). Then the group where the row gets added, and since there was nothing in that group before, it generates only an INSERT.

Saturday, February 18, 2012

The proper aggregation, part1

Since the manual aggregation is error-prone, Triceps can manage it for you and do it right. The only thing you need to do is do the actual iteration and computation. Here is the rewrite of the same example with a Triceps aggregator:

my $uTrades = Triceps::Unit->new("uTrades") or die "$!";

# the input data
my $rtTrade = Triceps::RowType->new(
  id => "int32", # trade unique id
  symbol => "string", # symbol traded
  price => "float64",
  size => "float64", # number of shares traded
) or die "$!";

# the aggregation result
my $rtAvgPrice = Triceps::RowType->new(
  symbol => "string", # symbol traded
  id => "int32", # last trade's id
  price => "float64", # avg price of the last 2 trades
) or die "$!";

# aggregation handler: recalculate the average each time the easy way
sub computeAverage # (table, context, aggop, opcode, rh, state, args...)
{
  my ($table, $context, $aggop, $opcode, $rh, $state, @args) = @_;

  # don't send the NULL record after the group becomes empty
  return if ($context->groupSize()==0
    || $opcode == &Triceps::OP_NOP);

  my $sum = 0;
  my $count = 0;
  for (my $rhi = $context->begin(); !$rhi->isNull();
      $rhi = $context->next($rhi)) {
    $count++;
    $sum += $rhi->getRow()->get("price");
  }
  my $rLast = $context->last()->getRow() or die "$!";
  my $avg = $sum/$count;

  my $res = $context->resultType()->makeRowHash(
    symbol => $rLast->get("symbol"),
    id => $rLast->get("id"),
    price => $avg
  ) or die "$!";
  $context->send($opcode, $res) or die "$!";
}
my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("byId",
    Triceps::IndexType->newHashed(key => [ "id" ])
  )
  ->addSubIndex("bySymbol",
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("last2",
      Triceps::IndexType->newFifo(limit => 2)
      ->setAggregator(Triceps::AggregatorType->new(
        $rtAvgPrice, "aggrAvgPrice", undef, \&computeAverage)
      )
    )
  )
or die "$!";
$ttWindow->initialize() or die "$!";
my $tWindow = $uTrades->makeTable($ttWindow,
  &Triceps::EM_CALL, "tWindow") or die "$!";

# label to print the result of aggregation
my $lbAverage = $uTrades->makeLabel($rtAvgPrice, "lbAverage",
  undef, sub { # (label, rowop)
    print($_[1]->printP(), "\n");
  }) or die "$!";
$tWindow->getAggregatorLabel("aggrAvgPrice")->chain($lbAverage)
  or die "$!";

while(<STDIN>) {
  chomp;
  my @data = split(/,/); # starts with a string opcode
  $uTrades->makeArrayCall($tWindow->getInputLabel(), @data)
    or die "$!";
  $uTrades->drainFrame(); # just in case, for completeness
}

What changed in this code? Things got rearranged a bit. The aggregator is now defined as a part of the table type, so the aggregation result row type and its computation function had to be moved up.

The AggregatorType object holds the information about the aggregator. In the table type, the aggregator type gets attached to an index type with setAggregator(). In this case, to the FIFO index type. At present an index type may have no more than one aggregator type attached to it. There is no particular reason for that, other than that it was slightly easier to implement, and that I can't yet think of a real-world situation where multiple aggregators on the same index would be needed. If this situation ever occurs, the support can be added. However a table type may have multiple aggregator types in it, on different indexes. You can save a reference to an aggregator type in a variable and reuse it in different table types too (though not multiple times in the same table, since that would cause a naming conflict).

The aggregator type is created with the arguments of result row type, aggregator name, group initialization Perl function (which may be undef, as in this example), group computation Perl function, and the optional arguments for the functions. Note that there is a difference in naming between the aggregator types and index types: an aggregator type knows its name, while an index type does not. An index type is given a name only in its hierarchy inside the table type, but it does not know its name.

When a table is created, it finds all the aggregator types in it, and creates an output label for each of them. The names of the aggregator types are used as suffixes to the table name. In this example the aggregator will have its output label named "tWindow.aggrAvgPrice". This puts all the aggregator types in the table into the same namespace, so make sure to give them different names in the same table type. Also avoid the names "in" and "out" because these are already taken by the table's own labels. The aggregator labels in the table can be found with

$aggLabel = $table->getAggregatorLabel("aggName") or die "$!";

The aggregator types are theoretically multithreaded, but as far as I can tell, they will not integrate with the Perl multithreading well, due to the way the Perl objects (the execution methods!) are tied to each thread's separate interpreter. In the future expect that the table types with aggregators will not be shareable between the threads.

After the logic is moved into a managed aggregator, the main loop becomes simpler. This new main loop also takes advantage of makeArrayCall() to become a little shorter yet. The label $lbAverage now reverts to just printing the rowops going through it, and gets chained to the aggregator output label.

The computation function gets a lot more arguments than it used to. The most interesting and most basic ones are $context, $opcode, and $rh. The rest are useful in the more complex cases only.

The aggregator type is exactly that: a type. It doesn't know on which table or index, or even index type, it will be used, and indeed it might be used on multiple tables and index types. But to do the iteration on the rows, the computation function needs to get this information somehow. And it does, in the form of the aggregator context. The manual aggregation used the last table output row to find which exact group to iterate on. The managed aggregator gets the last modified row handle as the argument $rh. But our simple aggregator doesn't even need to consult $rh, because the context takes care of finding the group too: it knows the exact group and exact index that needs to be aggregated (look at the index tree drawings for the difference between an index type and an index).

The context provides its own begin() and next() methods. They are actually slightly more efficient than the usual table iteration methods, because they take advantage of that exact known index. Most importantly, they work differently:

$context->next($rhi)

returns a NULL row handle when it reaches the end of the group. Do not, I repeat, DO NOT use the $rhi->next() in the aggregators, or you'll get some very wrong results.

The context also has a bit more of its own magic.

$context->last()

returns the last row handle in the group. This comes very handy because in most of the cases you want the data from the last row to fill the fields that haven't been aggregated as such. This is like the SQL function LAST(). Using the fields from the argument $rh, unless they are the key fields for this group, is generally not a good idea because it adds an extra dependency on the order of modifications to the table. The FIRST() or LAST() (i.e. the context's begin() or last()) are much better and not any more expensive.

$context->groupSize()

returns the number of rows in the group. It's your value of COUNT(*) in SQL terms, and if that's all you need, you don't need to iterate.

$context->send($opcode, $row)

constructs a result rowop and sends it to the aggregator's output label. Remember, the aggregator type as such knows nothing about this label, so the path through the context is the only path. Note also that it takes a row and not a rowop, because a label is needed to construct the rowop in the first place.

$context->resultType()

provides the result row type needed to construct the result row. For the version 1.0 I've added a couple of convenience methods that combine the row construction and sending, that can be used instead:

$context->makeHashSend ($opcode, $fieldName => $fieldValue, ...)
$context->makeArraySend($opcode, $fieldValue, ...)

The final thing about the aggregator context: it works only inside the aggregator computation function. Once the function returns, all its methods start returning undefs. So there is no point in trying to save it for later in a global variable or such, don't do that.

As you can see, computeAverage() has the same logic as before, only now it uses the aggregation context. And I've removed the debugging printout of the rows in the group.

The last unexplained piece is the opcode handling and that comparison to OP_NOP. Basically, the table calls the aggregator computation every time something changes in its index. It describes the reason for the call in the argument $aggop ("aggregation operation"). Depending on how clever an aggregator wants to be, it may do something useful on all of these occasions, or only on some of them. A simple aggregator that doesn't try any smart optimizations but just goes and iterates through the rows every time only needs to react in some of the cases. To make its life easier, Triceps pre-computes the opcode that should be used for the result and puts it into the argument $opcode. So to ignore the non-interesting calls, the simple aggregator computation can just return if it sees the opcode OP_NOP.

Why does it also check for the group size being 0? Again, Triceps provides flexibility in the aggregators. Among others, it makes it possible to implement the Coral8-like logic, where on deletion of the last row in the group the aggregator sends a row with all non-key fields set to NULL (it can take the key fields from the argument $rh). So for this specific purpose the computation function gets called with all rows deleted from the group, and $opcode set to OP_INSERT. And, by the way, a true Coral8-styled aggregator would ignore all the calls where $opcode is not OP_INSERT. But the normal aggregators need to avoid doing this kind of crap, so they have to ignore the calls where $context->groupSize()==0.

Rowop creation wrappers

When writing the examples, I've got kind of tired of making all these 3-level-nested method calls to pass a set of data to a label. So now I've added a bunch of convenience wrappers for that purpose. The row-sending part from the manual example in computeAverage() now looks much simpler:

  if ($count) {
    $avg = $sum/$count;
    $uTrades->makeHashCall($lbAvgPriceHelper, &Triceps::OP_INSERT,
      symbol => $rhLast->getRow()->get("symbol"),
      id => $rhLast->getRow()->get("id"),
      price => $avg
    );
  } else {
    $uTrades->makeHashCall($lbAvgPriceHelper, &Triceps::OP_DELETE,
      symbol => $rLastMod->get("symbol"),
    );
  }

The full list of the methods added is:

$label->makeRowopHash($opcode, $fieldName => $fieldValue, ...)
$label->makeRowopArray($opcode, $fieldValue, ...)

$unit->makeHashCall($label, $opcode, $fieldName => $fieldValue, ...)
$unit->makeArrayCall($label, $opcode, $fieldValue, ...)

$unit->makeHashSchedule($label, $opcode, $fieldName => $fieldValue, ...)
$unit->makeArraySchedule($label, $opcode, $fieldValue, ...)

$unit->makeHashLoopAt($mark, $label, $opcode, $fieldName => $fieldValue, ...)
$unit->makeArrayLoopAt($mark, $label, $opcode, $fieldValue, ...)

The label methods amount to calling makeRowHash() or makeRowArray() on their row type, and then wrapping the result into makeRowop(). The unit methods call the new label methods to create the rowop and then call, schedule or loop it. There are no similar wrappers for forking or general enqueueing because those methods are not envisioned to be used often.
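In other words, the wrappers are plain shorthands. Roughly (a sketch, assuming the label's row type is accessible through getType()):

```perl
# $label->makeRowopHash($opcode, @data) amounts to:
my $rowop = $label->makeRowop($opcode,
  $label->getType()->makeRowHash(@data));

# and $unit->makeHashCall($label, $opcode, @data) then amounts to:
$unit->call($label->makeRowopHash($opcode, @data));
```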

In the future the convenience methods will move into the C++ code and will become not only more convenient but also more efficient.

Thursday, February 16, 2012

The importance of DELETEs

Why did the last example worry so much about the DELETEs? Because without them, relying on just INSERTs for updates, it's easy to create bugs. The last example itself has an issue with handling the row replacement by INSERTs. Can you spot it from reading the code?

Here is a sample run that highlights the issue (as usual, the input lines are in italics):

OP_INSERT,1,AAA,10,10
Contents:
  id="1" symbol="AAA" price="10" size="10" 
tAvgPrice.out OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
Contents:
  id="1" symbol="AAA" price="10" size="10" 
  id="3" symbol="AAA" price="20" size="20" 
tAvgPrice.out OP_DELETE symbol="AAA" id="1" price="10" 
tAvgPrice.out OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
Contents:
  id="3" symbol="AAA" price="20" size="20" 
  id="5" symbol="AAA" price="30" size="30" 
tAvgPrice.out OP_DELETE symbol="AAA" id="3" price="15" 
tAvgPrice.out OP_INSERT symbol="AAA" id="5" price="25" 
OP_INSERT,5,BBB,30,30
Contents:
  id="5" symbol="BBB" price="30" size="30" 
tAvgPrice.out OP_INSERT symbol="BBB" id="5" price="30"
OP_INSERT,7,AAA,40,40
Contents:
  id="3" symbol="AAA" price="20" size="20" 
  id="7" symbol="AAA" price="40" size="40" 
tAvgPrice.out OP_DELETE symbol="AAA" id="5" price="25" 
tAvgPrice.out OP_INSERT symbol="AAA" id="7" price="30" 

The row with id=5 has been replaced to change the symbol from AAA to BBB. This act changes both the groups of AAA and of BBB, removing the row from the first one and inserting it into the second one. Yet only the output for BBB came out. The printout of the next row with id=7 and symbol=AAA shows that the row with id=5 has been indeed removed from the group AAA. It even corrects the result. But until that row came in, the average for the symbol AAA remained unchanged and incorrect.

There are multiple ways to fix this issue but first it had to be noticed. Which requires a lot of attention to detail. It's much better to avoid these bugs in the first place by sending the clean and nice input.

More of the manual aggregation

Returning to the last table example, it prints the aggregated information (the average price of two records). This can be fairly easily changed to put the information into rows and send them on through labels. The function printAverage() morphs into computeAverage():

my $rtAvgPrice = Triceps::RowType->new(
  symbol => "string", # symbol traded
  id => "int32", # last trade's id
  price => "float64", # avg price of the last 2 trades
) or die "$!";

# place to send the average: could be a dummy label, but to keep the
# code smaller, also print the rows here instead of in a separate label
my $lbAverage = $uTrades->makeLabel($rtAvgPrice, "lbAverage",
  undef, sub { # (label, rowop)
    print($_[1]->printP(), "\n");
  }) or die "$!";

# Send the average price of the symbol in the last modified row
sub computeAverage # (row)
{
  return unless defined $rLastMod;
  my $rhFirst = $tWindow->findIdx($itSymbol, $rLastMod) or die "$!";
  my $rhEnd = $rhFirst->nextGroupIdx($itLast2) or die "$!";
  print("Contents:\n");
  my $avg;
  my ($sum, $count) = (0, 0);
  my $rhLast;
  for (my $rhi = $rhFirst;
      !$rhi->same($rhEnd); $rhi = $rhi->nextIdx($itLast2)) {
    print("  ", $rhi->getRow()->printP(), "\n");
    $rhLast = $rhi;
    $count++;
    $sum += $rhi->getRow()->get("price");
  }
  if ($count) {
    $avg = $sum/$count;
    $uTrades->call($lbAverage->makeRowop(&Triceps::OP_INSERT,
      $rtAvgPrice->makeRowHash(
        symbol => $rhLast->getRow()->get("symbol"),
        id => $rhLast->getRow()->get("id"),
        price => $avg
      )
    ));
  }
}

For the demonstration, the aggregated records sent to $lbAverage get printed. The records being aggregated are printed during the iteration too. And here is a sample run's result, with the input records shown in italics:

OP_INSERT,1,AAA,10,10
Contents:
  id="1" symbol="AAA" price="10" size="10" 
lbAverage OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
Contents:
  id="1" symbol="AAA" price="10" size="10" 
  id="3" symbol="AAA" price="20" size="20" 
lbAverage OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
Contents:
  id="3" symbol="AAA" price="20" size="20" 
  id="5" symbol="AAA" price="30" size="30" 
lbAverage OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
Contents:
  id="5" symbol="AAA" price="30" size="30" 
lbAverage OP_INSERT symbol="AAA" id="5" price="30" 
OP_DELETE,5
Contents:

There are a couple of things to notice about it: it produces only the INSERT rowops, no DELETEs, and when the last record of the group is removed, that event produces nothing.

The first item is mildly problematic because the processing downstream from here might not be able to handle the updates properly without the DELETE rowops. It can be worked around fairly easily by connecting another table, with the same primary key as the aggregation key, to store the aggregation results. That table would automatically transform the repeated INSERTs on the same key to a DELETE-INSERT sequence.

The second item is actually pretty bad because it means that the last deleted record gets stuck in the aggregation results. The Coral8 solution for this situation is to send a row with all non-key fields set to NULL, to reset them (interestingly, it's a relatively recent addition; that bug took Coral8 years to notice). But with the opcodes available, we can as well send a DELETE rowop with similar contents; the helper table will fill in the rest of the fields and produce a clean DELETE.

All this can be done by the following changes. Add the table, and remember its input label in $lbAvgPriceHelper. It will be used to send the aggregated rows instead of $lbAverage.

my $ttAvgPrice = Triceps::TableType->new($rtAvgPrice)
  ->addSubIndex("bySymbol",
    Triceps::IndexType->newHashed(key => [ "symbol" ])
  )
or die "$!";
$ttAvgPrice->initialize() or die "$!";
my $tAvgPrice = $uTrades->makeTable($ttAvgPrice,
  &Triceps::EM_CALL, "tAvgPrice") or die "$!";
my $lbAvgPriceHelper = $tAvgPrice->getInputLabel() or die "$!";

Then still use $tAvgPrice to print the records coming out, but now connect it after the helper table:

$tAvgPrice->getOutputLabel()->chain($lbAverage) or die "$!";

And in computeAverage() change the destination label and add the case for when the group becomes empty:

...
  if ($count) {
    $avg = $sum/$count;
    $uTrades->call($lbAvgPriceHelper->makeRowop(&Triceps::OP_INSERT,
      $rtAvgPrice->makeRowHash(
        symbol => $rhLast->getRow()->get("symbol"),
        id => $rhLast->getRow()->get("id"),
        price => $avg
      )
    ));
  } else {
    $uTrades->call($lbAvgPriceHelper->makeRowop(&Triceps::OP_DELETE,
      $rtAvgPrice->makeRowHash(
        symbol => $rLastMod->get("symbol"),
      )
    ));
  }
...

Then the output of the same example becomes:

OP_INSERT,1,AAA,10,10
Contents:
  id="1" symbol="AAA" price="10" size="10" 
tAvgPrice.out OP_INSERT symbol="AAA" id="1" price="10" 
OP_INSERT,3,AAA,20,20
Contents:
  id="1" symbol="AAA" price="10" size="10" 
  id="3" symbol="AAA" price="20" size="20" 
tAvgPrice.out OP_DELETE symbol="AAA" id="1" price="10" 
tAvgPrice.out OP_INSERT symbol="AAA" id="3" price="15" 
OP_INSERT,5,AAA,30,30
Contents:
  id="3" symbol="AAA" price="20" size="20" 
  id="5" symbol="AAA" price="30" size="30" 
tAvgPrice.out OP_DELETE symbol="AAA" id="3" price="15" 
tAvgPrice.out OP_INSERT symbol="AAA" id="5" price="25" 
OP_DELETE,3
Contents:
  id="5" symbol="AAA" price="30" size="30" 
tAvgPrice.out OP_DELETE symbol="AAA" id="5" price="25" 
tAvgPrice.out OP_INSERT symbol="AAA" id="5" price="30" 
OP_DELETE,5
Contents:
tAvgPrice.out OP_DELETE symbol="AAA" id="5" price="30" 

All fixed, the proper DELETEs are coming out.

Sunday, February 12, 2012

The index tree, part 3

Continuing our excursion into the internals of the table, the next example is of two parallel leaf index types:

TableType
+-IndexType A
+-IndexType B

The resulting internal arrangement is shown on Fig. 8.

Fig. 8 Two top-level index types.

Each index type produces exactly one index under the root (since the top-level index types always produce one index). Both indexes contain the same number of rows, and exactly the same rows. When a row is added to the table, it's added to all the leaf index types (one actual index of each type). When a row is deleted from the table, it's deleted from all the leaf index types. So the total is always the same. However the order of rows in the indexes may differ. The drawing shows the row references stacked in the same order as the index A because the index A is of the first leaf index type, and as such is the default one for the iteration.

The row handle contains the iterators for both paths, A and B. It's pretty normal to find a row through one index type and then iterate from there using the other index type.

The next example on Fig. 9 has a "primary" index with a unique key and a "secondary" index that groups the records:

TableType
+-IndexType A
+-IndexType B
  +-IndexType C

Fig. 9. A "primary" and "secondary" index type.

The index type A still produces one index and references all the rows directly. The index of type B produces the groups, with each group getting an index of type C. The total set of rows reachable through A and through B is still the same, but through B they are split into multiple groups.

And Fig. 10 shows two leaf index types nested under one non-leaf.

TableType
+-IndexType A
  +-IndexType B
  +-IndexType C

Fig. 10. Two index types nested under one.

As usual, there is only one index of type A, and it splits the rows into groups. The new item in this picture is that each group has two indexes in it: one of type B and one of type C. Both indexes in a group contain the same rows. They don't decide which rows they get; the index A decides which rows go into which group. Then if the group 1 contains two rows, the indexes B1 and C1 would each contain the exact same two rows. The stack of row references has been visually split by groups to make this point more clear.

This happens to be a pretty useful arrangement: for example, B might be a hash index type (or in the future, a sorted index type), allowing the records to be found by the key (and for the sorted index, iterated in the order of the keys), while C might be a FIFO index, keeping the insertion order, and maybe keeping the window size limited.
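In Triceps code, the arrangement of Fig. 10 could be spelled out along these lines (a sketch reusing $rtTrade from the earlier examples; the index names are made up):

```perl
my $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("bySymbol",                          # A: splits into groups
    Triceps::IndexType->newHashed(key => [ "symbol" ])
    ->addSubIndex("byId",                            # B: find by key in a group
      Triceps::IndexType->newHashed(key => [ "id" ])
    )
    ->addSubIndex("last4",                           # C: keep the insertion order
      Triceps::IndexType->newFifo(limit => 4)        #    and limit the window size
    )
  )
or die "$!";
```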

That's pretty much it for the basic index topologies. Some much more complex index trees can be created, but they would be the combinations of the examples shown. Also, don't forget that every extra index type adds overhead in both memory and CPU time, so avoid adding indexes that are not needed.

One more fine point has to do with the replacement policies. Consider that we have a table that contains the rows with the single field:

id int32

And the table type has two indexes:

TableType
+-IndexType "A" HashIndex key=(id)
+-IndexType "B" FifoIndex limit=3

Then we send there the rowops:

INSERT id=1
INSERT id=2
INSERT id=3
INSERT id=2

The second insert of id=2 triggers the replacement policy in both index types. In the index A it is a duplicate key and will cause the removal of the previous row with id=2. In the index B it overflows the limit and pushes out the oldest row, the one with id=1. If both records get deleted, the resulting table contents will be 2 rows (shown in FIFO order):

id=3
id=2

Which is probably not the best outcome. It might be tolerable with a FIFO index and a hashed index but gets even more annoying if there are two FIFO index types in the table: one top-level limiting the total number of rows, another one nested under a hashed index, limiting the number of rows per group, and they start conflicting this way with each other.

So the FIFO index is actually smart enough to avoid such problems: it looks at what the preceding indexes have decided to remove, checks if any of these rows belong to its group, and adjusts its calculation accordingly. In this example the index B will find out that the row with id=2 is already displaced by the index A. That leaves only 2 rows in the index B, so adding a new one will need no displacement. The resulting table contents will be

id=1
id=3
id=2

However here the order of index types is important. If the table were to be defined as

TableType
+-IndexType "B" FifoIndex limit=3
+-IndexType "A" HashIndex key=(id)

then the replacement policy of the index type B would run first, find that nothing has been displaced yet, and displace the row id=1. After that the replacement policy of the index type A will run, and being a hashed index, it doesn't have a choice, it has to replace the row id=2. And both rows end up displaced.
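The effect of the policy ordering can be demonstrated with a plain-Perl simulation (purely illustrative, not the actual Triceps code): each policy adds its displacements to a shared set, seeing what the earlier policies have already decided.

```perl
use strict;
use warnings;

# rows are represented by their id values
my $hashed = sub {        # unique key on id
  my ($rows, $new, $disp) = @_;
  $disp->{$new} = 1 if grep { $_ == $new } @$rows;
};
my $fifo3 = sub {         # FIFO with limit=3
  my ($rows, $new, $disp) = @_;
  my @left = grep { !$disp->{$_} } @$rows;       # rows remaining so far
  $disp->{ shift @left } = 1 while @left >= 3;   # push out the oldest
};

sub run {   # insert id=1,2,3,2 with the policies in the given order
  my @rows;
  for my $new (1, 2, 3, 2) {
    my %disp;
    $_->(\@rows, $new, \%disp) for @_;
    @rows = ((grep { !$disp{$_} } @rows), $new);
  }
  return join(",", @rows);
}

print run($hashed, $fifo3), "\n";  # hashed first: 1,3,2
print run($fifo3, $hashed), "\n";  # FIFO first: 3,2
```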

If the situations with automatic replacement of rows by the keyed indexes may arise, always make sure to put the keyed leaf index types before the FIFO leaf index types. However, if you always diligently send a DELETE before the INSERT of the new version of the record, this problem won't occur and the order of index types will not matter.

Saturday, February 11, 2012

The index tree, part 2

Let's use the Fig. 3 from the last post to go through how the other index-related operations work.

The iteration through the whole table starts with begin() or beginIdx(), the first being a form of the second that always uses the first leaf index type. BeginIdx() is fairly straightforward: it just follows the path from the root to the leaf, picking the first position in each index along the way, until it hits the RowHandle, as is shown in Fig. 4. That found RowHandle becomes its result. If the table is empty, it returns the NULL row handle.
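For reference, the whole-table iteration then follows the usual pattern (a sketch; note that outside the aggregators the iteration advances with $rhi->next(), not with a context):

```perl
for (my $rhi = $table->begin(); !$rhi->isNull(); $rhi = $rhi->next()) {
  print($rhi->getRow()->printP(), "\n");
}
```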

If you specify a non-leaf index type as an argument of beginIdx(), the look-up cannot just stop there because the non-leaf indexes are not directly connected to the row handles. It has to go through the whole chain to a leaf index. So, for a non-leaf index type argument the look-up is silently extended to its first leaf index type.

Fig. 4. Begin(), beginIdx($itA) and beginIdx($itB) work the same for this table.

The next pair is find() and findIdx(). As usual, find() is the same thing as findIdx() on the table's first leaf index type. It also follows the path from the root to the target index type. On each step it tries to find a matching position in the current index. If the position could not be found, the search fails and a NULL row handle is returned. If found, it is used to progress to the next index.

As has been mentioned before, the search always works internally on a RowHandle argument. If a plain Row is used as an argument, a new temporary RowHandle will be created for it, searched, and then freed after the search. This works well for two reasons. First, the indexes already have the functions for comparing two row handles to build their ordering. The same functions are reused for the search. Second, the row handles contain not only the index iterators but also the cached information from the rows, to make the comparisons faster. The exact kind of cached information varies by the index type. The FIFO indexes use none. The hashed indexes calculate a hash of the key field values, that will be used as a quick differentiator for the search. This information gets created when the row handle gets created. Whether the row handle is then used to insert into the table or to search in it, it's then used in the same way, to speed up the comparisons.

For findIdx(), the non-leaf index type arguments behave differently than the leaf ones: up to and including the index of the target type, the search works as usual. But then at the next level the logic switches to the same as in beginIdx(), going for the first row handle of the first leaf sub-index. This lets you find the first row handle of the matching group under the target index type.
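This is exactly how the manual aggregation example found its groups. A condensed sketch of that usage (with $itSymbol being a non-leaf index type and $itLast2 its nested FIFO, as in that example):

```perl
# find the first row of the group matching $rLastMod,
# then iterate through that group only
my $rhFirst = $tWindow->findIdx($itSymbol, $rLastMod);
if (!$rhFirst->isNull()) {
  my $rhEnd = $rhFirst->nextGroupIdx($itLast2);
  for (my $rhi = $rhFirst; !$rhi->same($rhEnd);
      $rhi = $rhi->nextIdx($itLast2)) {
    print("  ", $rhi->getRow()->printP(), "\n");
  }
}
```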

If you use $table->findIdx($itA, $rh), on Fig. 5 it will go through the root index to the index A. There it will try to find the matching position. If none is found, the search ends and returns a NULL row handle. If the position is found, the search progresses towards the first leaf sub-index type, which is the index type B, conveniently sitting in this case right under A. The position in the index A determines which index of type B will be used for the next step. Suppose it's the second position, so the second index of type B is used. Since we're now past the target index A, the logic used is the same as for beginIdx(), and the first position in B2 is picked. Which then leads to the first row handle of the second sub-stack of handles.

Fig. 5 FindIdx($itA, $rh) goes through A and then switches to the begin() logic.


The method firstOfGroupIdx() allows to navigate within a group, to jump from some row somewhere in the group to the first one, and then from there iterate through the group. The example of manual aggregation made use of it.

Fig. 6 shows the example of $table->firstOfGroupIdx($itB, $rh), where $rh is pointing to the third record in B2. What it needs to do is go back to B2, and then execute the begin() logic from there on. However, remember, the row handle does not have a pointer to the indexes in the path, it only has the iterators. So, to find B2, the method does not really back up from the original row. It has to start back from the root and follow the path to B2 using the iterators in $rh. Since it uses the ready iterators, this works fast and requires no row comparisons. Once B2 (an index of type B) is reached, it goes for the first row in there.

FirstOfGroupIdx() works on both leaf and non-leaf index type arguments in the same way: it backs up from the reference row to the index of that type and executes the begin() logic from there. Obviously, if you use it on a non-leaf index type, the begin()-like part will follow its first leaf index type.

Fig. 6. FirstOfGroupIdx($itB, $rh)


The method nextGroupIdx() jumps to the first row of the next group, according to the argument index type. To do that, it has to retrace one level higher than firstOfGroupIdx(). Fig. 7 shows that $table->nextGroupIdx($itB, $rh) that starts from the same row handle as Fig. 6, has to logically back up to the index A, go to the next iterator there, and then follow to the first row of B3.

As usual, in reality there is no backing up, just the path is retraced from the root using the iterators in the row handle. Once the parent of index type B is reached (which is the index of type A), the path follows not the iterator from the row handle but the next one (yes, copied from the row handle, increased, followed). This gives the index of type B that contains the next group. And from there the same begin()-like logic finds its first row.

Same as firstOfGroupIdx(), nextGroupIdx() may be used on both the leaf and non-leaf indexes, with the same logic.
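Together, beginIdx() and nextGroupIdx() allow iterating the table one group at a time (a sketch in the style of the earlier examples):

```perl
# visit the first row of each group of index type $itB in order
for (my $rhi = $table->beginIdx($itB); !$rhi->isNull();
    $rhi = $table->nextGroupIdx($itB, $rhi)) {
  print("group starts at: ", $rhi->getRow()->printP(), "\n");
}
```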

Fig. 7. NextGroupIdx($itB, $rh)

It's kind of annoying that firstOfGroupIdx() and nextGroupIdx() take the index type inside the group while findIdx() takes the parent index type to act on the same group. But as you can see, each of them follows its own internal logic, and I'm not sure if they can be reconciled to be more consistent.

At the moment the only navigation is forward. There is no matching last(), prev() or lastGroupIdx() or prevGroupIdx(). They are in the plan, but so far they are the victims of corner-cutting.