Sergey Babkin on CEP and stuff: June 2012

Wednesday, June 20, 2012

more Unit methods

I still keep adding stuff. In Unit I've added 3 more methods:

$result = $unit->getStackDepth();

Returns the current depth of the call stack (the number of frames on the stack). It isn't of any use for the model logic as such but comes in handy for debugging, to check in the loops that you haven't accidentally created a stack that grows with the iterations. When the unit is not running, the stack depth is 1, since the outermost frame always stays on the stack. When a rowop is being executed, the stack depth is at least 2.

($labelBegin, $labelNext, $frameMark) = $unit->makeLoopHead(
    $rowType, "name", $clearSub, $execSub, @args);

($labelBegin, $labelNext, $frameMark) = $unit->makeLoopAround(
    "name", $labelFirst);

These are the convenience methods that create the whole front part of a topological loop.

These methods use the new error handling convention, confessing on the errors. There is no need to check the result.

makeLoopHead() creates the front part of the loop that starts with a Perl label. It gets the arguments for that label and creates it among the other things. makeLoopAround() creates the front part of the loop around an existing label that will be the first one executed in the loop. makeLoopHead() is really redundant and can be replaced with a combination of makeLabel() and makeLoopAround().

They both return the same results, a triplet:

The label where you send a rowop to initiate the loop (remember that the loop consists of one row going through the loop at a time), $labelBegin.
The label that you use at the end of the loop in the loopAt() to do the next iteration of the loop, $labelNext.
The frame mark that you use in loopAt(), $frameMark. You don't need to set the frame mark yourself: it will be set for you in the wrapper logic.

The name is used to construct the names of the elements by adding a dotted suffix: “name.begin”, “name.next” for makeLoopHead() or “name.wrapnext” for makeLoopAround(), “name.mark”. makeLoopAround() takes the row type for its created labels from the first label that is given to it as an argument.

The manual contains a whole new big example with them, but I see no point in copying it to the blog now. You'll have to read the manual for it.

Friday, June 15, 2012

the Unit update

I've been editing the docs about Units, and there have been a couple of developments.

First, I've noticed that I've already got a decent fix for the more serious scheduling issue. To get the predictable order with the scheduling, just always feed the rowops one by one into the model. I've been doing it in the later examples all over the place.

Second, I think I've found a nice compromise for the tracing that on one hand doesn't create deep indentation, and on the other hand makes the nesting boundaries easy to find: add "{" and "}" at the end of the lines before and after running the label. Then the trace can be written to a file, and an editor like vi or Emacs can be used to jump from one end of a nested block to the other.
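The brace idea can be sketched in plain Perl without any Triceps calls. The function run_label_sketch() and the wording of the trace lines are made up for illustration and are not the actual Triceps trace format; only the trailing "{" and "}" follow the description above.

```perl
use strict;
use warnings;

# Hypothetical trace lines; the wording is made up for illustration
# and is not the actual Triceps trace format.
my @trace;

sub run_label_sketch {
    my ($name, $body) = @_;
    push @trace, "before label '$name' {";   # opening brace ends the line
    $body->() if $body;                      # nested labels run here
    push @trace, "after label '$name' }";    # closing brace ends the line
}

run_label_sketch("outer", sub {
    run_label_sketch("inner");
});

print "$_\n" for @trace;
```

The output stays flat, with no indentation, yet the brace-matching command of the editor (such as % in vi) can still jump between the "before" and "after" lines of the same label.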

Wednesday, June 6, 2012

error handling in the Perl wrappers

Since the API started the transition to confessing on the fatal errors instead of just returning an undef and an error message, I've converted the Perl wrapper methods to do the same. This includes:

AggregatorContext::makeHashSend()
AggregatorContext::makeArraySend()

Label::makeRowopHash()
Label::makeRowopArray()

Table::findBy()
Table::findIdxBy()

Unit::makeHashCall()
Unit::makeArrayCall()
Unit::makeHashSchedule()
Unit::makeArraySchedule()
Unit::makeHashLoopAt()
Unit::makeArrayLoopAt()
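
With the confessing convention the calling code doesn't check each result for undef any more; a failure propagates as a Perl exception that can be caught with eval where needed. Here is a minimal sketch in plain Perl, with no Triceps calls: make_rowop_sketch() is a hypothetical stand-in for one of the wrappers above, not a real Triceps method.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for a wrapper such as Label::makeRowopHash():
# it confesses (dies with a stack trace) instead of returning undef.
sub make_rowop_sketch {
    my (%fields) = @_;
    confess "missing required field 'id'" unless defined $fields{id};
    return { %fields };
}

# Normal use: no result check needed.
my $rowop = make_rowop_sketch(id => 1, name => "abc");

# If the caller wants to survive the error, wrap the call in eval.
my $failed = eval { make_rowop_sketch(name => "no id"); 1 } ? 0 : 1;
my $message = $@;
print "caught: ", (split /\n/, $message)[0], "\n" if $failed;
```

Without the eval, the error would terminate the program and print the stack trace, which is usually what you want for errors of the "should never happen" kind.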

Tuesday, June 5, 2012

Memory management strikes back

I've started collecting the docs, and I've found that I wrote up a whole set of rules for how to keep the memory references correct. And I've also found that I've completely ignored them. This means that the rules are way too tricky, and thus don't make a whole lot of sense. Something had to be done about it. And here we go:

First, the problems come not with the data that goes through the models but with the models themselves. The data gets reference-counted without any issues. The reference loops can get formed only between the elements of the models: labels, tables etc. If you don't need them destroyed until the program exits (or more exactly, until the Perl interpreter instance exits), there is no problem. The leaks would happen only if the model elements get created and destroyed as the program runs, such as if you use them to parse and process the short-lived ad-hoc queries.

Second, these leaks are pretty hard to diagnose. There are some packages, like Devel::Cycle, but they won't detect the loops that involve a reference at the C++ level. And when the Perl interpreter exits, it clears up all the variables used, even the ones involved in the loops, so if you run it under valgrind, valgrind doesn't show any leaks. If the Perl interpreter provided a way to detect all these left-over variables, it would work as a high-level version of valgrind, and it would help.

Third, Perl provides the weak references (using the module Scalar::Util) but the problem is that you have to remember to weaken the references manually. Too much work, too much attention.
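
For reference, this is what the manual weakening looks like in plain Perl. The two hashes are just stand-ins for model elements that refer to each other; nothing here is Triceps code.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Two hashes standing in for model elements that refer to each other.
my $parent = { name => "parent" };
my $child  = { name => "child", parent => $parent };
$parent->{child} = $child;   # this closes the reference loop

# Weaken the back-reference so it no longer keeps the parent alive.
weaken($child->{parent});
my $weak_before = isweak($child->{parent}) ? 1 : 0;

# Drop the only strong reference to the parent: it gets freed now,
# and the weak reference in the child becomes undef.
undef $parent;
my $parent_gone = defined($child->{parent}) ? 0 : 1;

print "weak: $weak_before, parent freed: $parent_gone\n";
```

Forget the single weaken() call and the two hashes keep each other alive until the interpreter exits, which is exactly the kind of mistake that is too easy to make.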

Fourth, here are the new, simplified rules.

Have more faith in the label clearing as the driving force. When you create the template objects, don't be afraid to have references to other objects, to units etc., as long as they get cleared by the label clearing logic. But make sure that the label clearing logic gets actually called. When you delete a unit, make sure to call its clearing either directly or through deletion of the unit clearing trigger.

I've been talking about how if a label used by an object receives $self as an argument, it should have a clearing function that would undefine that object hash. And I've never actually done it in any of the join and aggregation examples. Defining it every time is too much work and too easy to forget. The new and better approach: now there is a pre-defined function Triceps::clearArgs(). Just reuse it, there is no need to re-create it from scratch. Better yet, now it gets used automatically if the clearing function for a label is specified as undef (if you really want the clearing to do nothing, use "sub {}" instead).
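
To illustrate the idea of a clearing function that undefines the object hash, here is a rough sketch in plain Perl. The name clear_args_sketch() is hypothetical and this is not the actual code of Triceps::clearArgs(); it only assumes that the clearing function receives the label first, followed by the label's arguments.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical sketch, not the real Triceps::clearArgs().
# Assumed calling convention: the label first, then the label's arguments.
sub clear_args_sketch {
    my ($label, @args) = @_;
    foreach my $arg (@args) {
        my $type = reftype($arg) // '';   # works for blessed objects too
        if ($type eq 'HASH') {
            %$arg = ();    # drop everything the object hash refers to
        } elsif ($type eq 'ARRAY') {
            @$arg = ();
        }
    }
}

# A template-like object holding a reference to something else.
my $unit_stub = { name => "unit" };   # stand-in, not a real Triceps unit
my $self = bless { unit => $unit_stub, name => "join1" }, 'MyTemplate';

clear_args_sketch(undef, $self);      # undef stands in for the label
my $left = scalar keys %$self;
print "keys left in the object: $left\n";
```

Once the hash is emptied, the object no longer holds references to the unit or to anything else, so it can't be a part of a reference loop any more.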

Some templates don't have their own input labels. Instead they just combine and tie together a few internal objects, and use the input labels of some of these internal objects as their inputs. JoinTwo is one of them: it just combines two LookupJoins. Without an input label, there would be no clearing, and the template object's hash would never be undefined. To make life easier, now there is a way to create the special clearing-only labels:

$lb = $unit->makeClearingLabel("name", @args);

Since the call is from the "should never fail" category, on any errors it will confess. There is no need to check the result. The result can be saved in a variable or can be simply ignored. If you throw away the result, you won't be able to access that label from the Perl code but it won't be lost: it will still be referenced from the unit, until the unit gets cleared.

For a concrete usage example, here is how JoinTwo uses it:

$self->{clearingLabel} = $self->{unit}->makeClearingLabel(
  $self->{name} . ".clear", $self);

Passing $self as an argument makes sure that $self gets cleared.

Note how the clearing label doesn't have a row type. In reality every label does have a row type, it's just that it would be silly to abuse the random row types to create the clearing-only labels. Because of this, the clearing labels are created with a special empty row type that has no fields in it. If you ever want to use this row type for any other purposes, you can get it with the method

$rt = $unit->getEmptyRowType();

Each unit has its own copy of the empty row type, and although they are all matching and equal, they are not the same. However nothing stops you from mixing them up: a row type as such is not connected to a unit, it's just convenient to create an empty row type once in a unit and then reuse it when the unit creates the clearing-only labels.
