Sergey Babkin on CEP and stuff: January 2013

Wednesday, January 30, 2013

another approach to the weak references

I've recently developed another idea about the weak references. So, let's look at the basic problem:

There is an object A that has a reference to object B.

A->B

The object B needs to be able to reach back to the object A. But it can't have a reference because that would create a reference cycle. So instead it has to have some kind of weak reference. In the single-threaded situation it can be as simple as a plain pointer that gets reset to NULL when the object A is destroyed, like for example the Triceps Label has a pointer to its Unit.

There could be lots of objects B per object A. So when the object A gets destroyed, it needs to take care to clear the weak references in all the objects B. Which is kind of annoying. And it easily gets worse if the connection is not direct but multi-level, say A->B->C and then C having a back reference back to A.

However there is a way around it. Create a single object W that will keep a weak reference back to A. Then all the objects B can just keep a reference to W. And of course A would also keep a reference to W. When A is destroyed, it will reset the single weak reference in W.

A------------>W
\-->B1-->/
\-->B2-->/
\-->B3-->/

Then the logic for creating and resetting the object W can be placed into a base class and then reused all over the place. The object W could even hold the pieces of information that can be still useful after the object A goes away, such as the name of the object A for the error messages.
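To make the idea more concrete, here is a minimal single-threaded sketch of it. All the names in it (WeakHolder, Child, Owner) are hypothetical and made up for the illustration, and std::shared_ptr stands in for the Triceps reference counting; it's not the Triceps code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class Owner; // the object A

// The shared intermediate object W (hypothetical name).
struct WeakHolder {
    Owner *owner;      // plain pointer, reset to nullptr when A is destroyed
    std::string name;  // stays useful for the error messages after A is gone
};

// The object B: holds a counted reference to W, never to A.
struct Child {
    std::shared_ptr<WeakHolder> back;
    bool ownerAlive() const { return back->owner != nullptr; }
    const std::string &ownerName() const { return back->name; }
};

// The object A: holds the references to its B's and to the holder W.
class Owner {
public:
    explicit Owner(const std::string &name)
        : holder_(std::make_shared<WeakHolder>(WeakHolder{this, name})) {}
    ~Owner() { holder_->owner = nullptr; } // a single reset covers all the B's
    Owner(const Owner &) = delete;
    Owner &operator=(const Owner &) = delete;

    std::shared_ptr<Child> makeChild() {
        auto c = std::make_shared<Child>(Child{holder_});
        children_.push_back(c);
        return c;
    }
private:
    std::shared_ptr<WeakHolder> holder_;
    std::vector<std::shared_ptr<Child>> children_;
};

// Returns true if a child that outlives its owner sees the reset pointer
// and can still read the owner's name.
bool demoWeakHolder() {
    std::shared_ptr<Child> survivor;
    {
        Owner a("unit1");
        survivor = a.makeChild();
        if (!survivor->ownerAlive()) return false;
    }
    return !survivor->ownerAlive() && survivor->ownerName() == "unit1";
}
```

The destructor of the owner touches only the one holder, no matter how many children exist or how deep the chain to them goes.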

Tuesday, January 29, 2013

Table in C++

The Table is defined in table/Table.h. It inherits from Gadget, with the table's output label being the gadget's output label. Naturally, it's an Starget and usable from one thread only.

Its constructor is not public, and it's created from the TableType with its method makeTable():

Autoref<Table> t = tabType->makeTable(unit, Table::EM_CALL, "t");

The arguments are the unit where the table will belong, the enqueueing mode for its output label (this is a legacy argument and will go away soon), and the name of the table.

For the reference, that TableType method is:

Onceref<Table> makeTable(Unit *unit, Gadget::EnqMode emode, const string &name) const;

The table has a large number of methods, grouped into multiple subsets.

EnqMode getEnqMode() const;
const string &getName() const;
Unit *getUnit() const;
Label *getLabel() const;

These methods are inherited from the Gadget. The only special thing to remember is that getLabel() returns the table's output label. Technically, getName() has an overriding implementation in the Table, to return the table's name while its output label has the suffix ".out" appended to it.

const TableType *getType() const;
const RowType *getRowType() const;
const RowHandleType *getRhType() const;

Get back the type of the table, of its rows, and its row handles.

Label *getInputLabel() const;
Label *getPreLabel() const;
Label *getDumpLabel() const;
Label *getAggregatorLabel(const string &agname) const;

Get the assorted labels. The aggregator label getter takes the name of the aggregator (as was defined in the TableType) as an argument.

FnReturn *fnReturn() const;

Get the FnReturn of this table's outputs. It gets created and remembered on the first call, and the subsequent calls return the same object. It has a few labels with the fixed names: "out", "pre" and "dump", and a label for each aggregator with the aggregator's name. It could throw an Exception if you name an aggregator to conflict with one of the fixed labels, which you should not. The return's name will be "tableName.fret".

Next go the operations on the table (and of course the table may also be modified by sending rowops to its input label).

RowHandle *makeRowHandle(const Row *row) const;

Create a row handle for a row. Remember, the row handles are reference-counted, and also have the special kind of references with Rhref. So the returned pointer should be stored in an Rhref. The row handle created will not be inserted into the table yet.

bool insert(RowHandle *rh);

Insert a row handle into the table. This invokes all the row replacement policies along the way. If the handle is already in the table, does nothing and returns false. May also return false if a replacement policy refuses the row, but in practice there are no such refusing policies yet. Otherwise returns true.

It may throw an Exception. It may throw by itself if the row handle doesn't belong to this table or propagate the exception up: since the execution involves calling the output labels and such, an exception might be thrown from there.

bool insertRow(const Row *row);

The version that combines the row handle construction and insertion. Unlike Perl, in C++ this method is named differently instead of overloading. The comments about the replacement policies and return code, and about exceptions apply here too.

void remove(RowHandle *rh);

Remove a row handle from the table. If the handle is not in the table, silently does nothing. May throw an Exception.

bool deleteRow(const Row *row);

Find a matching row and delete it. Returns true if the row was found and removed, false if not found. May throw an Exception.

void clear(size_t limit = 0);

Clear the table by removing all the rows from it. The removed rows are sent as usual to the "pre" and "out" labels. If the limit is not 0, no more than that number of the rows will be removed. The rows are removed in the usual order of the first leaf index.

Next go the iteration methods. The rule of thumb is that for them a NULL row handle pointer means "end of iteration" or "not found" (or sometimes "bad arguments"). And they can handle the NULL row handles OK on the input, just returning NULL on the output.

RowHandle *begin() const;

Get the first row handle in the default order of the first leaf index. If the table is empty, returns NULL.

RowHandle *beginIdx(IndexType *ixt) const;

Get the first handle in the order of a particular index. The index type must belong to this table's type. For an incorrect index type it returns NULL (perhaps in the future this will be changed to an exception).

RowHandle *next(const RowHandle *cur) const;
RowHandle *nextIdx(IndexType *ixt, const RowHandle *cur) const;

Get the next row handle in the order of the default or specific index. Returns NULL after the last handle. It's safe to pass the current row handle as NULL: the result will be NULL, the same as on any other error.
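The NULL convention leads to a very simple iteration loop. The sketch below shows the pattern on toy stand-in classes (ToyTable and ToyHandle are hypothetical, not the Triceps classes), so that it can be compiled by itself; with the real Table the loop would have the same shape, going from begin() through next() until NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy stand-ins that follow the same convention as described above:
// a NULL handle means "end of iteration".
struct ToyHandle {
    int value;
    std::size_t pos; // position in the table's order
};

class ToyTable {
public:
    void insert(int value) {
        // std::deque keeps the element addresses stable on push_back
        rows_.push_back(ToyHandle{value, rows_.size()});
    }
    ToyHandle *begin() {
        return rows_.empty() ? nullptr : &rows_.front();
    }
    ToyHandle *next(const ToyHandle *cur) {
        // a NULL on the input quietly produces a NULL on the output
        if (cur == nullptr || cur->pos + 1 >= rows_.size())
            return nullptr;
        return &rows_[cur->pos + 1];
    }
private:
    std::deque<ToyHandle> rows_;
};

// The typical loop: start at begin(), stop when the handle becomes NULL.
int sumAll(ToyTable &t) {
    int sum = 0;
    for (ToyHandle *rh = t.begin(); rh != nullptr; rh = t.next(rh))
        sum += rh->value;
    return sum;
}
```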

RowHandle *firstOfGroupIdx(IndexType *ixt, const RowHandle *cur) const;
RowHandle *lastOfGroupIdx(IndexType *ixt, const RowHandle *cur) const;

Get the first or last row handle in the same group as the current row according to a non-leaf index. The NULL current handle will cause NULL to be returned. See the details in the description of the Perl API.

RowHandle *nextGroupIdx(IndexType *ixt, const RowHandle *cur) const;

Get the first row handle of the next group. The return will be NULL if the current group was the last one, or if the current handle is NULL.

Next go the size operations:

size_t size() const;

Get the number of rows currently in the table.

size_t groupSizeIdx(IndexType *ixt, const RowHandle *what) const;

Get the size of the group where the handle belongs according to a non-leaf index. If any arguments are wrong, returns 0. The row handle doesn't have to be in the table. If it isn't in the table, the method will find the group where the row would belong if it were inserted and return its current size.

size_t groupSizeRowIdx(IndexType *ixt, const Row *what) const;

A convenience version that makes a row handle from a row, finds the group size and disposes of the handle.

Next go the finding methods:

RowHandle *find(const RowHandle *what) const;
RowHandle *findIdx(IndexType *ixt, const RowHandle *what) const;

Find the handle of a matching row according to the default (first leaf) or the specific index, or return NULL if not found.

RowHandle *findRow(const Row *what) const;
RowHandle *findRowIdx(IndexType *ixt, const Row *what) const;

The convenience versions that create a temporary row handle and then perform the search.

Next goes the dump API that sends the whole contents of the table to the "dump" label, thus making any labels connected to it perform an implicit iteration over the table.

void dumpAll(Rowop::Opcode op = Rowop::OP_INSERT) const;
void dumpAllIdx(IndexType *ixt, Rowop::Opcode op = Rowop::OP_INSERT) const;

The dump can go in the order of the default or a specific index. The opcode argument is used for the rowops sent on the dump label. Using the argument index type of NULL makes dumpAllIdx() use the default index and work just like dumpAll(). In the future there probably will be methods that dump only a group of records.

As usual, the general logic of the methods matches the Perl API unless said otherwise. Please refer to the Perl API description for the details and examples.

Thursday, January 24, 2013

FnBinding in C++

FnBinding is defined in sched/FnBinding.h, and substantially matches the Perl version. It inherits from Starget, and can be used in only one thread.

Like many other classes, it has the constructor and the static make() function:

FnBinding(const string &name, FnReturn *fn);
static FnBinding *make(const string &name, FnReturn *fn);

The binding is constructed on a specific FnReturn and gets (references) the RowSetType from it. The FnReturn must be initialized before it can be used to create the bindings. It can be used with any matching FnReturn, not just the one it was constructed with.

It's generally constructed in a chain fashion:

Autoref<FnBinding> bind = FnBinding::make("bind", fn)
    ->addLabel("lb1", lb1, true)
    ->addLabel("lb2", lb2, false);

Each method in the chain returns the same FnBinding object. The method addLabel() adds one concrete label that gets connected to the FnReturn's label by name. The other chainable method is withTray() which switches the mode of collecting the resulting rowops in a tray rather than calling them immediately.

The errors encountered during the chained construction are remembered and can be read later with the method:

Erref getErrors() const;

You must check the binding for errors before using it. A binding with errors may not be used.

Or you can use the checkOrThrow() wrapper from common/Initialize.h to automatically convert any detected errors to an Exception:

Autoref<FnBinding> bind = checkOrThrow(FnBinding::make("bind", fn)
    ->addLabel("lb1", lb1, true)
    ->addLabel("lb2", lb2, false)
    ->withTray(true)
);

FnBinding *addLabel(const string &name, Autoref<Label> lb, bool autoclear);

Adds a label to the binding. The name must match a name from the FnReturn, and there may be only one label bound to a name (some names from the return may be left unbound). The label must have a type matching the named FnReturn's label. The autoclear flag enables the automatic clearing of the label (and also forgetting it in the Unit) when the binding gets destroyed. This allows creating and destroying the bindings dynamically as needed. So, basically, if you've created a label just for the binding, use autoclear==true. If you do a binding to a label that exists in the model by itself and can be used without the binding, use autoclear==false.

In principle, nothing stops you from adding more labels later (though you can neither remove nor replace the labels that are already added). Just make sure that their types match the expected ones.

Not all the available names have to get the labels added. Some (or all) may be left without labels. Any rowops coming to the undefined ones will be simply ignored.

The labels in the FnBinding may belong to a different Unit than the FnReturn. This allows using the FnReturn/FnBinding coupling to connect the units.

FnBinding *withTray(bool on);

Changes the tray collection mode: the true argument enables it, false disables it. Can be done at any time, not just at construction. Disabling the tray mode discards the current tray. If the tray mode is enabled, whenever the binding is pushed onto a return and the rowops come into it, the labels in this binding won't be called immediately. Instead they will adopt the incoming rowops, and the result will be queued into a tray, to be executed later.

Onceref<Tray> swapTray();

Used with the tray collection mode, normally after some rowops have been collected in the tray. Returns the current tray and replaces it in the binding with a new clean tray. You can call the returned tray afterwards. If the tray mode is not enabled, will return NULL, and won't create a new tray.

Tray *getTray() const;

Get the current tray. You can use and modify the tray contents in any usual way. If the tray mode is not enabled, will return NULL.

void callTray();

A convenience combination method that swaps the tray and calls it. This method is smart about the labels belonging to different units. Each rowop in the tray is called with its proper unit, that is found from the rowop's label. Mixing the labels of multiple units in one binding is probably still not such a great idea, but it works anyway.
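The interaction of withTray(), swapTray() and callTray() can be illustrated with a toy model. In this sketch a "rowop" is just an int and a "label" is a vector that records the calls; ToyBinding and its receive() method are hypothetical stand-ins, and unlike the real swapTray() the toy version returns a tray by value and doesn't model the NULL return with the tray mode disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ToyBinding {
public:
    typedef std::vector<int> Tray;

    // sink plays the role of the bound label: it records what gets called
    explicit ToyBinding(std::vector<int> *sink) : sink_(sink), trayOn_(false) {}

    void withTray(bool on) {
        trayOn_ = on;
        if (!on)
            tray_.clear(); // disabling the mode discards the current tray
    }
    // A rowop arriving through the return.
    void receive(int rowop) {
        if (trayOn_)
            tray_.push_back(rowop);  // collect for later
        else
            sink_->push_back(rowop); // call immediately
    }
    // Hand out the collected tray and leave a clean one in its place.
    Tray swapTray() {
        Tray out;
        out.swap(tray_);
        return out;
    }
    // Swap the tray and execute everything collected in it.
    void callTray() {
        Tray t = swapTray();
        for (std::size_t i = 0; i < t.size(); ++i)
            sink_->push_back(t[i]);
    }
private:
    std::vector<int> *sink_;
    bool trayOn_;
    Tray tray_;
};
```

The point of the swap is that the binding is immediately ready to collect the next batch while the caller decides what to do with the previous one.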

const string &getName() const;

Get back the binding's name.

RowSetType *getType() const;

Get the type of the binding. It will be the same row set type object as created in the FnReturn that was used to construct this FnBinding.

int size() const;

Get the number of labels in the row set type (of all available labels, not just the ones that have been added).

const RowSetType::NameVec &getLabelNames() const;
const RowSetType::RowTypeVec &getRowTypes() const;
const string *getLabelName(int idx) const;
RowType *getRowType(const string &name) const;
RowType *getRowType(int idx) const;

The convenience wrappers that translate to the same methods in the RowSetType.

Label *getLabel(const string &name) const;
int findLabel(const string &name) const;
Label *getLabel(int idx) const;

Methods similar to those of the FnReturn, which translate the names to indexes and get the labels by name or index. The return values are the same as well: the index -1 is returned for an unknown name, and a NULL label pointer is returned for an unknown name, an incorrect index, or an undefined label at a correct name or index.

typedef vector<Autoref<Label> > LabelVec;
const LabelVec &getLabels() const;

Return all the labels as a vector. This is an internal vector of the class, so only a const reference is returned. The elements for undefined labels will contain NULLs.

typedef vector<bool> BoolVec;
const BoolVec &getAutoclear() const;

Return the vector of the autoclear flags for the labels.

bool isAutoclear(const string &name) const;

Get the autoclear flag for a label by name. If the name is unknown, will quietly return false.

bool equals(const FnReturn *t) const;
bool match(const FnReturn *t) const;
bool equals(const FnBinding *t) const;
bool match(const FnBinding *t) const;

Similarly to the FnReturn, the convenience methods that compare the types between the FnReturns and FnBindings. They really translate to the same methods on the types of the returns or bindings.

Tuesday, January 1, 2013

Streaming function helper classes

A couple more helper classes are defined in sched/FnReturn.h.

ScopeFnBind does a scoped pushing and popping of a binding on an FnReturn. Its only method is the constructor:

ScopeFnBind(Onceref<FnReturn> ret, Onceref<FnBinding> binding);

It's used as:

{
    ScopeFnBind autobind(ret, binding);
    ...
}

It will pop the binding at the end of the block. An unpleasant feature is that if the return stack gets messed up, it will throw an Exception from a destructor, which is a big no-no in C++. However since normally in the C++ code the Triceps Exception is essentially an abort, this works well enough. If you make the Exception catchable, such as when calling the C++ code from an interpreter, you better make very sure that the stack can not get corrupted, or do not use ScopeFnBind.
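The scoped push and pop is the usual C++ scope guard pattern. Here is a minimal sketch of it on toy stand-ins (ToyReturn and ToyScopeBind are hypothetical names, with the bindings represented by plain strings); it only illustrates the pattern and the destructor caveat, not the actual ScopeFnBind implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a return with a stack of bindings.
class ToyReturn {
public:
    void push(const std::string &b) { stack_.push_back(b); }
    // Checked pop: the caller names the binding it expects on top.
    void pop(const std::string &b) {
        if (stack_.empty() || stack_.back() != b)
            throw std::runtime_error("return stack is corrupted");
        stack_.pop_back();
    }
    std::size_t stackSize() const { return stack_.size(); }
private:
    std::vector<std::string> stack_;
};

// Scoped guard: pushes in the constructor, pops in the destructor.
class ToyScopeBind {
public:
    ToyScopeBind(ToyReturn &ret, const std::string &b) : ret_(ret), b_(b) {
        ret_.push(b_);
    }
    // noexcept(false) lets a pop failure propagate out of the destructor;
    // if that happens during the stack unwinding, the program terminates.
    ~ToyScopeBind() noexcept(false) { ret_.pop(b_); }
    ToyScopeBind(const ToyScopeBind &) = delete;
    ToyScopeBind &operator=(const ToyScopeBind &) = delete;
private:
    ToyReturn &ret_;
    std::string b_;
};

// The binding is on the stack only while the guard is alive.
std::size_t depthInsideScope(ToyReturn &ret) {
    ToyScopeBind guard(ret, "binding1");
    return ret.stackSize();
}
```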

AutoFnBind is a further extension of the scoped binding. It does three additional things: It allows pushing multiple bindings on multiple returns as a group, popping them all on destruction. It's a reference-counted Starget object, which allows the scope to be more than one block. It also has a more controllable way of dealing with the exceptions. These last two properties allow using it from the Perl code, making the scope of a Perl block, not C++ block, and passing the exceptions properly back to Perl.

AutoFnBind();
static AutoFnBind *make();

The constructor just creates an empty object which then gets filled with bindings.

AutoFnBind *add(Onceref<FnReturn> ret, Autoref<FnBinding> binding);

Add a binding, in a chainable fashion. The simple-minded way of using the AutoFnBind is:

{
    Autoref<AutoFnBind> bind = AutoFnBind::make()
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    ...
}

However if any of these add()s throw an Exception, this will leave an orphaned AutoFnBind object, since the throwing would happen before it has a chance to do the reference-counting. So the safer way to use it is:

{
    Autoref<AutoFnBind> bind = new AutoFnBind;
    bind
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    ...
}

Then the AutoFnBind will be reference-counted first, and if an add() throws later, this will cause a controlled destruction of the Autoref and of AutoFnBind.

But it's not the end of the story yet. The throws on destruction are still a possibility. To catch them, use an explicit clearing:

void clear();

Pops all the bindings. If any Exceptions get thrown, they can get caught nicely. It tries to be real smart, going through all the bindings in the backwards order and popping each one of them. If a pop() throws an exception, its information will be collected but clear() will then continue going through the whole list. At the end of the run it will make sure that it doesn't have any references to anything any more, and then will re-throw any collected errors as a single Exception. This cleans up the things as much as possible and as much as can be handled, but the end result will still not be particularly clean: the returns that got their stacks corrupted will still have their stacks corrupted, and some very serious application-level cleaning will be needed to continue. Probably a better choice would be to destroy everything and restart from scratch. But at least it allows to get safely to this point of restarting from scratch.
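The logic of "keep popping, collect the errors, throw once at the end" can be sketched on toy stand-ins. ToyReturn and ToyAutoBind are hypothetical names, the bindings are represented by plain strings, and std::runtime_error stands in for the Triceps Exception; the real clear() may differ in the details.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a return with a stack of bindings.
class ToyReturn {
public:
    void push(const std::string &b) { stack_.push_back(b); }
    void pop(const std::string &b) {
        if (stack_.empty() || stack_.back() != b)
            throw std::runtime_error("unexpected binding on top of the stack");
        stack_.pop_back();
    }
    std::size_t stackSize() const { return stack_.size(); }
private:
    std::vector<std::string> stack_;
};

class ToyAutoBind {
public:
    void add(ToyReturn *ret, const std::string &b) {
        ret->push(b);
        pairs_.push_back(std::make_pair(ret, b));
    }
    // Pop everything in the backwards order; continue after a failure
    // and report all the failures together at the end.
    void clear() {
        std::string errors;
        for (std::size_t i = pairs_.size(); i-- > 0; ) {
            try {
                pairs_[i].first->pop(pairs_[i].second);
            } catch (const std::exception &e) {
                errors += e.what();
                errors += "\n";
            }
        }
        pairs_.clear(); // drop all the references before re-throwing
        if (!errors.empty())
            throw std::runtime_error(errors);
    }
private:
    std::vector<std::pair<ToyReturn *, std::string> > pairs_;
};

// Corrupts one stack on purpose and reports whether clear() still
// popped the healthy return before throwing.
bool demoClear() {
    ToyReturn r1, r2;
    ToyAutoBind bind;
    bind.add(&r1, "b1");
    bind.add(&r2, "b2");
    r2.push("stray"); // mess up the stack of r2
    bool thrown = false;
    try {
        bind.clear();
    } catch (const std::runtime_error &) {
        thrown = true;
    }
    // r1 got cleaned up; r2 stays corrupted, as described above
    return thrown && r1.stackSize() == 0 && r2.stackSize() == 2;
}
```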

So, the full correct sequence will be:

{
    Autoref<AutoFnBind> bind = new AutoFnBind;
    bind
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    ...
    bind->clear();
}

Or if any code in "..." can throw anything, then something like (not tested, so use with caution):

{
    Autoref<AutoFnBind> bind = new AutoFnBind;
    bind
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    try {
        ...
    } catch (Triceps::Exception &e) {
        try {
            bind->clear();
        } catch (Triceps::Exception &ee) {
            e.getErrors()->append("Unbinding errors triggered by the last error:", ee.getErrors());
        }
        throw;
    } catch (std::exception &e) {
        bind->clear();
        throw;
    }
}

It tries to be nice if the exception thrown from "..." was a Triceps one, and nicely adds any errors from the binding clearing to it.

Finally, a little about how the Perl AutoFnBind translates to the C++ AutoFnBind:

The Perl constructor creates the C++-level object and adds the bindings to it. If any of them throw, it destroys everything nicely and translates the Exception to Perl. Otherwise it saves a reference to the AutoFnBind in a wrapper object that gets returned to Perl.

The Perl destructor then first clears the AutoFnBind and catches any Exception. However there is just no way to return a Perl exception from a Perl destructor, so it just prints the error on stderr and calls exit(1). If no exception was thrown, the AutoFnBind gets destroyed nicely by removing the last reference.

For the nicer handling, there is a Perl-level method clear() that does the clearing and translates the exception to Perl.

FnReturn in C++

FnReturn, defined in sched/FnReturn.h, is generally constructed similarly to the RowSetType:

ret = initializeOrThrow(FnReturn::make(unit, name)
    ->addLabel("lb1", rt1)
    ->addFromLabel("lb2", lbX)
);

Or of course piecemeal. As it gets built, it actually builds a RowSetType inside itself.

FnReturn(Unit *unit, const string &name);
static FnReturn *make(Unit *unit, const string &name);

The constructor and convenience wrapper. The unit will be remembered only as a pointer, not reference, to avoid the reference loops. However this pointer will be used to construct the internal labels. So until the FnReturn is fully initialized, you better make sure that the Unit object has a reference and doesn't get freed. FnReturn is an Starget, and must be used in only one thread.

const string &getName() const;
Unit *getUnitPtr() const;
const string &getUnitName() const;

Get back the information from the constructor. Just like for the label, it reminds you that the Unit is only available as a pointer, not reference, here. The FnReturn also has a concept of clearing: it has the special labels inside, and once any of these labels gets cleared, the FnReturn is also cleared by setting the unit pointer to NULL and forgetting the FnContext (more on that one later). So after the FnReturn is cleared, getUnitPtr() will return NULL. And again similar to the Label, there is a convenience function to get the unit name for informational printouts. When FnReturn is cleared, it returns the same "[fn return cleared]".

FnReturn *addFromLabel(const string &lname, Autoref<Label> from);

Add a label to the return by chaining it off another label. lname is the name within the return. The full name of the label will be return_name.label_name. The label names within a return must be unique and not empty, or it will be returned as an initialization error. The label type will be copied (actually, referenced) from the from label, and the new label will be automatically chained off it. The labels can be added only until the return is initialized, or it will throw an Exception.

FnReturn *addLabel(const string &lname, const_Autoref<RowType> rtype);

Add a new independent label to the return. Works very similarly to addFromLabel(), only uses the explicit row type and doesn't chain to anything. The label can be later found with getLabel() and either chained off something or used to send the rows to it explicitly. The labels can be added only until the return is initialized.

class FnContext: public Starget
{
public:
    virtual ~FnContext();
    virtual void onPush(const FnReturn *fret) = 0;
    virtual void onPop(const FnReturn *fret) = 0;
};
FnReturn *setContext(Onceref<FnContext> ctx);

Set the context with handlers for the pushing and popping of the bindings in the FnReturn. Triceps generally tries to follow the C++ tradition of using the virtual methods for the callbacks, with the user then subclassing the base class and replacing the callback methods. However subclassing FnReturn is extremely inconvenient, because it gets connected to the other objects in a quite complicated way. So the solution is to make a separate context class for the callbacks, and then connect it. By the way, FnContext is not a nested class of FnReturn but a separate top-level class. I.e. NOT Triceps::FnReturn::FnContext but Triceps::FnContext. The callbacks will be called just before the binding is pushed or popped, but after the check for the correctness of the push or pop. They can be used to adjust the state of the streaming function by pushing or popping its stack of local variables, like was shown in the Perl examples. The context can be set only until the return is initialized.

template<class C> C *contextIn() const;

Get back the context. Since the context will be a subclass of FnContext, this also handles the correct type casting. Use it like:

Autoref<MyFnCtx> ctx = fret1->contextIn<MyFnCtx>();

The type is converted using the static_cast, and you need to know the correct type in advance, or your program will break in some horrible ways. If the context has not been set, it will return a NULL.
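The way the separate context class and the casting getter fit together can be shown on a toy model. In this sketch ToyContext, ToyReturn and DepthContext are hypothetical stand-ins, std::shared_ptr replaces the Triceps reference counting, and the callbacks take no arguments for brevity.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for the callback interface.
class ToyContext {
public:
    virtual ~ToyContext() {}
    virtual void onPush() = 0;
    virtual void onPop() = 0;
};

// Toy stand-in for the return: it only keeps the context and calls it.
class ToyReturn {
public:
    void setContext(std::shared_ptr<ToyContext> ctx) { ctx_ = ctx; }
    // Cast the stored context to the subclass named by the caller.
    // static_cast does no run-time check, so C must be the real type.
    // With no context set, the result is a null pointer.
    template <class C> C *contextIn() const {
        return static_cast<C *>(ctx_.get());
    }
    void push() { if (ctx_) ctx_->onPush(); }
    void pop() { if (ctx_) ctx_->onPop(); }
private:
    std::shared_ptr<ToyContext> ctx_;
};

// A user-defined context that tracks the nesting depth of the calls.
class DepthContext : public ToyContext {
public:
    int depth = 0;
    void onPush() override { ++depth; }
    void onPop() override { --depth; }
};

// Two pushes and one pop leave the depth at 1.
int demoContext() {
    ToyReturn ret;
    ret.setContext(std::make_shared<DepthContext>());
    ret.push();
    ret.push();
    ret.pop();
    return ret.contextIn<DepthContext>()->depth;
}
```

The user code subclasses only the small context class, and the return itself stays untouched.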

void initialize();

Initialize the FnReturn. Very similar to the Type classes, it will collect the errors in an Errors object that has to be checked afterwards, and an FnReturn with errors must not be used. The initialization can be called repeatedly with no ill effects. After initialization the structure of the return (labels and context) can not be changed any more.

Erref getErrors() const;

Get the errors detected. Normally called after initialization but can also be called at any stage, as the errors are collected all the way through the object construction.

bool isInitialized() const;

Check whether the return is initialized.

RowSetType *getType() const;

Get the type of the return, which gets built internally by the return. The names of the row types in the set will be the same as the names of labels in the return, and their order will also be the same. This call can be made only after initialization, or it will throw an Exception.

int size() const;

Get the number of labels in the return. Can be called at any time.

const RowSetType::NameVec &getLabelNames() const;
const RowSetType::RowTypeVec &getRowTypes() const;
const string *getLabelName(int idx) const;
RowType *getRowType(const string &name) const;
RowType *getRowType(int idx) const;

Get the piecemeal information about the label names and types. These are really the convenience wrappers around the RowSetType. Note that they return pointers to be able to return NULL on the argument that is out of range. A somewhat special feature is that even though the row set type can be read only after initialization (after it becomes frozen and can not be messed with any more), these wrappers work at any time, even when the return is being built.

bool equals(const FnReturn *t) const;
bool match(const FnReturn *t) const;
bool equals(const FnBinding *t) const;
bool match(const FnBinding *t) const;

Convenience wrappers that compare the equality or match of the underlying row set types.

Label *getLabel(const string &name) const;
int findLabel(const string &name) const;
Label *getLabel(int idx) const;

Get the label by name or index, or the index of the label by name. Return a NULL pointer or -1 index on an invalid argument.

typedef vector<Autoref<RetLabel> > ReturnVec;
const ReturnVec &getLabels() const;

Get the whole set of labels. FnReturn::RetLabel is a special private label type with undisclosed internals. You need to treat these labels as being a plain Label.

void push(Onceref<FnBinding> bind);

Push a binding on the return stack. The return must be initialized, and the binding must be of a matching type, or an Exception will be thrown. The reference to the binding will be kept until it's popped.

void pushUnchecked(Onceref<FnBinding> bind);

Similar to push(), only the type of the binding is not checked. This is an optimization for the automatically generated code that does all the type checks up front at the generation time. The manually written code probably should not be using it.

void pop(Onceref<FnBinding> bind);

Pop a binding from the return stack. The binding argument specifies which binding is expected to be popped. It's not strictly necessary but allows catching any mess-ups with the return stack early. If the stack is empty or the top binding is not the same as the argument, throws an Exception.

void pop();

The unchecked version. It still checks and throws if the stack is empty. This method may come in handy occasionally, but in general the checked version should be preferred. Pretty much the only reason to use it would be if you try to recover after a major error and want to pop everything from all your FnReturns until their stacks become empty. But there is much trouble with this kind of recovery.

int bindingStackSize() const;

Get the size of the return stack (AKA the stack of bindings). Useful for debugging.

typedef vector<Autoref<FnBinding> > BindingVec;
const BindingVec &bindingStack() const;

Get the current return stack. Useful for debugging.

More about Label construction

I forgot to tell it explicitly in the post on the Labels in C++, but the label construction in C++ works just the same as in Perl: it makes the Unit remember the label right away. So there is no need to call Unit::rememberLabel() manually. On the other hand, if you do not want the label to be remembered by its unit (why?), the only way to achieve that is to call Unit::forgetLabel() after its construction.
