RowSetType, defined in types/RowSetType.h, is another item that is not visible in Perl. Maybe it will be in the future, but at the moment things look good enough without it. It was added in 1.1.0 and expresses the type (the "return type", to be precise) of a streaming function (the FnReturn and FnBinding classes). Naturally, it's a sequence of row types, and despite the word "set", the order matters.
A RowSetType is one of those objects that get assembled from many parts and then initialized, like this:
Autoref<RowSetType> rst = initializeOrThrow(RowSetType::make()
->addRow("name1", rt1)
->addRow("name2", rt2)
);
The function (or, more exactly, template) initializeOrThrow() is also a new addition, which I'll describe in detail later.
Of course, nothing stops you from adding the row types one by one, in a loop or in some other way, and then calling initialize() manually. And yes, you can keep a reference to a row set type as soon as it has been constructed, without waiting for the initialization. So you could instead do:
Autoref<RowSetType> rst = new RowSetType();
rst->addRow("name1", rt1);
rst->addRow("name2", rt2);
rst->initialize();
if (rst->getErrors()->hasError()) {
...
}
You could use the initializeOrThrow() template here as well; I just also wanted to show how to handle the errors manually. And you can use new and make() interchangeably too.
All that the initialization does is fixate the row set, forbidding the addition of any more row types to it. That kind of makes sense at the moment, but I'm not so sure about the future: dynamically expandable row sets might come in useful. We'll see when we get there.
RowSetType();
static RowSetType *make();
Construct a row set type. The method make() is just a wrapper around the constructor that is more convenient to use with a following ->addRow(), because of the way operator precedence works in C++. Like any other type, RowSetType is unnamed by itself and takes no constructor arguments. It is also an Mtarget, so it can be shared between multiple threads after it has been initialized.
RowSetType *addRow(const string &rname, const_Autoref<RowType> rtype);
Add a row type to the set. All the row types are named, and all the names must be unique within the set. The order of addition matters too; the reason is explained in the description of FnReturn. If this method detects an error (such as a duplicate name), it appends the error to the internal Errors object, which can be read later with getErrors(). A type with errors must not be used.
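To illustrate the operator precedence point: as far as I can tell, `new RowSetType()->addRow(...)` doesn't parse, and would have to be written as `(new RowSetType())->addRow(...)`, while a static make() returns a pointer that can be chained directly. Below is a minimal standalone sketch of that builder pattern; MiniRowSet is a hypothetical class, not Triceps code, and the row types are stood in for by plain ints.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical standalone class, not Triceps code. A static make() lets the
// caller write make()->addRow(...)->addRow(...) without extra parentheses
// around a new-expression.
class MiniRowSet {
public:
    static MiniRowSet *make() { return new MiniRowSet(); }

    // Returns `this` so that calls can be chained, like RowSetType::addRow().
    MiniRowSet *addRow(const std::string &name, int rowTypeId) {
        rows_.emplace_back(name, rowTypeId);
        return this;
    }

    const std::vector<std::pair<std::string, int>> &rows() const { return rows_; }

private:
    std::vector<std::pair<std::string, int>> rows_;  // kept in the order of addition
};
```

In the real API the result goes into an Autoref. In this sketch the caller owns the raw pointer.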
The row types may not be added after the row set type has been initialized.
void initialize();
Initialize the type. Any detected errors can be read afterwards with getErrors(). Repeated calls of initialize() are ignored.
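The build rules above (unique names, errors collected rather than thrown, no additions after initialization, repeated initialize() ignored) can be modeled in a few lines. This is a standalone sketch with a hypothetical RowSetModel class, not the Triceps implementation. The error texts are made up, and recording a late addRow() as an error is an assumption of the sketch: the real class may handle it differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical standalone model of the RowSetType build rules, not Triceps code.
class RowSetModel {
public:
    RowSetModel *addRow(const std::string &name) {
        if (initialized_) {
            // Assumption of this sketch: a late addition is recorded as an error.
            errors_.push_back("can not add row '" + name + "' after initialization");
            return this;
        }
        for (const std::string &n : names_) {
            if (n == name) {
                errors_.push_back("duplicate row name '" + name + "'");
                return this;
            }
        }
        names_.push_back(name);
        return this;
    }

    // Fixates the set; repeated calls have no further effect.
    void initialize() { initialized_ = true; }
    bool isInitialized() const { return initialized_; }

    bool hasError() const { return !errors_.empty(); }
    const std::vector<std::string> &errors() const { return errors_; }
    const std::vector<std::string> &names() const { return names_; }

private:
    bool initialized_ = false;
    std::vector<std::string> names_;   // in the order of addition
    std::vector<std::string> errors_;  // in the order of detection
};
```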
bool isInitialized() const;
Check whether the type has been initialized.
typedef vector<string> NameVec;
const NameVec &getRowNames() const;
typedef vector<Autoref<RowType> > RowTypeVec;
const RowTypeVec &getRowTypes() const;
Read back the contents of the type. The elements will go in the order they were added.
int size() const;
Read the number of row types in the set.
int findName(const string &name) const;
Translate the row type name to an index (i.e. the position in the order of addition, starting from 0). Returns -1 on an invalid name.
RowType *getRowType(const string &name) const;
Find the type by name. Returns NULL on an invalid name.
const string *getRowTypeName(int idx) const;
RowType *getRowType(int idx) const;
Read the data by index. These methods check that the index is in the valid range, and otherwise return NULL.
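A minimal standalone sketch of the lookup contract described above: findName() returns -1 for an unknown name, and the index-based getters check the range and return NULL (nullptr here) when it's out of bounds. RowSetLookup is hypothetical, and the row types are represented by plain ints.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical standalone model of the RowSetType lookups, not Triceps code.
struct RowSetLookup {
    std::vector<std::string> names;  // in the order of addition
    std::vector<int> types;          // parallel to names

    void add(const std::string &name, int type) {
        names.push_back(name);
        types.push_back(type);
    }

    int size() const { return static_cast<int>(names.size()); }

    // Index of the name, or -1 if there is no such name.
    int findName(const std::string &name) const {
        for (int i = 0; i < size(); ++i)
            if (names[i] == name)
                return i;
        return -1;
    }

    const std::string *getRowTypeName(int idx) const {
        if (idx < 0 || idx >= size())
            return nullptr;  // out of range
        return &names[idx];
    }

    const int *getRowType(int idx) const {
        if (idx < 0 || idx >= size())
            return nullptr;  // out of range
        return &types[idx];
    }

    // The lookup by name reuses the range check: findName() gives -1 on a miss.
    const int *getRowType(const std::string &name) const {
        return getRowType(findName(name));
    }
};
```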
The usual methods inherited from Type also work: getErrors(), equals(), match(), printTo().
Two row set types are considered equal if they contain equal row types with equal names, in the same order. They are considered matching if they contain matching row types in the same order, with any names. If the match condition seems surprising, think of it as "nothing will break if one type is substituted for the other at execution time".
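The difference between equality and matching can be sketched with a simplified standalone model. Here a row type is just a list of (field name, field type) pairs. The assumption of this sketch is that two row types match when their field types agree in order regardless of the field names; the real RowType rules may have more detail. The names in the row set are compared for equality and ignored for matching.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified standalone model, not Triceps code.
using Field = std::pair<std::string, std::string>;  // (field name, field type)
using RowTypeModel = std::vector<Field>;
using NamedRowSet = std::vector<std::pair<std::string, RowTypeModel>>;

bool rowEquals(const RowTypeModel &a, const RowTypeModel &b) { return a == b; }

// Assumed row-type matching: same field types in the same order, any field names.
bool rowMatch(const RowTypeModel &a, const RowTypeModel &b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i].second != b[i].second)
            return false;
    return true;
}

// Equal: same names, equal row types, same order.
bool setEquals(const NamedRowSet &a, const NamedRowSet &b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i].first != b[i].first || !rowEquals(a[i].second, b[i].second))
            return false;
    return true;
}

// Match: matching row types in the same order; the names are ignored.
bool setMatch(const NamedRowSet &a, const NamedRowSet &b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (!rowMatch(a[i].second, b[i].second))
            return false;
    return true;
}
```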
void addError(const string &msg);
Erref appendErrors();
These methods add extra errors to the type's Errors object. They exist for the convenience of the users of this type: since there is already one Errors object, it might as well be used for everything, and that also keeps all the errors reported in the order of the fields, rather than first all the errors from the type and then all the errors from its user. FnReturn and FnBinding use them.
This started as my thoughts on the field of Complex Event Processing, mostly about my OpenSource project Triceps. But now it's about all kinds of software-related things.
Sunday, December 30, 2012
FrameMark in C++
The FrameMark (defined in sched/FrameMark.h) marks the unit's frame at the start of a loop, so that the rowops for the next iterations of the loop can be forked there. It's pretty simple:
FrameMark(const string &name);
The constructor that gives the mark a name. A FrameMark is an Starget, so it's reference-counted and may be used only in one thread.
const string &getName() const;
Read back the name.
Unit *getUnit() const;
This method is different from getUnit() on most of the other classes. It returns the pointer to the unit on which the mark has been set; a freshly created FrameMark returns NULL. Internally a FrameMark doesn't keep a reference to the unit, just a pointer, which gives the Unit a way to check in loopAt() that the mark has indeed been set on this unit. You can use it for entertainment purposes too. Normally, when the frame marked with this mark gets popped from the Unit's stack, the mark becomes unset, and its getUnit() returns NULL.
All the actions on the FrameMark are done by passing it to the appropriate methods of the Unit. When a mark is set on a frame, the frame has a reference to it, so the mark won't be destroyed until the frame is freed.
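The ownership arrangement described above can be sketched as a small standalone model. The mark holds only a raw pointer to the unit, the frame holds a shared reference to the mark, and popping the frame unsets the mark. All the names here (MarkModel, FrameModel, UnitModel, setMark, pushFrame, popFrame) are hypothetical, and std::shared_ptr stands in for the Triceps Autoref.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical standalone model of the FrameMark bookkeeping, not Triceps code.
struct UnitModel;

struct MarkModel {
    explicit MarkModel(const std::string &name) : name(name) {}
    std::string name;
    UnitModel *unit = nullptr;  // a plain pointer, not a reference to the unit
    UnitModel *getUnit() const { return unit; }
};

struct FrameModel {
    std::shared_ptr<MarkModel> mark;  // the frame keeps the mark alive
};

struct UnitModel {
    std::vector<FrameModel> stack;

    void pushFrame() { stack.push_back(FrameModel()); }

    // Set the mark on the current (top) frame.
    void setMark(const std::shared_ptr<MarkModel> &m) {
        stack.back().mark = m;
        m->unit = this;
    }

    // Popping the marked frame unsets the mark.
    void popFrame() {
        if (stack.back().mark)
            stack.back().mark->unit = nullptr;
        stack.pop_back();
    }
};
```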
Saturday, December 29, 2012
AggregatorGadget
AggregatorGadget is a fairly internal class, but I'll describe it as well while I'm at it. Each aggregator in a table has its own gadget, and that's what this class is. It carries some extra information: the aggregator type, the table, and the index type.
The grand plan was that different aggregator types might define their own subclasses of AggregatorGadget, but in reality there appears to be no need. So far all the aggregators happily live with the base AggregatorGadget.
AggregatorGadget(const AggregatorType *type, Table *table, IndexType *intype);
The type of the aggregator and the index type on which this particular aggregator is defined are kept as references in the AggregatorGadget. The table is remembered as a plain pointer (as usual, to avoid circular references, since the Table references all its AggregatorGadgets).
Table *getTable() const;
const AggregatorType* getType() const;
Get back the information. By now I'm not sure why there is no method to get back the index type. It looks like nothing needs it, so the index type reference in the gadget is entirely superfluous. Potential subclasses may read it from the field indexType_.
The normal way to use the AggregatorGadget is to call its method sendDelayed(). It's called by other classes, not by the subclasses, so it's made public. On the other hand, the method send() must never be used with the AggregatorGadget, so it's made private (yes, I know that if you really want, you can use the superclass method, but just don't: the idea here is to guard against accidental misuse, not against malicious misuse).
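One way a subclass can change the access of inherited methods is with using-declarations, which is presumably how the public sendDelayed() and private send() are arranged. That is an assumption, since the header isn't shown here. The classes below (BaseGadget, AggGadget) are hypothetical standalone stand-ins that use strings for rows and a vector for the tray.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical standalone classes, not Triceps code.
class BaseGadget {
public:
    std::vector<std::string> sent;  // records what send() did, for the demo

protected:
    // Both senders are protected in the base, as in Gadget.
    void send(const std::string &row) { sent.push_back(row); }
    void sendDelayed(std::vector<std::string> &tray, const std::string &row) const {
        tray.push_back(row);
    }
};

class AggGadget : public BaseGadget {
public:
    using BaseGadget::sendDelayed;  // re-exported as public for other classes

private:
    using BaseGadget::send;         // hidden to guard against accidental use
};
```

With this, `g.sendDelayed(tray, row)` compiles for any caller, while `g.send(row)` on an AggGadget is a compile-time access error.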
Gadget in C++
The Gadget is unique to the C++ API; it has no parallel in Perl. Gadget is a base class defined in sched/Gadget.h, representing something with an output label. The details of what this something is are determined by the subclass. Presumably it also has some kind of inputs, but that's up to the subclass; the Gadget itself defines only the output label. To give a concrete example, a table is a gadget, and every aggregator in the table is also a gadget. However, the "pre" and "dump" labels of the table are not gadgets; they're just extra labels strapped on the side.
Some of the reasons for creating the Gadget are purely historic by now. At some point it seemed important to be able to associate a particular enqueueing mode with each output label. Most tables would use EM_CALL, but some (the ones in a loop) would use EM_FORK, and those that don't need to produce streaming output would use EM_IGNORE. This approach didn't work out as well as it seemed at first, and is now outright deprecated: just use EM_CALL everywhere, since there are newer and better ways to handle loops. The whole Gadget thing should be redesigned at some point, but for now I'll just describe it as it is.
As a result of that history, the enqueueing mode constants are defined in the Gadget class, in enum EnqMode: EM_SCHEDULE, EM_FORK, EM_CALL, EM_IGNORE.
static const char *emString(int enval, const char *def = "???");
static int stringEm(const char *str);
Convert from the enqueueing mode constant to string, and back.
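A table-driven sketch of this kind of enum/string conversion, following the two signatures above. The exact strings produced by Gadget::emString() and the value returned by stringEm() for an unknown string are not stated here. This sketch assumes the constant names as the strings and -1 for an unknown string.

```cpp
#include <cassert>
#include <cstring>

// Standalone sketch, not the Triceps implementation. The string spellings and
// the -1 for an unknown string are assumptions.
enum EnqMode { EM_SCHEDULE, EM_FORK, EM_CALL, EM_IGNORE };

struct EmName {
    int val;
    const char *name;
};

static const EmName emNames[] = {
    { EM_SCHEDULE, "EM_SCHEDULE" },
    { EM_FORK, "EM_FORK" },
    { EM_CALL, "EM_CALL" },
    { EM_IGNORE, "EM_IGNORE" },
};

// Constant to string; an unknown value returns the caller-supplied default.
const char *emString(int enval, const char *def = "???") {
    for (const EmName &e : emNames)
        if (e.val == enval)
            return e.name;
    return def;
}

// String to constant; an unknown string returns -1.
int stringEm(const char *str) {
    for (const EmName &e : emNames)
        if (std::strcmp(e.name, str) == 0)
            return e.val;
    return -1;
}
```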
Gadget(Unit *unit, EnqMode mode, const string &name = "", const_Onceref<RowType> rt = (const RowType*)NULL);
The Gadget constructor is protected, since Gadget is intended to be used only as a base class and never instantiated directly. The name and row type can be left undefined if they aren't known yet, and initialized later. The output label won't be created until the row type is known, and you'd better have set the name by that time too. The enqueueing mode may also be changed later, so initially it can be set to anything. All this is intended only to split the initialization in a more convenient way: once the Gadget components are set, they must not be changed any more.
The output label of the Gadget is a DummyLabel, and it shares its name with the Gadget. So if you want to distinguish that label with a suffix in its name, you have to give the suffixed name to the whole Gadget. For example, the Table constructor does:
Gadget(unit, emode, name + ".out", rowt),
A Gadget keeps a reference to both its output label and its unit. This means that the unit won't disappear from under a Gadget, but to avoid circular references, the Unit must not have references to the Gadgets (having references to their output labels is fine).
void setEnqMode(EnqMode mode);
void setName(const string &name);
void setRowType(const_Onceref<RowType> rt);
The protected methods to finish the initialization. Once the values are set, they must not be changed any more. Calling setRowType() creates the output label, and since the name of the output label is taken from the Gadget, you need to set the name before you set the row type.
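The ordering constraint (set the name, then the row type, because setting the row type creates the label under the gadget's current name) can be sketched as a standalone model. GadgetModel, LabelModel and TableLike are hypothetical, and the row type is just a string. Unlike the real Table constructor, which passes the suffixed name straight to the Gadget constructor, this sketch goes through the deferred setters to show the order.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical standalone model of the Gadget's two-step initialization.
struct LabelModel {
    std::string name;
    std::string rowType;
};

class GadgetModel {
public:
    // NULL until the row type has been set.
    const LabelModel *getLabel() const { return label_.get(); }
    const std::string &getName() const { return name_; }

protected:
    explicit GadgetModel(const std::string &name = "") : name_(name) {}
    void setName(const std::string &name) { name_ = name; }
    // Creates the output label, which takes the gadget's current name.
    void setRowType(const std::string &rt) { label_.reset(new LabelModel{name_, rt}); }

private:
    std::string name_;
    std::unique_ptr<LabelModel> label_;
};

class TableLike : public GadgetModel {
public:
    explicit TableLike(const std::string &name) : GadgetModel() {
        setName(name + ".out");  // the name must be set first...
        setRowType("rt1");       // ...because this creates the label
    }
};
```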
EnqMode getEnqMode() const;
const string &getName() const;
Unit *getUnit() const;
Label *getLabel() const;
Get back the gadget's information. The label will be returned only after it's initialized (i.e. the row type is known), before then getLabel() would return NULL. And yes, it's getLabel(), NOT getOutputLabel().
The rest of the methods are for the convenience of sending rows to the output label. They are protected, since they are intended for the Gadget subclasses (which in turn may decide to make them public).
void send(const Row *row, Rowop::Opcode opcode) const;
Construct a Rowop from the given row and opcode, and enqueue it to the output label according to the gadget's enqueueing method. This is the most typical use.
void sendDelayed(Tray *dest, const Row *row, Rowop::Opcode opcode) const;
Create a Rowop and put it into the dest tray. The rowop will have its enqueueing mode populated according to the Gadget's setting. This method is used when the whole set of rowops needs to be generated before any of them can be enqueued, such as when a Table computes its aggregators. After the delayed tray is fully generated, it can be enqueued with Unit::enqueueDelayedTray(), which will consult each rowop's enqueueing mode and process it accordingly. Again, this stuff exists for historic reasons and will likely be removed soon.
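The delayed-tray flow can be sketched as a standalone model. Each rowop remembers the enqueueing mode of the gadget that produced it, and a later pass dispatches each one by its own mode. The names here (RowopModel, UnitLog, GadgetModel) are hypothetical, and the "dispatch" just records which path each row took instead of actually calling, forking or scheduling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical standalone model of sendDelayed() plus enqueueDelayedTray().
enum EnqMode { EM_SCHEDULE, EM_FORK, EM_CALL, EM_IGNORE };

struct RowopModel {
    std::string row;
    EnqMode mode;  // copied from the producing gadget
};

struct UnitLog {
    std::vector<std::string> called, forked, scheduled;

    // Dispatch every rowop according to its own enqueueing mode.
    void enqueueDelayedTray(const std::vector<RowopModel> &tray) {
        for (const RowopModel &op : tray) {
            switch (op.mode) {
            case EM_CALL:     called.push_back(op.row); break;
            case EM_FORK:     forked.push_back(op.row); break;
            case EM_SCHEDULE: scheduled.push_back(op.row); break;
            case EM_IGNORE:   break;  // dropped
            }
        }
    }
};

struct GadgetModel {
    EnqMode mode;

    // Build the rowop now; it gets dispatched later.
    void sendDelayed(std::vector<RowopModel> &dest, const std::string &row) const {
        dest.push_back(RowopModel{row, mode});
    }
};
```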
Thursday, December 27, 2012
Label in C++
In C++, custom labels are created by defining your own class that inherits from Label (in sched/Label.h). The subclass needs to define its own execution method:
virtual void execute(Rowop *arg) const;
The base class takes care of all the general execution mechanics, chaining etc. All you need to do in this method is perform your user-defined actions. By the way, this method is protected and should never be called directly. The labels must always be called through a unit, which will then execute them in the correct way.
It may (though doesn't have to) also define the custom clearing method:
virtual void clearSubclass();
Currently this method is called by clear() after the label is marked as cleared but before clearing of the chain, though this order may change in the future.
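The subclassing contract can be sketched as a standalone model: execute() is a protected pure virtual that only the unit invokes, and clear() marks the label as cleared before calling the optional clearSubclass() hook. LabelModel, UnitModel and CollectLabel are hypothetical stand-ins, with a string standing in for the Rowop. Skipping cleared labels in call() is this sketch's reading of "the label stops working".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical standalone model of the Label subclassing contract.
class UnitModel;

class LabelModel {
public:
    virtual ~LabelModel() {}

    void clear() {
        cleared_ = true;  // marked as cleared first...
        clearSubclass();  // ...then the subclass hook runs
    }
    bool isCleared() const { return cleared_; }

protected:
    virtual void execute(const std::string &row) const = 0;  // user-defined action
    virtual void clearSubclass() {}                          // optional hook

private:
    bool cleared_ = false;
    friend class UnitModel;  // only the unit calls execute()
};

class UnitModel {
public:
    void call(const LabelModel &lab, const std::string &row) {
        if (!lab.isCleared())
            lab.execute(row);
    }
};

// A user-defined label that collects the rows it receives.
class CollectLabel : public LabelModel {
public:
    mutable std::vector<std::string> seen;  // mutable because execute() is const
    bool hookRan = false;

protected:
    void execute(const std::string &row) const override { seen.push_back(row); }
    void clearSubclass() override { hookRan = true; }
};
```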
Now, the rest of the methods:
Label(Unit *unit, const_Onceref<RowType> rtype, const string &name);
The base class constructor. A Label is always constructed from a subclass; you cannot instantiate the base Label class because it contains the abstract method execute(). The name argument used to be optional (and if you really want, you still may pass an empty string explicitly), but unnamed labels are very difficult to make sense of later.
The constructed label keeps a reference to its row type, and a pointer (not reference, to avoid the circular references!) to the Unit.
The information from the constructor can be read back:
const string &getName() const;
const RowType *getType() const;
Unit *getUnitPtr() const;
The method getUnitPtr() is named this way and not getUnit() to emphasize that the Label has only a pointer to the Unit, not a reference. After the label gets cleared, getUnitPtr() returns NULL. The reason is that after clearing, the label no longer knows whether the unit still exists or has been deleted, and doesn't want to return a pointer to potentially freed memory.
const string &getUnitName() const;
A convenience method for the special case of getting the name of the label's unit. It's used in many error messages. You can't just say label->getUnitPtr()->getName() because getUnitPtr() might return NULL. getUnitName() takes care of it and returns the special string "[label cleared]" if the label has been cleared.
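A minimal standalone sketch of why getUnitName() is safer than getUnitPtr()->getName(). After clear(), the model drops its raw unit pointer, and getUnitName() falls back to a fixed placeholder. LabelModel and UnitModel are hypothetical, and the placeholder string is the one quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone model, not Triceps code.
struct UnitModel {
    std::string name;
    const std::string &getName() const { return name; }
};

class LabelModel {
public:
    explicit LabelModel(UnitModel *unit) : unit_(unit) {}

    UnitModel *getUnitPtr() const { return unit_; }

    // Safe even after clear(): never dereferences a NULL unit pointer.
    const std::string &getUnitName() const {
        static const std::string cleared = "[label cleared]";
        return unit_ ? unit_->getName() : cleared;
    }

    void clear() { unit_ = nullptr; }

private:
    UnitModel *unit_;  // a pointer, not a reference
};
```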
void clear();
Clears the label. After that the label stops working. Note that clearing a label doesn't dissociate it from its unit: the label won't tell you its unit any more, but the unit will still have a reference to the label! Use the unit's method forgetLabel() to dissociate them (but that won't clear the label by itself, so you have to call both unit->forgetLabel() and label->clear()). Of course, calling unit->clearLabels() takes care of everything.
Clearing empties the chaining list of this label but doesn't recursively call clear() on the formerly chained labels. If you need that, you have to do it yourself.
bool isCleared() const;
Check if the label is cleared.
void setNonReentrant();
bool isNonReentrant() const;
Mark the label as non-reentrant, and check this flag. There is no way to unset this flag. The meaning of it has been described at length before.
Erref chain(Onceref<Label> lab);
Chain another label to this one (so when this label is executed, the chained labels will also be executed, in order). This label keeps a reference to the chained label. Circular chainings are forbidden and will throw an Exception.
typedef vector<Autoref<Label> > ChainedVec;
const ChainedVec &getChain() const;
Get back the information about the chained labels. This returns a reference to the internal vector, so if the chainings are changed afterwards, the changes will be visible in the vector.
bool hasChained() const;
A quick check, whether there is anything chained.
void clearChained();
Clear the chaining list of this label. (But doesn't call clear() on these labels!)
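The chaining rules can be sketched as a standalone model. Each label keeps shared references to its chained labels, running a label runs it and then its chain in order, and chain() refuses to create a cycle. ChainLabel is hypothetical. std::runtime_error stands in for the Triceps Exception, and the sketch returns nothing instead of an Erref.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical standalone model of label chaining, not Triceps code.
class ChainLabel {
public:
    explicit ChainLabel(const std::string &name) : name_(name) {}

    // Refuse a chaining that would make this label reachable from itself.
    void chain(const std::shared_ptr<ChainLabel> &lab) {
        if (lab.get() == this || lab->reaches(this))
            throw std::runtime_error("circular chaining of '" + lab->name_ + "' to '" + name_ + "'");
        chain_.push_back(lab);
    }

    bool hasChained() const { return !chain_.empty(); }
    void clearChained() { chain_.clear(); }  // doesn't clear the labels themselves

    // Run this label, then the chained ones in order (depth-first).
    void run(std::vector<std::string> &trace) const {
        trace.push_back(name_);
        for (const auto &l : chain_)
            l->run(trace);
    }

private:
    // Is `target` reachable through this label's chain?
    bool reaches(const ChainLabel *target) const {
        for (const auto &l : chain_)
            if (l.get() == target || l->reaches(target))
                return true;
        return false;
    }

    std::string name_;
    std::vector<std::shared_ptr<ChainLabel>> chain_;
};
```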
Rowop *adopt(Rowop *from) const;
A convenient factory method for adopting rowops. Treat it as a constructor: the returned Rowop is newly constructed and has a reference count of 0, so the pointer must be stored in an Autoref (or Onceref). This method by itself doesn't check whether the original Rowop has a matching type; it simply makes a copy with the label reference replaced. It's up to you to make sure that the labels are correct.
A special subclass of Label is DummyLabel: a label that does nothing. Its execute() method is empty. It's constructed very similarly to the Label:
DummyLabel(Unit *unit, const_Onceref<RowType> rtype, const string &name);
The dummy labels are convenient for chaining the other labels to them.
Wednesday, December 26, 2012
SourceForge flux completed
The SourceForge conversion has completed, and I've updated the source code repository links on the web page. I'm not exactly sure why they required this conversion. Okay, the svn+ssh access method to SVN is slightly more convenient, but the SVN browser seems to have become worse, and the rest of the project functionality seems to have become slightly worse too.
SourceForge flux
SourceForge has been insisting on the conversion of the project to their new engine, and I've finally given in. This means that the SVN repository location has changed, and the links there don't work any more. I'll update them shortly. And if you've checked out the code from SVN, you'd need to re-do it from the new location.
Tuesday, December 25, 2012
Tray in C++
A Tray in C++, defined in sched/Tray.h, is simply a deque of Rowop references, plus an Starget, so that it can be referenced itself:
class Tray : public Starget, public deque< Autoref<Rowop> >
All it really defines is the constructors:
Tray();
Tray(const Tray &orig);
The operations on the Tray are just the usual deque operations.
Yes, you can copy the trays by constructing a new one from an old one:
Autoref<Tray> t1 = new Tray;
t1->push_back(op1);
Autoref<Tray> t3 = new Tray(*t1);
Afterwards t3 will contain references to the same rowops as t1 (but will be a different Tray than t1!).
The assignments (operator=) happen to work out of the box because the operator= implementation in Starget does the smart thing and avoids corrupting the reference counter. So you can do things like
*t3 = *t1;
It's worth noting once more that, unlike Rows and Rowops, Trays are mutable. If you have multiple references to the same Tray, modifying the Tray will make all the references see its new contents!
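The copy and mutability semantics can be sketched with standard smart pointers standing in for Autoref: a tray is a deque of shared references to immutable rowops, and the tray itself is also held by reference. RowopModel, TrayModel and trayDemo() are hypothetical names for this standalone model, not Triceps code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>

// Hypothetical standalone model: std::shared_ptr stands in for Autoref.
struct RowopModel {
    std::string row;
};

using RowopRef = std::shared_ptr<const RowopModel>;  // rowops are immutable
using TrayModel = std::deque<RowopRef>;               // the tray is mutable

// Walks through the copy and aliasing cases; returns the final size of t3.
std::size_t trayDemo() {
    RowopRef op1 = std::make_shared<RowopModel>(RowopModel{"r1"});

    auto t1 = std::make_shared<TrayModel>();
    t1->push_back(op1);

    // Copy construction: a different tray holding the same rowops.
    auto t3 = std::make_shared<TrayModel>(*t1);
    assert(t3.get() != t1.get());
    assert((*t3)[0] == op1);

    // A change through one reference is seen through every reference
    // to the same tray, but not in the independent copy.
    auto alias = t1;
    alias->push_back(op1);
    assert(t1->size() == 2);
    assert(t3->size() == 1);

    // Assignment replaces the contents of the target tray.
    *t3 = *t1;
    return t3->size();
}
```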
An important difference from the Perl API is that in C++ the Tray is not associated with a Unit. It's constructed simply by calling its constructor, and there is no Unit involved. It's possible to create a tray that contains a mix of rowops for different units. If you combine the C++ and Perl code, and then create such mixes in the C++ part, the Perl part of your code won't be happy.
And there is actually a way to create mixed-unit trays even in Perl code: the tray of a FnBinding. But this situation is caught when trying to get the tray into the Perl level, and the workaround is to use the method FnBinding::callTray().
The reason why Perl associates the trays with a unit is to make checking the enqueueing of a tray easy: just check that the tray belongs to the right unit, and everything is guaranteed to be right. At the C++ level no such checks are made. If you enqueue rowops on labels belonging to a wrong unit, they will be enqueued quietly, will attempt to execute, and from there everything is likely to go wrong. So be disciplined. And maybe I'll think of a better way to keep the units consistent in the future.
Monday, December 24, 2012
Rowop in C++
I've jumped right into the Unit without showing the objects it operates on. Now let's start catching up and look at the Rowops. The Rowop class is defined in sched/Rowop.h.
The Rowop in C++ consists of all the same parts as in the Perl API: a label, a row, and an opcode.
It has one more item that's not really visible in the Perl API: the enqueueing mode. It's semi-hidden in the C++ API as well. The only place where it's used is Unit::enqueueDelayedTray(), which lets you build a tray of rowops, each with its own enqueueing mode, and then enqueue all of them appropriately in one go. This is kind of historic, caused by the explicit enqueueing mode specification for the Table labels. It's becoming obsolete and will be removed sometime soon.
The Rowop class inherits from Starget, so it's usable in one thread only. This makes sense because it refers to Labels, which are single-threaded by definition. A consequence is that you can't simply pass Rowops between threads. Passing them between threads requires a separate representation that doesn't refer to the Labels but instead uses something like a numeric index (and of course the Mtarget base class). This is part of the ongoing work on multithreading, but you can also make your own.
The opcodes are defined in the enum Rowop::Opcode, so you normally use them as Rowop::OP_INSERT etc. As described before, the opcodes actually contain a bitmap of individual flags, defined in the enum Rowop::OpcodeFlags: Rowop::OCF_INSERT and Rowop::OCF_DELETE. You don't need to use these flags directly unless you really, really want to.
Besides the 3 already described opcodes (OP_NOP, OP_INSERT and OP_DELETE) there is another one, OP_BAD. The string-to-opcode conversion method returns this special value, where the other similar method returns -1. OP_BAD is specially formatted so that all the normal opcode type checks understand it as NOP, while -1 would be seen as a combination of INSERT and DELETE. So if you forget to check the result of converting a bad string, at least you get a NOP and not some mysterious operation. OP_BAD is not exported to Perl because Perl uses undef to indicate an invalid value, which works even better.
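Here is a standalone sketch of how a flag bitmap lets OP_BAD read as a NOP while -1 reads as both INSERT and DELETE. The numeric values, including the one for OP_BAD, are assumptions made for illustration. The flag tests are also a simplified assumption. The real definitions are in sched/Rowop.h.

```cpp
#include <cassert>

// Hypothetical values: the real ones live in sched/Rowop.h and may differ.
enum OpcodeFlags {
    OCF_INSERT = 0x01,
    OCF_DELETE = 0x02
};

enum Opcode {
    OP_NOP = 0,
    OP_INSERT = OCF_INSERT,
    OP_DELETE = OCF_DELETE,
    // A "bad" value with many bits set, but NOT the insert/delete bits,
    // so the flag-based checks below treat it as a NOP.
    OP_BAD = 0x7FFFFFFF & ~(OCF_INSERT | OCF_DELETE)
};

// Simplified flag tests in the spirit of Rowop::isInsert() etc.
inline bool isInsert(int op) { return (op & OCF_INSERT) != 0; }
inline bool isDelete(int op) { return (op & OCF_DELETE) != 0; }
inline bool isNop(int op) { return (op & (OCF_INSERT | OCF_DELETE)) == 0; }
```

With these definitions, -1 has every bit set, so it passes both isInsert() and isDelete(). That is exactly the confusing combination OP_BAD avoids.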
There is a pretty wide variety of Rowop constructors:
Rowop(const Label *label, Opcode op, const Row *row);
Rowop(const Label *label, Opcode op, const Rowref &row);
Rowop(const Label *label, Opcode op, const Row *row, int enqMode);
Rowop(const Label *label, Opcode op, const Rowref &row, int enqMode);
Rowop(const Rowop &orig);
Rowop(const Label *label, const Rowop *orig);
The constructors with the explicit enqMode are best not used outside of the Triceps internals, and will eventually be obsoleted. The last two are the copy constructor and the adoption constructor, which underlies Label::adopt() and can be used directly as well.
Once a rowop is constructed, its components cannot be changed any more, only read.
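The immutability and the adoption constructor can be modeled as shown below. MockLabel, MockRow and MockRowop are hypothetical standalone types, not the Triceps classes. All components are const and set in the constructor. The adopting constructor takes the opcode and row from the original rowop and substitutes a new label, similar in spirit to Rowop(const Label *label, const Rowop *orig).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct MockLabel { std::string name; };  // hypothetical
struct MockRow { int value; };           // hypothetical

// Hypothetical immutable rowop: all components are fixed at construction.
class MockRowop {
public:
    MockRowop(const MockLabel *label, int op, std::shared_ptr<const MockRow> row)
        : label_(label), op_(op), row_(std::move(row)) {}

    // "Adoption": same opcode and same row object, but a different label.
    MockRowop(const MockLabel *label, const MockRowop *orig)
        : label_(label), op_(orig->op_), row_(orig->row_) {}

    const MockLabel *getLabel() const { return label_; }
    int getOpcode() const { return op_; }
    const MockRow *getRow() const { return row_.get(); }

private:
    const MockLabel *const label_;
    const int op_;
    const std::shared_ptr<const MockRow> row_;  // shared, not copied
};
```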
Opcode getOpcode() const;
const Label *getLabel() const;
const Row *getRow() const;
int getEnqMode() const;
Read back the components of the Rowop. Again, getEnqMode() is on its way to obsolescence. And if you need to check whether the opcode is an insert or delete, the better way is to use the explicit test methods rather than getting the opcode and comparing it for equality:
bool isInsert() const;
bool isDelete() const;
bool isNop() const;
Check whether the opcode requests an insert or delete (or neither).
The same checks are available as static methods that can be used on the opcode values:
static bool isInsert(int op);
static bool isDelete(int op);
static bool isNop(int op);
And the final part is the conversion between the strings and values for the Opcode and OpcodeFlags enums:
static const char *opcodeString(int code);
static int stringOpcode(const char *op);
static const char *ocfString(int flag, const char *def = "???");
static int stringOcf(const char *flag);
As mentioned above, stringOpcode() returns OP_BAD for the unknown strings, not -1.
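Below is a sketch of a stringOpcode()-style lookup that returns OP_BAD instead of -1, with the caller checking for it explicitly. The function names, the opcode values and the exact strings ("OP_INSERT" etc.) are assumptions for illustration. This is not the Triceps implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical values, repeated here so the sketch is self-contained.
enum { OCF_INSERT = 0x01, OCF_DELETE = 0x02 };
enum { OP_NOP = 0, OP_INSERT = OCF_INSERT, OP_DELETE = OCF_DELETE,
       OP_BAD = 0x7FFFFFFF & ~(OCF_INSERT | OCF_DELETE) };

// Sketch of a stringOpcode()-style lookup: unknown strings map to OP_BAD.
inline int stringOpcodeSketch(const char *s) {
    if (s == nullptr) return OP_BAD;
    if (std::strcmp(s, "OP_NOP") == 0) return OP_NOP;
    if (std::strcmp(s, "OP_INSERT") == 0) return OP_INSERT;
    if (std::strcmp(s, "OP_DELETE") == 0) return OP_DELETE;
    return OP_BAD;
}

// Sketch of an opcodeString()-style reverse conversion.
inline const char *opcodeStringSketch(int code) {
    switch (code) {
    case OP_NOP:    return "OP_NOP";
    case OP_INSERT: return "OP_INSERT";
    case OP_DELETE: return "OP_DELETE";
    default:        return "???";
    }
}

// Caller-side check: treat OP_BAD as an explicit error.
inline bool parseOpcode(const char *s, int *out) {
    int op = stringOpcodeSketch(s);
    if (op == OP_BAD) return false;
    *out = op;
    return true;
}
```

Even if a caller skipped the check in parseOpcode(), the OP_BAD value would still flow on as a NOP rather than as an insert-plus-delete.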
Unit tracing in C++
By the way, I forgot to mention that Unit lives in sched/Unit.h. Now, to the tracing.
Unlike Perl, in C++ the tracer is defined by inheriting from the class Unit::Tracer. The base class provides the Mtarget, and in the subclass all you need to do is define your virtual method:
virtual void execute(Unit *unit, const Label *label, const Label *fromLabel, Rowop *rop, TracerWhen when);
It gets called at exactly the same points as the Perl tracer (the C++ part of UnitTracerPerl forwards the calls to the Perl level). The arguments are also the same as described in the Perl docs. The only difference is that the argument when is a value of the enum Unit::TracerWhen.
For example:
class SampleTracer : public Unit::Tracer
{
public:
    virtual void execute(Unit *unit, const Label *label, const Label *fromLabel, Rowop *rop, Unit::TracerWhen when)
    {
        printf("trace %s label '%s' %c\n",
            Unit::tracerWhenHumanString(when),
            label->getName().c_str(),
            Unit::tracerWhenIsBefore(when)? '{' : '}');
    }
};
This also shows a few Unit methods used for conversion and testing of the constants:
static const char *tracerWhenString(int when, const char *def = "???");
static int stringTracerWhen(const char *when);
Convert between the when enum value and the appropriate name. As usual, def is the default placeholder used for an invalid value. The conversion from string returns -1 for an invalid value.
static const char *tracerWhenHumanString(int when, const char *def = "???");
static int humanStringTracerWhen(const char *when);
The same conversion, only using a "human-readable" string format that is nicer for messages: basically the same names, only as lowercase words. For example, TW_BEFORE_CHAINED would become "before-chained".
static bool tracerWhenIsBefore(int when);
static bool tracerWhenIsAfter(int when);
Determine whether a when value is of the "before" or "after" kind. This was added in 1.1, together with the reformed scheduling. As you can see in the example above, it's convenient for printing the braces. If you prefer indentation, it's equally convenient for adjusting the indentation.
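As an example of adjusting the indentation, here is a standalone tracer sketch. The TracerWhen enum here is a hypothetical mirror of Unit::TracerWhen; the real one is in sched/Unit.h and may be ordered differently. IndentingTracer is not derived from the real Unit::Tracer and only shows the depth bookkeeping. Each "before" event opens a level and each "after" event closes one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of Unit::TracerWhen; the real enum may differ.
enum TracerWhen {
    TW_BEFORE, TW_AFTER,
    TW_BEFORE_CHAINED, TW_AFTER_CHAINED,
    TW_BEFORE_DRAIN, TW_AFTER_DRAIN
};

// Stand-in for Unit::tracerWhenIsBefore().
inline bool whenIsBefore(TracerWhen w) {
    return w == TW_BEFORE || w == TW_BEFORE_CHAINED || w == TW_BEFORE_DRAIN;
}

// Indents by nesting depth instead of printing braces alone.
class IndentingTracer {
public:
    void execute(const std::string &label, TracerWhen when) {
        bool before = whenIsBefore(when);
        if (!before && depth_ > 0)
            --depth_;  // an "after" closes its level before printing
        lines_.push_back(std::string(2 * depth_, ' ') +
                         (before ? "{ " : "} ") + label);
        if (before)
            ++depth_;  // a "before" opens a level after printing
    }
    const std::vector<std::string> &lines() const { return lines_; }

private:
    int depth_ = 0;
    std::vector<std::string> lines_;
};
```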
The tracer object (not a class but a constructed object!) is set into the Unit:
void setTracer(Onceref<Tracer> tracer);
Onceref<Tracer> getTracer() const;
Theoretically, nothing stops you from using the same tracer object for multiple units, even from multiple threads. The catch is that, for multithreaded calls, the tracer must have its own internal synchronization. Sharing a tracer between multiple units in the same thread is a more interesting idea. It might be useful for intertwined execution with cross-unit calls. The catch there is that the traces of all these units will always be intertwined.
The SampleTracer above was just printing the trace right away. Usually a better idea is to save the trace in the tracer object and return it on demand. Triceps provides a couple of ready tracers, and they use exactly this approach.
Here is the StringTracer interface:
class StringTracer : public Tracer
{
public:
    // @param verbose - if true, record all the events, otherwise only the BEGIN records
    StringTracer(bool verbose = false);
    // Get back the buffer of messages
    // (it can also be used to add messages to the buffer)
    Erref getBuffer() const
    {
        return buffer_;
    }
    // Replace the message buffer with a clean one.
    // The old one gets simply dereferenced, so if you have a reference, you can keep it.
    void clearBuffer();
    // from Tracer
    virtual void execute(Unit *unit, const Label *label, const Label *fromLabel, Rowop *rop, TracerWhen when);
protected:
    Erref buffer_;
    bool verbose_;
};
An Erref object is used as a buffer, where the data can be added efficiently line-by-line, and later read. On each call StringTracer::execute() builds the string res, and appends it to the buffer:
buffer_->appendMsg(false, res);
The pattern of reading the buffer contents works like this:
string tlog = trace->getBuffer()->print();
trace->clearBuffer();
The log can then be printed or used in any other way. An interesting point is that clearBuffer() doesn't clear the buffer but replaces it with a fresh one. So if you keep a reference to the old buffer, you can keep using it:
Erref buf = trace->getBuffer();
trace->clearBuffer();
string tlog = buf->print();
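The buffer-replacing behavior of clearBuffer() can be modeled with a shared pointer. This is a hypothetical standalone sketch: BufferTracer, MsgBuffer and print() are made-up names, and a vector of strings stands in for Erref. Because clearBuffer() swaps in a new buffer instead of emptying the old one, a reference obtained earlier keeps its contents.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for Erref: a shared, append-only list of messages.
using MsgBuffer = std::shared_ptr<std::vector<std::string>>;

class BufferTracer {
public:
    BufferTracer() : buffer_(std::make_shared<std::vector<std::string>>()) {}

    // Record one event (in Triceps this would happen inside execute()).
    void record(const std::string &msg) { buffer_->push_back(msg); }

    MsgBuffer getBuffer() const { return buffer_; }

    // Swap in a fresh buffer rather than emptying the old one,
    // so anyone still holding the old buffer keeps its contents.
    void clearBuffer() { buffer_ = std::make_shared<std::vector<std::string>>(); }

    // Join the messages, one per line.
    static std::string print(const MsgBuffer &buf) {
        std::string res;
        for (const auto &m : *buf) {
            res += m;
            res += '\n';
        }
        return res;
    }

private:
    MsgBuffer buffer_;
};
```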
The two ready tracers provided with Triceps are:
StringTracer: collects the trace in a buffer, identifying the objects by their addresses. This is not exactly easy to read normally, but may come in useful if you want to analyze a core dump.
StringNameTracer: similar, but identifies the objects by name. More convenient, but prone to confusion when different objects share the same name.
Unfortunately, at the C++ level there is currently no nice printout of the rowops, like in Perl. But you can always make your own.
The tracing does not have to be used just for tracing. It can also be used for debugging, as a breakpoint: check in your tracer for an arbitrary condition, and stop if it has been met.
There is only one tracer per unit at a time. However, if you want, you can implement chaining in your own tracer (particularly useful if it's a breakpoint tracer): keep a reference to another tracer object, and after doing your own part, call that one's execute() method.
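Here is a minimal sketch of the chaining idea. TracerBase is a hypothetical interface standing in for Unit::Tracer, with a simplified execute() signature. BreakpointTracer checks its condition (here, a matching label name on a "before" event) and then forwards every call to the next tracer, if there is one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tracer interface (not the real Unit::Tracer).
class TracerBase {
public:
    virtual ~TracerBase() = default;
    virtual void execute(const std::string &label, bool before) = 0;
};

// A plain logging tracer, standing in for the next tracer in the chain.
class LogTracer : public TracerBase {
public:
    void execute(const std::string &label, bool before) override {
        entries.push_back((before ? "before " : "after ") + label);
    }
    std::vector<std::string> entries;
};

// Breakpoint-style tracer: checks a condition, then forwards the call.
class BreakpointTracer : public TracerBase {
public:
    BreakpointTracer(std::string watch, std::shared_ptr<TracerBase> next)
        : watch_(std::move(watch)), next_(std::move(next)) {}

    void execute(const std::string &label, bool before) override {
        if (before && label == watch_)
            ++hits;  // a real breakpoint might stop or dump state here
        if (next_)
            next_->execute(label, before);  // chain to the next tracer
    }
    int hits = 0;

private:
    std::string watch_;
    std::shared_ptr<TracerBase> next_;
};
```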
Thursday, December 20, 2012
Unit in C++
I've been distracted a bit with the other things, and now I'm working on the multithreaded support. I expect that it will be some time until that jells together. In the meantime, let's continue the description of the C++ API. The next class is the Unit. This class has been modified in 1.1.0, and I will be describing the new version, without going separately into 1.0.
Unit(const string &name);
Constructs the execution unit.
const string &getName() const;
void setName(const string &name);
Get back or modify the name. Modifying the name is probably not a good idea, but the method is still here.
void schedule(Onceref<Rowop> rop);
void scheduleTray(const_Onceref<Tray> tray);
void fork(Onceref<Rowop> rop);
void forkTray(const_Onceref<Tray> tray);
void call(Onceref<Rowop> rop);
void callTray(const_Onceref<Tray> tray);
void enqueue(int em, Onceref<Rowop> rop);
void enqueueTray(int em, const_Onceref<Tray> tray);
Schedule, fork or call a rowop or tray, like in Perl. Unlike Perl, the methods with a tray argument have different names, and the enqueueing mode is always an integer constant. These constants are defined in the enum Gadget::EnqMode (the Gadget class will be described soon). The mode is one of Gadget::EM_SCHEDULE, Gadget::EM_FORK, Gadget::EM_CALL and Gadget::EM_IGNORE. I'm not sure if I've described EM_IGNORE before. I think I did, but just in case: it means "do nothing with this rowop", and it's available in Perl too.
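To make the enqueueing modes concrete, here is a toy dispatch on the integer mode. EnqMode and ToyUnit are hypothetical stand-ins; the real constants are in Gadget::EnqMode and their values may differ. The toy also greatly simplifies the frames. Scheduled rowops go to an outer queue and forked ones to the current frame's queue. Called ones run immediately, and ignored ones are dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical mirror of Gadget::EnqMode; the real values may differ.
enum EnqMode { EM_SCHEDULE, EM_FORK, EM_CALL, EM_IGNORE };

// Toy unit, NOT the real Triceps Unit: rowops are just strings here.
class ToyUnit {
public:
    void enqueue(int em, const std::string &rop) {
        switch (em) {
        case EM_SCHEDULE: outer_.push_back(rop); break;    // outermost frame
        case EM_FORK:     frame_.push_back(rop); break;    // current frame
        case EM_CALL:     executed.push_back(rop); break;  // run right now
        case EM_IGNORE:   break;                           // do nothing
        default:          break;                           // unknown mode
        }
    }
    std::size_t scheduled() const { return outer_.size(); }
    std::size_t forked() const { return frame_.size(); }
    std::vector<std::string> executed;

private:
    std::deque<std::string> outer_, frame_;
};
```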
bool empty() const;
Check whether all the Unit's frames are empty.
void callNext();
void drainFrame();
Execute the next rowop from the current (innermost) frame, or all the rowops on the current frame. The semantics is the same as in the Perl code.
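The relationship between callNext() and drainFrame() can be sketched with a toy single-frame model. ToyFrame is hypothetical and ignores the real unit's stack of nested frames. It only shows that callNext() executes one rowop from the front of the frame, and drainFrame() repeats that until the frame is empty.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of one frame; rowops are strings, execution is a callback.
class ToyFrame {
public:
    explicit ToyFrame(std::function<void(const std::string &)> exec)
        : exec_(std::move(exec)) {}

    void add(const std::string &rop) { q_.push_back(rop); }
    bool empty() const { return q_.empty(); }

    // Execute the first queued rowop, if any.
    void callNext() {
        if (q_.empty())
            return;
        std::string rop = q_.front();
        q_.pop_front();  // remove it before executing
        exec_(rop);
    }

    // Keep executing until the frame is empty.
    void drainFrame() {
        while (!q_.empty())
            callNext();
    }

private:
    std::function<void(const std::string &)> exec_;
    std::deque<std::string> q_;
};
```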
void setMark(Onceref<FrameMark> mark);
Set a mark on the current frame, same as in Perl.
void loopAt(FrameMark *mark, Onceref<Rowop> rop);
void loopTrayAt(FrameMark *mark, const_Onceref<Tray> tray);
Enqueue a rowop or tray at the marked frame.
void callAsChained(const Label *label, Rowop *rop, const Label *chainedFrom);
This method was introduced in version 1.1, and hasn't propagated to Perl yet. I'm not even sure that I want it visible in Perl, since it's kind of low-level. It executes a label call, assuming that it was chained from another label (before 1.1 the functionality itself had obviously existed but was not visible in the API).
Here the row types of all the arguments must match. It calls the label with the rowop, treating the target label as having been chained from another label, and does all the correct tracing for the chained calls. This method is used, for example, in the streaming functions, when an FnReturn calls through an FnBinding. You can use it directly as well, just be careful. And remember that keeping the tracing consistent is up to you: if you pass a chainedFrom label that hasn't actually been called, the trace will look surprising.
void clearLabels();
Clear all the unit's labels, same semantics as when called from Perl.
void rememberLabel(Label *lab);
The method that connects a label to the unit. Normally you don't need to call it manually: the label constructor calls it (and that's why it's not in the Perl API). The only real reason to call it manually is if you've manually disconnected a label from the unit and want to reconnect it (and I'm not sure if anyone would ever want that). Calling this method repeatedly with the same label and unit has no effect. Remembering the same label in multiple units is not a good idea.
void forgetLabel(Label *lab);
Make the unit forget a label, so that clearLabels() won't clear that label. This is another dangerous low-level method, since only the unit forgets about the label, while the label still keeps the pointer to the unit unless it's cleared. Because of the danger, it's also not in the Perl API. The reason to use it would be to disassemble and discard a part of the unit without disturbing the rest of it. However, a safer alternative is to create multiple units in one thread and discard a whole unit at a time.
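The bookkeeping of rememberLabel(), forgetLabel() and clearLabels() can be modeled as a set of label pointers. This is a hypothetical toy (ToyLabel, ToyUnitLabels), not the Unit's actual data structure. In the toy, remembering a label twice is a no-op, and clearLabels() skips a forgotten label.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical toy label: clear() stands for dropping its references.
struct ToyLabel {
    bool cleared = false;
    void clear() { cleared = true; }
};

// Toy registry mirroring rememberLabel()/forgetLabel()/clearLabels().
class ToyUnitLabels {
public:
    void rememberLabel(ToyLabel *lab) { labels_.insert(lab); }  // repeat = no-op
    void forgetLabel(ToyLabel *lab) { labels_.erase(lab); }
    void clearLabels() {
        for (ToyLabel *lab : labels_)
            lab->clear();
    }
    std::size_t count() const { return labels_.size(); }

private:
    std::set<ToyLabel *> labels_;
};
```

Note that in the toy, as in the text above, a forgotten label knows nothing about having been forgotten. Any link it keeps back to the unit is the caller's problem.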
RowType *getEmptyRowType() const;
A convenience method to get a reference to a row type with no fields. Such a row type is useful for creating pseudo-labels with user-defined clearing handlers that clear some user data. This has been described in more detail before.
void setMaxStackDepth(int v);
int maxStackDepth() const;
void setMaxRecursionDepth(int v);
int maxRecursionDepth() const;
Set and get the maximal unit stack depth and recursion depth, works the same as in Perl.
That's it, except for the tracing support. I'll describe that in a separate post.
And there is also a class that can be used to trigger the unit clearing on leaving scope:
UnitClearingTrigger(Unit *unit);
The trigger is an Mtarget, so the typical use would be:
{
    Autoref<UnitClearingTrigger> ctrig = new UnitClearingTrigger(myunit);
    ...
}
At the block exit, the Autoref gets destroyed. That destroys the trigger, which in turn causes the clearing of the unit. Of course, you can also place the Autoref into another object, and then the destruction of that object, rather than the end of the block, would cause the clearing.
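The same scope-based pattern can be shown without Triceps, using std::shared_ptr in place of Autoref. ToyUnit, ToyClearingTrigger and ToyOwner are hypothetical. The trigger's destructor calls the unit's clearing method. As a result, the unit gets cleared when the last reference to the trigger goes away, either at the end of a block or when an owning object is destroyed.

```cpp
#include <cassert>
#include <memory>

// Toy unit that only tracks whether it has been cleared.
struct ToyUnit {
    bool cleared = false;
    void clearLabels() { cleared = true; }
};

// Hypothetical analog of UnitClearingTrigger: clears the unit on destruction.
class ToyClearingTrigger {
public:
    explicit ToyClearingTrigger(ToyUnit *unit) : unit_(unit) {}
    ~ToyClearingTrigger() { if (unit_) unit_->clearLabels(); }
    ToyClearingTrigger(const ToyClearingTrigger &) = delete;
    ToyClearingTrigger &operator=(const ToyClearingTrigger &) = delete;

private:
    ToyUnit *unit_;
};

// Holding the trigger inside another object ties the clearing to that object.
struct ToyOwner {
    std::shared_ptr<ToyClearingTrigger> trig;
};
```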