Sergey Babkin on CEP and stuff: July 2013

Saturday, July 27, 2013

string utilities

As I'm editing the documentation for 2.0, I've found an omission: the string helper functions in the C++ API haven't been documented yet. Some of them have been mentioned but not officially documented.

The first two are declared in common/Strprintf.h:

string strprintf(const char *fmt, ...);
string vstrprintf(const char *fmt, va_list ap);

They are entirely similar to sprintf() and vsprintf() with the difference that they place the result of formatting into a newly constructed string and return that string.

The rest are defined in common/StringUtil.h:

extern const string &NOINDENT;

The special constant that when passed to the printing of the Type, causes it to print without line breaks. Doesn't have any special effect on Errors, there it's simply treated as an empty string.

const string &nextindent(const string &indent, const string &subindent, string &target);

Compute the indentation for the next level when printing a Type. The arguments are:

indent - indent string of the current level
subindent - characters to append for the next indent level
target - buffer to store the extended indent string

The passing of target as an argument allows to reuse the same string object and avoid the extra construction.

The function returns the computed reference: if indent was NOINDENT, then reference to NOINDENT, otherwise reference to target. This particular calling pattern is strongly tied to how things are computed inside the type printing, but you're welcome to look inside it and do the same for any other purpose.

void newlineTo(string &res, const string &indent);

Another helper function for the printing of Type, inserting a line break. The indent argument specifies the indentation, with the special handling of NOINDENT: if indent is NOINDENT, a single space is added, thus printing everything in one line; otherwise a \n and the contents of indent are added. The res argument is the result string, where the line break characters are added.

void hexdump(string &dest, const void *bytes, size_t n, const char *indent = "");

Print a hex dump of a sequence of bytes (at address bytes and of length n), appending the dump to the destination string dest. The data will be nicely broken into lines, with 16 bytes printed per line. The first line is added directly to the end of the dest as-is, but if n is over 16, the other lines will follow after \n. The indent argument allows to add indentation at the start of each following string.

void hexdump(FILE *dest, const void *bytes, size_t n, const char *indent = "");

Another version, sending the dumped data directly into a file descriptor.

The next pair of functions provides a generic mechanism for converting enums between a string and integer representation:

struct Valname
{
    int val_;
    const char *name_;
};

int string2enum(const Valname *reft, const char *name);
const char *enum2string(const Valname *reft, int val, const char *def = "???");

The reference table is defined with an n array of Valnames, with the last element being { -1, NULL }. Then it's passed as the argument reft of the conversion functions which do a sequential look-up by that table. If the argument is not found, string2enum() will return -1, and enum2string() will return the value of the def argument (which may be NULL).

Here is an example of how it's used for the conversion of opcode flags:

Valname opcodeFlags[] = {
    { Rowop::OCF_INSERT, "OCF_INSERT" },
    { Rowop::OCF_DELETE, "OCF_DELETE" },
    { -1, NULL }
};

const char *Rowop::ocfString(int flag, const char *def)
{
    return enum2string(opcodeFlags, flag, def);
}

int Rowop::stringOcf(const char *flag)
{
    return string2enum(opcodeFlags, flag);
}

Friday, July 19, 2013

GCC warning flags

The GCC warning flags have been an annoying issue: in general, I've been building Triceps with warning-treated-as-errors, except for a few of them disabled for uselessness. However this presents a problem with the varying versions of GCC: the older versions don't have some of the warning flags I use, so they fail, and the new ones have more warnings of the useless variety that fail in a different way (and there are some non-useless, and they occasionally help detecting more weird stuff but that's not the point).

The solution I've come up, is to use the different warning flags for the build checked out from SVN on the trunk branch, and for all the other builds. All these warning flags are now enabled only if the path of the Triceps directory ends in "trunk". Otherwise they are skipped. If you check out the code directly from SVN, you still need to worry about the warning flags but not notherwise.

Thursday, July 18, 2013

Perl 5.19 and SIGUSR2

I've tested Triceps with Perl version 5.19. This required fixing some expected error messages that have changed, and now the patterns accept both the old and new error messages.

But the worst part is that the Perl 5.19 was crashing on SIGUSR2. If you're interested in the details, see https://rt.perl.org/rt3//Public/Bug/Display.html?id=118929. I've worked around this issue by overriding the Perl's signal handler for SIGUSR2 in the XS code.

The method is Triceps::sigusr2_setup(), and it gets called during the Triceps module loading. Internally it translates to the C++ method Sigusr2::setup() that sets the dummy handler on the first call.

This has a consequence that you can't set a real SIGUSR2 handler in Perl any more. But it stops Perl from crashing, and there probably isn't much reason to do a custom handler of SIGUSR2 anyway.

Sunday, July 14, 2013

BasicPthread reference (C++)

As you can see from the previous descriptions, building a new Triead is a serious business, containing many moving part. Doing it every time from scratch would be hugely annoying and error prone. The class BasicPthread, defined in app/BasicPthread.h, takes care of wrapping all that complicated logic.

It originated as a subclass of pw::pwthread, and even though it ended up easier to copy and modify the code (okay, maybe this means that pwthread can be made more flexible), the usage is still very similar to it. You define a new subclass of BasicPthread, and define the virtual function execute() in it. Then you instantiate the object and call the method start() with the App argument.

For a very simple example:

class MainLoopPthread : public BasicPthread
{
public:
    MainLoopPthread(const string &name):
        BasicPthread(name)
    {
    }

    // overrides BasicPthread::start
    virtual void execute(TrieadOwner *to)
    {
        to->readyReady();
        to->mainLoop();
    }
};

...

Autoref<MainLoopPthread> pt3 = new MainLoopPthread("t3");
pt3->start(myapp);

It will properly create the Triead, TrieadOwner, register the thread joiner and start the execution. The TrieadOwner will pass through to the execute() method, and its field fi_ will contain the reference to the FileInterrupt object. After execute() returns, it will take care of marking the thread as dead.

It also wraps the call of execute() into a try/catch block, so any Exceptions thrown will be caught and cause the App to abort. In short, it's very similar to the Triead management in Perl.

You don't need to keep the reference to the thread object afterwards, you can even do the construction and start in one go:

(new MainLoopPthread("t3"))->start(myapp);

The internals of BasicPthread will make sure that the object will be dereferenced (and thus, in the absence of other references, destroyed) after the thread gets joined by the harvester.

Of course, if you need to pass more arguments to the thread, you can define them as fields in your subclass, set them in the constructor (or by other means between constructing the object and calling start()), and then execute() can access them. Remember, execute() is a method, so it receives not only the TrieadObject as an argument but also the BasicPthread object as this.

BasicPthread is implemented as a subclass of TrieadJoin, and thus is an Mtarget. It provides the concrete implementation of the joiner's virtual methods, join() and interrupt(). Interrupt() calls the method of the base class, then sends the signal SIGUSR2 to the target thread.

And finally the actual reference:

BasicPthread(const string &name);

Constructor. The name of the thread is passed through to App::makeTriead(). The Triead will be constructed in start(), the BasicPthread constructor just collects together the arguments.

void start(Autoref<App> app);

Construct the Triead, create the POSIX thread, and start the execution there.

void start(Autoref<TrieadOwner> to);

Similar to the other version of start() but uses a pre-constructed TrieadOwner object. This version is useful mostly for the tests, and should not be used much in the real life.

virtual void execute(TrieadOwner *to);

Method that must be redefined by the subclass, containing the threads's logic.

virtual void join();
virtual void interrupt();

Methods inherited from TrieadJoin, providing the proper implementations for the POSIX threads.

And unless, I've missed something, this concludes the description of the Triceps threads API.

Saturday, July 13, 2013

FileInterrupt reference (C++)

FileInterrupt is the class that keeps track of a bunch of file descriptor and revokes them on demand, hopefully interrupting any ongoing operations on them (and if that doesn't do the job, a separately sent signal will). It's not visible in Perl, being intergrated into the TrieadOwner methods, but in C++ it's a separate class. It's defined in app/FileInterrupt.h, and is an Mtarget, since the descriptors are registered and revoked from different threads.

FileInterrupt();

The constructor, absolutely plain. Normally you would not want to construct it directly but use the object already constructed in TrieadJoin. The object keeps the state, whether the interruption had happened, and is obviously initialized to the non-interrupted state.

void trackFd(int fd);

Add a file descriptor to the tracked interruptable set. If the interruption was already done, the descriptor will instead be revoked right away by dupping over from /dev/null. If the attempt to open /dev/null fails, it will throw an Exception.

void forgetFd(int fd);

Remove a file descriptor to the tracked interruptable set. You must do it before closing the descriptor, or a race leading to the corruption of random file descriptors may occur. If this file descriptor was not registered, the call will be silently ignored.

void interrupt();

Perform the revocation of all the registered file descriptors by dupping over them from /dev/null. If the attempt to open /dev/null fails, it will throw an Exception.

This marks the FileInterrupt object state as interrupted, and any following trackFd() calls will lead to the immediate revocation of the file descriptors in them, thus preventing any race conditions.

bool isInterrupted() const;

Check whether this object has been interrupted.

TrieadJoin reference (C++)

TrieadJoin is the abstract base class that tells the harvester how to join a thread after it had finished. Obviously, it's present only in the C++ API and not in Perl (and by the way, the reference of the Perl classes in 2.0 has been completed, the remaining classes are only in C++).

Currently TrieadJoin has two subclasses: PerlTrieadJoin for the Perl threads and BasicPthread for the POSIX threads in C++. I won't be describing PerlTriedJoin, since it's in the internals of the Perl implementation, never intended to be directly used by the application developers, and if you're interested, you can always look at its source code. I'll describe BasicPthread later.

Well, actually there is not a whole lot of direct use for TrieadJoin either: you need to worry about it only if you want to define a joiner for some other kind of threads, and this is not very likely. But since I've started, I'll complete the write-up of it.

So, if you want to define a joiner for some other kind of threads, you define a subclass of it, with an appropriately defined method join().

TrieadJoin is an Mtarget, naturally referenced from multiple threads (at the very least it's created in the thread to be joined or its parent, and then passed to the harvester thread by calling App::defineJoin()). The methods of TrieadJoin are:

TrieadJoin(const string &name);

The constrcutor. The name is the name of the Triead, used for the error messages. Due to the various syncronization reasons, this makes the life of the harvester much easier, than trying to look up the name from the Triead object.

virtual void join() = 0;

The Most Important joining method to be defined by the subclass. The subclass object must also hold the identity of the thread in it, to know which thread to join. The harvester will call this method.

virtual void interrupt();

The method that interrupts the target thread when it's requested to die. It's called in the context of the thread that triggers the App shutdown (or otherwise requests the target thread to die). By default the TrieadJoin carries a FileInterrupt object in it (it gets created on TrieadJoin construction, and then TrieadJoin keeps a reference to it), that will get called by this method to revoke the files. But everything else is a part of the threading system, and the base class doesn't know how to do it, the subclasses must define their own methods, wrapping the base class.

Both PerlTrieadJoin and BasicPthread add sending the signal SIGUSR2 to the target thread. For that they use the same target thread identity kept in the object as used by the join() call.

FileInterrupt *fileInterrupt() const;

Get a pointer to the FileInterrupt object defined in the TrieadJoin. The most typical use is to pass it to the TrieadOwner object, so that it can be easily found later:

to->fileInterrupt_ = fileInterrupt();

Though of course it could be kept in a separate local Autoref instead.

const string &getName() const;

Get back the name of the joiner's thread.

SIGUSR2

When a thread is requested to die, its registered file descriptors become revoked, and the signal SIGUSR2 is sent to it to interrupt any ongoing system calls. For this to work correctly, there must be a signal handler defined on SIGUSR2, because otherwise the default reaction to it is to kill the process. It doesn't matter what signal handler, just some handler must be there. The Triceps library defines an empty signal handler but you can also define your own instead.

In Perl, the empty handler for SIGUSR2 is set when the module Triceps.pm is loaded. You can change it afterwards.

In C++ Triceps provides a class Sigusr2, defined in app/Sigusr2.h, to help with this. If you use the class BasicPthread, you don't need to deal with Sigusr2 directly: BasicPthread takes care of it. All the methods of Sigusr2 are static.

static void setup();

Set up an empty handler for SIGUSR2 if it hasn't been done yet. This class has a static flag (synchronized by a mutex) showing that the handler had been set up. On the first call it sets the handler and sets the flag. On the subsequent calls it checks the flag and does nothing.

static void markDone();

Just set the flag that the setup has been done. This allows to set your own handler instead and still cooperate with the logic of Sigusr2 and BasicPthread.

If you set your custom handler before any threads have been started, then set up your handler and then call markDone(), telling Sigusr2 that there is no need to set the handler any more.

If you set your custom handler when the Triceps threads are already running (not the best idea but still a possibility), there is a possibility of a race with another thread calling setup(). To work around that race, set up your handler, call markDone(), then set up your handler again.

static void reSetup();

This allows to replace the custom handler with the empty one. It always forcibly sets the empty handler (and also the flag).

odds and ends

While working on threads support, I've added a few small features here and there. Some of them have been already described, some will be described now. I've also done a few more small clean-ups.

First, the historic methods setName() are now gone everywhere. This means Unit and Label classes, and in C++ also the Gadget. The names can now only be specified during the object construction.

FnReturn has the new method:

$res = fret->isFaceted();

bool isFaceted() const;

It returns true (or 1 in Perl) if this FnReturn object is a part of a Facet.

Unit has gained a couple of methods:

$res = $unit->isFrameEmpty();

bool isFrameEmpty() const;

Check whether the current frame is empty. This is different from the method empty() that checks whether the the whole unit is empty. This method is useful if you run multiple units in the same thread, with some potentially complicated cross-unit scheduling. It's what nextXtray() does with a multi-unit Triead, repeatedly calling drainFrame() for all the units that are found not empty. In this situation the simple empty() can not be used because the current inner frame might not be the outer frame, and draining the inner frame can be repeated forever while the outer frame will still contain rowops. The more precise check of isFrameEmpty() prevents the possibility of such endless loops.

$res = $unit->isInOuterFrame();

bool isInOuterFrame() const;

Check whether the unit's current inner frame is the same as its outer frame, which means that the unit is not in the middle of a call.

In Perl the method Rowop::printP() has gained an optional argument for the printed label name:

$text = $rop->printP();
$text = $rop->printP($lbname);

The reason for that is to make the printing of rowops in the chained labels more convenient. A chained label's execution handler receives the original unchanged rowop that refers to the first label in the chain. So when it gets printed, it will print the name of the first label in the chain, which might be very surprising. The explicit argument allows to override it to the name of the chained label (or to any other value).

In C++ the Autoref has gained the method swap():

void swap(Autoref &other);

It swaps the values of two references without changing the reference counts in the referred values. This is a minor optimization for such a special situation. One or both references may contain NULL.

In C++ the Table has gained the support for sticky errors. The table internals contain a few places where the errors can't just throw an Exception because it will mess up the logic big time, most specifically the comparator functions for the indexes. The Triceps built-in indexes can't encounter any errors in the comparators but the user-defined ones, such as the Perl Sorted Index, can. Previously there was no way to report these errors other than print the error message and then either continue pretending that nothing happened or abort the program.

The sticky errors provide a way out of this sticky situation. When an index comparator encounters an error, it reports it as a sticky error in the table and then returns false. The table logic then unrolls like nothing happened for a while, but before returning from the user-initiated method it will find this sticky error and throw an Exception at a safe time. Obviously, the incorrect comparison means that the table enters some messed-up state, so all the further operations on the table will keep finding this sticky error and throw an Exception right away, before doing anything. The sticky error can't be unstuck. The only way out of it is to just discard the table and move on.

void setStickyError(Erref err);

Set the sticky error from a location where an exception can not be thrown, such as from the comparators in the indexes. Only the first error sticks, all the others are ignored since (a) the table will be dead and throwing this error in exceptions from this point on anyway and (b) the comparator is likely to report the same error repeatedly and there is no point in seeing multiple copies.

Errors *getStickyError() const;

Get the table's sticky error. Normally there is no point in doing this manually, but just in case.

void checkStickyError() const;

If the sticky error has been set, throw an Exception with it.

AutoDrain reference, C++

The scoped drain in C++ has more structure than in Perl. It consists of the base class AutoDrain and two subclasses: AutoDrainShared and AutoDrainExclusive. The base AutoDrain can not be created directly but is convenient for keeping a reference to any kind of drain:

Autoref<AutoDrain> drain = new AutoDrainShared(app);

The constructor of the subclass determines whether the drain is shared or exclusive, and the rest of the methods are defined in the base class.

The AutoDrain is an Starget, and naturally can be used in only one thread. It's also possible to use the a direct local variable:

{
AutoDrainShared drain(app);
...
}

Just remember not to mix the metaphors, if you create a local variable, don't try to create the references to it.

The constructors are:

AutoDrainShared(App *app, bool wait = true);
AutoDrainShared(TrieadOwner *to, bool wait = true);

Create a shared drain from an App or TrieadOwner. The wait flag determines if the constructor will wait for the drain to complete, otherwise it will return immediately. The shared drain requests the draining of all the Trieads, and multiple threads may have their shared drains active at the same time (the release will happen when all these drains become released). A shared drain will wait for all the preceding exclusive drains to be released before it gets created.

AutoDrainExclusive(TrieadOwner *to, bool wait = true);

Create an exclusive drain from a TrieadOwner. The wait flag makes the constructor wait for the completion of the drain. Only one exclusive drain at a time may be active, from only one thread, an exclusive drain will wait for all the preceding shared and exclusive drains to be released before it gets created.

The common method is:

void wait();

Wait for the drain to complete. Can be called repeatedly. If more data has been injected into the model through the excluded Triead, will wait for that data to drain.

I'm not sure if I mentioned this yet, but if the new Trieads are created while a drain is active, these Trieads will be notified of the drain. This means that the input-only Trieads won't be able to send any data until the drain is released. However the Trieads in the middle of the model will follow the normal protocol for such threads: the drain will become incomplete after the Triead is marked as ready and until it blocks on the following nextXtray(). Normally this should be a very short amount of time. However such Trieads should take care to never send any rowops on their own before reading from nextXtray(), if they find that TrieadOwner::isRdDrain() returns true, because this may introduce the data into the model at a very inconvenient moment, when some logic expects that no data is changing, and cause a corruption. This is the same caveat as for using nextXtray() varieties with the timeouts: if you want to send data on a timeout, always check isRqDrain(), and never send any data on timeouts if isRqDrain() returns true.

I've also realized that I might have used the word "undrain"/"undrained" for two different meanings: one is for the drain release, the other is for the condition when there is some data being processed in the model (whether a drain is active or not, but if it's active, it will not be completed). Perhaps I should make it less ambiguous. But for now just please keep this ambiguity in mind.

Friday, July 12, 2013

AutoDrain reference, Perl

The AutoDrain class creates the drains on an App with the automatic scoping. When the returned AutoDrain object gets destroyed, the drain becomes released. So placing the object into a lexically-scoped variable in a block with cause the undrain on the block exit. Placing it into another object will cause the undrain on deletion of that object. And just not storing the object anywhere works as a barrier: the drain gets completed and then immediately undrained, guaranteeing that all the previously sent data is processed and then continuing with the processing of the new data.

All the drain caveats described in the App class apply to the automatic drains too.

$ad = Triceps::AutoDrain::makeShared($app);
$ad = Triceps::AutoDrain::makeShared($to);

Create a shared drain and wait for it to complete. A drain may be created from either an App or a TrieadOwner object. Returns the AutoDrain object.

$ad = Triceps::AutoDrain::makeSharedNoWait($app);
$ad = Triceps::AutoDrain::makeSharedNoWait($to);

Same as makeShared() but doesn't wait for the drain to complete before returning. May still sleep if an exclusive drain is currently active.

$ad = Triceps::AutoDrain::makeExclusive($to);

Create an exclusive drain on a TrieadOwner and wait for it to complete. Returns the AutoDrain object. Normally the excluded thread should be input-only. Such an input-only thread is allowed to send more data in without blocking (to wait for the app become drained again after that, use the method wait()).

$ad = Triceps::AutoDrain::makeExclusiveNoWait($to);

Same as makeExclusive() but doesn't wait for the drain to complete before returning. May still sleep if a shared or another exclusive drain is currently active.

$ad->wait();

Wait for the drain to complete. Particularly useful after the NoWait creation, but can also be used to wait for the App to become drained again after injecting some rowops through the excluded Triead of the exclusive drain.

$ad->same($ad2);

Check that two AutoDrain references point to the same object.

Thursday, July 11, 2013

no more explicit confessions

It's official: all the code has been converted to the new error handling. Now if anything goes wrong, the Triceps Perl calls just confess right away. No more need for the pattern 'or confess "$!"' that was used throughout the code (though of course you can still use it for handling the other errors).

It also applies to the error checks done by the XS typemaps, these will also confess automatically.

I've also added one more method that doesn't confess: IndexType::getTabtypeSafe(). If the index type is not set into a table type, it will silently return an undef without any error indications.

On a related note, the construction of the Type subclasses has been made nicer in the C++: instead of calling abort() on the major errors, they now throw Exceptions. Mind you, these exceptions are thrown not in the constructors as such but in the chainable methods that set the contents of the types. And they try to be smart enough to preserve the reference count correctness: if the object was not assigned into any reference yet (as is typical for the chained calls), they take care to temporarily increase and decrease the reference count, thus freeing the object, before throwing. Of course, the default reaction to Exceptions is still to dump core, but need be, these exceptions can be caught.

Sunday, July 7, 2013

safe functions in RowHandle

As I'm updating the error reporting in the Perl methods, there is one more class that has grown the safe (non-confessing functions). In RowHandle now the method

$row = $rh->getRow();

confesses if the RowHandle is NULL. The method

$row = $rh->getRowSafe();

returns an undef in this situation, just like getRow() used to, only now it doesn't set the text in $! any more. A consequence is that some of the Aggregator examples that branch directly on checking whether a row handle contains NULL, now had to be changed to use getRowSafe().

The method

$result = $rh->isInTable();

has also been updated for the case when it contains a NULL: now it simply returns 0 (instead of undef) and doesn't set the text in $!.

Saturday, July 6, 2013

findSubIndexSafe

The scalar leakage in Carp::confess was causing an unpleasant issue with the functions that were trying to look up the nested indexes and catch when they went missing. So, similarly to the string conversions, I've added the method findSubIndexSafe() to the TableType and IndexType:

$ixt = $tt->findSubIndexSafe($name);
$ixt = $ixt_parent->findSubIndexSafe($name);

If the name is not found, it silently returns an undef, without setting any error codes.

Eventually the issue with the leakage in confess() would have to be fixed, but for now it's a good enough plug for the most typical cases. I'll need to think about other methods that could use the safe treatment.

Friday, July 5, 2013

carp carpity carp

I've found that calling Carp::confess (more exactly, even the lower-level function Carp::longmess that gets called by Carp::confess and by Triceps's error reporting from the C++ code) in a threaded program leaks the scalars, apparently by leaving garbage on the Perl stack.

The problem seems to be in the line "package DB;" in the middle of one of its internal functions. Perhaps changing the package in the middle of a function is not such a great idea, leaving some garbage on the stack. The most interesting part is that this line can be removed altogether, with no adverse effects, and then the leak stops.

Oh, well, looks like the homebrewn Triceps::confess will be coming soon. And I'd like to find some contact to get the stock Carp fixed.

Thursday, July 4, 2013

string-and-constant conversion errors

As outlined before, I'm going through the the Perl API, changing to the new way of the error reporting by confessing. On of the consequences is that the functions for parsing the string name of a constant into the value got converted too. They now confess on an invalid string. However sometimes it's still convenient to check the string for validity without causing an error, so for that I've added another version of them, with the suffix Safe attached, that still work the old way, returning an undef for an unknown string.

For example, &Triceps::humanStringTracerWhen and Triceps::humanStringTracerWhenSafe.

The same applies the other way around to: the functions converting the integer enumeration value to string will confess when the value is not in the proper enum. The functions with the suffix Safe will still return an undef. Such as &Triceps::tracerWhenHumanString and &Triceps::tracerWhenHumanStringSafe.

The only special case is &Triceps::opcodeString and &Triceps::opcodeStringSafe: neither of them confesses nor returns undef, they print the unknown strings as a combination of bits, but for consistency both names are present anyway, doing the same thing.

No more enqueueing mode for table creation

I've finally got around to get rid of that obsolete enqueuing mode argument for the table creation, which always ended up as EM_CALL nowadays anyway. So, now in Perl the call becomes:

$uint->makeTable($tabType, $name);

In C++ the Table constructor becomes:

Table(Unit *unit, const string &name, const TableType *tt, const RowType *rowt, const RowHandleType *handt);

And the convenience wrapper in the TableType:

Onceref<Table> makeTable(Unit *unit, const string &name) const;

Yeah, it's kind of weird that in Perl the method makeTable() is defined on Unit, and in C++ on TableType. But if I remember correctly, it has to do with avoiding the circular dependency in the C++ header files.

Tuesday, July 2, 2013

Facet reference, C++

The general functioning of a facet is the same in C++ as in Perl, so please refer to the part 1 of the Perl description for this information.

However the construction of a Facet is different in C++. The import is still the same: you call TrieadOwner::importNexus() method or one of its varieties and it returns a Facet. However the export is different: first you construct a Facet from scratch and then give it to TrieadOwner::exportNexus() to create a Nexus from it.

In the C++ API the Facet has a notion of being imported, very much like say the FnReturn has a notion of being initialized. When a Facet is first constructed, it's not imported. Then exportNexus() creates the nexus from the facet, exports it into the App, and also imports the nexus information back into the facet, marking the facet as imported. It returns back a reference to the exact same Facet object, only now that object becomes imported. Obviously, you can use either the original or returned reference, they point to the same object. Once a Facet has been imported, it can not be modified any more. The Facet object returned by the importNexus() is also marked as imported, so it can not be modified either. And also an imported facet can not be exported again.

However exportNexus() has an exception. If the export is done with the argument import set to false, the facet object will be left unchanged and not marked as imported. A facet that is not marked as imported can not be used to send or receive data. Theoretically, it can be used to export another nexus, but practically this would not work because that would be an attempt to export another nexus with the same name from the same thread. In reality such a Facet can only be thrown away, and there is not much use for it. You can read its components and use them to construct another Facet but that's about it.
It might be more convenient to use the TrieadOwner::makeNexus*() methods to build a Facet object rather than building it directly. In either case, the methods are the same and accept the same arguments, just the Facet methods return a Facet pointer while the nexus maker methods return a NexusMaker pointer.

The Facet class is defined in app/Facet.h. It inherits from Mtarget for an obscure reason that has to do with App topology analysis but it's intended to be used from one thread only.

You don't have to keep your own references to all your Facets. The TrieadOwner will keep a reference to all the imported Facets, and they will not be destroyed while the TrieadOwner exists (and this applies to Perl as well).

    enum {
        DEFAULT_QUEUE_LIMIT = 500,
    };

The default value used for the nexus queue limit (as a count of Xtrays, not rowops). Since the reading from the nexus involves double-buffering, the real queue size might grow up to twice that amount.

static string buildFullName(const string &tname, const string &nxname);

Build the full nexus name from its components.

Facet(Onceref<FnReturn> fret, bool writer);

Create a Facet, initially non-imported (and non-exported). The FnReturn object fret defines the set of labels in the facet (and nexus), and the name of FnReturn also becomes the name of the Facet, and of the Nexus. The writer flag determines whether this facet will become a writer (if true) or a reader (if false) when a nexus is created from it. If the nexus gets created without importing the facet back, the writer flag doesn't matter and can be set either way.

The FnReturn should generally be not initialized yet. The Facet constructor will check if FnReturn already has the labels _BEGIN_ and _END_ defined, and if either is missing, will add it to the FnReturn, then initialize it. If both _BEGIN_ and _END_ are already present, then the FnReturn can be used even if it's already initialized. But if any of them is missing, FnReturn must be not initialized yet, otherwise the Facet constructor will fail to add these labels.

The same FnReturn object may be used to create only one Facet object. And no, you can not import a Facet, get an FnReturn from it, then use it to create another Facet.

If anything goes wrong, the constructor will not throw but will remember the error, and later the exportNexus() will find it and throw an Exception from it.

static Facet *make(Onceref<FnReturn> fret, bool writer);

Same as the constructor, used for the more convenient operator priority for the chained calls.

static Facet *makeReader(Onceref<FnReturn> fret);
static Facet *makeWriter(Onceref<FnReturn> fret);

Syntactic sugar around the constructor, hardcoding the writer flag.

Normally the facets are constructed and exported with the chained calls, like:

Autoref<Facet> myfacet = to->exportNexus(
   Facet::makeWriter(FnReturn::make("My")->...)
   ->setReverse()
   ->exportTableType(Table::make(...)->...)
);

Because of this, the methods that are used for post-construction return the pointer to the original Facet object. They also almost never throw the Exceptions, to prevent the memory leaks through the orphaned Facet objects. The only way an Exception might get thrown is on an attempt to use these methods on an already imported Facet. Any errors get collected, and eventually exportNexus() will find them and properly throw an Exception, making sure that the Facet object gets properly disposed of.

Facet *exportRowType(const string &name, Onceref<RowType> rtype);

Add a row type to the Facet. May throw an Exception if the the facet is already imported. On other errors remembers them to be thrown on an export attempt.

Facet *exportTableType(const string &name, Autoref<TableType> tt);

Add a table type to the Facet. May throw an Exception if the the facet is already imported. On other errors remembers them to be thrown on an export attempt. The table type must also be deep-copyable and contain no errors. Not sure if I described this before, but if the deep copy can not proceed (say, a table type involves a Perl sort condition with a direct reference to the compiled Perl code) the deepCopy() method must still return a newly created object but remember the error inside it. Later when the table type is initialized, that object's initialization must return this error. The exportTableType() does a deep copy then initializes the copied table type. If this detects any errors, they get remembered and cause an Exception later in exportNexus().

Facet *setReverse(bool on = true);

Set (or clear) the nexus reverse flag. May throw an Exception if the the facet is already imported.

Facet *setQueueLimit(int limit);

Set the nexus queue limit. May throw an Exception if the the facet is already imported.

Erref getErrors() const;

Get the collected errors, so that they can be found without an export attempt.

bool isImported() const;

Check whether this facet is imported.

The rest of the methods are the same as in Perl. They can be used even if the facet is not imported.

bool isWriter() const;

Check whether this is a writer facet (or if returns false, a reader facet).

bool isReverse() const;

Check whether the underlying nexus is reverse.

int queueLimit() const;

Get the queue size limit of the nexus. Until the facet is exported, this will always return the last value set by setQueueLimit(). However if the nexus is reverse, on import the value will be changed to a very large integer value, currently INT32_MAX, and on all the following calls this value will be returned. Technically speaking, the queue size of the reverse nexuses is not unlimited, it's just very large, but in practice it amounts to the same thing.

FnReturn *getFnReturn() const;

Get the FnReturn object. If you plan to destroy the Facet object soon after this method is called, make sure that you put the FnReturn pointer into an Autoref first.

const string &getShortName() const;

Get the short name, AKA "as-name", which is the same as the FnReturn's name.

const string &getFullName() const;

Get the full name of the nexus imported through this facet. If the facet is not imported, will return an empty string.

typedef map<string, Autoref<RowType> > RowTypeMap;const RowTypeMap &rowTypes() const;

Get the map of the defined row types. Returns the reference to the Facet's internal map object.

typedef map<string, Autoref<TableType> > TableTypeMap;
const TableTypeMap &tableTypes() const;

Get the map of defined table types. Returns the reference to the Facet's internal map object.

RowType *impRowType(const string &name) const;

Find a single row type by name. If the name is not known, returns NULL.

TableType *impTableType(const string &name) const;

Find a single table type by name. If the name is not known, returns NULL.

Nexus *nexus() const;

Get the nexus of this facet. If the facet is not imported, returns NULL.

int beginIdx() const;
int endIdx() const;

Return the indexes (as in "integer offset") of the _BEGIN_ and _END_ labels in FnReturn.

bool flushWriter();

Flush the collected rowops into the nexus as a single Xtray. If there is no data collected, does nothing. Returns true on a successful flush (even if there was no data collected), false if the Triead was requested to die and thus all the data gets thrown away.

Facet reference, Perl, part 2

As mentioned before, a Facet object is returned from either a nexus creation or nexus import. Then the owner thread can work with it.

$result = $fa->same($fa2);

Check whether two references point to the same Facet object.

$name = $fa->getShortName();

Get the short name of the facet (AKA "as-name", with which it has been imported).

$name = $fa->getFullName();

Get the full name of the nexus represented by this facet. The name consists of two parts separated by a slash, "$thread/$nexus".

$result = $fa->isWriter();

Check whether this is a writer facet (i.e. writes to the nexus). Each facet is either a writer or a reader, so if this method returns 0, it means that this is a reader facet.

$result = $fa->isReverse();

Check whether this facet represents a reverse nexus.

$limit = $fa->queueLimit();

Get the queue size limit of the facet's nexus. I think I forgot to mention it in the Nexus reference, but for a reverse nexus the returned value will be a large integer (currently INT32_MAX but the exact value might change in the future). And if some different limit value was specified during the creation of the reverse nexus, it will be ignored.

$limit = &Triceps::Facet::DEFAULT_QUEUE_LIMIT();

The constant of the default queue size limit that is used for the nexus creation, unless explicitly overridden.

$fret = $fa->getFnReturn();

Get the FnReturn object of this facet. This FnReturn will have the same name as the facet's short name, and it has a special symbiotic relation with the Facet object. Its use depends on whether this is a reader or writer facet. For a writer facet, sending rowops to the labels in FnReturn (directly or by chaining them off the other labels) causes these rowops to be buffered for sending into the nexus. For a reader facet, you can either chain your logic directly off the FnReturn's labels, or push an FnBinding onto it as usual.

$nexus = $fa->nexus();

Get the facet's nexus. There is not a whole lot that can be done with the nexus object, just get the introspection information, and the same information can be obtained directly with the facet's methods.

$idx = $fa->beginIdx();

Index (as in "integer offset", not a table index) of the _BEGIN_ label in the FnReturn's set of labels. There probably isn't much use for this method, and its name is somewhat confusing.

$idx = $fa->endIdx();

Index (as in "integer offset", not a table index) of the _END_ label in the FnReturn's set of labels. There probably isn't much use for this method, and its name is somewhat confusing.

$label = $fa-> getLabel($labelName);

Get a label from FnReturn by name. This is a convenience method, equivalent to $fa->getFnReturn()->getLabel($labelName). Confesses if the label with this name is not found.

@rowTypes = $fa->impRowTypesHash();

Get ("import") the whole set of row types exported through the nexus. The result is an array containing the name-value pairs, values being the imported row types. This array can be assigned into a hash to populate it. As it happens, the pairs will be ordered by name in the lexicographical order but there are no future guarantees about it.

The actual import of the types is done only once, when the nexus is imported to create the facet, and the repeated calls of the imp* methods will return the same objects.

$rt = $fa->impRowType($rtName);

Get ("import") one row type by name. If the name is not known, will confess.

@tableTypes = $fa->impTableTypesHash();

Get ("import") the whole set of table types exported through the nexus. The result is an array containing the name-value pairs, values being the imported table types. This array can be assigned into a hash to populate it. As it happens, the pairs will be ordered by name in the lexicographical order but there are no future guarantees about it.

The actual import of the types is done only once, when the nexus is imported to create the facet, and the repeated calls of the imp* methods will return the same objects.

$tt = $fa->impTableType($ttName);

Get ("import") one table type by name. If the name is not known, will confess.

$result = $fa-> flushWriter();

Flush the collected buffered rowops to the nexus as a single Xtray. If there are no collected rowops, does nothing. Returns 1 if the flush succeeded (even if there was no data to send), 0 if this thread was requested to die and thus all the collected data gets thrown away, same as for the TrieadOwner::flushWriters(). The rules for when this method may be called is also the same: only after calling readyReady(), or it will confess.

If this facet is in an input-only Triead, this call may sleep if a drain is currently active, until the drain is released.

Sergey Babkin on CEP and stuff