Sergey Babkin on CEP and stuff: errors


Saturday, February 15, 2025

Asynchronous programming 10 - error handling

What if a computation fails? Then the future still gets completed but has to return an error. At the very least, if the future can return only one value, the returned object should have a place for error indication. But a better asynchronous library would have a place for error indication right in the futures. BTW, to do it right, it shouldn't be just an error code, it should be a proper error object that allows nested errors, and ideally also error lists, like Error in Triceps.

So suppose that our futures do have an error indication. How can these errors be handled?

Chaining between futures is easy: the error just chains through in the same way as the value. A nice consequence is that the asynchronous lock pattern and other patterns like that just work transparently, releasing the lock in any case, error or no error. However an error object may have references to the objects that we might not want to be stuck in the state of an asynchronous lock. And we don't want the unrelated code that locks the mutex next to start with an error. An error is applicable even to a void future, so it would get stuck even in one of those. So there should be a separate case of chaining to a void future that just passes through the completion but not the error. If your library doesn't have one, you can make one with a function that is chained to the first future and freshly completes the second future, ignoring the error.
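To make the difference between the two kinds of chaining concrete, here is a minimal sketch. The Future class in it is a made-up toy (an error is just a string, empty meaning success), not the API of any particular library, and the names chain() and chainIgnoringError() are my assumptions too.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal future: completes once, with an optional error text.
struct Future {
    bool done = false;
    std::string error;  // empty means success
    std::vector<std::function<void(Future &)>> callbacks;

    // Run fn when this future completes (immediately if already completed).
    void onComplete(std::function<void(Future &)> fn) {
        if (done) fn(*this); else callbacks.push_back(std::move(fn));
    }
    // Complete the future; any repeated completion attempts are ignored.
    void complete(const std::string &err = "") {
        if (done) return;
        done = true;
        error = err;
        auto pending = std::move(callbacks);
        callbacks.clear();
        for (auto &cb : pending) cb(*this);
    }
};
using FuturePtr = std::shared_ptr<Future>;

// Ordinary chaining: the error travels along with the completion.
void chain(const FuturePtr &from, const FuturePtr &to) {
    from->onComplete([to](Future &f) { to->complete(f.error); });
}

// Chaining to a void future: only the completion passes through,
// so the next user of the lock doesn't start with someone else's error.
void chainIgnoringError(const FuturePtr &from, const FuturePtr &to) {
    from->onComplete([to](Future &) { to->complete(); });
}
```

With this, the "unlock" future of an asynchronous mutex would be attached with chainIgnoringError() while the result of the operation keeps using chain().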

Chaining functions is more complicated. In the simplest case we can just let the function check for error and handle it (and that's a good reason to have the whole input future as an argument and not just the value from it). But that means doing a lot of the same boilerplate error propagation code in a lot of functions.

The other option is to have the chaining code propagate the error directly from the input future to the result promise, ignoring the function in this case, and basically cancelling the chain. This is very much like how the exceptions work in normal programming, just skipping over the rest of the function and returning an error, so this behavior should be the "normal" chaining, while a function that handles the error in its input future is more like a "catch" or "finally" statement. Note that if the function gets skipped in case of an error, it doesn't really need to see the whole input future, it could as well get just the value from it. With this option, if you've prepared a million-long chain for a loop (not a great idea, better generate each iteration one by one), it will all get cancelled on the first error.
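Here is a rough sketch of what such a "normal" chaining could look like. Again, the Future is a toy with an int value and a string error, and the name then() is my assumption, not a reference to a real library.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical future carrying either an int value or an error text.
struct Future {
    bool done = false;
    int value = 0;
    std::string error;  // empty means success
    std::vector<std::function<void()>> callbacks;

    void finish() {
        done = true;
        auto pending = std::move(callbacks);
        callbacks.clear();
        for (auto &cb : pending) cb();
    }
    void setValue(int v) { if (!done) { value = v; finish(); } }
    void setError(const std::string &e) { if (!done) { error = e; finish(); } }
    void onComplete(std::function<void()> fn) {
        if (done) fn(); else callbacks.push_back(std::move(fn));
    }
};
using FuturePtr = std::shared_ptr<Future>;

// "Normal" chaining: on error the function is skipped and the error goes
// straight to the result, like an exception skipping the rest of a block.
// Since the function never sees an error, it gets just the value.
FuturePtr then(const FuturePtr &input, std::function<int(int)> fn) {
    auto result = std::make_shared<Future>();
    Future *in = input.get();  // the callback is owned by the input itself
    input->onComplete([in, result, fn] {
        if (!in->error.empty())
            result->setError(in->error);  // fn is never called
        else
            result->setValue(fn(in->value));
    });
    return result;
}
```

An error set on the head of a chain built with then() falls through every step without calling any of the functions.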

The third option (and these options are not mutually exclusive, they all can be available to use as needed) is to chain a function specifically as an error handler, to be run on error only. Which is even more like a "catch" statement than the previous case. But there is a catch: this makes the chain branch, which means that eventually the normal and error paths have to join back together with an AllOf. Which is not only a pain to always add explicitly but it also implies that the error path somehow has to complete even if there is no error, so again there has to be a chain cancellation logic for the error handlers but working in the opposite way, ignoring the functions on success. That's probably not worth the trouble, so the handling of errors by a separate function makes sense mostly as a one-off where depending on the success of the input future either the normal function or the error handler function gets called, sending its result to the same promise, so the fork gets immediately joined back. This is just like doing an if-else in one function but has the benefit of allowing the composition, reusing the same error handling function with many normal processing functions, being truly like a "catch" statement. This pattern is particularly convenient for adding some higher-level details to the error object (as in building a stack trace in normal programming).
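The one-off form with the immediate join can be sketched like this. The toy Future and the names thenOrCatch() and explain() are my assumptions made up for the illustration, and a string stands in for a proper nested error object.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical future carrying either an int value or an error text.
struct Future {
    bool done = false;
    int value = 0;
    std::string error;  // empty means success
    std::vector<std::function<void()>> callbacks;

    void finish() {
        done = true;
        auto pending = std::move(callbacks);
        callbacks.clear();
        for (auto &cb : pending) cb();
    }
    void setValue(int v) { if (!done) { value = v; finish(); } }
    void setError(const std::string &e) { if (!done) { error = e; finish(); } }
    void onComplete(std::function<void()> fn) {
        if (done) fn(); else callbacks.push_back(std::move(fn));
    }
};
using FuturePtr = std::shared_ptr<Future>;
using OnValue = std::function<int(int)>;
using OnError = std::function<std::string(const std::string &)>;

// Depending on the outcome of the input, exactly one of the two functions
// runs, and both feed the same result future, so the fork joins at once.
FuturePtr thenOrCatch(const FuturePtr &input, OnValue onValue, OnError onError) {
    auto result = std::make_shared<Future>();
    Future *in = input.get();  // the callback is owned by the input itself
    input->onComplete([in, result, onValue, onError] {
        if (in->error.empty())
            result->setValue(onValue(in->value));
        else
            result->setError(onError(in->error));
    });
    return result;
}

// A reusable handler that adds a higher-level explanation to any error.
OnError explain(const std::string &what) {
    return [what](const std::string &nested) { return what + "\n  " + nested; };
}
```

The same explain() handler can be paired with any number of normal processing functions, which is the composition benefit mentioned above.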

The next item is the error handling in AllOf, or in its generalization to map-reduce. How much do we care that some of the branches have failed? A typical case would probably be where we want all of them to succeed, so for that AllOf should collect all the errors from all the branches into one error object.
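A minimal sketch of an AllOf that collects the errors from all the branches might look as follows. The toy Future here keeps a list of error strings as a stand-in for a proper error object with nested lists, and allOf() is a made-up name.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical void future whose error is a list of messages.
struct Future {
    bool done = false;
    std::vector<std::string> errors;  // empty means success
    std::vector<std::function<void()>> callbacks;

    void complete(std::vector<std::string> errs = {}) {
        if (done) return;
        done = true;
        errors = std::move(errs);
        auto pending = std::move(callbacks);
        callbacks.clear();
        for (auto &cb : pending) cb();
    }
    void onComplete(std::function<void()> fn) {
        if (done) fn(); else callbacks.push_back(std::move(fn));
    }
};
using FuturePtr = std::shared_ptr<Future>;

// Completes when all the inputs have completed, with the errors of all
// the failed branches gathered together (in the order of completion).
FuturePtr allOf(const std::vector<FuturePtr> &inputs) {
    auto result = std::make_shared<Future>();
    if (inputs.empty()) {
        result->complete();
        return result;
    }
    auto remaining = std::make_shared<std::size_t>(inputs.size());
    auto collected = std::make_shared<std::vector<std::string>>();
    for (const auto &input : inputs) {
        Future *in = input.get();  // the callback is owned by the input itself
        input->onComplete([in, result, remaining, collected] {
            collected->insert(collected->end(), in->errors.begin(), in->errors.end());
            if (--*remaining == 0)
                result->complete(*collected);
        });
    }
    return result;
}
```

This sketch is single-threaded; with real parallel completions the counter and the collected list would need a lock or atomics.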

What about a semaphore? There in a natural way any returned error would cause the whole semaphore to be cancelled. If the semaphore represents a limited-parallelism loop, that's what we'd probably want anyway. Well, due to the parallelism, there might be hiccups where some more iterations get scheduled by the other futures completing normally at the same time as the one with the error. One possible race comes from the semaphore logic picking one future from the run queue, executing it, and then coming back when its result promise completes. The queue itself is unlocked after the head future got picked from it, so another completed future would pick the next head future from the queue before the first one completes the loop of error propagation. Another possible race comes from the part where it's OK to add more work to the semaphore while running on one of its own chains. Suppose that instead of pre-generating all the iteration head futures in advance we just put into the semaphore one future with a function that on running generates the head of one iteration (but doesn't start it yet!), then reattaches itself to the semaphore, and then lets that iteration run by chaining it to the input future. If this iteration generation function doesn't check for errors, the reattached copy can grab a successfully completed future and run an iteration. So it would help to also add explicit logic to the semaphore that just cancels all the outstanding incoming futures once one of them gets an error. And the iteration generation function could also pay attention to errors, to stop generating once an error is seen.

Of course, not every semaphore needs to be self-collapsing on error, some of them are used for general synchronization, and should ignore the errors.

The most complicated question of error handling is: can we stop the ongoing parallel iterations of a parallel loop when one iteration gets an error? This can be done by setting a flag in a common context, checking it in every function, and bailing out immediately with a special error if this flag is set. This is kind of a pain to do all over the place. So maybe this can be folded into the futures/promises themselves: create a cancellation object and attach it to the futures, so when completing a future with a cancellation object attached, it would check if the cancellation is true and replace the result with a cancellation error instead. Note that this would not happen everywhere but only on the futures where the cancellation object is attached. So when you call into a library, you can't attach the cancellation objects to the futures created inside the library. And you can't always quickly cancel the future that waits for the library call to return because the library might still be using some memory in your state (although this of course depends a lot on the library API, if all the memory is controlled by reference counters then we don't care, we can just let it run and ignore the result).
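The idea of folding the flag check into the future itself can be sketched like this. The Cancellation and Future classes are hypothetical toys, and the text of the cancellation error is my assumption.

```cpp
#include <atomic>
#include <memory>
#include <string>

// Hypothetical cancellation object shared by all the futures of one operation.
struct Cancellation {
    std::atomic<bool> cancelled{false};
};

// Hypothetical future that checks its cancellation object on completion.
struct Future {
    bool done = false;
    int value = 0;
    std::string error;  // empty means success
    std::shared_ptr<Cancellation> cancel;  // may be NULL: never cancelled

    void setValue(int v) {
        if (done) return;
        if (cancel && cancel->cancelled)
            error = "cancelled";  // the computed value is dropped
        else
            value = v;
        done = true;
    }
};
```

Only the futures that have the object attached react to it; a future created inside a library without the attachment completes as usual.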

Can we propagate the cancellation object between futures, so that they would even go through the insides of a library? Generally, yes, we can do it on chaining. But that takes some care.

First, the propagation must stop once we reach the end of the whole logical operation, and also must stop when we go to the void futures for the patterns like the asynchronous mutex. And stop even for non-void futures in the patterns like the cache, where one caller asking to cancel the read shouldn't cancel the read for everyone.

Second, the functions that create intermediate promises from scratch must have a way to propagate the cancellations from their inputs to these newly created promises.

Third, the libraries need to be able to do their own cancellations too, so it's not a single cancellation object but a set of cancellation objects per future, with the overhead of checking them all on every step (and yes, also with overhead of attaching them all to every step). Although if the sets are not modified often, maybe an optimized version can be devised where the overhead is taken at the set creation time and then the set consolidates the state of all the cancellations in it, making it necessary to attach only one set and check the state of only one set.
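One way such a consolidated set could work, as a rough single-threaded sketch: each cancellation object remembers the sets it belongs to and pushes its state into them when triggered. All the names here are made up, and a real version would need locking around the membership lists.

```cpp
#include <atomic>
#include <memory>
#include <vector>

// Consolidated state of several cancellations: the only thing a future
// has to carry and to check.
struct CancellationSet {
    std::atomic<bool> cancelled{false};
};

// Hypothetical cancellation object that knows which sets include it.
struct Cancellation {
    std::atomic<bool> cancelled{false};
    std::vector<std::weak_ptr<CancellationSet>> sets;

    // The cost of propagation is paid here, once per cancellation,
    // and not on every completion of every future.
    void cancel() {
        cancelled = true;
        for (auto &weak : sets)
            if (auto set = weak.lock())
                set->cancelled = true;
    }
};

// Build a set from its members, paying the attachment overhead up front.
std::shared_ptr<CancellationSet> makeSet(
        const std::vector<std::shared_ptr<Cancellation>> &members) {
    auto set = std::make_shared<CancellationSet>();
    for (const auto &m : members) {
        m->sets.push_back(set);
        if (m->cancelled)
            set->cancelled = true;  // a member was cancelled before joining
    }
    return set;
}
```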

Fourth, what about the system calls to the OS, which on a microkernel OS would likely translate to calls in the other processes? The cancellation state cannot be read from other address spaces. Which basically means that as we cross the address space boundary, we need to create a matching cancellation object (and here treating the whole set of cancellation objects as one object helps too) on the other side, remember this match on our side, and then have a system call that would propagate the cancellation to the other side. Fairly complicated but I think doable. Of course, at some point this whole path will get down to the hardware, and there we won't be able to actually interrupt an ongoing operation, but we can arrange to ignore its result and return. And there are things that can't be ignored, for example an app might suddenly stop caring whether its buffer write has succeeded or not, but a filesystem can't ignore whether a metadata block write succeeded or not. However this filesystem shouldn't keep the app waiting, if the app has lost interest, the filesystem can sort out its metadata writes in the background.

Fifth, between this filesystem write example and the cache example, a cancellation flag also needs to have a future connected to it, that would get completed with a cancellation error when the cancellation is triggered. We can then chain from this future directly to the result future of the cache read or block write, "overtaking" the normal result to essentially do an "anyOf", with the first completion setting the result (including error) into the future and any following completion attempts to set the result getting ignored. A catch is that when one path completes, the other will still hold a reference on the result future, potentially causing the unpleasant memory leaks. And also the cancellation future would keep accumulating these chainings to it after each operation under it gets normally completed. Maybe the cancellation objects would be short-lived and this wouldn't be a real problem. Or maybe this will require to think of a way for un-chaining once it gets overtaken by completion of another path.

The final thing to say is that the C++ coroutines don't seem smart enough to translate the error handling in promises to look like exception handling at high level. And this is a very important subject, so maybe the coroutines are not the answer yet.

Friday, December 26, 2014

error reporting in the templates

When writing the Triceps templates, it's always good to make them report any usage errors in the terms of the template (though the extra detail doesn't hurt either). That is, if a template builds a construction out of the lower-level primitives, and one of these primitives fails, the good approach is to not just pass through the error from the primitive but wrap it into a high-level explanation.

This is easy to do if the primitives report the errors by returning them directly, as Triceps did in the version 1. Just check for the error in the result, and if an error is found, add the high-level explanation and return it further.

It becomes more difficult when the errors are reported like the exceptions, which means in Perl by die() or confess(). The basic handling is easy, there is just no need to do anything to let the exception propagate up, but adding the extra information becomes difficult. First, you've got to explicitly check for these errors by catching them with eval() (which is more difficult than checking for the errors returned directly), and only then can you add the extra information and re-throw. And then there is this pesky problem of the stack traces: if the re-throw uses confess(), it will likely add a duplicate of at least a part of the stack trace that came with the underlying error, and if it uses die(), the stack trace might be incomplete since the native XS code includes the stack trace only to the nearest eval() to prevent the same problem when unrolling the stacks mixed between Perl and Triceps scheduling.

Because of this, some of the template error reporting got worse in Triceps 2.0.

Well, I've finally come up with the solution. The solution is not even limited to Triceps, it can be used with any kind of Perl programs. Here is a small example of how this solution is used, from Fields::makeTranslation():

    my $result_rt = Triceps::wrapfess
        "$myname: Invalid result row type specification:",
        sub { Triceps::RowType->new(@rowdef); };

The function Triceps::wrapfess() takes care of wrapping the confessions. It's very much like the try/catch, only it has the hardcoded catch logic that adds the extra error information and then re-throws the exception.

Its first argument is the error message that describes the high-level problem. This message will get prepended to the error report when the error propagates up (and the original error message will get a bit of extra indenting, to nest under that high-level explanation).

The second argument is the code that might throw an error, like the try-block. The result from that block gets passed through as the result of wrapfess().

The full error message might look like this:

Triceps::Fields::makeTranslation: Invalid result row type specification:
Triceps::RowType::new: incorrect specification:
    duplicate field name 'f1' for fields 3 and 2
    duplicate field name 'f2' for fields 4 and 1
Triceps::RowType::new: The specification was: {
    f2 => int32[]
    f1 => string
    f1 => string
    f2 => float64[]
} at blib/lib/Triceps/Fields.pm line 209.
    Triceps::Fields::__ANON__ called at blib/lib/Triceps.pm line 192
    Triceps::wrapfess('Triceps::Fields::makeTranslation: Invalid result row type spe...', 'CODE(0x1c531e0)') called at blib/lib/Triceps/Fields.pm line 209
    Triceps::Fields::makeTranslation('rowTypes', 'ARRAY(0x1c533d8)', 'filterPairs', 'ARRAY(0x1c53468)') called at t/Fields.t line 186
    eval {...} called at t/Fields.t line 185

It contains both the high-level and the detailed description of the error, and the stack trace.

The stack trace doesn't get indented, no matter how many times the message gets wrapped. wrapfess() uses a slightly dirty trick for that: it assumes that the error messages are indented by the spaces while the stack trace from confess() is indented by a single tab character. So the extra spaces of indenting are added only to the lines that don't start with a tab.
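The indenting trick itself is simple enough to show in a few lines. This sketch is in C++ rather than Perl and is not the actual wrapfess() code; the function name nestMessage() and the indent of four spaces are my assumptions for the illustration.

```cpp
#include <sstream>
#include <string>

// Prepend a high-level message and indent the lines of the nested message,
// leaving the tab-indented stack trace lines untouched.
std::string nestMessage(const std::string &high, const std::string &nested) {
    std::string out = high + "\n";
    std::istringstream lines(nested);
    std::string line;
    while (std::getline(lines, line)) {
        // Only the non-empty lines that don't start with a tab get indented.
        if (!line.empty() && line[0] != '\t')
            out += "    ";
        out += line + "\n";
    }
    return out;
}
```

Wrapping the result again adds one more level of indenting to the messages while the trace lines stay where they are.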

Note also that even though wrapfess() uses eval(), there is no eval above it in the stack trace. That's the other part of the magic: since that eval is not meaningful, it gets cut from the stack trace, and wrapfess() also uses it to find its own place in the stack trace, the point from which a simple re-confession would dump the duplicate of the stack. So it cuts the eval and everything under it in the original stack trace, and then does its own confession, inserting the stack trace again. This works very well for the traces thrown by the XS code, which actually doesn't write anything below that eval; wrapfess() then adds the missing part of the stack.

Wrapfess() can do a bit more. Its first argument may be another code reference that generates the error message on the fly:

    my $result_rt = Triceps::wrapfess sub {
            "$myname: Invalid result row type specification:"
        },
        sub { Triceps::RowType->new(@rowdef); };

In this small example it's silly but if the error diagnostics is complicated and requires some complicated printing of the data structures, it will be called only if the error actually occurs, and the normal code path will avoid the extra overhead.

It gets even more flexible: the first argument of wrapfess() might also be a reference to a scalar variable that contains either a string or a code reference. I'm not sure yet if it will be very useful but it was easy to implement. The idea there is that it allows to write only one wrapfess() call and then change the error messages from inside the second argument, providing different error reports for its different sections. Something like this:

    my $eref;
    return Triceps::wrapfess \$eref,
        sub {
            $eref = "Bad argument foo";
            buildTemplateFromFoo();
            $eref = sub {
                my $bardump = $argbar->dump();
                $bardump =~ s/^/    /mg;
                return "Bad argument bar:\n bar value is:\n$bardump";
            };
            buildTemplateFromBar();
            ...
        };

It might be too flexible, we'll see how it works.

Internally, wrapfess() uses the function Triceps::nestfess() to re-throw the error. Nestfess() can also be used directly, like this:

    eval {
        buildTemplatePart();
    };
    if ($@) {
        Triceps::nestfess("High-level error message", $@);
    }

The first argument is the high-level descriptive message to prepend, the second argument is the original error caught by eval. Nestfess() is responsible for all the smart processing of the indenting and stack traces, wrapfess() is really just a bit of syntactic sugar on top of it.

Thursday, July 11, 2013

no more explicit confessions

It's official: all the code has been converted to the new error handling. Now if anything goes wrong, the Triceps Perl calls just confess right away. No more need for the pattern 'or confess "$!"' that was used throughout the code (though of course you can still use it for handling the other errors).

It also applies to the error checks done by the XS typemaps, these will also confess automatically.

I've also added one more method that doesn't confess: IndexType::getTabtypeSafe(). If the index type is not set into a table type, it will silently return an undef without any error indications.

On a related note, the construction of the Type subclasses has been made nicer in the C++: instead of calling abort() on the major errors, they now throw Exceptions. Mind you, these exceptions are thrown not in the constructors as such but in the chainable methods that set the contents of the types. And they try to be smart enough to preserve the reference count correctness: if the object was not assigned into any reference yet (as is typical for the chained calls), they take care to temporarily increase and decrease the reference count, thus freeing the object, before throwing. Of course, the default reaction to Exceptions is still to dump core, but if need be, these exceptions can be caught.

Sunday, July 7, 2013

safe functions in RowHandle

As I'm updating the error reporting in the Perl methods, there is one more class that has grown the safe (non-confessing) functions. In RowHandle now the method

$row = $rh->getRow();

confesses if the RowHandle is NULL. The method

$row = $rh->getRowSafe();

returns an undef in this situation, just like getRow() used to, only now it doesn't set the text in $! any more. A consequence is that some of the Aggregator examples that branch directly on checking whether a row handle contains NULL, now had to be changed to use getRowSafe().

The method

$result = $rh->isInTable();

has also been updated for the case when it contains a NULL: now it simply returns 0 (instead of undef) and doesn't set the text in $!.

Friday, July 5, 2013

carp carpity carp

I've found that calling Carp::confess (more exactly, even the lower-level function Carp::longmess that gets called by Carp::confess and by Triceps's error reporting from the C++ code) in a threaded program leaks the scalars, apparently by leaving garbage on the Perl stack.

The problem seems to be in the line "package DB;" in the middle of one of its internal functions. Perhaps changing the package in the middle of a function is not such a great idea, leaving some garbage on the stack. The most interesting part is that this line can be removed altogether, with no adverse effects, and then the leak stops.

Oh, well, looks like the homebrewed Triceps::confess will be coming soon. And I'd like to find some contact to get the stock Carp fixed.

Tuesday, June 18, 2013

$! !!!!!!

Turns out, the more recent versions of Perl (starting with 5.16.4 or maybe even earlier) have changed the way they treat the error variable $!. This special variable can not be assigned any arbitrary error text any more, now they want it to be a proper integer OS error code. So all the Triceps error reporting through $! breaks on the newer versions of Perl. The code still works, except that the text of the errors can not be extracted.

Well, looks like it's about time to bite the bullet and finally convert everything to the newer and better error reporting with confessions.

Monday, March 4, 2013

more printf for the errors

I've been using strprintf() repeatedly for the error messages and exceptions, and I've come up with a better way for it.

First, I've added a var-args version of strprintf(), in common/Strprintf.h:

string vstrprintf(const char *fmt, va_list ap);

You can use it to create strings from other functions taking the printf-like arguments.
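For illustration, here is how a function with this signature could be built on the standard vsnprintf(), and how another printf-like function can be layered on top of it. This is a stand-in sketch, not the actual Triceps code, and labeled() is a hypothetical name.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// A stand-in with the same signature as the function described above.
std::string vstrprintf(const char *fmt, va_list ap) {
    // The first pass over a copy of the arguments finds the length.
    va_list ap2;
    va_copy(ap2, ap);
    int n = std::vsnprintf(nullptr, 0, fmt, ap2);
    va_end(ap2);
    if (n < 0)
        return std::string();  // formatting error

    // The second pass does the actual formatting.
    std::vector<char> buf(n + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, ap);
    return std::string(buf.data(), n);
}

// A hypothetical printf-like function built on top of it.
std::string labeled(const char *label, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::string msg = vstrprintf(fmt, ap);
    va_end(ap);
    return std::string(label) + ": " + msg;
}
```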

Next go the extensions to the Errors class. The consistent theme there is "check if the Errors reference (Erref) is NULL, if it is, allocate a new Errors, and then add a formatted error message to it". So I've added the new methods not to Errors but to Erref. They check if the Erref object is NULL, allocate a new Errors object into it if needed, and then format the arguments. The simplest one is:

void f(const char *fmt, ...);

It adds a simple formatted message, always marked as an error. You use it like this:

Erref e; // initially NULL
...
e.f("a message with integer %d", n);

The message may be multi-line, it will be split appropriately, like in Errors::appendMultiline().

The next one is even smarter:

bool fAppend(Autoref<Errors> clde, const char *fmt, ...);

It first checks that the child errors object is not NULL and contains an error, and if it does then it goes through the dance of allocating a new Errors object if needed, and appends the formatted message and the child errors. The message goes before the child errors, unlike the order of the arguments in the method signature. So you can use it blindly like this to do the right thing:

Autoref<Errors> checkSubObject(int idx);
...
for (int i = 0; i < sz; i++)
    e.fAppend(checkSubObject(i), "error in the sub-object %d:", i);

Same as before, you can use the multi-line error messages.
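To show the logic of these two methods in isolation, here is a simplified sketch. The Errors and Erref in it are cut-down stand-ins and not the real Triceps classes: the nesting is modelled by plain indenting, the multi-line splitting is left out, and the meaning of the bool result (true if something was appended) is my assumption.

```cpp
#include <cstdarg>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins for the Triceps Errors and Erref, for illustration only.
struct Errors {
    bool error = false;
    std::vector<std::string> lines;  // messages, the nested ones indented
};

struct Erref {
    std::shared_ptr<Errors> errs;  // initially NULL

    // Append a formatted message, allocating the Errors on the first use.
    void f(const char *fmt, ...) {
        va_list ap;
        va_start(ap, fmt);
        add(format(fmt, ap));
        va_end(ap);
    }

    // Append the message and then the child errors, but only if the child
    // exists and contains an error. Returns true if anything was appended.
    bool fAppend(const std::shared_ptr<Errors> &child, const char *fmt, ...) {
        if (!child || !child->error)
            return false;
        va_list ap;
        va_start(ap, fmt);
        add(format(fmt, ap));
        va_end(ap);
        for (const auto &line : child->lines)
            errs->lines.push_back("  " + line);
        return true;
    }

private:
    void add(const std::string &msg) {
        if (!errs)
            errs = std::make_shared<Errors>();
        errs->error = true;
        errs->lines.push_back(msg);
    }
    static std::string format(const char *fmt, va_list ap) {
        char buf[1024];  // fixed size keeps the sketch short; longer text gets truncated
        std::vsnprintf(buf, sizeof(buf), fmt, ap);
        return buf;
    }
};
```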

Next goes the same thing for Exception:

static Exception f(const char *fmt, ...);
static Exception fTrace(const char *fmt, ...);

These are the static factory methods that create an Exception object with the message, either without or with the stack trace. They are used like

throw Exception::f("a message with integer %d", n);

And the similar methods for construction with the nested errors:

static Exception f(Onceref<Errors> err, const char *fmt, ...);
static Exception fTrace(Onceref<Errors> err, const char *fmt, ...);

Unlike the Erref method, these work unconditionally (since their result is normally used in throw, and it's too late to do anything by that time), so you better make sure in advance that there is a child error. A typical usage would be like this:

try {
    ...
} catch (Exception e) {
    throw Exception::f(e.getErrors(), "error at stage %d:", n);
}

Again, in the resulting exception the message goes before the nested errors.

Tuesday, January 1, 2013

Streaming function helper classes

A couple more helper classes are defined in sched/FnReturn.h.

ScopeFnBind does a scoped pushing and popping of a binding on an FnReturn. Its only method is the constructor:

ScopeFnBind(Onceref<FnReturn> ret, Onceref<FnBinding> binding);

It's used as:

{
    ScopeFnBind autobind(ret, binding);
    ...
}

It will pop the binding at the end of the block. An unpleasant feature is that if the return stack gets messed up, it will throw an Exception from a destructor, which is a big no-no in C++. However since normally in the C++ code the Triceps Exception is essentially an abort, this works well enough. If you make the Exception catchable, such as when calling the C++ code from an interpreter, you better make very sure that the stack can not get corrupted, or do not use ScopeFnBind.

AutoFnBind is a further extension of the scoped binding. It does three additional things: It allows pushing multiple bindings on multiple returns as a group, popping them all on destruction. It's a reference-counted Starget object, which allows the scope to be more than one block. It also has a more controllable way of dealing with the exceptions. These last two properties allow using it from the Perl code, making the scope that of a Perl block, not a C++ block, and passing the exceptions properly back to Perl.

AutoFnBind();
static AutoFnBind *make();

The constructor just creates an empty object which then gets filled with bindings.

AutoFnBind *add(Onceref<FnReturn> ret, Autoref<FnBinding> binding);

Add a binding, in a chainable fashion. The simple-minded way of using the AutoFnBind is:

{
    Autoref<AutoFnBind> bind = AutoFnBind::make()
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    ...
}

However if any of these add()s throw an Exception, this will leave an orphaned AutoFnBind object, since the throwing would happen before it has a chance to do the reference-counting. So the safer way to use it is:

{
    Autoref<AutoFnBind> bind = new AutoFnBind;
    bind
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    ...
}

Then the AutoFnBind will be reference-counted first, and if an add() throws later, this will cause a controlled destruction of the Autoref and of AutoFnBind.

But it's not the end of the story yet. The throws on destruction are still a possibility. To catch them, use an explicit clearing:

void clear();

Pops all the bindings. If any Exceptions get thrown, they can get caught nicely. It tries to be real smart, going through all the bindings in the backwards order and popping each one of them. If a pop() throws an exception, its information will be collected but clear() will then continue going through the whole list. At the end of the run it will make sure that it doesn't have any references to anything any more, and then will re-throw any collected errors as a single Exception. This cleans up the things as much as possible and as much as can be handled, but the end result will still not be particularly clean: the returns that got their stacks corrupted will still have their stacks corrupted, and some very serious application-level cleaning will be needed to continue. Probably a better choice would be to destroy everything and restart from scratch. But at least it allows to get safely to this point of restarting from scratch.
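The logic of such a clearing can be sketched without any of the Triceps classes. In this toy version each binding is represented by a function that pops it and may throw, the collected error is a plain string, and BindingGroup is a made-up name. Note also that, unlike the behavior described above, the destructor in the sketch simply swallows the leftover errors.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a group of scoped bindings: each entry is a
// "pop" action that may throw when its stack turns out to be corrupted.
class BindingGroup {
public:
    void add(std::function<void()> pop) { pops_.push_back(std::move(pop)); }

    // Pop everything in the backwards order, collecting the errors along
    // the way, then report them all together as a single exception.
    void clear() {
        std::string collected;
        for (auto it = pops_.rbegin(); it != pops_.rend(); ++it) {
            try {
                (*it)();
            } catch (const std::exception &e) {
                collected += e.what();
                collected += "\n";
            }
        }
        pops_.clear();  // drop all the references before throwing
        if (!collected.empty())
            throw std::runtime_error(collected);
    }

    // The destructor of the sketch must not throw, so it swallows the
    // errors; call clear() explicitly to see them.
    ~BindingGroup() {
        try { clear(); } catch (...) {}
    }

private:
    std::vector<std::function<void()>> pops_;
};
```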

So, the full correct sequence will be:

{
    Autoref<AutoFnBind> bind = new AutoFnBind;
    bind
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    ...
    bind->clear();
}

Or if any code in "..." can throw anything, then something like (not tested, so use with caution):

{
    Autoref<AutoFnBind> bind = new AutoFnBind;
    bind
        ->add(ret1, binding1)
        ->add(ret2, binding2);
    try {
        ...
    } catch (Triceps::Exception e) {
        try {
            bind->clear();
        } catch (Triceps::Exception ee) {
            e.getErrors()->append("Unbinding errors triggered by the last error:", ee.getErrors());
        }
        throw;
    } catch (exception e) {
        bind->clear();
        throw;
    }
}

It tries to be nice if the exception thrown from "..." was a Triceps one, and add nicely any errors from the binding clearing to it.

Finally, a little about how the Perl AutoFnBind translates to the C++ AutoFnBind:

The Perl constructor creates the C++-level object and adds the bindings to it. If any of them throw, it destroys everything nicely and translates the Exception to Perl. Otherwise it saves a reference to the AutoFnBind in a wrapper object that gets returned to Perl.

The Perl destructor then first clears the AutoFnBind and catches if there is any Exception. However there is just no way to return a Perl exception from a Perl destructor, so it just prints the error on stderr and calls exit(1). If no exception was thrown, the AutoFnBind gets destroyed nicely by removing the last reference.

For the nicer handling, there is a Perl-level method clear() that does the clearing and translates the exception to Perl.

Saturday, August 11, 2012

The Exception

There are different ways to report the errors. Sometimes a function would return a false value. Sometimes it would return an Erref with an error in it. And there is also a way to throw the exceptions.

In general I don't particularly like the exceptions. They tend to break the logic in the unexpected ways, and if not handled properly, mess up the multithreading. The safe way of working with exceptions is with the scope-based variables. This guarantees that all the allocated memory will be freed and all the locked data will be unlocked when the block exits, naturally or on an exception. However not everything can be done with the scopes, and this results in a whole mess of try-catch blocks, and a missed catch can mess up the program in horrible ways.

However sometimes the exceptions come in handy. They have been a late addition to version 1.0. They are definitely here to stay for the communication in the XS code and for the user-defined handlers in C++ but other than that I'm not so sure about whether and how they would be used internally by Triceps. Not all the Triceps code works correctly with the exceptions yet, and the experience of converting it for the exception handling has not been entirely positive. So far the only parts that can deal with the exceptions nicely are the scheduler and the user-defined labels. Not the aggregators nor user-defined indexes.

But for the user C++ code for the most part it doesn't matter. In Triceps the approach is that the exceptions are used for the substantially fatal events. If the user attempts to do something that can't be executed, this qualifies for an exception. Essentially, use the exceptions for the things that qualify for the classic C abort() or assert(). The idea is that at this point we want to print an error message, print the call stack the best we can, and dump the core for the future analysis.

Why not just use an abort() then? In the C++ code you certainly can if you're not interested in the extra niceties provided by the exceptions. In fact, that's what the Triceps exceptions do by default: when you construct an exception, they print a log message and the stack trace (using a nice feature of glibc) then abort. The error output gives the basic idea of what went wrong and the rest can be found from the core file created by abort().

However remember that Triceps is designed to be embedded into the interpreted (or compiled too) languages. When something goes wrong inside the Triceps program in Perl, you don't want to get a core dump of the Perl interpreter. An interpreted program must never ever crash the interpreter. You want to get the error reported in the Perl die() or its nicer cousin confess(), and possibly get intercepted in eval{}. So the Perl wrapper of Triceps changes the mode of Triceps exceptions to actually throw the C++ exceptions instead of aborting. Since the Perl code is not really interested in the details at the C++ level, the C++ stack trace is in this case configured to not be included into the text of the exception. However another interesting thing happens: if the exception happened in a label handler, the Triceps scheduler stack gets unwound and the information about it gets included. Eventually the XS interface does an analog of confess(), including the Perl stack trace. When the code goes through multiple layers of Perl and C++ code (Perl code calling the Triceps scheduler, calling the label handlers in Perl, calling the Triceps scheduler again etc.), the whole layered sequence gets nicely unwound and reported. However the state of the scheduler suffers along the way: all the scheduled rowops get freed when their stack frame is unwound, so prepare to repair the state of your model if you catch the exception.

If you are willing to handle the exceptions (for example, if you add elements dynamically by user description and don't want the whole program to abort because of one faulty description), you can do the same in C++. Just disable the abort mode for the exceptions and catch them. Of course, it's even better to catch your exceptions before they reach the Triceps scheduler, since then you won't have to repair the state.

The same feature comes in handy in the unit tests: when you test for the detection of a fatal error, you don't want your test to abort, you want it to throw a nice catchable exception.

After all this introductory talk, to the gritty details. The class is Exception (as usual, in the namespace Triceps or whatever custom namespace you define as TRICEPS_NS), defined in common/Exception.h. Inside it is an Erref with the errors. An Exception can be constructed in multiple ways:

explicit Exception(Onceref<Errors> err, bool trace);

The basic constructor. If trace==true, the C++ stack trace will be added to the messages, if it is otherwise permitted by the exception modes. If trace==false, the stack trace definitely won't be added. Why would you want to not add the stack trace? Typically, when you catch an exception, add some information to it and re-throw a new exception. The information from the original exception will contain the full stack trace, so there is no need to include the partial stack trace again. Also, if you throw an exception with high-level information (in Perl or such), you don't need to put any C++ stack info into it.

The Errors are remembered by reference, so changing them later will change the contents of the exception.

explicit Exception(const string &err, bool trace);

A convenience constructor to make an exception from a simple string with the error. Internally creates an Errors object with the string in it. The string usually gets created with strprintf().

explicit Exception(Onceref<Errors> err, const string &msg);
explicit Exception(Onceref<Errors> err, const char *msg);
explicit Exception(const Exception &exc, const string &msg);

Wrapping a nested error with a descriptive message and re-throwing it.

virtual const char *what();

The usual, returns the text of the error messages in the exception.

virtual Errors *getErrors() const;

Returns the Errors object from the exception.

The modes I've mentioned before are set with the class static variables:

static bool abort_;

Flag: when attempting to create an exception, instead print the message and abort. This behavior is more convenient for debugging of the C++ programs, and is the default one. Also forces the stack trace in the error reports. The interpreted language wrappers should reset it to get the proper exceptions. Default: true.

static bool enableBacktrace_;

Flag: enable the backtrace if the constructor requests it. The interpreted language wrappers should reset it to remove the confusion of the C stack traces in the error reports. Default: true.
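To make the two modes more concrete, here is a minimal sketch of the idea. It is not the Triceps code: SketchException, checkedDivide() and tryDivide() are made-up names, and the sketch only imitates the abort flag (no stack traces, no Errors inside). It shows how a constructor can either abort on the spot or let the exception be thrown, and how a unit test or a language wrapper might flip the flag and catch the result.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>

// Hypothetical stand-in for illustration only; not the Triceps Exception class.
class SketchException : public std::exception {
public:
    // Imitates the abort mode flag described above.
    static bool abort_;

    explicit SketchException(const std::string &msg) : msg_(msg) {
        if (abort_) {
            // Abort mode: report and dump core right at the construction point.
            std::fprintf(stderr, "%s\n", msg_.c_str());
            std::abort();
        }
    }

    const char *what() const noexcept override { return msg_.c_str(); }

private:
    std::string msg_;
};

bool SketchException::abort_ = true; // same default as described above

// A hypothetical operation that detects a fatal condition.
int checkedDivide(int a, int b) {
    if (b == 0)
        throw SketchException("division by zero");
    return a / b;
}

// What a unit test or a language wrapper might do: switch to the throwing
// mode, run the code, and turn the exception into an ordinary error string.
std::string tryDivide(int a, int b) {
    bool saved = SketchException::abort_;
    SketchException::abort_ = false;
    std::string result;
    try {
        result = std::to_string(checkedDivide(a, b));
    } catch (const SketchException &e) {
        result = std::string("error: ") + e.what();
    }
    SketchException::abort_ = saved; // restore the previous mode
    return result;
}
```

With the flag left at its default, the same throw statement in checkedDivide() would never get to throw anything: the constructor would print the message and abort first.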

Error reports

When building some kind of a language, the complicated errors often need to be reported. Often there are many errors at a time, or an error that needs to be tracked through multiple nested layers. And then these error messages need to be nicely printed, with the indentation by nested layers. Think of the errors from a C++ compiler. Triceps is a kind of a language, so it has a class to represent such errors. It hasn't propagated to the Perl layer yet and is available only in the C++ API.

The class is Errors, defined in common/Errors.h, and inheriting from Starget (for single-threaded reference counting). The references to it are used so often, that Autoref<Errors> is typedefed to have its own name Erref (yes, that's 2 "r"s, not 3).

In general it contains messages, not all of which have to be errors. Some might be warnings. But in practice it has turned out that without a special dedicated compile stage it's hard to report the warnings. Even when there is a special compile stage, and the code gets compiled before it runs, as in Aleri, with the warnings written to a log file, people still rarely pay attention to the warnings. You would not believe how many people call support while the source of their problem is clearly described in the warnings in the log file. Even in C/C++ it's difficult to pay attention to the warnings. I prefer the approach of a separate lint tool for this purpose: at least when you run it, you're definitely looking for warnings.

Because of this, the current Triceps approach is to not have warnings. If something looks possibly right but suspicious, report it as an error but provide an option to override that error (and tell about that option in the error message).

In general, the Errors are built of two kinds of information:

the error messages
the nested Errors reports

More exactly, an Errors contains a sequence of elements, each of which may contain a string, a nested Errors object, or both. When both, the idea is that the string gives a high-level description and the location of the error in the high-level object while the nested Errors dig into the further detail. The string gets printed before the nested Errors. The nested Errors get printed with an indentation. The indentation gets added only when the errors get "printed", i.e. the top-level Errors object gets converted to a string. Until then the elements may be nested every which way without incurring any extra overhead.
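The structure is easier to see in a toy model. The sketch below is not the Triceps class: ToyReport, its Elem and demoReport() are made-up names, std::shared_ptr stands in for the Triceps reference counting, and the two-space subindent is my choice for the example. It only shows the shape described above: a sequence of elements, each with an optional string and an optional nested report, with the indentation applied at print time.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model, not the Triceps Errors class.
struct ToyReport {
    struct Elem {
        std::string msg;                   // may be empty
        std::shared_ptr<ToyReport> child;  // may be null
    };
    std::vector<Elem> elems;

    // The indentation is applied only here, when the report gets printed.
    void printTo(std::string &res, const std::string &indent = "",
                 const std::string &subindent = "  ") const {
        for (const Elem &e : elems) {
            if (!e.msg.empty())
                res += indent + e.msg + "\n";  // the string goes first
            if (e.child)                       // then the nested detail, indented
                e.child->printTo(res, indent + subindent, subindent);
        }
    }
};

// Build a small two-level report and print it.
std::string demoReport() {
    auto fields = std::make_shared<ToyReport>();
    fields->elems.push_back({"field 'a': unknown type 'int33'", nullptr});
    fields->elems.push_back({"field 'b': duplicate name", nullptr});

    ToyReport top;
    top.elems.push_back({"invalid row type:", fields});

    std::string res;
    top.printTo(res);
    return res;
}
```

In this toy, demoReport() produces the high-level line followed by the two nested lines indented by two spaces. Until printTo() is called, the nesting is just pointers, so no string copying or indentation work is done while the report is being assembled.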

Obviously, you must not try to nest an Errors object inside itself, directly or indirectly. Not only will it create a memory reference cycle, but also an endless recursion when printing.

The basic way to create an Errors is:

Errors(bool e = false);

Where "e" is an indicator that it contains an actual error. It will also be set when an error message is added, or when a nested Errors with an error in it is added.

There also are a number of convenience constructors that make one-off Errors from one element:

Errors(const char *msg);
Errors(const string &msg);
Errors(const string &msg, Autoref<Errors> clde);

In all of them the error flag is always set, and the message is checked for being multi-line (that is, containing '\n' in the middle of it), and if so, it gets broken up into multiple messages, one per line.

When an Errors object is constructed, more elements can be added to it:

void appendMsg(bool e, const string &msg);
void appendMultiline(bool e, const string &msg);
bool append(const string &msg, Autoref<Errors> clde);

The "e" shows whether the message is an error message. In append() the indication of the error presence is taken from the child element clde. The appendMsg() expects a single-line message, don't use a '\n' in it! The appendMultiline() will safely break a multi-line message into multiple single-liners and will ignore the '\n' at the end.
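For illustration, the splitting can be sketched as a small stand-alone function. This is my guess at the general technique, not the code of appendMultiline(): splitLines() is a made-up helper, and how the real method treats the empty lines in the middle of a message is not something this sketch claims to reproduce.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: break a message into single-line pieces,
// ignoring one '\n' at the very end of the message.
std::vector<std::string> splitLines(const std::string &msg) {
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < msg.size()) {
        std::string::size_type nl = msg.find('\n', start);
        if (nl == std::string::npos) {
            // No more line breaks: the rest is the last line.
            lines.push_back(msg.substr(start));
            break;
        }
        lines.push_back(msg.substr(start, nl - start));
        start = nl + 1; // continue after the '\n'
    }
    return lines;
}
```

Each of the resulting pieces could then be added as a separate single-line message, so that the indentation comes out right on every line when the whole thing gets printed.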

In all the cases of adding a nested child element, it's safe to pass a NULL. If it's a NULL or contains no data in it, the parent will ignore it, except for the error indication that would be processed anyway. Moreover, if the clde is empty, append() will also ignore the string part, and will add nothing. The return value of append() will be true if the child element contained any data in it or an error indication flag. This can be used together with another method

void replaceMsg(const string &msg);

to add a complex high-level description if a child element has reported an error:

Erref clde = someThing(...);
if (e.append("", clde)) {
    string msg;
    // ... generate msg in some complicated way
    e.replaceMsg(msg);
}

The replaceMsg() replaces the string portion of the last element, which owns the last added child error.
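Here is the same pattern in a self-contained toy form that can be compiled without Triceps. ToyErrors, checkValue() and checkField() are made-up names, and std::shared_ptr stands in for Erref. One detail is my assumption: when the child carries an error flag but no messages, the toy keeps the string as an element, so that replaceMsg() has something to replace after append() returns true.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model of the semantics described above; not Triceps code.
struct ToyErrors {
    struct Elem {
        std::string msg;
        std::shared_ptr<ToyErrors> child;
    };
    bool error = false;
    std::vector<Elem> elems;

    // Returns true if the child had any data or an error indication.
    bool append(const std::string &msg, std::shared_ptr<ToyErrors> clde) {
        if (!clde)
            return false;                    // NULL child: nothing to do
        error = error || clde->error;        // the error flag always propagates
        if (clde->elems.empty()) {
            if (!clde->error)
                return false;                // empty child: drop the string too
            elems.push_back({msg, nullptr}); // assumption: keep msg for a flag-only child
            return true;
        }
        elems.push_back({msg, clde});
        return true;
    }

    // Replaces the string portion of the last element.
    void replaceMsg(const std::string &msg) {
        if (!elems.empty())
            elems.back().msg = msg;
    }
};

// Hypothetical check that reports a problem only for negative values.
std::shared_ptr<ToyErrors> checkValue(int v) {
    if (v >= 0)
        return nullptr; // no error: a null reference
    auto e = std::make_shared<ToyErrors>();
    e->error = true;
    e->elems.push_back({"value must not be negative", nullptr});
    return e;
}

// The description is generated only if the child has reported something.
bool checkField(ToyErrors &parent, const std::string &name, int v) {
    if (parent.append("", checkValue(v))) {
        parent.replaceMsg("in field '" + name + "':");
        return false;
    }
    return true;
}
```

The point of the two-step dance is that the high-level message is built only on the error path, so the successful checks don't pay for the string formatting.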

The typical way to create the messages is with strprintf(), which is like sprintf() but returns a C++ std::string. It's defined in common/Strprintf.h, or as a part of the typical collection in common/Common.h.
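If you haven't seen such a function before, it can be sketched on top of the standard vsnprintf(). This is only an illustration of the technique under the made-up name sketchStrprintf(); the real strprintf() in Triceps may well be implemented differently.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical re-creation for illustration; not the Triceps strprintf().
std::string sketchStrprintf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    va_list ap2;
    va_copy(ap2, ap); // the arguments get walked twice, so keep a copy

    // First pass: find out how long the result is going to be.
    int n = std::vsnprintf(nullptr, 0, fmt, ap);
    va_end(ap);
    if (n < 0) {
        va_end(ap2);
        return std::string(); // formatting error: return an empty string
    }

    // Second pass: format into a buffer of the right size (+1 for the '\0').
    std::vector<char> buf(static_cast<size_t>(n) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, ap2);
    va_end(ap2);
    return std::string(buf.data(), static_cast<size_t>(n));
}
```

A message for an Errors or an Exception can then be put together in one expression, something like sketchStrprintf("field %d of %s", 3, "rt").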

It's also possible to append the contents of another Errors directly, without nesting it:

bool absorb(Autoref<Errors> clde);

The return value has the same meaning as with append(). Finally, an Errors object can be cleared to its empty state:

void clear();

To get the number of elements in Errors, use

size_t size() const;

However the more typical methods are:

bool isEmpty();
bool hasError();

They check whether there is nothing at all or whether there is an error. The special convenience of these methods is that they can be called on NULL pointers. Quite a few Triceps methods return a NULL Erref if there was no error. Even if er is NULL, calling

er->isEmpty()
er->hasError()
parent_er->append(msg, er)
parent_er->absorb(er)

is still safe and officially supported. But NOT er->size().

The data gets extracted from Erref by converting it to a string, either appending to an existing string, or creating a new one.

void printTo(string &res, const string &indent = "", const string &subindent = " ");
string print(const string &indent = "", const string &subindent = " ");

The indent argument specifies the initial indentation, subindent the additional indentation for each level.
