Sergey Babkin on CEP and stuff: options


Tuesday, June 11, 2013

options passing through

I've already shown it in the examples, but here is the official description too: a function can accept arbitrary options, typically when it is a wrapper around another function and only wants to process a few options and let the rest through. Triead::start() is a good example: it passes the options through to the main function of the thread.

You specify the acceptance of the arbitrary options by using "*" in the Opt::parse() arguments. For example:

&Triceps::Opt::parse($myname, $opts, {
    app => [ undef, \&Triceps::Opt::ck_mandatory ],
    thread => [ undef, \&Triceps::Opt::ck_mandatory ],
    fragment => [ "", undef ],
    main => [ undef, sub { &Triceps::Opt::ck_ref(@_, "CODE") } ],
    '*' => [],
}, @_);

The specification array for "*" is empty. The unknown options will be collected in the array referred to from $opts->{'*'}, that is @{$opts->{'*'}}.

From there on your wrapper has the choice of either passing through all the options to the wrapped function, using @_, or explicitly specifying a few options and passing through the rest from @{$opts->{'*'}}.
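To make the pass-through concrete, here is a minimal sketch in plain Perl. The function parse_sketch() is a hypothetical stand-in written only for this illustration: it is not the Triceps::Opt::parse() implementation, it ignores the checking functions, and the names wrapper(), inner(), "verbose", "color" and "size" are made up.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for Triceps::Opt::parse(): handles only the defaults
# and the "*" pass-through, not the checking functions.
sub parse_sketch {
    my ($caller, $dest, $spec, @args) = @_;
    confess "$caller: odd number of option arguments" if @args % 2;
    # Fill in the defaults first.
    for my $name (keys %$spec) {
        next if $name eq '*';
        $dest->{$name} = $spec->{$name}[0];
    }
    $dest->{'*'} = [] if exists $spec->{'*'};
    while (@args) {
        my ($name, $value) = splice(@args, 0, 2);
        if ($name ne '*' && exists $spec->{$name}) {
            $dest->{$name} = $value;
        } elsif (exists $spec->{'*'}) {
            push @{$dest->{'*'}}, $name, $value; # unknown option: keep it for pass-through
        } else {
            confess "$caller: unknown option '$name'";
        }
    }
    return $dest;
}

# The wrapped function just reports what it has received.
sub inner { return join(",", @_) }

# The wrapper consumes "verbose" and passes everything else through.
sub wrapper {
    my $opts = {};
    parse_sketch("wrapper", $opts, {
        verbose => [ 0, undef ],
        '*' => [],
    }, @_);
    return inner(@{$opts->{'*'}});
}

print wrapper(verbose => 1, color => "red", size => 3), "\n"; # prints "color,red,size,3"
```

The unknown options keep their original order, which matters if the wrapped function is sensitive to it.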

There is also a third possibility: filter out only some of the incoming options. This can be done with Opt::drop(). For example, Triead::startHere() works like this:

our @startOpts = (
    app => [ undef, \&Triceps::Opt::ck_mandatory ],
    thread => [ undef, \&Triceps::Opt::ck_mandatory ],
    fragment => [ "", undef ],
    main => [ undef, sub { &Triceps::Opt::ck_ref(@_, "CODE") } ],
);

sub startHere # (@opts)
{
    my $myname = "Triceps::Triead::start";
    my $opts = {};
    my @myOpts = ( # options that don't propagate through
        harvest => [ 1, undef ],
        makeApp => [ 1, undef ],
    );

    &Triceps::Opt::parse($myname, $opts, {
        @startOpts,
        @myOpts,
        '*' => [],
    }, @_);

    my @args = &Triceps::Opt::drop({
        @myOpts
    }, \@_);
    @_ = (); # workaround for threads leaking objects

    # no need to declare the Triead, since all the code executes synchronously anyway
    my $app;
    if ($opts->{makeApp}) {
        $app = &Triceps::App::make($opts->{app});
    } else {
        $app = &Triceps::App::resolve($opts->{app});
    }
    my $owner = Triceps::TrieadOwner->new(undef, undef, $app, $opts->{thread}, $opts->{fragment});
    push(@args, "owner", $owner);
    eval { &{$opts->{main}}(@args) };
...

The @startOpts are both used by startHere() and passed through. The @myOpts are only used in startHere() and do not pass through. The rest of the options pass through without being used in startHere(). So the options from @myOpts get dropped from @_, and the result goes to the main function of the thread.

The Opt::drop() takes the specification of the options to drop as a hash reference, the same as Opt::parse(). The values in the hash are not important in this case, only the keys are used. But it's simpler to store the same specification of the options and reuse it for both parse() and drop() than to write it twice.

There is also an opposite function, Opt::dropExcept(). It passes through only the listed options and drops the rest. It can come in handy if your wrapper wants to pass different subsets of its incoming options to multiple functions.

The functions drop() and dropExcept() can really be used on any name-value arrays, not just the options as such. And the same goes for the Fields::filter() and friends. So you can use them interchangeably: you can use Opt::drop() on the row type specifications and Fields::filter() on the options if you feel that it makes your code simpler.
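The filtering described above can be sketched in a few lines of plain Perl. The helpers drop_sketch() and drop_except_sketch() are hypothetical and only mimic the described behavior of Opt::drop() and Opt::dropExcept(); the option values and the row type definition are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking Opt::drop(): removes the name-value pairs
# whose names are keys of %$spec. Only the keys of the spec matter.
sub drop_sketch {
    my ($spec, $pairs) = @_;
    my @result;
    for (my $i = 0; $i + 1 < @$pairs; $i += 2) {
        push @result, @{$pairs}[$i, $i + 1] unless exists $spec->{$pairs->[$i]};
    }
    return @result;
}

# Hypothetical helper mimicking Opt::dropExcept(): keeps only the listed names.
sub drop_except_sketch {
    my ($spec, $pairs) = @_;
    my @result;
    for (my $i = 0; $i + 1 < @$pairs; $i += 2) {
        push @result, @{$pairs}[$i, $i + 1] if exists $spec->{$pairs->[$i]};
    }
    return @result;
}

# The same specification can serve both for parsing and for dropping.
my @myOpts = (harvest => [ 1, undef ], makeApp => [ 1, undef ]);
my @incoming = (app => "a1", harvest => 0, thread => "t1", extra => 42);

my @passed = drop_sketch({ @myOpts }, \@incoming);
print "@passed\n"; # prints "app a1 thread t1 extra 42"

my @kept = drop_except_sketch({ app => 1, thread => 1 }, \@incoming);
print "@kept\n"; # prints "app a1 thread t1"

# Nothing here is specific to options: a row type definition is also a name-value array.
my @rowdef = (id => "int32", name => "string", price => "float64");
my @shortdef = drop_sketch({ price => 1 }, \@rowdef);
print "@shortdef\n"; # prints "id int32 name string"
```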

Tuesday, April 30, 2013

ThreadedServer, part 2

The next part of the ThreadedServer is listen(), the function that gets called from the listener thread and takes care of accepting the connections and spawning the per-client threads.

sub listen # ($optName => $optValue, ...)
{
    my $myname = "Triceps::X::ThreadedServer::listen";
    my $opts = {};
    my @myOpts = (
        owner => [ undef, sub { &Triceps::Opt::ck_mandatory(@_); &Triceps::Opt::ck_ref(@_, "Triceps::TrieadOwner") } ],
        socket => [ undef, sub { &Triceps::Opt::ck_mandatory(@_); &Triceps::Opt::ck_ref(@_, "IO::Socket") } ],
        prefix => [ undef, \&Triceps::Opt::ck_mandatory ],
        handler => [ undef, sub { &Triceps::Opt::ck_mandatory(@_); &Triceps::Opt::ck_ref(@_, "CODE") } ],
        pass => [ undef, sub { &Triceps::Opt::ck_ref(@_, "ARRAY") } ],
    );
    &Triceps::Opt::parse($myname, $opts, {
        @myOpts,
        '*' => [],
    }, @_);
    my $owner = $opts->{owner};
    my $app = $owner->app();
    my $prefix = $opts->{prefix};
    my $sock = $opts->{socket};

The first part is traditional. The only thing to note is that it also saves the list of options in a variable for future use (startServer() did that too but I forgot to mention it). And the option "*" is the pass-through: it gets all the unknown options collected in $opts->{"*"}.

    my $clid = 0; # client id

    while(!$owner->isRqDead()) {
        my $client = $sock->accept();
        if (!defined $client) {
            my $err = "$!"; # or the text message will be reset by isRqDead()
            if ($owner->isRqDead()) {
                last;
            } elsif($!{EAGAIN} || $!{EINTR}) { # numeric codes don't get reset
                next;
            } else {
                confess "$myname: accept failed: $err";
            }
        }

The accept loop starts. It runs until the listening socket gets revoked by shutdown; the revocation is done by dup2() of a file descriptor from /dev/null over it. Note that this function itself doesn't request tracking of the file descriptor for revocation. That's the caller's responsibility.

Oh, and by the way, if you've been tracing the possible ways of thread execution closely, you might be wondering: what if the shutdown happens after the socket is opened but before it is tracked? This can't really happen to the listening socket but can happen with the accepted client sockets. The answer is that the tracking enrollment will check whether the shutdown already happened, and if so, it will revoke the socket right away, before returning. So the reading loop will find the socket revoked right on its first iteration.

The revocation also sends the signal SIGUSR2 to the thread. This is done because calls like accept() do not return on a simple revocation by dup2(), but a signal does interrupt them. Triceps registers an empty handler for SIGUSR2 that just immediately returns, but the accept() system call gets interrupted, and the thread gets a chance to check for shutdown. Even if it calls accept() again, this will return an error and force it to check for shutdown again.

And by the way, Perl's threads::kill doesn't send a real signal, it just sets a flag for the interpreter. If you try it on your own, it won't interrupt the system calls, and now you know why. Instead Triceps gets the POSIX thread identity from the Perl thread and calls the honest pthread_kill() from the C++ code.

So, the main loop goes by the shutdown condition: isRqDead() tells if this thread has been requested to die. After accept(), it checks for errors. The first thing to check for is again isRqDead(), because the revocation will manifest as a socket operation error, and there is no point in reporting this spurious error. However, like other Triceps calls, isRqDead() will clear the error text, so the text has to be saved first. If the shutdown is found, the loop exits. Then the check for the spurious interruptions is done, and for them the loop continues. Funnily enough, the $!{} uses the numeric part of $! that is independent from its text part and doesn't get cleared by the Triceps calls. And on any other errors the thread confesses. This will unroll the stack, eventually get caught by the Triceps threading code, abort the App, and propagate the error message to the harvester.
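The order of the checks can be sketched in isolation, without any sockets or threads. This is only an illustration in plain Perl: accept_failure_action() is a hypothetical helper, its argument stands in for the result of $owner->isRqDead(), and the error is simulated by assigning to $! instead of coming from a failed accept(). It does not reproduce the clearing of the error text done by the Triceps calls.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN EINTR EBADF);

# Hypothetical helper: decides what the accept loop should do after accept()
# has returned undef. $isDead stands in for the result of $owner->isRqDead().
sub accept_failure_action {
    my ($isDead) = @_;
    my $err = "$!"; # save the text first, as the original loop does
    return ("exit", $err) if $isDead;                  # shutdown: the error is spurious
    return ("retry", $err) if $!{EAGAIN} || $!{EINTR}; # checked by the numeric code
    return ("fail", $err);                             # a real error, to be reported
}

for my $case ([ EINTR, 0 ], [ EBADF, 0 ], [ EBADF, 1 ]) {
    my ($errno, $isDead) = @$case;
    $! = $errno; # simulate the error left behind by a failed accept()
    my ($action, $text) = accept_failure_action($isDead);
    print "isDead=$isDead error='$text': $action\n";
}
```

The shutdown check comes first on purpose: once the thread has been requested to die, whatever error the socket has produced is of no interest.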

        $clid++;
        my $cliname = "$prefix$clid";
        $app->storeCloseFile($cliname, $client);

        Triceps::Triead::start(
            app => $app->getName(),
            thread => $cliname,
            fragment => $cliname,
            main => $opts->{handler},
            socketName => $cliname,
            &Triceps::Opt::drop({ @myOpts }, \@_),
        );

        # Doesn't wait for the new thread(s) to become ready.
    }
}

If a proper connection has been received, the socket gets stored into the App with a unique name, for later load by the per-client thread. And then the per-client thread gets started.

The drop passes through all the original options except for the ones handled by this thread. In retrospect, this is not the best solution for this case. It would be better to just use @{$opts->{"*"}} instead. The drop call is convenient when not all the explicitly recognized options but only a part of them has to be dropped.

After starting the thread, the loop doesn't call readyReady() but goes for the next iteration. This is basically because it doesn't care about the started thread and doesn't ever send anything to it. And waiting for the threads to start will make the loop slower, possibly overflowing the socket's listening queue and dropping the incoming connections if they arrive very fast.

And the last part of the ThreadedServer is the printOrShut:

sub printOrShut # ($app, $fragment, $sock, @text)
{
    my $app = shift;
    my $fragment = shift;
    my $sock = shift;

    undef $!;
    print $sock @_;
    $sock->flush();

    if ($!) { # can't write, so shutdown
        Triceps::App::shutdownFragment($app, $fragment);
    }
}

Nothing too complicated. Prints the text into the socket, flushes it and checks for errors. On errors shuts down the fragment. In this case there is no need for draining. After all, the socket leading to the client is dead and there is no way to send anything more through it, so there is no point in worrying about any unsent data. Just shut down as fast as it can, before the threads have generated more data that can't be sent any more. Any data queued in the nexuses for the shut down threads will be discarded.

Saturday, May 19, 2012

More option checking

Some motifs in checking the options for the method calls have been coming up repeatedly, so I've added more Triceps::Opt methods that encapsulate them.

The first one deals with the mutually exclusive options. Triceps::Opt::parse() doesn't know how to check the mutual exclusivity correctly. To it, an option is either mandatory or optional. And rather than complicate it with some convoluted specification of the option exclusivity groups, I've just added a separate method to check that:

$optName = &Triceps::Opt::checkMutuallyExclusive(
  $callerDescr, $mandatoryFlag,
  $optName1 => $optValue1, ...);

You call parse() and then you call checkMutuallyExclusive(). If it finds an error, it confesses. It returns the name of the only option that has been defined (or undef if none of them were defined). For example, this is what the JoinTwo constructor does:

&Triceps::Opt::checkMutuallyExclusive("Triceps::JoinTwo::new", 0,
  by => $self->{by},
  byLeft => $self->{byLeft});

$callerDescr is some string that describes the caller for the error message. The names of the options are also used in the error messages. $mandatoryFlag is 1 if exactly one option must be defined, or 0 if having none of them defined is also OK. The "defined" here means that the value passed in the arguments is not undef.
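The described contract is small enough to sketch in plain Perl. The function below is a hypothetical re-creation for illustration only; the real Triceps::Opt::checkMutuallyExclusive() may differ in its details and in the text of its messages, and "MyJoin::new" is a made-up caller name.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical re-creation of the described behavior, not the Triceps code.
sub check_mutually_exclusive_sketch {
    my ($caller, $mandatory, @pairs) = @_;
    my (@names, @defined);
    while (@pairs) {
        my ($name, $value) = splice(@pairs, 0, 2);
        push @names, $name;
        push @defined, $name if defined $value; # "defined" means not undef
    }
    my $all = join(", ", map { "'$_'" } @names);
    confess "$caller: options $all are mutually exclusive, got: @defined"
        if @defined > 1;
    confess "$caller: exactly one of options $all must be specified"
        if $mandatory && !@defined;
    return $defined[0]; # undef if none of the options were defined
}

my $self = { by => [ "id", "id" ], byLeft => undef };
my $used = check_mutually_exclusive_sketch("MyJoin::new", 0,
    by => $self->{by},
    byLeft => $self->{byLeft});
print "using option: $used\n"; # prints "using option: by"

# With the mandatory flag set, having no option defined is an error.
my $ok = eval {
    check_mutually_exclusive_sketch("MyJoin::new", 1, by => undef, byLeft => undef);
    1;
};
print "error caught\n" unless $ok; # prints "error caught"
```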

The second method is more specialized. It deals with the triangle of (Unit, RowType, Label). It turns out to be quite convenient to either let a template define its own input label and then connect it manually, or just give it another label and let it automatically chain its input to that label. In the first case the template has to be told what Unit it belongs to and what the RowType of the input data is. In the second case they can be found from the Label. The method

&Triceps::Opt::handleUnitTypeLabel($callerDescr, $nameUnit, \$refUnit,
  $nameRowType, \$refRowType, $nameLabel, \$refLabel);

encapsulates this finding-out and other checks. Its rules are:

The label option and the row type option are mutually exclusive.
The unit option may be specified together with the label option, but it must be the same unit as in the label.
If the label option is used, the unit and row type option values will be populated from the label.
On any error it confesses, using $callerDescr for the caller description in the error message. The option name arguments are also used for the error messages.
It always returns 1.

The values are passed by reference because they may be computed by this method from the other values.

Here is a usage example:

&Triceps::Opt::handleUnitTypeLabel("Triceps::LookupJoin::new",
  unit => \$self->{unit},
  leftRowType => \$self->{leftRowType},
  leftFromLabel => \$self->{leftFromLabel});

The label object doesn't strictly have to be a label object. It may be any object that supports the methods getUnit() and getRowType().
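The rules above, together with the duck typing of the label, can be sketched in plain Perl as follows. Everything here is an assumption made for the illustration: handle_unit_type_label_sketch() is not the Triceps implementation, FakeLabel, FakeUnit and FakeRowType are made-up classes, the units are compared by reference address, and the case when neither the label nor the row type is given is not checked at all.

```perl
use strict;
use warnings;
use Carp;
use Scalar::Util qw(refaddr);

# A stand-in "label": any object with getUnit() and getRowType() will do.
package FakeLabel;
sub new {
    my ($class, %arg) = @_;
    my $self = { %arg };
    return bless $self, $class;
}
sub getUnit { return $_[0]{unit} }
sub getRowType { return $_[0]{rowType} }

package main;

# Hypothetical re-creation of the listed rules, not the Triceps code.
sub handle_unit_type_label_sketch {
    my ($caller, $nameUnit, $refUnit, $nameRowType, $refRowType,
        $nameLabel, $refLabel) = @_;
    confess "$caller: options '$nameRowType' and '$nameLabel' are mutually exclusive"
        if defined $$refRowType && defined $$refLabel;
    if (defined $$refLabel) {
        my $labelUnit = $$refLabel->getUnit();
        confess "$caller: option '$nameUnit' differs from the unit of '$nameLabel'"
            if defined $$refUnit && refaddr($$refUnit) != refaddr($labelUnit);
        # Populate the unit and row type from the label.
        $$refUnit = $labelUnit;
        $$refRowType = $$refLabel->getRowType();
    }
    return 1;
}

my $unit = bless {}, "FakeUnit";
my $rowType = bless {}, "FakeRowType";
my $self = {
    unit => undef,
    leftRowType => undef,
    leftFromLabel => FakeLabel->new(unit => $unit, rowType => $rowType),
};

handle_unit_type_label_sketch("MyJoin::new",
    unit => \$self->{unit},
    leftRowType => \$self->{leftRowType},
    leftFromLabel => \$self->{leftFromLabel});

print "unit and row type populated from the label\n"
    if refaddr($self->{unit}) == refaddr($unit)
        && refaddr($self->{leftRowType}) == refaddr($rowType);
```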

Here you might remember that a Label doesn't have the method getRowType(); its method for getting the row type is called getType(). Well, I've added it now. You can now use either of

$lb->getType()
$lb->getRowType()

with the same effect.

Monday, March 19, 2012

Options

The SimpleAggregator was one of the examples that uses the class Triceps::Opt to parse its arguments formatted as options. There is actually a similar option parser in CPAN but it didn't do everything I wanted, and considering how tiny it is, it's easier to write a new one from scratch than to extend that one. I also like to avoid the extra dependencies.

The heart of it is the method Triceps::Opt::parse(). Normally it would be called from the constructor of another class to parse the constructor's options; the SimpleAggregator was somewhat abusing it. It does the following:

Checks that all the options are known.
Checks that the values are acceptable.
Copies the values into the instance hash of the target class.
Provides the default values for the unspecified options.

If anything goes wrong, it dies with a reasonable message. The arguments tell the class name for the messages (since, remember, it normally is expected to be called from the class constructor), the reference to the object instance hash into which the options get copied, the descriptions of the supported options, and the actual key-value pairs. A normal call looks like this:

package MyClass;

sub new # (class, option => value, ...)
{
  my $class = shift;
  my $self = {};

  &Triceps::Opt::parse($class, $self, { 
      opt1 => [ 0 ],
      opt2 => [ undef, \&Triceps::Opt::ck_mandatory ],
      opt3 => [ undef, sub { &Triceps::Opt::ck_mandatory(@_); &Triceps::Opt::ck_ref(@_, "ARRAY") } ],
    }, @_);

...
  bless $self, $class;
  return $self;
}

At the end of it, if all went well, the hash in $self would have the values at keys "opt1" and so on.

The option descriptions go in pairs of the option name and an array reference with the description. The array contains the default value and the checking function, either of which may be undefined. The checking function returns if everything went fine or dies on any error. To die happily with a proper message, it gets not only the value to check but more, altogether:

The value to check.
The name of the option.
The name of the class.
The object instance ($self), just in case.

If you want to do multiple checks, you just make a closure and call all the checks in sequence, passing @_ to them all, like shown here for opt3. If more arguments need to be passed to the checking function, just add them after @_ (or, if you prefer, before it).
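Here is a small sketch of that calling convention in plain Perl. The checkers ck_defined_sketch() and ck_range_sketch() are hypothetical, as are the option name "port" and the class name "MyServer"; the checker gets called by hand here, the way a parser would call it.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical checkers that follow the described calling convention:
# ($value, $optionName, $className, $instance, extra arguments...).
sub ck_defined_sketch {
    my ($value, $name, $class) = @_;
    confess "$class: option '$name' must be specified" unless defined $value;
}

sub ck_range_sketch {
    my ($value, $name, $class, $instance, $min, $max) = @_;
    return unless defined $value; # an undefined value is left to the other checks
    confess "$class: option '$name' must be between $min and $max, got $value"
        if $value < $min || $value > $max;
}

# A closure that runs multiple checks in sequence, passing @_ to them all
# and adding the extra arguments after @_.
my $ck_port = sub {
    ck_defined_sketch(@_);
    ck_range_sketch(@_, 1, 65535);
};

# A good value passes silently.
$ck_port->(8080, "port", "MyServer", {});

# A bad value makes the checker die, which can be caught with eval.
my $ok = eval { $ck_port->(70000, "port", "MyServer", {}); 1 };
print "rejected: ", ($ok ? "no" : "yes"), "\n"; # prints "rejected: yes"
```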

You can create any checking functions, but a few ready ones are provided:

Triceps::Opt::ck_mandatory checks that the value is defined.
Triceps::Opt::ck_ref checks that the value is a reference to a particular class. Just give the class name as the extra argument. Or, to check that the reference is to array or hash, make the argument "ARRAY" or "HASH". Or an empty string "" to check that it's not a reference at all. For the arrays and hashes it can also check the values in there for being references to the correct types: give that type as the second extra argument. But it doesn't go deeper than that, just one nesting level.
Triceps::Opt::ck_refscalar checks that the value is a reference to a scalar. This is designed to check the arguments which are used to return data back to the caller, and it would accept any previous value in that scalar: an actual scalar value, an undef or a reference.

The ck_ref and ck_refscalar allow the value to be undefined, so they can safely be used on the optional options. When I come up with more of the usable check functions, I'll add them.
