Sergey Babkin on CEP and stuff: recursion

Showing posts with label recursion. Show all posts

Sunday, July 13, 2014

monads

I've recently been bullshitting onna internets with some fans of the functional programming, and the said fans are all ablaze about the monads. Yet they have a very hard time explaining what's the benefit of the monads, and even what these monads are. So I've got around and read up about this stuff, and bullshitted some more, and here it is: the short explanation of the monads in programming in all their glory. This is the third version of it that I've figured out, and even though it might happen that I'd have more, I think that I've got to the bottom of it this time. The first version of my understanding was gained from Wikipedia, and as it always happens with Wikipedia, it turned out to be crap. Not all crap, only half-crap, but still.

If you're interested in the benefits, let me tell it up front: they seem to be that the glorious community of functional programming can now honestly use the loops and imbibe all their goodness with break and continue statements, and even exceptions, without losing the cherry of the strict mathematical theory. For the rest of us, nothing new.

Now, to the details. What the monads are really about is the iteration on containers, and nesting these iterations. For use in a monad, a container of objects has to have a couple of properties.

First, if you have a container, you must be able to iterate over it and create another container with exactly the same structure (i.e. if you can find an element in the first container in some way, you must be able to find an element in the second container in the same way) but different data (difference may be both in the value and the type of the data). The data is supposed to be the result of some operation of the data from the first container at the same position but the concept of operation is quite wide-open.

This property means that if your container is a list, you need to know how to create another list of the same number of elements, if it's an associative array, you need to know how to create another associative array with the same keys, if your container is a tree, you need to know how to create another tree with the exact same structure.

Second, if you have two containers, you need to have some rule on how to "combine" them into the third container of the same type. The structure of that third container can be any, as long as it follows some consistent rule, and the data elements represent the results of some operation on one element from the first container and one element from the second container. For example, for the lists you can pick the rule that concatenates them, or produces every possible pair of elements from the first list and the second list, or combines the elements at the same position in both lists. Just as in the first property, the data placed in the result would be the result of some operation of the data from the two incoming containers.

What the monad does is apply that combination rule over a sequence of containers, combining it into one. The first two get combined together, then that one is combined with the third one, and so on. The monad is really a bunch of nested loops. In the texts about monads they like to discuss the examples monads on Haskell kinds of containers, like lists, and Maybe (a value that may be NULL), and IO (a strange container that performs input/output), but for all I can tell, the most entertaining and straightforward container would be an associative array, or as its variety, a database table.

For example, the following Perl loop represents a monad (remember, the data in the result may be produced by any operation on the data extracted from the argument container).

my %result;
my %a = ...; my %b = ...; my %c = ...;

while (($ka, $va) = each %a) {
while (($kb, $vb) = each %b) {
while (($kc, $vc) = each %c) {
$result{"$ka,$kb,$kc"} = printf("%d" , $vc? $va: $vb);
}
}
}

And if A, B and C were tables, that would be an unconditional join. The loops don't have to go over the whole container, they may even pick only one or even zero elements, and if such is the rule, they may as well be ifs:

while (($ka, $va) = each %a) {
if (exists $b{$ka} ) {
if (exists $c{$ka} ) {
$result{"$ka"} = [ $va, $b{$ka}, $c{$ka} ];
}

}

}

And it's also a monad! If these were tables, it would be a left join.

The first property of the container lets you do a loop without nesting, and the second property allows to do any number of nestings by applying it recursively.

The rules as stated above say that each nesting must follow the same rule of iteration on the same container type. But there is a weaseling way around it: we can always say that the container is a union that also contains the indication of what operation needs to be done with it, and then the rule that combines two containers would look at this indication in the argument containers and act accordingly. The type Maybe from Haskell does exactly this (and a monad on it provides the Haskell version of what you can think of as exceptions or break/continue). We can also weasel further and say that oh, we know this indication hardcoded at the compilation time, so instead of putting it into the container and then extracting it, we'll just hardcode the loop appropriately (and yes, Haskell does this). The end result is that ANY nested loops can form a monad. And since the if is a special case of a loop, and the unconditional execution is a special case of an if, any procedural code can be a monad.

Not everything would be a monad, but guess what, you can nest monads in monads, and that would give everything. And the returned container doesn't have to be anything like any of the containers involved in the loop, it's the same weasely thing.

The only thing that is left separating the monads from the usual procedural code is that all the data defined outside the loops is read-only (the local variables in the loop blocks can be read-write, this also can be explained in the weasely ways, however the variables outside the current block also can't be changed), and the only result produced is the returned container. However the result can contain a description of how the outside data needs to be modified, and the caller of the monad is then free to apply that description. It's just the ass-backwards ways of the functional programming. So with enough nested and recursive monads (not just nested loops but nested monads) and this apply-description business, all this monkeying can properly simulate the procedural programming.

That description thing can also be though of as the redo log of a transaction: first the redo log is collected and then the whole transaction is applied. Obviously, for the operations inside the transaction to see the state as modified by the previous operations in the transaction, they would have to dig through the redo log.

The definition of the container is also pretty wide. Containers may among other things be infinite - "for (;;)" or "for (i=0;; i++)" are examples of iteration over an infinite container.

This brings us to the point of how the monads are related to CEP. Each nesting level in the monad gets an item of key and data from the previous level, uses it somehow to select zero or more values for the container it has, and sends to the next nesting level a newly combined item of key and data information for every value selected. It's a pipeline! The pipeline approach was actually the second version of my understanding of the monads.

What kind of a pipeline it is? It's a straight one, no branching, nor god forbid, joining. And it's a read-only one: each stage of the pipeline contains no modifiable state, it's all read-only. All the updates have to be carried in the data traveling through the pipeline. And each stage of the pipeline essentially sends out a copy of its incoming data with its own section added to it.

Quite boring and useless. By the way, note that the join as described before would work only on the static tables, not on streams.

Now we get a bit off the monad subject as such and get into the depths of Haskell's dealing with pipelines.

Of course there are weaseling ways around the boring limitations. First, each stage of the pipeline may contain the nested pipelines (i.e. nested monads), and may decide, which of the nested pipelines to take. That gives the fork-join topology (but as you may notice no "fork-and-no-join"). Second, these nested pipelines can be nested further to infinity. Third, the input point itself is a data object that can be passed through the pipeline.

This third item is something that takes some time getting the mind around. Basically, the first stage in the pipeline says "I'm not going to get the input from the outside world any more, instead I'm passing this functionality to the second stage". Obviously, after that the first stage becomes dead and can as well be discarded. Well, not always: there may be MULTIPLE data input points, either within one stage (remember, they are just data, and can be held together), or sprinkled throughout the pipeline. The first stage will become dead only after it gets rid of all its inputs. But let's just say for the sake of an example that there is one input point.

So, with all this weaseling a Haskell program simulates the changing states with two pipelines (i.e. monads):

The pipeline A implements the computation of the next application state.

The pipeline B consists of two stages:
1. Read one item of data, call pipeline A, collect the result form it, and send its result along with the input point to the stage 2.
2. Instantiate a nested copy of the pipeline B, then send the input point and the state to that pipeline, and let it do its work.

It's an infinite number of pipelines, created as long at there is more data coming. And being the tail recursion, Haskell optimizes it out by replacing the second stage of pipeline B with a new copy of pipeline B, instead of nesting it. It discards the old stage 1 after it becomes dead. This really becomes a dynamic pipeline, endlessly growing at the back and dying out at the front. Of course, when things get to the actual running, that gets optimized out too. Instead of creating a new pipeline they just turn back to the old pipeline, say "now it's the new pipeline", give it the new state and let it run along.

As far as the parallelism is concerned, the stages of the pipeline A can work in parallel, but then when the time comes to apply the result and get the next input, it hiccups and becomes single-threaded.

Is there an important lesson in this? I don't see any, at least yet. Other than an interesting mental exercise in the ass-backwardness. And from all the asking of the monad enthusiasts on the internets, what exactly is the benefit they get from the monads, I didn't get any answer other than the hand-waving about the Mathematical Foundations. For all I can tell, there isn't one.

If you're interested in learning more about Haskell and its monads, I can recommend http://learnyouahaskell.com/

Wednesday, October 24, 2012

Streaming functions and recursion, part 6

The combination of the two previous examples (the one with the trays and the one with the forks) doesn't work. They could be combined but the combination just doesn't work right.

The problem is that the example with trays relies on the recursive function being executed before the tray gets called. But if both of them are forked, things break.

Well, if there is only one recursive call, it still works because the execution frame looks like this:

arg1
pop1

The rowop arg1 executes, places the result into the tray (provided that it calls the FnReturn label, not forks to it!). Then the rowop pop1 executes and calls the tray. So far so good.

Now let's do the recursion with depth two. The first level starts the same:

arg1
pop1

Then arg1 executes and forks the second level of recursion:

pop1
arg2
pop2

Do you see what went wrong? The unit execution frames are FIFO. So the second level of recursion got queued after the popping of the first level. That pop1 executes next, doesn't get any return values, and everything goes downhill from there.

Streaming functions and recursion, part 5

And there is also a way to run the recursive calls without even the need to increase the recursion depth limit. It can be left at the default 1, without setMaxRecursionDepth(). The secret is to fork the argument rowops to the functions instead of calling them.

###
# A streaming function that computes a Fibonacci number.

# Input:
#   $lbFibCompute: request to compute the number.
# Output (by FnReturn labels):
#   "result": the computed value.
# The opcode is preserved through the computation.

my @stackFib; # stack of the function states
my $stateFib; # The current state

my $frFib = Triceps::FnReturn->new(
    name => "Fib",
    unit => $uFib,
    labels => [
        result => $rtFibRes,
    ],
    onPush => sub { push @stackFib, $stateFib; $stateFib = { }; },
    onPop => sub { $stateFib = pop @stackFib; },
);

my $lbFibResult = $frFib->getLabel("result");

# Declare the label & binding variables in advance, to define them sequentially.
my ($lbFibCompute, $fbFibPrev1, $fbFibPrev2);
$lbFibCompute = $uFib->makeLabel($rtFibArg, "FibCompute", undef, sub {
    my $row = $_[1]->getRow();
    my $op = $_[1]->getOpcode();
    my $idx = $row->get("idx");

    if ($idx <= 1) {
        $uFib->fork($frFib->getLabel("result")->makeRowopHash($op,
            idx => $idx,
            fib => $idx < 1 ? 0 : 1,
        ));
    } else {
        $stateFib->{op} = $op;
        $stateFib->{idx} = $idx;

        $frFib->push($fbFibPrev1);
        $uFib->fork($lbFibCompute->makeRowopHash($op,
            idx => $idx - 1,
        ));
    }
}) or confess "$!";
$fbFibPrev1 = Triceps::FnBinding->new(
    unit => $uFib,
    name => "FibPrev1",
    on => $frFib,
    labels => [
        result => sub {
            $frFib->pop($fbFibPrev1);

            $stateFib->{prev1} = $_[1]->getRow()->get("fib");

            # must prepare before pushing new state and with it new $stateFib
            my $rop = $lbFibCompute->makeRowopHash($stateFib->{op},
                idx => $stateFib->{idx} - 2,
            );

            $frFib->push($fbFibPrev2);
            $uFib->fork($rop);
        },
    ],
);
$fbFibPrev2 = Triceps::FnBinding->new(
    unit => $uFib,
    on => $frFib,
    name => "FibPrev2",
    labels => [
        result => sub {
            $frFib->pop($fbFibPrev2);

            $stateFib->{prev2} = $_[1]->getRow()->get("fib");
            $uFib->fork($frFib->getLabel("result")->makeRowopHash($stateFib->{op},
                idx => $stateFib->{idx},
                fib => $stateFib->{prev1} + $stateFib->{prev2},
            ));
        },
    ],
);

# End of streaming function
###

This is a variety of the pre-previous example, with the split push and pop. The split is required for the fork to work: when the forked rowop executes, the calling label has already returned, so obviously the scoped approach won't work.

In this version the unit stack depth required to compute the 6th (and any) Fibonacci number reduces to 2: it's really only one level on top of the outermost frame.

Streaming functions and recursion, part 4

Following up on the previous installment, here is the example that uses the bindings with tray:

###
# A streaming function that computes a Fibonacci number.

# Input:
#   $lbFibCompute: request to compute the number.
# Output (by FnReturn labels):
#   "result": the computed value.
# The opcode is preserved through the computation.

my @stackFib; # stack of the function states
my $stateFib; # The current state

my $frFib = Triceps::FnReturn->new(
    name => "Fib",
    unit => $uFib,
    labels => [
        result => $rtFibRes,
    ],
    onPush => sub { push @stackFib, $stateFib; $stateFib = { }; },
    onPop => sub { $stateFib = pop @stackFib; },
);

my $lbFibResult = $frFib->getLabel("result");

# Declare the label & binding variables in advance, to define them sequentially.
my ($lbFibCompute, $fbFibPrev1, $fbFibPrev2);
$lbFibCompute = $uFib->makeLabel($rtFibArg, "FibCompute", undef, sub {
    my $row = $_[1]->getRow();
    my $op = $_[1]->getOpcode();
    my $idx = $row->get("idx");

    if ($idx <= 1) {
        $uFib->makeHashCall($frFib->getLabel("result"), $op,
            idx => $idx,
            fib => $idx < 1 ? 0 : 1,
        );
    } else {
        $stateFib->{op} = $op;
        $stateFib->{idx} = $idx;

        {
            my $ab = Triceps::AutoFnBind->new(
                $frFib => $fbFibPrev1
            );
            $uFib->makeHashCall($lbFibCompute, $op,
                idx => $idx - 1,
            );
        }
        $fbFibPrev1->callTray();
    }
}) or confess "$!";
$fbFibPrev1 = Triceps::FnBinding->new(
    unit => $uFib,
    name => "FibPrev1",
    on => $frFib,
    withTray => 1,
    labels => [
        result => sub {
            $stateFib->{prev1} = $_[1]->getRow()->get("fib");

            # must prepare before pushing new state and with it new $stateFib
            my $rop = $lbFibCompute->makeRowopHash($stateFib->{op},
                idx => $stateFib->{idx} - 2,
            );

            {
                my $ab = Triceps::AutoFnBind->new(
                    $frFib => $fbFibPrev2
                );
                $uFib->call($rop);
            }
            $fbFibPrev2->callTray();
        },
    ],
);
$fbFibPrev2 = Triceps::FnBinding->new(
    unit => $uFib,
    on => $frFib,
    name => "FibPrev2",
    withTray => 1,
    labels => [
        result => sub {
            $stateFib->{prev2} = $_[1]->getRow()->get("fib");
            $uFib->makeHashCall($frFib->getLabel("result"), $stateFib->{op},
                idx => $stateFib->{idx},
                fib => $stateFib->{prev1} + $stateFib->{prev2},
            );
        },
    ],
);

# End of streaming function
###

The stack depth is now greatly reduced because the unit stack pops the frames before pushing more of them. For the 2nd Fibonacci number the trace is:

unit 'uFib' before label 'FibCompute' op OP_DELETE {
unit 'uFib' before label 'FibCompute' op OP_DELETE {
unit 'uFib' before label 'Fib.result' op OP_DELETE {
unit 'uFib' after label 'Fib.result' op OP_DELETE }
unit 'uFib' after label 'FibCompute' op OP_DELETE }
unit 'uFib' before label 'FibPrev1.result' op OP_DELETE {
unit 'uFib' before label 'FibCompute' op OP_DELETE {
unit 'uFib' before label 'Fib.result' op OP_DELETE {
unit 'uFib' after label 'Fib.result' op OP_DELETE }
unit 'uFib' after label 'FibCompute' op OP_DELETE }
unit 'uFib' before label 'FibPrev2.result' op OP_DELETE {
unit 'uFib' before label 'Fib.result' op OP_DELETE {
unit 'uFib' before label 'FibCall.result' (chain 'Fib.result') op OP_DELETE {
unit 'uFib' after label 'FibCall.result' (chain 'Fib.result') op OP_DELETE }
unit 'uFib' after label 'Fib.result' op OP_DELETE }
unit 'uFib' after label 'FibPrev2.result' op OP_DELETE }
unit 'uFib' after label 'FibPrev1.result' op OP_DELETE }
unit 'uFib' after label 'FibCompute' op OP_DELETE }

For the 6th number the maximal required stack depth now gets reduced to only 9 instead of 51.

Streaming functions and recursion, part 3

FnBinding:call() with closures is easy to use but it creates a closure and an FnBinding object on each run. Can things be rearranged to reuse the same objects? With some effort, they can:

###
# A streaming function that computes a Fibonacci number.

# Input:
#   $lbFibCompute: request to compute the number.
# Output (by FnReturn labels):
#   "result": the computed value.
# The opcode is preserved through the computation.

my @stackFib; # stack of the function states
my $stateFib; # The current state

my $frFib = Triceps::FnReturn->new(
    name => "Fib",
    unit => $uFib,
    labels => [
        result => $rtFibRes,
    ],
    onPush => sub { push @stackFib, $stateFib; $stateFib = { }; },
    onPop => sub { $stateFib = pop @stackFib; },
);

my $lbFibResult = $frFib->getLabel("result");

# Declare the label & binding variables in advance, to define them sequentially.
my ($lbFibCompute, $fbFibPrev1, $fbFibPrev2);
$lbFibCompute = $uFib->makeLabel($rtFibArg, "FibCompute", undef, sub {
    my $row = $_[1]->getRow();
    my $op = $_[1]->getOpcode();
    my $idx = $row->get("idx");

    if ($idx <= 1) {
        $uFib->makeHashCall($frFib->getLabel("result"), $op,
            idx => $idx,
            fib => $idx < 1 ? 0 : 1,
        );
    } else {
        $stateFib->{op} = $op;
        $stateFib->{idx} = $idx;

        $frFib->push($fbFibPrev1);
        $uFib->makeHashCall($lbFibCompute, $op,
            idx => $idx - 1,
        );
    }
}) or confess "$!";
$fbFibPrev1 = Triceps::FnBinding->new(
    unit => $uFib,
    name => "FibPrev1",
    on => $frFib,
    labels => [
        result => sub {
            $frFib->pop($fbFibPrev1);

            $stateFib->{prev1} = $_[1]->getRow()->get("fib");

            # must prepare before pushing new state and with it new $stateFib
            my $rop = $lbFibCompute->makeRowopHash($stateFib->{op},
                idx => $stateFib->{idx} - 2,
            );

            $frFib->push($fbFibPrev2);
            $uFib->call($rop);
        },
    ],
);
$fbFibPrev2 = Triceps::FnBinding->new(
    unit => $uFib,
    on => $frFib,
    name => "FibPrev2",
    labels => [
        result => sub {
            $frFib->pop($fbFibPrev2);

            $stateFib->{prev2} = $_[1]->getRow()->get("fib");
            $uFib->makeHashCall($frFib->getLabel("result"), $stateFib->{op},
                idx => $stateFib->{idx},
                fib => $stateFib->{prev1} + $stateFib->{prev2},
            );
        },
    ],
);

# End of streaming function
###

The rest of the code stays the same, so I won't copy it here.

The computation still needs to keep the intermediate results of two recursive calls. With no closures, these results have to be kept in a global object $stateFib (which is a hash that keeps multiple values).

But it can't just be a single object! The recursive calls would overwrite it. So it has to be built into a stack of objects, a new one pushed for each call and popped after it. This pushing and popping can be tied to the pushing and popping of the bindings on an FnReturn. When the FnReturn is defined, the options onPush and onPop define the custom Perl code to execute, which is used here for the management of the state stack.

The whole logic is then split into the sections around the calls:

before the first call
between the first and second call
after the second call

The first section goes as a normal label and the rest are done as bindings.

A tricky moment is that a simple scoped AutoFnBind can't be used here. The pushing of the binding happens in the calling label (such as FibCompute) but then the result is processed in another label (such as FibPrev1.result). The procedural control won't return to FibCompute until after FibPrev1.result has been completed. But FibPrev1.result needs the state popped before it can do its work! So the pushing and popping of the binding is done explicitly in two split steps: push() called in FibCompute() and pop() called in FibPrev1.result. And of course then after FibPrev1.result saves the result, it pushes the binding again, which then gets popped in FibPrev2.result.

The popping can also be done without arguments, as pop(), but if it's given an argument, it will check that the binding popped is the same as its argument. This is helpful for detecting the call stack corruptions.

Now, can you guess, what depth of the unit call stack is required to compute and print the 2nd Fibonacci number? It's 7. If the tracing is enabled, it will produce this trace:

unit 'uFib' before label 'FibCompute' op OP_DELETE {
unit 'uFib' before label 'FibCompute' op OP_DELETE {
unit 'uFib' before label 'Fib.result' op OP_DELETE {
unit 'uFib' before label 'FibPrev1.result' (chain 'Fib.result') op OP_DELETE {
unit 'uFib' before label 'FibCompute' op OP_DELETE {
unit 'uFib' before label 'Fib.result' op OP_DELETE {
unit 'uFib' before label 'FibPrev2.result' (chain 'Fib.result') op OP_DELETE {
unit 'uFib' before label 'Fib.result' op OP_DELETE {
unit 'uFib' before label 'FibCall.result' (chain 'Fib.result') op OP_DELETE {
unit 'uFib' after label 'FibCall.result' (chain 'Fib.result') op OP_DELETE }
unit 'uFib' after label 'Fib.result' op OP_DELETE }
unit 'uFib' after label 'FibPrev2.result' (chain 'Fib.result') op OP_DELETE }
unit 'uFib' after label 'Fib.result' op OP_DELETE }
unit 'uFib' after label 'FibCompute' op OP_DELETE }
unit 'uFib' after label 'FibPrev1.result' (chain 'Fib.result') op OP_DELETE }
unit 'uFib' after label 'Fib.result' op OP_DELETE }
unit 'uFib' after label 'FibCompute' op OP_DELETE }
unit 'uFib' after label 'FibCompute' op OP_DELETE }

9 labels get called in a sequence, all the way from the initial call to the result printing. And only then the whole sequence unrolls back. 3 of them are chained through the bindings, so they don't push the stack frames onto the stack, and there is always the outermost stack frame, with the result of 9-3+1 = 7. This number grows fast. For the 6th number the number of labels becomes 75 and the frame count 51.

It happens because all the calls get unrolled into a single sequence, like what I've warned against in the section on the loops. The function return does unroll its FnReturn stack but doesn't unroll the unit call stack, it just goes even deeper by calling the label that processes it.

There are ways to improve it. The simplest one is to use the FnBinding with a tray, and call this tray after the function completely returns. This works out quite conveniently in two other ways too: First, AutoFnBind with its scoped approach can be used again. And second, it allows to handle the situations where a function returns not just one row but multiple of them. That will be the next example.

Sunday, October 21, 2012

Streaming functions and recursion, part 2

Now that the recursion is permitted, let's look at a basic example: the same Fibonacci function as before, only computed in a dumb recursive way. In a real dumb recursive way, with two recursive calls, to show how they are done. The simplest and most straightforward way goes like this:

my $uFib = Triceps::Unit->new("uFib");
$uFib->setMaxRecursionDepth(100);

# Type the data going into the function
my $rtFibArg = Triceps::RowType->new(
    idx => "int32", # the index of Fibonacci number to generate
) or confess "$!";

# Type of the function result
my $rtFibRes = Triceps::RowType->new(
    idx => "int32", # the index of Fibonacci number
    fib => "int64", # the generated Fibonacci number
) or confess "$!";

###
# A streaming function that computes a Fibonacci number.

# Input:
#   $lbFibCompute: request to compute the number.
# Output (by FnReturn labels):
#   "result": the computed value.
# The opcode is preserved through the computation.

my $frFib = Triceps::FnReturn->new(
    name => "Fib",
    unit => $uFib,
    labels => [
        result => $rtFibRes,
    ],
);

my $lbFibResult = $frFib->getLabel("result");

my $lbFibCompute; # must be defined before assignment, for recursion
$lbFibCompute = $uFib->makeLabel($rtFibArg, "FibCompute", undef, sub {
    my $row = $_[1]->getRow();
    my $op = $_[1]->getOpcode();
    my $idx = $row->get("idx");
    my $res;

    if ($idx < 1) {
        $res = 0;
    } elsif($idx == 1) {
        $res = 1;
    } else {
        my ($prev1, $prev2);
        Triceps::FnBinding::call(
            name => "FibCompute.call1",
            on => $frFib,
            unit => $uFib,
            labels => [
                result => sub {
                    $prev1 = $_[1]->getRow()->get("fib");
                }
            ],
            rowop => $lbFibCompute->makeRowopHash($op,
                idx => $idx - 1,
            ),
        );
        Triceps::FnBinding::call(
            name => "FibCompute.call2",
            on => $frFib,
            unit => $uFib,
            labels => [
                result => sub {
                    $prev2 = $_[1]->getRow()->get("fib");
                }
            ],
            rowop => $lbFibCompute->makeRowopHash($op,
                idx => $idx - 2,
            ),
        );
        $res = $prev1 + $prev2;
    }
    $uFib->makeHashCall($frFib->getLabel("result"), $op,
        idx => $idx,
        fib => $res,
    );
}) or confess "$!";

# End of streaming function
###

# binding to call the Fibonacci function and print the result
my $fbFibCall = Triceps::FnBinding->new(
    name => "FibCall",
    on => $frFib,
    unit => $uFib,
    labels => [
        result => sub {
            my $row = $_[1]->getRow();
            print($row->get("fib"), " is Fibonacci number ", $row->get("idx"), "\n");
        }
    ],
);

while(<STDIN>) {
    chomp;
    my @data = split(/,/);
    {
        my $ab = Triceps::AutoFnBind->new(
            $frFib => $fbFibCall,
        );
        $uFib->makeArrayCall($lbFibCompute, @data);
    }
    $uFib->drainFrame(); # just in case, for completeness
}

The calling sequence has became different than in the looping version but the produced result is exactly the same. The streaming function now receives an argument row and produces a result row. The unit's recursion depth limit had to be adjusted to permit the recursion.

The recursive calls are done through the FnBinding::call(), with a closure for the result handling label. That closure can access the scope of its creator and place the result into its local variable. After both intermediate results are computed, the final result computation takes place and sends out the result row.

Saturday, October 13, 2012

Recursion is now permitted

I've started talking about recursion, and it's much more convenient to do when the recursion can actually be used. The recursive calls (when a label calls itself, directly or indirectly) have been forbidden in Triceps. Mind you, the recursive calling still can be done with the help of trays and forking, as the previous example has shown and more examples will show. And it's probably the best way too from the standpoint of correctness. However it's not the most straightforward way.

So I've decided to allow the recursion in its direct way. Especially that it doesn't have to be all-or-nothing, it can be done in a piecemeal and controlled fashion.

Now it's controlled per-unit. Each unit has two adjustable limits:

Maximal stack depth: Limits the total depth of the unit's call stack. That's the maximal length of the call chain, whether it goes straight or in loops.
Maximal recursion depth: Limits the number of times each particular label may appear on the call stack. So if you have a recursive code fragment (a streaming function, or now you can do it with a loop too), this is the limit on its recursive re-entrances.

Both these limits accept the 0 and negative values to mean "unlimited".

The default is as it has been before: unlimited stack depth, recursion depth of 1 (which means that each label may be called once but it may not call itself). But now you can change them with the calls:

$unit-> setMaxStackDepth($n);
$unit->setMaxRecursionDepth($n);

You can change them at any time, even when the unit is running (but they will be enforced only on the next attempt to execute a rowop).

You can also read the current values:

$n = $unit->maxStackDepth();
$n = $unit->maxRecursionDepth();

Another thing about the limits is that even if you set them to "unlimited" or to some very large values, thee still are the system limits. The calls use the C++ process (or thread) stack, and if you make too many of them, the stack will overflow and the whole process will crash and possibly dump core. Keeping the call depths within reason is still a good idea.

Now you can do the direct recursion. However as with the procedural code, not all the labels are re-entrant. Some of them may work with the static data structures that can't be modified in a nested fashion. Think for example of a table: when you modify a table, it sends rowops to its "pre" and "out" labels. You can connect the other labels there, and react to the table modifications. However these labels can't attempt to modify the same table, because the table is already in the middle of a modification, and it's not re-entrant.

The table still has a separate logic to check for non-re-entrance, and no matter what is the unit's general recursion depth limit, for the table it always stays at 1. Moreover, the table enforces it across both the input label interface and the procedural interface.

If you make your own non-re-entrant labels, Triceps can make this check for you. Just mark the first label of the non-re-entrant sequence with

$label->setNonReentrant();

And it will have its own private recursion limit of 1. Any time it's attempted to execute recursively, it will confess. There is no way to unset this flag: when a label is known to be non-re-entrant, it can not suddenly become re-entrant until its code is rewritten.

You can read this flag with

$val = $label->isNonReentrant();

The next installment will show some more examples.

Thursday, October 11, 2012

Streaming functions and recursion, part 1

Let's look again at the pipeline example. Suppose we want to do the encryption twice (you know, maybe we have a secure channel to a semi-trusted intermediary who can can read the envelopes and forward the encrypted messages he can't read to the final destination). The pipeline becomes

decrypt | decrypt | process | encrypt | encrypt

Or if you want to think about it in a more function-like notation, rather than a pipeline, the logic can also be expressed as:

encrypt(encrypt(process(decrypt(decrypt(data)))))

However it would not work directly: a decrypt function has only one output and it can not have two bindings at the same time, it would not know which one to use at any particular time.

Instead you can make decrypt into a template, instantiate it twice, and connect into a pipeline. It's very much like what the Unix shell does: it instantiates a new process for each part of its pipeline.

But there is also another possibility: instead of collecting the whole pipeline in advance, do it in steps.

Start by adding in every binding:

withTray => 1,

This will make all the bindings collect the result on a tray instead of sending it on immediately. Then modify the main loop:

while(<STDIN>) {
chomp;

# receive
my $abReceive = Triceps::AutoFnBind->new(
    $retReceive => $bindDecrypt,
);
$unit->makeArrayCall($lbReceive, "OP_INSERT", $_);

# 1st decrypt
my $abDecrypt1 = Triceps::AutoFnBind->new(
    $retDecrypt => $bindDecrypt,
);
$bindDecrypt->callTray();

# 2nd decrypt
my $abDecrypt2 = Triceps::AutoFnBind->new(
    $retDecrypt => $bindDispatch,
);
$bindDecrypt->callTray();

# processing
my $abProcess = Triceps::AutoFnBind->new(
    $retOutput => $bindEncrypt,
);
$bindDispatch->callTray();

# 1st encrypt
my $abEncrypt1 = Triceps::AutoFnBind->new(
    $retEncrypt => $bindEncrypt,
);
$bindEncrypt->callTray();

# 2nd encrypt
my $abEncrypt2 = Triceps::AutoFnBind->new(
    $retEncrypt => $bindSend,
);
$bindEncrypt->callTray();

# send
$bindSend->callTray();
}

Here I've dropped the encrypted-or-unencrypted choice to save the space, the data is always encrypted twice. The drainFrame() call has been dropped because it has nothing to do anyway, and actually with the way the function calls work here. The rest of the code stays the same.

The bindings have been split in stages. In each stage the next binding is set, and the data from the previous binding gets sent into it. The binding method callTray() replaces the tray in the binding with an empty one, and then calls all the rowops collected on the old tray (and then if you wonder, what happens to the old tray, it gets discarded). Because of this the first decryption stage with binding

    $retDecrypt => $bindDecrypt,

doesn't send the data circling forever. It just does one pass through the decryption and prepares for the second pass.

Every time AutoFnBind->new() runs, it doesn't replace the binding of the return but pushes a new binding onto the return's stack. Each FnReturn has its own stack of bindings (this way it's easier to manage than a single stack). When an AutoFnBind gets destroyed, it pops the binding from the return's stack. And yes, if you specify multiple bindings in one AutoFnBind, all of them get pushed on construction and popped on destruction. In this case all the auto-binds are in the same block, so they will all be destroyed at the end of block in the opposite order. Which means that in effect the code is equivalent to the nested blocks, and this version might be easier for you to think of:

while(<STDIN>) {
chomp;

# receive
my $abReceive = Triceps::AutoFnBind->new(
    $retReceive => $bindDecrypt,
);
$unit->makeArrayCall($lbReceive, "OP_INSERT", $_);

{
    # 1st decrypt
    my $abDecrypt1 = Triceps::AutoFnBind->new(
      $retDecrypt => $bindDecrypt,
    );
    $bindDecrypt->callTray();

    {
      # 2nd decrypt
      my $abDecrypt1 = Triceps::AutoFnBind->new(
        $retDecrypt => $bindDispatch,
      );
      $bindDecrypt->callTray();

      {
        # processing
        my $abProcess = Triceps::AutoFnBind->new(
          $retOutput => $bindEncrypt,
        );
        $bindDispatch->callTray();

        {
          # 1st encrypt
          my $abEncrypt1 = Triceps::AutoFnBind->new(
            $retEncrypt => $bindEncrypt,
          );
          $bindEncrypt->callTray();

          {
            # 2nd encrypt
            my $abEncrypt1 = Triceps::AutoFnBind->new(
              $retEncrypt => $bindSend,
            );
            $bindEncrypt->callTray();

            # send
            $bindSend->callTray();
          }
        }
      }
    }
}
}

An interesting consequence of all this nesting, pushing and popping is that you can put the inner calls into the procedural loops if you with. For example, if you want to process every input line thrice:
while(<STDIN>) {
chomp;

# receive
my $abReceive = Triceps::AutoFnBind->new(
    $retReceive => $bindDecrypt,
);

for (my $i = 0; $i < 3; $i++) {
    $unit->makeArrayCall($lbReceive, "OP_INSERT", $_);

    {
      # 1st decrypt
      my $abDecrypt1 = Triceps::AutoFnBind->new(
        $retDecrypt => $bindDecrypt,
      );
      $bindDecrypt->callTray();

      {
        # 2nd decrypt
        my $abDecrypt1 = Triceps::AutoFnBind->new(
          $retDecrypt => $bindDispatch,
        );
        $bindDecrypt->callTray();

        {
          # processing
          my $abProcess = Triceps::AutoFnBind->new(
            $retOutput => $bindEncrypt,
          );
          $bindDispatch->callTray();

          {
            # 1st encrypt
            my $abEncrypt1 = Triceps::AutoFnBind->new(
              $retEncrypt => $bindEncrypt,
            );
            $bindEncrypt->callTray();

            {
              # 2nd encrypt
              my $abEncrypt1 = Triceps::AutoFnBind->new(
                $retEncrypt => $bindSend,
              );
              $bindEncrypt->callTray();

              # send
              $bindSend->callTray();
            }
          }
        }
      }
    }
}
}

This code will run the whole pipeline three times for each input line, and print out three output lines, such as:

>3639366536333263346635303566343934653533343535323534326336313632363332633332
37323635373337353663373432303466353035663439346535333435353235343230366536313664363533643232363136323633323232303633366637353665373433643232333332323230
37323635373337353663373432303466353035663439346535333435353235343230366536313664363533643232363136323633323232303633366637353665373433643232333332323230
37323635373337353663373432303466353035663439346535333435353235343230366536313664363533643232363136323633323232303633366637353665373433643232333332323230

If you wonder, what is the meaning of these lines, they are the same as before. The input is :

inc,OP_INSERT,abc,2

And each line of output is:

result OP_INSERT name="abc" count="3"

I suppose, it would be more entertaining if the processing weren't just incrementing a value in the input data but incrementing some static counter, then the three output lines would be different.

However this is not the only way to do the block nesting. The contents of the FnBinding's tray is not affected in any way by the binding being pushed or popped. It stays there throughout, until it's explicitly flushed by callTray(). So it could use the blocks formed in a more pipeline fashion (as opposed to the more function-call-like fashion shown before):

while(<STDIN>) {
chomp;

# receive
{
    my $abReceive = Triceps::AutoFnBind->new(
      $retReceive => $bindDecrypt,
    );
    $unit->makeArrayCall($lbReceive, "OP_INSERT", $_);
}

# 1st decrypt
{
    my $abDecrypt1 = Triceps::AutoFnBind->new(
      $retDecrypt => $bindDecrypt,
    );
    $bindDecrypt->callTray();
}

# 2nd decrypt
{
    my $abDecrypt1 = Triceps::AutoFnBind->new(
      $retDecrypt => $bindDispatch,
    );
    $bindDecrypt->callTray();
}

# processing
{
    my $abProcess = Triceps::AutoFnBind->new(
      $retOutput => $bindEncrypt,
    );
    $bindDispatch->callTray();
}

# 1st encrypt
{
    my $abEncrypt1 = Triceps::AutoFnBind->new(
      $retEncrypt => $bindEncrypt,
    );
    $bindEncrypt->callTray();
}

# 2nd encrypt
{
    my $abEncrypt1 = Triceps::AutoFnBind->new(
      $retEncrypt => $bindSend,
    );
    $bindEncrypt->callTray();
}

# send
$bindSend->callTray();
}

After each stage its binding is popped, but the tray is carried through to the next stage.

Which way of blocking is better? I'd say they're pretty equivalent in functionality, and your preference would depend on what style you prefer to express.

Sergey Babkin on CEP and stuff