Thursday, November 29, 2012

A snapshot pre-release of 1.1.0

Now that the Streaming Functions look wrapped up, and before embarking on the next big feature, this seems like a good time to publish a snapshot. This one is named 1.0.91-20121129, and is available for download now.

As usual with the snapshots, the Developer's Guide in the package has not been updated; instead, this blog serves as the interim documentation. The posts with the label 1_1_0 up to now describe all the new features.

Monday, November 26, 2012

Streaming functions and unit boundaries (and TQL guts)

Now let's take a look at the insides of the Tql module. I'll be skipping over the code that is less interesting; you can find the full version in the source code of perl/Triceps/lib/Triceps/X/Tql.pm as always. The constructor is one of those things to be skipped. The initialization part is more interesting:

sub initialize # ($self)
{
    my $myname = "Triceps::X::Tql::initialize";
    my $self = shift;

    return if ($self->{initialized});

    my %dispatch; # the name-to-table dispatch map
    my @labels; # pairs of (name, dump label) for the FnReturn
    for (my $i = 0; $i <= $#{$self->{tables}}; $i++) {
        my $name = $self->{tableNames}[$i];
        my $table = $self->{tables}[$i];

        confess "$myname: found a duplicate table name '$name', all names are: "
                . join(", ", @{$self->{tableNames}})
            if (exists $dispatch{$name});

        $dispatch{$name} = $table;
        push @labels, $name, $table->getDumpLabel();
    }

    $self->{dispatch} = \%dispatch;
    $self->{fret} = Triceps::FnReturn->new(
        name => $self->{name} . ".fret",
        labels => \@labels,
    );

    $self->{initialized} = 1;
}

It creates a dispatch table of name-to-table and also an FnReturn that contains the dump labels of all the tables.

Each query will be created in its own unit. It will run, and then get cleared and disposed of, which is very convenient. By the way, that is the answer to the question of why someone would want to use multiple units in the same thread: for modular disposal.

But the labels in the main unit and the query unit can't be directly connected. A direct connection would create stable references, and the disposal wouldn't work. That's where the streaming function interface comes to the rescue: it provides a temporary connection. Build the query unit, build a binding for it, push the binding onto the FnReturn of the main unit, run the query, pop the binding, dispose of the query unit.

And the special capability (or if you will, superpower) of the streaming functions that allows all this is that the FnReturn and FnBinding don't have to belong to the same unit. They may belong to different units and will still work together fine.
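
Here is a minimal sketch of that lifecycle, assuming a main-unit FnReturn $fret with a label "data" of row type $rt (the names here are made up for illustration, not taken from the Tql code):

my $qUnit = Triceps::Unit->new("q1.unit");
my $cleaner = $qUnit->makeClearingTrigger(); # clears the unit when it goes out of scope
my $lbOut = $qUnit->makeLabel($rt, "q1.out", undef, sub {
    # ... process the query results here
});
my $fbind = Triceps::FnBinding->new(
    unit => $qUnit, # note: a different unit than the one of $fret
    name => "q1.bind",
    on => $fret,
    labels => [
        data => $lbOut,
    ],
);
$fret->push($fbind); # the temporary connection between the units
# ... run whatever produces the data on $fret ...
$fret->pop($fbind); # disconnect; $qUnit gets cleared when $cleaner expires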

The query() method then handles the creation of the unit and stuff:

sub query # ($self, $argline)
{
    my $myname = "Triceps::X::Tql::query";

    my $self = shift;
    my $argline = shift;

    confess "$myname: may be used only on an initialized object"
        unless ($self->{initialized});

    $argline =~ s/^([^,]*)(,|$)//; # skip the name of the label
    my $q = $1; # the name of the query itself
    #&Triceps::X::SimpleServer::outCurBuf("+DEBUGquery: $argline\n");
    my @cmds = split_braced($argline); # consumes $argline, leaving any unparsed tail in it
    if ($argline ne '') {
        # Presumably, the argument line should contain no line feeds, so it should be safe to send back.
        &Triceps::X::SimpleServer::outCurBuf("+ERROR,OP_INSERT,$q: mismatched braces in the trailing $argline\n");
        return
    }

    # The context for the commands to build up an execution of a query.
    # Unlike $self, the context is created afresh for every query.
    my $ctx = {};
    # The query will be built in a separate unit
    $ctx->{tables} = $self->{dispatch};
    $ctx->{fretDumps} = $self->{fret};
    $ctx->{u} = Triceps::Unit->new("${q}.unit");
    $ctx->{prev} = undef; # will contain the output of the previous command in the pipeline
    $ctx->{actions} = []; # code that will run the pipeline
    $ctx->{id} = 0; # a unique id for auto-generated objects

    # It's important to place the clearing trigger outside eval {}. Otherwise the
    # clearing will erase any errors in $@ returned from eval.
    my $cleaner = $ctx->{u}->makeClearingTrigger();
    if (! eval {
        foreach my $cmd (@cmds) {
            #&Triceps::X::SimpleServer::outCurBuf("+DEBUGcmd, $cmd\n");
            my @args = split_braced($cmd);
            my $argv0 = bunquote(shift @args);
            # The rest of @args do not get unquoted here!
            die "No such TQL command '$argv0'\n" unless exists $tqlDispatch{$argv0};
            $ctx->{id}++;
            &{$tqlDispatch{$argv0}}($ctx, @args);
            # Each command must set its result label (even if an undef) into
            # $ctx->{next}.
            die "Internal error in the command $argv0: missing result definition\n"
                unless (exists $ctx->{next});
            $ctx->{prev} = $ctx->{next};
            delete $ctx->{next};
        }
        if (defined $ctx->{prev}) {
            # implicitly print the result of the pipeline, no options
            &{$tqlDispatch{"print"}}($ctx);
        }

        # Now run the pipeline
        foreach my $code (@{$ctx->{actions}}) {
            &$code;
        }

        1; # means that everything went OK
    }) {
        # XXX this won't work well with the multi-line errors
        &Triceps::X::SimpleServer::outCurBuf("+ERROR,OP_INSERT,$q: error: $@\n");
        return
    }
}

Each TQL command is defined as its own handler function, all of them collected in %tqlDispatch. query() splits the pipeline and then lets each command build its part of the query, connecting them through $ctx. A command may also register an action to be run later. After everything is built, the actions run and produce the result.
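
For reference, %tqlDispatch is simply a hash from the command name to its handler function. A sketch of its likely shape (only _tqlRead is shown in this post; the other handler names are my guesses, following the same naming pattern):

my %tqlDispatch = (
    read => \&_tqlRead,
    project => \&_tqlProject, # hypothetical names, patterned
    print => \&_tqlPrint,     # after _tqlRead
    join => \&_tqlJoin,
    where => \&_tqlWhere,
);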

The functions split_braced() and bunquote() are imported from the package Triceps::X::Braced that handles the parsing of the braced nested lists.

Another interesting part is the error reporting, done as a special label "+ERROR". It's actually one of the reasons why the code is not of production quality: the errors may be multi-line, while the SimpleServer protocol really expects everything to be single-line. Properly, some quoting would have to be done.
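
For the record, here is a minimal sketch of the kind of escaping that could make the errors safe to send; it's not in the current code, just an illustration:

my $msg = $@;
$msg =~ s/\\/\\\\/g; # escape the backslashes first
$msg =~ s/\n/\\n/g; # then fold the line feeds into the literal "\n"
&Triceps::X::SimpleServer::outCurBuf("+ERROR,OP_INSERT,$q: error: $msg\n");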

Moving on, here is the "read" command handler:

sub _tqlRead # ($ctx, @args)
{
    my $ctx = shift;
    die "The read command may not be used in the middle of a pipeline.\n"
        if (defined($ctx->{prev}));
    my $opts = {};
    &Triceps::Opt::parse("read", $opts, {
        table => [ undef, \&Triceps::Opt::ck_mandatory ],
    }, @_);

    my $fret = $ctx->{fretDumps};
    my $tabname = bunquote($opts->{table});

    die ("Read found no such table '$tabname'\n")
        unless (exists $ctx->{tables}{$tabname});
    my $unit = $ctx->{u};
    my $table = $ctx->{tables}{$tabname};
    my $lab = $unit->makeDummyLabel($table->getRowType(), "lb" . $ctx->{id} . "read");
    $ctx->{next} = $lab;

    my $code = sub {
        Triceps::FnBinding::call(
            name => "bind" . $ctx->{id} . "read",
            unit => $unit,
            on => $fret,
            labels => [
                $tabname => $lab,
            ],
            code => sub {
                $table->dumpAll();
            },
        );
    };
    push @{$ctx->{actions}}, $code;
}

It's the only command that registers an action, which sends data into the query unit. The rest of the commands just add more handlers to the pipeline in the unit, and get the data that flows from "read". The action sets up a binding and calls the table dump, to send the data into that binding.

The reading of the tables could have also been done without the bindings, and without the need to bind the units at all: just iterate through the table procedurally in the action. But this whole example has been built largely to showcase that the bindings can be used in this way, so naturally it uses bindings.
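
For comparison, a sketch of what the procedural variant of the action could look like (again, not what the example does; it iterates the table directly and feeds each row to the pipeline label):

    my $code = sub {
        for (my $rh = $table->begin(); !$rh->isNull(); $rh = $rh->next()) {
            $unit->call($lab->makeRowop("OP_INSERT", $rh->getRow()));
        }
    };
    push @{$ctx->{actions}}, $code;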

The bindings become more useful when the query logic has to react to the normal logic of the main unit, such as in subscriptions: set up the query, read its initial state, and then keep reading as the state gets updated. But guess what, the subscriptions can't be done with the FnReturns as shown, because an FnReturn sends its data only to the last binding pushed onto it. This means that if multiple subscriptions get set up, only the last one will be getting the data. There will be a separate mechanism for that.

running the TQL query server

The code that produced the query output examples from the previous post looks like this:

# The basic table type to be used for querying.
# Represents the trades reports.
our $rtTrade = Triceps::RowType->new(
  id => "int32", # trade unique id
  symbol => "string", # symbol traded
  price => "float64",
  size => "float64", # number of shares traded
) or confess "$!";

our $ttWindow = Triceps::TableType->new($rtTrade)
  ->addSubIndex("bySymbol",
    Triceps::SimpleOrderedIndex->new(symbol => "ASC")
      ->addSubIndex("last2",
        Triceps::IndexType->newFifo(limit => 2)
      )    
  )
  or confess "$!";
$ttWindow->initialize() or confess "$!";

# Represents the static information about a company.
our $rtSymbol = Triceps::RowType->new(
  symbol => "string", # symbol name
  name => "string", # the official company name
  eps => "float64", # last quarter earnings per share
) or confess "$!";

our $ttSymbol = Triceps::TableType->new($rtSymbol)
  ->addSubIndex("bySymbol",
    Triceps::IndexType->newHashed(key => [ "symbol" ])
  )
  or confess "$!";
$ttSymbol->initialize() or confess "$!";

my $uTrades = Triceps::Unit->new("uTrades");
my $tWindow = $uTrades->makeTable($ttWindow, "EM_CALL", "tWindow")
  or confess "$!";
my $tSymbol = $uTrades->makeTable($ttSymbol, "EM_CALL", "tSymbol")
  or confess "$!";

# The information about tables, for querying.
my $tql = Triceps::X::Tql->new(
  name => "tql",
  tables => [
    $tWindow,
    $tSymbol,
  ],
);

my %dispatch;
$dispatch{$tWindow->getName()} = $tWindow->getInputLabel();
$dispatch{$tSymbol->getName()} = $tSymbol->getInputLabel();
$dispatch{"query"} = sub { $tql->query(@_); };
$dispatch{"exit"} = \&Triceps::X::SimpleServer::exitFunc;

Triceps::X::DumbClient::run(\%dispatch);

It's very much like the example shown before in the section 7.8 "Main loop with a socket", with a few differences. Obviously, Tql has been added, and we'll get to that part just in a moment. But the other differences are centered around the way the server and client code has been restructured.

The Triceps::X::DumbClient is a module for testing: it starts the server, then runs the client that sends the data to it and reads the result back. Its run() method is:

sub run # ($labels)
{
    my $labels = shift;

    my ($port, $pid) = Triceps::X::SimpleServer::startServer(0, $labels);
    my $sock = IO::Socket::INET->new(
        Proto => "tcp",
        PeerAddr => "localhost",
        PeerPort => $port,
    ) or confess "socket failed: $!";
    while(<STDIN>) {
        $sock->print($_);
        $sock->flush();
    }
    $sock->print("exit,OP_INSERT\n");
    $sock->flush();
    $sock->shutdown(1); # SHUT_WR
    while(<$sock>) {
        print($_);
    }
    waitpid($pid, 0);
}

It's really intended only for the very small examples that fit into the TCP buffer, since it sends the whole input before it starts reading the output.

The interesting server things happen inside startServer() which now also stayed almost the same but became a part of a module. The "almost the same" part is about the server loop being able to dispatch not only to the labels but also to the arbitrary Perl functions, citing from the example:

$dispatch{"query"} = sub { $tql->query(@_); };
$dispatch{"exit"} = \&Triceps::X::SimpleServer::exitFunc;


It automatically recognizes whether an entry in the dispatch table is a Label or a function, and handles each appropriately. In the server it's implemented like this:

...
        my $label = $labels->{$lname};
        if (defined $label) {
          if (ref($label) eq 'CODE') {
            &$label($line);
          } else {
            my $unit = $label->getUnit();
            confess "label '$lname' received from client $id has been cleared"
              unless defined $unit;
            eval {
              $unit->makeArrayCall($label, @data);
              $unit->drainFrame();
            };
            warn "input data error: $@\nfrom data: $line\n" if $@;
          }
        } else {
          warn "unknown label '$lname' received from client $id: $line";
        }
...

And the exitFunc() function is another way to trigger the server exit, instead of makeExitLabel():

sub exitFunc # ($line)
{
    $srv_exit = 1;
}

As you can see, the dispatched functions receive the whole argument line as the client had sent it, including the label name, rather than having it split by commas. The functions can then do the text parsing in their own way, which comes in really handy for TQL. It's convenient for the exit function too, as now there is no need to send the opcode with "exit" (although X::DumbClient::run() still does send the opcode, to stay compatible with the exit label approach, and the extra information doesn't hurt the exit function).
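
A sketch of a dispatched function doing its own parsing (the "echo" command here is made up for illustration):

$dispatch{"echo"} = sub {
    my $line = shift; # the full line, e.g. "echo,hello world"
    $line =~ s/^([^,]*)(,|$)//; # strip the label name, same as query() does
    &Triceps::X::SimpleServer::outCurBuf("echo: $line\n");
};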


And now, the TQL definition. The Tql object gets created with the list of queryable tables, and then the TQL handler function shown above calls the method query() on it:

# The information about tables, for querying.
my $tql = Triceps::X::Tql->new(
  name => "tql",
  tables => [
    $tWindow,
    $tSymbol,
  ],
);


There are multiple ways to create the Tql objects. By default the option "tables" lists all the queryable tables, and their "natural" names will be used in the queries. It's possible to specify the names explicitly as well:

my $tql = Triceps::X::Tql->new(
  name => "tql",
  tables => [
    $tWindow,
    $tSymbol,
    $tWindow,
    $tSymbol,
  ],
  tableNames => [
    "window",
    "symbol",
    $tWindow->getName(),
    $tSymbol->getName(),
  ],
);

This version defines each table under two synonymous names. It's also possible to create a Tql object without tables, and add tables to it later as they are created:

my $tql = Triceps::X::Tql->new(name => "tql");
$tql->addNamedTable(
  window => $tWindow,
  symbol => $tSymbol,
);
# add 2nd time, with different names
$tql->addTable(
  $tWindow,
  $tSymbol,
);
$tql->initialize();

The tables can be added with explicit names or with "natural" names. After all the tables are added, the Tql object has to be initialized. The two ways of creation are mutually exclusive: if the option "tables" is used, the object will be initialized right away in the constructor; if it's not used, the explicit initialization has to be done later. The methods addTable() and addNamedTable() can not be used on an initialized object, and query() can not be used on an uninitialized object.

Sunday, November 25, 2012

TQL: the Trivial Query Language

In the Developer's Guide section 7.8. "Main loop with a socket" I've been showing the execution of the simple queries. I've wanted to use the queries to demonstrate a feature of the streaming functions, so I've substantially extended that example.

Now the query example has grown to have its own language, TQL. You can think of it as the Trivial Query Language or the Triceps Query Language. It's trivial, and so far it's only of example quality, but it's extensible and it already can do some interesting things.

Why not SQL? After all, there are multiple parser-building tools available in Perl. Partially, because I wanted to keep it trivial and to avoid introducing extra dependencies, especially just for the examples. Partially, because I don't like SQL. I think that the queries can be expressed much more naturally in the form of shell-like pipelines. Back at DB, when I wrote a simple toolkit for querying and comparing CSV files (yeah, I didn't find the DBD::CSV module), I used pipeline semantics and it worked pretty well. It also did things that are quite difficult with SQL, like mass renaming and reordering of fields, and diffing. Although TQL is not a descendant of the language I used in that query tool, it is a further development of the pipeline idea.

Syntactically, TQL is very simple: its query is represented as a nested list, similar to Tcl (or, if you like Lisp better, you can think of it as similar to Lisp but with different parentheses). A list is surrounded by curly braces "{}". The elements of a list are either other lists or words consisting of non-space characters.

{word1 {word21 word22} word3}

Unlike Tcl, there are no quotes in the TQL syntax; the quote characters are just normal word characters. If you want to include spaces in a word, you use the curly braces instead of the quotes.

{   this is a {brace-enquoted} string with spaces and nested braces  }

Note that the spaces inside a list are used as delimiters and thrown away, but within a brace-quoted word-string they are significant. How do you know which way they will be treated in a particular case? It all depends on what is expected in that case. If the command expects a string as an argument, it will treat the value as a string. If the command expects a list, it will treat it as a list.

What if you need to include an unbalanced brace character inside a string? Escape it with a backslash, "\{". The other usual Perl backslash sequences work too (though in the future TQL may get separated from Perl, and then only the C sequences will work; that remains to be seen). Any non-alphanumeric character (including space) can be prepended with a backslash too. An important point is that when you build the lists, unlike shell and like Tcl, you do the backslash escaping only once, when accepting a raw string. After that you can include the words into lists of any depth without any extra escapes (and you must not add any extra escapes in the lists).

Unlike shell, you can't combine a single string out of the quoted and unquoted parts. Instead the quoting braces work as implicit separators. For example, if you specify a list as {a{b}c d}, you don't get two strings "abc" and "d", you get four strings "a", "b", "c", "d".
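
To make this concrete, here is a small sketch of driving the parsing functions directly, following the same pattern as in query() above (assuming the standard exports of Triceps::X::Braced):

use Triceps::X::Braced qw(split_braced bunquote);

my $line = "{read table tWindow} {project fields {symbol price}}";
my @cmds = split_braced($line); # split into the top-level commands, consuming $line
foreach my $cmd (@cmds) {
    my @args = split_braced($cmd); # split a command into its elements
    my $argv0 = bunquote(shift @args); # "read" on the 1st pass, "project" on the 2nd
    print "command: $argv0, with ", scalar(@args), " argument(s)\n";
}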

A TQL query is a list that represents a pipeline. Each element of the list is a command. The first command reads the data from a table, and the following commands perform transformations on that data. For example:

{read table tWindow} {project fields {symbol price}} {print tokenized 0}

If the print command is missing at the end of the pipeline, it will be added implicitly, with the default arguments: {print}.

The arguments of each TQL command are always in the option name-value format, very much like the Perl constructors of many Triceps objects. There aren't any arguments in TQL that go by themselves without an option name.

So for example the command "read" above has the option "table" with the value "tWindow". The command "project" has the option "fields" with a list value of two elements. In this case the elements are simple words and don't need any further quoting, but the extra quoting won't hurt. Say, if you wanted to rename the field "price" to "trade_price", you would use the Triceps::Fields::filter() syntax for it; even though that format doesn't contain any spaces and could still be used as a plain word, it looks nicer with the extra braces:

{project fields {symbol {price/trade_price} }}

I'm sure that the list of commands and their options will expand and change over time. So far the supported commands are:

read
Defines a table to read from and starts the command pipeline.
Options:
table - name of the table to read from.

project
Projects (and possibly renames) a subset of fields in the current pipeline.
Options:
fields - an array of field definitions in the syntax of Triceps::Fields::filter() (same as in the joins).

print
The last command of the pipeline, which prints the results. If not used explicitly, the query adds this command implicitly at the end of the pipeline, with the default options.
Options:
tokenized (optional) - Flag: print in the name-value format, as in Row::printP(). Otherwise prints only the values in the CSV format. (default: 1)

join
Joins the current pipeline with another table. This is functionally similar to LookupJoin, although the options are closer to JoinTwo.
Options:
table - name of the table to join with. The current pipeline is considered the "left side", the table the "right side". The duplicate key fields on the right side are always excluded from the result, like the JoinTwo option fieldsUniqKey => "left".
rightIdxPath - path name of the table's index on which to join. At the moment there is no way to join without knowing the name of the index. (As usual, the path is an array of nested names.)
by (semi-optional) - the join equality condition specified as pairs of fields. Similarly to JoinTwo, it's a single-level array with the fields logically paired: {leftFld1 rightFld1 leftFld2 rightFld2 ...}. Options "by" and "byLeft" are mutually exclusive, and one of them must be present.
byLeft (semi-optional) - the join equality condition specified as a transformation on the left-side field set in the syntax of Triceps::Fields::filter(), with an implicit element {!.*} added at the end. Options "by" and "byLeft" are mutually exclusive, and one of them must be present.
leftFields (optional) - the list of patterns for the left-side fields to pass through and possibly rename, in the syntax of Triceps::Fields::filter(). (default: pass all, with the same name)
rightFields (optional) - the list of patterns for the right-side fields to pass through and possibly rename, in the syntax of Triceps::Fields::filter(). The key fields get implicitly removed before this filter. (default: pass all, with the same name)
type (optional) - type of the join, "inner" or "left". (default: "inner")

where
Filters/selects the rows.
Options:
istrue - a Perl expression, the condition for the rows to pass through. The particularly dangerous constructions are not allowed in the expression, including loops and general function calls. The fields of the row are referred to as $%field; these references get translated before the expression is compiled.

Here are some examples of TQL queries, with the results produced from the output of the code examples I'll show in a moment.

> query,{read table tSymbol}
lb1read OP_INSERT symbol="AAA" name="Absolute Auto Analytics Inc" eps="0.5"
+EOD,OP_NOP,lb1read

Reads the stock symbol information table and prints it in the default tokenized format. The result format is a bit messy for now, a mix of tokenized and CSV data. In the previous examples in chapter 7, I've been marking the end-of-data either by a row with the opcode OP_NOP or by not marking it at all. For the TQL queries I've decided to try out a different approach: send a CSV row on the pseudo-label "+EOD", with the value equal to the name of the label that has been completed. The labels with names starting with "+" are special in this convention; they represent some kind of metadata.

The name "lb1read" in the result rows is coming from an auto-generated label name in TQL. It will probably become less random-looking in the future, but for now I haven't yet figured out the best way to to it.

> query,{read table tWindow} {project fields {symbol price}}
lb2project OP_INSERT symbol="AAA" price="20"
lb2project OP_INSERT symbol="AAA" price="30"
+EOD,OP_NOP,lb2project

Reads the trade window rows and projects the fields "symbol" and "price" from them.

> query,{read table tWindow} {project fields {symbol price}} {print tokenized 0}
lb2project,OP_INSERT,AAA,20
lb2project,OP_INSERT,AAA,30
+EOD,OP_NOP,lb2project

The same, only explicitly prints the data in the CSV format.

> query,{read table tWindow} {where istrue {$%price == 20}}
lb2where OP_INSERT id="3" symbol="AAA" price="20" size="20"
+EOD,OP_NOP,lb2where

Selects the trade window row with price equal to 20.

> query,{read table tWindow} {join table tSymbol rightIdxPath bySymbol byLeft {symbol}}
join2.out OP_INSERT id="3" symbol="AAA" price="20" size="20" name="Absolute Auto Analytics Inc" eps="0.5"
join2.out OP_INSERT id="5" symbol="AAA" price="30" size="30" name="Absolute Auto Analytics Inc" eps="0.5"
+EOD,OP_NOP,join2.out

Reads the trade window and enriches it by joining with the symbol information.

A nice feature of TQL is that it allows combining the operations in the pipeline in any order, repeated any number of times. For example, you can read a table, filter it, join with another table, filter again, join with a third table, filter again, and so on. SQL in the same situation has to resort to specially named clauses: for example, WHERE filters before grouping and HAVING filters after grouping.

Of course, a typical smart SQL compiler would determine the earliest application point for each WHERE sub-expression and build a similar pipeline. But the explicit pipelining in the query allows the TQL compiler to stay trivial. And nothing really prevents a smart TQL compiler either; it could just as well analyze, split, and reorder the pipeline stages.
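
For example, nothing stops a query from filtering both before and after a join; a pipeline along these lines works with just the tables from this example:

{read table tWindow} {where istrue {$%price > 10}} {join table tSymbol rightIdxPath bySymbol byLeft {symbol}} {where istrue {$%eps > 0}}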

Saturday, November 24, 2012

broken blog label

I've found out that the label "c++" doesn't work correctly. Weird thing: it does work on the post list in the blog editing mode, but not in the blog reader. I guess the handling of the non-word characters got screwed up somewhere in the reader.

To work around this issue, I'll be using the label "cpp" instead from now on.

Tuesday, November 20, 2012

Table dump

Another intermediate step for the example I'm working on is the table dumping. It allows iterating over a table in a functional manner.

A new label "dump" is added to the table and its FnReturn. Whenever the method dumpAll() is called, it sends the whole contents of the table to that label. Then you can set a binding on the table's FnReturn, call dumpAll(), and the binding will iterate through the whole table's contents.

The grand plan is also to add the dumping by a condition that selects a sub-index, but it's not implemented yet.

It's also possible to dump in an alternative order: dumpAllIdx() can send the rows in the order of any index, rather than the default first leaf index.

If you want to get the dump label explicitly, you can do it with

my $dlab = $table->getDumpLabel();

Normally the only reason to do that would be to add it to another FnReturn (besides the table's own FnReturn). Chaining anything else directly to this label would not make much sense, because the dump of the table can be called from many places, and a directly chained label would receive the data every time the dump is called.
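
For example, a minimal sketch of collecting the dump labels of two tables into one shared FnReturn ($tA and $tB standing in for any two tables of the same unit):

my $fretDumps = Triceps::FnReturn->new(
    name => "dumps",
    labels => [
        a => $tA->getDumpLabel(),
        b => $tB->getDumpLabel(),
    ],
);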

The typical usage looks like this:

    Triceps::FnBinding::call(
        name => "iterate",
        on => $table->fnReturn(),
        unit => $unit,
        labels => [
            dump => sub { ... },
        ],
        code => sub {
            $table->dumpAll();
        },
    );

It's less efficient than the normal iteration but sometimes comes in handy.
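
For comparison, the normal procedural iteration over the same table looks like this (a sketch using the plain Table iteration API):

for (my $rh = $table->begin(); !$rh->isNull(); $rh = $rh->next()) {
    my $row = $rh->getRow();
    # ... process $row directly, without any labels involved
}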

Normally the rowops are sent with the opcode OP_INSERT. But the opcode can also be specified explicitly:

$table->dumpAll($opcode);

The alternative order can be achieved with:

$table->dumpAllIdx($indexType);
$table->dumpAllIdx($indexType, $opcode);

As usual, the index type must belong to the exact type of this table. For example:

$table->dumpAllIdx($table->getType()->findIndexPath("cb"), "OP_NOP");

And some more interesting examples will be forthcoming later.

Thursday, November 15, 2012

test code restructuring

There is one more feature of the streaming functions that I want to show, but it requires a bit of work. The feature itself is small and easy, but a good way to show it in an example is with the user queries on a socket, which is a little involved.

With that goal in mind, so far I've done some restructuring. It has been an inconvenience to not be able to share the code between the example files, requiring either putting everything that uses a certain code fragment into one file, or copying that fragment around.

Now there is a place for such code, collected under the namespace Triceps::X. X can be thought of as a mark of eXperimental, eXample, eXtraneous code. This code is not exactly of production quality but is good enough for the examples, and can be used as a starting point for the development of better code. Quite a few fragments of Triceps went this way: the joins were done as an example first and then solidified for the main code base, and so was the aggregation.

The socket-handling examples discussed in the section 7.8. "Main loop with a socket" of the manual have been moved there. The server part became Triceps::X::SimpleServer, and the client part became Triceps::X::DumbClient. More to be added soon.

Another module that got extracted is Triceps::X::TestFeed. It's a small infrastructure to run the examples, pretending that it gets the input from stdin and sends the output to stdout, while actually doing it all in memory. I haven't been discussing it much, but all of the more complicated examples have been written to use it. It also shows up once in a while in the blog, when I forget to edit the code to pretend that it uses stdin/stdout, and then a &readLine shows instead of <STDIN>, and a &send instead of print (for the manual I have a script that does these substitutions automatically when I insert the code examples into it).

Saturday, November 10, 2012

Streaming functions and template results

The same way as the FnReturns can be used to get back the direct results of the operations on the tables, they can also be used on the templates in general. Indeed, it's a good idea to have a method that creates an FnReturn in all the templates. So I went ahead and added it to LookupJoin, JoinTwo and Collapse.

For the joins, the resulting FnReturn has one label "out". It's created similarly to the table's:

my $fret = $join->fnReturn();

And then it can be used as usual. The implementation of this method is fairly simple:

sub fnReturn # (self)
{
    my $self = shift;
    if (!defined $self->{fret}) {
        $self->{fret} = Triceps::FnReturn->new(
            name => $self->{name} . ".fret",
            labels => [
                out => $self->{outputLabel},
            ],
        );
    }
    return $self->{fret};
}

All this kind of makes the method lookup() of LookupJoin redundant, since now pretty much the same can be done with the streaming function API, and even better: it provides the opcodes on the rowops, can handle the full processing, and calls the rowops one by one without necessarily creating an array. But it may yet turn out that lookup() has some more convenient uses too, so I haven't removed it yet.
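
Here is a sketch of what a lookup through the streaming function API could look like ($join is a LookupJoin, $unit is its unit; the key field and value are made up):

my @rows; # collects the result rows of the lookup
Triceps::FnBinding::call(
    name => "lookupCall",
    unit => $unit,
    on => $join->fnReturn(),
    labels => [
        out => sub { push @rows, $_[1]->getRow(); },
    ],
    code => sub {
        # send the argument row into the join's input
        $unit->makeHashCall($join->getInputLabel(), "OP_INSERT",
            symbol => "AAA");
    },
);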

For Collapse the interface is a little more complicated: the FnReturn contains a label for each data set, named the same as the data set. The order of labels follows the order of the data set definitions (though right now it's kind of moot, because only one data set is supported). The implementation is:

sub fnReturn # (self)
{
    my $self = shift;
    if (!defined $self->{fret}) {
        my @labels;
        for my $n (@{$self->{dsetnames}}) {
            push @labels, $n, $self->{datasets}{$n}{lbOut};
        }
        $self->{fret} = Triceps::FnReturn->new(
            name => $self->{name} . ".fret",
            labels => \@labels,
        );
    }
    return $self->{fret};
}

It uses the new element $self->{dsetnames} that wasn't present in the code shown before. I've added it now to keep the array of data set names in the order they were defined.

Use these examples to write the fnReturn() in your templates.

Wednesday, November 7, 2012

Streaming functions and tables

The Copy Tray used with the tables in version 1.0 was really a precursor to the streaming functions. Now that the full-blown streaming functions have been worked out, there is no sense in keeping the copy trays any more, so I've removed them.

Instead, I've added a Table method that gets the FnReturn for that table:

$fret = $table->fnReturn();

The return contains the labels "pre" and "out", and a named label for each aggregator. The FnReturn object is created on the first call of this method and is kept in the table; all the following calls return the same object. This has an interesting consequence for the "pre" label: the rowop for the "pre" label doesn't get created at all if nothing is chained from that label. But when the FnReturn gets created, one of its labels gets chained from the "pre" label. This means that once you call $table->fnReturn() for the first time, you will see that table's "pre" label called in all the traces. It's not a huge extra overhead, but still something to keep in mind, and not be surprised when calling fnReturn() changes all your traces.

The produced FnReturn then gets used as any other one. If you use it with an FnBinding that has withTray => 1, you get an improved equivalent of the Copy Tray. For example:

$fret2 = $t2->fnReturn();
$fbind2 = Triceps::FnBinding->new(
    unit => $u1,
    name => "fbind2",
    on => $fret2,
    withTray => 1,
    labels => [
        out => sub { }, # another way to make a dummy
    ],
);

$fret2->push($fbind2);
$t2->insert($r2);
$fret2->pop($fbind2);

# $ctr is the Copy Tray analog
$ctr = $fbind2->swapTray(); # get the updates on an insert

Of course, most of the time you would not want to make a dummy label and then iterate manually through the copy tray. You would want to create bindings to the actual next logical labels and simply execute them, immediately or delayed through a tray.
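
For example, a sketch with the binding going straight to the next logical label ($lbNext standing in for whatever label does the follow-up processing in the same unit):

$fbind3 = Triceps::FnBinding->new(
    unit => $u1,
    name => "fbind3",
    on => $t2->fnReturn(),
    labels => [
        out => $lbNext,
    ],
);
$fret2->push($fbind3);
$t2->insert($r2); # the table's output now flows directly into $lbNext
$fret2->pop($fbind3);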