Tuesday, July 29, 2014

compact vs detailed

A few days ago I was editing the Developer's Guide chapter on TQL when an idea occurred to me:

Perhaps what makes the typical CEP models so hard to read is that they try to represent both the topology and all the details in one go. Yes, I'm a fan of reading the program text sequentially, and I've been saying that the problem with SQL is that it's written backwards: each statement specifies the results first and then the computation; if the results went after the computation, the whole flow would be much easier to read sequentially. But that still doesn't solve the problem of branching, and maybe doesn't solve the fundamental difficulty at all.

Maybe the right way to write the CEP programs is to write out the pipelines in a short notation, and then define the elements of these pipelines in all detail separately. So the main program might look something like:

inputTrades | checkTrades -> storeTrades
inputSymbols -> storeSymbols
(storeTrades, storeSymbols) | enrichTrades | outputData

and then each of these stages can be defined in detail separately. No arguments, no details at all in the pipeline specification. All the details would go into the detailed definitions. So each module would consist of these two parts: the pipeline outline and the detailed definitions of the pipeline elements. And then these modules could be combined into higher-level modules.

Though it's still not clear how to show, say, the maintenance with the expiration of the data in this notation. Would it make sense in the more complicated cases?

Monday, July 28, 2014

Triceps 2.0.0 released

It's a grand day for Triceps: version 2.0 is finally released. There aren't many code changes since the last snapshot but the documentation has been collected and edited together. The documentation has been a huge amount of work that took over a year.

As a reminder, the code features of 2.0 include such things as the multithreading support, the streaming functions, TQL, and a great many small improvements.

Sunday, July 20, 2014

a minor TQL clean-up

I've been working on the documentation chapter about TQL, and noticed that its single-threaded version has not been updated to use the query name as the label in the response. So I've fixed it. The previous output like

query,{read table tSymbol}
lb1read OP_INSERT symbol="AAA" name="Absolute Auto Analytics Inc" eps="0.5"
+EOD,OP_NOP,lb1read

now becomes

query,{read table tSymbol}
query OP_INSERT symbol="AAA" name="Absolute Auto Analytics Inc" eps="0.5"
+EOD,OP_NOP,query

I've also added the optional argument for name override in SimpleServer::makeServerOutLabel().

Sunday, July 13, 2014

monads

I've recently been bullshitting onna internets with some fans of the functional programming, and the said fans are all ablaze about the monads. Yet they have a very hard time explaining what the benefit of the monads is, and even what these monads are. So I've got around to reading up on this stuff, and bullshitted some more, and here it is: the short explanation of the monads in programming in all their glory. This is the third version of it that I've figured out, and even though it might happen that I'll have more, I think that I've got to the bottom of it this time. The first version of my understanding was gained from Wikipedia, and as it always happens with Wikipedia, it turned out to be crap. Not all crap, only half-crap, but still.

If you're interested in the benefits, let me tell it up front: they seem to be that the glorious community of functional programming can now honestly use the loops and imbibe all their goodness with break and continue statements, and even exceptions, without losing the cherry of the strict mathematical theory. For the rest of us, nothing new.

Now, to the details. What the monads are really about is the iteration on containers, and nesting these iterations. For use in a monad, a container of objects has to have a couple of properties.

First, if you have a container, you must be able to iterate over it and create another container with exactly the same structure (i.e. if you can find an element in the first container in some way, you must be able to find an element in the second container in the same way) but different data (the difference may be both in the value and the type of the data). The data is supposed to be the result of some operation on the data from the first container at the same position, but the concept of an operation is quite wide-open.

This property means that if your container is a list, you need to know how to create another list with the same number of elements; if it's an associative array, you need to know how to create another associative array with the same keys; if your container is a tree, you need to know how to create another tree with the exact same structure.
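
For a quick illustration in Perl (a trivial sketch of my own, not anything from Haskell), the first property for an associative array means being able to do this:

my %a = (AAA => 1.5, BBB => 0.7);
my %b; # will get the same keys as %a, with the data converted to strings
$b{$_} = sprintf("%.2f", $a{$_} * 2) for keys %a;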

Second, if you have two containers, you need to have some rule for how to "combine" them into a third container of the same type. The structure of that third container can be anything, as long as it follows some consistent rule, and the data elements represent the results of some operation on one element from the first container and one element from the second container. For example, for the lists you can pick the rule that concatenates them, or produces every possible pair of elements from the first list and the second list, or combines the elements at the same position in both lists. Just as in the first property, the data placed in the result would be the result of some operation on the data from the two incoming containers.

What the monad does is apply that combination rule over a sequence of containers, combining them into one. The first two get combined together, then the result is combined with the third one, and so on. The monad is really a bunch of nested loops. In the texts about monads they like to discuss the example monads on the Haskell kinds of containers, like lists, and Maybe (a value that may be NULL), and IO (a strange container that performs input/output), but for all I can tell, the most entertaining and straightforward container would be an associative array, or as its variety, a database table.

For example, the following Perl loop represents a monad (remember, the data in the result may be any operation on the data extracted from the argument containers).

my %result;
my %a = ...; my %b = ...; my %c = ...;

while (my ($ka, $va) = each %a) {
  while (my ($kb, $vb) = each %b) {
    while (my ($kc, $vc) = each %c) {
      # the combined key identifies the element by its position in all
      # three containers, the value is some operation on their values
      $result{"$ka,$kb,$kc"} = sprintf("%d", $vc ? $va : $vb);
    }
  }
}

And if A, B and C were tables, that would be an unconditional join. The loops don't have to go over the whole container, they may pick only one or even zero elements, and if such is the rule, they may as well be ifs:

while (my ($ka, $va) = each %a) {
  if (exists $b{$ka}) {
    if (exists $c{$ka}) {
      # produce a result only for the keys present in all three
      $result{$ka} = [ $va, $b{$ka}, $c{$ka} ];
    }
  }
}

And it's also a monad! If these were tables, it would be a left join.

The first property of the container lets you do a loop without nesting, and the second property allows you to do any number of nestings by applying it recursively.


The rules as stated above say that each nesting must follow the same rule of iteration on the same container type. But there is a weaseling way around it: we can always say that the container is a union that also contains the indication of what operation needs to be done with it, and then the rule that combines two containers would look at this indication in the argument containers and act accordingly. The type Maybe from Haskell does exactly this (and a monad on it provides the Haskell version of what you can think of as exceptions or break/continue). We can also weasel further and say that oh, we know this indication is hardcoded at compilation time, so instead of putting it into the container and then extracting it, we'll just hardcode the loop appropriately (and yes, Haskell does this). The end result is that ANY nested loops can form a monad. And since the if is a special case of a loop, and the unconditional execution is a special case of an if, any procedural code can be a monad.
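
To make the Maybe part concrete, here is a minimal Perl sketch of my own (the container is either [$value] or undef, and the name bind_maybe is mine, not Haskell's):

# combine a Maybe-container with an operation that produces
# the next Maybe-container
sub bind_maybe {
  my ($container, $op) = @_;
  return undef unless defined $container; # "nothing" skips the rest
  return $op->($container->[0]);
}

# divide 100 by a value, skipping the division when the value is 0:
my $r = bind_maybe([5], sub { $_[0] == 0 ? undef : [100 / $_[0]] });
# $r is [20]; had the input been [0], $r would be undef, and any
# further bind_maybe calls would pass the undef through, which is
# the functional analog of break or an exception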


Not everything would be one monad, but guess what, you can nest monads in monads, and that would give everything. And the returned container doesn't have to be anything like any of the containers involved in the loop; it's the same weasely thing.


The only thing that is left separating the monads from the usual procedural code is that all the data defined outside the loops is read-only (the local variables in the loop blocks can be read-write, which also can be explained in the weasely ways, but the variables outside the current block still can't be changed), and the only result produced is the returned container. However, the result can contain a description of how the outside data needs to be modified, and the caller of the monad is then free to apply that description. It's just the ass-backwards ways of the functional programming. So with enough nested and recursive monads (not just nested loops but nested monads) and this apply-description business, all this monkeying can properly simulate the procedural programming.

That description thing can also be thought of as the redo log of a transaction: first the redo log is collected and then the whole transaction is applied. Obviously, for the operations to see the state as modified by the previous operations in the same transaction, they would have to dig through the redo log.


The definition of the container is also pretty wide. Containers may among other things be infinite: "for (;;)" or "for (i=0;; i++)" are examples of iteration over an infinite container.


This brings us to the point of how the monads are related to CEP. Each nesting level in the monad gets an item of key and data from the previous level, uses it somehow to select zero or more values from the container it has, and sends to the next nesting level a newly combined item of key and data information for every value selected. It's a pipeline! The pipeline approach was actually the second version of my understanding of the monads.


What kind of a pipeline is it? It's a straight one: no branching, nor, god forbid, joining. And it's a read-only one: each stage of the pipeline contains no modifiable state. All the updates have to be carried in the data traveling through the pipeline. And each stage of the pipeline essentially sends out a copy of its incoming data with its own section added to it.


Quite boring and useless. By the way, note that the join as described before would work only on the static tables, not on streams.

Now we get a bit off the monad subject as such and get into the depths of Haskell's dealing with pipelines.

Of course there are weaseling ways around the boring limitations. First, each stage of the pipeline may contain nested pipelines (i.e. nested monads), and may decide which of the nested pipelines to take. That gives the fork-join topology (but as you may notice, no "fork-and-no-join"). Second, these nested pipelines can be nested further to infinity. Third, the input point itself is a data object that can be passed through the pipeline.


This third item is something that takes some time getting the mind around. Basically, the first stage in the pipeline says "I'm not going to get the input from the outside world any more, instead I'm passing this functionality to the second stage". Obviously, after that the first stage becomes dead and can as well be discarded. Well, not always: there may be MULTIPLE data input points, either within one stage (remember, they are just data, and can be held together), or sprinkled throughout the pipeline. The first stage becomes dead only after it gets rid of all its inputs. But let's just say that there is one input point.


So, with all this weaseling a Haskell program simulates the changing states with two pipelines (i.e. monads):


The pipeline A implements the computation of the next application state.


The pipeline B consists of two stages:
1. Read one item of data, call pipeline A, collect the result from it, and send that result along with the input point to stage 2.
2. Instantiate a nested copy of the pipeline B, then send the input point and the state to that pipeline, and let it do its work.


It's an infinite number of pipelines, created as long as there is more data coming. And this being a tail recursion, Haskell optimizes it out by replacing the second stage of pipeline B with a new copy of pipeline B, instead of nesting it. And it discards the old stage 1 after it becomes dead. It really becomes a dynamic pipeline, endlessly growing at the back and dying out at the front. Of course, when things get to the actual running, that gets optimized out too: instead of creating a new pipeline they just turn back to the old pipeline, say "now it's the new pipeline", give it the new state and let it run along.


As far as the parallelism is concerned, the stages of the pipeline A can work in parallel, but then when the time comes to apply the result and get the next input, it hiccups and becomes single-threaded.


Is there an important lesson in this? I don't see any, at least not yet. Other than an interesting mental exercise in the ass-backwardness. And from all my asking of the monad enthusiasts on the internets what exactly the benefit of the monads is, I didn't get any answer other than the hand-waving about the Mathematical Foundations. For all I can tell, there isn't one.

If you're interested in learning more about Haskell and its monads, I can recommend http://learnyouahaskell.com/

Saturday, June 21, 2014

status update

In case you wonder, Triceps is still alive, and I'm getting close to finishing the documentation for 2.0. There are only about 30 posts left to fold in. But unfortunately I've been very busy recently, so this keeps dragging on and on.

Saturday, May 3, 2014

shared_ptr-2

I've figured out what my problem with shared_ptr was. The problems there start with using reset(ptr) to store a newly constructed object in an existing shared_ptr. Which provokes one into using reset(ptr) for storing any pointers. The solution is to never use reset(ptr). Instead always turn the pointer into a shared_ptr first, then assign it. Such as one of:

something->shared_ = make_shared<Object>(args);
something->shared_ = shared_ptr<Object>(new Object(args));

or even make the constructors protected and wrap them into the static methods that would return a shared_ptr.

Which puts shared_ptr close enough to Autoref. It's still more awkward and less efficient but a potential advantage is that it can handle any object, without the reference counting built into the objects themselves.

A consequence of this approach is that they absolutely need weak_ptr, they can't just have the naked pointers sitting around pointing up the trees, that breaks the whole paradigm. Having the weak references is also convenient if you really need them. But as my experience with Triceps shows, they are generally overrated. I never had to use them for Triceps, and the ptwrap library I use (and wrote) has them!
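
For a Perl parallel of the same idea, here is a minimal sketch of my own with Scalar::Util::weaken, where the up-the-tree pointer is made weak to avoid a reference cycle:

use Scalar::Util qw(weaken);

my $parent = { name => "root", children => [] };
my $child = { name => "leaf", parent => $parent };
push @{$parent->{children}}, $child;
# make the up-the-tree reference weak, so that the cycle
# doesn't keep both objects alive forever
weaken($child->{parent});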

If used to point up the tree, there is not that much advantage compared to a simple pointer, because it's typically easy for the code to just hold that up-the-tree object separately while dealing with it. If used between the peer objects, one of the objects can usually be split into two parts. I.e. if there are objects A and B, B is usually interested not in the whole A but in a part of it. So separate that part into an object C. Then A would refer to B and C, and B would refer to C.

Friday, May 2, 2014

std::shared_ptr

I've tried recently to use the C++11 version of the reference counting, the class std::shared_ptr. And I must say, it's crap. Utter and complete crap.

Their problem is that they are trying to do the reference counting on any structure at all, without storing the reference counter in the structure itself. The consequence is that you absolutely can't mix the shared_ptr and the pointers. Once you store a pointer in a shared_ptr, you have to refer to it by shared_ptr, and assign the shared_ptr values to each other. If you take that pointer and store it in another shared_ptr, you end up with two reference counters to the same structure, and end up with memory corruption. And it's way too easy to make this mistake. It's really not usable.

The end result, my Autoref template is way, way, way better and safer to use.

Thursday, May 1, 2014

auto-detecting the file handle class when passing it through the App

I've been editing the documentation for the passing of the file descriptors between threads, and I've realized that there is no need to specify the class of the Perl file handle when it gets loaded from the App. Instead the name of the class can be easily stored along with the file descriptor and then extracted back. So I went and changed the code to do that. The modifications are:

In the App class the method storeFd() takes an extra argument:

$app->storeFd($name, $fd, $className);

The $className specifies the class of the file object. The empty string can be used as a synonym for "IO::Handle", since ref() returns an empty string for the globs of the old-fashioned file handles.

The methods loadFd() and loadDupFd() now return two values:

($fd, $fclass) = $app->loadFd($name);
($fd, $fclass) = $app->loadDupFd($name);

The second returned value is the class name, as it was stored by storeFd().
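
For example, a sketch of a round trip through the App (the names $app, $sock and "client" are mine, made up for the illustration):

$app->storeFd("client", fileno($sock), ref($sock));
# ... later, possibly in another thread ...
my ($fd, $fclass) = $app->loadFd("client");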

And the methods App::loadDupSocket(), TrieadOwner::trackDupSocket() and TrieadOwner::trackGetSocket() have been removed. Instead App::loadDupFile(), TrieadOwner::trackDupFile() and TrieadOwner::trackGetFile() have been updated to get the stored file handle class and use it transparently, so they just work correctly for the sockets now.

The methods App::loadDupFileClass(), TrieadOwner::trackDupClass() and TrieadOwner::trackGetClass() are still present, in case you would want to override the class name, but now they should be pretty much never needed, since any class names should be handled automatically without the need for overrides.

And the C++ App class got changed a little bit as well, with the extended versions of storeFd() and loadFd():

void storeFd(const string &name, int fd);
void storeFd(const string &name, int fd, const string &className);
int loadFd(const string &name, string *className = NULL) const;

The new interface is backwards-compatible with the old one but also has the provision for storing and loading the file class name.

Sunday, April 27, 2014

code snippets conversions

I've got to editing the description of how the Perl API classes with the threading support allow specifying the source code snippets instead of code references for their arguments, and it made me think that for consistency probably the same behavior should be supported throughout the whole API. So I went and added this support. Now you can use the source code snippets like this everywhere, say in defining a Label.

And to let the templates have the same feature, I've added the method Triceps::Code::compile() that does the same thing in Perl. It's used as:

$code = Triceps::Code::compile($code_ref_or_source);
$code = Triceps::Code::compile($code_ref_or_source, $description);

It takes either a code reference or a source code string as an argument and returns the reference to the compiled code. If the argument was a code reference, it just passes through unchanged. If it was a source code snippet, it gets compiled.

If the argument was an undef, it also passes through unchanged. This is convenient in case the code is optional. But if it isn't optional, the caller should check for undef.

If the compilation fails, the method confesses, and includes the error and the source code into the message, in the same way as the XS methods do.

The optional argument $description can be used to provide information about the meaning of the code for the error messages. If it's undefined then the default is "Code snippet".
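
For example, a template might use it like this (a sketch; $opts, $row and the option name "onTrade" are made up for the illustration):

# accept either a code reference or a source code snippet in an option
my $cb = Triceps::Code::compile($opts->{onTrade}, "option onTrade");
$cb->($row) if defined $cb; # undef passes through, the code is optional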

Friday, April 11, 2014

callBound

I've found that I've missed documenting yet another way to call a streaming function in Perl, the method Unit::callBound().

$unit->callBound($rowop_or_tray, $fnreturn => $fnbinding, ...);
$unit->callBound([@rowops], $fnreturn => $fnbinding, ...);

It's an encapsulation of a streaming function call, a great method if you have all the rowops for the call available upfront. The first argument is a rowop or a tray or a reference to an array of rowops (but the trays are not allowed in the array). The rest are the pairs of FnReturns and FnBindings. The bindings are pushed onto the FnReturns, then the rowops are called, then the bindings are popped. It replaces a whole block that would contain an AutoFnBind and the calls:

{
  my $ab = Triceps::AutoFnBind->new(
    $fnreturn => $fnbinding, ...
  );
  $unit->call($rowop_or_tray);
}

Except that callBound() does its work in C++, so it's more efficient than a Perl block, and it's shorter to write too.

Sunday, March 23, 2014

debugging

I've recently attended a training on debugging on Windows (a pretty good one, by Wintellect, if you're interested). And all along I've been impressed by how the features of WinDbg are similar to the features I wrote back at Aleri into the debugger for the Aleri CEP virtual machine. (Okay, gdb is probably similar too but I've never learned it in depth, I've never been a big fan of debuggers.) I guess the fundamentals of debugging are the same, no matter what the platform is. Only I had also written into it a way to go backwards and backtrace after hitting a breakpoint, to see what led to the current situation.

Braced

I've found that the Braced package was undocumented. First of all, I've added the proper unit tests for it, and thus renamed it from Triceps::X::Braced to Triceps::Braced (the X namespace is for the packages with limited testing). I've also renamed the methods, replacing the word "quote" with "escape", to make their meaning clearer. And here goes the documentation section that I wrote for the upcoming manual:


The package Braced is designed to parse the Tcl-like nested lists where the elements are separated by whitespace, and braces are used to enquote the elements with spaces in them. These lists are used to write the pipelines that form the Tql queries. For example:

{read table tWindow} {project fields {symbol price}} {print tokenized 0}

These lists can then be parsed into elements, and the elements might be also lists that could be parsed into elements and so on. The spaces between the braces are optional, braces also serve as separators. For example, the following lines are equivalent:

a b c
{a} {b} {c}
{a}{b}{c}
{a}b{c}

In case a brace character needs to be included into one of the strings, it can be escaped by a backslash, for example:

{a\{} b\}c

Any other Perl backslash escapes, such as \n or \x20, work too. The quote characters have no special meaning, they don't need to be escaped and they don't group the words. For example, the following two are equivalent:

"a b c"
{"a} {b} {c"}

Escaping the spaces (\ ) provides another way to combine the words into one element. The following two are equivalent:

{a b c}
a\ b\ c

There is no need for the nested escaping. The characters need to be escaped only once, and then the resulting strings can be wrapped into any number of brace levels.

All the methods in this module are static, there are no objects.

$string = $data;
@elements = Triceps::Braced::raw_split_braced($string);
confess "Unbalanced braces around '$string'" if $string;

Split the string into the braced elements. If any of the elements were enclosed into their own braces, these braces are left in place, the element string will still contain them. For example, a {b} {c d} will be split into a, {b}, {c d}. No unescaping is done, the escaped characters are passed through as-is. This method of splitting is rarely used, it's present as a baseline.

The original string argument will be fully consumed. If anything is left unconsumed, this is an indication of a syntax error, with unbalanced braces. The argument may not be a constant because it gets modified.

$string = $data;
@elements = Triceps::Braced::split_braced($string);
confess "Unbalanced braces around '$string'" if $string;

Split the string into the braced elements. If any of the elements were enclosed into their own braces, these braces will be removed from the results. For example, a {b} {c d} will be split into a, b, c d. No unescaping is done, the escaped characters are passed through as-is. This is the normal method of splitting, it allows the elements to be split further recursively.

The original string argument will be fully consumed. If anything is left unconsumed, this is an indication of a syntax error, with unbalanced braces. The argument may not be a constant because it gets modified.

$result = Triceps::Braced::bunescape($string);

Un-escape a string by processing all the escape characters in it. This step is normally done last, after all the splitting is done. The result will become unsuitable for the future splitting because the escaped characters will lose their special meaning. If any literal braces are present in the argument, they will pass through to the result as literals. For example, {a \{b } will become {a {b }.

@results = Triceps::Braced::bunescape_all(@strings);

Perform the un-escaping on a whole array of strings. The result array will contain the same number of elements as the argument.

$ref_results = Triceps::Braced::split_braced_final($string);
confess "Unbalanced braces around '$string'" if $string;

The combined functionality of splitting a string and un-escaping the result elements. That's why it's final: no further splits must be done after un-escaping. The return value is different from the other split methods. It is a reference to the array of result strings. The difference has been introduced to propagate the undef from the argument to the result: if the argument string is undef, the result will be also undef, not a reference to an empty array. The string gets consumed in the same way as for the other split methods, and anything left in it indicates an unbalanced brace.
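
To put it all together, here is a sketch of parsing a Tql-like pipeline with these methods (the variable names are mine):

my $pipe = '{read table tWindow} {print tokenized 0}';
my @cmds = Triceps::Braced::split_braced($pipe);
confess "Unbalanced braces around '$pipe'" if $pipe;
# @cmds is now ('read table tWindow', 'print tokenized 0');
# each command can be split further into words with another
# split_braced(), and the words un-escaped with bunescape_all()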

Thursday, March 6, 2014

automatic views for the normalized databases

I'm still working on the editing of the docs for 2.0. In the meantime, I want to bullshit on a general subject.

When designing a database schema, there is always this fine line between normalization and denormalization. Denormalize too much and you get the same data in lots of places, with the possibility of them getting dissynchronized. Normalize too much and every query becomes a join of a dozen tables. I've had to deal with a highly normalized database, and I must tell you, it's been quite a pain.

Well, views to the rescue, right? But the views need to be defined, and then their capacity for inserts and updates is quite limited. Let's see if it can be done better.

When the foreign keys are explicitly defined, they can be used to implicitly define a view. For example, suppose that we have a set of tables:

TABLE Employee (
  EmpId INTEGER PRIMARY KEY,
  EmpFirstName VARCHAR,
  EmpLastName VARCHAR,
  EmpStartDate DATE,
  EmpSalary FLOAT
);

TABLE Department (
  DeptId INTEGER PRIMARY KEY,
  DeptName VARCHAR UNIQUE
);

TABLE Title (
  TitleId INTEGER PRIMARY KEY,
  TitleName VARCHAR UNIQUE,
  TitleLevel INTEGER
);

TABLE Position (
  EmpId INTEGER FOREIGN KEY (Employee.EmpId),
  DeptId INTEGER FOREIGN KEY (Department.DeptId),
  TitleId INTEGER FOREIGN KEY (Title.TitleId)
);

The knowledge of the foreign keys allows the DBMS to implicitly create a view:

CREATE VIEW Position_EXP
AS SELECT *
FROM Position, Employee, Department, Title
WHERE Position.EmpId = Employee.EmpId
  AND Position.DeptId = Department.DeptId
  AND Position.TitleId = Title.TitleId;

Okay, I've made things a bit easier by making sure that the field names don't overlap but in general nothing stops the DBMS from doing this automatically, adding prefixes to the field names if needed. The queries like

SELECT EmpStartDate
FROM Position_EXP
WHERE DeptName='Marketing' AND TitleLevel > 5;

become much more compact with this view.

Better yet, it's pretty straightforward to use this view for updates. For example,

UPDATE Position_EXP
SET TitleName = 'Engineer 2'
WHERE TitleName = 'Engineer 1' AND EmpStartDate > '2010-10-10';

What it translates to is: first find the TitleId for the TitleName 'Engineer 2', then set it into the Position rows that match the conditions. It's not something a normal RDBMS allows you to do with a view, and normally this would require two separate SQL statements, but the translation is fairly straightforward. And there is no reason why an RDBMS can't translate it automatically.

We can do inserts too. For example:

INSERT INTO Employee_EXP, Position_EXP
VALUES (
  EmpId = EmpIdSequence(),
  EmpFirstName = 'John',
  EmpLastName = 'Doe',
  EmpStartDate = TODAY(),
  EmpSalary = 123456,
  DeptName = 'Engineering',
  TitleName = 'Senior Engineer'
);

Some values would go directly into the Employee table, some into the Position table, and some would be looked up from the helper tables to create the binding by ids. I've listed two table names in the INSERT to give an explicit hint as to which tables under the views are getting the inserts, as opposed to just being used for look-ups, since I think it would be too dangerous to insert things that weren't meant to be inserted.

The statement is small and short, just as for the denormalized tables (maybe even shorter) but handles all the underlying normalization. And again, it's pretty straightforward to deduce the meaning of the statement automatically in the SQL parser.

Saturday, January 18, 2014

performance variations

I've had an opportunity to run the performance tests on a few more laptops. All of the same Core2 generation, but with different CPU frequencies. The 2GHz version showed, as expected, an about 10% lower performance, except for the inter-thread communication through the nexus: that went up. On a 3GHz CPU all the performance went up about proportionally. So I'm not sure what was up with the 2.2GHz CPU, maybe the timing worked out just wrong to add more overhead.

Here are the results from a 3GHz CPU:

Performance test, 100000 iterations, real time.
Empty Perl loop 0.006801 s, 14702927.05 per second.
Empty Perl function of 5 args 0.031918 s, 3133046.99 per second.
Empty Perl function of 10 args 0.027418 s, 3647284.30 per second.
Row creation from array and destruction 0.277232 s, 360708.19 per second.
Row creation from hash and destruction 0.498189 s, 200727.04 per second.
Rowop creation and destruction 0.155996 s, 641043.69 per second.
Calling a dummy label 0.098480 s, 1015437.21 per second.
Calling a chained dummy label 0.110546 s, 904601.83 per second.
  Pure chained call 0.012066 s, 8287664.25 per second.
Calling a Perl label 0.512559 s, 195099.61 per second.
Row handle creation and destruction 0.195778 s, 510781.66 per second.
Repeated table insert (single hashed idx, direct) 0.772934 s, 129377.12 per second.
Repeated table insert (single hashed idx, direct & Perl construct) 1.109781 s, 90107.89 per second.
  RowHandle creation overhead in Perl 0.336847 s, 296871.05 per second.
Repeated table insert (single sorted idx, direct) 2.122350 s, 47117.59 per second.
Repeated table insert (single hashed idx, call) 0.867846 s, 115227.88 per second.
Table insert makeRowArray (single hashed idx, direct) 1.224588 s, 81660.14 per second.
  Excluding makeRowArray 0.947355 s, 105557.02 per second.
Table insert makeRowArray (double hashed idx, direct) 1.443053 s, 69297.51 per second.
  Excluding makeRowArray 1.165821 s, 85776.47 per second.
  Overhead of second index 0.218466 s, 457738.04 per second.
Table insert makeRowArray (single sorted idx, direct) 29.880962 s, 3346.61 per second.
  Excluding makeRowArray 29.603730 s, 3377.95 per second.
Table lookup (single hashed idx) 0.287407 s, 347938.44 per second.
Table lookup (single sorted idx) 9.160540 s, 10916.39 per second.
Lookup join (single hashed idx) 3.940388 s, 25378.21 per second.
Nexus pass (1 row/flush) 0.618648 s, 161642.86 per second.
Nexus pass (10 rows/flush) 2.417818 s, 413596.01 per second.


I've added a few more tests: the table look-ups, and the passing of rows through the nexus with 10 rows per batch. With the batching, the inter-thread communications work quite decently fast.

Sunday, January 5, 2014

added a little missing method

When editing the docs, I've noticed an incompleteness in AggregatorGadget, so I've added the method that was missing:

const IndexType *getIndexType() const;

Not that there was any use for it (and the subclasses could just read the field directly), but still.

Friday, December 20, 2013

PowerShell

First, a status update: I've finally resumed the work on the docs for 2.0, though it's proceeding slowly as yet.

I've been reading recently on PowerShell. A pretty cool tool. I've tried it before and it didn't work well for me then because I didn't understand its purpose. It's not a normal OS shell. Instead, it's the shell for the .NET virtual machine. Exactly the thing that Java is missing, and the gap that it tries to plug with the crap like Ant and Maven, unsuccessfully. PowerShell lets you run all the .NET methods interactively from the command line, and build the pipelines of them. It has some very cool syntax that lets you automatically apply the pipeline input in the same way as the command-line input. It also has the remote execution functionality, so it serves as an analog of the rsh/ssh (more advanced in some ways, less advanced in the others) in the Microsoft ecosystem.

But here is the CEP-related part: The Triceps TQL is not unique in handling the SQL-like queries in the form of pipelines. PowerShell does that too. I guess, it's a fairly obvious idea. You can even treat PowerShell as a rudimentary CEP system, and write the processing in the form of CEP pipelines. There is a major limitation that the pipelines are all linear, with no forking and joining, but on the other hand the pipelines can be used to pass the arbitrarily complex objects, and also a mix of objects of different types, so with some creativity the more complex topologies can be simulated (still no loops though).

Another catch with the pipelines is very similar to a current limitation of TQL: each stage of the pipeline works all by itself. In a SQL statement the query optimizer can turn a WHERE clause into an iteration over a small subset of an index. In TQL and PowerShell there will be an iteration over everything, followed by filtering in a WHERE. The grand plan for TQL is to add a query optimizer to it eventually, one that could combine multiple sequential stages of the pipeline into one optimized stage. The other, simpler, alternative that I was considering for the short term is to specify the optimized selection manually as an option to the command that reads the table contents. PowerShell takes that approach in many cases, so that, say, the command that pulls the data from a SqlServer database gets a full SQL query as an argument and does its filtering on the server. But I guess theoretically nothing really stops them from doing some pipeline optimization by pulling the WHERE conditions from a "where" command into the SQL statement for the database selection command. It would save the trouble of the SQL and "where" having different syntax.

Sunday, November 10, 2013

Triceps performance

I've finally got interested enough in Triceps performance to write a little test, Perf.t. By default it runs only one thousand iterations, to be fast and not delay the run of the full test suite. But the number can be increased by setting an environment variable, like:

$ TRICEPS_PERF_COUNT=100000 perl t/Perf.t

An important caveat: the test is of the Perl interface, so it includes all the overhead of constructing the Perl objects. I've tried to structure it so that some of the underlying performance can be deduced, but it's still approximate. I haven't done the performance testing of just the underlying C++ implementation yet; it will be better.

Here are the numbers I've got on my 6-year-old laptop (dual-CPU Intel Core2 T7600 2.33GHz), with explanations. The time in seconds for each value is for the whole test loop. The "per second" number shows how many loop iterations were done per second.

The computations are done with the real elapsed time, so if the machine is not idle, the time of the other processes will still get counted against the tests, and the results will show slower than they really are.

Performance test, 1000 iterations, real time.

The first thing it prints is the iteration count, to set the expectations for the run length and precision.

Empty Perl loop 0.000083 s, 11983725.71 per second. 

A calibration to see how much overhead is added by the execution of the loop itself. As it turns out, not much.

Row creation from array and destruction 0.003744 s, 267085.07 per second. 

The makeRowArray() for a row of 5 fields. Each created row gets destroyed before the next one gets created.

Row creation from hash and destruction 0.006420 s, 155771.52 per second.

The makeRowHash() for a row of 5 fields.


Rowop creation and destruction 0.002067 s, 483716.30 per second.

The makeRowop() from an existing row. Same thing, each rowop gets destroyed before constructing the next one.

Calling a dummy label 0.001358 s, 736488.85 per second.

Repeated calls of a dummy label with the same rowop object.

Calling a chained dummy label 0.001525 s, 655872.40 per second.
  Pure chained call 0.000167 s, 5991862.86 per second.

Repeated calls of a dummy label that has another dummy label chained to it. The "pure" part is the difference from the previous case, i.e. the overhead added by the extra chained dummy label.


Calling a Perl label 0.006669 s, 149946.52 per second.

Repeated calls of a Perl label with the same rowop object. The Perl label has an empty sub but that empty sub still gets executed, along with all the support functionality.

Row handle creation and destruction 0.002603 s, 384234.52 per second.

The creation of a table's row handle from a single row, including the creation of the Perl wrapper for the row handle object.

Repeated table insert (single hashed idx, direct) 0.010403 s, 96126.88 per second.

Insert of the same row into a table. Since the row is the same, it keeps replacing the previous one, and the table size stays at 1 row. Even though the row is the same, a new row handle gets constructed for it every time by the table; the code is $tSingleHashed->insert($row1). "Single hashed idx" means that the table has a single Hashed index, on an int32 field. "Direct" means the direct insert() call, as opposed to using the table's input label.

Repeated table insert (single hashed idx, direct & Perl construct) 0.014809 s, 67524.82 per second.
  RowHandle creation overhead in Perl 0.004406 s, 226939.94 per second.


The same, only the row handles are constructed in Perl before inserting them: $tSingleHashed->insert($tSingleHashed->makeRowHandle($row1)). And the second line shows that the overhead of wrapping the row handles for Perl is pretty noticeable (it's the difference from the previous test case).

Repeated table insert (single sorted idx, direct) 0.028623 s, 34937.39 per second.

The same thing, only for a table that uses a Sorted index that executes a Perl comparison on the same int32 field. As you can see, it gets 3 times slower.

Repeated table insert (single hashed idx, call) 0.011656 s, 85795.90 per second.

The same thing, again the table with a single Hashed index, but this time by sending the rowops to its input label.

Table insert makeRowArray (single hashed idx, direct) 0.015910 s, 62852.02 per second.
  Excluding makeRowArray 0.012166 s, 82194.52 per second.


Now the different rows get inserted into the table, each row having a different key. At the end of this test the table contains 1000 rows (or however many were requested by the environment variable). Naturally, this is slower than the repeated insertions of the same row, since the tree of the table's index becomes deeper and requires more comparisons and rebalancing. This performance will be lower in the tests with more rows, since the index will become deeper and will create more overhead. Since the rows are all different, they are created on the fly, so this row creation overhead needs to be excluded to get the actual Table's performance.

Table insert makeRowArray (double hashed idx, direct) 0.017231 s, 58033.37 per second.
  Excluding makeRowArray 0.013487 s, 74143.61 per second.
  Overhead of second index 0.001321 s, 756957.95 per second.


Similar to the previous one but on a table that has two Hashed indexes (both on the same int32 field). The details here also compute the overhead contributed by the second index.

Table insert makeRowArray (single sorted idx, direct) 0.226725 s, 4410.64 per second.
  Excluding makeRowArray 0.222980 s, 4484.70 per second.


Similar but for a table with a Sorted index with a Perl expression. As you can see, it's about 20 times slower (and it gets even worse for the larger row sets).

Nexus pass 0.034009 s, 29403.79 per second.

The performance of passing the rows between threads through a Nexus. This is a highly pessimistic case, with only one row per nexus transaction. The time also includes the draining and stopping of the app.

And here are the numbers for a run with 100 thousand iterations, for comparison:

Performance test, 100000 iterations, real time.
Empty Perl loop 0.008354 s, 11970045.66 per second.
Row creation from array and destruction 0.386317 s, 258854.76 per second.
Row creation from hash and destruction 0.640852 s, 156042.16 per second.
Rowop creation and destruction 0.198766 s, 503105.38 per second.
Calling a dummy label 0.130124 s, 768497.20 per second.
Calling a chained dummy label 0.147262 s, 679062.46 per second.
  Pure chained call 0.017138 s, 5835066.29 per second.
Calling a Perl label 0.652551 s, 153244.80 per second.
Row handle creation and destruction 0.252007 s, 396813.99 per second.
Repeated table insert (single hashed idx, direct) 1.053321 s, 94937.81 per second.
Repeated table insert (single hashed idx, direct & Perl construct) 1.465050 s, 68257.07 per second.
  RowHandle creation overhead in Perl 0.411729 s, 242878.43 per second.
Repeated table insert (single sorted idx, direct) 2.797103 s, 35751.28 per second.
Repeated table insert (single hashed idx, call) 1.161150 s, 86121.54 per second.
Table insert makeRowArray (single hashed idx, direct) 1.747032 s, 57239.94 per second.
  Excluding makeRowArray 1.360715 s, 73490.78 per second.
Table insert makeRowArray (double hashed idx, direct) 2.046829 s, 48856.07 per second.
  Excluding makeRowArray 1.660511 s, 60222.41 per second.
  Overhead of second index 0.299797 s, 333559.51 per second.
Table insert makeRowArray (single sorted idx, direct) 38.355396 s, 2607.20 per second.
  Excluding makeRowArray 37.969079 s, 2633.72 per second.
Nexus pass 1.076210 s, 92918.63 per second.

As you can see, the table insert performance got worse due to the added depth of the index trees while the nexus performance got better because the drain overhead got spread over a larger number of rows.

Wednesday, November 6, 2013

redundancy

Here is another entry on the general things.

Fairly often we want things to be redundant: run two or more copies of a system, and if one of them fails, another one picks up after it. Two is the minimal number of instances, and it's a pretty unstable one: besides the chance of one instance failing, there is also the chance of a network partitioning. In the case of a network partitioning you don't want the two copies to start messing with the data independently; instead you really want to still keep only one copy as the master and shut down the other one. But the failure and the partitioning are pretty hard to tell apart, which makes the 2-instance configurations quite unstable.

Even if you have a 3-instance configuration (a good idea overall), you're still not immune. If one instance goes down for the scheduled maintenance, you're back to the 2-instance situation.

How can the situation be made more stable?

One obvious but expensive option is to just create more instances. Not only is it expensive but it also adds its own problems. Suppose there are 4 instances to start with, and one instance finds that two other instances went down. Does it mean that these two instances just died (possibly from some common reason) or that a network partitioning has occurred?

An improvement on that would be to create the extra instances not as the full ones but just as "beacons", only responding for the purpose of watching the partitioning and not actually running the code. Then the load on these beacon instances will be low, and they can be combined with the machines running some other systems. Or a dedicated machine can serve as a beacon for many systems in the company. Well, once you ask "what if a beacon goes down", this starts growing a bit into a system of redundant beacons and their own problems of partitioning. And it could actually get stupid with both instances seeing the beacon but not seeing each other, but that can be resolved by communicating through the beacon.

But a typical situation is a pipeline of systems. Each system is connected to one or more source systems and/or one or more sink systems (so it could be not only a straight pipeline but technically a tree). And each system in the pipeline would be duplicated or triplicated, and each instance cross-connected to all the duplicates of each source and sink. In this situation the master node of a source can be used as such a beacon! This would make a 2-instance configuration still stable.

Thursday, October 10, 2013

unusual aggregations

In the meantime, while I'm busy with the other things, I want to write up some short notes for the future.

In the world of monitoring there is a concept of time series: the sequence of values observed at certain moments of time. And then the common thing is to subtract the previous value from each one. For example, if the total number of events processed by a server goes by timestamps as:

0 20 30 50

then by subtracting the previous value we get the number of events processed during each time period:

20 10 20

(naturally, there is one less value, since the first one in the original series has no previous value to subtract).

This can really be thought of as an aggregation. A highly unusual one, producing not just one result row per group but many rows per group, but still a variety of aggregation.
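
In Perl terms, the computation amounts to something like this (a trivial sketch of my own):

my @totals = (0, 20, 30, 50);
# subtract the previous value from each one; the first value
# has no previous one and produces no result
my @per_period = map { $totals[$_] - $totals[$_ - 1] } 1 .. $#totals;
# @per_period is now (20, 10, 20)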

The other similar example is with the timestamps. Suppose we have a multi-stage process, and a timestamped record of when each stage started:

11:30 Stage1
11:45 Stage2
12:10 Stage3
12:20 End

This record can be converted into the length of each stage by subtracting each stage's timestamp from the next one:

Stage1 0:15
Stage2 0:25
Stage3 0:10

And the other way around, if we have the starting time and the record of the length of each stage, we can convert them back to the absolute timestamps.

Monday, September 9, 2013

status update

I've started on editing the manual for version 2.0 and then got distracted by the Real Life. Things should quiet down a bit in a couple more months and I'll get back to making the official release 2.0.