Consider a simple example: some incoming financial information may identify the securities by RIC (Reuters identifier), SEDOL or ISIN, and before processing it further we want to convert them all to ISIN (since fundamentally the same security may be identified in multiple ways when it's traded in multiple countries).
This can be expressed in CCL approximately like this (no guarantees about the correctness of this code, since I don't have a compiler to try it out):
// the incoming data
create schema s_incoming (
  id_type string, // identifier type: RIC, SEDOL or ISIN
  id_value string, // the value of the identifier
  // add another 90 fields of payload...
);
// the normalized data
create schema s_normalized (
  isin string, // the identity is normalized to ISIN
  // add another 90 fields of payload...
);
// schema for the identifier translation tables
create schema s_translation (
  from string, // external id value (RIC or SEDOL)
  isin string, // the translation to ISIN
);
// the windows defining the translations from RIC and SEDOL to ISIN
create window w_trans_ric schema s_translation keep last per from;
create window w_trans_sedol schema s_translation keep last per from;

create input stream i_incoming schema s_incoming;
create stream incoming_ric schema s_incoming;
create stream incoming_sedol schema s_incoming;
create stream incoming_isin schema s_incoming;
create output stream o_normalized schema s_normalized;

insert
  when id_type = 'RIC' then incoming_ric
  when id_type = 'SEDOL' then incoming_sedol
  when id_type = 'ISIN' then incoming_isin
select * from i_incoming;

insert into o_normalized
select
  w.isin,
  i. ... // the other 90 fields
from incoming_ric as i
  join w_trans_ric as w on i.id_value = w.from;

insert into o_normalized
select
  w.isin,
  i. ... // the other 90 fields
from incoming_sedol as i
  join w_trans_sedol as w on i.id_value = w.from;

insert into o_normalized
select
  i.id_value,
  i. ... // the other 90 fields
from incoming_isin;
Not exactly easy, is it, even with the copying of payload data skipped? You may notice that what it does could also be expressed as procedural pseudo-code:
// the incoming data
struct s_incoming (
  string id_type, // identifier type: RIC, SEDOL or ISIN
  string id_value, // the value of the identifier
  // add another 90 fields of payload...
);
// schema for the identifier translation tables
struct s_translation (
  string from, // external id value (RIC or SEDOL)
  string isin, // the translation to ISIN
);
// the windows defining the translations from RIC and SEDOL to ISIN
table s_translation w_trans_ric key from;
table s_translation w_trans_sedol key from;

s_incoming i_incoming;
string isin;

if (i_incoming.id_type == 'RIC') {
  isin = lookup(w_trans_ric,
    w_trans_ric.from == i_incoming.id_value
  ).isin;
} elsif (i_incoming.id_type == 'SEDOL') {
  isin = lookup(w_trans_sedol,
    w_trans_sedol.from == i_incoming.id_value
  ).isin;
} elsif (i_incoming.id_type == 'ISIN') {
  isin = i_incoming.id_value;
}

if (isin != NULL) {
  output o_normalized(isin,
    i_incoming.(* except id_type, id_value)
  );
}
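To make the comparison concrete, here is roughly the same logic as an ordinary runnable program. This is only a minimal sketch in plain C++, not the API of any CEP engine: the Incoming struct, the hash-map translation tables, the toIsin() function and the sample identifier mappings are all hypothetical, and the 90 payload fields are reduced to one.

#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

// hypothetical sketch: the incoming record, payload cut down to one field
struct Incoming {
    std::string id_type;  // "RIC", "SEDOL" or "ISIN"
    std::string id_value; // the value of the identifier
    std::string payload;  // stands in for the other 90 fields
};

// the translation tables, keyed by the external identifier
std::unordered_map<std::string, std::string> trans_ric;   // RIC -> ISIN
std::unordered_map<std::string, std::string> trans_sedol; // SEDOL -> ISIN

// translate the identifier to ISIN; no result means "unknown"
std::optional<std::string> toIsin(const Incoming &in) {
    if (in.id_type == "ISIN")
        return in.id_value;
    if (in.id_type == "RIC") {
        auto it = trans_ric.find(in.id_value);
        if (it != trans_ric.end()) return it->second;
    } else if (in.id_type == "SEDOL") {
        auto it = trans_sedol.find(in.id_value);
        if (it != trans_sedol.end()) return it->second;
    }
    return std::nullopt; // unknown type or no translation found
}

int main() {
    // illustrative mappings only
    trans_ric["IBM.N"] = "US4592001014";
    trans_sedol["2005973"] = "US4592001014";

    Incoming rec{"RIC", "IBM.N", "some payload"};
    if (auto isin = toIsin(rec)) {
        // the "normalized" output: ISIN plus the payload, with
        // id_type and id_value dropped
        std::cout << *isin << " " << rec.payload << "\n";
    }
    return 0;
}

The whole dispatch-and-lookup logic fits in one small function, with no need to pre-declare a separate stream for each branch.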
Basically, writing in CCL feels like programming in Fortran in the '50s: lots of labels, lots of GOTOs. Viewed from the procedural standpoint, each stream is essentially a label. It's actually worse than Fortran, since all the labels have to be pre-defined (with types!). And there isn't even the normal sequential flow: each statement must be followed by a GOTO, like on those machines with magnetic-drum main memory.
This is very much like the example in my book, in section 6.4, "Queues as the sole synchronization mechanism". You can look at the draft text online. This similarity is not accidental: the CCL streams are queues, and they are the only communication mechanism in CCL.
The SQL statement structure also adds to the confusion: each statement has the destination followed by the source of the data, so each statement reads as if the data flows backwards.
In Triceps I aim to do better. It is not as smooth as the pseudo-code shown above yet, but things are moving in that direction. I have a few ideas about improving this pseudo-code too, but they will have to wait until another day.
P.S. I don't seem to be able to post comments. I'm not sure what is wrong with the Blogspot engine. But answering the comment: yeah, I don't know much about Esper. Both Coral8 and StreamBase also have the .* syntax, and Aleri has a similar ExtendStream. However, that copies all the fields, without dropping any of them (such as id_type and id_value here).
In Esper you won't have the issue of copying 90 fields: the "stream.*" syntax takes care of forwarding the fields without the engine even performing any field copies; internally the engine wraps or forwards the existing data structures.
Anonymous: StreamBase has the same properties as Esper here; we try to make it as easy as possible to say "this stream gets the same data as that one" or "this stream gets the same data as that one, except for these specific modifications".