Sergey Babkin on CEP and stuff: type

Thursday, July 11, 2013

no more explicit confessions

It's official: all the code has been converted to the new error handling. Now if anything goes wrong, the Triceps Perl calls just confess right away. No more need for the pattern 'or confess "$!"' that was used throughout the code (though of course you can still use it for handling the other errors).

This also applies to the error checks done by the XS typemaps: these will confess automatically too.

I've also added one more method that doesn't confess: IndexType::getTabtypeSafe(). If the index type is not set into a table type, it will silently return an undef without any error indications.

On a related note, the construction of the Type subclasses has been made nicer in the C++: instead of calling abort() on the major errors, they now throw Exceptions. Mind you, these exceptions are thrown not in the constructors as such but in the chainable methods that set the contents of the types. And they try to be smart enough to preserve the reference count correctness: if the object was not assigned into any reference yet (as is typical for the chained calls), they take care to temporarily increase and decrease the reference count, thus freeing the object, before throwing. Of course, the default reaction to Exceptions is still to dump core, but if need be, these exceptions can be caught.
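
To make the reference counting trick more concrete, here is a minimal sketch of it. It does not use the Triceps classes: SketchType, addField() and the counter g_alive are all made up for the illustration, and std::runtime_error stands in for the Triceps Exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Counts the live objects, only to make the freeing observable.
static int g_alive = 0;

// Hypothetical stand-in for a reference-counted type object.
class SketchType {
public:
    SketchType() : refs_(0) { ++g_alive; }
    ~SketchType() { --g_alive; }

    void incref() { ++refs_; }
    void decref() { if (--refs_ <= 0) delete this; }

    // Chainable setter: returns this on success, throws on a major error.
    SketchType *addField(const std::string &name) {
        if (name.empty()) {
            // If nobody holds a reference yet, the count goes 0 -> 1 -> 0
            // and the object deletes itself. If a reference is held, the
            // count goes up and back down and the object survives.
            incref();
            decref();
            // No members are touched after this point.
            throw std::runtime_error("empty field name");
        }
        return this;
    }

private:
    int refs_;
};

// Returns true if the exception was thrown and the unowned object got freed.
bool demoThrowFreesUnownedObject() {
    int before = g_alive;
    try {
        (new SketchType())->addField("a")->addField("");
    } catch (const std::runtime_error &) {
        return g_alive == before;
    }
    return false;
}
```

The point of the up-and-down dance is that the same code works in both situations: an object built in a chained call is not leaked, and an object that is already held in a reference stays valid for its owner.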

Sunday, December 30, 2012

RowSetType

RowSetType, defined in type/RowSetType.h, is another item that is not visible in Perl. Maybe it will be in the future but at the moment things look good enough without it. It has been added for 1.1.0 and expresses the type ("return type" if you want to be precise) of a streaming function (FnReturn and FnBinding classes). Naturally, it's a sequence of the row types, and despite the word "set", the order matters.

A RowSetType is one of those objects that get assembled from many parts and then initialized, like this:

Autoref<RowSetType> rst = initializeOrThrow(RowSetType::make()
->addRow("name1", rt1)
->addRow("name2", rt2)
);

The function, or actually template, initializeOrThrow() is itself a new addition, which I'll describe in detail later.

Of course, nothing stops you from adding the row types one by one, in a loop or in some other way, and then calling initialize() manually. And yes, of course you can keep a reference to a row set type as soon as it has been constructed, not waiting for initialization. You could do instead:

Autoref<RowSetType> rst = new RowSetType();
rst->addRow("name1", rt1);
rst->addRow("name2", rt2);
rst->initialize();
if (rst->getErrors()->hasError()) {
...
}

You could use the initializeOrThrow() template here as well, I just also wanted to show the manual handling of the errors. And you can use new or make() interchangeably too.

All that the initialization does is fixate the row set and forbid the addition of further row types to it. This kind of makes sense at the moment, but I'm not so sure about the future: the dynamically expandable row sets might come in useful. We'll see when we get there.

RowSetType();
static RowSetType *make();

Construct a row set type. The method make() is just a wrapper around the constructor that is more convenient to use with the following ->addRow(), because of the way the operator priorities work in C++. Like any other type, RowSetType is unnamed by itself, and takes no constructor arguments. Like any other type, RowSetType is an Mtarget and can be shared between multiple threads after it has been initialized.

RowSetType *addRow(const string &rname, const_Autoref<RowType> rtype);

Add a row type to the set. All the row types are named, and all the names must be unique within the set. The order of the addition matters too. See the further explanation of why it does in the description of the FnReturn. If this method detects an error (such as duplicate names), it will append the error to the internal Errors object, which can be read later with getErrors(). A type with errors must not be used.

The row types may not be added after the row set type has been initialized.
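
Here is a minimal sketch of how such an error-collecting chainable addRow() could work. It is a simplified model and not the real class: SketchRowSet is a made-up name, the row types are reduced to plain ints, and the errors are kept as a vector of strings instead of an Errors object. What the real class does on an attempt to add a row after initialization is not spelled out above, so recording an error in that case is purely my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified model of a row set type.
class SketchRowSet {
public:
    SketchRowSet() : initialized_(false) {}

    // Chainable: records an error instead of failing on the spot.
    SketchRowSet *addRow(const std::string &name, int rowTypeId) {
        if (initialized_) {
            // Assumption: a late addition is reported as one more error.
            errors_.push_back("can not add row '" + name + "' after initialization");
            return this;
        }
        for (std::size_t i = 0; i < names_.size(); i++) {
            if (names_[i] == name) {
                errors_.push_back("duplicate row name '" + name + "'");
                return this;
            }
        }
        names_.push_back(name);
        types_.push_back(rowTypeId);
        return this;
    }

    // Fixates the set; the repeated calls change nothing.
    void initialize() { initialized_ = true; }
    bool isInitialized() const { return initialized_; }

    bool hasError() const { return !errors_.empty(); }
    const std::vector<std::string> &getErrors() const { return errors_; }
    int size() const { return (int)names_.size(); }

private:
    bool initialized_;
    std::vector<std::string> names_;
    std::vector<int> types_;
    std::vector<std::string> errors_;
};
```

Since addRow() always returns the object, the chain never breaks in the middle, and the caller checks the collected errors once at the end.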

void initialize();

Initialize the type. Any detected errors can be read afterwards with getErrors(). The repeated calls of initialize() are ignored.

bool isInitialized() const;

Check whether the type has been initialized.

typedef vector<string> NameVec;
const NameVec &getRowNames() const;
typedef vector<Autoref<RowType> > RowTypeVec;
const RowTypeVec &getRowTypes() const;

Read back the contents of the type. The elements will go in the order they were added.

int size() const;

Read the number of row types in the set.

int findName(const string &name) const;

Translate the row type name to index (i.e. the order in which it was added, starting from 0). Returns -1 on an invalid name.

RowType *getRowType(const string &name) const;

Find the type by name. Returns NULL on an invalid name.

const string *getRowTypeName(int idx) const;
RowType *getRowType(int idx) const;

Read the data by index. These methods check that the index is in the valid range, and otherwise return NULL.
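
The conventions of -1 for an unknown name and NULL for an invalid index combine nicely, as this minimal sketch shows. SketchNames is a made-up stand-in that holds only the names; only the return conventions are taken from the descriptions above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in that holds only the row type names.
struct SketchNames {
    std::vector<std::string> names;

    // Index of the name in the order of addition, or -1 if not found.
    int findName(const std::string &name) const {
        for (std::size_t i = 0; i < names.size(); i++) {
            if (names[i] == name)
                return (int)i;
        }
        return -1;
    }

    // Pointer to the name at the index, or NULL if the index is out of range.
    const std::string *getRowTypeName(int idx) const {
        if (idx < 0 || idx >= (int)names.size())
            return NULL;
        return &names[idx];
    }
};
```

Because -1 is itself an invalid index, the result of findName() can be fed straight into a by-index getter: an unknown name then simply produces a NULL, with no extra check in between.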

The usual methods inherited from Type also work: getErrors(), equals(), match(), printTo().

The row set types are considered equal if they contain the equal row types with equal names going in the same order. They are considered matching if they contain matching row types going in the same order, with any names. If the match condition seems surprising to you, think of it as "nothing will break if one type is substituted for another at execution time".
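
The difference between the two comparisons can be shown with a minimal sketch. It is a made-up model, not the Triceps code: a row type is reduced to a list of (field name, field type name) pairs, and all the names starting with "Sketch" or "set"/"row" are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a row type: pairs of (field name, field type name).
typedef std::vector<std::pair<std::string, std::string> > SketchRow;

// Row types are equal if both the names and the types of the fields agree.
bool rowEquals(const SketchRow &a, const SketchRow &b) { return a == b; }

// Row types match if the field types agree, whatever the field names are.
bool rowMatch(const SketchRow &a, const SketchRow &b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i].second != b[i].second)
            return false;
    }
    return true;
}

// Hypothetical model of a row set type: parallel vectors of names and rows.
struct SketchSet {
    std::vector<std::string> names;
    std::vector<SketchRow> rows;
};

// Equal: equal row types with equal names, in the same order.
bool setEquals(const SketchSet &a, const SketchSet &b) {
    if (a.names != b.names || a.rows.size() != b.rows.size())
        return false;
    for (std::size_t i = 0; i < a.rows.size(); i++) {
        if (!rowEquals(a.rows[i], b.rows[i]))
            return false;
    }
    return true;
}

// Matching: matching row types in the same order, the names are ignored.
bool setMatch(const SketchSet &a, const SketchSet &b) {
    if (a.rows.size() != b.rows.size())
        return false;
    for (std::size_t i = 0; i < a.rows.size(); i++) {
        if (!rowMatch(a.rows[i], b.rows[i]))
            return false;
    }
    return true;
}
```

In this model every pair of equal sets also matches, but not the other way around: renaming a row type or a field breaks the equality and keeps the match, while swapping the order of two different row types breaks both.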

void addError(const string &msg);
Erref appendErrors();

These are the ways to add extra errors to the type's errors. They're for the convenience of the users of this type, the thinking being that since we already have one Errors object, we can as well use it for everything, and also keep all the errors reported in the order of the fields, rather than first all the errors from the type and then all the errors from its user. The FnReturn and FnBinding use it.

Tuesday, August 21, 2012

Simple types

The simple types are defined as subclasses of the abstract class SimpleType, and have one method in addition to the base Type:

int getSize() const;

It returns the size of the value of this type. For void it's 0, for string 1 (the minimal string size), for the rest of them it's a sizeof. This size is used to extract the values from and copy the values to the compact row format.
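
As a minimal sketch of that rule, the sizes could be produced like this. The class names are made up and this is not the contents of the Triceps headers; the choice of double for the 64-bit floating-point value follows the comment on TT_FLOAT64, the rest is my assumption about the natural C++ types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the abstract base of the simple types.
class SketchSimpleType {
public:
    virtual ~SketchSimpleType() {}
    // Size of one value of this type, in bytes.
    virtual int getSize() const = 0;
};

class SketchVoidType : public SketchSimpleType {
public:
    int getSize() const { return 0; } // a void value takes no space
};

class SketchUint8Type : public SketchSimpleType {
public:
    int getSize() const { return (int)sizeof(std::uint8_t); }
};

class SketchInt32Type : public SketchSimpleType {
public:
    int getSize() const { return (int)sizeof(std::int32_t); }
};

class SketchInt64Type : public SketchSimpleType {
public:
    int getSize() const { return (int)sizeof(std::int64_t); }
};

class SketchFloat64Type : public SketchSimpleType {
public:
    int getSize() const { return (int)sizeof(double); }
};

class SketchStringType : public SketchSimpleType {
public:
    int getSize() const { return 1; } // the minimal string size
};
```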

For now this is the absolute minimum of information that makes the data usable. The list of methods will be extended over time. For example, the methods for value comparisons will eventually go here. And if the rows will ever hold the aligned values, the alignment information too.

The SimpleType is defined in type/SimpleType.h, and all the actual simple types are defined in type/AllSimpleTypes.h:

VoidType
Uint8Type
Int32Type
Int64Type
Float64Type
StringType

Wednesday, August 15, 2012

Types

Fundamentally, Triceps is a language, even though it is piggybacking on the other languages. And like in pretty much any programming language, pretty much anything in it has a type. Only the tip of that type system is exposed in the Perl API, as the RowType and TableType. But the C++ API has the whole depth. The types go all the way down to the simple types of the fields.

The classes for types are generally defined in the subdirectory type/. The class Type, defined in type/Type.h, is the common base class.

First, every kind of type has its entry in the enum TypeId:

        TT_VOID, // no value
        TT_UINT8, // unsigned 8-bit integer (byte)
        TT_INT32, // 32-bit integer
        TT_INT64,
        TT_FLOAT64, // 64-bit floating-point, what C calls "double"
        TT_STRING, // a string: a special kind of byte array
        TT_ROW, // a row of a table
        TT_RH, // row handle: item through which all indexes in the table own a row
        TT_TABLE, // data store of rows (AKA "window")
        TT_INDEX, // a table contains one or more indexes for its rows
        TT_AGGREGATOR, // user piece of code that does aggregation on the indexes
        TT_ROWSET, // an ordered set of rows

TT_ROWSET is something added in version 1.1.0; it will be described later. TT_VOID is pretty much a placeholder, in case a void type is needed later. The TypeId gets hardcoded in the constructor of every Type subclass. It can be gotten back with the method

TypeId getTypeId() const;

Another method finds out if the type is the simple type of a field:

bool isSimple() const;

It would be true for the types of ids TT_VOID, TT_UINT8, TT_INT32, TT_INT64, TT_FLOAT64, TT_STRING.

Generally, you can check the TypeId and then cast the Type pointer to its subclass. All the simple types have the common base class SimpleType, which will be described in a moment.
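
The check-then-cast pattern looks roughly like the following minimal sketch. The classes and the enum here are cut-down stand-ins with made-up names (SketchType, SketchRowType, ST_ROW and so on), and the method fieldCount() is invented just to have something subclass-specific to call.

```cpp
#include <cassert>

// Hypothetical cut-down version of the type id enum.
enum SketchTypeId { ST_INT32, ST_STRING, ST_ROW };

// Hypothetical base class: the id gets hardcoded by each subclass constructor.
class SketchType {
public:
    explicit SketchType(SketchTypeId id) : id_(id) {}
    virtual ~SketchType() {}
    SketchTypeId getTypeId() const { return id_; }
    bool isSimple() const { return id_ != ST_ROW; }

private:
    SketchTypeId id_;
};

// Hypothetical subclass with a method that the base class doesn't have.
class SketchRowType : public SketchType {
public:
    explicit SketchRowType(int nfields) : SketchType(ST_ROW), nfields_(nfields) {}
    int fieldCount() const { return nfields_; }

private:
    int nfields_;
};

// Check the id first, and only then cast to the subclass.
int fieldCountOrMinusOne(const SketchType *t) {
    if (t->getTypeId() != ST_ROW)
        return -1;
    return static_cast<const SketchRowType *>(t)->fieldCount();
}
```

The static_cast is safe here only because the id has been checked first; without the check it would quietly produce garbage for any other kind of type.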

There is also a static Type method that finds a simple type object by name (like "int32", "string" etc.):

static Onceref<const SimpleType> findSimpleType(const char *name);

Basically, there is not a whole lot of point in having lots of copies of the simple type objects (though if you want, you can). So there is one common copy of each simple type that can be found by name. If the type is known when you compile your C++ program, you can even avoid the look-up and refer to these objects directly.

    static Autoref<const SimpleType> r_void;
    static Autoref<const SimpleType> r_uint8;
    static Autoref<const SimpleType> r_int32;
    static Autoref<const SimpleType> r_int64;
    static Autoref<const SimpleType> r_float64;
    static Autoref<const SimpleType> r_string;

The type construction may cause errors. It is usually done either by a single constructor with all the needed arguments, or by a simple constructor, then additional methods to add the information in bits and pieces, then an initialization method. In both cases there is a problem of how to report the errors. They're not easy to return from a constructor and a pain to check in the bit-by-bit construction.

Instead the error information gets internally collected in an Errors object, and can be read after the construction and/or initialization is completed:

virtual Erref getErrors() const;

A type with errors may not be used for anything other than reading the errors.

The rest of the common virtual methods have to do with the type comparison and print-outs. The comparison methods essentially check if two type objects are aliases for each other:

virtual bool equals(const Type *t) const;
virtual bool match(const Type *t) const;

The concept has been previously described with the Perl API. The equal types are exactly the same. The matching types are the same except for the names of their elements, so it's generally safe to pass the values between these types.

equals() is also available as operator==.

The print methods create a string representation of a type, used mostly for the error messages. There is no method to parse this string representation back, at least not yet.

virtual void printTo(string &res, const string &indent = "", const string &subindent = " ") const = 0;
string print(const string &indent = "", const string &subindent = " ") const;

printTo() appends the information to an existing string. print() returns a new string with the message. print() is a wrapper around printTo() that creates an empty string, does printTo() into it and returns it.
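
A minimal sketch of this arrangement follows. The classes are made-up stand-ins, and the printed format is invented for the example, it's not meant to reproduce the actual Triceps output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base: the subclasses implement printTo(), print() comes for free.
class SketchPrintable {
public:
    virtual ~SketchPrintable() {}

    // Appends the text to an existing string.
    virtual void printTo(std::string &res, const std::string &indent = "",
        const std::string &subindent = "  ") const = 0;

    // Wrapper: makes an empty string, prints into it and returns it.
    std::string print(const std::string &indent = "",
        const std::string &subindent = "  ") const {
        std::string res;
        printTo(res, indent, subindent);
        return res;
    }
};

// Hypothetical row-like type that prints one field per line.
class SketchRowPrint : public SketchPrintable {
public:
    std::vector<std::string> fields;

    void printTo(std::string &res, const std::string &indent,
        const std::string &subindent) const {
        res += "row {\n";
        for (std::size_t i = 0; i < fields.size(); i++)
            res += indent + subindent + fields[i] + ",\n";
        res += indent + "}";
    }
};
```

The appending form is the handy one for building an error message, since the type can be printed right into the middle of a string that already has some text in it, without any temporary copies.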

The printing is normally done in a multi-line format, nicely indented, and the arguments indent and subindent define the initial indent level and the additional indentation for every level.

There is also a way to print everything in one line: pass the special constant NOINDENT (defined in common/StringUtil.h) in the argument indent. This is similar to using an undef for the same purpose in the Perl API.

The definitions of all the types are collected together in type/AllTypes.h.
