Friday, April 17, 2015

Ordered Index implemented in C++

The ordered index implemented in Perl has been pretty slow, so I've finally got around to implementing one in C++. It became much faster. Here is an excerpt from the performance test results (with an optimized build):

Table insert makeRowArray (single hashed idx, direct) 0.551577 s, 181298.48 per second.
  Excluding makeRowArray 0.311936 s, 320578.44 per second.
Table insert makeRowArray (single ordered int idx, direct) 0.598462 s, 167095.09 per second.
  Excluding makeRowArray 0.358821 s, 278690.37 per second.
Table insert makeRowArray (single ordered string idx, direct) 1.070565 s, 93408.64 per second.
  Excluding makeRowArray 0.830924 s, 120347.91 per second.
Table insert makeRowArray (single perl sorted idx, direct) 21.810374 s, 4584.97 per second.
  Excluding makeRowArray 21.570734 s, 4635.91 per second.
Table lookup (single hashed idx) 0.147418 s, 678342.02 per second.
Table lookup (single ordered int idx) 0.174585 s, 572785.77 per second.
Table lookup (single ordered string idx) 0.385963 s, 259092.38 per second.
Table lookup (single perl sorted idx) 6.840266 s, 14619.32 per second.


Here the "ordered" is the new ordered index in C++ (on an integer field and on a string field), and "perl sorted" is the ordering in Perl, on an integer field. Even though the ordered index is a little slower than the hashed index, it's about 40-70 times faster than the ordering implemented in Perl.

Naturally, ordering by a string field is slower than by an integer one, since it not only has to compare more bytes one-by-one but also honors the locale-specific collation order.

In Perl, the ordered index type is created very similarly to the hashed index type:

Triceps::IndexType->newOrdered(key => [ "a", "!b" ])

The single option "key" specifies the array with the names key fields of the index. The "!" in front of the field name says that the order by this field is descending, and otherwise the order is ascending. This is more compact and arguably easier-to-read format than the one used by the SimpleOrderedIndex.

As usual, the NULL values in the key fields are permitted, and are considered less than any non-NULL value.

The array fields may also be used in the ordered indexes.

The ordered index does return the list of key fields with getKey(), so it can be used in joins and can be found by key fields with TableType::findIndexPathForKeys() and friends, just like the hashed index.

The getKey() for an ordered index returns the list of plain field names, without the indications of descending order. Thus the finding of the index by key fields just works out of the box, unchanged.

To get the key fields as they were specified in the index creation, including the possible "!", use the new method:

@fields = $indexType->getKeyExpr();

The idea of this method is that the contents of the array returned by it depends on the index type and is an "expression" that can be used to build another instance of the same index type. For the hashed index it simply returns the same data as getKey(). For the ordered index it returns the list of keys with indications. For the indexes with Perl conditions it still returns nothing, though in the future might be used to store the condition.

In the C++ API the index type gets created with the constructor:

OrderedIndexType(NameSet *key = NULL);
static OrderedIndexType *make(NameSet *key = NULL);

The key can also be set after construction:

OrderedIndexType *setKey(NameSet *key);

Same as in Perl, the field names in the NameSet can be prefixed with a "!" to specify the descending order on that field.

The new method in the IndexType is:

virtual const NameSet *getKeyExpr() const;

And the new constant for the index id is IT_ORDERED, available in both C++ and Perl.

No comments:

Post a Comment