Sunday, March 22, 2015

more performance numbers, with optimization

I've realized that the previously published performance numbers were produced by a build without optimization. So I've enabled the optimization with -O3 -fno-strict-aliasing, and the numbers improved, some of them by more than a factor of two (that's still on an old Core 2 Duo 3GHz laptop):

Performance test, 100000 iterations, real time.
Empty Perl loop 0.005546 s, 18030711.03 per second.
Empty Perl function of 5 args 0.031840 s, 3140671.52 per second.
Empty Perl function of 10 args 0.027833 s, 3592828.57 per second.
Row creation from array and destruction 0.244782 s, 408526.42 per second.
Row creation from hash and destruction 0.342986 s, 291557.00 per second.
Rowop creation and destruction 0.133347 s, 749922.94 per second.
Calling a dummy label 0.047770 s, 2093352.57 per second.
Calling a chained dummy label 0.049562 s, 2017695.16 per second.
  Pure chained call 0.001791 s, 55827286.04 per second.
Calling a Perl label 0.330590 s, 302489.48 per second.
Row handle creation and destruction 0.140406 s, 712218.40 per second.
Repeated table insert (single hashed idx, direct) 0.121556 s, 822665.80 per second.
Repeated table insert (single hashed idx, direct & Perl construct) 0.337209 s, 296551.58 per second.
  RowHandle creation overhead in Perl 0.215653 s, 463707.00 per second.
Repeated table insert (single sorted idx, direct) 1.083628 s, 92282.62 per second.
Repeated table insert (single hashed idx, call) 0.153614 s, 650981.13 per second.
Table insert makeRowArray (single hashed idx, direct) 0.553364 s, 180713.02 per second.
  Excluding makeRowArray 0.308581 s, 324063.65 per second.
Table insert makeRowArray (double hashed idx, direct) 0.638617 s, 156588.31 per second.
  Excluding makeRowArray 0.393835 s, 253913.40 per second.
  Overhead of second index 0.085254 s, 1172969.41 per second.
Table insert makeRowArray (single sorted idx, direct) 22.355793 s, 4473.11 per second.
  Excluding makeRowArray 22.111011 s, 4522.63 per second.
Table lookup (single hashed idx) 0.142762 s, 700466.78 per second.
Table lookup (single sorted idx) 6.929484 s, 14431.09 per second.
Lookup join (single hashed idx) 2.944098 s, 33966.26 per second.
Nexus pass (1 row/flush) 0.398944 s, 250661.96 per second.
Nexus pass (10 rows/flush) 0.847021 s, 1180608.79 per row per second.
  Overhead of each row 0.049786 s, 2008583.51 per second.
  Overhead of flush 0.349157 s, 286403.84 per second.
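
For reading the table: the rate on each line is the iteration count divided by the elapsed time, and the indented lines look like they are derived from the lines above them by subtraction. The sketch below re-computes a few of them from the printed values. The formulas are inferred from the numbers themselves, not taken from the test source, and the function and variable names are made up for the illustration.

```python
# A re-computation of a few derived lines from the printed output above.
# The relationships are an assumption inferred from the numbers,
# not copied from the Triceps performance test.

ITERATIONS = 100_000  # "Performance test, 100000 iterations"


def rate(seconds, count=ITERATIONS):
    """Operations per second for `count` operations taking `seconds`."""
    return count / seconds


# Elapsed times in seconds, as printed in the output.
row_from_array = 0.244782      # Row creation from array and destruction
insert_single = 0.553364       # Table insert makeRowArray (single hashed idx)
insert_double = 0.638617       # Table insert makeRowArray (double hashed idx)
nexus_1_row = 0.398944         # Nexus pass (1 row/flush)
nexus_10_rows = 0.847021       # Nexus pass (10 rows/flush)

# "Excluding makeRowArray": subtract the row creation cost.
excluding_make_row = insert_single - row_from_array

# "Overhead of second index": double-index time minus single-index time.
second_index = insert_double - insert_single

# Nexus: a 10-row flush carries 9 more rows than a 1-row flush,
# so the per-row cost is the difference divided by 9,
# and the flush cost is what remains of the 1-row pass.
per_row = (nexus_10_rows - nexus_1_row) / 9
per_flush = nexus_1_row - per_row

# The 10 rows/flush line is reported per row, so the count is 10x larger.
nexus_10_rate = rate(nexus_10_rows, count=10 * ITERATIONS)

if __name__ == "__main__":
    print(f"Excluding makeRowArray   {excluding_make_row:.6f} s")
    print(f"Overhead of second index {second_index:.6f} s")
    print(f"Overhead of each row     {per_row:.6f} s")
    print(f"Overhead of flush        {per_flush:.6f} s")
    print(f"Nexus 10 rows/flush      {nexus_10_rate:.2f} per row per second")
```

The results agree with the printed lines to within the rounding of the last digit, since the test presumably works from the unrounded timings.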

I've also tried to run the numbers on a newer laptop with a 2GHz Core i7 CPU, in a VM configured with 2 CPUs. On the build without the optimization, the numbers came out very similar to the old laptop. On the build with optimization they came out up to 50% better, but not consistently so (perhaps running in a VM added variability).

Sunday, March 1, 2015

Triceps 2.0.1 released

This time I'm trying to update the docs along with the code changes, and to do small releases.

The release 2.0.1 is here! It includes:

  • Fixed the version information that was left incorrect (at 0.99).
  • Used a more generic pattern in tests for Perl error messages that have changed in the more recent versions of Perl (per CPAN report #99268).
  • Added a more convenient way to wrap the error reports in Perl: Triceps::nestfess() and Triceps::wrapfess().
  • Added functions for nicer printing of auto-generated code: Triceps::alignsrc() and Triceps::numalign().
  • In the doc chapter on the templates, fixed the output of the examples: properly interleaved the inputs and outputs.