Monday, December 27, 2021

a book on Perl

I've been browsing at a used book store, and bought the book "Higher-Order Perl" by Mark-Jason Dominus. It was published back in 2005 but is still pretty amazing. It's about doing things that people normally associate with languages like Haskell, but in Perl. And it doesn't stop there. The book goes on to show an implementation of a parser infrastructure somewhat like ANTLR (but, as you can imagine, with a lot less code), which is pretty mind-boggling, and then uses it for a declarative drawing system that solves linear equations to determine the positions of the elements described in a domain-oriented language.

The book can now be downloaded for free from its web site: https://hop.perl.plover.com/#free

Saturday, December 18, 2021

sentiments

It's pretty amazing how the development of AI research is now intertwining with psychology. Not really surprising: if the current ML models emulate human thinking, this is what should be happening. But the amazing part is that AI research has finally reached this point.

The IEEE Computing Edge magazine tends to be rather boring but usually contains one gem per issue. In the November 2021 issue that gem was the history of Ikonas graphics processing (I didn't realize that raster video started to appear only in the mid-1970s; before then the memory costs were prohibitively expensive, and the displays did vector graphics in hardware). In the December 2021 issue that gem is an article about the human emotions, "The hourglass model revisited" (https://www.sentic.net/hourglass-model-revisited.pdf, DOI 10.1109/MIS.2020.2992799).

They define the human emotions as a combination of 6 dimensions. 4 of them can be positive or negative (hence forming an "hourglass", thin in the middle, growing towards the extremes): sensitivity, attitude, temper, and introspection. The other 2 are neutral and can be present or not present: expectation and surprise. Additionally, they split the attitude into "towards self" and "towards others".

The gradations of 4 "main" emotions from positive to negative are defined there as:

Introspection: ecstasy-joy-contentment-melancholy-sadness-grief

Temper: bliss-calmness-serenity-annoyance-anger-rage

Attitude: delight-pleasantness-acceptance-dislike-disgust-loathing

Sensitivity: enthusiasm-eagerness-responsiveness-anxiety-fear-terror

And then they define the compounds, for example "hate = anger + fear", or "pessimism = expectation + sadness". Though strangely they define despair in the same way as pessimism and not as "expectation + grief", so I guess this model has quite a bit of subjectivity.

They also define the absolute strength of an emotion (in the range of -1 to +1) as the sum of the strengths of its components (or we could say "of its present components") divided by the count of its present (i.e. non-0) components.
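
For example, here is a minimal sketch of that formula in Python as I understand it (the component values in the example are made up, just to show the arithmetic):

# Strength of a compound emotion: the sum of the non-zero component
# strengths divided by the count of the non-zero components.
def emotion_strength(components):
    present = [v for v in components.values() if v != 0]
    if not present:
        return 0.0
    return sum(present) / len(present)

# "hate = anger + fear": negative temper and negative sensitivity, the rest absent.
hate = {"introspection": 0, "temper": -0.7, "attitude": 0, "sensitivity": -0.5}
print(emotion_strength(hate))  # (-0.7 + -0.5) / 2 = -0.6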

This is an improvement on always dividing by 4, but I wonder if it could be improved further by just taking the sum of the highest negative and highest positive components. Their formula already shifts the weights in the same direction compared to the previous one, and arguably the presence of a little bit of a side emotion should not reduce the effect of the strongest one, and maybe should even add to it if both emotions are in the same direction. So maybe for each direction it should be not even

  max(abs(Ei))

but

  1-Product(1-abs(Ei))

Then I've remembered where I've seen the same problem before: in the Bayesian deduction! Just rescale every sentiment from [-1, 1] to [0, 1] and consider it to be the Bayesian probability of an evidence that the sentiment is positive. And then just compose them by the Bayesian deduction, either with the classic formula or with the formula by odds/chances as described in https://babkin-cep.blogspot.com/2017/02/a-better-explanation-of-bayes-adaboost.html. Then the sentiments that are at 0 would naturally translate to 0.5 and won't affect the computation.
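
Here is a minimal sketch of that composition in Python, multiplying the odds as in the linked post (the clamping constant and the example values are my own additions, just for illustration):

def compose_bayesian(components):
    # Rescale each component from [-1, 1] to [0, 1], treat it as the probability
    # that the overall sentiment is positive, and compose by multiplying the odds.
    odds = 1.0
    for v in components:
        p = (v + 1.0) / 2.0  # rescale [-1, 1] -> [0, 1]
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep away from the extremes to avoid division by 0
        odds *= p / (1.0 - p)  # a component at 0 gives p = 0.5, odds 1, i.e. no effect
    p = odds / (1.0 + odds)
    return 2.0 * p - 1.0  # rescale back to [-1, 1]

print(compose_bayesian([-0.7, -0.5, 0.0, 0.0]))  # about -0.89, the two negatives reinforce each other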

Thursday, December 2, 2021

retraining the neural networks

IEEE Spectrum has printed an issue on AI, and there they describe a way to teach an old neural network new tricks without making it forget the old tricks: by marking the more important connections as immutable when training the network on a data set for a new kind of classification.

Which looks like a generalization of the technique of "emergents" (see https://babkin-cep.blogspot.com/2017/06/emergents.html): there everything below the top 2 layers is declared immutable, and the top 2 layers are re-trained for the new classification from scratch. The idea being that the first training has taught the network which details are important, and then the re-training assembles the new classifier from these details.

But the generalization goes farther: it can look at the weights of the connections, and if a weight is close to 0 (on the range of [-1, 1]), that connection can't be an important one, and it's a candidate for re-training, making the model learn the new details too. They say in the article that the old classifier degrades somewhat after this re-training, but that's not surprising: the set of the "emergents" has changed during the re-training while the old classifier section is still expecting the old set. It would make sense to fixate the network below the top 2 layers (the "new emergents") and do another training of the old part with that. The newly learned emergents might even be useful to improve the classification of the previously learned categories. They actually talk in the article about doing a periodic re-training with the previously trained subsets of data, but not quite in the same context.
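
I don't know the exact mechanics used in the article, but the general idea can be sketched in Python with numpy along these lines: mark the connections with large weights as immutable and let the updates from the new training touch only the near-zero ones (the threshold, the sizes, and the stand-in gradient here are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(8, 8))  # one layer's weights after the old training

# The connections with weights close to 0 are deemed unimportant and stay trainable,
# the rest are frozen to preserve the old behavior.
trainable_mask = np.abs(weights) < 0.1

def apply_update(weights, gradient, learning_rate=0.01):
    # Update only the trainable (near-zero) connections, keep the important ones frozen.
    return weights - learning_rate * gradient * trainable_mask

gradient = rng.normal(size=weights.shape)  # a stand-in for a gradient computed on the new data set
weights = apply_update(weights, gradient)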

Tuesday, September 7, 2021

"The Practice of Parallel Programming" available as a PDF

It kind of didn't occur to me initially to announce it here, but here we go: I've made the fully edited and typeset version of my book "The Practice of Parallel Programming" available as a PDF. It's exactly the same as the printed version, with a small intro to the Online Edition 1.0 added, and formatted to a page size that is more convenient for printing at home.

Available for download here:

https://sourceforge.net/projects/tpopp/files/tpopp-text/tpopp-pdf-210901.zip

Sunday, August 29, 2021

lexical analysis with regular expressions

Today I've got this simple idea of doing the lexical analysis (i.e. tokenization) in Perl with regular expressions: write all the possible tokens as alternatives in the regular expression, and use Perl's ability to embed code into the expressions to produce the token id. For example:

$s = "++abc"; # string to tokenize
$t = 0; # token ids returned here
while ($s =~ s/^\s*([+][+](?{ $t=1; })|[+](?{ $t=2; })|[a-z]+(?{ $t=3; }))//) {
  print $t, "=", $1, "\n";
}

This looks for the tokens "+", "++", and a simple lowercase identifier. One caveat I've found is that if one alternative is a prefix of another (such as "+" and "++"), the longer alternative must go first. But otherwise it looks pretty simple, and should be efficient. I'm probably not the first one to discover it, but I've been looking at the various lexer solutions in Perl and Python and haven't seen this one yet. Of course, it would look better in the extended syntax, and with symbolic token names.

Doing the same in Python is a bit more difficult, since it doesn't allow executing code as part of the expression. So the parsing would produce just a string, and then the token id would be looked up by that string in a dictionary (which gets a little tricky if there could be more than one kind of non-constant lexeme, but I guess those could be sorted out in a second round if needed). Something like this:

import re
tokens = { "++": 1, "+": 2, } # regexp can be auto-generated from reverse-ordered dict keys
lex = re.compile(r'^\s*(' + r'[+][+]' + r'|[+]' + r'|[a-z]+' + r')')
s = "++abc" # the string to parse
m = lex.match(s)
t = tokens[m.group(1)] if m.group(1) in tokens else 3 # 3 is the token id for the identifiers that are variable
s = s[m.span()[1]:] # consume the token from the string

It doesn't do the loop but goes through all the important steps.
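
For completeness, here is roughly how the same steps could be wrapped into a loop (still with the hardcoded token id 3 for the identifiers):

import re

tokens = { "++": 1, "+": 2, }
lex = re.compile(r'\s*(' + r'[+][+]' + r'|[+]' + r'|[a-z]+' + r')')

s = "++abc" # the string to tokenize
while True:
    m = lex.match(s)
    if not m:
        break
    t = tokens.get(m.group(1), 3)  # 3 is the token id for the variable identifiers
    print(t, "=", m.group(1))
    s = s[m.end():]  # consume the token from the string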

P.S. In Perl, the replacement could be avoided by using the global scanning instead: match with the option /g, and use \G as the anchor for the end of the last match (the new start of the string):

while ($s =~ /\G\s*([+][+](?{ $t=1; })|[+](?{ $t=2; })|[a-z]+(?{ $t=3; }))/g)

Thursday, June 17, 2021

on Rust

 I've attended a talk on the Rust programming language, and I've had a couple of realizations.

1. The new and interesting thing in Rust is its enforcement of "borrowing" vs "consumption" in the references, and the strict separation of the code that follows the straight-jacketed "safe" rules from the "unsafe" code. The rest of it seems to be substandard compared to C++.

I think that most real code will end up having both the "safe" and "unsafe" parts, because the purely "safe" version is conducive only to writing the COBOL-like programs. But the separation forces you to encapsulate the unsafe parts, keep them separate and small, and then the rest of the code would just rely on them instead of spreading the unsafety all over the place.

But can the same concept be imported into C++? I think it can. There actually already are the syntactic means to express the borrowing and consumption in C++: borrowing is "const &", and consumption is "&&". So all we need is a way to ban the plain "&" and "*" in the parts of the code that we decree "safe".

2. A thing that looks annoying in Rust is its lack of classes; it has traits instead. In this respect it's similar to Go (just as annoying) and to the C++ templates (in an implicit way, and with their own even worse challenges). It obviously works, but in an annoying way.

So what is annoying? That the definitions of the Rust functions end up containing a list of traits for each argument: Trait1 + Trait2 + Trait3. The same applies to Go. C++ is actually less annoying because it doesn't even require you to define the traits as such, it just uses them in the templates. It pays for that with very non-obvious error messages when a trait is missing, but at least you don't have to drag this list around.

But what is a class (or an interface)? An interface is a list of methods. And each method is also a trait. So when we define an interface, what we really do is write

InterfaceA = TraitA1 + TraitA2 + TraitA3

If we could do all this "trait arithmetic", all the annoyance would go away. I wonder if Rust already supports something like that, and it just wasn't mentioned in the talk?

Monday, March 22, 2021

on automatic alerting

A few years ago I've read the book "Thinking, Fast and Slow" by Daniel Kahneman. One of the things that struck a chord for me there was the story of how the smaller counties show both higher and lower rates of diseases than the larger counties. This is very much the same as what we see with automatic alerting: when we set up an alert for, say, request processing latency, at night there may be only one request during a time period, and it has an anomalously high latency and triggers an alert (and we solve this with kludges like "don't alert if there are less than 20 points").

It looked like the number of points needs to be taken into account, with fewer points treated in a more relaxed way. But I couldn't figure out how to put it together.

Recently I've been reading some books on probability, and it looks like the solution happens to be well-researched: we have to treat the number of points as a sample from the full larger number of points. So if we get 1000 hits per unit of time during the day and only 50 hits during the nighttime, we can really see these 50 hits as a sample of the full traffic, with the people sleeping serving as a filter. And then instead of looking at "is the median over N" we should look at a different question: given the sample, what is the probability (i.e. confidence range) that the median of the full set of data would have been over N? That should solve the issue with the false alerts pretty nicely in an automatic way. I wonder why nobody I've heard of has already done this?
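
I haven't seen a ready-made recipe for this either, but here is a minimal sketch of the idea in Python, using the classic sign-test logic: if the true median of the full traffic were at the threshold N, each sampled point would land above N with probability 1/2, so we can compute how surprising the observed sample is and treat that as the confidence (the latencies and the threshold in the example are made up):

from math import comb

def median_over_confidence(samples, threshold):
    # How likely it is to see at least this many points above the threshold
    # if the true median of the full traffic were at (or below) the threshold.
    n = len(samples)
    k = sum(1 for x in samples if x > threshold)
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 1.0 - p_value  # treat the complement as the confidence that the median is really over

# One anomalously slow request out of five at night should not trigger an alert...
print(median_over_confidence([120, 95, 80, 3000, 110], threshold=500))  # about 0.03
# ...but most of the requests being slow should, even in a small sample.
print(median_over_confidence([700, 650, 900, 120, 800, 750, 640, 880], threshold=500))  # about 0.96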