Comments on the blog "Sergey Babkin on CEP and stuff" (newest first)

AlexMetalFi (2017-12-30):

I'd like to make a correction:

Technically speaking, provided you assume your error is Gaussian with zero mean and a common variance (i.i.d. etc.), Bayesian learning will show that minimizing the sum of the squared errors is equivalent to finding the maximum likelihood - and that's what gradient descent is doing. So I suppose you're right that there is a connection between NN and Bayesian models.

As far as I can tell, the sigmoid is just not used that much anymore in current NN models. Maybe they should? LOL.
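A worked version of the equivalence mentioned above (the standard derivation, assuming i.i.d. Gaussian noise with zero mean and fixed variance; f(x_i; w) stands for the network's prediction):

    \[
    p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{(y_i - f(x_i; w))^2}{2\sigma^2} \right)
    \]
    \[
    -\log \prod_{i=1}^{n} p(y_i \mid x_i, w)
      = \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f(x_i; w))^2
        + \frac{n}{2} \log(2\pi\sigma^2)
    \]

Since the last term does not depend on w, maximizing the likelihood over w is the same as minimizing the sum of squared errors.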
Sergey Babkin (2017-11-14):

Oops, I wanted to answer and forgot.

1) From my experiments with Tensorflow, Relu is actually not a very good function; the arctangent or sigmoid work better. Relu is good for linear regressions but not as much for decisions. But there are a couple of reasons why Relu can be used to the same effect:

(a) NNs have an offset in each node, so with the cut-off of the negative arguments in Relu, you can add together multiple Relu functions to get a close approximation of a sigmoid. I.e. if you have two Relu functions, one with an offset of 0 and slope of 2, and another one with an offset of 0.4 and slope of -1.67, you'd get a broken line that goes from (0, 0) to (0.4, 0.8) to (1, 1), because the second function would kick in in the range [0.4, 1], and the slope in this range becomes the sum 2 - 1.67 = 0.33. Such addition comes naturally, because that's what the second layer does: adds up the outputs of the first layer with coefficients.

(b) The other reason why the cutting-off of the negative side in Relu is useful: basically, it naturally supports the multiple-choice values as described in http://babkin-cep.blogspot.com/2016/08/bayes-23-enumerated-events-revisited.html and its references (although I'm not sure if I wrote it clearly).

2) I think the Bayesian equivalent is computing the conditional probabilities. If you take a one-level NN, the result of the gradient climbing would be the same as computing the averages that give you the probabilities. For example, if you have 70 training cases that pull the result up and 30 cases that pull it down, you'd end up with a coefficient of 0.7, which also happens to be the probability. AdaBoost does a variation of the same thing: computes the chances from the gradient descent. But when you have 2 or more layers, the gradient climbing does two things at once: decides what the intermediate-level events should be, and computes the conditional probabilities.

Sorry, I don't have any advice, I'm in kind of the same position, only worse: I haven't taken any official grad classes :-) I've been thinking that taking such classes might provide some connections, but it looks like it didn't? I've had some links to Reddits on this subject that might be a good place to ask around, but I haven't used Reddit much yet at all :-)

Thanks for the links, I'll read them!
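A minimal C++ sketch of the Relu addition from point (a) above, using the offsets and slopes given in the comment (the parameterization slope * max(0, x - offset) is my reading of "offset" and "slope" here; illustrative only):

    #include <algorithm>
    #include <cstdio>

    // One Relu unit: slope * max(0, x - offset).
    static double relu(double x, double offset, double slope) {
        return slope * std::max(0.0, x - offset);
    }

    int main() {
        // Slope 2 starting at 0 plus slope -1.67 starting at 0.4:
        // the sum is a broken line through (0,0), (0.4,0.8), (1,~1).
        for (double x = 0.0; x <= 1.01; x += 0.2) {
            double y = relu(x, 0.0, 2.0) + relu(x, 0.4, -1.67);
            std::printf("x=%.1f  y=%.3f\n", x, y);
        }
        return 0;
    }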
AlexMetalFi (2017-11-10):

Sergey, here is a post that talks about it:

https://towardsdatascience.com/understanding-objective-functions-in-neural-networks-d217cb068138

Maybe that is what you are saying?
AlexMetalFi (2017-10-28):

Well, a couple of things:

1) FYI: Sigmoid isn't really used anymore for modern-day NNs (relus or leaky relus seem more popular).

2) I don't think you can convert them back and forth as you describe. That's my point. With an NN you can arrive at different weights depending on your objective function and type of gradient descent (you can arrive at different local minima depending on how you initialize your weights, your learning rate/step size, etc.). Put simply, you're climbing down a slope using partial derivatives. What is the Bayesian equivalent of that? That's what I don't quite understand in your analysis. You are looking at one node out of context and assigning chances. Fine, but once you add multiple layers, I think this all falls apart very fast.

I think this paper would be of interest to you (using Bayesian Backprop):

https://arxiv.org/pdf/1505.05424.pdf

3) Yeah, exactly. Simple NNs are easier to follow. But once your parameters expand ("curse of dimensionality" and all that), as I said, it falls apart.

As you can tell, ML/DL has been a hobby of mine for the past few months. I took the graduate classes at Columbia via edX last year and just finished the Udacity courses on AI and DL. I love this field and am looking to break into it (even though my background is, as you know, in kernel and system programming). Have any advice? (You seem to be deeply involved with it.)
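A minimal C++ sketch of the "climbing down a slope" point above: the same toy loss and the same learning rate settle into different local minima depending only on the starting weight (the loss function here is made up purely for illustration):

    #include <cstdio>

    // Toy loss f(w) = (w^2 - 1)^2 + 0.3*w, which has two local minima;
    // grad() is its derivative.
    static double grad(double w) { return 4.0 * w * (w * w - 1.0) + 0.3; }

    // Plain gradient descent from a given starting point.
    static double descend(double w, double rate, int steps) {
        for (int i = 0; i < steps; i++)
            w -= rate * grad(w);
        return w;
    }

    int main() {
        // Different initializations, different answers.
        std::printf("start -1.5 -> w = %f\n", descend(-1.5, 0.01, 10000));
        std::printf("start +1.5 -> w = %f\n", descend(+1.5, 0.01, 10000));
        return 0;
    }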
Sergey Babkin (2017-10-25):

Hi Alex! What have you been up to?

The weights in a particular neuron do represent the Bayesian probabilities of its inputs, only slightly indirectly, as logarithms of the chances. But they can be converted to the probability values, as described in http://babkin-cep.blogspot.com/2017/02/a-better-explanation-of-bayes-adaboost.html and http://babkin-cep.blogspot.com/2017/06/neuron-in-bayesian-terms-part-2.html . That's kind of the point I'm making, that you can just convert them back and forth, that they are equivalent. It's kind of funny that whatever AI approach you take (decision trees, Bayesian, boosting, neural networks), they all seem to be pretty much variations of each other.

In the simple NNs (like those in the Tensorflow examples) you can actually get interesting insights by looking at the heatmap graphs of the functions of the individual neurons. You can see that this neuron in the first layer matches this feature, and that neuron matches that feature, and then they get combined in a second-layer neuron. But in a large NN, I think its sheer size would make it difficult to look at each single neuron.
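A minimal C++ sketch of the weights-to-probabilities conversion described above, using the standard log-odds/sigmoid pair (the function names are mine; p = 0.7 echoes the 70/30 training-case example from the earlier comment):

    #include <cmath>
    #include <cstdio>

    // A weight stored as the logarithm of the chances (log-odds).
    static double weightFromProb(double p) { return std::log(p / (1.0 - p)); }

    // And back: the sigmoid recovers the probability from the weight.
    static double probFromWeight(double w) { return 1.0 / (1.0 + std::exp(-w)); }

    int main() {
        double p = 0.7; // e.g. 70 cases pulling up, 30 pulling down
        double w = weightFromProb(p);
        std::printf("p=%.2f -> weight=%.4f -> back to p=%.2f\n",
                    p, w, probFromWeight(w));
        return 0;
    }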
AlexMetalFi (2017-10-25):

Sergey, long time no see! The problem with the above is that neurons do not have meanings in themselves. If you look at the weights of any given neuron, they don't tell you anything; the same is not true about a Bayesian network. Bayesian networks impart meaning in their structure and node-to-node relationships (dependence). Neural networks do not (a hidden layer X weight matrix has no real obvious relationship to hidden layer Y).

I suppose you could think of the output layer of a fully connected NN using a softmax activation function as kinda Bayesian, since the final values are indeed probabilities (beliefs) of classifications. But there is no way to walk those values back, which is why it is sometimes hard to explain why a NN behaves the way it does. Again, the same is not true for Bayesian networks.

Alex (remember SmallFoot?)

Dennis Gorelik (2017-03-27):

"Second writing" helps not only to improve the writing itself, but also to improve your writing skill, so your future posts improve too.

Sergey Babkin (2017-03-26):

Well, it could use some more editing; all the texts usually look better on the second writing :-) But the main point is that 10x productivity requires getting a lot of things right to streamline the development. And getting even one or two of them very wrong derails the whole thing.

Dennis Gorelik (2017-03-23):

That should be a numbered list of tricks. You may also consider covering one "10x improvement" item per article and giving more examples demonstrating how that item works.

Sergey Babkin (2017-01-04):

Thanks, fixed.
ZeaGate (2017-01-04):

"I thing that the most important..." - "thing" should be "think".

Anonymous (2014-07-28):

Thank you for this contribution, Sergey!!!

Anonymous (2014-07-13):

Man, rules are rules. C++ is known for its non-obvious rules. On the other hand, using the new part of the C++ language, std::shared_ptr, is so much easier than the programmatic analogues with a reference counter stored in some refcounted_base class you must derive from.

Works for me.
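A minimal C++ sketch of the contrast described above: with std::shared_ptr the reference count lives in a separate control block, so the managed type does not have to derive from any refcounted_base-style class (Widget is a made-up type for illustration):

    #include <cstdio>
    #include <memory>

    // No refcounting base class to derive from: any type works as-is.
    struct Widget {
        int value = 42;
    };

    int main() {
        // make_shared allocates the object and its count together.
        std::shared_ptr<Widget> a = std::make_shared<Widget>();
        std::shared_ptr<Widget> b = a; // copying bumps the count
        std::printf("count=%ld value=%d\n", a.use_count(), b->value);
        return 0; // both copies go out of scope, Widget is freed
    }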
Sergey Babkin (2013-06-12):

Sure. You can write to me at sab123@hotmail.com.

If you are building on Linux, I would suggest first trying to check out the latest version from the SVN repository. I've recently committed an improvement that should be capable of detecting the NSPR library on Linux, which is usually the source of the problems.
sdeseille (2013-06-12):

Hi,

I'm interested in your Perl module but I didn't succeed in building it.

Is it possible to contact you in order to get some help building the Perl module?

Regards,
sdeseille

Sergey Babkin (2012-08-01):

The future versions will instead have the definition -DTRICEPS_NSPR=4 to include from <...>, and -DTRICEPS_NSPR=0 to include from <...>. So editing the Makefile.inc should be enough, without changing the source code.

Sergey Babkin (2012-08-01):

Thanks! I've seen a similar report about nspr when building on FreeBSD.

jun (2012-07-31):

I want to report two trivial errors. I built Triceps on Ubuntu 12.04 and found that:

1) It couldn't find nspr4, but actually the library could be found in /usr/include/nspr.

2) It generated an error message because of triceps-1.0.0/cpp/common/Hash.h line 26, so I had to change 2166136261 to 2166136261u.

I've just started to learn development, so I don't know how to write a patch or a makefile. Sorry about that. But I wanted to contribute as far as I can.

With respect.
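A minimal C++ illustration of why the u suffix in the fix above matters: 2166136261 (the 32-bit FNV-1 hash offset basis) is larger than INT32_MAX, so the unsuffixed decimal literal is not an int, and in contexts expecting a 32-bit value the compiler can report an overflow or narrowing diagnostic (the actual line 26 of Hash.h is not quoted in the comment; the variable name here is made up):

    #include <cstdint>
    #include <cstdio>

    // 2166136261 > INT32_MAX (2147483647): without a suffix the literal
    // does not fit an int, which is what triggered the error. The u
    // suffix makes it an unsigned literal that fits uint32_t exactly.
    static const uint32_t hashBasis = 2166136261u;

    int main() {
        std::printf("%u\n", (unsigned)hashBasis);
        return 0;
    }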
Sergey Babkin (2012-03-14):

Thanks, interesting to know. Well, technically, Aleri can do it too: you can run in artificial time, setting and stopping it. So you can stop the time, set it to the record's timestamp, feed the record, wait for the processing to complete, advance the time, wait for any time-based processing to complete, and so on. I'm not sure if it made it into Sybase R5, but it definitely worked on Aleri. However, there was no tool that did it for you easily, and all these synchronous calls also present a pretty high overhead.

Anonymous (2012-03-14):

Esper can give complete control over time to an application, and there is no limit on when events can be processed when time is controlled externally.

Sergey Babkin (2012-01-06):

For (1), it actually sounds like StreamBase simply doesn't need this particular pattern, because it allows direct access to the tables, which are the same thing that Coral8 calls windows. Same as Aleri.

For (2), I don't really understand how StreamBase works, but it looks like it can't really take a list of fields as a parameter value. It's the same limitation as the C++ templates have. I've been feeling for a long time that the C preprocessor, and later the C++ templates, should have had directives for conditional compilation and loops in the macros/templates. The lack of loops leads to all kinds of recursive monstrosities in the Alexandrescu book about C++ templates.

I hope that using Perl for the macro language can solve both the problem with loops in code generation and the debugging difficulties: provide the flexible code generation, and also more meaningful high-level error messages. I.e., when an error is found, the code can provide a more meaningful message of what is wrong with the template parameters, rather than simply dumping the call trace. And of course, it could do explicit checks of the template parameters before using them.
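A minimal C++ sketch of the "no loops in templates" point above: even summing a compile-time list of values has to be written as a recursion with a specialization for the base case (a small standard example of the style; it is not taken from the Alexandrescu book):

    #include <cstdio>

    // No loop directive in templates, so iteration becomes recursion:
    // each instantiation peels off one value...
    template <int Head, int... Tail>
    struct Sum {
        static const int value = Head + Sum<Tail...>::value;
    };

    // ...and a partial specialization serves as the base case.
    template <int Head>
    struct Sum<Head> {
        static const int value = Head;
    };

    int main() {
        std::printf("%d\n", Sum<1, 2, 3, 4>::value); // prints 10
        return 0;
    }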
Naomi Seyfer (2011-12-30):

Checking out the "A little about templates" article; I debated posting this comment there, but I think keeping the conversation in one place might make more sense.

For the first example:

Yes, capture fields enable doing this in a clean way. The idiom in StreamBase would be to create a module that exports a table (as queryable as any other table) and maintains the contents of that table as desired. Capture fields are what allow the exported table's schema to be based on any input schema to the module.

For the second example:

This kind of "templating" is exactly what we use StreamBase "module parameters" for. They work very much like you suggested your template language might work, generating a "substituted" version of a StreamBase module that uses the parameters specified at each module call site.

Unfortunately, module parameters do suffer from all the problems you'd expect of any kind of templating system that generates code: errors get arcane and difficult to debug, static type guarantees become a lot more difficult, compilation times become longer because less work can be shared, etc. Capture fields allow us to avoid these problems of templating for a wide range of real-world cases: whenever the modules are parameterized by type, instead of by arbitrary expression.

Sergey Babkin (2011-12-29):

The capture fields are definitely a step in the right direction. However, they don't seem to solve the field-names-as-parameters problem. They probably could implement the first example in http://babkin-cep.blogspot.com/2011/12/little-about-templates.html, though maybe with a more cumbersome calling sequence, but I think not the second one.

Naomi Seyfer (2011-12-28):

Anonymous: StreamBase has the same properties as Esper here; we try to make it as easy as possible to say "this stream gets the same data as that one" or "this stream gets the same data as that one, except for these specific modifications".

Naomi Seyfer (2011-12-28):

Hello! I'm a runtime engineer over at StreamBase. I just wanted to let you know about some of the work we're doing over here to ameliorate some of the issues you've pointed out.

Specifically: "Yeah, there are "modules" but they are not really reusable. They are tied too tightly to the schema of the data. What is needed is more like C++ templates."

We've come up with a way of cleanly allowing type parameterization of StreamBase modules, released in StreamBase version 7.2. The documentation is at http://docs.streambase.com/sb72/index.jsp?topic=/com.streambase.sb.ide.help/data/html/authoring/capturefields.html and the research paper I presented at the last Distributed and Event-Based Systems conference is at http://www.streambase.com/wp-content/uploads/2011/12/StreamBase_WhitePaper_CaptureFields.pdf .