Sergey Babkin on CEP and stuff: Bayesian networks & neural networks

Friday, January 6, 2017

Bayesian networks & neural networks

I've been recently reading about the neural networks. By the way, here are some introductory links:

I can most highly recommend this document that explains in a very clear and simple way how the training of the neural networks through the backpropagation works:
http://www.numericinsight.com/uploads/A_Gentle_Introduction_to_Backpropagation.pdf

A simple introductory series of 6 (and maybe growing to more) articles starting with this one:
https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.hxsoanuo2

Some other links:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://colah.github.io/posts/2015-01-Visualizing-Representations/
http://mmlind.github.io/Deep_Neural_Network_for_MNIST_Handwriting_Recognition/

Also, the apparently the father of the deep neural networks is G.E. Hinton, and you may also want to search for the articles by Harry Shum. Hinton's home page is:
https://www.cs.toronto.edu/~hinton/
that seems to have a bunch of the links to his courses but I haven't looked at them yet.

As you can see from the introductory reading, each neuron in a neural network is a pretty simple machine: it takes some input values, multiples them by some coefficients, adds the results up, and then passes the result through a nonlinear (usually some kind of a sigmoid) function. The whole thing can be written as an expression:

result = nonlinear( sum [for inputs i] (input_i * C_i) )

The nonlinear part is pretty much what we do at the end of the Bayesian computations: if the probability of a hypothesis is above some level, we accept it as true, i.e. in other words pull it up to 1, if it's below some equal or lower level we reject it, i.e. pull it down to 0, and if these levels are not the same and the probability is in the middle then we leave it as some value in the middle, probably modified in some way from the original value.

The sum part is pretty much the same as the sum done in AdaBoost. AdaBoost does the sum of logarithms. And I've shown in the previous posts that this sum can be converted to a logarithm of a product, and then the product can be seen as a Bayesian computation expressed as chances, and the logarithm being a part of the decision-making process that converts the resulting chance value to a positive-or-negative value. So we can apply the same approach to the sum in the neuron, say that the values it sums up are logarithms, convert it to the logarithm of a product, make the logarithm a part of the final nonlinear function, and then the remaining product can be seen as a Bayesian computation on chances.

This pretty much means that a neuron can be seen as a Bayesian machine.

And guess what, apparently there is also such a thing as a Bayesian network. There people take multiple Bayesian machines, and connect the results of one set of machines as the input events to the Bayesian machines of the next level. For all I can tell, the major benefit of the handling of the problem when some hypotheses are indicated by a XOR of the input events, similarly to the splitting of the hypotheses into two and then merging them afterwards like I've shown before but instead of the external arithmetics the logic being folded into the Bayesian computation of the second level.

But if a neuron can be seen as a Bayesian machine then the neural networks can also be seen as the Bayesian networks! The only difference being that in the Bayesian networks the first-level (and in general intermediate-level) hypotheses are hand-picked while the neural networks find these intermediate hypotheses on their own during the training.

I wonder if this is something widely known, or not known, or known to be wrong?

6 comments:

AlexMetalFiOctober 25, 2017 at 3:39 PM
Sergey, long time no see! The problem with the above is that neurons do not have meanings in themselves. If you look at any given weights of a particular neuron it doesn't tell you anything - the same is not true about a Bayesian network. Bayesian networks impart meaning in their structure and node-to-node relationships (dependence). Neural networks do not (hidden layer X weight matrix has no real obvious relationship between hidden layer Y).

I suppose you could think of the output layer of a fully connected NN using a softmax activation function as kinda Bayesian since the final values are indeed probabilities (beliefs) of classifications. But there is no way to walk those values back which is why it is sometimes hard to explain why a NN behaves the way it does. Again, the same is not true for Bayesian networks.

Alex (remember SmallFoot?)
ReplyDelete
Replies

Add comment

Sergey Babkin on CEP and stuff

Friday, January 6, 2017

Bayesian networks & neural networks

6 comments:

Links

About Me

Labels

Blog Archive