The simplest-ever explanation of the Boltzmann machine

In the previous article, Xiao Xi had logistic regression and the Bayesian network square off, and both began to evolve. What we never expected, however, was that logistic regression would ultimately evolve into a generative model, the Restricted Boltzmann Machine (RBM), turning into a weapon of the enemy camp (the generative models).

The RBM's unexpected appearance left naive Bayes pleasantly surprised, and kindled an urge: to fuse its own Bayesian network with the enemy-sent RBM!

So, can naive Bayes's crazy idea actually be realized?

 

 

As usual, let us first set the stage. In "From Naive Bayes to Bayesian Network", Xiao Xi drew a portrait of naive Bayes:

 

[Figure: the portrait of naive Bayes]

After seeing its own portrait, naive Bayes felt deeply just how naive (read: weak) it was, and carried out an evolution: it abandoned its conditional independence assumption and modeled the conditional dependencies among the dimensions inside the feature vector X. Thus naive Bayes evolved into the Bayesian network in the figure below:

 

[Figure: naive Bayes evolved into a Bayesian network]

In the previous article , "the logistic regression to Restricted Boltzmann Machine" in, RBM compared to logistic regression with very big step forward, we can be more reasonably calculated for each sample and each category of "intimacy", also It is associated with the size of the probabilistic graphical model energy function E (v1, v2) of.

The RBM looks so complicated, though. Can the little soul painter Xiaoyao draw a portrait for the RBM, just as she did for naive Bayes?

 


RBM's portrait?

Of course she can. From the previous article's definition of the energy function E(v1, v2), the "intimacy" this function computes between the vectors v1 and v2 has no direction; that is, E(v1, v2) must equal E(v2, v1). Therefore, when drawing, the edge between any two nodes is undirected (note the contrast with the directed edges of naive Bayes and the Bayesian network, oh~). So an RBM with just two connected nodes looks like this:

 

[Figure: an RBM with two connected nodes]

The edge is undirected, and the energy function is represented by a small blue square.

Following the previous article's formulation, the RBM's parameter matrix W connects every dimension of the feature vector X with every dimension of the category vector Y, so the full RBM should look like the figure below:

 

 

[Figure: the full RBM] (There are too many small squares, so they are omitted ~ each edge still represents its two nodes being connected through the parameters inside the energy function.)

Eh? Don't you feel it's... still not chaotic enough?! (What a deranged yet brilliant idea!)

Recall that the evolution from naive Bayes to the Bayesian network was precisely about also modeling the dependencies among the dimensions inside X (the various random variables)! Yet the RBM, just like naive Bayes, still makes an independence assumption about the random variables inside X (and, more generally speaking, inside Y as well). As "From Naive Bayes to Bayesian Network" explained in detail, this independence assumption is, in many cases, deadly!

So let's stop making assumptions altogether: unshackle the inside of the Boltzmann machine's body, so that the random variables inside X and inside Y can communicate freely (that is, the model gains the ability to describe the conditional dependencies among the random variables inside X and inside Y; here, though, what it describes are direct, bidirectional conditional dependencies). Drawn as a figure, it looks like this:

 

[Figure: the Boltzmann machine, with undirected edges inside X and inside Y as well]

Boltzmann machine: meow meow meow ~ I just love this feeling of freedom ~

So how do we describe it mathematically? Of course, we can simply copy the RBM's approach.

Recall that the RBM assumes the following energy function inside its hypothesis function:

E(v1, v2) = -(b^T v1 + c^T v2 + v1^T W v2)

Here a single matrix W connects all dimensions of X with all dimensions of Y. So to also connect all the dimensions inside X with one another, and all the dimensions inside Y with one another, we simply write:

 

E(v1, v2) = -(b^T v1 + c^T v2 + v1^T W v2 + v1^T R v1 + v2^T S v2)

I believe the smart reader grasps this easily: by analogy with W, two matrices R and S are used to connect the dimensions inside v1 with one another, and the dimensions inside v2 with one another.
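To make this concrete, here is a minimal NumPy sketch of the energy function above (bm_energy is my own illustrative helper, not code from the article; the names b, c, W, R, S follow the formulas). Setting R = S = 0 recovers the plain RBM energy:

```python
import numpy as np

def bm_energy(v1, v2, b, c, W, R, S):
    """Boltzmann machine energy E(v1, v2) as written above.

    b, c are bias vectors; W connects v1 with v2, while R and S
    connect the dimensions inside v1 (resp. v2) with each other.
    Setting R = S = 0 reduces this to the RBM energy.
    """
    return -(b @ v1 + c @ v2 + v1 @ W @ v2 + v1 @ R @ v1 + v2 @ S @ v2)
```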

So the Boltzmann machine (BM) keeps a hypothesis function of the same form as the RBM's:

 

P(v1, v2) = \frac{\exp(-E(v1, v2))}{Z}

where Z is the partition function:

 

Z = \sum_{v1} \sum_{v2} \exp(-E(v1, v2))

All we did was swap in the more psychotic energy function above, nothing more ~
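And to make the partition function concrete, here is a brute-force sketch that enumerates every configuration, assuming binary-valued v1 and v2 (the article does not pin down their domain, so binary units are an assumption here) and reusing the hypothetical bm_energy helper from above:

```python
from itertools import product
import numpy as np

def partition_function(b, c, W, R, S):
    """Exact Z by summing over all 2**(n1 + n2) binary configurations.

    Feasible only for toy sizes; this exponential sum is precisely
    what makes the denominator so horrifying during training.
    """
    n1, n2 = len(b), len(c)
    Z = 0.0
    for bits1 in product([0, 1], repeat=n1):
        for bits2 in product([0, 1], repeat=n2):
            v1, v2 = np.array(bits1), np.array(bits2)
            Z += np.exp(-bm_energy(v1, v2, b, c, W, R, S))
    return Z
```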

 


According to the "generalized machine learning" , we have understood the Boltzmann machine (BM) hypothesis function, you also need to explore how to train this free and powerful model. And how to train, that is, to find or design a suitable loss function, then select the appropriate optimization algorithm to minimize the loss of function resulting model.

However, the price of all this freedom is:

Extremely! Hard! To! Train!

Imagine using the most common loss, the likelihood function (eh eh? Don't I still owe you an article on loss functions? It's fine, the likelihood function was already covered in "The EM Algorithm"). Then every mainstream optimization algorithm faces an explosive amount of computation (just imagine taking derivatives of the Boltzmann machine's hypothesis function, especially that horrifying partition function sitting in the denominator!). Even the simplest gradient descent (well, gradient ascent here, since we are maximizing the likelihood) makes training a BM unrealistic in engineering practice.
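To see exactly where the explosion comes from, differentiate the log-likelihood with respect to any parameter θ of the energy function (a standard derivation, not spelled out in the original article):

\frac{\partial \log P(v1, v2)}{\partial \theta} = -\frac{\partial E(v1, v2)}{\partial \theta} + \sum_{v1', v2'} P(v1', v2') \frac{\partial E(v1', v2')}{\partial \theta}

The first term is cheap, but the second term, which comes from \log Z, is an expectation over every possible configuration of the model: exponentially many terms. That is the horror hiding in the denominator.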

So what can we do?

One mainstream solution is a modified gradient ascent method: using MCMC algorithms to maximize the likelihood function.
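The idea, in sketch form: approximate that exponential expectation with an average over samples drawn from the model by Gibbs sampling. Below is a deliberately naive single Gibbs sweep for binary units, reusing the hypothetical bm_energy helper from above (the real algorithms in the book chapter cited below are considerably more refined):

```python
import numpy as np

def gibbs_sweep(v1, v2, b, c, W, R, S, rng):
    """One Gibbs sweep: resample each binary unit given all the others.

    For each unit, compare the energy with that unit at 0 versus 1;
    then P(unit = 1 | rest) = sigmoid(E0 - E1). Long chains of such
    sweeps yield samples that stand in for the model expectation
    in the gradient.
    """
    for vec in (v1, v2):                 # v1, v2: integer NumPy arrays
        for i in range(len(vec)):
            vec[i] = 0
            e0 = bm_energy(v1, v2, b, c, W, R, S)
            vec[i] = 1
            e1 = bm_energy(v1, v2, b, c, W, R, S)
            p1 = 1.0 / (1.0 + np.exp(e1 - e0))   # sigmoid(E0 - E1)
            vec[i] = 1 if rng.random() < p1 else 0
    return v1, v2

# Usage sketch: rng = np.random.default_rng(0); run many sweeps and
# average the energy gradients over the sampled configurations.
```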

The derivation of this algorithm is pure mathematics, and the pile of formulas would take a maddeningly long article to explain clearly, so... for the details, please see Chapter 18 of "Deep Learning" (Chinese translation: exacity/deeplearningbook-chinese) (¯∇¯). If your math foundations are shaky, don't wade in rashly. (Honestly, Xiao Xi herself hasn't understood it completely and thoroughly... so anything written here now would surely not do it justice ~)

 


Eh eh? A few little doubts have struck the thought-loving Xiao Xi:

First, is this really the Boltzmann machine's full potential? Why does the energy function look a bit familiar... a bit like the Neural Tensor Network (NTN)? Eh? Could it have something to do with neural networks? Could it... could it strike sparks with deep learning, into something called a Deep Boltzmann Machine?

Also, the Bayesian network describes directed edges while the Boltzmann machine describes undirected edges, yet both seem equally free~... So which is better?

Next time, let Xiao Xi recount this new war from the heights of probabilistic graphical models ~

 
