"Principles of Artificial Neural Networks" Reading Notes (6)-Boltzmann Machine

Summary of all notes: "Principles of Artificial Neural Networks"-Summary of Reading Notes

1. The proposal of stochastic neural networks

Why the BP and Hopfield networks fall into local minima

  • The network error or energy function forms a nonlinear hypersurface with multiple minima;
  • The network error or energy function can only decrease monotonically along the gradient-descent direction and can never move uphill.

The basic idea of stochastic neural networks

  • Allow the network error or energy function to change not only in the direction of gradient descent but also, in a controlled probabilistic way, in the direction of gradient ascent, making it possible for the network to jump out of local minima and converge to the global minimum.

Characteristics of stochastic neural networks

  • The output state of each neuron is determined probabilistically;
  • Adjustments to the network connection weights follow a certain probability distribution;
  • Transitions of the network state follow a certain probability distribution.

The Boltzmann machine

  • It is a typical stochastic neural network;
  • It is the first neural network inspired by statistical mechanics;
  • Its name comes from Boltzmann's early work in statistical thermodynamics and from the dynamic distribution behavior of the network itself;
  • It was obtained by Hinton et al. by introducing a random mechanism, based on the idea of simulated annealing, into the discrete Hopfield neural network.

2. The network structure of the Boltzmann machine

The structure lies between the multi-layer hierarchical structure of the BP neural network and the single-layer fully interconnected structure of the discrete Hopfield neural network.

The $n$ neurons in the network are connected to one another by bidirectional, symmetric connections, i.e., $w_{ij} = w_{ji}$.

Each neuron has no self-feedback, i.e., $w_{ii} = 0$.

The output of each neuron $x_j$ is a binary discrete value, taking either 0 or 1.

The set of states of the $n$ neurons constitutes the state of the Boltzmann machine network.

The $n$ neurons are divided into two parts, a visible layer and a hidden layer, and the visible layer is further divided into an input part and an output part; these layers, however, have no sharp boundaries within the Boltzmann machine.

3. The processing unit model of the Boltzmann machine

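The book presents this model in figures; as a minimal sketch, assuming the standard Boltzmann machine unit (notation matching the energy function in section 4 below): unit $j$ computes the net input $net_j = \sum_i w_{ij} x_i - \theta_j$, which equals the energy gap between the states $x_j = 0$ and $x_j = 1$, and then takes state 1 with a sigmoid probability at temperature $T$:

```python
import numpy as np

def unit_update(x, W, theta, j, T, rng):
    """Stochastically update unit j of a Boltzmann machine in state x.

    net_j = sum_i w_ij * x_i - theta_j is the energy gap between
    x_j = 0 and x_j = 1; the unit fires with a sigmoid probability
    that flattens as the temperature T rises.
    """
    net_j = W[j] @ x - theta[j]   # w_jj = 0, so x_j itself contributes nothing
    p_fire = 1.0 / (1.0 + np.exp(-net_j / T))
    x[j] = 1 if rng.random() < p_fire else 0
    return x
```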

4. The energy function of the Boltzmann machine

The energy function of the Boltzmann machine is $E = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \theta_i x_i$
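A direct transcription of this formula into code, reusing the names from the sketch in section 3 ($W$ symmetric with zero diagonal, $x$ a binary 0/1 NumPy vector):

```python
def energy(x, W, theta):
    """Energy of a Boltzmann machine state: E = -1/2 * x^T W x + theta . x.

    Assumes W is symmetric (w_ij = w_ji) with zero diagonal (w_ii = 0),
    and x is a binary 0/1 state vector (NumPy arrays).
    """
    return -0.5 * x @ W @ x + theta @ x
```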

As the Boltzmann machine runs, the network energy shows a downward trend in a probabilistic sense. That is, although the overall trend of the network energy during the evolution of the network state is downward, it cannot be ruled out that some neuron changes its state as a small-probability event at some moment, causing the network energy to rise temporarily.

5. The Boltzmann distribution of the Boltzmann machine

The probability that the Boltzmann machine network is in a given state depends mainly on the energy of the network in that state: the lower the energy of a state, the higher the probability that the state appears; the higher the energy, the lower the probability that it appears.

When the network state is updated repeatedly and the number of updates is large enough, the probability of the network being in a given state obeys the Boltzmann distribution $P(E_i) = \frac{e^{-E_i/T}}{\sum_{i=1}^{m} e^{-E_i/T}}$

Features

  • The minimum-energy state appears with the greatest probability.
  • The probability that the Boltzmann machine is in a given state depends on the network temperature parameter $T$ (see the sketch after this list):
    when the temperature $T$ is high, the occurrence probabilities of the different network states are close to one another, and it is easier for the network to jump out of a local minimum and reach the global minimum;
    when the temperature $T$ is low, the occurrence probabilities of the states differ greatly; once the network falls into the global minimum or a local minimum, there is still some possibility of jumping out, but the probability of doing so is small.
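A small numerical illustration of this temperature effect, with hypothetical state energies chosen only for demonstration:

```python
import numpy as np

def boltzmann_distribution(energies, T):
    """P(E_i) = exp(-E_i / T) / sum_k exp(-E_k / T)."""
    weights = np.exp(-np.asarray(energies) / T)
    return weights / weights.sum()

energies = [-4.0, -3.0, -1.0, 0.0]               # hypothetical state energies
print(boltzmann_distribution(energies, T=10.0))  # high T: nearly uniform
print(boltzmann_distribution(energies, T=0.5))   # low T: mass piles on the minimum
```

At $T = 10$ the four probabilities are all close to 0.25, while at $T = 0.5$ almost all of the probability sits on the lowest-energy state.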

6. The operating rules of the Boltzmann machine

Simulated annealing algorithm

The basic idea

  • Think of each neuron as a "particle" inside a metal; the state of the neural network is then the set of states of all the particles, and the energy of the network in each state is the energy of that particle configuration. Suppose a control parameter $T$ is introduced during the operation of the network to simulate the temperature in metal annealing, such that when $T$ is large the network energy is relatively likely to change from low to high, and when $T$ is small it is relatively unlikely to do so. Then, as $T$ is lowered slowly from high to low, the evolution of the network state simulates the annealing process of a metal, and when $T$ drops far enough, the network converges to the minimum-energy state.


Network operation rules

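The book states these rules through figures; as a rough sketch, assuming the standard procedure (the unit update of section 3 run repeatedly while the temperature decays along an annealing schedule; the schedule parameters below are illustrative, not from the book):

```python
import numpy as np

def run_boltzmann(W, theta, T0=10.0, T_min=0.05, cooling=0.95,
                  sweeps_per_T=20, rng=None):
    """Anneal a Boltzmann machine: repeatedly pick a unit at random and
    update it stochastically while the temperature decays geometrically."""
    rng = rng or np.random.default_rng()
    n = len(theta)
    x = rng.integers(0, 2, size=n)   # random initial binary state
    T = T0
    while T > T_min:
        for _ in range(sweeps_per_T * n):
            j = rng.integers(n)                    # pick a unit at random
            net_j = W[j] @ x - theta[j]            # energy gap for unit j
            p_fire = 1.0 / (1.0 + np.exp(-net_j / T))
            x[j] = 1 if rng.random() < p_fire else 0
        T *= cooling                               # cool toward deterministic behavior
    return x
```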

7. The learning rules of the Boltzmann machine

The essence of associative memory in the Boltzmann machine is that the network learns a target probability distribution, memorizes it in the network's connection weights, and can reproduce this probability distribution in the later recall stage.

When the Boltzmann machine has made enough state transitions according to the operating rules, the occurrence of each network state obeys the Boltzmann distribution. The state probabilities given by the Boltzmann distribution are called the expected probabilities, and the probabilities with which the states actually occur while the network runs are called the actual probabilities. The difference between the two is the basis on which the network adjusts its connection weights.
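The book develops the rules through figures; the classic form of this idea (the Boltzmann learning rule of Ackley, Hinton, and Sejnowski; the book's notation may differ) adjusts each weight in proportion to the gap between the two kinds of statistics:

$\Delta w_{ij} = \eta \left( p_{ij}^{+} - p_{ij}^{-} \right)$

where $p_{ij}^{+}$ is the probability that units $i$ and $j$ are both on when the visible units are clamped to the training patterns, $p_{ij}^{-}$ is the same probability when the network runs freely, and $\eta$ is a learning rate.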

Self-associative memory learning rules

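A minimal sketch of one weight update under the classic rule above, assuming the clamped and free co-activation probabilities have already been estimated by sampling (the book presents this step in figures):

```python
import numpy as np

def boltzmann_weight_update(W, p_clamped, p_free, eta=0.1):
    """One step of the classic Boltzmann learning rule.

    p_clamped[i, j]: estimated probability that units i and j are both on
                     with the visible units clamped to the stored patterns.
    p_free[i, j]:    the same probability with the network running freely.
    """
    dW = eta * (p_clamped - p_free)
    np.fill_diagonal(dW, 0.0)   # preserve w_ii = 0
    return W + dW
```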

Learning rules for hetero-associative memory

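In the standard treatment (an assumption; the book's figures may organize this differently), hetero-associative learning uses the same weight update as the self-associative case, and the difference lies in the clamping scheme: in the clamped phase both the input and output parts of the visible layer are fixed to a training pair, while in the recall phase only the input part is clamped and the output part evolves freely.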
Next chapter: "Principles of Artificial Neural Networks" Reading Notes (7)-Adaptive Resonance Theory Neural Network


Origin blog.csdn.net/qq_41485273/article/details/114076059