Watermelon Book Reading Notes (5) - Neural Networks

Summary of all notes: "Machine Learning" (Watermelon Book) - Reading Notes Summary

1. Neuron model

A neural network is a broadly parallel, interconnected network of simple adaptive units, whose organization can simulate the responses of a biological nervous system to real-world objects. Its most basic component is the neuron model; a neural network is obtained by connecting many neurons in a layered structure.
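The neuron model described in the book is the M-P neuron: it receives input signals through weighted connections, compares the total input against a threshold, and produces output through an activation function (the step function, or the smoother sigmoid). A minimal sketch (the function names are mine, not the book's):

```python
import math

def mp_neuron(x, w, theta):
    """M-P neuron with step activation: output 1 when the weighted
    sum of inputs exceeds the threshold theta, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s - theta > 0 else 0

def sigmoid(z):
    """Sigmoid activation: a smooth, differentiable stand-in for the step."""
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `mp_neuron([1, 1], [0.5, 0.5], 0.8)` fires, while `mp_neuron([1, 0], [0.5, 0.5], 0.8)` does not.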

2. Perceptrons and multilayer networks

The perceptron consists of two layers of neurons. Only the output-layer neurons apply an activation function, i.e., there is only one layer of functional neurons, so its learning ability is very limited.
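The perceptron's learning rule adjusts each weight in proportion to the prediction error. A minimal sketch in plain Python (folding the threshold in as a bias weight is a standard trick; the learning rate and epoch count are my own illustrative choices, not from the book):

```python
def train_perceptron(samples, lr=0.1, epochs=50):
    """Perceptron learning rule: w <- w + lr * (y - y_hat) * x,
    with the threshold folded in as a trailing bias weight."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                # last entry is the bias
    for _ in range(epochs):
        for x, y in samples:
            xb = list(x) + [1.0]
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
            for i in range(n + 1):
                w[i] += lr * (y - y_hat) * xb[i]
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0

# Logical AND is linearly separable, so a single-layer perceptron learns it
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w_and = train_perceptron(AND)
```

On a linearly separable problem such as AND this converges; on XOR, which is not linearly separable, no setting of the weights can classify all four points correctly.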

To solve problems that are not linearly separable, multiple layers of functional neurons must be used. A network in which each layer is fully connected to the next, with no intra-layer or cross-layer connections, is called a multilayer feedforward neural network.

3. Error back-propagation algorithm

The learning ability of multilayer networks is much stronger than that of single-layer perceptrons. To train a multilayer network, the simple perceptron learning rule is clearly insufficient; a more powerful learning algorithm is needed. The error back-propagation (BP) algorithm is the most outstanding representative, and it remains the most successful neural network learning algorithm to date.
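A sketch of standard BP for one hidden layer of sigmoid neurons, following the chapter's update rules (g is the output-layer gradient term, e the hidden-layer term; v/gamma are the hidden weights and thresholds, w/theta the output weights and threshold). The network size, learning rate, and epoch count are illustrative:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bp_train(data, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    """Stochastic BP, one sigmoid hidden layer, one sigmoid output."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    gamma = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    theta = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, y in data:
            # forward pass
            b = [sigmoid(sum(v[h][i] * x[i] for i in range(n_in)) - gamma[h])
                 for h in range(n_hidden)]
            y_hat = sigmoid(sum(w[h] * b[h] for h in range(n_hidden)) - theta)
            # backward pass: output-layer then hidden-layer gradient terms
            g = y_hat * (1 - y_hat) * (y - y_hat)
            e = [b[h] * (1 - b[h]) * w[h] * g for h in range(n_hidden)]
            for h in range(n_hidden):
                w[h] += lr * g * b[h]
                gamma[h] -= lr * e[h]
                for i in range(n_in):
                    v[h][i] += lr * e[h] * x[i]
            theta -= lr * g
    return v, gamma, w, theta

def bp_predict(params, x):
    v, gamma, w, theta = params
    b = [sigmoid(sum(v[h][i] * x[i] for i in range(len(x))) - gamma[h])
         for h in range(len(w))]
    return sigmoid(sum(w[h] * b[h] for h in range(len(w))) - theta)
```

Trained on a simple separable problem such as logical AND this reliably fits; with enough hidden units and iterations it can also fit XOR, though BP may instead stall in a local minimum of the error surface.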

Precisely because of its powerful representational ability, a BP neural network often overfits: its training error keeps decreasing, but its test error may rise. Two strategies are commonly used to alleviate this:

  1. Early stopping: split the data into a training set and a validation set; the training set is used to compute gradients and update the connection weights and thresholds, while the validation set is used to estimate the error. If the training error decreases but the validation error increases, stop training and return the connection weights and thresholds that gave the smallest validation error.
  2. Regularization: add a term describing network complexity (e.g. the sum of squared connection weights) to the error objective function.
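Early stopping can be written as a generic loop around any trainer; `step` and `val_error` below are hypothetical callbacks standing in for one epoch of weight updates and a validation-set error estimate:

```python
def train_with_early_stopping(step, val_error, max_epochs=100, patience=5):
    """Run training epochs via step(); stop once the validation error
    (val_error(params)) has not improved for `patience` consecutive
    epochs, and return the best parameters seen."""
    best_err, best_params, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        params = step()
        err = val_error(params)
        if err < best_err:
            best_err, best_params, bad_epochs = err, params, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_params, best_err
```

The loop returns the parameters from the epoch with the lowest validation error, not the last epoch, which is the point of the strategy.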

4. Global minimum and local minima

A global minimum is necessarily a local minimum, but the converse does not hold.

Gradient-based search can get stuck in a local minimum. In practice, heuristic strategies are used to try to jump out of local minima and get closer to the global minimum:

  1. Start the search from several different initial points and take the best result;
  2. Simulated annealing: at each step, accept a result worse than the current one with a certain probability, which should decrease over time so that the search stabilizes;
  3. Stochastic gradient descent: the randomness in its gradient estimates can help the search jump out of a local minimum;
  4. Genetic algorithms.
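Simulated annealing (strategy 2) can be sketched on a simple one-dimensional function that has both a local and a global minimum. The cooling schedule and step size below are illustrative choices, and since the process is stochastic it is not guaranteed to reach the global minimum:

```python
import math
import random

def simulated_annealing(f, x0, T0=2.0, cooling=0.995, steps=2000, seed=0):
    """Accept an uphill (worse) move with probability exp(-delta / T),
    so the search can escape local minima; T decays toward 0, making
    the process increasingly greedy. Tracks the best point seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)
        fc = f(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / T):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        T = max(T * cooling, 1e-3)
    return best, fbest

# A 1-D test function: local minimum near x = 3, global minimum at x = 0
def f(x):
    return 0.1 * x * x - math.cos(2 * x)
```

Starting from x0 = 3.1 (near the local minimum), the uphill-acceptance rule gives the search a chance to cross the barrier toward the global basin near x = 0, which pure greedy descent cannot do.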

5. Other common neural networks

(1) RBF network

The RBF (radial basis function) network is a single-hidden-layer feedforward neural network that uses radial basis functions as the activation functions of its hidden neurons; the output layer is a linear combination of the hidden neurons' outputs.
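A sketch of the forward computation with Gaussian radial basis functions. In practice the centers are typically chosen (e.g. by clustering) and the weights learned by BP or least squares; here all parameters are simply given as inputs:

```python
import math

def rbf_forward(x, centers, betas, weights):
    """RBF network output: a linear combination of Gaussian activations
    rho(x, c_i) = exp(-beta_i * ||x - c_i||^2)."""
    out = 0.0
    for c, beta, w in zip(centers, betas, weights):
        dist2 = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        out += w * math.exp(-beta * dist2)
    return out
```

With hand-picked parameters, e.g. centers on (0, 0) and (1, 1) with negative weights, the two XOR-positive points (0, 1) and (1, 0) receive strictly higher outputs than (0, 0) and (1, 1), illustrating how an RBF network can separate a problem a single-layer perceptron cannot.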

(2) ART network

The ART (adaptive resonance theory) network is an important representative of competitive learning. It consists of a comparison layer, a recognition layer, a recognition threshold, and a reset module. The comparison layer receives input samples and passes them to the recognition-layer neurons; each recognition-layer neuron corresponds to a pattern class, and the number of neurons can grow dynamically during training to accommodate new pattern classes.

Competitive learning is a commonly used unsupervised learning strategy in neural networks. Under this strategy, the output neurons of the network compete with one another; at any moment only the winning neuron is activated, while the states of all other neurons are suppressed ("winner-takes-all").
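The winner-takes-all rule at the heart of competitive learning is simple to state in code:

```python
def winner_takes_all(activations):
    """Competitive output: only the most strongly activated neuron
    stays on (1); all others are suppressed to 0."""
    winner = max(range(len(activations)), key=lambda i: activations[i])
    return [1 if i == winner else 0 for i in range(len(activations))]
```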

(3) SOM network

The SOM (self-organizing map) network is an unsupervised neural network based on competitive learning. It maps high-dimensional input data to a low-dimensional (usually two-dimensional) space while preserving the topological structure of the data in the high-dimensional space; that is, points that are similar in the high-dimensional space are mapped to neighboring neurons in the network's output layer.
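A minimal SOM training sketch: each sample pulls its best-matching unit (BMU) and that unit's grid neighbours toward it, which is what preserves topology. The grid size, decay schedules, and Gaussian neighbourhood function below are common illustrative choices, not prescribed by the book:

```python
import math
import random

def train_som(data, grid_w=5, grid_h=5, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """SOM: a grid of weight vectors trained by competitive learning
    with a neighbourhood function that shrinks over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    W = {(i, j): [rng.random() for _ in range(dim)]
         for i in range(grid_w) for j in range(grid_h)}
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        sigma = sigma0 * (1 - t / epochs) + 0.5
        for x in data:
            # best-matching unit: closest weight vector in input space
            bmu = min(W, key=lambda p: sum((wv - xv) ** 2
                                           for wv, xv in zip(W[p], x)))
            for p, wv in W.items():
                d2 = (p[0] - bmu[0]) ** 2 + (p[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))  # grid-distance neighbourhood
                for k in range(dim):
                    wv[k] += lr * h * (x[k] - wv[k])
    return W

def bmu_of(W, x):
    return min(W, key=lambda p: sum((wv - xv) ** 2 for wv, xv in zip(W[p], x)))
```

After training on data with two well-separated clusters, the clusters map to different regions of the grid, and each BMU's weight vector lies close to its cluster.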

(4) Cascade-correlation network

The cascade-correlation network is an important representative of structure-adaptive networks, which treat the network structure itself as one of the goals of learning and try to find, during training, the structure that best matches the characteristics of the data.

(5) Elman network

The Elman network is one of the most commonly used recurrent neural networks, which allow cyclic connections in the network so that the outputs of some neurons are fed back as input signals.
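One time step of an Elman network can be sketched as follows: the hidden state depends on both the current input and the previous hidden state (the feedback loop), so outputs depend on the whole input history. The weight layout (lists of rows) is an illustrative convention, not the book's notation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def elman_step(x, h_prev, W_in, W_rec, W_out):
    """One Elman step: new hidden state from input + previous hidden
    state, then a linear readout from the hidden state."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row_in, x)) +
                 sum(wr * hj for wr, hj in zip(row_rec, h_prev)))
         for row_in, row_rec in zip(W_in, W_rec)]
    y = [sum(wo * hj for wo, hj in zip(row, h)) for row in W_out]
    return h, y
```

Because the hidden state changes between steps, feeding the same input repeatedly generally produces different outputs, which a feedforward network cannot do.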

(6) Boltzmann machine

For the Boltzmann machine, see this video: Whiteboard Derivation Series Notes (28) - Boltzmann Machine

6. Deep learning

A typical deep learning model is a very deep neural network.

From the perspective of increasing model complexity, adding hidden layers is clearly more effective than adding neurons within a hidden layer: increasing depth not only adds more neurons with activation functions, but also adds more levels of nesting of activation functions (deeper function composition).

The "pre-training + fine-tuning" method can be viewed as dividing a large number of parameters into groups, first finding a locally good setting for each group, and then jointly fine-tuning these locally good results to seek a global optimum.

Next chapter portal: Watermelon Book Reading Notes (6) - Support Vector Machines


Origin blog.csdn.net/qq_41485273/article/details/112827837