Deep Learning, Introduction to Neural Networks

Table of contents

1. The overall structure of the neural network

2. Neural network architecture details

3. Regularization and activation functions

4. Solutions to neural network overfitting

1. The overall structure of the neural network

 

ConvNetJS demo: Classify toy 2D data

You can experiment with this kind of network on the ConvNetJS demo website, which is a good hands-on way to learn how neural networks behave.

The overall structure of a neural network is as follows:

  1. Perceptrons are the most basic of all neural networks and the building blocks of more complex ones. A perceptron connects its input neurons directly to a single output neuron.

  2. Feed-Forward Networks: a feed-forward network is a collection of perceptron-like layers, with three basic layer types: input layer, hidden layer, and output layer. Across each connection, the signal from the previous layer is multiplied by a weight, a bias is added, and the result is passed through an activation function (a minimal sketch of this computation follows this list). Feed-forward networks update their parameters iteratively with backpropagation until the desired performance is achieved.

  3. Residual Networks (ResNet): a problem with deep feed-forward networks is the so-called vanishing gradient, i.e. when the network is too deep, useful gradient information can no longer be propagated back through the whole network when the parameters are updated. Residual networks mitigate this with shortcut (skip) connections that carry the signal across layers.
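To make this concrete, here is a minimal Python/NumPy sketch of the feed-forward step (an illustration with made-up shapes, not any particular library's code): multiply the input by a weight matrix, add a bias, and apply an activation function.

import numpy as np

def layer(x, W, b):
    # one feed-forward step: weights, bias, then a tanh activation
    return np.tanh(x @ W + b)

x = np.array([[1.0, 2.0, 3.0]])   # one sample with 3 features, shape (1, 3)
W = 0.01 * np.random.randn(3, 4)  # weights connecting 3 inputs to 4 neurons
b = np.zeros(4)                   # one bias per output neuron
h = layer(x, W, b)                # output has shape (1, 4)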

For the overall structure of the neural network, we can summarize it in four points: hierarchical structure, neurons, full connectivity, and nonlinearity.

 

Hierarchy:

As the figure above shows, we generally divide a neural network into three parts:

1: input layer

2: hidden layer

3: output layer

PS: Note that the hidden part in the middle can consist of multiple layers.

Neurons:

Each layer contains a number of ball-like nodes: these are the network's neurons, and their count determines the amount of data, i.e. the size of the matrix, at that layer. What the neurons hold differs from layer to layer.

Each neuron in the input layer holds one feature of your raw input data (generally called X). For example, if x is a picture whose pixels are 32*32*3, each pixel value is one of its features, so there are 3072 features and, correspondingly, 3072 input-layer neurons. These features are fed in as a matrix. For example, if our input matrix is 1*3072, the number in the first dimension is how many inputs a batch contains (the batch is how much data is fed in per training step), and the number in the second dimension is how many features each input has.
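As a quick illustration of those shapes (the pixel values below are random stand-ins for a real picture), flattening a 32*32*3 image yields exactly the 3072 features described above:

import numpy as np

img = np.random.rand(32, 32, 3)  # stand-in for a 32x32 RGB image
x = img.reshape(1, -1)           # flatten into a batch of one sample
print(x.shape)                   # (1, 3072)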

Each layer of hidden-layer neurons represents one more update of x, and the number of neurons in a layer (for example, the four in the hidden1 layer of the figure) is the number of features the layer derives from your input. For example, if your three input features are age, weight, and height, the first neuron of hidden1 might be 'age*0.1 + weight*0.4 + height*0.5', while the second might be 'age*0.2 + weight*0.5 + height*0.3'; the neurons in each layer can each have a different representation.

The number of neurons in the output layer depends mainly on what you want the network to do. For example, for a 10-class classification problem, the output layer can be a '1*10' matrix (the first dimension is the batch size, the same as in the input layer, and the 10 entries correspond to the 10 classes).

Full connection:

The gray lines between each layer and the next are called full connections, because every neuron in the previous layer is connected to all neurons in the next layer. These lines can also be represented by a matrix, usually called the 'weight matrix' and written as a capital W (a parameter we will need to update later). The dimensions of W depend on the number of neurons in the previous layer and the number in the next layer. For example, the input layer in the figure has 3 neurons and the hidden1 layer has 4, so the dimension of W is '3*4', and so on. (This is because a fully connected layer is a matrix operation, which must satisfy the rules of matrix multiplication.)
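A minimal sketch of this shape rule, using the assumed sizes from the figure (3 input neurons, 4 neurons in hidden1):

import numpy as np

x = np.random.rand(1, 3)   # a batch of 1 sample with 3 features
W = np.random.randn(3, 4)  # weight matrix: 3 input neurons x 4 hidden neurons
b = np.zeros(4)
h = x @ W + b              # (1, 3) @ (3, 4) -> (1, 4), satisfying matrix multiplication
print(h.shape)             # (1, 4)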

Nonlinearity:

Non-linear means that the mathematical relationship between variables is not a straight line but a curve, a surface, or something otherwise indeterminate. Nonlinearity is one of the typical properties of the complexity of nature; compared with linearity, it is closer to how objective things actually behave and is one of the important tools for the quantitative study of complex phenomena. Any relationship that cannot be described by a linear function is generally called a nonlinear relationship.

2. Neural network architecture details

Overall structure:

Basic structure: f = W2 max(0, W1 x)

Adding one more layer: f = W3 max(0, W2 max(0, W1 x))

The power of neural networks lies in fitting complex data with more parameters.
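Here is the basic structure written out as a NumPy sketch (the weights are random placeholders, not trained values); np.maximum(0, ...) is the max(0, x) in the formulas above:

import numpy as np

x  = np.random.rand(3, 1)        # input vector with 3 features
W1 = np.random.randn(4, 3)       # first weight matrix
W2 = np.random.randn(2, 4)       # second weight matrix
f  = W2 @ np.maximum(0, W1 @ x)  # f = W2 max(0, W1 x)

# one more layer: f = W3 max(0, W2 max(0, W1 x))
W3 = np.random.randn(2, 2)
f3 = W3 @ np.maximum(0, W2 @ np.maximum(0, W1 @ x))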

The effect of the number of neurons on the result:

Before the change:

layer_defs = [];
layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:2});
layer_defs.push({type:'fc', num_neurons:6, activation: 'tanh'}); // value was missing in the original; 6 is assumed here (the demo's default)
layer_defs.push({type:'fc', num_neurons:2, activation: 'tanh'});
layer_defs.push({type:'softmax', num_classes:2});

net = new convnetjs.Net();
net.makeLayers(layer_defs);

trainer = new convnetjs.SGDTrainer(net, {learning_rate:0.01, momentum:0.1, batch_size:10, l2_decay:0.001});

 

layer_defs = [];
layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:2});
layer_defs.push({type:'fc', num_neurons:2, activation: 'tanh'});
layer_defs.push({type:'fc', num_neurons:2, activation: 'tanh'});
layer_defs.push({type:'softmax', num_classes:2});

net = new convnetjs.Net();
net.makeLayers(layer_defs);

trainer = new convnetjs.SGDTrainer(net, {learning_rate:0.01, momentum:0.1, batch_size:10, l2_decay:0.001});

Above is the configuration after changing the number of neurons to 2.

Then the number of neurons is adjusted to 5; more neurons mean more parameters and a more flexible decision boundary:

layer_defs = [];
layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:2});
layer_defs.push({type:'fc', num_neurons:5, activation: 'tanh'});
layer_defs.push({type:'fc', num_neurons:5, activation: 'tanh'});
layer_defs.push({type:'softmax', num_classes:2});

net = new convnetjs.Net();
net.makeLayers(layer_defs);

trainer = new convnetjs.SGDTrainer(net, {learning_rate:0.01, momentum:0.1, batch_size:10, l2_decay:0.001});

 

3. Regularization and activation functions

The role of regularization:

In machine learning, a penalty term is often added to the loss function; this is called regularization. It prevents the model from overfitting: adding certain rules (restrictions) to the loss function reduces the solution space, and thus the possibility of finding an overfitted solution.
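As a minimal sketch of the idea, assuming the data loss has already been computed elsewhere: an L2 penalty adds the squared magnitude of the weights to the loss, the same role the l2_decay setting plays in the ConvNetJS trainers above.

import numpy as np

def total_loss(data_loss, W, lam=0.001):
    # L2 regularization: the penalty grows with the weights' magnitude,
    # shrinking the solution space and discouraging overfitted solutions
    return data_loss + lam * np.sum(W ** 2)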

Activation functions:

Commonly used activation functions include Sigmoid, ReLU, and Tanh, each performing its corresponding nonlinear transformation.

 

The activation function adds nonlinear factors, improves the expressive power of the neural network, and solves problems that a linear model cannot.

When learning calculus, in the part on integrals, there are drawings that approximate a curved area with straight segments. We can borrow that idea and use countless straight lines to approximate a curve, which is exactly what stacked linear layers with nonlinear activations do.
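For reference, here are the three activations named above as NumPy one-liners (a sketch, not any particular library's implementation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes inputs into (0, 1)

def relu(x):
    return np.maximum(0, x)          # the piecewise-linear max(0, x) from section 2

def tanh(x):
    return np.tanh(x)                # squashes inputs into (-1, 1)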

4. Solutions to neural network overfitting

Parameter initialization:

Parameter initialization is very important; usually we use a random strategy, for example:

W = 0.01 * np.random.randn(D, H)  # small Gaussian weights for a layer with D inputs and H outputs
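For example, a sketch of randomly initializing every layer of a small network, with assumed sizes D=3 input features, H=4 hidden neurons, and C=2 classes:

import numpy as np

D, H, C = 3, 4, 2                  # assumed sizes for illustration
W1 = 0.01 * np.random.randn(D, H)  # small random values break the symmetry between neurons
b1 = np.zeros(H)                   # biases are commonly initialized to zero
W2 = 0.01 * np.random.randn(H, C)
b2 = np.zeros(C)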

Data preprocessing:

Different preprocessing choices can make a big difference to how well the model performs; typical steps are zero-centering the data and normalizing its scale.
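A common minimal recipe, assuming a data matrix X with one sample per row: zero-center each feature, then normalize its scale.

import numpy as np

X = np.random.rand(100, 3)   # toy data: 100 samples, 3 features
X -= X.mean(axis=0)          # zero-center each feature
X /= X.std(axis=0) + 1e-8    # scale each feature; the epsilon avoids division by zero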

 

Dropout:

Dropout is the legendary 'Seven Injuries Fist': a move that wounds yourself in order to defeat the enemy, here sacrificing some of the network's capacity during training to beat overfitting.

Overfitting is a huge headache in neural networks, and 'dropout' carries two related meanings:

  1. In one sense, it is a machine-learning strategy for dealing with model overfitting.

  2. In another sense, it is the concrete technique itself: during training, each layer's output randomly drops some neurons, which helps counter vanishing and exploding gradients and improves the generalization ability of the whole network (a minimal sketch follows this list).
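Here is a minimal sketch of the usual 'inverted dropout' formulation at training time, assuming a layer activation h and a keep probability p:

import numpy as np

def dropout_train(h, p=0.5):
    # keep each neuron with probability p and rescale by 1/p, so the
    # expected activation is unchanged and test time needs no adjustment
    mask = (np.random.rand(*h.shape) < p) / p
    return h * mask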

 


Source: blog.csdn.net/Williamtym/article/details/132028207