Traditional Neural Networks
A situation where deep learning does not apply: cross-domain problems (e.g., stock forecasting), where the patterns learned from old historical data do not hold for new data.
Matrix calculation:
Input image x [32×32×3] = 3072 pixels, flattened into a column vector.
Purpose: to classify into 10 categories, 10 groups of weight parameters produce 10 scores, one per category, from which the probability of belonging to each category is derived.
Bias term b: 10 values.
The weight parameter W is obtained by random initialization, or from a pre-trained model.
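The scoring step above (10 groups of weights applied to a flattened image) can be sketched in NumPy. The random `W` and `x` here are stand-ins for illustration, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(3072)                         # flattened 32x32x3 image, 3072 pixels
W = rng.standard_normal((10, 3072)) * 0.01   # 10 groups of weights, one per category
b = np.zeros(10)                             # bias term, 10 values

scores = W @ x + b                           # 10 scores, one per category
assert scores.shape == (10,)
```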
Innovation: improve the loss function.
Loss function: measures the gap between the prediction and the actual label. The larger the gap, the worse W is and the more it must be adjusted; the smaller the gap, the more W only needs fine-tuning.
Regularization:
W1 and W2 can produce the same result on x, but that does not mean W1 and W2 are equally good. W1 focuses only on local features (lopsided, changes drastically with x, easily affected by outliers), while W2 focuses on the whole (balanced, changes smoothly).
How to reflect the difference between W1 and W2? → a regularization penalty (to prevent overfitting).
Outlier handling: best done manually before the data enters the model (e.g., in competition data).
L2 penalty term:
Loss function = data loss + regularization penalty
Regularization (L2 penalty): R(W) = Σ W², weighted by a coefficient λ.
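A minimal NumPy sketch of why the L2 penalty separates W1 from W2: both give the same data output on this x, but the penalty prefers the balanced weights. The λ value and the toy vectors are illustrative assumptions:

```python
import numpy as np

def l2_penalty(W, lam=0.1):
    # L2 regularization penalty: lambda * sum of squared weights
    return lam * np.sum(W ** 2)

W1 = np.array([1.0, 0.0, 0.0, 0.0])      # focuses on one local feature
W2 = np.array([0.25, 0.25, 0.25, 0.25])  # spreads weight evenly
x = np.ones(4)

assert W1 @ x == W2 @ x == 1.0           # identical data output...
assert l2_penalty(W2) < l2_penalty(W1)   # ...but L2 prefers the balanced W2
```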
Activation function: sigmoid
For classification problems, what we want is a probability value, but the raw model output can be any number. How to handle this? → mapping.
The sigmoid function maps any real value into (0, 1).
Vanishing gradient problem: the larger |z| is, the closer the gradient is to 0, so the parameters of earlier layers cannot be updated. A neural network passes gradients layer by layer, so this problem stalled the development of neural networks from 1997 until around 2012.
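The vanishing gradient can be seen directly from the sigmoid's derivative, s(z)(1 − s(z)): it peaks at 0.25 at z = 0 and collapses toward 0 for large |z|. A small sketch:

```python
import numpy as np

def sigmoid(z):
    # maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

assert abs(sigmoid_grad(0.0) - 0.25) < 1e-12  # largest gradient, at z = 0
assert sigmoid_grad(10.0) < 1e-4              # nearly zero for large z
```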
A network layer can be viewed as multiple linear regressions combined, followed by a logistic (sigmoid) mapping.
Loss function
After getting the predicted scores:
1. First enlarge the differences (apply the exponential function e^x).
2. Then normalize to convert them into probability values.
3. Compute the loss value: only the probability of the correct category matters; the closer it is to 1, the closer the loss is to 0.
The log function: loss = −log(p_correct).
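The three steps above are the softmax cross-entropy loss. A minimal NumPy sketch (the example scores are made up):

```python
import numpy as np

def softmax_cross_entropy(scores, correct):
    # 1. enlarge differences with e^x (shifted by the max for numerical stability)
    e = np.exp(scores - np.max(scores))
    # 2. normalize into probabilities
    probs = e / e.sum()
    # 3. loss considers only the correct class; p -> 1 means loss -> 0
    return -np.log(probs[correct])

scores = np.array([3.2, 5.1, -1.7])
# a confidently-correct prediction gives a smaller loss than a wrong one
assert softmax_cross_entropy(scores, correct=1) < softmax_cross_entropy(scores, correct=0)
```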
Forward propagation
Backpropagation (computes the gradients used by gradient descent)
Neural network: converts features humans understand into features computers understand.
Hidden layer 1: Feature 1 = 0.7H − 0.1W + 0.6A, ...
The intermediate numbers are the weights.
Feature transformation: Wx + b becomes a new set of features.
Hidden layer 2 does this again to find more suitable features.
The sizes and number of the W's and b's are determined by the layer widths.
Activation function: ReLU
If only linear transformations are stacked, the problems that can be solved are limited, so the nonlinear function ReLU is introduced: features less than 0 are zeroed out, and the more important features are learned more.
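The hidden-layer transformation and ReLU can be sketched as a two-layer forward pass. The layer widths and random weights here are illustrative assumptions (the three inputs stand in for H, W, A from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.7, 0.1, 0.6])                      # 3 input features

W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # layer 1: 3 -> 4 features
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)  # layer 2: 4 -> 2 features

h1 = np.maximum(0.0, W1 @ x + b1)  # Wx + b then ReLU: values below 0 are zeroed
h2 = np.maximum(0.0, W2 @ h1 + b2) # hidden layer 2 transforms the features again
assert h1.shape == (4,) and h2.shape == (2,)
```

Without the `np.maximum` steps, stacking the two layers would collapse into a single linear map, which is exactly why the nonlinearity is needed.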
Data preprocessing
De-meaning: subtract the mean of each feature.
Normalization: divide by the standard deviation of each feature.
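Both steps together are standardization, applied per feature column. A minimal sketch with a toy matrix:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

X_centered = X - X.mean(axis=0)      # de-meaning: subtract each column's mean
X_norm = X_centered / X.std(axis=0)  # normalization: divide by each column's std dev

assert np.allclose(X_norm.mean(axis=0), 0.0)  # columns now have zero mean
assert np.allclose(X_norm.std(axis=0), 1.0)   # and unit standard deviation
```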
Dropout
Prevents overfitting by not using all neurons at once.
Randomly "kill" neurons during training:
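A minimal sketch of (inverted) dropout: each neuron survives with probability `keep_prob`, and survivors are rescaled so the expected activation is unchanged. The keep probability here is an illustrative choice:

```python
import numpy as np

def dropout(h, keep_prob=0.5, rng=None):
    # randomly "kill" neurons; rescale survivors so E[output] == h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) < keep_prob  # True = neuron survives
    return h * mask / keep_prob

h = np.ones(1000)
out = dropout(h, keep_prob=0.5, rng=np.random.default_rng(0))
assert ((out == 0.0) | (out == 2.0)).all()  # each neuron is killed or rescaled
```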
Convolutional Neural Networks (CNNs)
There are two problems with processing images using fully connected neural networks:
- The amount of data to be processed is large, so efficiency is low.
- It is hard to retain the image's original spatial features while flattening dimensions, which lowers image-processing accuracy.
1. The composition of the CNN network
The CNN network is inspired by the human visual nervous system.
The CNN network mainly consists of three parts:
- The convolutional layer, the pooling layer, and the fully connected layer. The convolutional layer is responsible for extracting local features from the image;
- The pooling layer is used to greatly reduce the number of parameters (dimensionality reduction);
- The fully connected layer is similar to an ordinary artificial neural network and is used to output the desired result.
2. Convolutional layer
2.1 Calculation method of convolution
The purpose of the convolutional layer is to extract features from the input feature map.
How to convolve: slide the convolution kernel over the original image (traverse the kernel across the image) and output a feature value at each position.
The convolution operation is essentially a dot product between the filter and a local region of the input data.
Each output point is computed this way; computing all points gives the final convolution result, which is output as a feature map.
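The sliding-dot-product described above can be written as a naive single-channel convolution (no padding, stride 1). The image and kernel values are made up for illustration:

```python
import numpy as np

def conv2d(img, kernel):
    # slide the kernel over the image; each output is a dot product
    # between the kernel and the local image patch
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
feat = conv2d(img, np.ones((3, 3)))
assert feat.shape == (3, 3)  # 5x5 image, 3x3 kernel -> 3x3 feature map
```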
2.2 Padding
In the convolution above, the feature map comes out smaller than the original image. To keep the result the same size as the original image, we pad around the original image so that the feature map's size does not change during convolution.
Zero padding is used:
for a 5×5 image, to output a 5×5 feature map with a 3×3 kernel, pad a one-pixel border of zeros.
2.3 Stride
Stride design: with a stride of 1 the kernel moves one pixel at a time. The stride can also be increased, for example to 2, and a (smaller) feature map can still be extracted, as shown in the following figure:
2.4 Multi-channel convolution
The front is single-channel, and the bottom is multi-channel
The calculation method is as follows: when the input has multiple channels (for example, a picture can have three channels of RGB), the convolution kernel needs to have the same number of channels, and the correspondence between each convolution kernel channel and the input layer Channels are convolved, and the convolution results of each channel are added bit by bit to obtain the final Feature Map
2.5 Multi-convolution kernel convolution
When there are multiple convolution kernels, each convolution kernel learns different features, correspondingly generating a Feature Map containing multiple channels, and
n convolution kernels generate n Feature Maps
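Both rules (kernel channels match input channels; n kernels give n feature maps) can be sketched together. Shapes and values here are illustrative assumptions:

```python
import numpy as np

def conv2d_multi(img, kernels):
    # img: (C, H, W); kernels: (n, C, kh, kw)
    C, H, W = img.shape
    n, kc, kh, kw = kernels.shape
    assert kc == C  # kernel channel count must match input channel count
    out = np.zeros((n, H - kh + 1, W - kw + 1))
    for k in range(n):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # per-channel dot products are summed element-wise
                out[k, i, j] = np.sum(img[:, i:i+kh, j:j+kw] * kernels[k])
    return out

img = np.ones((3, 5, 5))                  # 3-channel (RGB-like) 5x5 input
feats = conv2d_multi(img, np.ones((4, 3, 3, 3)))
assert feats.shape == (4, 3, 3)           # 4 kernels -> 4 feature maps
```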
2.6 Feature map size
The size of the output feature map is closely related to the following parameters:
- size: convolution kernel/filter size, usually odd, e.g. 1×1, 3×3, 5×5
- padding: the zero-padding scheme
- stride: the step size
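For a square input these parameters combine into the standard output-size formula, output = (N + 2P − F) / S + 1, where N is the input size, F the kernel size, P the padding, and S the stride:

```python
def feature_map_size(N, F, P=0, S=1):
    # output size of a convolution: (N + 2P - F) // S + 1
    return (N + 2 * P - F) // S + 1

assert feature_map_size(5, 3) == 3            # no padding, stride 1
assert feature_map_size(5, 3, P=1) == 5       # padding of 1 keeps the 5x5 size
assert feature_map_size(5, 3, P=1, S=2) == 3  # stride 2 shrinks the map
```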
Pooling is implemented in tf.keras as:
tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=None, padding='valid')
# pool_size: size of the pooling window
# strides: step the window moves; defaults to pool_size
# padding: whether to pad; the default ('valid') is no padding
3. Pooling layer (Pooling)
The pooling layer reduces the input dimensionality of subsequent layers, shrinks the model, speeds up computation, and improves the robustness of the feature map, helping prevent overfitting. It mainly performs subsampling on the feature maps produced by the convolutional layer, and comes in two types:
3.1 Max pooling
Max Pooling: take the maximum value in the window as the output; this method is widely used.
3.2 Average pooling
Avg Pooling: take the mean of all values in the window as the output.
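Both pooling types can be sketched for the common 2×2 window with stride 2 (non-overlapping windows); the input values are made up:

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    # 2x2 pooling with stride 2: each window outputs its max or mean
    H, W = fmap.shape
    out = np.zeros((H // 2, W // 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = fmap[2*i:2*i+2, 2*j:2*j+2]
            out[i, j] = win.max() if mode == "max" else win.mean()
    return out

fmap = np.array([[1.,  2.,  5.,  6.],
                 [3.,  4.,  7.,  8.],
                 [9., 10., 13., 14.],
                 [11., 12., 15., 16.]])
assert (pool2x2(fmap) == np.array([[4., 8.], [12., 16.]])).all()
assert (pool2x2(fmap, "avg") == np.array([[2.5, 6.5], [10.5, 14.5]])).all()
```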
4. Fully connected layer
The fully connected layer sits at the end of the CNN. After feature extraction by the convolutional layers and dimensionality reduction by the pooling layers, the feature map is flattened into a one-dimensional vector and fed into the fully connected layer for classification or regression.
Flatten: unroll the feature map into a one-dimensional vector.
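The flatten-then-classify step can be sketched as follows; the feature-map shape, weights, and class count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
fmap = rng.random((4, 2, 2))  # 4 channels of 2x2 pooled features

flat = fmap.reshape(-1)       # flatten: 4*2*2 = 16-dimensional vector
W, b = rng.standard_normal((10, 16)), np.zeros(10)
scores = W @ flat + b         # fully connected layer: 10 class scores

assert flat.shape == (16,) and scores.shape == (10,)
```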