Reading-notes summary of "Deep Learning" (the "flower book"): convolutional networks
A convolutional neural network (CNN) is a neural network specialized for processing data with a grid-like structure.
1. The convolution operation
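As a concrete reference for this section, here is a minimal sketch of the discrete 2-D operation most deep learning libraries call "convolution" (technically cross-correlation, since the kernel is not flipped); the input and kernel are illustrative choices:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over every position
    where it fits entirely inside the image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])   # simple diagonal-difference kernel
print(conv2d(image, kernel))       # every entry is -5.0, shape (3, 3)
```

A 4×4 input and a 2×2 kernel give a 3×3 output, matching the "valid" size rule (input size − kernel size + 1) discussed later.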
2. Motivation
Convolution leverages three important ideas that help improve a machine learning system:
- Sparse interactions: the kernel is smaller than the input, so each output unit interacts with only a few input units.
- Parameter sharing: the same parameters are used in more than one function of the model.
- Equivariant representations: if the input shifts, the output shifts in the same way.
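The equivariance property in the list above can be checked numerically: translating the input translates the (valid) convolution output by the same amount. The signal and kernel here are illustrative choices:

```python
import numpy as np

# A small 1-D signal with zeros at the borders, and a difference kernel.
signal = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
kernel = np.array([1.0, -1.0])

out = np.convolve(signal, kernel, mode='valid')
shifted = np.roll(signal, 2)                       # translate the input by 2
out_shifted = np.convolve(shifted, kernel, mode='valid')

# The same feature responses appear, translated by 2 positions.
print(np.allclose(out_shifted[2:], out[:-2]))      # True
```

The same kernel weights are applied at every position (parameter sharing), which is exactly what makes the output equivariant to translation.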
3. Pooling
The pooling function replaces the output of the network at a certain location with a summary statistic of the nearby outputs. For example, the max pooling function reports the maximum output within a rectangular neighborhood.
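A minimal sketch of non-overlapping max pooling, assuming a square window and an input whose sides are multiples of the window size (the trailing rows/columns are dropped otherwise):

```python
import numpy as np

def max_pool(x, size):
    """Non-overlapping max pooling: each output is the maximum of a
    size x size window of the input."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]   # drop any ragged edge
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 0, 1, 2],
              [1, 1, 3, 4]], dtype=float)
print(max_pool(x, 2))
# [[4. 8.]
#  [9. 4.]]
```

Each 2×2 block of the input collapses to a single number, which is why small translations of the input barely change the pooled output.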
4. Convolution and pooling as an infinitely strong prior
Whether a prior is considered strong or weak depends on how concentrated its probability density is. A weak prior has high entropy, such as a Gaussian distribution with large variance; such a prior allows the data to move the parameters more or less freely. A strong prior has low entropy, such as a Gaussian distribution with small variance; such a prior plays a more active role in determining where the parameters end up.
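The link between a Gaussian prior's variance and its entropy can be made concrete with the differential entropy of a Gaussian random variable:

```latex
h(X) = \frac{1}{2} \ln\!\left(2 \pi e \sigma^{2}\right)
```

The entropy grows with $\sigma^{2}$, so a large-variance (weak) prior has high entropy and constrains the parameters little, while as $\sigma \to 0$ the entropy falls and the prior increasingly dictates the parameter values.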
Thinking of a convolutional neural network as a fully connected network with an infinitely strong prior over its weights can give us better insight into how convolutional neural networks work.
- One key insight is that convolution and pooling can cause underfitting when the assumptions they encode do not match the task.
- Another key insight is that when we compare the statistical learning performance of convolutional models, we should only benchmark them against other convolutional models.
5. Variants of the basic convolution function
- Valid convolution
- Same convolution
- Full convolution
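The three variants differ only in how much zero padding is applied, which determines the output size; for a 1-D input of length n and a kernel of length k, NumPy's `np.convolve` modes illustrate the three sizes:

```python
import numpy as np

# Output lengths for the three variants, input length n, kernel length k:
#   valid: n - k + 1   (no padding, kernel stays entirely inside the input)
#   same:  n           (just enough padding to preserve the input size)
#   full:  n + k - 1   (enough padding that every overlap is visited)
n, k = 8, 3
x = np.random.randn(n)
w = np.random.randn(k)

print(len(np.convolve(x, w, mode='valid')))  # 6
print(len(np.convolve(x, w, mode='same')))   # 8
print(len(np.convolve(x, w, mode='full')))   # 10
```

Repeated valid convolutions shrink the representation layer by layer, which is why same convolution is often preferred in deep stacks.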
6. Structured output
Convolutional neural networks can be used to output a high-dimensional structured object, rather than just predicting a class label for a classification task or a real value for a regression task.
7. Data types
The data used by a convolutional network usually consists of several channels, each channel being the observation of a different quantity at some point in space or time.
8. Efficient convolution algorithm
Convolution is equivalent to transforming both the input and the kernel to the frequency domain with a Fourier transform, multiplying the two signals point-wise, and then transforming back to the time (or spatial) domain with an inverse Fourier transform.
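This equivalence can be verified directly with NumPy's FFT routines; zero-padding both signals to length n + k − 1 makes the circular convolution computed by the FFT coincide with ordinary ("full") linear convolution:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, -1.0, 0.5])

# Pad to the full-convolution length so circular and linear convolution agree.
size = len(x) + len(h) - 1
fft_conv = np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(h, size), size)
direct = np.convolve(x, h, mode='full')

print(np.allclose(fft_conv, direct))  # True
```

For large kernels the FFT route costs O(n log n) instead of O(nk), which is the source of its efficiency.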
Designing methods to perform convolution, or approximate it, faster without compromising the accuracy of the model is an active area of research. Even techniques that only improve the efficiency of forward propagation are useful, because in a commercial setting, deploying a network usually consumes more resources than training it.
9. Random or unsupervised features
The most expensive part of convolutional network training is learning features.
There are three basic strategies for obtaining convolution kernels without supervised training:
- Simply initialize them randomly
- Design them manually
- Use unsupervised criteria to learn the kernel
Learning features with an unsupervised criterion allows them to be determined separately from the classifier layer at the top of the network. The features then need to be extracted for the whole training set only once, yielding a new training set for the last layer. Learning the last layer is then typically a convex optimization problem, assuming that layer is something like logistic regression or an SVM.
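A sketch of the first strategy from the list above, random untrained kernels as a fixed feature extractor followed by a convex problem (logistic regression) on top; the toy data, labelling rule, and all sizes are illustrative assumptions, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.standard_normal((200, 16))               # 200 toy 1-D inputs
y = (X[:, :8].sum(axis=1) > 0).astype(float)     # synthetic binary labels

kernels = rng.standard_normal((4, 3))            # random kernels, never trained
# Extract the features exactly once: convolve each input with each kernel
# and max-pool, producing a fixed feature matrix for the last layer.
feats = np.array([[np.convolve(x, k, mode='valid').max() for k in kernels]
                  for x in X])

# Logistic regression on the fixed features: the loss is convex in (w, b),
# so plain gradient descent heads toward the global optimum.
w, b = np.zeros(4), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.01 * feats.T @ (p - y) / len(y)
    b -= 0.01 * (p - y).mean()

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss < np.log(2))  # True: below the loss of the untrained model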
10. Neuroscience basis of convolutional networks
Convolutional networks are perhaps the greatest success story of biologically inspired artificial intelligence. Although they have also been guided by many other fields, some of their key design principles come from neuroscience.