Summary of machine learning/computer vision/embedded development engineer interview questions


1. Deep learning fundamentals

1. Why do we need feature normalization and standardization?

1. It puts features of different dimensions on the same numerical scale, reducing the dominance of features with large variance and making the model more accurate.
2. It speeds up the convergence of the training algorithm.

2. What are the commonly used normalization and standardization methods?

1. Linear normalization (min-max scaling)
x' = (x - min(x)) / (max(x) - min(x)), where max(x) is the maximum of the sample data and min(x) is the minimum.
It suits cases where the values are relatively concentrated; empirical constants can be substituted for max and min.
2. Standard deviation standardization (z-score, zero-mean normalization)
x' = (x - μ) / σ, where μ is the mean of all samples and σ is their standard deviation.
After processing, the data has mean 0 and standard deviation 1, i.e. the same first two moments as a standard normal distribution.
3. Non-linear normalization
Uses non-linear functions such as log, exponential, or tangent, e.g. y = 1 - e^(-x), whose change is most pronounced for x ∈ [0, 6]; it is used when the data are widely spread out.
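The first two formulas can be sketched in NumPy (a minimal illustration; the helper names are mine, not from the original post):

```python
import numpy as np

def min_max_normalize(x):
    # Linear (min-max) normalization: maps values into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def z_score_normalize(x):
    # Z-score standardization: subtract the mean, divide by the standard deviation
    return (x - x.mean()) / x.std()

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(min_max_normalize(x))          # [0.   0.25 0.5  0.75 1.  ]
print(z_score_normalize(x).mean())   # ~0
print(z_score_normalize(x).std())    # ~1
```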


Introduce the principle and function of dilated convolution

Dilated convolution (also called atrous or expanded convolution) was originally proposed for image segmentation: downsampling (pooling, strided convolution) enlarges the receptive field but shrinks the feature map, and the subsequent upsampling cannot fully recover the lost spatial accuracy. Dilated convolution introduces a hyperparameter, the dilation rate, which defines the spacing between the kernel values when the kernel is applied to the input.
It can enlarge the receptive field while keeping the feature-map size unchanged, replacing the downsample-then-upsample pattern; different receptive field sizes are obtained by adjusting the dilation rate.

  • a is an ordinary convolution (dilation rate = 1); the receptive field after convolution is 3
  • b is a dilated convolution with dilation rate = 2; the receptive field after convolution is 5
  • c is a dilated convolution with dilation rate = 3; the receptive field after convolution is 7 (for a k×k kernel, the effective size is k + (k-1)×(dilation-1))

Ordinary convolution can thus be seen as a special case of dilated convolution, with dilation rate = 1.
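The effective kernel sizes for different dilation rates follow directly from the formula above (a minimal sketch; the helper name is mine, not from the original):

```python
def effective_kernel_size(k, dilation):
    # A dilated k x k kernel covers k + (k-1)*(dilation-1) input positions per axis
    return k + (k - 1) * (dilation - 1)

for d in (1, 2, 3):
    print(d, effective_kernel_size(3, d))  # 3, 5, 7
```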

How to judge whether the model is over-fitting, and what strategies are there to prevent over-fitting?

When building a model, the data is usually split into a training set and a test set.

If the model is accurate on the training set but performs poorly on the test set, it is overfitting; if it performs poorly on both the training set and the test set, it is underfitting.

Strategies to prevent overfitting:

  • Increase training data: collect more data, or use data augmentation / sample augmentation.
  • Use a suitably sized model: reduce the number of layers and the number of network parameters.
  • Dropout: randomly suppress a fraction of the neurons in the network, so that in each training pass a different subset of neurons does not participate.
  • L1/L2 regularization: add a penalty term that limits the size of the weights during training; L1 in particular makes the network sparser.
  • Data cleaning: remove problematic samples, wrong labels, and noisy data.
  • Limit training time (early stopping): log training and validation losses separately during training; when the training loss keeps decreasing but the validation loss stops decreasing, the network has begun to overfit and training can be stopped.
  • Batch Normalization (BN) in the network can also prevent overfitting to a certain extent.
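The early-stopping strategy in the list can be sketched as a small helper (the `should_stop` function and its patience threshold are hypothetical, not from the original):

```python
def should_stop(val_losses, patience=3):
    # Stop when the best validation loss is more than `patience` epochs old
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

history = [1.00, 0.80, 0.85, 0.84, 0.86]
print(should_stop(history))  # True: no improvement in the 3 epochs after epoch 1
```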

In addition to SGD and Adam, what other optimization algorithms do you know?

There are three main categories:
1. Basic gradient descent: SGD, BGD
2. Momentum-based methods: momentum, NAG, etc.
3. Adaptive learning-rate methods: Adam, AdaGrad, RMSProp, etc.
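The classical momentum update from category 2 can be written in a few lines (a minimal scalar sketch; the function name and hyperparameter values are mine, not from the original):

```python
def momentum_step(w, grad, v, lr=0.1, mu=0.9):
    # Classical momentum: the velocity accumulates a decaying sum of past gradients
    v = mu * v - lr * grad
    return w + v, v

w, v = 1.0, 0.0
w, v = momentum_step(w, grad=1.0, v=v)   # v = -0.1,  w = 0.9
w, v = momentum_step(w, grad=1.0, v=v)   # v = -0.19, w = 0.71
print(round(w, 2))
```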

What are the tuning techniques for training neural networks

Explain the concept of receptive field

The receptive field is the region of the input image that a single pixel in a layer's output feature map can see. The larger a neuron's receptive field, the larger the portion of the original image it touches, which means it can learn more global, higher-level semantic features; conversely, a smaller receptive field captures more local, fine-grained features. The receptive field size therefore gives a rough indication of each layer's level of abstraction, and the deeper the network, the larger the neurons' receptive fields.

The receptive field of a convolutional layer depends on the kernel sizes and strides of the preceding layers, and is unaffected by padding.

An example: the receptive field is the size of the "field of view" a CNN has on its input.

  • Take a 3x3 convolutional layer (1 kernel, stride = 1, padding = 1) applied to a tensor tensor1 of shape (N, C, H, W); the output is a tensor tensor2 of shape (N, 1, H, W). How many pixels of tensor1 does one pixel of tensor2 summarize? 3x3 = 9: each pixel of tensor2 "sees" the 3x3 patch around the same position in tensor1, so the receptive field of tensor2 is 3x3.
  • Replace that layer with a 1x1 convolution, and each output pixel "sees" only the single pixel at the same position in tensor1, so the receptive field of tensor2 is 1x1.
  • Now pass tensor1 through two convolutional layers, each 3x3 with 1 kernel, stride = 1, padding = 1. Each pixel of the resulting tensor2 "sees" a 5x5 patch around the same position in tensor1, so its receptive field is 5x5.
  • Change the first layer's stride to 2 and keep the second layer the same; the receptive field of tensor2 grows to 7x7.

As this shows, stacking convolutional layers enlarges the receptive field, and the convolution stride also affects its size.
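The worked examples above follow the standard recursive receptive-field formula (a minimal sketch; the helper name is mine, not from the original):

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride) from first to last layer.
    # Each layer adds (k - 1) * (product of earlier strides) to the field.
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([(3, 1)]))          # single 3x3 conv: 3
print(receptive_field([(3, 1), (3, 1)]))  # two stride-1 3x3 convs: 5
print(receptive_field([(3, 2), (3, 1)]))  # first conv stride 2: 7
```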

What is the role of downsampling? What are the ways?

Down-sampling serves two purposes: it reduces the amount of computation and helps prevent overfitting, and it enlarges the receptive field so that subsequent kernels can learn more global information. There are two main ways:
1. A pooling layer with stride 2, such as max-pooling or average-pooling. Max-pooling is usually preferred because it is simple to compute and preserves texture features well.
2. A convolutional layer with stride 2. Down-sampling inevitably loses information, and a pooling layer is not learnable; replacing pooling with a learnable stride-2 convolution can give better results, at the cost of some extra computation.
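Both down-sampling routes halve the spatial size, which can be checked with the standard output-size formula (a minimal sketch; the helper name and the 224-pixel example are mine, not from the original):

```python
def conv_out_size(n, k, stride, padding):
    # Output size along one axis for a convolution or pooling window
    return (n + 2 * padding - k) // stride + 1

print(conv_out_size(224, 3, 2, 1))  # stride-2 3x3 convolution: 112
print(conv_out_size(224, 2, 2, 0))  # stride-2 2x2 max-pooling: 112
```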

Principles and common methods of upsampling

In a convolutional neural network, the output of feature extraction is usually smaller than the input image, but some tasks (such as semantic image segmentation) need the result restored to the original size for further computation. Mapping an image from a small resolution back to a large one is called upsampling. It is generally implemented in three ways:
1. Interpolation. Bilinear interpolation is generally used because it gives the best results; it is more complex than the other interpolation methods, but that cost is negligible compared with convolution. Other options include nearest-neighbor and trilinear interpolation.
2. Transposed convolution (deconvolution): zeros are inserted between the elements of the input feature map, and a standard convolution is then applied, so the output can be larger than the input.
3. Max Unpooling: record the index of the maximum value at the corresponding max-pooling step, place each value back at its recorded position during unpooling, and fill the remaining positions with 0.
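The simplest interpolation variant, nearest-neighbor upsampling, can be done by repeating pixels (a minimal NumPy sketch; the helper name is mine, not from the original):

```python
import numpy as np

def nearest_upsample(x, scale):
    # Nearest-neighbor interpolation: repeat each pixel `scale` times along H and W
    return x.repeat(scale, axis=0).repeat(scale, axis=1)

x = np.array([[1, 2],
              [3, 4]])
print(nearest_upsample(x, 2))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```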

How many parameters does a model have? How is the count calculated?
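The original leaves this question unanswered; as a sketch, a convolutional layer's parameter count is usually C_out × (C_in × k × k + 1), the bias contributing the +1 (the helper name and the VGG-style example are mine):

```python
def conv_params(c_in, c_out, k, bias=True):
    # Each of the c_out filters has a c_in x k x k kernel, plus one bias term
    return c_out * (c_in * k * k + (1 if bias else 0))

print(conv_params(3, 64, 3))  # e.g. a 3->64 channel 3x3 conv: 1792 parameters
```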

What is the FLOPs (amount of computation) of a model? How is it calculated?
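The original also leaves this unanswered; a common convention counts C_in × k × k multiply-accumulates per output element, with one MAC taken as 2 FLOPs (the helper name and example shapes are mine):

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    # Each of the c_out * h_out * w_out output elements needs c_in*k*k MACs;
    # one MAC = 1 multiply + 1 add = 2 FLOPs
    return 2 * c_in * k * k * c_out * h_out * w_out

print(conv_flops(3, 64, 3, 224, 224))  # 173408256
```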

The concept and function of depthwise separable convolution

Depthwise separable convolution splits a traditional convolution into two steps: depthwise and pointwise. First, the depthwise step applies one filter per input channel, convolving each channel separately; then the pointwise step applies 1x1 convolutions to combine information across channels. This greatly reduces the parameter count and computation compared with a standard convolution.
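The parameter savings can be seen by comparing the two counts directly (a minimal sketch; the helper names and the 32->64 channel example are mine, not from the original):

```python
def standard_conv_params(c_in, c_out, k):
    # A standard conv has one c_in x k x k kernel per output channel
    return c_out * c_in * k * k

def separable_conv_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel; pointwise: 1x1 across channels
    return c_in * k * k + c_in * c_out

print(standard_conv_params(32, 64, 3))   # 18432
print(separable_conv_params(32, 64, 3))  # 2336, roughly 8x fewer parameters
```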


Origin blog.csdn.net/zhonglongshen/article/details/114704242