[Dalian University of Technology] Computer Major Elective: Deep Learning 2020 Final Review

The exam is over. The teacher said the exam scope is his PPT slides (although I still don't understand some of the formulas in them); hopefully this writeup benefits the students who take the course after me ~
Question types: single choice (20), multiple choice (15), true/false (10), discussion (30)
Definitely tested:

Calculation problems:

Convolution output-size calculation (this was tested)
Pooling output-size calculation (see the sketch below)
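The formula behind both: output size = floor((W − K + 2P) / S) + 1, where W is the input width, K the kernel size, P the padding, and S the stride. A minimal sketch (the helper name is mine):

```python
import math

def conv_output_size(w, k, s=1, p=0):
    """Output width of a convolution or pooling layer: floor((W - K + 2P) / S) + 1."""
    return math.floor((w - k + 2 * p) / s) + 1

# 224x224 input, 3x3 kernel, stride 1, padding 1 -> 224 ("same" padding)
print(conv_output_size(224, 3, s=1, p=1))   # 224
# 224x224 input, 2x2 max pooling, stride 2 -> 112
print(conv_output_size(224, 2, s=2, p=0))   # 112
```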

Essay questions:

Over-fitting solutions (emphasis: one choice question plus one short-answer question):

  1. Dropout
  2. Reduce depth and width, i.e. reduce model complexity
  3. Parameter regularization
  4. Data augmentation
  5. Early stopping

General data-preprocessing methods: centering, normalization, whitening, dimensionality reduction

Object detection: definition of the RPN and its role in Faster R-CNN (I didn't understand what was written on the PPT; please search online)

Definition and role of pooling:

  1. Dimensionality reduction
  2. Contributes nonlinearity
  3. Expands the receptive field
  4. Provides invariance

Commonly used CNN models; briefly describe the characteristics of each:

  1. LeNet
  2. AlexNet
  3. VGG-16
  4. GoogLeNet
  5. ResNet
  6. DenseNet

General steps of deep learning training (see the sketch after this list)
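As a reference for the "general steps", here is a minimal PyTorch sketch of one training epoch (the model and hyperparameters are placeholders of my own; note that dropout and weight decay correspond to over-fitting remedies 1 and 3 above):

```python
import torch
import torch.nn as nn

# Placeholder classifier for 28x28 grayscale images, 10 classes
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                    # dropout against over-fitting
    nn.Linear(256, 10),
)
loss_fn = nn.CrossEntropyLoss()
# weight_decay adds L2 parameter regularization
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

def train_one_epoch(loader):
    model.train()
    for x, y in loader:            # 1. load a mini-batch
        optimizer.zero_grad()      # 2. clear old gradients
        logits = model(x)          # 3. forward propagation
        loss = loss_fn(logits, y)  # 4. compute the loss
        loss.backward()            # 5. back propagation
        optimizer.step()           # 6. update the weights
```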

Not tested:
KNN and SVM principles
Clustering: K-means
Gradient descent
Object detection
LSTM: forget gate, input gate, output gate
GRU: has a reset gate and an update gate. The reset gate determines how the new input is combined with the previous memory; the update gate determines how much of the previous memory is carried forward. The GRU merges the LSTM's forget gate and input gate into a single update gate.
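For reference, one common formulation of the GRU gates (conventions differ across textbooks on which term gets z_t):

```latex
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1}) && \text{reset gate} \\
z_t &= \sigma(W_z x_t + U_z h_{t-1}) && \text{update gate} \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1})\bigr) && \text{candidate state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{new hidden state}
\end{aligned}
```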

After-school exercises:

  1. Suppose there are 1,000 photos of 5 different animals. Briefly describe how
    to complete the classification task under supervised learning and under unsupervised learning.

Supervised learning: SVM, KNN, neural networks, decision trees
Unsupervised learning: K-means clustering with k = 5 (see the sketch below)
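A minimal sketch of the unsupervised route with scikit-learn, assuming the photos have already been converted into a feature matrix X (e.g., flattened pixels or CNN features; the random data here is a placeholder):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(1000, 64)   # placeholder: one 64-d feature vector per photo

# k = 5 because we believe there are 5 animal species
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])             # cluster index (0-4) assigned to the first 10 photos
```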

  2. The samples in the dataset are {cat, dog, tiger, carp, shark, sparrow,
    eagle, frog}. Construct a decision tree from the perspective of living environment and
    respiratory organs, and divide the samples into mammals, birds, amphibians, and fish.
    Answer: a hand-built decision tree (one possible sketch below)
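One possible tree, encoding "living environment" as water / amphibious / air / land (the attribute values are my own guess at what the question intends):

```python
def classify(environment, respiratory_organ):
    """Hand-built decision tree over the two attributes in the question."""
    if respiratory_organ == "gills":     # carp, shark
        return "fish"
    if environment == "amphibious":      # frog
        return "amphibian"
    if environment == "air":             # sparrow, eagle
        return "bird"
    return "mammal"                      # cat, dog, tiger

print(classify("water", "gills"))   # fish
print(classify("air", "lungs"))     # bird
```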

Multiple choice

exam questions

The role of 1x1 kernels in a CNN
In ensemble learning, should the correlation between the individual models be high or low?
Properties of SPPNet (multiple choice)
In a supervised CNN classifier, is the number of neurons in the last layer equal to the number of classes?
If you take a model trained on ImageNet and feed it an all-white image, will it predict equal probability for every category?
If the randomly initialized weights are all set to 0, what will happen?

1. What is the "cache" used in forward propagation and back propagation? (D)
A. It is used to track the hyperparameters we are searching to speed up the calculation.
B. Used to cache the intermediate value of the cost function during training.
C. We use it to pass variables computed during back propagation to the corresponding forward-propagation step; it contains variables useful for computing derivatives in the forward propagation.
D. We use it to pass variables computed during forward propagation to the corresponding back-propagation step; it contains variables useful for computing derivatives in the back propagation.

Analysis: the "cache" records values from the forward pass and hands them to the backward pass, because the chain rule needs them (see the sketch below).

2. Which of the following are "hyperparameters"? (ABEF)
A. The size of the hidden layer
B. The number of layers of the neural network
C. The activation value
D. Weight
E. Learning rate
F. Number of iterations
G. Bias

3. Which of the following statements is correct? (A)
A. Deeper layers of a neural network typically compute more complex features of the input than earlier layers.
B. Earlier layers of a neural network typically compute more complex features of the input than deeper layers.

4. The following statement about the neural network (shown in a figure on the original slide) is correct: A
A. The total number of layers L is 5, and the number of hidden layers is 3.
B. The total number of layers L is 3, and the number of hidden layers is 3.
C. The total number of layers L is 4, and the number of hidden layers is 4.
D. The total number of layers L is 5, and the number of hidden layers is 4.

Analysis: The number of network layers is calculated as the number of hidden layers + 2. The input and output layers are not counted as hidden layers.

5. During forward propagation, the forward function of layer l needs to know which activation function (sigmoid, tanh, ReLU, etc.) layer l uses; during back propagation, the corresponding backward function also needs to know layer l's activation function, because the gradient depends on it. Is this statement correct? (A)

A. Correct
B. Wrong

Analysis: Different activation functions have different derivatives. During back propagation, it is necessary to know which activation function is used in forward propagation to calculate the correct derivative.

6. Some functions have the following property: (i) when computed with a shallow network circuit, a large network is required (we measure size by the number of logic gates), but (ii) when computed with a deep network circuit, only an exponentially smaller network is needed. True or false? (A)

A. Correct
B. Wrong

Analysis: a deep network uses relatively few hidden units per layer but many layers; for a shallow network to compute the same function, the number of units may have to grow exponentially. The classic example is the parity (repeated XOR) of n inputs: a deep tree of XOR gates needs O(n) gates, while a two-layer circuit needs exponentially many.

7. The previous question used a specific network. In general, what is the dimension of the weight matrix W[l] associated with layer l? (A)

A. The dimension of W[l] is (n[l], n[l-1])
B. The dimension of W[l] is (n[l-1], n[l])
C. The dimension of W[l] is (n[l+1], n[l])
D. The dimension of W[l] is (n[l], n[l+1])

Analysis: Generally speaking, the shape of W[l] is (n[l], n[l-1]), and the shape of b[l] is (n[l], 1)
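A quick sanity check of the shapes with concrete numbers (the layer sizes here are arbitrary):

```python
import numpy as np

n = [4, 5, 3]                 # n[0] = input size; n[1], n[2] = layer widths
W1 = np.zeros((n[1], n[0]))   # W[1]: (n[1], n[0]) = (5, 4)
b1 = np.zeros((n[1], 1))      # b[1]: (n[1], 1)   = (5, 1)
A0 = np.zeros((n[0], 7))      # 7 training examples stacked as columns
Z1 = W1 @ A0 + b1             # b1 broadcasts across the 7 columns
print(Z1.shape)               # (5, 7): one activation column per example
```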

Beihang 2019 machine learning exam questions (partial)

  1. Bayesian decision, a decision based on minimum risk and minimum variance

  2. The basic idea of SVM, its model expression, the physical meaning of soft and hard margins, and how it handles nonlinear problems

  3. What is over-fitting, and what are the solutions?
    Over-fitting means the model is too complex: it performs well on the training set but poorly on the test set.
    Solutions:
    simplify the model structure (reduce depth / reduce width);
    dropout (commonly used in deep learning);
    parameter regularization: reduces complexity and improves stability;
    data augmentation: in CV, for example, crop, rotate, and add noise to the training images;
    early stopping: stop training once validation accuracy stops improving (see the sketch below).
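A minimal sketch of early stopping with a "patience" counter, a common refinement of the rule above (train_one_epoch and evaluate are placeholder stubs standing in for real training and validation code):

```python
import random

def train_one_epoch():   # placeholder for one real pass over the training set
    pass

def evaluate():          # placeholder: returns accuracy on a held-out validation set
    return random.random()

best_acc, patience, bad_epochs = 0.0, 5, 0
for epoch in range(100):
    train_one_epoch()
    val_acc = evaluate()
    if val_acc > best_acc:             # improvement: remember it, reset the counter
        best_acc, bad_epochs = val_acc, 0
    else:                              # no improvement this epoch
        bad_epochs += 1
    if bad_epochs >= patience:         # stop after `patience` epochs without progress
        print(f"early stop at epoch {epoch}, best acc {best_acc:.3f}")
        break
```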

  4. The PCA algorithm is based on the idea of minimum mean square error.

  5. Given a 4×4×4×3 fully connected neural network, derive the back-propagation algorithm, taking the third node of the second layer as an example.

  6. Describe the connection between machine learning and deep learning, the advantages and disadvantages of each, and how you think deep learning will develop in the future.

Beihang University 2019 Machine Learning Exam Questions https://www.csdn.net/gather_2d/MtjaYg5sMDcyODMtYmxvZwO0O0OO0O0O.html

Deep learning test questions

Detailed analysis

1. Single-choice questions

  1. The "loss function" of the neural network (Loss fuction) measures (A)

A. The gap between the predicted value and the true value

B. The gap between the training set and the test set

C. The amount of information lost by dropout

D. The amount of information lost by pooling

  2. The limit of the derivative of the function f(x) = 1/(1 + e^(-x)) as x → ∞ is (B)

A. 1 B. 0 C. 0.5 D. ∞

Analysis: the sigmoid function's values lie in (0, 1), and its derivative is S'(x) = S(x)(1 - S(x)); as x → ∞, S(x) → 1, so the derivative tends to 0.
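Spelled out as a worked limit:

```latex
S'(x) = S(x)\bigl(1 - S(x)\bigr), \qquad
\lim_{x \to \infty} S(x) = 1
\quad\Longrightarrow\quad
\lim_{x \to \infty} S'(x) = 1 \cdot (1 - 1) = 0.
```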

  3. In the process of back propagation, the gradient of the (C) is calculated first and then propagated backward.

A. Connection weights B. Loss function C. Activation function D. Feature map

  4. The 16 in the name of the convolutional neural network VGG16 refers to (C)

    A. Paper published in 2016

    B. The network has 16 layers in total

    C. The network has 16 layers of parameters to be trained

    D. VGG announced the 16th generation network

Analysis: VGG16 contains:

13 convolutional layers (Convolutional Layer), denoted conv3-XXX

3 fully connected layers (Fully connected Layer), denoted FC-XXXX

5 pooling layers (Pool layer), denoted maxpool

The convolutional and fully connected layers carry weight coefficients, so they are also called weight layers; there are 13 + 3 = 16 of them in total, which is the source of the 16 in VGG16. (Pooling layers involve no weights, so they are not counted as weight layers.)

So the 16 here refers to the number of layers with trainable parameters.
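If torchvision is available, this count can be checked directly (weights=None skips downloading the pretrained parameters):

```python
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16(weights=None)
conv = sum(isinstance(m, nn.Conv2d) for m in model.modules())
fc = sum(isinstance(m, nn.Linear) for m in model.modules())
print(conv, fc, conv + fc)   # 13 3 16
```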

  5. In neural networks, the main source of the "vanishing gradient" problem is (D)

A. Discarded by Dropout

B. Discarded by Pooling

C. The gradient is negative

D. Gradient approaches zero

  6. Which of the following introduces nonlinearity into a neural network? (B)
    A. Stochastic gradient descent
    B. Rectified linear unit (ReLU)
    C. Convolution function
    D. None of the above

  7. Which of the following is not an adaptive learning-rate method? (A)
    A. Mini-batch SGD
    B. Adagrad
    C. RMSprop

  8. If the learning rate used is too large, it will lead to (C)
    A. The network converges quickly
    B. The network converges slowly
    C. The network cannot converge
    D. Not sure

  9. Which of the following object-detection networks is a one-stage network? (C)
    A. Faster R-CNN
    B. R-FCN
    C. YOLOv3
    D. SPP-net

  10. Assume the hidden layers of a neural network use activation function X. For one particular neuron, some input produces the output -0.0001. X may be which of the following? (B)
    A. ReLU
    B. tanh
    C. Sigmoid
    D. None of the above

Analysis: the activation function may be tanh, because its value range is (-1, 1).

  11. If you increase the width of a neural network, the accuracy rises up to a threshold and then begins to fall. The likely reason is (C)
    A. Only some of the kernels are used for prediction
    B. As the number of kernels increases, the network's predictive power decreases
    C. As the number of kernels increases, their correlation increases, leading to over-fitting
    D. None of the above

2. Multiple-choice questions

  1. A neural network has many parameters; what are the commonly used initialization methods? (ABD)

A. All-zero initialization B. Random initialization C. Load a pre-trained model D. Use a deep belief network

Analysis: the Deep Belief Network (DBN) addresses the optimization problem of deep neural networks through layer-by-layer training, which gives the whole network good initial weights so that the network can reach an optimal solution by fine-tuning.

  2. The commonly used activation functions of artificial neural networks are (ABD)

A. sigmoid B. tanh C. sinh D. ReLU E. cos

  3. Which statements about the gradient descent method are correct? (BD)

    A. The gradient descent method updates along the gradient direction during iteration

    B. The gradient descent method updates along the negative gradient direction during iteration

    C. The gradient direction is the direction in which the function value decreases fastest

    D. The gradient direction is the direction in which the function value increases fastest

Analysis: the gradient is a vector; at a given point, the objective function decreases fastest along the direction opposite to the gradient. A vivid metaphor: when walking down a mountain one step at a time, the fastest way down is opposite to the gradient, and each step corresponds to one iterative update of gradient descent (see the sketch below).
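A tiny illustration of "step against the gradient", minimizing f(x) = x^2 (the learning rate and starting point are arbitrary):

```python
# Minimize f(x) = x^2 with gradient descent; f'(x) = 2x
x, lr = 5.0, 0.1
for step in range(50):
    grad = 2 * x      # gradient at the current point
    x -= lr * grad    # move in the NEGATIVE gradient direction
print(x)              # close to 0, the minimizer
```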

  4. What are the commonly used pooling layers? (AB)

A. MaxPooling B. AveragePooling C. MinPooling D. MedianPooling

  5. Compared with ordinary neural networks, the "recurrence" of recurrent neural networks (RNN) is mainly reflected in (ABC)

A. More backpropagation times during training

B. After a certain round of training, reset the parameters to zero

C. The output of deep nodes will in turn affect shallow nodes

D. Each node self-circulates

  6. When image-classification accuracy is not high, which of the following methods can improve it? (ABC)
    A. Data augmentation
    B. Tune hyperparameters
    C. Use pre-trained network parameters
    D. Reduce the dataset

  7. Which of the following neural network structures use weight sharing? (AB)
    A. Convolutional neural network
    B. Recurrent neural network
    C. Fully connected neural network

  8. Regarding gradient descent algorithms, the following statements are correct (ABC)
    A. The stochastic gradient descent algorithm uses a single sample for each weight update
    B. The mini-batch gradient descent algorithm is a compromise between batch gradient descent and stochastic gradient descent
    C. The batch gradient descent algorithm uses the entire training set for each weight update


Origin blog.csdn.net/weixin_42665498/article/details/109435389