Top 25 Deep Learning Interview Questions and Answers

In recent years, demand for deep learning has grown, and its applications are spreading across many business sectors. Companies are looking for professionals who can leverage deep learning and machine learning techniques. This article compiles the 25 most frequently asked deep learning interview questions and answers. If you are currently interviewing for deep learning related jobs, these questions will help you.

1. What is deep learning?

Deep learning involves taking large amounts of structured or unstructured data and using complex algorithms to train neural networks. The network performs complex operations to extract hidden patterns and features (e.g. distinguishing images of cats from images of dogs).

2. What is a neural network?

Neural networks are inspired by the way neurons fire in the human brain and loosely replicate the way humans learn, but they are far simpler than a real brain.

The most common neural network consists of three network layers:

  • Input layer
  • Hidden layer (this is the most important layer, where feature extraction is done and where tuning yields faster training and better performance)
  • Output layer

Neural networks are used in deep learning algorithms such as CNN, RNN, GAN, etc.

3. What is a multi-layer perceptron (MLP)?

Like other neural networks, an MLP has an input layer, a hidden layer, and an output layer. It has the same structure as a single-layer perceptron but with one or more hidden layers. A single-layer perceptron can only classify linearly separable data with binary output (0, 1), whereas an MLP can classify non-linearly separable classes.

Except for the input layer, every node uses a non-linear activation function: each node computes a weighted sum of its inputs plus a bias and passes the result through the activation function to produce its output. MLPs are trained with a method called "backpropagation": the network computes the error with a loss function and propagates that error backwards from the output layer, adjusting the weights so the model becomes more accurate.
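As a quick illustration, here is a minimal PyTorch sketch of such an MLP; the layer sizes (784, 128, 10) are arbitrary example values, not anything prescribed by the question:

```python
import torch
import torch.nn as nn

# A minimal MLP: input layer -> one hidden layer -> output layer.
# The sizes (784, 128, 10) are arbitrary example values.
mlp = nn.Sequential(
    nn.Linear(784, 128),  # weighted sum of inputs plus bias
    nn.ReLU(),            # non-linear activation on the hidden layer
    nn.Linear(128, 10),   # output layer (e.g. 10 class scores)
)

x = torch.randn(32, 784)  # a batch of 32 example inputs
out = mlp(x)              # forward pass
print(out.shape)          # torch.Size([32, 10])
```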

4. What is data normalization (Normalization) and why do we need it?

Normalization is a transformation applied to numerical values that converts them into a standardized form.

Normalization pulls an increasingly skewed distribution back toward a standardized one, so that the inputs to the activation function fall in the region where the function is most sensitive to its input. This makes the gradients larger, speeds up convergence, and helps avoid the vanishing-gradient problem.

Depending on what is being normalized, these methods fall into two categories:

One category normalizes the activation values of the neurons in a layer; BatchNorm, LayerNorm, InstanceNorm, GroupNorm, and similar methods belong here.

The other category normalizes the weights on the connections between adjacent hidden layers; Weight Norm belongs here.

The L1/L2 regularization terms added to the loss function in classical machine learning are, in essence, the same kind of normalization operation.

The goal of L1 regularization is sparsity: it pushes a large number of parameter values to exactly 0. The goal of L2 regularization is to shrink the magnitudes of the original parameter values.

With these goals in place, the parameter values are adjusted through the specific normalization mechanism to help the model avoid overfitting.
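As a concrete illustration, here is a minimal PyTorch sketch of adding L1 and L2 terms to a loss; the coefficient values (1e-4) are arbitrary example choices:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()

x, y = torch.randn(8, 10), torch.randn(8, 1)
base_loss = criterion(model(x), y)

# L1 term encourages sparsity (pushes many weights to exactly 0);
# L2 term shrinks the overall magnitude of the weights.
l1 = sum(p.abs().sum() for p in model.parameters())
l2 = sum(p.pow(2).sum() for p in model.parameters())
loss = base_loss + 1e-4 * l1 + 1e-4 * l2  # arbitrary example coefficients
loss.backward()
```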

5. What is a Boltzmann machine?

One of the most basic deep learning models is the Boltzmann machine, which resembles a simplified multi-layer perceptron. The model has a visible input layer and a hidden layer: a two-layer neural network that stochastically decides whether each neuron should be on or off. Nodes are connected across the two layers, but no two nodes within the same layer are connected (this variant is known as a restricted Boltzmann machine).

6. What is the role of the activation function in a neural network?

An activation function models a biological neuron's decision to fire or not. It takes the weighted sum of the inputs plus a bias as its argument. Mathematically, the activation function is introduced to add non-linearity to the neural network model. Sigmoid, ReLU, and Tanh are all common activation functions.
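A minimal NumPy sketch of these three functions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)

z = np.array([-2.0, 0.0, 2.0])  # example pre-activation values
print(sigmoid(z))  # [0.119 0.5   0.881] (approx.)
print(relu(z))     # [0. 0. 2.]
print(tanh(z))     # [-0.964 0.    0.964] (approx.)
```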

7. What is a cost function?

Also known as "loss" or "error", the cost function measures how well a model performs. It is used to compute the error at the output layer during backpropagation; we push that error backwards through the neural network and use it to update the weights.
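For example, mean squared error is a common cost function; a minimal NumPy sketch:

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean squared error: average of the squared prediction errors.
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])
print(mse(y_pred, y_true))  # 0.03
```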

8. What is gradient descent?

Gradient descent is an optimization algorithm for minimizing a cost function, i.e. for minimizing the error. The goal is to find a local (ideally the global) minimum of the function. The gradient determines the direction in which the model's parameters should move to reduce the error.
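A minimal sketch of gradient descent on the one-dimensional cost f(w) = (w - 3)^2, whose minimum lies at w = 3; the learning rate here is an arbitrary example value:

```python
# Gradient descent on f(w) = (w - 3)**2; the gradient is f'(w) = 2*(w - 3).
w = 0.0    # initial parameter value
lr = 0.1   # learning rate (step size)
for step in range(100):
    grad = 2 * (w - 3)  # gradient of the cost at the current w
    w -= lr * grad      # move in the direction that reduces the cost
print(w)  # close to 3.0, the minimum
```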

9. What is backpropagation?

This is one of the most frequently asked questions in deep learning interviews.

In 1974, Paul Werbos first described backpropagation, a learning algorithm for training general networks. The algorithm computes the gradients efficiently at each iteration and remains the most commonly used and most effective algorithm for training artificial neural networks (ANNs). Its main idea:

(1) Feed the training data into the input layer of the ANN; it passes through the hidden layers and finally reaches the output layer, which produces the result. This is the forward-propagation step.

(2) Because there is an error between the ANN's output and the actual target, compute the error between the estimated and actual values, then propagate that error backwards from the output layer through the hidden layers toward the input layer.

(3) During backpropagation, adjust the parameter values according to the error, and iterate this process until the network converges.
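A minimal sketch of this forward/backward cycle using PyTorch's autograd (the tiny model and random data are placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # a tiny one-layer "network"
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x, y = torch.randn(16, 4), torch.randn(16, 1)
for _ in range(100):
    out = model(x)            # (1) forward pass
    loss = criterion(out, y)  # (2) error between output and target
    optimizer.zero_grad()
    loss.backward()           # (2) propagate the error backwards (compute gradients)
    optimizer.step()          # (3) adjust the parameters according to the error
```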

10. What is the difference between a feedforward neural network and a recurrent neural network?

In a feedforward neural network, signals propagate in one direction, from input to output. There is no feedback loop; the network only considers the current input, so it cannot remember previous inputs (CNNs are a typical example).

In a recurrent neural network, signals can also flow backwards, forming cycles. The network considers the current input together with previously received inputs when generating a layer's output, and thanks to its internal memory it can remember past data.

11. What are the applications of Recurrent Neural Network (RNN)?

RNNs can be used for sentiment analysis, text mining, and other sequence tasks, and they can solve time-series problems such as predicting stock prices over a month or a quarter.

12. What are Softmax and ReLU functions?

Softmax is an activation function that produces outputs between 0 and 1. It exponentiates each value and divides by the sum of all exponentiated values, so the outputs sum to 1. Softmax is usually used in the output layer of classification tasks and in attention-weight computations.

ReLU is the most widely used activation function. If the input x is positive it outputs x; otherwise it outputs zero. ReLU is typically used as the activation function of hidden layers.
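A minimal NumPy sketch of both functions; subtracting the maximum in softmax is a standard trick for numerical stability:

```python
import numpy as np

def softmax(x):
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))              # [0.659 0.242 0.099] (approx.), sums to 1
print(relu(np.array([-1.0, 3.0])))  # [0. 3.]
```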

13. What are hyperparameters?

This is another deep learning interview question that is often asked. In machine learning, hyperparameters are parameters whose values are set before the learning process starts, rather than being learned from the training data. Variables that the model learns automatically by iterating over the data are called parameters; because hyperparameter settings (such as the learning rate or batch size) control how those parameters are trained, they are called hyperparameters.

14. What happens if the learning rate is set too low or too high?

When the learning rate is too low, training progresses very slowly because each update makes only a minimal change to the weights, and many updates are needed to reach the minimum. If it is very small, the optimizer may also fail to escape a poor local minimum, so the final result is not the optimal solution.

If the learning rate is set too high, the sharp weight updates cause undesired, erratic behavior of the loss function. The model may fail to converge, or even diverge (the network cannot be trained).

15. What are Dropout and BN?

Dropout is a technique that randomly drops hidden and visible units from the network during training, which helps prevent overfitting (a dropout rate of around 20% is common). It increases the number of iterations the network needs to converge.

Batch Normalization (BN) is a technique that improves the performance and stability of neural networks by normalizing the inputs of each layer so that, within each mini-batch, they have a mean of 0 and a standard deviation of 1.
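A minimal PyTorch sketch showing both techniques in one model; the layer sizes and the 20% dropout rate are example choices:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.BatchNorm1d(64),  # normalize layer inputs: mean 0, std 1 per mini-batch
    nn.ReLU(),
    nn.Dropout(p=0.2),   # randomly zero 20% of units during training
    nn.Linear(64, 10),
)

model.train()  # dropout and BN are active in training mode
out = model(torch.randn(32, 100))
model.eval()   # both are disabled/frozen at evaluation time
```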

16. What is the difference between batch gradient descent and stochastic gradient descent?

Batch gradient descent computes the gradient over the entire training set before making a single weight update; each step is stable but expensive on large datasets. Stochastic gradient descent (SGD) updates the weights using the gradient of a single sample (or a small mini-batch); each step is noisy but cheap, and in practice it usually converges faster.
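A minimal NumPy sketch contrasting the two update rules on a linear-regression loss; the data, learning rate, and model are placeholders for illustration:

```python
import numpy as np

X = np.random.randn(1000, 5)  # example dataset
y = X @ np.random.randn(5) + 0.1 * np.random.randn(1000)
w, lr = np.zeros(5), 0.01

# Batch gradient descent: one update per pass over the WHOLE dataset.
grad = X.T @ (X @ w - y) / len(X)
w -= lr * grad

# Stochastic gradient descent: one update per single sample.
for i in np.random.permutation(len(X)):
    xi, yi = X[i], y[i]
    grad = xi * (xi @ w - yi)
    w -= lr * grad
```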

17. What is overfitting and underfitting, and how to solve it?

Overfitting means the model performs well on the training set but poorly in the validation and testing stages, i.e. the model generalizes poorly. It occurs when the model learns the details and noise of the training data to the extent that this hurts its performance on new data. It is more likely in non-linear models that have more flexibility when learning the objective function. Too few samples, too much noise in the samples, or an overly complex model can all cause overfitting.

Underfitting is when a model performs poorly on the training, validation, and test sets alike. This usually happens when there is too little training data or the data is of poor quality.

To detect and prevent overfitting and underfitting, you can resample the data to estimate the model's accuracy (k-fold cross-validation) and evaluate the model on a held-out validation dataset.
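A minimal sketch of k-fold cross-validation using scikit-learn's KFold splitter; the model and data here are placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X = np.random.randn(100, 4)    # placeholder data
y = (X[:, 0] > 0).astype(int)  # placeholder labels

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True).split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))  # held-out accuracy
print(np.mean(scores))  # average validation accuracy across the 5 folds
```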

18. How to initialize the weights in the network?

In general, the weights are initialized randomly.

You cannot initialize all weights to 0, as this would make the model behave like a linear one: every neuron in every layer would perform the same operation and produce the same output, making a deep network useless.

Instead, randomly assign the weights by initializing them to values very close to (but not exactly) 0. Since each neuron then performs a different computation, the model can reach better accuracy.
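A minimal sketch of both approaches: small random values by hand in NumPy, and PyTorch's built-in Xavier initialization:

```python
import numpy as np
import torch.nn as nn

# By hand: small random values close to (but not exactly) 0.
W = 0.01 * np.random.randn(128, 784)

# In PyTorch: Xavier/Glorot initialization scales the random values
# according to the layer's fan-in and fan-out.
layer = nn.Linear(784, 128)
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)  # biases can safely start at 0
```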

19. What are the common layers in a CNN?

  • Convolutional layer - performs the convolution operation, sliding several small windows (filters) over the input data.
  • Activation layer - introduces non-linearity into the network; for example, ReLU converts all negative values to zero. The output is a rectified feature map.
  • Pooling layer - pooling is a downsampling operation that reduces the dimensionality of the feature map.
  • Fully connected layer - produces the final class scores or regression values.

20. What is "pooling" in a CNN? How does it work?

Pooling is used to reduce the spatial dimensions of the feature maps in a CNN. It performs a downsampling operation that reduces dimensionality, creating a pooled feature map by sliding a filter window over the input.
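A minimal PyTorch sketch of 2×2 max pooling, which halves the spatial dimensions of a feature map:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 2x2 window, step 2

fmap = torch.randn(1, 16, 28, 28)  # (batch, channels, height, width)
out = pool(fmap)
print(out.shape)  # torch.Size([1, 16, 14, 14]) - spatial dims halved
```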

21. How does LSTM work?

Long Short-Term Memory (LSTM) is a special type of recurrent neural network capable of learning long-term dependencies. An LSTM network works in three steps:

  • The network decides what to forget and what to remember.
  • It selectively updates the cell-state values.
  • The network decides which part of the current state can be output.
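A minimal usage sketch of PyTorch's built-in LSTM; the sizes are arbitrary example values, and the hidden and cell states are what carry the remembered information across time steps:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)    # (batch, sequence length, features)
out, (h_n, c_n) = lstm(x)    # out: hidden state at every time step
print(out.shape)             # torch.Size([4, 7, 20])
print(h_n.shape, c_n.shape)  # final hidden and cell states: [1, 4, 20]
```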

22. What are vanishing gradients and exploding gradients?

When training an RNN, the gradients can become too small or too large, which makes training very difficult. When the gradients shrink toward zero, the problem is called "vanishing gradients"; when they tend to grow exponentially instead of decaying, it is called "exploding gradients". Both gradient problems lead to long training times, poor performance, and low accuracy.
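Gradient clipping is a common remedy for exploding gradients; a minimal PyTorch sketch (the model, loss, and clipping threshold are example choices):

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 7, 10)
out, _ = model(x)
loss = out.pow(2).mean()  # placeholder loss for the example

optimizer.zero_grad()
loss.backward()
# Rescale gradients whose overall norm exceeds 1.0 to prevent explosion.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```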

23. What is the difference between Epoch, Batch and Iteration in deep learning?

Epoch - one complete pass over the entire training dataset.

Batch - because the entire dataset usually cannot be passed through the neural network at once, we divide it into several parts; each part is called a batch.

Iteration - one update step on a single batch. For example, if we have 10,000 images and the batch size is 200, then one epoch runs 50 iterations (10,000 divided by 200).
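In other words, iterations per epoch = dataset size / batch size; a tiny sketch:

```python
dataset_size = 10_000
batch_size = 200

iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)  # 50 iterations make up one epoch
```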

24. What does "tensor" mean in deep learning frameworks?

This is another of the most frequently asked deep learning interview questions. Tensors are mathematical objects represented as multi-dimensional arrays. Arrays of data with different dimensions and ranks that are fed into a neural network are called "tensors".
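A minimal PyTorch sketch of tensors of increasing rank:

```python
import torch

scalar = torch.tensor(3.14)             # rank 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])  # rank 1: 1-D array
matrix = torch.ones(2, 3)               # rank 2: 2-D array
batch = torch.randn(32, 3, 224, 224)    # rank 4: e.g. a batch of RGB images

print(scalar.ndim, vector.ndim, matrix.ndim, batch.ndim)  # 0 1 2 4
```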

25. Talk about commonly used deep learning frameworks, such as TensorFlow and PyTorch.

You can answer in general terms, for example: these frameworks provide C++ and Python APIs, and both support CPU and GPU computing devices. If you are familiar with one, say so. For example, you might use PyTorch day to day, but because some reference implementations are written in TensorFlow you also read TensorFlow code, so you know a little TensorFlow as well. Do not say which framework is better; it is an easy trap to fall into. If you say TensorFlow is better and the interviewing company uses PyTorch, what then?
