Machine Learning Notes - Deep Learning FAQ 3

1. What is a loss function?

        A loss function, also known as a "cost" or "error" function, measures how well a model's predictions match the target values. During backpropagation it is used to compute the error at the output layer; this error is then propagated backward through the network to drive the weight updates.

2. What is gradient descent?

        Gradient descent is an optimization algorithm used to minimize the cost function, i.e., to minimize the error. The goal is to find a local (ideally the global) minimum of the function. The gradient gives the direction of steepest increase, so each update moves the weights in the opposite direction to reduce the error.
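        As a minimal sketch (not from the original post), here is gradient descent in plain Python on the one-dimensional loss f(w) = (w - 3)^2; the starting point and learning rate are arbitrary illustrative choices:

```python
# Gradient descent on f(w) = (w - 3)**2, whose gradient is 2*(w - 3).
w = 0.0              # arbitrary starting point
learning_rate = 0.1  # illustrative value

for step in range(50):
    grad = 2 * (w - 3)         # gradient of the loss at the current weight
    w -= learning_rate * grad  # step against the gradient (steepest descent)

print(w)  # approaches the minimum at w = 3
```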

3. What is your understanding of backpropagation?

        Backpropagation computes the gradient of the loss with respect to each weight by applying the chain rule, propagating the error from the output layer backward through the network. The weights are then updated in the direction that reduces the error.
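        A minimal NumPy sketch of backpropagation through a one-hidden-layer sigmoid network with squared error; the toy data, layer sizes, and learning rate are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (illustrative): 4 samples, 2 features, 1 target each.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden -> output weights
lr = 0.5

for _ in range(1000):
    # Forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # Backward pass: apply the chain rule layer by layer
    d_out = (out - y) * out * (1 - out)  # error through the output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)   # error through the hidden sigmoid

    # Update the weights to reduce the error
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h
```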

4. What is the difference between a feedforward neural network and a recurrent neural network?

        In a feedforward neural network, signals travel in one direction, from input to output. There is no feedback loop; the network considers only the current input and cannot remember previous inputs (e.g., a CNN).

        In a recurrent neural network, signals also flow back through feedback connections, forming a cycle. It considers both the current input and previously received inputs when generating a layer's output, and thanks to this internal memory it can remember past data.
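        A minimal sketch of the difference, assuming a tanh recurrent cell: the hidden state h carries information from earlier inputs into each new step, which a feedforward layer has no way to do. All shapes and names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
W_x = rng.normal(size=(4, 8))  # input -> hidden
W_h = rng.normal(size=(8, 8))  # hidden -> hidden (the feedback loop)

def rnn_step(x_t, h_prev):
    """One recurrent step: the output depends on the current input
    AND the hidden state carried over from previous inputs."""
    return np.tanh(x_t @ W_x + h_prev @ W_h)

h = np.zeros(8)                      # initial memory
for x_t in rng.normal(size=(5, 4)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h)             # h accumulates information over time
```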

5. What are the applications of Recurrent Neural Network (RNN)?

        RNNs can be used for sentiment analysis, text mining, and image captioning. They can also solve time-series problems, such as predicting stock prices over a month or quarter.

6. What are Softmax and ReLU functions?

        Softmax is an activation function that produces outputs between 0 and 1. It exponentiates each value and divides by the sum of the exponentials, so the outputs sum to 1 and can be read as probabilities. Softmax is usually used in the output layer.

        ReLU (Rectified Linear Unit) is the most widely used activation function: if x is positive it outputs x, otherwise it outputs zero, i.e., max(0, x). ReLU is typically used in hidden layers.
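        Both functions are short enough to sketch directly in NumPy; subtracting the maximum logit inside softmax is a standard numerical-stability trick not mentioned above:

```python
import numpy as np

def relu(x):
    # Output x where x is positive, otherwise 0.
    return np.maximum(0, x)

def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()  # outputs are in (0, 1) and sum to 1

scores = np.array([2.0, 1.0, -1.0])
print(relu(scores))     # [2. 1. 0.]
print(softmax(scores))  # roughly [0.705 0.260 0.035], sums to 1
```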

7. What are hyperparameters?

        Hyperparameters are parameters whose values are set before the learning process begins. They determine how the network is trained and the structure of the network (e.g., number of hidden units, learning rate, number of epochs).

8. What happens if the learning rate is set too low or too high?

        When the learning rate is too low, training progresses very slowly because each weight update is tiny. Many updates are required before reaching the lowest point.

        If the learning rate is set too high, the large weight updates can cause undesirable divergent behavior of the loss function: it may oscillate around the minimum without ever converging, or diverge entirely, with the loss growing without bound.
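        A minimal sketch of both failure modes on the quadratic loss f(w) = w^2, whose gradient is 2w; the two learning rates are illustrative:

```python
# The same loss f(w) = w**2, stepped with a low and a high learning rate.
for lr in (0.01, 1.1):
    w = 5.0
    for _ in range(20):
        w -= lr * 2 * w  # gradient descent update
    print(lr, w)
# lr = 0.01: w creeps toward 0 but is still far away after 20 steps.
# lr = 1.1:  |w| grows on every step, so the loss diverges.
```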

9. What is Dropout and Batch Normalization?

        Dropout is a technique that randomly drops hidden and visible units of the network during training to prevent overfitting (typically around 20% of nodes are dropped). It roughly doubles the number of iterations required for the network to converge.

        Batch normalization is a technique that improves neural network performance and stability by normalizing the inputs to each layer so that their mean activation is zero and their standard deviation is one.
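        A rough NumPy sketch of both ideas at training time; the keep probability, batch size, and epsilon are illustrative, and batch normalization's learnable scale and shift parameters are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(32, 64))  # a batch of 32 hidden activations

# Dropout (training time): randomly zero ~20% of units and rescale the
# rest so the expected activation is unchanged ("inverted dropout").
keep_prob = 0.8
mask = rng.random(h.shape) < keep_prob
h_drop = h * mask / keep_prob

# Batch normalization: normalize each unit over the batch to zero mean
# and unit standard deviation.
mean = h.mean(axis=0)
std = h.std(axis=0)
h_bn = (h - mean) / (std + 1e-5)
```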

10. What is the difference between batch gradient descent and stochastic gradient descent?

        Batch Gradient Descent: the gradient is computed using the entire dataset. Convergence takes time because each update processes a large amount of data and the weights are updated infrequently.

        Stochastic Gradient Descent: the gradient is computed using a single sample at a time. It typically converges much faster than batch gradient descent because it updates the weights far more frequently, at the cost of noisier updates.
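        A sketch of the two update schemes on a toy linear-regression problem; the data, learning rate, and loss are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)
w = np.zeros(3)
lr = 0.01

# Batch gradient descent: ONE update per pass, using ALL samples.
grad = 2 * X.T @ (X @ w - y) / len(X)
w -= lr * grad

# Stochastic gradient descent: one update PER SAMPLE.
for i in range(len(X)):
    xi, yi = X[i], y[i]
    grad_i = 2 * xi * (xi @ w - yi)
    w -= lr * grad_i
```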

11. How are the weights in the network initialized?

        Initialize all weights to 0: this makes your model similar to a linear model. All neurons in each layer perform the same operation, producing the same output, and the symmetry is never broken, rendering a deep network useless.

        Randomly initialize all weights: here, weights are assigned random values very close to 0. Since each neuron then performs a different computation, the model can break symmetry and achieve better accuracy. This is the most common method.
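        A sketch of the two schemes; the layer sizes are illustrative, and the 1/sqrt(fan_in) scaling shown is one common way (Xavier/Glorot-style) to keep the random values small:

```python
import numpy as np

rng = np.random.default_rng(42)

# All zeros: every neuron in the layer computes the same thing and
# receives the same gradient, so the neurons never differentiate.
W_bad = np.zeros((256, 128))

# Small random values break the symmetry. Scaling by 1/sqrt(fan_in)
# keeps activations from shrinking or blowing up layer after layer.
fan_in = 256
W_good = rng.normal(scale=1.0 / np.sqrt(fan_in), size=(256, 128))
```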

12. What are the different layers on a CNN?

        Convolutional layer - performs convolution operations, sliding small filter windows across the input to produce feature maps.

        ReLU layer - introduces nonlinearity into the network by converting all negative pixel values to zero. The output is a rectified feature map.

        Pooling layers - Pooling is a downsampling operation that reduces the dimensionality of feature maps.

        Fully Connected Layer - This layer identifies and classifies objects in an image.

13. What is Pooling on CNN and how does it work?

        Pooling is used to reduce the spatial dimensions in a CNN. It performs a downsampling operation to reduce dimensionality and creates a pooled feature map by sliding a filter matrix over the input feature map.
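        A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single feature map; the input values are illustrative:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a single feature map
    (assumes even height and width, for brevity)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
print(max_pool_2x2(fmap))
# [[4 2]
#  [2 8]]
```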

14. How does the LSTM network work? 

        Long Short-Term Memory (LSTM) is a specialized recurrent neural network capable of learning long-term dependencies; retaining information over long periods is its default behavior. An LSTM step can be divided into three stages (a minimal sketch follows the list):

  • Step 1: The network decides what to forget and what to remember.
  • Step 2: It selectively updates the cell state values.
  • Step 3: The network decides which part of the current state to output.
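        A rough NumPy sketch of a single LSTM step reflecting the three stages above; the gate packing order, sizes, and initialization are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x] to the four gate
    pre-activations; shapes and packing order are illustrative."""
    z = np.concatenate([h_prev, x]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget/input/output gates
    g = np.tanh(g)                                # candidate cell values
    c = f * c_prev + i * g  # Steps 1 & 2: forget old state, add new info
    h = o * np.tanh(c)      # Step 3: decide which part of the state to output
    return h, c

hidden, inputs = 8, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(hidden + inputs, 4 * hidden))
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
```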

15. What are vanishing gradients and exploding gradients? 

        When training an RNN, the gradient can become too small or too large, which makes training difficult. When the gradient shrinks toward zero as it is propagated back, the problem is called a "vanishing gradient"; when it grows exponentially rather than decaying, it is called an "exploding gradient". Both problems lead to long training times, poor performance, and low accuracy.
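        One common remedy for exploding gradients, added here as an illustration, is gradient clipping: rescale the gradient whenever its norm exceeds a threshold. A minimal sketch (the threshold is illustrative):

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    """Rescale the gradient so its norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])  # norm 50, far too large
print(clip_by_norm(g))       # [3. -4.], norm clipped to 5
```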

16. What is the difference between Epoch, Batch and Iteration in deep learning?

  • Epoch - Represents one full pass over the entire training dataset (every sample is seen once).
  • Batch - When we cannot pass the entire dataset to the neural network at once, we divide the dataset into several batches and train on one batch at a time.
  • Iteration - One update step on a single batch. If we have 10,000 images and a batch size of 200, an epoch runs 50 iterations (10,000 divided by 200; the arithmetic is spelled out below).
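        The arithmetic from the iteration example, spelled out in Python:

```python
# Iterations per epoch = dataset size / batch size.
dataset_size = 10_000
batch_size = 200
iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)  # 50
```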
