Some details about face recognition

Modifying the learning rate of a convolutional neural network (Keras):

Using the Keras module of TensorFlow, we can build our own custom convolutional neural network model, but we generally do not touch the learning rate; by default it is 0.001.
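As a minimal sketch (the layer sizes and the 1e-4 value are illustrative, not taken from this project), the default learning rate can be overridden by passing a custom optimizer when compiling the model:

```python
from tensorflow import keras

# A small illustrative CNN (not the project's actual architecture).
model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation='softmax'),
])

# Adam defaults to learning_rate=0.001; pass a different value to override it.
optimizer = keras.optimizers.Adam(learning_rate=1e-4)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```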

Activation layer

Dropout: the general idea and how to use it:

Dropout is a method used in neural networks to prevent overfitting. It is very simple, but very practical.

The basic idea is to drop activated neurons with a certain probability, which makes the model more robust: it is equivalent to discarding some features, so the model does not become overly dependent on any particular features, whether those features are genuine or spurious.
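As a minimal illustration (the layer sizes and the 0.5 rate are assumptions, not the project's actual settings), a Dropout layer in Keras simply sits between layers and randomly zeroes activations during training:

```python
from tensorflow import keras

# Dropout(0.5) randomly zeroes 50% of the previous layer's activations
# during training only; at inference time all neurons are used.
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(256,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(2, activation='softmax'),
])
```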

# Our model uses categorical_crossentropy as the loss function, so the category labels must be one-hot encoded according to the number of classes nb_classes to vectorize them. Here we have only two classes, so the label data becomes two-dimensional after the transformation.
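A small sketch of that step, assuming keras.utils.to_categorical and placeholder labels:

```python
import numpy as np
from tensorflow import keras

nb_classes = 2                              # number of categories
labels = np.array([0, 1, 1, 0])             # placeholder integer class labels

# One-hot encode for categorical_crossentropy: each label becomes a length-2 vector.
labels_one_hot = keras.utils.to_categorical(labels, nb_classes)
print(labels_one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```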

Deep learning (3) loss function - cross entropy (CrossEntropy)

Simple cross-entropy loss function, do you really understand it?

The two diagrams above give a more intuitive understanding of the cross-entropy loss function: whether the true sample label y is 0 or 1, L represents the distance between the predicted output and y.

It is also worth noting that, as the graph shows, the larger the difference between the predicted output and y, the larger L becomes, i.e. the greater the "punishment" for the current model, and this penalty increases non-linearly, at a level resembling exponential growth. This is determined by the characteristics of the log function itself. The benefit is that the model tends to push the predicted output closer to the true sample label y.

Because my project is a binary classification problem, and owing to the log characteristic of the cross-entropy loss function, the larger the loss value, the more severe the penalty.
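For reference, the standard binary cross-entropy for one sample is L = -[y·log(p) + (1 - y)·log(1 - p)]; the short sketch below (the probability values are illustrative) shows how sharply the penalty grows as the prediction moves away from the true label:

```python
import numpy as np

# Binary cross-entropy for a single sample, where y is the true label
# and p is the predicted probability of class 1.
def binary_cross_entropy(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# With true label y = 1, the loss grows quickly as p moves away from 1.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(p, round(binary_cross_entropy(1, p), 3))
# 0.9  -> 0.105
# 0.5  -> 0.693
# 0.1  -> 2.303
# 0.01 -> 4.605
```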


The slope (k value) of the ReLU function is 1 in its positive region, so it does not by itself cause gradient explosion or vanishing gradients.
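A small numeric sketch of this point (the input values are illustrative): the derivative of ReLU is exactly 1 wherever the input is positive, so gradients passing through it are neither scaled up nor shrunk there:

```python
import numpy as np

# ReLU and its derivative: slope 0 for negative inputs, slope 1 for positive inputs.
def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```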

Gradient Instability of Deep Neural Networks--Gradient Disappearance and Gradient Explosion

Backpropagation - easy to understand

About SSM

 Classification layer (regression function)

Using softmax.

Classification layer of convolutional neural network - softmax and sigmoid
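As a rough illustration (the logit values are made up), softmax turns the raw outputs of the classification layer into a probability distribution over the classes:

```python
import numpy as np

# Softmax converts raw scores (logits) into probabilities:
# every output is in (0, 1) and the outputs sum to 1.
def softmax(logits):
    exp = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return exp / exp.sum()

print(softmax(np.array([2.0, 1.0])))  # [0.731... 0.268...] for a two-class output
```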

About the SGD and momentum optimizers

Introduction to concepts: Why is stochastic gradient descent (SGD) a good approach?

Looking carefully, SGD actually needs more steps to converge; after all, it wanders like a drunkard. However, because it places very low requirements on the gradient, the gradient can contain a lot of noise as long as its expectation is correct (sometimes the expectation is even wrong), so each gradient can be computed very quickly. Take the neural network example mentioned earlier: during training, only 128 or 256 samples are drawn from the millions of data points each time, an inexact gradient is computed on them, and SGD takes one step with it. Think about it: this way, each gradient computation is roughly 10,000 times faster. Even if several times more steps are needed, the trade-off is well worth it.
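A toy sketch of this idea (the toy quadratic loss and all hyper-parameters are assumptions, not from the project): each step uses a small, noisy mini-batch gradient, and momentum accumulates a velocity across steps:

```python
import numpy as np

# Mini-batch SGD with momentum on a toy problem: fit a single parameter w
# to the mean of a large dataset using a noisy 128-sample gradient each step.
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1_000_000)   # one million points

w, v = 0.0, 0.0
for step in range(200):
    batch = rng.choice(data, size=128)       # tiny mini-batch instead of all the data
    grad = 2.0 * (w - batch.mean())          # noisy but roughly unbiased gradient
    w, v = sgd_momentum_step(w, grad, v)
print(w)  # close to 3.0, even though no step ever saw the full dataset
```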

Optimizer: SGD > Momentum > AdaGrad > RMSProp > Adam 


Origin: blog.csdn.net/weixin_45721305/article/details/123590851