CHANG machine learning notes -9: Tips For Training DNN

When we train a neural network, we must first check its performance on the Training Data; only when the results on the Training Data are good do we go on to evaluate the Testing Data. Why is that? The explanation follows below.

Here is the general process of training a neural network
[Figure: the general process of training a neural network]
Sometimes, bad results on the Test Data are not caused by overfitting.

[Figure: training and testing error of a 20-layer network vs a 56-layer network]
This is because the neural network already performs badly on the Training set: in the figure, the 56-layer network trains worse than the 20-layer one, so naturally its Test set results are also worse than the 20-layer network's. That is not overfitting.

If the neural network performs badly on the Training Data, what can we do?
One option is to use a different Activation function; another is to adjust our Lr (learning rate).

Activation function

We sometimes need to replace the Activation function because of problems such as the vanishing gradient. The vanishing gradient problem appears when the neural network becomes deep.

The gradients of the later layers (near the output) are large, so after only a few parameter updates the later layers have already converged (possibly to a local optimum), while the parameters of the earlier layers are still almost identical to their random initialization. So this is not a problem of parameter initialization.

Why does this happen, and what can we do about it?
When the activation function is the sigmoid, a large change in the input is squashed at every layer it passes through, so its influence on the output keeps shrinking; in a deep network the gradients of the earlier layers therefore become very small.
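To make this attenuation concrete, here is a minimal numpy sketch (my own illustration, not from the original notes): the derivative of the sigmoid is at most 0.25, so with 10 layers and weights around 1 (an assumption just for illustration), the gradient reaching the first layer has shrunk by a factor of roughly 0.25^10.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Backpropagating through one sigmoid unit multiplies the gradient by
# sigmoid'(x) * w.  With w = 1 and x = 0, that factor is 0.25, its maximum.
x = 0.0          # pre-activation at each layer (illustrative choice)
grad = 1.0       # gradient arriving from the output side
for layer in range(10):
    s = sigmoid(x)
    grad *= s * (1.0 - s)            # chain rule through one sigmoid, w = 1
    print(f"after layer {layer + 1}: gradient factor = {grad:.2e}")
# after 10 layers the factor is about 1e-6: the earlier layers barely learn
```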

Replacing the sigmoid with ReLU solves the vanishing gradient problem.

With ReLU, a neuron either outputs 0 (and can simply be removed from the network) or outputs its input unchanged. What remains is a thinner, linear network in which the gradient does not shrink as it propagates backwards. Note that the network is only locally linear; if it were linear everywhere it would be useless.

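For comparison, a minimal sketch in the same toy setup as above (again my own illustration): an active ReLU unit passes its input straight through, so its local gradient is exactly 1 and nothing vanishes.

```python
import numpy as np

def relu_grad(x):
    # 1 where the unit is active, 0 where the unit is "removed" from the network
    return (np.asarray(x) > 0).astype(float)

x = 1.0          # an active unit at every layer (illustrative choice)
grad = 1.0
for layer in range(10):
    grad *= relu_grad(x)             # factor is 1.0 for an active unit, w = 1
print("gradient factor after 10 ReLU layers:", float(grad))   # -> 1.0
```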

ReLU has many variants: you can change the part of the function below zero, either to another linear piece or to something nonlinear.
Maxout network: let the neural network learn its own Activation function.

If the parameters are learned as in the left part of the figure below, Maxout reproduces ReLU; it can of course produce other activation functions as well (such as the one on the right). The function you get depends on the parameters.
[Figure: Maxout parameters that reproduce ReLU (left) and a different learned activation function (right)]
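Below is a minimal numpy sketch of a Maxout unit, written from the description above rather than from the course code (the shapes and the helper name `maxout` are my own choices): each output takes the max over k candidate linear functions of the input, and with one of two pieces fixed at zero it reduces exactly to ReLU.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout: x has shape (d_in,), W has shape (k, d_out, d_in), b has shape
    (k, d_out).  Each output unit is the max over k learned linear pieces."""
    z = W @ x + b           # (k, d_out): k candidate linear outputs per unit
    return z.max(axis=0)    # element-wise max over the k pieces

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# With k = 2 and the second piece fixed at zero, maxout reproduces ReLU:
W = np.stack([rng.normal(size=(4, 3)), np.zeros((4, 3))])
b = np.zeros((2, 4))
print(maxout(x, W, b))
print(np.maximum(0.0, W[0] @ x))   # identical to the line above
```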

Lr

For the Lr, Adagrad uses first derivatives to estimate the second derivative (this requires the second derivative to be relatively constant, but usually it is not).
Even along the same direction, different learning rates may be needed at different points in training. RMSProp can handle this.
[Figure: the RMSProp update rule, where σ is an exponentially decaying root mean square of the past gradients]
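For reference, here is a minimal numpy sketch of the standard RMSProp update described above (the hyperparameter values and the toy objective f(w) = w² are illustrative choices of mine, not from the notes): σ weights recent gradients more than old ones, unlike Adagrad, which accumulates all past gradients equally.

```python
import numpy as np

def rmsprop_step(w, grad, sigma_prev, lr=0.1, alpha=0.9, eps=1e-8):
    # sigma: exponentially decaying root mean square of past gradients
    sigma = np.sqrt(alpha * sigma_prev**2 + (1.0 - alpha) * grad**2)
    w_new = w - lr * grad / (sigma + eps)   # effective learning rate is lr / sigma
    return w_new, sigma

# toy usage on f(w) = w**2, whose gradient is 2*w
w, sigma = 5.0, 0.0
for _ in range(100):
    w, sigma = rmsprop_step(w, 2.0 * w, sigma)
print("w after 100 RMSProp steps:", w)      # close to the minimum at 0
```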

During training, the situation shown below can occur, where training stops before reaching the optimum.

[Figure: an error surface with places where the gradient is (close to) zero but which are not the optimum]
So we can add momentum. In the figure, red is the gradient, green is the momentum, and blue is the actual direction of movement. Notice that the blue direction can effectively carry us past the places in the figure above where the gradient may be zero but which are not the optimum. And by taking only the previous movement into account, all of the earlier movements are implicitly taken into account as well.
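A minimal sketch of the momentum update (my own toy example, not from the notes): the actual movement is the current negative gradient plus a decayed copy of the previous movement, and because the previous movement was built the same way, it implicitly contains all earlier movements. The flat-region example shows how momentum keeps us moving where the gradient is zero.

```python
def momentum_step(w, grad, v_prev, lr=0.01, beta=0.9):
    v = beta * v_prev - lr * grad    # movement = decayed previous movement - lr * gradient
    return w + v, v

# toy usage: on a flat region the gradient is 0, but the previous movement
# keeps carrying us forward, so we can roll past plateaus and saddle points
w, v = 2.0, -0.5                     # v: movement from the previous update
for grad in [0.0, 0.0, 0.0]:
    w, v = momentum_step(w, grad, v)
    print(f"w = {w:.3f}, movement = {v:.3f}")
```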
How do we make the neural network perform well on the Test set and reduce overfitting?
There are three methods: Early Stopping, Regularization, and Dropout.

Early stopping

As training goes on, we would like to stop at the point where the error on the testing data starts to increase, but we do not know where that point is. So, while training, we hold out part of the Train Data (a Validation set) and use it for evaluation.

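A minimal sketch of that procedure (the callbacks `train_one_epoch` and `validation_error` are placeholders for your own code, and the patience-based stopping rule is one common choice, not necessarily what the course uses):

```python
import numpy as np

def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=100, patience=5):
    """Stop once the validation error has not improved for `patience` epochs."""
    best_err, best_epoch, bad_epochs = np.inf, 0, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_err, best_epoch, bad_epochs = err, epoch, 0
            # in practice, also snapshot the model parameters here
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, best_err

# toy usage: a validation error that first falls, then rises again
errs = iter([1.0, 0.8, 0.6, 0.55, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1])
print(train_with_early_stopping(lambda: None, lambda: next(errs), max_epochs=10))
# -> (3, 0.55): training stops and we keep the epoch-3 model
```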

Regularization

Regularization comes in two flavors, L1 and L2; both add a penalty on the size of the weights to the loss function.
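A minimal sketch of the two penalties (the λ value and the ½ factor in the L2 term are common conventions, assumed here rather than taken from the notes):

```python
import numpy as np

def l2_penalty(weights, lam):
    # lam/2 * sum of squared weights; its gradient lam * w shrinks every
    # weight towards zero by a constant factor ("weight decay")
    return 0.5 * lam * sum(np.sum(w ** 2) for w in weights)

def l1_penalty(weights, lam):
    # lam * sum of absolute weights; its (sub)gradient lam * sign(w) pushes
    # weights towards exactly zero, giving sparse solutions
    return lam * sum(np.sum(np.abs(w)) for w in weights)

weights = [np.array([0.5, -2.0]), np.array([[1.0, 0.0], [0.0, -1.0]])]
total_loss = 1.23  # your original loss; the penalty is simply added to it
print(total_loss + l2_penalty(weights, lam=0.01))
print(total_loss + l1_penalty(weights, lam=0.01))
```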

Dropout

Dropout: before each training update, some neurons are dropped (assume each neuron is dropped with probability p%), so the neural network becomes thinner. At test time, no neurons are dropped. The intuition: if the network can already do well with fewer neurons, it can do even better with all of them, just as an athlete who trains with an extra burden may perform above their training level in a real game without that burden.
At test time, however, the weights need to be multiplied by (1 − p%), because neurons were dropped during training; without this rescaling, z and z′ would not match.
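A minimal numpy sketch of the two phases (p = 0.5 and the activation vector are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                               # probability of dropping a neuron

def forward_train(a):
    # training: drop each neuron independently with probability p -> thinner net
    mask = rng.random(a.shape) >= p
    return a * mask

def forward_test(a):
    # testing: keep every neuron, but scale by (1 - p) so that z at test time
    # matches the z' the thinned networks produced on average during training
    return a * (1.0 - p)

a = np.array([1.0, 2.0, 3.0, 4.0])    # activations of one layer
print(forward_train(a))               # a randomly thinned version
print(forward_test(a))                # the full vector scaled by (1 - p)
```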
Why does Dropout behave like this? Take a look at Ensembles: train several neural networks on the training set, then at test time run the test data through each network and average the results.
When the Activation function is linear, the situation below occurs: the average of the y values produced by all the thinned networks is essentially equal to the output of the full network with rescaled weights, so Dropout and the ensemble are doing the same thing.
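Here is a small numerical check of that claim for a single linear layer (my own construction, not from the notes): averaging the outputs of many randomly thinned copies gives almost exactly the output of the full layer with weights scaled by (1 − p).

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)                # weights of one linear unit, y = w . x
x = rng.normal(size=4)
p = 0.5

# average the outputs of many dropout sub-networks (each weight kept with prob 1 - p)
masks = (rng.random((100_000, 4)) >= p).astype(float)
ensemble_average = np.mean(masks @ (w * x))

# output of the single full network with weights scaled by (1 - p)
scaled_output = (1.0 - p) * w @ x

print(ensemble_average, scaled_output)   # approximately equal
```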
over
