Practical lessons from neural network experiments

1. The initialization of weights and biases matters a great deal. Do not initialize directly from a standard normal distribution without scaling.

When the amount of data is small, scale a normal initialization by sqrt(1/input_size). Note that the scaling must go inside tf.Variable; in the original snippet the multiplication was applied to the Variable itself, which yields a plain tensor rather than a trainable variable:

import numpy as np
import tensorflow as tf

weights = tf.Variable(tf.random_normal([input_size, out_size]) * np.sqrt(1.0 / input_size))

bias = tf.Variable(tf.random_normal([1, out_size]) * np.sqrt(1.0 / input_size))

2. For small data sets, the choice of optimizer has little effect; plain gradient descent (GD) can be used directly. Generally speaking, Adam works best (a minimal sketch of both follows).
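
A minimal sketch in TF1 style, assuming loss is the loss tensor defined elsewhere; the learning rates are illustrative defaults:

# Plain gradient descent:
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

# Or Adam, which usually converges faster:
# train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
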
3. When the amount of data is small, consider k-fold cross-validation and repeat the run several times (a minimal sketch follows).
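
A minimal sketch using scikit-learn's KFold, assuming X and y are numpy arrays holding the full data set; train_model and evaluate are hypothetical stand-ins for your own training and evaluation routines:

import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True)
scores = []
for train_idx, val_idx in kf.split(X):
    # Train on k-1 folds, validate on the held-out fold.
    model = train_model(X[train_idx], y[train_idx])
    scores.append(evaluate(model, X[val_idx], y[val_idx]))
print(np.mean(scores))  # average score over the k folds
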
4. Choice of loss function. Mean squared error (mostly used for regression, though classification can also use it): loss = tf.reduce_mean(tf.square(youtput - output)), where youtput is the target and output is the network's prediction.
Cross-entropy (for classification):
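
A common TF1 form, assuming logits are the network's unnormalized outputs and y_true holds one-hot labels:

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))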
5. When should training stop?
① After a fixed number of epochs.
② When the training-set loss drops into a target range (a minimal sketch follows).
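
A minimal sketch combining both criteria, assuming sess, train_step, loss, and feed (the feed_dict) are already set up, and max_epochs and loss_threshold are chosen by hand:

for epoch in range(max_epochs):  # criterion ①: fixed epoch budget
    _, train_loss = sess.run([train_step, loss], feed_dict=feed)
    if train_loss < loss_threshold:  # criterion ②: loss low enough
        break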
6. Regularization term: added to the loss function to combat overfitting. L2 is used most often (sketched below).
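
A minimal sketch of an L2 penalty over all trainable variables; lam is a hypothetical regularization strength, and loss is the data loss from point 4:

l2 = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
total_loss = loss + lam * l2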

7. During training, the loss becomes NaN as the number of training steps increases.

Causes: ① The learning rate is too large; try reducing it.

Below is Wang Yun's answer from Zhihu:

The most common cause is a learning rate that is too high. For classification problems, an overly high learning rate can make the model "stubbornly" insist that some examples belong to the wrong class, driving the predicted probability of the correct class to 0 (actually floating-point underflow). Cross-entropy then evaluates to an infinite loss, the gradient of the parameters becomes NaN, and from there the entire network's parameters become NaN.

The fix is to lower the learning rate, even setting it to 0, and see whether the problem persists. If it disappears, the learning rate was indeed the culprit. If it remains, the freshly initialized network is already broken, and there is most likely a bug in the implementation.


Author: Wang Yun Maigo
Link: https://www.zhihu.com/question/62441748/answer/232522878
Source: Zhihu


② Other situations (quoting another Zhihu answer):

Which loss function is being used? For classification problems, use categorical cross-entropy. For regression problems there may be a division by zero, which can often be fixed by adding a small epsilon. Check whether the data itself contains NaN, e.g. with numpy.any(numpy.isnan(x)) on both the inputs and the targets. The targets must also be values the loss function can handle; for example, with a sigmoid output activation the targets should lie within the sigmoid's range of (0, 1). The data set needs to be checked for this as well.


Author: Pig Go
Link: https://www.zhihu.com/question/62441748/answer/232520044
Source: Zhihu
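A quick sanity check along the lines of that answer, assuming X and y are the numpy input and target arrays:

import numpy as np

assert not np.any(np.isnan(X)), "NaN found in the inputs"
assert not np.any(np.isnan(y)), "NaN found in the targets"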

8. (More to be added at any time...)
