After reading the paper "Understanding the difficulty of training deep feedforward neural networks" (Glorot & Bengio, 2010), two methods of parameter initialization are proposed: the commonly used heuristic W ~ U[-1/sqrt(n), 1/sqrt(n)], where n is the size of the previous layer,
and the normalized initialization, i.e. the Xavier method: W ~ U[-sqrt(6)/sqrt(n_in + n_out), sqrt(6)/sqrt(n_in + n_out)].
I recently ran an experiment and found that network initialization matters enormously. A neural network is essentially a black box, and knowing how to set and tune each of its parameters is where most of the skill lies.
Several possible initialization methods:
1. pre-training+fine-tuning
For example: first use a greedy layer-wise auto-encoder for unsupervised pre-training, then fine-tune.
① pre-training: take each layer of the network out and train it as an auto-encoder, so that the output reconstructs the input. The parameters updated during this process become the initial values. A minimal sketch follows after step ②.
② fine-tuning: put the pre-trained layers back into the network and train the whole model on labeled data, starting from the parameters obtained in the pre-training stage. The parameters are further updated during this process to form the final model.
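A minimal NumPy sketch of the pre-training stage, under illustrative assumptions (tanh units, squared reconstruction error, made-up layer sizes and learning rate; none of these come from the article):

import numpy as np

def train_autoencoder(X, hidden_dim, lr=0.01, epochs=100):
    # Train a one-layer auto-encoder on X so the output reconstructs the input;
    # return the learned encoder weights as an initialization.
    n, d = X.shape
    W_enc = np.random.randn(d, hidden_dim) * 0.01
    W_dec = np.random.randn(hidden_dim, d) * 0.01
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)                 # encode
        X_hat = H @ W_dec                      # decode (reconstruction)
        err = X_hat - X                        # squared-error gradient
        dW_dec = H.T @ err / n
        dH = err @ W_dec.T * (1 - H ** 2)      # backprop through tanh
        dW_enc = X.T @ dH / n
        W_enc -= lr * dW_enc
        W_dec -= lr * dW_dec
    return W_enc

X = np.random.randn(256, 64)                   # stand-in for real unlabeled data
layer_dims = [32, 16]                          # hypothetical hidden sizes
weights, inp = [], X
for h in layer_dims:
    W = train_autoencoder(inp, h)
    weights.append(W)
    inp = np.tanh(inp @ W)                     # codes become the next layer's input
# `weights` now serves as the network's initial values; fine-tuning then trains
# the whole model on labeled data starting from them.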
2. random initialization: np.random.randn(m,n)
The most common method, but it has a drawback: if the scale of the random distribution is chosen poorly, activations vanish or saturate as the network gets deeper, and training gets into trouble. The experiment below illustrates this.
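A quick experiment (an assumed 10-layer tanh net on random data, not from the article) showing why the scale of a plain random init matters:

import numpy as np

x = np.random.randn(1000, 500)
for scale in [0.01, 1.0]:                      # too small vs. too large a std
    h = x
    for _ in range(10):
        W = np.random.randn(500, 500) * scale
        h = np.tanh(h @ W)
    print(f"scale={scale}: layer-10 activation std = {h.std():.4f}")
# scale=0.01 drives activations toward 0; scale=1.0 saturates tanh near ±1.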
3. Xavier initialization:
The basic idea is to keep the variance of each layer's input and output the same, which prevents the activations from all shrinking toward 0. Although the derivation assumes linear activations, it also works well for some nonlinear neurons.
tf.Variable(np.random.randn(node_in, node_out) / np.sqrt(node_in))
Most suitable for tanh; a quick sanity check follows.
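The same depth-10 tanh experiment as above, now with Xavier scaling; the activation spread stays roughly stable from layer to layer (a sanity check under the same made-up sizes, not a proof):

import numpy as np

h = np.random.randn(1000, 500)
for layer in range(10):
    W = np.random.randn(500, 500) / np.sqrt(500)   # Xavier: std = 1/sqrt(node_in)
    h = np.tanh(h @ W)
    print(f"layer {layer + 1}: activation std = {h.std():.4f}")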
4. He initialization
Ideal for ReLU (a quick check follows the code):
tf.Variable(np.random.randn(node_in, node_out) / np.sqrt(node_in / 2))
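An analogous check for ReLU under the same assumed setup; dividing by np.sqrt(node_in / 2) keeps the activation spread roughly steady, whereas plain Xavier scaling would shrink it by about sqrt(2) per layer (ReLU zeroes half the signal):

import numpy as np

h = np.random.randn(1000, 500)
for layer in range(10):
    W = np.random.randn(500, 500) / np.sqrt(500 / 2)   # He: std = sqrt(2/node_in)
    h = np.maximum(0, h @ W)                           # ReLU
    print(f"layer {layer + 1}: activation std = {h.std():.4f}")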
5. Bengio also proposed a variant that averages the fan-in and fan-out:
tf.Variable(np.random.randn(node_in, node_out) / np.sqrt((node_in + node_out) / 2))
In fact, methods 3, 4, and 5 are all variants of the same Xavier idea.
6. BN (Batch Normalization) is not actually an initialization method, but a clever brute-force way of weakening the impact of a bad initialization: by normalizing each layer's activations, it makes the network far less sensitive to the initial weight scale. A minimal sketch follows.
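A minimal NumPy sketch of the batch-norm forward pass (per-feature statistics; gamma and beta are fixed here for simplicity, though in practice they are learned), showing how it restores a sane activation scale even after a badly scaled layer:

import numpy as np

def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    mu = h.mean(axis=0)                        # per-feature batch mean
    var = h.var(axis=0)                        # per-feature batch variance
    h_hat = (h - mu) / np.sqrt(var + eps)      # normalize to zero mean, unit variance
    return gamma * h_hat + beta                # learnable scale and shift

h = np.random.randn(1000, 500) @ (np.random.randn(500, 500) * 5.0)  # badly scaled layer
print(f"before BN: std = {h.std():.2f}; after BN: std = {batch_norm(h).std():.2f}")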
References: "Let's talk about the weight initialization of deep learning" (by a Google engineer), and Andrew Ng's deep learning course.