Neural network parameter initialization methods

After reading the paper "Understanding the difficulty of training deep feedforward neural networks" (Glorot & Bengio), two parameter initialization schemes stand out.

The commonly used heuristic, which draws each weight uniformly from

W ~ U[-1/sqrt(n), 1/sqrt(n)], where n is the fan-in (the size of the previous layer),

and the normalized initialization the paper proposes, i.e. the Xavier method:

W ~ U[-sqrt(6)/sqrt(n_j + n_{j+1}), sqrt(6)/sqrt(n_j + n_{j+1})], where n_j and n_{j+1} are the fan-in and fan-out of the layer.
I recently did an experiment and found that how the network is initialized matters enormously! A neural network is essentially a black box, and much of the real craft lies in how each parameter is set and tuned.

Several possible initialization methods:

1. pre-training+fine-tuning

For example: first use a greedy layer-wise auto-encoder for unsupervised pre-training, and then fine-tune.

① pre-training: take each layer of the neural network and train it as an auto-encoder, so that its output reconstructs its input. The parameters updated during this process become the initial values.

② fine-tuning: put each pre-trained layer back into the full neural network and train the whole model on the labeled data, starting from the initial parameter values obtained in the pre-training stage. During this process the parameters are updated further to form the final model.
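A minimal NumPy sketch of the greedy layer-wise idea (the layer sizes, learning rate, epoch count, and the tied-weight auto-encoder are illustrative assumptions, not details from the original post):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one auto-encoder layer: encode X into n_hidden units and
    decode back (tied weights), minimizing reconstruction error."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(n_in)       # decoder bias (discarded after pre-training)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)        # encode
        R = sigmoid(H @ W.T + c)      # decode / reconstruct
        dR = (R - X) * R * (1 - R)    # gradient at the decoder output
        dH = (dR @ W) * H * (1 - H)   # gradient at the hidden layer
        W -= lr * (X.T @ dH + dR.T @ H) / len(X)   # tied-weight gradient
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return W, b, sigmoid(X @ W + b)

# Greedy layer-wise pre-training: each layer is trained on the codes of
# the previous one; the learned weights seed the supervised network.
X = np.random.rand(256, 64)           # toy unlabeled data
inits, H = [], X
for n_hidden in (32, 16):
    W, b, H = pretrain_layer(H, n_hidden)
    inits.append((W, b))
# `inits` now holds the initial parameter values used for fine-tuning.
```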

2. random initialization: np.random.randn(m,n)

The most common method, but it has a drawback: if the scale of the random distribution is chosen poorly, the activations and gradients can quickly vanish or explode as the network gets deeper.
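A quick sketch of that failure mode (the 0.01 scale, 500-unit width, and 10-layer depth are arbitrary choices for illustration): pushing data through a deep tanh stack with a too-small random scale makes the activations collapse toward 0.

```python
import numpy as np

# 10 tanh layers with a naively scaled random init: the activation
# standard deviation shrinks toward 0 layer by layer.
h = np.random.randn(1000, 500)
for layer in range(10):
    W = np.random.randn(500, 500) * 0.01   # poorly chosen scale
    h = np.tanh(h @ W)
    print(f"layer {layer}: activation std = {h.std():.6f}")
```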

3. Xavier initialization:

The basic idea is to keep the variance of each layer's input and output the same, which prevents the output values from all drifting toward 0. Although the original derivation assumes linear activations, it also works well for some nonlinear units.

W = tf.Variable(np.random.randn(node_in, node_out) / np.sqrt(node_in))

Best suited to tanh activations.
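Re-running the depth check above with Xavier scaling (same assumed width and depth) shows the point: the spread of the tanh activations now stays in a healthy range instead of collapsing.

```python
import numpy as np

# Same 10-layer tanh stack, but scaled by 1/sqrt(fan_in) (Xavier).
h = np.random.randn(1000, 500)
for layer in range(10):
    W = np.random.randn(500, 500) / np.sqrt(500)   # Xavier scaling
    h = np.tanh(h @ W)
    print(f"layer {layer}: activation std = {h.std():.6f}")
```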

4. He initialization

Ideal for ReLU:

W = tf.Variable(np.random.randn(node_in, node_out) / np.sqrt(node_in / 2))
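The same illustrative check with ReLU: because ReLU zeroes out roughly half of its inputs, plain Xavier scaling loses variance with depth, while dividing by sqrt(node_in/2) keeps it roughly constant.

```python
import numpy as np

# 10 ReLU layers with He scaling: the extra factor of 2 compensates
# for the variance that ReLU discards, so the activation std holds up.
h = np.random.randn(1000, 500)
for layer in range(10):
    W = np.random.randn(500, 500) / np.sqrt(500 / 2)   # He scaling
    h = np.maximum(h @ W, 0.0)
    print(f"layer {layer}: activation std = {h.std():.6f}")
```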

5. Bengio also proposed a variant that uses the average of the fan-in and fan-out:

W = tf.Variable(np.random.randn(node_in, node_out) / np.sqrt((node_in + node_out) / 2))

In fact, methods 3, 4, and 5 are all variants of Xavier initialization.
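They fold naturally into one small helper; the function name and the `method` argument below are made up for illustration, and the division is kept inside tf.Variable so the variable itself stores the scaled weights:

```python
import numpy as np
import tensorflow as tf

def init_weight(node_in, node_out, method="xavier"):
    """Return a tf.Variable initialized with one of the Xavier variants."""
    W = np.random.randn(node_in, node_out)
    if method == "xavier":      # method 3: scale by fan-in, suits tanh
        W /= np.sqrt(node_in)
    elif method == "he":        # method 4: extra factor of 2 for ReLU
        W /= np.sqrt(node_in / 2)
    elif method == "avg":       # method 5: average of fan-in and fan-out
        W /= np.sqrt((node_in + node_out) / 2)
    else:
        raise ValueError(f"unknown method: {method}")
    return tf.Variable(W)

# e.g. a 784 -> 256 layer feeding a ReLU network:
W1 = init_weight(784, 256, method="he")
```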

6. Batch Normalization (BN) is not actually an initialization method; it is a simple, brute-force way of weakening the impact of a bad initialization, since normalizing each layer's inputs keeps the activations in a reasonable range no matter how the weights start out.
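A minimal NumPy sketch of what BN does at training time (standard epsilon and learnable gamma/beta; the moving averages used at test time are omitted):

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Normalize pre-activations over the batch, then rescale and shift.
    The output has roughly zero mean and unit variance, so the scale of
    the initial weights matters far less."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    return gamma * (z - mu) / np.sqrt(var + eps) + beta

# Even a badly scaled layer gives well-behaved activations after BN:
x = np.random.randn(128, 500)
W = np.random.randn(500, 500) * 100.0     # deliberately terrible scale
h = np.tanh(batch_norm(x @ W, gamma=np.ones(500), beta=np.zeros(500)))
print(h.std())                            # ~0.6, not saturated at +/-1
```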


References: the article "Let's talk about the weight initialization of deep learning" by a Google engineer,

and Andrew Ng's deep learning course.
