Video explanation | Why can't all neural network parameters be initialized to all 0


The video is about 6 minutes, narrated by myself. If you have wifi, please watch it along with the narration~ (it uses quite a bit of data, so wifi is strongly recommended~)

I recently ran a small experiment and found that when all of a neural network's parameters are initialized to zero (the laziest option), the results are poor. I later asked ybb and looked up some material online, and wrote down my notes and summary. Corrections and discussion are welcome~

As usual, I will explain the relevant ideas in plain language by walking through a concrete example. Read on!

Suppose the neural network we need to initialize now is as follows:

[Figure: the network to be initialized — three input nodes (1, 2, 3) receiving x1, x2, x3, a hidden layer with nodes 4 and 5, and a single output node 6]

We initialize all the weights to zero, W1 = 0 and W2 = 0 (every entry of both matrices is 0),
where W1 represents the weight matrix from the input layer to the hidden layer, and W2 represents the weight matrix from the hidden layer to the output layer.
Assume the input of the network is [x1, x2, x3]. Running forward propagation through the network, we get:
z4 = w41·x1 + w42·x2 + w43·x3
z5 = w51·x1 + w52·x2 + w53·x3

Because every weight in W1 is 0, we know that:

z4 = z5 = 0
From the above we can see that the two hidden-layer values are identical at this point. After passing them through the activation function f, the resulting a4 and a5 are still identical:
a4 = f(z4) = f(0) = f(z5) = a5
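To make this concrete, here is a minimal NumPy sketch of the forward pass for this 3-2-1 network with all-zero weights. The sigmoid activation and the example input values are my own assumptions for illustration; any activation would lead to the same conclusion, since both nodes see the same pre-activation.

```python
import numpy as np

def f(z):
    # assumed sigmoid activation, purely for illustration
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])   # example input [x1, x2, x3]
W1 = np.zeros((2, 3))           # input -> hidden weights, all 0
W2 = np.zeros((1, 2))           # hidden -> output weights, all 0

z_hidden = W1 @ x               # [z4, z5] = [0, 0]
a_hidden = f(z_hidden)          # [a4, a5] = [0.5, 0.5] -- identical

print(z_hidden)                 # [0. 0.]
print(a_hidden)                 # [0.5 0.5]
print(f(W2 @ a_hidden))         # a6 = f(0) = 0.5
```

Whatever input we feed in, the two hidden nodes always produce exactly the same activation.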

The final output of the network is:
a6 = f(w64·a4 + w65·a5) = f(0)
At this time, assuming the true output is y, the mean squared error loss function can be written as:
Loss = 1/2 · (y − a6)²
Now it is time for the famous BP (backpropagation) algorithm to make its entrance! We propagate the error backwards and update the weights so that the predicted output gets closer and closer to the true value.
I assume here that readers already know how BP backpropagation works; if not, see "A popular understanding of the neural network BP back-propagation algorithm".
After backpropagation, the gradient-driven changes for the weights coming out of node 4 and node 5 are the same (because a4 = a5); assume both equal Δ. Then the parameter between node 4 and node 6 and the parameter between node 5 and node 6 are updated as follows:
w64_new = w64 − Δ = 0 − Δ = −Δ
w65_new = w65 − Δ = 0 − Δ = −Δ
From the formulas above, we can see that the two new parameters are exactly the same!!!
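A minimal numerical check of this claim, assuming a sigmoid activation, a learning rate of 0.1, and a target y = 1 (all example values of my own, not from the original article):

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 1.0                   # example true output
a4 = a5 = f(0.0)          # identical hidden activations from the forward pass
w64 = w65 = 0.0           # zero-initialized hidden -> output weights
lr = 0.1

z6 = w64 * a4 + w65 * a5  # = 0
a6 = f(z6)                # = 0.5

# gradient of Loss = 1/2 (y - a6)^2 with respect to the two output weights
delta6 = (a6 - y) * a6 * (1 - a6)   # dLoss/dz6 for a sigmoid
grad_w64 = delta6 * a4
grad_w65 = delta6 * a5

w64_new = w64 - lr * grad_w64
w65_new = w65 - lr * grad_w65
print(w64_new == w65_new)           # True: the two updated weights are identical
```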
In the same way, the parameter updates between the input layer and the hidden layer are also identical, so after the update the weights feeding node 4, (w41, w42, w43), and the weights feeding node 5, (w51, w52, w53), are still the same! Then, no matter how many rounds of forward propagation and back propagation are performed, the corresponding parameters between each pair of layers remain identical.
In other words, we originally wanted different nodes to learn different parameters, but because their parameters and output values are always the same, different nodes cannot possibly learn different features! This defeats the whole purpose of having the network learn features.
Although the hidden layer contains multiple nodes, it effectively behaves as just one node!! As shown in the figure below:
[Figure: because all its nodes stay identical, the multi-node hidden layer is equivalent to a hidden layer with a single node]
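A quick way to convince yourself that the symmetry never breaks is to run many gradient-descent steps and check the weights after each one. The sketch below is my own illustration, assuming sigmoid activations, the MSE loss from above, and arbitrary example data:

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

x = np.array([1.0, 2.0, 3.0])     # example input
y = 1.0                           # example true output
W1 = np.zeros((2, 3))             # input -> hidden, all 0
W2 = np.zeros((1, 2))             # hidden -> output, all 0
lr = 0.1

for step in range(1000):
    # forward pass
    z_h = W1 @ x                  # [z4, z5]
    a_h = f(z_h)                  # [a4, a5]
    z_o = W2 @ a_h                # [z6]
    a_o = f(z_o)                  # [a6]

    # backward pass for Loss = 1/2 (y - a6)^2
    delta_o = (a_o - y) * f_prime(z_o)          # error at the output node
    delta_h = (W2.T @ delta_o) * f_prime(z_h)   # error at the hidden nodes

    W2 -= lr * np.outer(delta_o, a_h)           # update hidden -> output
    W1 -= lr * np.outer(delta_h, x)             # update input -> hidden

    # the two hidden nodes never diverge from each other
    assert np.allclose(W1[0], W1[1])
    assert np.allclose(W2[0, 0], W2[0, 1])
```

Every assertion passes on every step: the two rows of W1 stay equal and the two entries of W2 stay equal, so the network behaves exactly like the single-hidden-node network in the figure.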

To summarize: initializing w to all zeros is very likely to make the model fail outright and never converge.
The fix is to initialize w with random values (in a CNN, randomizing w likewise makes the filters of the same layer start from different weights so that they can learn different features; if they all start at 0, or at any identical value, they perform exactly the same computation and cannot learn different features).
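For comparison, a sketch of the usual fix: draw the initial weights from a small random distribution so the symmetry is broken from the very first step. The 0.01 scale and the normal distribution are illustrative choices on my part, not a prescription from this article:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# small random initial values: nodes 4 and 5 now start out different
W1 = 0.01 * rng.standard_normal((2, 3))   # input -> hidden
W2 = 0.01 * rng.standard_normal((1, 2))   # hidden -> output

x = np.array([1.0, 2.0, 3.0])
z_hidden = W1 @ x
print(z_hidden)   # the two hidden pre-activations already differ,
                  # so the two nodes receive different gradients
                  # and can go on to learn different features
```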

Recommended reading articles:

A popular understanding of the neural network BP back-propagation algorithm
Hidden Markov model: the basic model and three basic problems
Show you the naive Bayes classification algorithm

It's all easy-to-understand, solid material! Just pin the account to the top~ Feel free to follow and discuss~



Origin blog.51cto.com/15009309/2553595