Why Neural Network Weights Cannot Be Initialized to Zero (Part 1)

EDITORIAL: The content of this article and the code at the end were all written by me personally, and the analysis and conclusions took quite a long time to put together. If you wish to reprint this article, please be sure to contact me first; you can leave a message in the background.

In deep learning, the way the weights of a neural network are initialized is very important: it has a large impact on the convergence speed and on the final performance of the model. A good set of initial weights has the following advantages:

  • gradient descent converges quickly;
  • in a deep neural network, the gradients are less likely to vanish or explode.

In this two-part series, we focus on the following two topics:

  1. Why linear regression and logistic regression can be initialized with zeros, while a neural network cannot (in fact, not only zeros: initializing all the weights to any single identical value is likely to leave the model unable to learn);
  2. Three common weight initialization methods: random initialization, Xavier initialization, and He initialization.

In this article, we mainly discuss the first topic.


 

Zero initialization

In linear regression and logistic regression, we usually initialize the weights w and the bias term b to 0, and the model still achieves good results. With the TensorFlow framework, the zero initialization looks like this:

w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
b = tf.Variable(0, dtype=tf.float32, name='bias')
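As a quick sanity check (my own illustrative sketch in NumPy, not code from the original article), the snippet below trains a logistic regression model from an all-zero initialization on toy data. Because the gradient with respect to w is X^T(sigmoid(Xw + b) - y) / n, different weights receive different updates even though they all start at 0, so the model still learns:

# Illustrative sketch (not from the article): zero-initialized logistic regression still learns.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)                          # toy inputs
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

w = np.zeros(3)                                # weights initialized to 0
b = 0.0                                        # bias initialized to 0
lr = 0.5
for step in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # sigmoid(Xw + b)
    w -= lr * X.T @ (p - y) / len(y)           # gradients differ per weight
    b -= lr * np.mean(p - y)

print("learned w:", w)                         # no longer all equal
print("train accuracy:", np.mean((p > 0.5) == y))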

However, when all the weights in a neural network are initialized to 0, the model does not work.

The reason is the presence of hidden layers in the neural network. Suppose the input to the model is [x1, x2, x3], there is one hidden layer with 2 units, and the output is y, as shown in the figure below:

After the forward-propagation computation, we obtain:

z1 = w10 * x0 + w11 * x1 + w12 * x2 + w13 * x3

z2 = w20 * x0 + w21 * x1 + w22 * x2 + w23 * x3

All the weights w and the bias b (which can be viewed as w10 and w20 above, with x0 = 1) are initialized to 0, so the computed z1 and z2 are both equal to 0.

Then, since a1 = g(z1) and a2 = g(z2), after the activation function a1 and a2 are necessarily the same number, i.e., a1 = a2 = g(z1) = g(0).

The output layer: y = g(w20 * a0 + w21 * a1 + w22 * a2) (here the w's are the output-layer weights and a0 = 1) is therefore a fixed value.
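To make this concrete, here is a tiny NumPy sketch (illustrative only; the input values and the sigmoid activation are my own choices) of the forward pass above with every weight and bias set to zero: both hidden units produce exactly the same pre-activation and the same activation.

import numpy as np

x = np.array([0.7, -1.2, 2.5])        # inputs x1, x2, x3
W1 = np.zeros((2, 3))                 # weights of the 2 hidden units, all zero
b1 = np.zeros(2)                      # biases, all zero

z = W1 @ x + b1                       # z1 = z2 = 0
a = 1.0 / (1.0 + np.exp(-z))          # a1 = a2 = g(0) = 0.5 for a sigmoid g

print(z)                              # [0. 0.]
print(a)                              # [0.5 0.5] -> identical hidden outputs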

The key point: during back-propagation we use gradient descent to reduce the loss, but when the weights are updated, the partial derivatives of the cost function with respect to the different weights in the same layer turn out to be equal, i.e., the Δw values are the same. So when back-propagation updates the parameters:

w21 = 0 + Δw

w22 = 0 + Δw

As a result, the corresponding parameters of different nodes are still identical after the update, and by the same reasoning they remain identical after every later update. No matter how many rounds of forward and backward propagation are carried out, the resulting parameters stay the same! The neural network therefore loses the ability to learn different features.
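The symmetry argument can be verified numerically. The following sketch (again my own illustration, using sigmoid activations and a squared-error loss rather than the article's exact setup) trains a 3-2-1 network from all-zero weights; after every gradient update the two hidden units still carry identical weights, so the network never gains a second useful unit:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
X = rng.randn(50, 3)                               # toy data
y = (X[:, 0] - X[:, 1] > 0).astype(float).reshape(-1, 1)

W1, b1 = np.zeros((3, 2)), np.zeros(2)             # hidden layer: all zeros
W2, b2 = np.zeros((2, 1)), np.zeros(1)             # output layer: all zeros
lr = 0.5

for step in range(100):
    a1 = sigmoid(X @ W1 + b1)                      # both columns identical
    y_hat = sigmoid(a1 @ W2 + b2)
    d_out = (y_hat - y) * y_hat * (1 - y_hat)      # squared-error delta
    d_hid = (d_out @ W2.T) * a1 * (1 - a1)         # identical for both units
    W2 -= lr * a1.T @ d_out / len(y)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_hid / len(y)
    b1 -= lr * d_hid.mean(axis=0)

print(np.allclose(W1[:, 0], W1[:, 1]))             # True: units never diverge
print(np.allclose(W2[0], W2[1]))                   # True
# Replacing np.zeros with np.full(shape, c) for any constant c gives the same result.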


 

The effect of zero initialization in a neural network

Let's see what actually happens when the weights are initialized to zero.

We test it on the MNIST handwritten digit dataset: it is one of the most widely used datasets in image processing and machine learning research, and it has played an important role in the development of deep learning.

Here are the results of training a neural network whose weights are all initialized to 0 and evaluating it on the test set:

 

  • over 100 iterations, the loss value never changed;
  • the accuracy of the model on the test set was 11.35%, i.e., it could hardly recognize anything.

To summarize: in a neural network, if the weights are initialized to 0 (or to any other identical constant), the units that follow end up with identical activation values. Identical units means they all compute the same feature, so the network effectively behaves as if each hidden layer had only one node, and the neural network loses the ability to learn different features!

 


 

If this article helped you, or if you have any questions, scan the QR code to follow the WeChat official account "AI Moment" and leave a message in the background. Let's learn and progress together!
The full code is as follows:

 


# -*- coding: utf-8 -*-
"""
Created on Wed May  8 08:25:40 2019

@author: Li Kangyu
"""

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
import time
# Dataset download: http://yann.lecun.com/exdb/mnist/


MINIBATCH_SIZE = 100
NUM_HD = 100          # number of hidden units
data = read_data_sets('MNIST_DATA', one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y_true = tf.placeholder(tf.float32, [None, 10])

def nn_model(x):
    # All weights and biases are deliberately initialized to zero
    # to demonstrate the symmetry problem.
    hidden_layer = {
                    'w': tf.Variable(tf.zeros([784, NUM_HD])),
                    'b': tf.Variable(tf.zeros([NUM_HD]))
                   }
    output_layer = {
                    'w': tf.Variable(tf.zeros([NUM_HD, 10])),
                    'b': tf.Variable(tf.zeros([10]))
                   }

    # Forward pass: one hidden ReLU layer, then a linear output layer (logits)
    z1 = tf.matmul(x, hidden_layer['w']) + hidden_layer['b']
    a1 = tf.nn.relu(z1)

    output = tf.matmul(a1, output_layer['w']) + output_layer['b']

    return output

def train_nn(x):
    y_pred = nn_model(x)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_pred, labels=y_true))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    correct_mask = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_mask, tf.float32))

    NUM_STEPS = 100

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(NUM_STEPS):
            epoch_loss = 0
            num_minibatch = int(data.train.num_examples / MINIBATCH_SIZE)
            for _ in range(num_minibatch):
                batch_xs, batch_ys = data.train.next_batch(MINIBATCH_SIZE)
                _, loss = sess.run([optimizer, cost], feed_dict={x: batch_xs, y_true: batch_ys})
                epoch_loss += loss / num_minibatch
            if epoch % 10 == 0:
                print("Epoch = ", epoch, "loss = ", epoch_loss)

            # Evaluate on the test set after each epoch
            ans = sess.run(accuracy, feed_dict={x: data.test.images,
                                                y_true: data.test.labels})

    print("Accuracy:{:.4}%".format(ans * 100))

train_nn(x)
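For comparison (my own variation, not an experiment from the article; part two of this series discusses initialization schemes properly), it is enough to replace the two all-zero weight initializers inside nn_model with small random values to break the symmetry, for example:

    # Variation (illustrative): random-normal initial weights instead of zeros.
    # tf.random_normal is the TensorFlow 1.x op; stddev=0.1 is an arbitrary choice here.
    hidden_layer = {
                    'w': tf.Variable(tf.random_normal([784, NUM_HD], stddev=0.1)),
                    'b': tf.Variable(tf.zeros([NUM_HD]))
                   }
    output_layer = {
                    'w': tf.Variable(tf.random_normal([NUM_HD, 10], stddev=0.1)),
                    'b': tf.Variable(tf.zeros([10]))
                   }

With this change the hidden units start from different weights and receive different gradients, so the same training loop should reach a far higher test accuracy than the 11.35% reported above.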
