Finally understanding the tf.reduce_sum() and tf.reduce_mean() functions

Reference blogs:

1. https://www.zhihu.com/question/51325408/answer/125426642
2. https://www.w3cschool.cn/tensorflow_python/tensorflow_python-5y4d2i2n.html
3. https://blog.csdn.net/dcrmg/article/details/79797826

When I was learning to build a neural network, I copied someone else's code, and there was one line I couldn't figure out, namely the following:

loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction), reduction_indices=[1]))

At first, I copied the line from the video author like this:

loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction)))

Then this result appeared:

709758.1
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan

As a newbie, my first instinct was to search online, but none of the fixes I found worked. Then I started checking the functions one by one, and finally found the reason: the problem was in the reduce_sum() function. So I went looking for blog posts to learn reduce_sum() and reduce_mean(). Even after reading several articles I was still confused: why use reduction_indices=[1] instead of reduction_indices=[0], or no argument at all? After a lot of effort I finally figured it out, so I'm writing it down right away!
-------------------------------------------------- Dividing line --------------------------------------------------
1. The tf.reduce_mean function computes the mean of a tensor along a specified axis (one dimension of the tensor). It is mainly used for dimensionality reduction or for computing the mean of a tensor (e.g., an image).

reduce_mean(input_tensor,
                axis=None,
                keep_dims=False,
                name=None,
                reduction_indices=None)
  • The first parameter input_tensor: the input tensor to be reduced;
  • The second parameter axis: the axis to reduce along; if not specified, the mean of all elements is computed;
  • The third parameter keep_dims: whether to keep the reduced dimensions; if True, the output keeps the rank of the input tensor (reduced dimensions become size 1); if False, the rank of the output is reduced;
  • The fourth parameter name: a name for the operation;
  • The fifth parameter reduction_indices: the old name for axis in earlier versions, now deprecated.
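To see what keep_dims actually changes, here is a minimal sketch using NumPy, whose np.mean has the same axis/keepdims semantics as tf.reduce_mean (the NumPy analogue is my assumption for easy experimentation; the TF call behaves the same way):

```python
import numpy as np

# np.mean mirrors tf.reduce_mean: axis picks the dimension, keepdims
# decides whether the reduced dimension survives with size 1.
x = np.array([[1., 2., 3.],
              [4., 5., 6.]])

print(np.mean(x))                         # 3.5 -- mean of all elements, a scalar
print(np.mean(x, axis=0))                 # [2.5 3.5 4.5], shape (3,)
print(np.mean(x, axis=0, keepdims=True))  # [[2.5 3.5 4.5]], shape (1, 3)
```

With keepdims=True the result still has two dimensions, which makes it broadcast cleanly against the original array.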
2. The tf.reduce_sum function computes the sum of elements across the dimensions of a tensor; usually only two parameters need to be set:
reduce_sum(input_tensor,
           axis=None,
           keep_dims=False,
           name=None,
           reduction_indices=None)
  • The first parameter input_tensor: the input tensor;
  • The parameter reduction_indices (equivalent to axis): specifies the dimension along which the elements are summed.

The most difficult part is the dimensionality. I read several blogs but still didn't fully understand it, so in the end I worked through examples based on my own understanding.

  • tf.reduce_sum
import tensorflow as tf  # TensorFlow 1.x API (uses tf.Session)

matrix1 = [[1.,2.,3.],             # 2-D; the elements are lists
           [4.,5.,6.]]
matrix2 = [[[1.,2.],[3.,4.]],      # 3-D; the elements are matrices
           [[5.,6.],[7.,8.]]]

res_2 = tf.reduce_sum(matrix1)
res_3 = tf.reduce_sum(matrix2)
res1_2 = tf.reduce_sum(matrix1,reduction_indices=[0])
res1_3 = tf.reduce_sum(matrix2,reduction_indices=[0])
res2_2 = tf.reduce_sum(matrix1,reduction_indices=[1])
res2_3 = tf.reduce_sum(matrix2,reduction_indices=[1])

sess = tf.Session()
print("reduction_indices=None:res_2={},res_3={}".format(sess.run(res_2),sess.run(res_3)))
print("reduction_indices=[0]:res1_2={},res1_3={}".format(sess.run(res1_2),sess.run(res1_3)))
print("reduction_indices=[1]:res2_2={},res2_3={}".format(sess.run(res2_2),sess.run(res2_3)))

The result is as follows:

reduction_indices=None:res_2=21.0,res_3=36.0
reduction_indices=[0]:res1_2=[5. 7. 9.],res1_3=[[ 6.  8.]
                                                [10. 12.]]
reduction_indices=[1]:res2_2=[ 6. 15.],res2_3=[[ 4.  6.]
                                               [12. 14.]]
  • tf.reduce_mean
    only needs the reduce_sum calls in the code above replaced with reduce_mean:
res_2 = tf.reduce_mean(matrix1)
res_3 = tf.reduce_mean(matrix2)
res1_2 = tf.reduce_mean(matrix1,axis=[0])
res1_3 = tf.reduce_mean(matrix2,axis=[0])
res2_2 = tf.reduce_mean(matrix1,axis=[1])
res2_3 = tf.reduce_mean(matrix2,axis=[1])

The result is as follows:

axis=None:res_2=3.5,res_3=4.5
axis=[0]:res1_2=[2.5 3.5 4.5],res1_3=[[3. 4.]
                                       [5. 6.]]
axis=[1]:res2_2=[2. 5.],res2_3=[[2. 3.]
                                 [6. 7.]]

As the examples show, reduction_indices and axis both refer to a dimension. When it is None, reduce_sum and reduce_mean operate over all elements. When it is [0], the operation collapses the rows, combining corresponding elements down each column; when it is [1], it collapses the columns, combining the elements within each row. For the three-dimensional case, treat each innermost bracketed group as a single number, and the problem reduces to the two-dimensional case. In every case the result has one dimension fewer than the input. Put more formally:

For a multi-dimensional array, the elements inside the outermost brackets sit on axis 0, and the axis index increases by 1 with each level of nesting, until the innermost elements are single numbers.

As in the above example, matrix1 = [[1., 2., 3.], [4., 5., 6.]]:

When axis=0, the elements on that axis are: [1., 2., 3.] and [4., 5., 6.]. When axis=1, the elements are: 1., 2., 3., 4., 5., 6.

So when reduction_indices/axis=[0], the operation runs over the elements on axis 0, and reduce_sum() returns [5. 7. 9.], i.e. the two rows added element-wise. When reduction_indices/axis=[1], the operation runs over the elements on axis 1, and reduce_sum() returns [ 6. 15.], i.e. the elements within each row added together. The same applies to reduce_mean().
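The two 2-D results can be checked quickly with NumPy, whose np.sum follows the same axis rules as tf.reduce_sum (the NumPy analogue is my assumption, used so the check runs without a TF session):

```python
import numpy as np

matrix1 = np.array([[1., 2., 3.],
                    [4., 5., 6.]])

print(np.sum(matrix1, axis=0))  # [5. 7. 9.] -- the two rows added element-wise
print(np.sum(matrix1, axis=1))  # [ 6. 15.] -- the elements within each row summed
```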

It is not difficult to see that the same idea applies to the three-dimensional case, e.g. matrix2 = [[[1,2],[3,4]], [[5,6],[7,8]]] in the example above:

When axis=0, the elements are: [[1., 2.], [3., 4.]] and [[5., 6.], [7., 8.]]. When axis=1, the elements are: [1., 2.], [3., 4.], [5., 6.], [7., 8.]. When axis=2, the elements are: 1., 2., 3., 4., 5., 6., 7., 8.

When reduction_indices/axis=[0], reduce_sum() returns [[ 6.  8.], [10. 12.]], i.e. the two matrices added element-wise; when reduction_indices/axis=[1], reduce_sum() returns [[ 4.  6.], [12. 14.]], i.e. the rows within each matrix added element-wise. The same applies to reduce_mean().

In one sentence: whichever dimension the operation runs along, that level of brackets is removed after the calculation, which is exactly the dimensionality reduction.

So the question is: what happens when reduction_indices/axis=[2]?

  • For the two-dimensional case, an error is of course raised, because the largest valid axis is 1:
ValueError: Invalid reduction dimension 2 for input with 2 dimensions. for 'Sum_4' (op: 'Sum') with input shapes: [2,3], [1] and with computed input tensors: input[1] = <2>.
  • For the three-dimensional case, reduce_sum() returns [[ 3.  7.], [11. 15.]], i.e. the numbers inside the innermost brackets are summed.
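Both behaviors can be reproduced with NumPy's np.sum, which follows the same axis rules (again an assumed analogue, chosen so the snippet runs without TensorFlow):

```python
import numpy as np

matrix1 = np.array([[1., 2., 3.],
                    [4., 5., 6.]])
matrix2 = np.array([[[1., 2.], [3., 4.]],
                    [[5., 6.], [7., 8.]]])

print(np.sum(matrix2, axis=2))  # [[ 3.  7.] [11. 15.]] -- sums each innermost pair

# A 2-D array only has axes 0 and 1, so axis=2 is out of range and raises.
try:
    np.sum(matrix1, axis=2)
except IndexError as e:  # NumPy's AxisError subclasses IndexError
    print("error:", e)
```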

-------------------------------------------------- Dividing line --------------------------------------------------

Back to my original question: why is the loss not NaN only when reduction_indices=[1] is set?

loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),reduction_indices=[1]))
This program builds a 3-layer neural network. The input layer has 1 neuron and there are 100 input samples, so the input is a column vector of shape (100, 1). The hidden layer has 10 neurons and the output layer again has 1 neuron, so the final output is also a (100, 1) column vector. The argument to reduce_sum is therefore a two-dimensional array.

  • If reduction_indices=[0], the result is an array with a single element, i.e. [n]
  • If reduction_indices=[1], the result is an array with 100 elements, i.e. [n1, n2, …, n100]
  • If reduction_indices=None, the result is a scalar

Then when reduce_mean() computes the average, the desired result is sum/100. Only when reduce_sum() is called with reduction_indices=[1] do we get 100 per-sample values whose mean is exactly that; with [0] or None, reduce_mean() averages a single total and simply returns the full sum, which is 100 times too large and drives training to NaN.
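The shape argument can be verified directly. Below, sq_err is a hypothetical stand-in for tf.square(ys - prediction) (any (100, 1) array of squared errors works), and np.sum/np.mean are assumed analogues of the TF reductions:

```python
import numpy as np

# Hypothetical stand-in for tf.square(ys - prediction): 100 samples, shape (100, 1).
sq_err = np.square(np.linspace(-1.0, 1.0, 100).reshape(100, 1))

print(np.sum(sq_err, axis=0).shape)  # (1,)   -- one grand total
print(np.sum(sq_err, axis=1).shape)  # (100,) -- one squared error per sample
print(np.sum(sq_err).shape)          # ()     -- a scalar total

# Only the axis=1 version gives sum/100 after taking the mean:
mse = np.mean(np.sum(sq_err, axis=1))
print(np.isclose(mse, np.sum(sq_err) / 100))  # True
```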

Perfect solution!


Originally published at blog.csdn.net/weixin_42149550/article/details/98759006