NumPy: Understanding Axis

Axis is the array level
Set axis=i, then Numpy operates along the direction of the i-th subscript change
Axis application

Axis is the array level

To understand axis, we first need to distinguish two concepts, "the dimension of an array in NumPy" and "the dimension of a matrix in linear algebra", and see how they relate. In mathematics and physics, dimension is the minimum number of coordinates needed to specify a point in a space. In NumPy, the dimension of an array is the number of axes it has, that is, how many subscripts are needed to address a single element. Consider the following example:

>>> import numpy as np
>>> a = np.array([[1,2,3],[2,3,4],[3,4,9]])
>>> a
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 9]])

This array has only 2 dimensions, that is, two axes: axis=0 and axis=1. As shown in the figure below, the 0th dimension (axis=0) of the two-dimensional array has three elements (left picture), so the length of axis=0 is 3; the first dimension (axis=1) also has three elements (right picture), so the length of axis=1 is 3. Because both axes have length 3, the array corresponds to a 3×3 matrix in linear algebra, whose rows and columns are vectors in 3-dimensional space. (The rank of this particular matrix is also 3, but that follows from its rows being linearly independent, not from its shape.)

Therefore, axis is the array level: each axis corresponds to one level of nesting in the array.

Along axis=0 there are 3 elements (the length of this axis is 3):

a[0], a[1], a[2]

Along axis=1 there are 3 elements (the length of this axis is 3):

a[0][0], a[0][1], a[0][2]

(or a[1][0], a[1][1], a[1][2])

(or a[2][0], a[2][1], a[2][2])
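The distinction between the number of axes, the length of each axis, and matrix rank can be checked directly. The following is a minimal sketch using the array a from above; the variable names rows, first_row and rank are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 9]])

print(a.ndim)    # number of axes: 2
print(a.shape)   # length of each axis: (3, 3)

# Walking along axis=0 changes the first subscript: a[0], a[1], a[2]
rows = [a[i] for i in range(a.shape[0])]

# Walking along axis=1 changes the second subscript: a[0][0], a[0][1], a[0][2]
first_row = [a[0][j] for j in range(a.shape[1])]

# Matrix rank is a separate, linear-algebra notion computed from the values
rank = np.linalg.matrix_rank(a)
print(rank)      # 3 for this matrix
```

Note that a.ndim stays 2 no matter how long each axis is, whereas the rank depends on the values stored in the matrix.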

Another example is the following array whose shape is (3,2,4):

>>> b = np.array([[[1,2,3,4],[1,3,4,5]],[[2,4,7,5],[8,4,3,5]],[[2,5,7,3],[1,5,3,7]]])
>>> b
array([[[1, 2, 3, 4],
        [1, 3, 4, 5]],

       [[2, 4, 7, 5],
        [8, 4, 3, 5]],

       [[2, 5, 7, 3],
        [1, 5, 3, 7]]])
>>> b.shape
(3, 2, 4)

The shape (a tuple) gives the size of the array along each axis, that is, the length of that axis. To make this easier to picture, we can temporarily think of the axes as nested layers. axis=0 is the first, outermost layer (the black box in the figure below), which contains 3 elements, so the length of this axis is 3; axis=1 is the second layer (the red box in the figure below), which contains 2 elements, so its length is 2; axis=2 is the third, innermost layer (the blue box in the figure below), which contains 4 elements, so its length is 4.
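One way to see the "layers" picture is to measure the length of each nesting level by hand and compare it with the shape. This is only a sketch; level_lengths is a name introduced here for illustration.

```python
import numpy as np

b = np.array([[[1, 2, 3, 4], [1, 3, 4, 5]],
              [[2, 4, 7, 5], [8, 4, 3, 5]],
              [[2, 5, 7, 3], [1, 5, 3, 7]]])

# Length of each nesting level, outermost first
level_lengths = (len(b), len(b[0]), len(b[0][0]))

print(level_lengths)  # (3, 2, 4)
print(b.shape)        # (3, 2, 4)
```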

Set axis=i, then Numpy operates along the direction of the i-th subscript change

1. Two-dimensional array example:

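Take np.sum(a, axis=1) with the array a defined above. In a[0][0]=1, a[0][1]=2, a[0][2]=3, the subscript that changes is the second one, which indexes the first dimension (axis=1) of the array.

We move along the direction in which that subscript changes and add the elements, which gives 1+2+3=6, 2+3+4=9 and 3+4+9=16. Written as a column so that each sum lines up with its row (NumPy itself returns the one-dimensional array([ 6,  9, 16]) unless keepdims=True is passed), the result is: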

[
  [6],
  [9],
  [16]
]
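The same sum can be written out with explicit loops to make the "changing subscript" visible. The sketch below holds the first subscript fixed while the second one runs; manual, row_sums and col_form are illustrative names.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 9]])

# Hold the first subscript i fixed, let the second subscript j run, and add
manual = [sum(a[i][j] for j in range(a.shape[1])) for i in range(a.shape[0])]

row_sums = np.sum(a, axis=1)                 # shape (3,): the summed axis is removed
col_form = np.sum(a, axis=1, keepdims=True)  # shape (3, 1): the summed axis is kept with length 1

print(row_sums.shape, col_form.shape)  # (3,) (3, 1)
```

With keepdims=True the result keeps the column layout shown above, which is convenient when the sums are later combined with a by broadcasting.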

2. Three-dimensional array example:

As another example, take the 3-dimensional array b defined above, with np.shape(b)=(3,2,4): the length of the 0th dimension is 3 (black box), the length of the first dimension is 2 (red box), and the length of the second dimension is 4 (blue box).

To calculate np.sum(b, axis=1), start with the first black box, b[0].

There, the subscript that changes is the middle one, going from b[0][0] to b[0][1].

So we add the two red boxes together element by element: [1, 2, 3, 4] + [1, 3, 4, 5] = [2, 5, 7, 9].

Processing the second and third black boxes in the same way gives the final result, an array of shape (3, 4):

[[2, 5, 7, 9], [10, 8, 10, 10], [3, 10, 10, 10]]

Therefore, the rule we summarized earlier still holds: set axis=i, and NumPy operates along the direction in which the i-th subscript changes.
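For the (3, 2, 4) array, summing along axis=1 amounts to adding the two slices obtained by fixing the middle subscript at 0 and at 1. A short sketch, with manual and result as illustrative names:

```python
import numpy as np

b = np.array([[[1, 2, 3, 4], [1, 3, 4, 5]],
              [[2, 4, 7, 5], [8, 4, 3, 5]],
              [[2, 5, 7, 3], [1, 5, 3, 7]]])

# axis=1 has length 2, so the sum adds the two "red boxes" in every "black box"
manual = b[:, 0, :] + b[:, 1, :]
result = b.sum(axis=1)

print(result)
# [[ 2  5  7  9]
#  [10  8 10 10]
#  [ 3 10 10 10]]
print(result.shape)  # (3, 4): axis 1 has been removed from (3, 2, 4)
```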

3. Four-dimensional array example:

For example, consider the following larger, 4-dimensional array:

>>> data = np.random.randint(0, 5, [4,3,2,3])
>>> data
array([[[[4, 1, 0],
         [4, 3, 0]],
        [[1, 2, 4],
         [2, 2, 3]],
        [[4, 3, 3],
         [4, 2, 3]]],

       [[[4, 0, 1],
         [1, 1, 1]],
        [[0, 1, 0],
         [0, 4, 1]],
        [[1, 3, 0],
         [0, 3, 0]]],

       [[[3, 3, 4],
         [0, 1, 0]],
        [[1, 2, 3],
         [4, 0, 4]],
        [[1, 4, 1],
         [1, 3, 2]]],

       [[[0, 1, 1],
         [2, 4, 3]],
        [[4, 1, 4],
         [1, 4, 1]],
        [[0, 1, 0],
         [2, 4, 3]]]])

When axis=0, NumPy sums along the 0th dimension. The first element of the result is data[0,0,0,0]+data[1,0,0,0]+data[2,0,0,0]+data[3,0,0,0]=4+4+3+0=11, and the second element is data[0,0,0,1]+data[1,0,0,1]+data[2,0,0,1]+data[3,0,0,1]=1+0+3+1=5. Continuing in the same way, the final result is as follows:

>>> data.sum(axis=0)
array([[[11,  5,  6],
        [ 7,  9,  4]],

       [[ 6,  6, 11],
        [ 7, 10,  9]],

       [[ 6, 11,  4],
        [ 7, 12,  8]]])
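The first two elements can be reproduced by hand. Because the array in the text was generated randomly, the sketch below types the printed values in as a literal instead of calling np.random.randint again; first, second and summed are illustrative names.

```python
import numpy as np

# The values printed above, entered literally so the result is reproducible
data = np.array([
    [[[4, 1, 0], [4, 3, 0]], [[1, 2, 4], [2, 2, 3]], [[4, 3, 3], [4, 2, 3]]],
    [[[4, 0, 1], [1, 1, 1]], [[0, 1, 0], [0, 4, 1]], [[1, 3, 0], [0, 3, 0]]],
    [[[3, 3, 4], [0, 1, 0]], [[1, 2, 3], [4, 0, 4]], [[1, 4, 1], [1, 3, 2]]],
    [[[0, 1, 1], [2, 4, 3]], [[4, 1, 4], [1, 4, 1]], [[0, 1, 0], [2, 4, 3]]],
])

# Only the 0th subscript changes; the other three stay fixed
first = sum(data[i, 0, 0, 0] for i in range(data.shape[0]))   # 4 + 4 + 3 + 0
second = sum(data[i, 0, 0, 1] for i in range(data.shape[0]))  # 1 + 0 + 3 + 1

summed = data.sum(axis=0)
print(first, second)             # 11 5
print(summed[0, 0, 0], summed[0, 0, 1])  # 11 5
print(summed.shape)              # (3, 2, 3)
```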

When axis=3, NumPy sums along dimension 3 (the last axis). The first element of the result is data[0,0,0,0]+data[0,0,0,1]+data[0,0,0,2]=4+1+0=5, and the second element is data[0,0,1,0]+data[0,0,1,1]+data[0,0,1,2]=4+3+0=7. Continuing in the same way, the final result is as follows:

>>> data.sum(axis=3)
array([[[ 5,  7],
        [ 7,  7],
        [10,  9]],

       [[ 5,  3],
        [ 1,  5],
        [ 4,  3]],

       [[10,  1],
        [ 6,  8],
        [ 6,  6]],

       [[ 2,  9],
        [ 9,  6],
        [ 1,  9]]])
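Comparing the two results suggests a general rule: summing along axis=i removes the i-th entry from the shape. The sketch below checks this on a stand-in array of the same shape (built with np.arange rather than the random values above, which is an assumption made purely for reproducibility).

```python
import numpy as np

# Stand-in array with the same shape as data in the text
data = np.arange(4 * 3 * 2 * 3).reshape(4, 3, 2, 3)

# Shape of the result for every possible axis
shapes = [data.sum(axis=k).shape for k in range(data.ndim)]
for k, s in enumerate(shapes):
    print(k, s)
# 0 (3, 2, 3)
# 1 (4, 2, 3)
# 2 (4, 3, 3)
# 3 (4, 3, 2)

# Negative values count from the end, so axis=-1 names the same axis as axis=3 here
same = np.array_equal(data.sum(axis=-1), data.sum(axis=3))
print(same)  # True
```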

Axis application

For example, suppose we have collected ratings (out of 10) from four classmates on how much they like apple, durian, and watermelon. Each classmate has three features:

>>> item = np.array([[1,4,8],[2,3,5],[2,5,1],[1,10,7]])
>>> item
array([[ 1,  4,  8],
       [ 2,  3,  5],
       [ 2,  5,  1],
       [ 1, 10,  7]])

Each row contains the three features of one person. If we want to see which classmate likes fruit the most, we can sum along axis=1:

>>> item.sum(axis = 1)
array([13, 10,  8, 18])

We can see that classmate 4, with a total of 18, likes fruit the most.

If we want to see which fruit is the most popular, we can use:

>>> item.sum(axis = 0)
array([ 6, 22, 21])

We can see that durian, with a total of 22, is the most popular, narrowly ahead of watermelon with 21.
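Reading the winner off the printed sums works for a small table; for larger ones, argmax can pick it out. This is a sketch under the assumption that the columns are ordered apple, durian, watermelon as in the text; the list fruits and the other variable names are introduced here for illustration.

```python
import numpy as np

fruits = ["apple", "durian", "watermelon"]  # assumed column order
item = np.array([[1, 4, 8], [2, 3, 5], [2, 5, 1], [1, 10, 7]])

per_student = item.sum(axis=1)  # one total per row (classmate)
per_fruit = item.sum(axis=0)    # one total per column (fruit)

top_student = int(per_student.argmax()) + 1  # +1 to number classmates from 1
top_fruit = fruits[int(per_fruit.argmax())]

print(top_student)  # 4
print(top_fruit)    # durian

# Other reductions take the same axis argument, e.g. the average rating per fruit
mean_per_fruit = item.mean(axis=0)
print(mean_per_fruit)  # [1.5  5.5  5.25]
```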