Detailed explanation of Python's NumPy library

        This chapter introduces NumPy, Python's array library. NumPy arrays underpin table-oriented libraries such as pandas, so learning NumPy well lays a good foundation for pandas.

Table of contents

1. Create an array

        (1)np.array()

        (2) np.arange()

2. Create a multidimensional array

        (1) Create a two-dimensional array

        (2) Create a multidimensional array

3. Create a special array

        (1)np.ones()

        (2)np.zeros()

        (3)np.full()

        (4)np.eye()

        (5) np.diag()

4. Array templates create arrays

        (1)np.ones_like()

        (2)np.zeros_like()

        (3)np.full_like()

5. Properties of Arrays

6. random library in numpy

        (1) Random number generation

        (2)np.random.choice()

        (3)np.random.shuffle()

        (4)np.random.permutation()

7. Transformation/transposition of array dimensions/shape

        (1)arr.reshape()

        (2)arr.flatten()

        (3) arr.T or arr.transpose() two-dimensional array transpose

8. Array operation

        (1) Addition, subtraction, multiplication and division between arrays and numbers

        (2) Addition, subtraction, multiplication and division between arrays (cases with the same shape)

        (3) Addition, subtraction, multiplication and division between arrays (different shapes but row/column alignment)

9. Data Selection/Data Slicing

        (1) One-dimensional array

        (2) Two-dimensional array

10. Magic Index

        (1) One-dimensional

        (2) two-dimensional

                ① Take rows

                ② Take columns

                ③ Take values at paired positions

11. Filtering/conditional statistics of array elements

        (1) Filter out the values that meet the conditions

        (2) Count the number of eligible numbers

        (3) Multi-condition screening

12. Change the value of an element

        (1) Global changes

        (2) Local changes (secondary slicing)

        (3) np.where() condition change

13. Sorting of axes and array elements

        (1) arr.sort(axis=1) sorting

        (2) The index position corresponding to arr.argsort() sorting

        (3) The index position where the maximum value of arr.argmax() is located

        (4) The index position where the minimum value of arr.argmin() is located

        (5) np.maximum(), np.minimum() compare values at the same position

14. Addition/multiplication of axes and arrays

        (1) One-dimensional

        (2) two-dimensional

15. Cumulative addition/multiplication of axes and arrays

        (1) One-dimensional

        (2) two-dimensional

16. Counting occurrences by index np.bincount()

17. Array merge

        (1) np.vstack() vertical merge

        (2) np.hstack() horizontal merge

        (3) np.concatenate() vertical/horizontal merge

18. Array Split

        (1)np.hsplit()

        (2)np.vsplit()

        (3)np.split()

19. Other functions about mathematics and statistics

20. any() and all()

21. np.unique() deduplication

22. np.in1d() common element judgment

23. Shallow copy and deep copy

        (1) shallow copy

        (2) Deep copy

        (3) Compare shallow copy and deep copy

end


1. Create an array

        (1)np.array()

array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,like=None)

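object: The input data, usually a list or a tuple. Nested sequences give multidimensional arrays.

dtype: The data type of the array elements, for example int, int32, int64, float, float32 or float64. If omitted, NumPy infers it from the data.

copy: Whether to copy the input; the default is True. When object is already an array, the result is then an independent copy, so changing it does not affect the original array.

order: The memory layout of the array, for example 'C' for row-major or 'F' for column-major.

subok: The default is False, which always returns a base ndarray. If True, subclasses of ndarray are passed through.

ndmin: The minimum number of dimensions the resulting array should have.

like: A reference object that lets array-like objects from other libraries control how the result is created. It is rarely needed in everyday use.

        You can create an array directly from a list or a tuple. In practice, only the first two parameters are used often.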

import numpy as np
arr = np.array([1,2,3,4,5])
print(arr)
print(type(arr))
print(arr.dtype)

        You can also pass in a tuple; try it yourself. In printed output, a list separates its elements with commas, while an array separates them with spaces. The essential difference is that all elements of an array share one data type (all numbers or all strings, for example), whereas a list can mix types.

        If both numbers and strings are passed in, every element is converted to a string and the array's dtype is a string type. For example:

arr = np.array([1,2,'大',4,5])
print(arr)
print(type(arr))
print(arr.dtype)

        type() is a Python built-in function that reports the type of the variable as a whole (here numpy.ndarray); arr.dtype is an attribute of the NumPy array that reports the data type of the elements stored in it.

        (2) np.arange()

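np.arange([start,] stop[, step], dtype=None, *, like=None) Creates an array of evenly spaced values over a range.

start: start of the range (inclusive); defaults to 0 if omitted.

stop: end of the range (exclusive).

step: step size (spacing between values); defaults to 1.

dtype: the data type of the array elements.

like: A reference object for creating array-like objects from other libraries; rarely needed in everyday use.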

arr = np.arange(1,9)
# arr = np.array([1,2,3,4,5,6,7,8])      # same result (stop=9 is excluded)
# arr = np.array(range(1,9))        # same result
print(arr)

arr_2 = np.arange(1,9,2)
print(arr_2)

2. Create a multidimensional array

        (1) Create a two-dimensional array

arr = np.array([[1,2,3,4],
                [5,6,7,8]])
print(arr)

        Two-dimensional can be understood as a plane, or as a table with several rows and columns.

        (2) Create a multidimensional array

arr = np.array([[[1,2,3,4],[5,6,7,8]],
                [[10,11,12,13],[14,15,16,17]]])
print(arr)

        This example creates a three-dimensional array, which can be seen as several two-dimensional planes stacked together.

3. Create a special array

        (1)np.ones()

np.ones(shape, dtype=None, order='C', *, like=None) Creates an array of the given shape filled with ones.

shape: the shape of the array, for example [2,3] for 2 rows and 3 columns.

arr = np.ones([2,3])
print(arr)

        If you want integers, you can set the array data type:

arr = np.ones([2,3],dtype='int32')
print(arr)

        Both calls create an array of ones with 2 rows and 3 columns; the first holds floats (the default) and the second holds integers.

        (2)np.zeros()

arr = np.zeros([2,3])
print(arr)

        Create an array of zeros with 2 rows and 3 columns.

        (3)np.full()

np.full(shape, fill_value, dtype=None, order='C', *, like=None) Creates an array of the given shape filled with fill_value.
arr = np.full([2,3],520)
print(arr)

        This creates an array with the specified shape in which every element is the specified value.

        (4)np.eye()

        np.eye(N, M=None, k=0, dtype=float) creates a two-dimensional array with ones on a diagonal and zeros elsewhere.

N: the number of rows.

M: the number of columns; if omitted, it defaults to N, which gives a square array.

k: the index of the diagonal that holds the ones. 0 is the main diagonal, a positive k is the kth diagonal above it, and a negative k is the kth diagonal below it.

dtype: the data type of the elements; the default is float.

a = np.eye(N=3,k=0)
print(a)
print('-'*35)

b = np.eye(N=3,k=1)
print(b)

        Here N=3 gives 3 rows and 3 columns. k=0 puts the ones on the main diagonal; k=1 puts them on the first diagonal above it; k=-1 would put them on the first diagonal below it. All other elements are 0.
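
        The k=-1 case is not run above, so here is a minimal sketch of it. The second call also uses the M parameter of np.eye() for a non-square result; the variable names lower and wide are only illustrative:

```python
import numpy as np

# k=-1 puts the ones on the first diagonal below the main diagonal
lower = np.eye(N=3, k=-1)
print(lower)

# M sets the number of columns when a non-square array is wanted
wide = np.eye(N=2, M=4, dtype=int)
print(wide)
```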

        (5) np.diag()

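        np.diag(v, k=0) builds a two-dimensional array whose kth diagonal holds the elements of v. (If v is already two-dimensional, np.diag() instead extracts its kth diagonal.)

v: a one-dimensional sequence such as a list or a tuple; the examples below pass one named a.

k: which diagonal receives the elements. 0 is the main diagonal, a positive k is the kth diagonal above it, and a negative k is the kth diagonal below it. All other elements are 0.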

a = [1,2,3]

b = np.diag(a)
print(b)

        Another example is k=1: the elements of a lie on the first diagonal above the main diagonal and the rest are 0, which makes the result 4 x 4:

a = (1,2,3)

b = np.diag(a,k=1)
print(b)

4. Array templates create arrays

        (1)np.ones_like()

np.ones_like(a, dtype=None, order='K', subok=True, shape=None) Use an array as a template to create an array with the same shape and a value of 1.
arr = np.array([[1,2,3],[4,5,6]])
print(arr)
print('-'*70)

a = np.ones_like(arr)
print(a)

        Here arr serves as the template: the new array has the same shape as arr, and every value is 1.

        (2)np.zeros_like()

np.zeros_like(a, dtype=None, order='K', subok=True, shape=None) Use an array as a template to create an array with the same shape and a value of 0.
arr = np.array([[1,2,3],[4,5,6]])
print(arr)
print('-'*70)

a = np.zeros_like(arr)
print(a)

        (3)np.full_like()

np.full_like(a, fill_value, dtype=None, order='K', subok=True, shape=None) Use an array as a template to create an array with the same shape, filled with the specified value.
arr = np.array([[1,2,3],[4,5,6]])
print(arr)
print('-'*70)

a = np.full_like(arr,520)
print(a)

5. Properties of Arrays

arr.shape Returns the shape of the array as a tuple, for example (rows, columns) for a two-dimensional array.
arr.size Returns the total number of elements in the array.
arr.ndim Returns the number of dimensions of the array.
arr = np.array([[[1,2,3,4],[5,6,7,8]],
                [[10,11,12,13],[14,15,16,17]]])
print(arr)
print('-'*70)
print(arr.shape)
print(arr.size)
print(arr.ndim)

        The output shows that this three-dimensional array consists of 2 planes, each plane has 2 rows, and each row has 4 elements, so the shape is (2, 2, 4). The total number of elements is 16, and ndim is 3.

6. random library in numpy

        (1) Random number generation

np.random.randint(low, high=None, size=None, dtype=int)

low: minimum value (inclusive).

high: maximum value (exclusive).

size: the output shape, either an integer count or a tuple such as (2,3).

dtype: data type.

ran = np.random.randint(1,51,(2,3))
print(ran)

        This creates an array of random integers with 2 rows and 3 columns. The advantage of NumPy's random functions is that you can set the shape of the output (several rows and columns, and so on), whereas the functions in Python's built-in random module return one number per call.

        Many other functions in np.random have counterparts in the random module, so learning one helps with the other. Note one difference: random.randint() includes both the left and right boundaries, while np.random.randint() excludes the right boundary.
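
        The boundary difference can be checked with a small sketch. The range 1 to 3 and the sample size of 1000 are arbitrary choices for illustration:

```python
import random
import numpy as np

# random.randint(1, 3) can return 1, 2 or 3 (right boundary included)
py_values = {random.randint(1, 3) for _ in range(1000)}

# np.random.randint(1, 3, ...) can return only 1 or 2 (right boundary excluded)
np_values = set(np.random.randint(1, 3, 1000).tolist())

print(sorted(py_values))
print(sorted(np_values))
```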

        (2)np.random.choice()

        np.random.choice(a, size) randomly selects size elements from a; by default the same element may be selected more than once. a can be an integer or a one-dimensional sequence such as a list. If it is an integer, the elements are drawn from range(0, a).

a = np.random.choice(10,(3,2))
print(a)

        (3)np.random.shuffle()

        np.random.shuffle(a) shuffles the array a randomly, like shuffling cards. Note that this function modifies a in place and returns None, so assigning its result to a variable gives None. If you want the shuffled result in a new variable, use np.random.permutation().

a = np.random.choice(10,(3,2))
print(a)
print('-'*50)

np.random.shuffle(a)
print(a)

        Key point: for a one-dimensional array the elements are shuffled; for a two-dimensional array only the order of the rows is shuffled; for a three-dimensional array only the order of the two-dimensional blocks is shuffled. You can also try it yourself.
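
        A quick way to see that only the row order changes in a two-dimensional array is to compare the set of rows before and after shuffling. This is a minimal sketch; the names before and after are only illustrative:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
before = {tuple(row) for row in a.tolist()}

np.random.shuffle(a)          # shuffles in place along the first axis only
after = {tuple(row) for row in a.tolist()}

print(a)
print(before == after)        # True: every original row is still intact
```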

        (4)np.random.permutation()

        np.random.permutation(a) shuffles in the same way as np.random.shuffle(a), but it returns a new shuffled array that can be assigned to a variable, and the original array does not change.

a = np.random.choice(10,(3,2))
print(a)
print('-'*50)

b = np.random.shuffle(a)         # shuffle works in place and returns None, so b is None
print(b)
print('-'*50)

c = np.random.permutation(a)     # permutation returns a new array, which can be assigned to c
print(c)

        The difference between shuffle and permutation: the former shuffles the original array in place; the latter returns a new shuffled array (such as c) and leaves the original array unchanged.

7. Transformation/transposition of array dimensions/shape

        (1)arr.reshape()

arr.reshape(shape, order='C') Returns the array's data arranged in the given shape.
arr = np.arange(1,25)
print(arr)
print('-'*70)

a = arr.reshape(3,8)
print(a)
print('-'*70)

b = a.reshape(4,6)
print(b)

        The one-dimensional arr can be converted into a two-dimensional array of 3 rows and 8 columns, and the 3 x 8 array a can be converted into b with 4 rows and 6 columns. Note that the total number of elements must stay the same. In this example arr has 24 elements, so a and b must also have 24 elements. In other words, for a two-dimensional shape, the number of rows times the number of columns must equal the total.

        If you want to convert multi-dimensional to one-dimensional, you can use reshape:

arr = np.arange(1,25).reshape(4,6)
print(arr)
print('-'*70)

a = arr.reshape(1,24)
print(a)
print('-'*70)

b = arr.reshape(24)
print(b)

        The difference between a and b is that a is still two-dimensional, while b is one-dimensional. (The easiest way to tell is to count the [ characters at the start of the printed output.)

        (2)arr.flatten()

arr.flatten(order='C') Converts an array of any number of dimensions directly into a one-dimensional array; the result is a copy.
arr = np.arange(1,25).reshape(3,8)
print(arr)
print('-'*70)

a = arr.flatten()
print(a)

        (3) arr.T or arr.transpose() two-dimensional array transpose

a = np.arange(21).reshape(7,3)
print(a)
print('-'*70)

print(a.transpose())       # can also be written as a.T

        Transposing a two-dimensional array flips it across its main diagonal (top-left to bottom-right), so rows become columns.

8. Array operation

        (1) Addition, subtraction, multiplication and division between arrays and numbers

a = np.array([1,4,5,3,8,6]).reshape(2,3)
print(a)
print('-'*70)

b = a*5
print(b)

        When you add, subtract, multiply or divide an array by a number, the operation is applied to every element. In this example, every element of the array is multiplied by 5.

        (2) Addition, subtraction, multiplication and division between arrays (cases with the same shape)

a = np.array([1,4,5,3,8,6]).reshape(2,3)
b = np.array([2,0,7,5,8,1]).reshape(2,3)

print(a)
print('-'*70)
print(b)
print('-'*70)
print(a*b)

        Arithmetic between arrays requires that their shapes follow either the "element alignment" principle or the "row/column alignment" principle. If they do not, NumPy raises an error such as "operands could not be broadcast together with shapes (6,) (2,3)". "Element alignment" means that every element of one array has a counterpart in the other array, that is, both arrays have the same number of rows and the same number of columns.

        When the shapes are compatible, the operation adds, subtracts, multiplies or divides the values at corresponding positions. For division, make sure the divisor array contains no 0. Note that this is element-wise arithmetic on arrays, not matrix arithmetic from linear algebra: matrix multiplication does not multiply corresponding positions. For the matrix product of two arrays, use the @ operator (or np.matmul()). The older np.matrix() class makes * behave as matrix multiplication, so multiplying two np.matrix() objects gives a different result from multiplying two np.array() objects, but the NumPy documentation no longer recommends np.matrix() for new code.
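
        To make the difference between element-wise multiplication and the matrix product concrete, here is a minimal sketch on plain arrays using the @ operator. The 2 x 2 values are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b      # multiplies values at matching positions
matrix_product = a @ b   # row-by-column product from linear algebra

print(elementwise)       # [[ 5 12] [21 32]]
print(matrix_product)    # [[19 22] [43 50]]
```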

        (3) Addition, subtraction, multiplication and division between arrays (different shapes but row/column alignment)

        "Row/column alignment" means that a single-row (or single-column) array has the same length as each row (or column) of the other array. For example:

a = np.array([[1,3,4,6,5,9],[4,3,8,5,1,6]])
b = np.array([6,4,7,4,9,1])
print(a)
print('-'*70)
print(b)

print('-'*70)
print(a+b)

        This operation adds the b array to each row of the a array, and the same holds for subtraction, multiplication and division, with each row treated as a unit; try it yourself. NumPy calls this broadcasting. In this form, one of the two arrays must be a single row or a single column whose length matches the other array.
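
        The single-column case is not shown above, so here is a small sketch covering both. The names row and col and their values are made up for illustration:

```python
import numpy as np

a = np.array([[1, 3, 4],
              [4, 3, 8]])
row = np.array([10, 20, 30])       # shape (3,): matches the 3 columns of a
col = np.array([[100], [200]])     # shape (2, 1): matches the 2 rows of a

plus_row = a + row   # row is added to every row of a
plus_col = a + col   # col is added to every column of a

print(plus_row)
print(plus_col)
```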

9. Data Selection/Data Slicing

        (1) One-dimensional array

a = np.arange(10)
print(a)
print('-'*30)

print(a[4:])      # take the 5th element and everything after it
print('-'*30)

print(a[4])       # take the 5th element

        (2) Two-dimensional array

        arr[row,column]

a = np.arange(1,21).reshape(4,5)
print(a)
print('-'*30)

print(a[2,:])      # row: the 3rd row; columns: all. When all columns are taken, this can be shortened to a[2]
print('-'*30)

print(a[:,1])      # rows: all; column: the 2nd column (index 1)

        In arr[row,column], row selects rows and column selects columns, rows first and then columns. Multiple rows and columns are selected as follows:

a = np.arange(1,21).reshape(4,5)
print(a)
print('-'*30)

print(a[2:,:])      # rows from the 3rd to the last; can also be written as a[2:]
print('-'*30)

print(a[:,:-1])      # all rows; all columns except the last one

10. Magic Index

        The rows/columns selected above are contiguous, such as the 3rd row to the last. To take non-contiguous rows/columns, you need a magic index (more commonly called fancy indexing), which is written with double brackets [[ ]]: a list of the discrete positions inside the indexing brackets.

        (1) One-dimensional

a = np.arange(21)
print(a)
print('-'*70)

print(a[[3,5,7,8,10]])      # take the 4th, 6th, 8th, 9th and 11th elements of a

        (2) two-dimensional

                ① Take rows

a = np.arange(21).reshape(7,3)
print(a)
print('-'*70)

print(a[[1,3,5],:])          # take rows 1, 3 and 5 (counting from 0). When all columns are taken, this can also be written as a[[1,3,5]]

                ② Take columns

a = np.arange(21).reshape(7,3)
print(a)
print('-'*70)

print(a[:,[0,2]])

                ③ Take values at paired positions

        In arr[[rows],[columns]], the list of row indices and the list of column indices must have the same length. The two lists are paired up position by position, so arr[[0,2,4],[1,3,5]] returns the elements at (row 0, column 1), (row 2, column 3) and (row 4, column 5), 3 values in total.

a = np.arange(30).reshape(5,6)
print(a)
print('-'*30)

print(a[[0,2,4],[1,3,5]])    # take (row 0, column 1), (row 2, column 3), (row 4, column 5)

        The same approach extends to arrays with more dimensions; think it through.

11. Filtering/conditional statistics of array elements

        (1) Filter out the values that meet the conditions

rand = np.random.randint(1,50,(4,5))
a = np.array(rand)
print(a)
print('-'*70)

print(a < 20)
print('-'*70)

print(a[a<20])

        This uses the random numbers learned above to generate an array of 20 random integers from 1 to 49, arranged in 4 rows and 5 columns.

        Evaluating a<20 gives a Boolean value (True or False) for every element. Using that Boolean array as an index, a[a<20], selects the elements where the value is True and skips those where it is False. The result is always one-dimensional.

        (2) Count the number of eligible numbers

        Building on the example above, you can count with .sum(). When .sum() is applied to a Boolean array (an array of True and False), True counts as 1 and False counts as 0, so the sum is the number of True values, that is, the number of elements that meet the condition. (Ignore Python's built-in sum(b) for now: on a two-dimensional array it sums column by column; summing along an axis is covered later.)

rand = np.random.randint(1,50,(4,5))
a = np.array(rand)
print(a)
print('-'*70)

b = a<20
print(b.sum())

        What about the elements that do not meet the condition? Negate the Boolean array so that True becomes False and False becomes True, by putting ~ in front of it:

b = ~(a<20)
print(b.sum())

        (3) Multi-condition screening

a = np.arange(10).reshape(2,5)
print(a)
print('-'*30)

factor = (a % 2 == 0) & (a < 7)
print(a[factor])

        The two conditions, each wrapped in () and combined with &, select the even numbers less than 7.

12. Change the value of an element

        (1) Global changes

        Assign values directly to the elements that meet a condition. The following changes numbers < 5 to 0 and numbers ≥ 5 to 1:

a = np.arange(1,11).reshape(2,5)
print(a)
print('-'*70)

a[a < 5] = 0      # change numbers less than 5 to 0
a[a >= 5] = 1     # change numbers greater than or equal to 5 to 1

print(a)

        For such operations, pay special attention to the order: the two assignment statements cannot be swapped. If you first change the numbers ≥ 5 to 1 and then change the numbers < 5 to 0, you get all 0, because the values that were changed to 1 in the first step are themselves less than 5 and are changed to 0 in the second step.
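
        One way to sidestep the ordering problem, offered here as a sketch rather than the only approach, is to evaluate both conditions on the original values first and store them as Boolean masks. The names small and large are only illustrative:

```python
import numpy as np

a = np.arange(1, 11).reshape(2, 5)

# Evaluate both conditions on the original values before changing anything
small = a < 5
large = a >= 5

a[large] = 1    # the order of these two assignments no longer matters,
a[small] = 0    # because the masks were fixed in advance

print(a)
```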

        Similarly, you can add to, subtract from, multiply or divide the selected elements based on their original values:

a = np.arange(1,11).reshape(2,5)
print(a)
print('-'*70)

a[a >= 5] += 100     # note the += here
a[a < 5] += 10       # and here

print(a)

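        Here too, consider whether to operate on the numbers ≥ 5 first or on the numbers < 5 first, to avoid the problem described above. In this example the numbers ≥ 5 are handled first; if the numbers < 5 were increased by 10 first, they would then also match a >= 5 and receive the extra 100.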

        (2) Local changes (secondary slicing)

a = np.arange(1,21).reshape(4,5)
print(a)
print('-'*30)

a[:,3][a[:,3]>5] = 520      # within the 4th column (index 3), change values > 5 to 520
print(a)

        (3) np.where() condition change

        np.where(condition, T_value, F_value) returns an array that takes T_value where the condition is true and F_value where it is false. Like the IF function in Excel, it can be nested: the F_value can itself be another np.where() call, as follows:

a = np.array([[1,3,6],[9,3,2],[1,4,3]])
print(a)
print('-'*35)

print(np.where(a>3,520,1314))       # values > 3 become 520, the rest become 1314
print('-'*35)

print(np.where(a>3,520,np.where(a>2,555,1314)))  # values > 3 become 520, values > 2 become 555, the rest become 1314

        When there are multiple conditions, combine several conditional expressions, wrapping each one in (). To leave the other elements unchanged, pass the original array itself as the remaining value. For example, to change values > 3 and < 8 to 520 and keep the rest unchanged:

a = np.array([[1,3,6],[9,3,2],[1,4,3]])
print(a)
print('-'*35)

print(np.where((a>3) & (a<8) ,520,a))     # values > 3 and < 8 become 520, the rest stay unchanged

13. Sorting of axes and array elements

        (1) arr.sort(axis=1) sorting

np.random.seed(1)
a = np.random.randint(1,51,(3,4))
print(a)
print('-'*30)

a.sort(axis=1)
print(a)

        np.random.seed() is set here so that the same random numbers are generated on every run. axis refers to the axis along which the operation works. In .sort(), axis=1 sorts the elements within each row from small to large, and axis=0 sorts the elements within each column from small to large.

        If this is hard to picture, try both values of axis.
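
        The two axis values can be compared side by side on a small fixed array. This sketch uses np.sort(), which returns a sorted copy instead of sorting in place, so the same input can be reused; the values are made up for illustration:

```python
import numpy as np

a = np.array([[9, 2, 7],
              [1, 8, 3],
              [5, 4, 6]])

by_row = np.sort(a, axis=1)   # each row sorted from small to large
by_col = np.sort(a, axis=0)   # each column sorted from small to large

print(by_row)   # [[2 7 9] [1 3 8] [4 5 6]]
print(by_col)   # [[1 2 3] [5 4 6] [9 8 7]]
```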

        (2) The index position corresponding to arr.argsort() sorting

np.random.seed(1)
a = np.random.randint(1,51,10)
print(a)
print('-'*40)

print(a.argsort())

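        This function returns the indices that would sort the array: the first value is the index of the smallest element, the second is the index of the next smallest, and so on.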
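
        The result of argsort() is easy to confuse with a ranking. The following sketch, with made-up values, shows the sorting indices and, as a common idiom, how applying argsort() a second time yields each element's rank:

```python
import numpy as np

a = np.array([38, 9, 44, 13])

order = a.argsort()       # indices that would sort a in ascending order
print(order)              # [1 3 0 2]: the smallest value sits at index 1
print(a[order])           # [ 9 13 38 44]: indexing with them sorts the array

ranks = order.argsort()   # rank of each element (0 = smallest)
print(ranks)              # [2 0 3 1]
```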

        (3) The index position where the maximum value of arr.argmax() is located

np.random.seed(1)
a = np.random.randint(1,51,10)
print(a)
print('-'*40)

print(a.argmax())

        The maximum value is 44, and the position index of 44 is 1. If there are multiple maximum values, only the position index of the first maximum value is displayed.

        (4) The index position where the minimum value of arr.argmin() is located

np.random.seed(1)
a = np.random.randint(1,51,10)
print(a)
print('-'*40)

print(a.argmin())

        The minimum value is 1, and the position index of 1 is 8. Similarly, if there are multiple minimum values, only the position index of the first minimum value is displayed.

        (5) np.maximum(), np.minimum() compare values at the same position

a = np.random.randint(1,50,10)
b = np.random.randint(1,50,10)
print(a)
print(b)
print('-'*50)

print(np.maximum(a,b))

        The two arrays are compared position by position, and the larger value at each position is taken. np.minimum() takes the smaller value in the same way.

14. Addition/multiplication of axes and arrays

        (1) One-dimensional

a = np.random.randint(1,50,12)
print(a)
print('-'*50)

print(np.sum(a))

        (2) two-dimensional

a = np.random.randint(1,50,12).reshape(3,4)
print(a)
print('-'*50)

print(np.sum(a,axis=0))

        With axis=0 the sum runs down the rows, giving one total per column. Similarly, axis=1 sums across the columns, giving one total per row (the result is still displayed as a one-dimensional row).

        The product function np.prod() follows the same rules as the sum; once you understand axis, it will come naturally when you need it.
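
        Because the examples above use random numbers, the effect of axis is easier to verify on a small fixed array. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = np.sum(a, axis=0)     # collapses the rows: one sum per column
row_sums = np.sum(a, axis=1)     # collapses the columns: one sum per row
row_prods = np.prod(a, axis=1)   # same axis rule for products

print(col_sums)    # [5 7 9]
print(row_sums)    # [ 6 15]
print(row_prods)   # [  6 120]
```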

15. Cumulative addition/multiplication of axes and arrays

        (1) One-dimensional

a = np.random.randint(1,50,12)
print(a)
print('-'*50)

print(np.cumsum(a))

        Cumulative addition works like a snowball: each result is the previous running total plus the current number. For example, if the array begins 4, 21, 24, 7, 10, the cumulative sums begin 4, 25 (4+21), 49 (25+24), 56 (49+7), 66 (56+10)...

        (2) two-dimensional

a = np.random.randint(1,50,12).reshape(3,4)
print(a)
print('-'*50)

print(np.cumsum(a,axis=0))

        With axis=0 the running total accumulates down each column. Similarly, axis=1 accumulates across each row.

        When no axis is given, the array is first flattened to one dimension and the cumulative sum or product is then computed:

a = np.random.randint(1,50,12).reshape(3,4)
print(a)
print('-'*50)

print(np.cumsum(a))

        The cumulative product np.cumprod() uses axis in the same way as np.cumsum(), so you can try it yourself.
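
        As a reproducible illustration of the three cases (axis=0, axis=1 and no axis), here is a small sketch on a fixed array with made-up values:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

down = np.cumsum(a, axis=0)      # running total down each column
across = np.cumsum(a, axis=1)    # running total across each row
flat = np.cumsum(a)              # no axis: flattened first
prod = np.cumprod(a, axis=1)     # running product across each row

print(down)     # [[1 2 3] [5 7 9]]
print(across)   # [[ 1  3  6] [ 4  9 15]]
print(flat)     # [ 1  3  6 10 15 21]
print(prod)     # [[  1   2   6] [  4  20 120]]
```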

16. Counting occurrences by index np.bincount()

a = np.array([4,5,5,3,1,4,4,4,0,5,1,3,4])
b = np.bincount(a)
print(b)

        This function is often explained unclearly, and I thought for a long time about how to explain it well; going through it step by step is the easiest way:

        First find the maximum value of the array, 5, then create the indices 0 to 5. Count how many times each of 0 to 5 occurs in the array and write the count at that index, using 0 for values that never occur. The result is [1 2 0 2 5 3].
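
        The steps can be sketched by hand with a loop. This is only an illustration of the idea, not how NumPy implements it; the name counts is made up:

```python
import numpy as np

a = np.array([4, 5, 5, 3, 1, 4, 4, 4, 0, 5, 1, 3, 4])

counts = np.zeros(a.max() + 1, dtype=int)   # one slot per index 0..max
for value in a:
    counts[value] += 1                      # tally each occurrence

print(counts)                                  # [1 2 0 2 5 3]
print(np.array_equal(counts, np.bincount(a)))  # True
```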

17. Array merge

        (1) np.vstack() vertical merge

np.random.seed(1)
a = np.random.randint(1,50,(2,3))
b = np.random.randint(1,20,3)

print(a)
print('-'*35)
print(b)
print('-'*35)

c = np.vstack([a,b])
print(c)

        (2) np.hstack() horizontal merge

np.random.seed(1)
a = np.random.randint(1,50,(2,3))
b = np.random.randint(1,20,(2,1))

print(a)
print('-'*35)
print(b)
print('-'*35)

c = np.hstack([a,b])
print(c)

        Note that for horizontal merging with a two-dimensional array, b must also be two-dimensional. If b is one-dimensional, an error is raised.

        (3) np.concatenate() vertical/horizontal merge

np.random.seed(1)
a = np.random.randint(1,50,(2,3))
b = np.random.randint(1,20,(1,3))

print(a)
print('-'*50)
print(b)
print('-'*50)

c = np.concatenate([a,b],axis=0)
print(c)

        Note that whether merging horizontally (axis=1) or vertically (axis=0), the b array must be two-dimensional like a; if it is one-dimensional, an error is raised. You can use .reshape() to convert a one-dimensional array into a shape with rows and columns. When axis=1:

np.random.seed(1)
a = np.random.randint(1,50,(2,3))
b = np.random.randint(1,20,(2,1))   # changed here

print(a)
print('-'*50)
print(b)
print('-'*50)

c = np.concatenate([a,b],axis=1)   # also changed here
print(c)

        Compared with np.vstack() and np.hstack(), this function has the advantage that the axis parameter controls the direction of the merge, but it requires more attention to the shapes of the arrays.
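
        As a minimal sketch of the .reshape() step mentioned above, a one-dimensional array can be turned into a column before merging horizontally. The values and the name col are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([7, 8])             # one-dimensional, length 2

col = b.reshape(2, 1)            # two-dimensional: 2 rows, 1 column
c = np.concatenate([a, col], axis=1)

print(c)   # [[1 2 3 7] [4 5 6 8]]
```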

18. Array Split

        (1)np.hsplit()

np.random.seed(1)
a = np.random.randint(1,50,(5,6))
print(a)
print('-'*35)

b,c = np.hsplit(a,2)     # split into 2 equal parts: b gets columns 0,1,2 and c gets columns 3,4,5
print(b)
print('-'*35)
print(c)

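        The array is split into two arrays, which can be understood as the inverse of the horizontal merge np.hstack(). An integer asks for that many equal parts. You can also pass a tuple of column indices to cut at, such as np.hsplit(a,(1,2)), which yields column 0, column 1, and the columns from 2 onward.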
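
        The difference between passing an integer and passing a tuple can be seen from the shapes of the pieces. A small sketch on a fixed 2 x 6 array; the variable names are only illustrative:

```python
import numpy as np

a = np.arange(12).reshape(2, 6)

left, right = np.hsplit(a, 2)                # integer: 2 equal parts
first, second, rest = np.hsplit(a, (1, 2))   # tuple: cut before columns 1 and 2

print(left.shape, right.shape)                 # (2, 3) (2, 3)
print(first.shape, second.shape, rest.shape)   # (2, 1) (2, 1) (2, 4)
```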

        (2)np.vsplit()

np.random.seed(1)
a = np.random.randint(1,50,(5,6))
print(a)
print('-'*35)

b = np.vsplit(a,(1,2))
print(b)

        (3)np.split()

np.random.seed(1)
a = np.random.randint(1,50,(5,6))
print(a)
print('-'*35)

b = np.split(a,(1,2),axis=0)
print(b)

        In the same way, np.split() accepts an axis argument to control the direction of the split.

19. Other functions about mathematics and statistics

np.average() Weighted average. Weights can be passed in through the weights parameter.
np.mean() Arithmetic mean.
np.median() Median.

        All three accept the axis parameter.
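
        A short sketch of the three functions with axis, using made-up scores. The weights [1, 1, 2] are an arbitrary choice that makes the last column count double:

```python
import numpy as np

scores = np.array([[80, 90, 100],
                   [60, 70, 95]])

row_means = np.mean(scores, axis=1)          # mean of each row
col_medians = np.median(scores, axis=0)      # median of each column
weighted = np.average(scores, axis=1, weights=[1, 1, 2])

print(row_means)     # [90. 75.]
print(col_medians)   # [70.  80.  97.5]
print(weighted)      # [92.5 80. ]
```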

20. any() and all()

        .any() Returns True if at least one element is non-zero, otherwise False.

        .all() Returns True only if all elements are non-zero, otherwise False.

a = np.array([1,2,1,1,1,1,1,0])

print(a.any())
print(a.all())

        In the numeric array a, at least one element is not 0, so any() returns True; one element is 0, so all() returns False (all() is True only when no element is 0).

        In a Boolean array, True counts as 1 and False counts as 0, so the same rules apply:

a = np.array([True,True,True,False,False,True,True,False])

print(a.any())
print(a.all())

21. np.unique() deduplication

a = np.array([1,2,1,1,1,1,1,0])
print(np.unique(a))

        Besides removing duplicates, np.unique() also sorts the result from small to large. You can pass the axis parameter to remove duplicate rows or columns along that axis. When axis is not passed in and the array is multi-dimensional, the result is still one-dimensional.
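
        A minimal sketch of these behaviours on a small two-dimensional array with made-up values. It also uses the return_counts parameter of np.unique(), which is not covered above, to count how often each value occurs:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [1, 2]])

flat_unique = np.unique(a)           # no axis: flattened, sorted unique values
row_unique = np.unique(a, axis=0)    # axis=0: duplicate rows are removed
values, counts = np.unique(a, return_counts=True)

print(flat_unique)   # [1 2 3 4]
print(row_unique)    # [[1 2] [3 4]]
print(counts)        # [2 2 1 1]
```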

22. np.in1d() common element judgment

        np.in1d(ar1, ar2, assume_unique=False, invert=False) Determine whether the elements in ar1 are in ar2.
a = np.array([6,0,0,3,2,5,6])
print(np.in1d(a,[2,3,6]))

        The first element of a, 6, is in [2, 3, 6], so the result is True; the second element, 0, is not in [2, 3, 6], so the result is False, and so on. A for loop could make the same check, but it takes more code than np.in1d(). Use whichever comes to mind in your situation. (Recent NumPy releases deprecate np.in1d() in favour of np.isin(), which takes the same first two arguments.)
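
        For comparison, here is a sketch of the membership test next to the equivalent explicit loop. It uses np.isin(), the newer spelling of the same check, as an assumption about which function your NumPy version prefers; the names targets, mask and loop are only illustrative:

```python
import numpy as np

a = np.array([6, 0, 0, 3, 2, 5, 6])
targets = [2, 3, 6]

mask = np.isin(a, targets)                    # vectorised membership test
loop = np.array([x in targets for x in a])    # equivalent explicit loop

print(mask)      # [ True False False  True  True False  True]
print(a[mask])   # [6 3 2 6]: the mask can be used directly for filtering
```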

23. Shallow copy and deep copy

        (1) shallow copy

        A shallow copy means the new variable refers to the same data as the old variable. If the data is modified through one of them, the others are affected too. Examples are assigning the old variable directly to a new variable, or slicing:

a = np.array([6,0,0,3,2,5,6])

b = a          # shallow copy (a second name for the same array)
c = a[3:]      # shallow copy (a slice that shares a's data)
[Figure: shallow copy memory pool]

        (2) Deep copy

        A deep copy duplicates the data of the old variable. If one of them is modified, the other is not affected. For example, .copy():

a = np.array([6,0,0,3,2,5,6])

b = a.copy()
[Figure: deep copy memory pool]

        (3) Compare shallow copy and deep copy

a = np.array([6,0,0,3,2,5,6])

b = a          # shallow copy
b.sort()       # change b

print(a)

        When b is changed through the shallow copy, a changes too. This is because a and b share the same memory: both refer to the [6 0 0 3 2 5 6] array, so sorting b also sorts a.

a = np.array([6,0,0,3,2,5,6])

b = a.copy()   # deep copy
b.sort()       # change b

print('array a:',a)
print('array b:',b)

        When b is changed after a deep copy, a is not affected, because the deep copy stores its data in separate memory.
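
        Whether two arrays share data can also be checked directly with np.shares_memory(). A minimal sketch; the names alias, view and independent are only illustrative:

```python
import numpy as np

a = np.array([6, 0, 0, 3, 2, 5, 6])

alias = a                 # same object under a second name
view = a[3:]              # slice: a new array object on the same data
independent = a.copy()    # separate data

print(alias is a)                         # True
print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False

view[0] = 99              # writes through to a
print(a)                  # [ 6  0  0 99  2  5  6]
```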

end

        NumPy is the basis for learning pandas. With a grounding in NumPy, learning pandas will feel natural. In everyday work, array data is more often converted into pandas objects than processed directly, because pandas offers more convenient functions and methods for tabular data. It doesn't matter if you haven't mastered NumPy; it is enough to have an understanding and an impression, so that concepts click quickly when you learn pandas and you are not stuck for long, especially on axis. Practise what you have learned here for half a month, and I believe you will be ready to move on to pandas.

Origin blog.csdn.net/m0_71559726/article/details/130374200