Big data: detailed explanation of NumPy basic application

NumPy basic application

NumPy is an open-source Python library for scientific computing, built for fast processing of arrays of arbitrary dimension. It supports the common array and matrix operations. For the same numerical task, NumPy code is not only much more concise than native Python, it is also typically one to two orders of magnitude faster, and the advantage grows with the size of the data.

The core data type of NumPy is ndarray, which can represent one-dimensional, two-dimensional and multidimensional arrays. An ndarray is effectively a fast, flexible container for large amounts of data. NumPy's core is written in C, so array operations run in compiled code rather than in the Python interpreter, and many of them are not constrained by the GIL. An ndarray typically stores its elements in a contiguous block of memory, which makes batch operations far more efficient than on a Python list. In addition, ndarray objects provide many more methods for processing data, especially statistical methods, that the native Python list does not have.
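
To make the comparison between ndarray and list concrete, here is a minimal sketch that squares the same numbers with a list comprehension and with one vectorized NumPy expression. The names n, values, array, squares_list and squares_numpy are illustrative and not part of the original article, and the measured times depend on the machine, so no timing figures are claimed.

```python
import timeit

import numpy as np

n = 100_000                           # illustrative problem size
values = list(range(n))               # plain Python list
array = np.arange(n, dtype=np.int64)  # the same numbers as an ndarray


def squares_list():
    # Element-by-element loop executed by the Python interpreter
    return [v * v for v in values]


def squares_numpy():
    # One vectorized operation executed in compiled code
    return array * array


# Both approaches produce the same numbers
same_result = squares_list() == squares_numpy().tolist()
print(same_result)  # True

# Timings vary between machines, so no expected numbers are shown
print('list :', timeit.timeit(squares_list, number=10))
print('numpy:', timeit.timeit(squares_numpy, number=10))
```

The vectorized version also shows the conciseness mentioned above: the loop disappears into a single expression.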


Preparation

  1. Start Notebook

    jupyter notebook
    

    Tip: Before starting Notebook, it is recommended to install the data-analysis dependencies, including numpy, pandas, matplotlib, openpyxl and so on. If you use Anaconda, you do not need to install them separately.

  2. import

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    

    Note: If you have started the Notebook but have not installed the required libraries, you can run !pip install numpy in a Notebook cell to install NumPy, or install several third-party libraries at once by running %pip install numpy pandas matplotlib in a cell. Note that the code above imports not only NumPy but also pandas and matplotlib.

Creating array objects

There are many ways to create ndarray objects. The following describes how to create one-dimensional, two-dimensional and multidimensional arrays.

One-dimensional array

  • Method 1: Use the array function to create an array object from a list

    code:

    array1 = np.array([1, 2, 3, 4, 5])
    array1
    

    output:

    array([1, 2, 3, 4, 5])
    
  • Method 2: Use the arange function to create an array object by specifying a value range

    code:

    array2 = np.arange(0, 20, 2)
    array2
    

    output:

    array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
    
  • Method 3: Use the linspace function to create an array object of evenly spaced numbers over a specified range

    code:

    array3 = np.linspace(-5, 5, 101)
    array3
    

    output:

    array([-5. , -4.9, -4.8, -4.7, -4.6, -4.5, -4.4, -4.3, -4.2, -4.1, -4. ,
           -3.9, -3.8, -3.7, -3.6, -3.5, -3.4, -3.3, -3.2, -3.1, -3. , -2.9,
           -2.8, -2.7, -2.6, -2.5, -2.4, -2.3, -2.2, -2.1, -2. , -1.9, -1.8,
           -1.7, -1.6, -1.5, -1.4, -1.3, -1.2, -1.1, -1. , -0.9, -0.8, -0.7,
           -0.6, -0.5, -0.4, -0.3, -0.2, -0.1,  0. ,  0.1,  0.2,  0.3,  0.4,
            0.5,  0.6,  0.7,  0.8,  0.9,  1. ,  1.1,  1.2,  1.3,  1.4,  1.5,
            1.6,  1.7,  1.8,  1.9,  2. ,  2.1,  2.2,  2.3,  2.4,  2.5,  2.6,
            2.7,  2.8,  2.9,  3. ,  3.1,  3.2,  3.3,  3.4,  3.5,  3.6,  3.7,
            3.8,  3.9,  4. ,  4.1,  4.2,  4.3,  4.4,  4.5,  4.6,  4.7,  4.8,
            4.9,  5. ])
    
  • Method 4: Use the functions of the numpy.random module to generate random numbers and create an array object

    Generate 10 random floats in the range [0, 1), code:

    array4 = np.random.rand(10)
    array4
    

    output:

    array([0.45556132, 0.67871326, 0.4552213 , 0.96671509, 0.44086463,
           0.72650875, 0.79877188, 0.12153022, 0.24762739, 0.6669852 ])
    

    Generate 10 random integers in the range [1, 100), code:

    array5 = np.random.randint(1, 100, 10)
    array5
    

    output:

    array([29, 97, 87, 47, 39, 19, 71, 32, 79, 34])
    

    Generate 20 normally distributed random numbers with μ = 50 and σ = 10, code:

    array6 = np.random.normal(50, 10, 20)
    array6
    

    output:

    array([55.04155586, 46.43510797, 20.28371158, 62.67884053, 61.23185964,
           38.22682148, 53.17126151, 43.54741592, 36.11268017, 40.94086676,
           63.27911699, 46.92688903, 37.1593374 , 67.06525656, 67.47269463,
           23.37925889, 31.45312239, 48.34532466, 55.09180924, 47.95702787])
    

Note : There are many other ways to create a one-dimensional array, such as reading strings, reading files, parsing regular expressions, etc. We will not discuss these methods here, interested readers can do their own research.
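
The difference between arange and linspace is easy to miss, so here is a small sketch that puts them side by side: arange takes a step and excludes the end value, while linspace takes a number of points and includes the end value by default. The sketch also shows that fixing the seed makes the random examples repeatable. The variable names and the seed value 42 are illustrative choices, not part of the original article.

```python
import numpy as np

# arange takes a step and excludes the stop value
by_step = np.arange(0, 1, 0.25)

# linspace takes the number of points and includes the stop value by default
by_count = np.linspace(0, 1, 5)

print(by_step)   # [0.   0.25 0.5  0.75]
print(by_count)  # [0.   0.25 0.5  0.75 1.  ]

# Random output changes on every run unless the seed is fixed
np.random.seed(42)            # 42 is an arbitrary seed
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True
```

This is why the random arrays printed in this article will not match what you see on your own machine unless a seed is set first.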

Two-dimensional array

  • Method 1: Use the array function to create an array object from a nested list

    code:

    array7 = np.array([[1, 2, 3], [4, 5, 6]])
    array7
    

    output:

    array([[1, 2, 3],
           [4, 5, 6]])
    
  • Method 2: Use the zeros, ones and full functions to create an array object with a specified shape

    Use the zeros function, code:

    array8 = np.zeros((3, 4))
    array8
    

    output:

    array([[0., 0., 0., 0.],
           [0., 0., 0., 0.],
           [0., 0., 0., 0.]])
    

    Use the ones function, code:

    array9 = np.ones((3, 4))
    array9
    

    output:

    array([[1., 1., 1., 1.],
           [1., 1., 1., 1.],
           [1., 1., 1., 1.]])
    

    Use the full function, code:

    array10 = np.full((3, 4), 10)
    array10
    

    output:

    array([[10, 10, 10, 10],
           [10, 10, 10, 10],
           [10, 10, 10, 10]])
    
  • Method 3: Use the eye function to create an identity matrix

    code:

    array11 = np.eye(4)
    array11
    

    output:

    array([[1., 0., 0., 0.],
           [0., 1., 0., 0.],
           [0., 0., 1., 0.],
           [0., 0., 0., 1.]])
    
  • Method 4: Use reshape to convert a one-dimensional array into a two-dimensional array

    code:

    array12 = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
    array12
    

    output:

    array([[1, 2, 3],
           [4, 5, 6]])
    

    Tip: reshape is a method of the ndarray object. When using the reshape method, make sure the number of array elements after reshaping equals the number before reshaping, otherwise an exception is raised.

  • Method 5: Use the functions of the numpy.random module to generate random numbers and create an array object

    Generate a two-dimensional array of 3 rows and 4 columns made up of random floats in the range [0, 1), code:

    array13 = np.random.rand(3, 4)
    array13
    

    output:

    array([[0.54017809, 0.46797771, 0.78291445, 0.79501326],
           [0.93973783, 0.21434806, 0.03592874, 0.88838892],
           [0.84130479, 0.3566601 , 0.99935473, 0.26353598]])
    

    Generate a two-dimensional array of 3 rows and 4 columns made up of random integers in the range [1, 100), code:

    array14 = np.random.randint(1, 100, (3, 4))
    array14
    

    output:

    array([[83, 30, 64, 53],
           [39, 92, 53, 43],
           [43, 48, 91, 72]])
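
The Tip on reshape above can be illustrated with a short sketch. It shows a valid reshape, the use of -1 to let NumPy infer one dimension, and the ValueError raised when the element counts do not match. The names data, grid, inferred and reshape_failed are illustrative.

```python
import numpy as np

data = np.arange(1, 13)          # 12 elements

# 12 elements fit a 3 x 4 shape
grid = data.reshape(3, 4)

# -1 asks NumPy to infer that dimension from the element count
inferred = data.reshape(2, -1)
print(grid.shape, inferred.shape)  # (3, 4) (2, 6)

# 12 elements cannot fill a 5 x 3 shape, so reshape raises ValueError
try:
    data.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```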
    

Multidimensional Arrays

  • Create multidimensional arrays using random

    code:

    array15 = np.random.randint(1, 100, (3, 4, 5))
    array15
    

    output:

    array([[[94, 26, 49, 24, 43],
            [27, 27, 33, 98, 33],
            [13, 73,  6,  1, 77],
            [54, 32, 51, 86, 59]],
    
           [[62, 75, 62, 29, 87],
            [90, 26,  6, 79, 41],
            [31, 15, 32, 56, 64],
            [37, 84, 61, 71, 71]],
    
           [[45, 24, 78, 77, 41],
            [75, 37,  4, 74, 93],
            [ 1, 36, 36, 60, 43],
            [23, 84, 44, 89, 79]]])
    
  • Reshape a one-dimensional or two-dimensional array into a multidimensional array

    Reshape a one-dimensional array into a multidimensional array, code:

    array16 = np.arange(1, 25).reshape((2, 3, 4))
    array16
    

    output:

    array([[[ 1,  2,  3,  4],
            [ 5,  6,  7,  8],
            [ 9, 10, 11, 12]],
    
           [[13, 14, 15, 16],
            [17, 18, 19, 20],
            [21, 22, 23, 24]]])
    

    Reshape a two-dimensional array into a multidimensional array, code:

    array17 = np.random.randint(1, 100, (4, 6)).reshape((4, 3, 2))
    array17
    

    output:

    array([[[60, 59],
            [31, 80],
            [54, 91]],
    
           [[67,  4],
            [ 4, 59],
            [47, 49]],
    
           [[16,  4],
            [ 5, 71],
            [80, 53]],
    
           [[38, 49],
            [70,  5],
            [76, 80]]])
    
  • Read an image to obtain the corresponding three-dimensional array

    code:

    array18 = plt.imread('guido.jpg')
    array18
    

    output:

    array([[[ 36,  33,  28],
            [ 36,  33,  28],
            [ 36,  33,  28],
            ...,
            [ 32,  31,  29],
            [ 32,  31,  27],
            [ 31,  32,  26]],
    
           [[ 37,  34,  29],
            [ 38,  35,  30],
            [ 38,  35,  30],
            ...,
            [ 31,  30,  28],
            [ 31,  30,  26],
            [ 30,  31,  25]],
    
           [[ 38,  35,  30],
            [ 38,  35,  30],
            [ 38,  35,  30],
            ...,
            [ 30,  29,  27],
            [ 30,  29,  25],
            [ 29,  30,  25]],
    
           ...,
    
           [[239, 178, 123],
            [237, 176, 121],
            [235, 174, 119],
            ...,
            [ 78,  68,  56],
            [ 75,  67,  54],
            [ 73,  65,  52]],
    
           [[238, 177, 120],
            [236, 175, 118],
            [234, 173, 116],
            ...,
            [ 82,  70,  58],
            [ 78,  68,  56],
            [ 75,  66,  51]],
    
           [[238, 176, 119],
            [236, 175, 118],
            [234, 173, 116],
            ...,
            [ 84,  70,  61],
            [ 81,  69,  57],
            [ 79,  67,  53]]], dtype=uint8)
    

    Explanation: The code above reads the image file named guido.jpg from the current path. An image in a computer system is usually made up of rows and columns of pixels, and each pixel is made up of the three primary colors red, green and blue, so an image can be represented by a three-dimensional array. The image is read with the imread function of the matplotlib library.
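
If guido.jpg is not at hand, the rows by columns by color-channels layout can still be explored with a synthetic array. The sketch below assumes a tiny 4 x 6 image of dtype uint8, matching the dtype shown in the output above. The names height, width and image are hypothetical.

```python
import numpy as np

# A synthetic 4 x 6 "image": rows x columns x (red, green, blue)
height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)

image[:, :, 0] = 255   # set the red channel of every pixel to its maximum

print(image.shape)     # (4, 6, 3)
print(image.dtype)     # uint8
print(image[0, 0])     # [255   0   0]
```

Passing such an array to plt.imshow would display a solid red rectangle, since every pixel has full red and no green or blue.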

Properties of the array object

  1. size attribute: the number of array elements

    code:

    array19 = np.arange(1, 100, 2)
    array20 = np.random.rand(3, 4)
    print(array19.size, array20.size)
    

    output:

    50 12
    
  2. shape attribute: the shape of the array

    code:

    print(array19.shape, array20.shape)
    

    output:

    (50,) (3, 4)
    
  3. dtype attribute: the data type of the array elements

    code:

    print(array19.dtype, array20.dtype)
    

    output:

    int64 float64
    

    For the data types that ndarray elements can take, refer to the table shown below.

  4. ndim attribute: the number of dimensions of the array

    code:

    print(array19.ndim, array20.ndim)
    

    output:

    1 2
    
  5. itemsize attribute: the number of bytes of memory occupied by a single element of the array

    code:

    array21 = np.arange(1, 100, 2, dtype=np.int8)
    print(array19.itemsize, array20.itemsize, array21.itemsize)
    

    output:

    8 8 1
    

    Description: When creating an array object with arange, the data type of the elements can be specified through the dtype parameter. As shown, np.int8 represents an 8-bit signed integer, which occupies only 1 byte of memory and has the value range [-128, 127].

  6. nbytes attribute: the number of bytes of memory occupied by all elements of the array

    code:

    print(array19.nbytes, array20.nbytes, array21.nbytes)
    

    output:

    400 96 50
    
  7. flat attribute: an iterator over the elements of the array (as if it were flattened to one dimension)

    code:

    from typing import Iterable
    
    print(isinstance(array20.flat, np.ndarray), isinstance(array20.flat, Iterable))
    

    output:

    False True
    
  8. base attribute: the base object of the array (if the array shares the memory of another array)

    code:

    array22 = array19[:]
    print(array22.base is array19, array22.base is array21)
    

    output:

    True False
    

    Explanation: The code above uses array slicing, which is similar to slicing a Python list, though the details are not exactly the same; this is explained in detail below. The code shows that the new array object obtained by slicing an ndarray shares its data in memory with the original array object, so the base attribute of array22 is the array object that array19 refers to.
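
The attributes listed above are related to one another, which the following sketch tries to show: nbytes equals size multiplied by itemsize, ndim equals the length of shape, and flat walks the elements in row-major order. The array name matrix and the explicit int32 dtype are illustrative choices, made so the byte counts do not depend on the platform default integer type.

```python
import numpy as np

matrix = np.arange(12, dtype=np.int32).reshape(3, 4)

# nbytes is the element count multiplied by the bytes per element
print(matrix.size, matrix.itemsize, matrix.nbytes)  # 12 4 48

# ndim always equals the length of the shape tuple
print(matrix.ndim, len(matrix.shape))  # 2 2

# flat iterates over the elements row by row, whatever the shape
first_five = [int(x) for x in matrix.flat][:5]
print(first_five)  # [0, 1, 2, 3, 4]
```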

Array indexing and slicing

Similar to lists in Python, NumPy ndarray objects support indexing and slicing. Elements of the array can be read or modified through indexing, and part of the array can be extracted through slicing.

  1. Index operation (ordinary index)

    One-dimensional array, code:

    array23 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(array23[0], array23[array23.size - 1])
    print(array23[-array23.size], array23[-1])
    

    output:

    1 9
    1 9
    

    Two-dimensional array, code:

    array24 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(array24[2])
    print(array24[0][0], array24[-1][-1])
    print(array24[1][1], array24[1, 1])
    

    output:

    [7 8 9]
    1 9
    5 5
    

    code:

    array24[1][1] = 10
    print(array24)
    array24[1] = [10, 11, 12]
    print(array24)
    

    output:

    [[ 1  2  3]
     [ 4 10  6]
     [ 7  8  9]]
    [[ 1  2  3]
     [10 11 12]
     [ 7  8  9]]
    
  2. Slice operation (slice index)

    Slicing uses the syntax [start:stop:step]. By specifying the start index (for a positive step, defaults to the beginning of the array), the stop index (defaults to the end of the array) and the step (defaults to 1), the specified elements are taken from the array to form a new array. Because all three have default values, each can be omitted, and the second colon can also be omitted when no step is given. Slicing a one-dimensional array is very similar to slicing a Python list, so it is not repeated here. For slicing a two-dimensional array, refer to the following code, which should be easy to follow.

    code:

    print(array24[:2, 1:])
    

    output:

    [[2 3]
     [5 6]]
    

    code:

    print(array24[2])
    print(array24[2, :])
    

    output:

    [7 8 9]
    [7 8 9]
    

    code:

    print(array24[2:, :])
    

    output:

    [[7 8 9]]
    

    code:

    print(array24[:, :2])
    

    output:

    [[1 2]
     [4 5]
     [7 8]]
    

    code:

    print(array24[1, :2])
    print(array24[1:2, :2])
    

    output:

    [4 5]
    [[4 5]]
    

    code:

    print(array24[::2, ::2])
    

    output:

    [[1 3]
     [7 9]]
    

    code:

    print(array24[::-2, ::-2])
    

    output:

    [[9 7]
     [3 1]]
    

    To reinforce the indexing and slicing operations on arrays, refer to the following two pictures. They come from the book "Python for Data Analysis", a classic text in the field of Python data analysis written by Wes McKinney, the author of the pandas library. Interested readers can buy and read the original book.

  3. Fancy indexing

    Fancy indexing means indexing with an array of integers. The integer array can be a NumPy ndarray or a Python sequence such as a list, and it can use positive or negative indexes. Note that a tuple used directly as an index is interpreted as one index per axis rather than as a fancy index.

    Fancy indexing of 1D arrays, code:

    array25 = np.array([50, 30, 15, 20, 40])
    array25[[0, 1, -1]]
    

    output:

    array([50, 30, 40])
    

    Fancy indexing of 2D arrays, code:

    array26 = np.array([[30, 20, 10], [40, 60, 50], [10, 90, 80]])
    # Take rows 1 and 3 of the 2D array
    array26[[0, 2]]
    

    output:

    array([[30, 20, 10],
           [10, 90, 80]])
    

    code:

    # Take the two elements at row 1, column 2 and row 3, column 3 of the 2D array
    array26[[0, 2], [1, 2]]
    

    output:

    array([20, 80])
    

    code:

    # Take the two elements at row 1, column 2 and row 3, column 2 of the 2D array
    array26[[0, 2], 1]
    

    output:

    array([20, 90])
    
  4. Boolean indexing

    Boolean indexing selects array elements through an array of Boolean values. The Boolean array can be constructed by hand or generated by relational operations.

    code:

    array27 = np.arange(1, 10)
    array27[[True, False, True, True, False, False, False, False, True]]
    

    output:

    array([1, 3, 4, 9])
    

    code:

    array27 >= 5
    

    output:

    array([False, False, False, False,  True,  True,  True,  True,  True])
    

    code:

    # The ~ operator performs logical negation; compare the result with the one above
    ~(array27 >= 5)
    

    output:

    array([ True,  True,  True,  True, False, False, False, False, False])
    

    code:

    array27[array27 >= 5]
    

    output:

    array([5, 6, 7, 8, 9])
    

Tip: Although a slicing operation creates a new array object, the new array and the original array share the underlying data. Simply put, whether the data is modified through the new array object or the original one, the same block of data is modified. Fancy indexing and Boolean indexing also create a new array object, but the new array holds a copy of the original elements, so it does not share data with the original array. This can also be seen from the base attribute of the array described above. Keep this in mind when using these operations.
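
The view-versus-copy behaviour described in this Tip can be checked directly. The sketch below uses np.shares_memory to test whether two arrays overlap in memory, then writes through each result to see whether the original changes. The names original, sliced, fancy, masked and independent are illustrative.

```python
import numpy as np

original = np.arange(1, 10)

sliced = original[2:5]            # slicing returns a view
fancy = original[[2, 3, 4]]       # fancy indexing returns a copy
masked = original[original > 7]   # Boolean indexing returns a copy

print(np.shares_memory(original, sliced))  # True
print(np.shares_memory(original, fancy))   # False
print(np.shares_memory(original, masked))  # False

# Writing through the view changes the original array
sliced[0] = 100
print(original[2])   # 100

# Writing to the copy leaves the original untouched
fancy[1] = -1
print(original[3])   # 4

# Use copy() when an independent slice is needed
independent = original[2:5].copy()
independent[0] = 0
print(original[2])   # 100
```

Calling copy() on a slice is the usual way to avoid modifying the original array by accident.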

Case: Processing an image by array slicing

Learning the basics on their own can feel dry and unrewarding, so here is a case study that puts the array indexing and slicing operations above to use. As mentioned earlier, an image can be represented by a three-dimensional array, so the image can be processed by operating on that array, as shown below.

Read in the image to create a three-dimensional array object.

guido_image = plt.imread('guido.jpg')
plt.imshow(guido_image)

Slice axis 0 of the array in reverse to flip the image vertically.

plt.imshow(guido_image[::-1])

Slice axis 1 of the array in reverse to flip the image horizontally.

plt.imshow(guido_image[:,::-1])

Crop out Guido's head.

plt.imshow(guido_image[30:350, 90:300])
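
The same three operations can be verified without an image file. The sketch below assumes a tiny synthetic 3 x 4 RGB array in place of guido.jpg and checks that each slice does what the text describes. The names image, flipped_vertical, flipped_horizontal and cropped are hypothetical.

```python
import numpy as np

# Synthetic 3 x 4 RGB "image" whose values encode their position
image = np.arange(3 * 4 * 3, dtype=np.uint8).reshape(3, 4, 3)

flipped_vertical = image[::-1]        # reverse axis 0 (rows)
flipped_horizontal = image[:, ::-1]   # reverse axis 1 (columns)
cropped = image[1:3, 1:3]             # rows 1-2, columns 1-2

# The first row of the vertical flip is the last row of the original
print(np.array_equal(flipped_vertical[0], image[-1]))          # True
# The first column of the horizontal flip is the last column of the original
print(np.array_equal(flipped_horizontal[:, 0], image[:, -1]))  # True
# Cropping keeps all three color channels
print(cropped.shape)  # (2, 2, 3)
```

Because only the first two axes are sliced, the color channels of every pixel stay in their original order.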

Methods of Array Objects

Statistical methods

Statistical methods mainly include sum(), mean(), std(), var(), min(), max(), argmin(), argmax() and cumsum(), which compute the sum, mean, standard deviation, variance, minimum, maximum, index of the minimum, index of the maximum and cumulative sum of the array elements, respectively. Please refer to the following code.

array28 = np.array([1, 2, 3, 4, 5, 5, 4, 3, 2, 1])
print(array28.sum())
print(array28.mean())
print(array28.max())
print(array28.min())
print(array28.std())
print(array28.var())
print(array28.cumsum())

output:

30
3.0
5
1
1.4142135623730951
2.0
[ 1  3  6 10 15 20 24 27 29 30]
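
The code above does not call argmin() or argmax(), and on a two-dimensional array the statistical methods also accept an axis argument. The following sketch covers both. array28 is redefined so the block is self-contained, while scores is a hypothetical 2 x 3 array introduced only for illustration.

```python
import numpy as np

array28 = np.array([1, 2, 3, 4, 5, 5, 4, 3, 2, 1])

# argmax / argmin return the index of the first maximum / minimum
print(array28.argmax(), array28.argmin())  # 4 0

# The standard deviation is the square root of the variance
print(np.isclose(array28.std(), array28.var() ** 0.5))  # True

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(scores.sum())         # 21
print(scores.sum(axis=0))   # [5 7 9]
print(scores.sum(axis=1))   # [ 6 15]
print(scores.mean(axis=1))  # [2. 5.]
```

With axis=0 the rows are collapsed, giving one result per column; with axis=1 the columns are collapsed, giving one result per row.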

Other methods

  1. all() / any() methods: Determine whether all elements of the array are True / whether at least one element of the array is True.

  2. astype() method: Copy the array and convert its elements to the specified type.

  3. dump() method: Save the array to a file. The load() function in NumPy can then create an array by loading the data from the saved file.

    code:

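    # array31 is not defined earlier in this article; this value is assumed from the output below
    array31 = np.array([[1, 2], [3, 4], [5, 6]])
    array31.dump('array31-data')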
    array32 = np.load('array31-data', allow_pickle=True)
    array32
    

    output:

    array([[1, 2],
           [3, 4],
           [5, 6]])
    
  4. fill() method: Fill the array with the specified value.

  5. flatten() method: Flatten a multidimensional array into a one-dimensional array.

    code:

    array32.flatten()
    

    output:

    array([1, 2, 3, 4, 5, 6])
    
  6. nonzero() method: Return the indices of the non-zero elements.

  7. round() method: Round the elements of the array.

  8. sort() method: Sort the array in place.

    code:

    array33 = np.array([35, 96, 12, 78, 66, 54, 40, 82])
    array33.sort()
    array33
    

    output:

    array([12, 35, 40, 54, 66, 78, 82, 96])
    
  9. swapaxes() and transpose() methods: Swap the specified axes of the array.

    code:

    # Specify the two axes to swap; their order does not matter
    array32.swapaxes(0, 1)
    

    output:

    array([[1, 3, 5],
           [2, 4, 6]])
    

    code:

    # For a 2D array, transpose is equivalent to the matrix transpose
    array32.transpose()
    

    output:

    array([[1, 3, 5],
           [2, 4, 6]])
    
  10. tolist() method: Convert the array into a Python list.
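
Several methods in the list above are described without code. The following sketch exercises all(), any(), astype(), round(), fill(), nonzero() and tolist() on small arrays. All array names (flags, prices, truncated, rounded, buffer, sparse, table) are hypothetical and were chosen for this illustration.

```python
import numpy as np

flags = np.array([True, False, True])
print(flags.all(), flags.any())      # False True

prices = np.array([1.26, 2.54, 3.78])
truncated = prices.astype(int)       # the fractional part is dropped
rounded = prices.round(1)            # round to one decimal place
print(truncated)                     # [1 2 3]
print(rounded)                       # [1.3 2.5 3.8]

buffer = np.empty(4, dtype=np.int64)
buffer.fill(7)                       # overwrite every element in place
print(buffer)                        # [7 7 7 7]

sparse = np.array([0, 3, 0, 5])
# nonzero() returns a tuple with one index array per axis
nonzero_positions = sparse.nonzero()[0].tolist()
print(nonzero_positions)             # [1, 3]

table = np.array([[1, 2], [3, 4]])
print(table.tolist())                # [[1, 2], [3, 4]]
```

Note that astype(int) truncates toward zero rather than rounding, so round() should be called first when rounded integers are wanted.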

Origin blog.csdn.net/ml202187/article/details/132310318