(Record) Python Machine Learning - NumPy Basics

1. Install NumPy

To install NumPy locally, open a command line and run pip install numpy.

NumPy is heavily optimized for vector calculations. Its core data structure is ndarray (short for N-dimensional array). An ndarray is homogeneous, meaning that all elements in the array must have the same data type. (PS: lists in Python are heterogeneous.) The basic attributes of an ndarray include shape, ndim, size, and dtype.

shape: the shape of the ndarray object, represented by a tuple;

ndim: the number of dimensions of the ndarray object;

size: the number of elements in the ndarray object;

dtype: the data type of the elements in the ndarray object, such as int64, float32, etc.

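# Import numpy under the alias np
import numpy as np
# Construct an ndarray with 15 elements, 3 rows and 5 columns
a = np.arange(15).reshape(3, 5)
# Print a's shape, ndim, size, and dtype
print(a.shape)
# shape is (3, 5), meaning 3 rows and 5 columns
print(a.ndim)
# ndim is 2: rows and columns make 2 dimensions
print(a.size)
# size is 15, because the matrix has 15 elements in total
print(a.dtype)
# dtype is the platform's default integer type, because all elements are integers:
# int32 in the original post, int64 on most 64-bit Linux/macOS systems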
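As a small illustration of homogeneity, the sketch below (the variable names are made up for this example) shows that NumPy converts mixed inputs to one common dtype rather than storing mixed types:

```python
import numpy as np

# A list mixing an int and a float becomes an all-float array
mixed_numbers = np.array([1, 2.5])
print(mixed_numbers.dtype)       # float64

# A list mixing a number and a string becomes an all-string array
mixed_text = np.array([1, 'a'])
print(mixed_text.dtype.kind)     # 'U' means Unicode string

# The integer 1 was converted to the string '1'
print(repr(mixed_text[0]))
```

A Python list would have kept the int and the str side by side; the ndarray picks a single type that can hold every element.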


2. Instantiate ndarray

The most commonly used functions are array, zeros, ones, and empty.

(1) Use the array function to instantiate an ndarray object

Use the array function in NumPy to instantiate an ndarray object with the value in the list as the initial value.

a = np.array([2,3,4])

(2) Instantiate the ndarray object using the zeros function

Just pass the shape of the ndarray as a parameter. The code is as follows:

# Instantiate ndarray object a: a matrix with 3 rows and 4 columns whose elements are all 0
a = np.zeros((3, 4))

(3) Instantiate the ndarray object using the ones function

# Instantiate ndarray object a: a matrix with 3 rows and 4 columns whose elements are all 1
a = np.ones((3, 4))

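(4) Use the empty function to instantiate an ndarray object with uninitialized (arbitrary) values

# Instantiate ndarray object a: a matrix with 2 rows and 3 columns;
# its elements are not initialized, so they hold whatever values were in memory
a = np.empty((2, 3))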
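The three shape-based constructors can be compared side by side. This is a minimal sketch; the variable names z, o, e, and zi are chosen only for this example:

```python
import numpy as np

z = np.zeros((3, 4))
o = np.ones((3, 4))
e = np.empty((2, 3))

print(z.dtype, o.dtype)   # both default to float64
print(z.sum(), o.sum())   # 0.0 and 12.0
print(e.shape)            # (2, 3); do not rely on the contents of e

# A dtype can be requested explicitly
zi = np.zeros((3, 4), dtype=np.int64)
print(zi.dtype)           # int64
```

Because empty skips initialization, it is only a good choice when every element will be overwritten afterwards.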

3. Try it

The test input is a dictionary with two keys:

  • shape: the shape of the ndarray object to be instantiated;

  • data: the values of the elements in the ndarray object to be instantiated.

For example, {'shape':[1, 2], 'data':[[1, 2]]} describes an ndarray object with 1 row and 2 columns, in which the value at row 1, column 1 is 1 and the value at row 1, column 2 is 2.

Test input: {'shape':[1, 2], 'data':[[1, 2]]}

Expected output: [[1 2]]



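import numpy as np

def print_ndarray(input_data):
    '''
    Instantiate an ndarray object and print it
    :param input_data: test case, a dictionary
    :return: None
    '''

    #********* Begin *********#
    a = input_data  # the test-case dictionary
    b = np.array(a['data'])  # build the ndarray from its 'data' entry
    print(b)
    #********* End *********#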

1. Change the shape of an ndarray object

(1) The reshape function: reshape does not change the shape of the source ndarray. It returns an array with the new shape (a view of the same data when possible, otherwise a copy). That is, if the code is np.reshape(a, (4, 3)), the shape of a itself is not modified!


    import numpy as np
    a = np.zeros((3, 4))
    # Call a's reshape method to change 3 rows and 4 columns into 4 rows and 3 columns
    a = a.reshape((4, 3))
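To see that reshape leaves the source untouched, the sketch below keeps the result under a separate name b instead of rebinding a. It also checks, for this contiguous array, that the result shares memory with the source:

```python
import numpy as np

a = np.zeros((3, 4))
b = a.reshape((4, 3))          # a new array object with shape (4, 3)

print(a.shape)                 # (3, 4): the source keeps its shape
print(b.shape)                 # (4, 3)

# For a contiguous array like this one, b is a view on a's data,
# so writing through b is visible in a
print(np.shares_memory(a, b))
b[0, 0] = 7.0
print(a[0, 0])

# An explicit copy is independent of the source
c = a.reshape((4, 3)).copy()
c[0, 0] = -1.0
print(a[0, 0])
```

If an independent array is needed, calling copy() on the result is the safe choice.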

(2) Use the resize function: it changes the shape of the source ndarray directly, in place

    import numpy as np
    a = np.zeros((3, 4))
    # Change a from a 3-row, 4-column two-dimensional array into a one-dimensional array with 12 elements
    a.resize(12)

(3) Tip: suppose you have an ndarray with 6 rows and 8 columns and want to reshape it into an ndarray with 2 columns (without bothering to work out the number of rows). In that case you can pass -1 for the row dimension.

import numpy as np

a = np.zeros((6, 8))

# Passing -1 for the row dimension lets numpy work out the number of rows itself; here it is clearly 24
a = a.reshape((-1, 2))

PS: -1 is handy, but don't be greedy! If the code is changed to a = a.reshape((-1, -1)), NumPy will think you are making things difficult for it and throw an exception: ValueError: can only specify one unknown dimension.
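The inference rule for -1 can be checked directly. This sketch uses two flag variables (two_unknowns_failed and bad_size_failed, names invented for the example) to record which calls raise:

```python
import numpy as np

a = np.zeros((6, 8))

print(a.reshape((-1, 2)).shape)   # (24, 2): 48 elements / 2 columns
print(a.reshape((4, -1)).shape)   # (4, 12): -1 also works for columns
print(a.reshape(-1).shape)        # (48,): flatten to one dimension

# Two unknown dimensions cannot be inferred
try:
    a.reshape((-1, -1))
    two_unknowns_failed = False
except ValueError as err:
    two_unknowns_failed = True
    print(err)

# Neither can a size that does not divide the element count
try:
    a.reshape((-1, 5))
    bad_size_failed = False
except ValueError:
    bad_size_failed = True
```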

2. Try it

Test input: [[1, 2, 3], [4, 5, 6]]

Expected output: [1 2 3 4 5 6]

import numpy as np

def reshape_ndarray(input_data):
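    '''
    Convert input_data to an ndarray, reshape it into a one-dimensional array, and print it
    :param input_data: test case, a list
    :return: None
    '''

    #********* Begin *********#
    a = np.array(input_data)  # instantiate an ndarray object
    print(a.reshape(-1))  # -1 flattens the array to one dimension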
    #********* End *********#

1. Operations

(1) Arithmetic operations

    import numpy as np
    a = np.array([0, 1, 2, 3])
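    # Add 2 to every element of a; the result is [2, 3, 4, 5]
    b = a + 2
    # Subtract 2 from every element of a; the result is [-2, -1, 0, 1]
    c = a - 2
    # Multiply every element of a by 2; the result is [0, 2, 4, 6]
    d = a * 2
    # Square every element of a; the result is [0, 1, 4, 9]
    e = a ** 2
    # Divide every element of a by 2; the result is [0, 0.5, 1, 1.5]
    f = a / 2
    # Compare every element of a with 2; the result is [True, True, False, False]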
    g = a < 2

(2) Matrix operations

    import numpy as np
    a = np.array([[0, 1], [2, 3]])
    b = np.array([[1, 1], [3, 2]])
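    # Add a and b element by element; the result is [[1, 2], [5, 5]]
    c = a + b
    # Subtract b from a element by element; the result is [[-1, 0], [-1, 1]]
    d = a - b
    # Multiply a and b element by element; the result is [[0, 1], [6, 6]]
    e = a * b
    # Divide a by b element by element; the result is [[0., 1.], [0.66666667, 1.5]]
    f = a / b
    # Raise a to the power of b element by element; the result is [[0, 1], [8, 9]]
    g = a ** b
    # Compare a and b element by element; the result is [[True, False], [True, False]]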
    h = a < b

Use the @ operator or the dot function to perform matrix multiplication.

    import numpy as np
    A = np.array([[1, 1], [0, 1]])
    B = np.array([[2, 0], [3, 4]])
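    # @ means matrix multiplication: matrix A times matrix B; the result is [[5, 4], [3, 4]]
    print(A @ B)
    # Object-oriented style: matrix A times matrix B; the result is [[5, 4], [3, 4]]
    print(A.dot(B))
    # Procedural style: matrix A times matrix B; the result is [[5, 4], [3, 4]]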
    print(np.dot(A, B))
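Since * and @ are easy to mix up, the following sketch contrasts them on the same A and B. The names elementwise and matmul are chosen only for this example:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

elementwise = A * B   # multiplies matching positions
matmul = A @ B        # rows of A times columns of B

print(elementwise)    # values: [[2, 0], [0, 4]]
print(matmul)         # values: [[5, 4], [3, 4]]

# Matrix multiplication is generally not commutative
print(np.array_equal(A @ B, B @ A))   # False for these matrices
```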

(3) Simple statistics

Functions such as sum, min, max, argmin, and argmax provide simple statistics.

    import numpy as np
    a = np.array([[-1, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 13]])
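    # Sum of all elements in a; the result is 67
    print(a.sum())
    # Largest element in a; the result is 13
    print(a.max())
    # Smallest element in a; the result is -1
    print(a.min())
    # Position of the largest element in a; a has 12 elements and positions start at 0, so the result is 11
    print(a.argmax())
    # Position of the smallest element in a; the result is 0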
    print(a.argmin())
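Without an axis, argmax returns a position in the flattened array. One way to turn it back into a row and column is np.unravel_index, as in this sketch (flat_pos, row, and col are names picked for the example):

```python
import numpy as np

a = np.array([[-1, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 13]])

flat_pos = a.argmax()                            # 11, counted over the flattened array
row, col = np.unravel_index(flat_pos, a.shape)   # convert to (row, column)

print(row, col)       # 2 3
print(a[row, col])    # 13
```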

Sometimes we need to compute statistics along a particular axis. For example, suppose the basic wage, performance pay, and year-end award of a company's employees are as follows:

Job number   Basic wage   Performance pay   Year-end award
1            3000         4000              20000
2            2700         5500              25000
3            2800         3000              15000

Such a table can obviously be stored in an ndarray.

import numpy as np

info = np.array([[3000, 4000, 20000], [2700, 5500, 25000], [2800, 3000, 15000]])

After info is instantiated, the concepts of dimension and axis come into play. Obviously, info is a two-dimensional array, so its number of dimensions is 2. In other words, info has two axes: axis 0 and axis 1 (axis numbers are counted from 0). Axis 0 points down the rows and axis 1 points across the columns, as shown in the diagram below:

[Figure: directions of axis 0 and axis 1]

Suppose you want the minimum and maximum of the basic wage, performance pay, and year-end award across these 3 employees (that is, the minimum and maximum of each column). You can compute the statistics along axis 0. To choose the axis along which statistics are computed, you only need to set the axis parameter.

import numpy as np

info = np.array([[3000, 4000, 20000], [2700, 5500, 25000], [2800, 3000, 15000]])

# Statistics along axis 0; the result is [2700, 3000, 15000]
print(info.min(axis=0))

# Statistics along axis 0; the result is [3000, 5500, 25000]
print(info.max(axis=0))

PS: When axis is not specified, it defaults to None, which means that all elements in the ndarray object are taken into account.
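The effect of the axis parameter is easiest to see by running the same function three ways on the salary table. The variable names in this sketch are made up for illustration:

```python
import numpy as np

info = np.array([[3000, 4000, 20000], [2700, 5500, 25000], [2800, 3000, 15000]])

per_column_total = info.sum(axis=0)   # one value per pay category
per_row_total = info.sum(axis=1)      # one value per employee
overall_total = info.sum()            # axis=None: all elements

print(per_column_total)   # values: 8500, 12500, 60000
print(per_row_total)      # values: 27000, 33200, 20800
print(overall_total)      # 81000
```

A statistic along axis 0 collapses the rows and leaves one result per column; along axis 1 it collapses the columns and leaves one result per row.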

2. Try it

Test input: [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

Expected output: [1 2]

import numpy as np


def get_answer(input_data):
    '''
    Convert input_data to an ndarray, then find and print the position of the maximum in each row
    :param input_data: test case, a list
    :return: None
    '''

    #********* Begin *********#
    a = np.array(input_data)
    print(a.argmax(axis=1))  # compute along axis 1
    #********* End *********#

1. Simple random number generation

NumPy's random module provides many functions for generating random numbers. If you have no requirement on the probability distribution of the random numbers, you can usually use functions such as random_sample, choice, and randint.

(1) random_sample

It generates random numbers in the interval [0, 1). Its size parameter specifies the shape of the generated output. For example, size=[2, 3] generates an ndarray with 2 rows and 3 columns filled with random values.

    import numpy as np
    '''
    The result might be [[0.32343809, 0.38736262, 0.42413616]
                         [0.86190206, 0.27183736, 0.12824812]]
    '''
    print(np.random.random_sample(size=[2, 3]))
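A quick check of the shape and range, plus a common way to stretch the samples to another interval. This is a sketch; the interval [5, 10) and the names samples and shifted are arbitrary choices for the example:

```python
import numpy as np

samples = np.random.random_sample(size=[2, 3])
print(samples.shape)                            # (2, 3)
print(((samples >= 0) & (samples < 1)).all())   # True

# Scale and shift to cover another interval, here roughly [5, 10)
shifted = 5 + (10 - 5) * np.random.random_sample(size=4)
print(shifted)
```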

(2) choice

If you want to simulate discrete random events such as dice rolls or coin tosses, and you know the set of possible values, you can use choice. The main parameters of choice are a, size, and replace. a is a one-dimensional array from which values are picked at random; size is the shape of the generated output, so to simulate 5 dice rolls you would set size=5; replace sets whether the same element can be picked more than once: True means it can, False means it cannot, and the default is True.

    import numpy as np
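    '''
    The possible faces of a die are 1, 2, 3, 4, 5, 6, so a=[1,2,3,4,5,6]
    Simulate 5 dice rolls, so size=5
    replace=False forbids repeated values; omit it (default True) if rolls may repeat
    The result might be [1 4 2 3 6]
    '''
    print(np.random.choice(a=[1, 2, 3, 4, 5, 6], size=5, replace=False))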
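The difference between the two replace settings can be sketched as follows. The names faces, rolls, distinct, and too_many_failed are invented for this example:

```python
import numpy as np

faces = [1, 2, 3, 4, 5, 6]

# With replacement (the default): the same face may appear more than once,
# which is how independent dice rolls behave
rolls = np.random.choice(a=faces, size=10)
print(rolls)

# Without replacement: every drawn value is distinct
distinct = np.random.choice(a=faces, size=5, replace=False)
print(len(set(distinct.tolist())))   # 5

# Asking for more distinct values than exist raises ValueError
try:
    np.random.choice(a=faces, size=7, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True
```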

(3) randint

randint is similar to choice, except that randint can only generate integers, whereas the numbers choice generates depend on a. If a contains floating-point numbers, choice may pick floating-point numbers.

randint has 3 parameters: low, high, and size. low is the smallest value that can be generated, and high is one more than the largest value that can be generated. That is to say, randint generates random numbers in the interval [low, high). To simulate 5 dice rolls, the code is as follows:

    import numpy as np
    '''
    The possible faces of a die are 1, 2, 3, 4, 5, 6, so low=1, high=7
    Simulate 5 dice rolls, so size=5
    The result might be [6, 4, 3, 1, 3]
    '''
    print(np.random.randint(low=1, high=7, size=5))
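The half-open interval can be confirmed empirically. In this sketch the sample size of 1000 and the names rolls and grid are arbitrary:

```python
import numpy as np

rolls = np.random.randint(low=1, high=7, size=1000)
print(rolls.min() >= 1, rolls.max() <= 6)   # True True: 7 is never produced

# size may also be a shape, e.g. 2 rows and 3 columns
grid = np.random.randint(low=1, high=7, size=(2, 3))
print(grid.shape)                            # (2, 3)
```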

(4) Generating random numbers from a probability distribution

The Gaussian distribution is also known as the normal distribution, and its distribution graph is as follows:

[Figure: bell-shaped curve of the Gaussian distribution]

In the figure above, the horizontal axis is the value of the random variable (here, the generated random value), and the vertical axis is the probability density at that value (roughly, how likely values near it are to be generated). To generate random values from a Gaussian distribution, use the normal function.

    import numpy as np
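    '''
    Generate 5 random numbers from a Gaussian distribution
    The result might be: [1.2315868, 0.45479902, 0.24923969, 0.42976352, -0.68786445]
    In this result, values around 0.4 appear more often, while values around 1 and -0.7 appear less often.
    '''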
    print(np.random.normal(size=5))

Besides size, the normal function has two more important parameters, loc and scale, which are the mean and the standard deviation of the Gaussian distribution respectively (note that scale is the standard deviation, not the variance). loc sets the position of the point with the highest probability density: if loc=2, the peak of the distribution is at 2. The figure below shows the effect of loc on the distribution, where the blue curve has loc=0 and the red curve has loc=5.

scale affects how wide or narrow the distribution curve is: the smaller scale is, the taller and thinner the distribution; the larger scale is, the shorter and fatter. The figure below shows the effect of scale on the distribution, where the blue curve has scale=0.5 and the red curve has scale=1.0.

Therefore, to generate 5 random values from a Gaussian distribution with mean 1 and standard deviation 10 (scale=10), the code is as follows:

import numpy as np

print(np.random.normal(loc=1, scale=10, size=5))
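With only 5 values it is hard to see what loc and scale do, so the sketch below draws many more and compares the sample statistics with the parameters. The seed 0 and the sample size are arbitrary choices made for this example:

```python
import numpy as np

np.random.seed(0)  # fixed seed chosen arbitrarily so the run is repeatable
samples = np.random.normal(loc=1, scale=10, size=100000)

print(samples.mean())  # close to loc
print(samples.std())   # close to scale
print(samples.var())   # close to scale ** 2
```

The sample standard deviation lands near 10 and the sample variance near 100, which is consistent with scale being the standard deviation.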

(5) Random seed

A random number generated by a computer is a value computed from a random seed by a fixed algorithm. So as long as the algorithm is fixed and the random seed is fixed, the generated random numbers will not change!

If you want the generated random numbers to stay the same on every run, you need to set a random seed (the random seed is an integer from 0 to 2^32 - 1). Setting the random seed is simple: just call the seed function with the seed value.

import numpy as np
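# Set the random seed to 233
np.random.seed(seed=233)
data = [1, 2, 3, 4]
# Pick a number at random from data; the result is 4
print(np.random.choice(data))
# Running the script again picks the same number, 4, because the seed is fixed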
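Reproducibility can be verified inside one script by resetting the seed and drawing again. This is a minimal sketch; first_run and second_run are names chosen for the example:

```python
import numpy as np

data = [1, 2, 3, 4]

np.random.seed(seed=233)
first_run = np.random.choice(data, size=5)

np.random.seed(seed=233)   # reset to the same seed
second_run = np.random.choice(data, size=5)

print(first_run)
print(second_run)
print(np.array_equal(first_run, second_run))   # True
```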

(6) Indexing

Indexing an ndarray is very similar to indexing a Python list. Element indices start from 0.

    import numpy as np
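    # a has 4 elements, so their indices are 0, 1, 2, 3
    a = np.array([2, 15, 3, 7])
    # Print the 2nd element
    # Index 1 refers to the 2nd element of a
    # The result is 15
    print(a[1])
    # b is a two-dimensional array with 2 rows and 3 columns
    b = np.array([[1, 2, 3], [4, 5, 6]])
    # Print the 1st row of b
    # There are only 2 rows, so the row indices are 0 and 1
    # The result is [1, 2, 3]
    print(b[0])
    # Print the element in the 2nd row, 2nd column of b
    # The result is 5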
    print(b[1][1])
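For multi-dimensional arrays, NumPy also accepts all indices in one pair of brackets, which is the more common spelling. A short sketch:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b[1][1])     # 5: index the row, then the element
print(b[1, 1])     # 5: index both axes at once
print(b[-1, -1])   # 6: negative indices count from the end
```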

(7) Traversal

Traversing an ndarray is also very similar to traversing a Python list.

    import numpy as np
    a = np.array([2, 15, 3, 7])
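    # Use a for loop to take out each element of a and print it
    for element in a:
        print(element)
    # Traverse the elements of a by index and print them
    for idx in range(len(a)):
        print(a[idx])
    # b is a two-dimensional array with 2 rows and 3 columns
    b = np.array([[1, 2, 3], [4, 5, 6]])
    # Flatten b into one dimension, then traverse and print
    for element in b.flat:
        print(element)
    # Traverse the elements of b by index and print them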
    for i in range(len(b)):
        for j in range(len(b[0])):
            print(b[i][j])
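When both the position and the value are needed, np.ndenumerate can replace the nested index loops. The sketch below collects the results in a list named visited, which exists only for this example:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

visited = []
# Each step yields the index tuple and the value at that position
for (i, j), value in np.ndenumerate(b):
    visited.append((i, j, int(value)))
    print(i, j, value)
```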

(8) Slicing

Suppose you want to slice out the purple part in the figure below. You need to determine the range of rows and the range of columns.

[Figure: a two-dimensional array with rows 0 to 2 and columns 0 to 2 highlighted in purple]

Since the purple part covers rows 0 to 2, the row index range when slicing is 0:3 (index ranges are closed on the left and open on the right). Since the purple part also covers columns 0 to 2, the column index range is also 0:3. Combining the row and column ranges gives [0:3, 0:3] (the part to the left of the comma is the row range). For convenience, the 0 can be omitted, giving [:3, :3].

    import numpy as np
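    # a has 4 elements, so their indices are 0, 1, 2, 3
    a = np.array([2, 15, 3, 7])
    '''
    Slice out all elements from index 1 to the end and print them
    The result is [15  3  7]
    '''
    print(a[1:])
    '''
    Slice out all elements from the 2nd-to-last to the end and print them
    The result is [3  7]
    '''
    print(a[-2:])
    '''
    Slice all elements in reverse order and print them
    The result is [ 7  3 15  2]
    '''
    print(a[::-1])
    # b is a two-dimensional array with 2 rows and 3 columns
    b = np.array([[1, 2, 3], [4, 5, 6]])
    '''
    Slice out the 2nd to 3rd columns of the 2nd row and print them
    The result is [[5 6]]
    '''
    print(b[1:, 1:3])
    '''
    Slice out the 2nd to 3rd columns of all rows and print them
    The result is [[2 3]
                   [5 6]]
    '''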
    print(b[:, 1:3])
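One difference from Python lists is worth a sketch: a basic slice of an ndarray is a view on the same data, not a copy. The names view and independent are chosen only for this example:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

view = b[:, 1:3]           # basic slicing returns a view
view[0, 0] = 99
print(b[0, 1])             # 99: the write went through to b

independent = b[:, 1:3].copy()
independent[0, 0] = -1
print(b[0, 1])             # still 99: the copy does not affect b
```

Call copy() on a slice whenever the original array must stay unchanged.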

Origin blog.csdn.net/qq_43659681/article/details/130175361