Basic concepts and operations of NumPy arrays in Python

NumPy is an open-source numerical computing extension library for Python that provides array support and efficient functions for processing arrays. Its features include creating n-dimensional arrays (matrices), applying functions to arrays, numerical integration, linear algebra, Fourier transforms and random number generation.
Why NumPy?
Standard Python stores values in lists, which can serve as arrays. However, because list elements can be arbitrary objects, lists waste CPU time and memory. NumPy makes up for these shortcomings by providing two basic objects:

  • ndarray (n-dimensional array object): a multidimensional array whose elements all share a single data type.
  • ufunc (universal function object): a function that operates element-wise on arrays.

NumPy is commonly imported as:
    import numpy as np

1. Creating array objects

1.1 Creating an array object with the array function

  • Format of the array function:
    np.array(object, dtype=None, ndmin=0)
    The main parameters of the array function are described in the following table:
Parameter   Description
object      An array or any array-like object (e.g. a list or tuple) holding the data of the array to create.
dtype       The desired data type of the array. If not given, NumPy selects the minimum type needed to hold the data. Default: None.
ndmin       An int specifying the minimum number of dimensions the resulting array should have. Default: 0.
# Create ndarray arrays
import numpy as np
data1 = [1,3,5,7] # list
w1 = np.array(data1)
print('w1:',w1)
data2 = (1,2,3,4) # tuple
w2 = np.array(data2)
print('w2:',w2)
data3 = [[1,2,3,6],[5,6,7,8]] # nested list -> 2-D array
w3 = np.array(data3)
print('w3:',w3)
#Output
#w1: [1 3 5 7]
#w2: [1 2 3 4]
#w3: [[1 2 3 6]
#    [5 6 7 8]]

When creating an array, NumPy infers a suitable data type for it and stores it in the dtype attribute. If the sequence contains both integers and floating-point numbers, the dtype is a floating-point type.

import numpy as np
data1 = [1,3,5,7.3] # list
w1 = np.array(data1)
print(w1.dtype)
print('w1:',w1)
#Output
#float64
#w1: [1.  3.  5.  7.3]
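The dtype and ndmin parameters from the table above can also be set explicitly instead of relying on inference. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

# Force a floating-point dtype even though the input holds integers
w4 = np.array([1, 2, 3], dtype=np.float64)
print(w4.dtype)   # float64
print(w4)         # [1. 2. 3.]

# ndmin=2 prepends 1s to the shape until the array has at least 2 dimensions
w5 = np.array([1, 2, 3], ndmin=2)
print(w5.shape)   # (1, 3)
print(w5)         # [[1 2 3]]
```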

1.2 Functions for creating special arrays

1) arange function

  • arange function: creates a one-dimensional arithmetic sequence. arange is similar to Python's built-in range function, but it returns an array.
    Format: np.arange([start, ]stop, [step, ]dtype=None)
# Create an array with arange
kk = np.arange(10)
print(kk)
#Output
#[0 1 2 3 4 5 6 7 8 9]

By specifying the start value, the stop value and the step size, arange creates a one-dimensional array; the result does not include the stop value.

kk = np.arange(0,15,1.5)
print(kk)
#Output
#[ 0.   1.5  3.   4.5  6.   7.5  9.  10.5 12.  13.5]

2) linspace function

When arange is given floating-point arguments, the limited precision of floating-point numbers usually makes it impossible to predict the number of elements in the result. For this reason, linspace is usually preferred: it takes the number of elements as a parameter and creates a one-dimensional array from the start value, the stop value and the number of elements. By default, the stop value is included.

  • linspace function: creates a one-dimensional arithmetic sequence, taking the number of elements as a parameter.
    Format: np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

    Parameter description:
    start: scalar, the starting value of the sequence.
    stop: scalar, the end value of the sequence. Whether it is included depends on endpoint: it is included when endpoint is True and excluded when endpoint is False (the result is then the same as generating num + 1 samples with endpoint=True and keeping all but the last one).
    num: int, optional, the number of samples to generate (default 50); must be non-negative.
    endpoint: bool, optional; if True, stop is the last sample, otherwise it is not included.
    retstep: bool, optional; if True, return (samples, step), where step is the spacing between samples.
    dtype: dtype, optional, the data type of the output array.
# Create an array with linspace
kk = np.linspace(1,10,4)
print(kk)
#Output
#[ 1.  4.  7. 10.]
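The endpoint and retstep parameters described above change what linspace returns. A short sketch reusing the same start, stop and num values:

```python
import numpy as np

# endpoint=False excludes stop: 4 samples spaced (10 - 1) / 4 = 2.25 apart
a = np.linspace(1, 10, 4, endpoint=False)
print(a)        # values 1.0, 3.25, 5.5, 7.75 (10 is not included)

# retstep=True also returns the spacing between samples
b, step = np.linspace(1, 10, 4, retstep=True)
print(b, step)  # [ 1.  4.  7. 10.] 3.0
```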

3) logspace function

  • logspace function: similar to linspace, except that it creates a geometric sequence.
    Format: np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)
    In logspace, start and stop are exponents of base (10 by default), so the sequence runs from base**start to base**stop. The third parameter is the number of elements.
# Generate a 5-element geometric sequence from 1 to 10
kk = np.logspace(0,1,5)
print(kk)
#Output
#[ 1.          1.77827941  3.16227766  5.62341325 10.        ]
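Changing base makes the "start and stop are exponents" rule easier to see. A minimal sketch with base=2:

```python
import numpy as np

# With base=2, start=0 and stop=4 are exponents: 2**0, 2**1, ..., 2**4
p = np.logspace(0, 4, 5, base=2)
print(p)  # [ 1.  2.  4.  8. 16.]

# logspace is equivalent to raising base to a linspace of exponents
print(np.allclose(p, 2 ** np.linspace(0, 4, 5)))  # True
```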

4) zeros function

  • zeros function: creates an array of all zeros with the given length or shape.
    Format: np.zeros(shape, dtype=float, order='C')
# Create all-zero arrays with zeros
ll = np.zeros(4)
print(ll)
print("-----------------------")
kk = np.zeros(4,float)
print(kk)
print("-----------------------")
cc = np.zeros([3,3],int)
print(cc)
#Output
#[0. 0. 0. 0.]
#-----------------------
#[0. 0. 0. 0.]
#-----------------------
#[[0 0 0]
# [0 0 0]
# [0 0 0]]

As the output shows, the default dtype of zeros is float64.

5) ones function

  • ones function: creates an array of all ones with the given length or shape.
    Format: np.ones(shape, dtype=None, order='C')
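ones is used in the same way as zeros. A short sketch (variable names are illustrative):

```python
import numpy as np

o1 = np.ones(3)                  # dtype=None gives float64
o2 = np.ones((2, 3), dtype=int)  # the shape can be a tuple
print(o1)  # [1. 1. 1.]
print(o2)
# [[1 1 1]
#  [1 1 1]]
```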

6) diag function

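  • diag function: creates a diagonal matrix, or extracts a diagonal.
    Format: np.diag(v, k=0)
    Parameters: if v is a one-dimensional array, diag returns a two-dimensional array with v on the k-th diagonal; if v is a two-dimensional array, diag returns its k-th diagonal. k=0 is the main diagonal, k>0 a diagonal above it and k<0 a diagonal below it.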
kk = np.diag([1,2,3,4])
print(kk)
#Output
#[[1 0 0 0]
# [0 2 0 0]
# [0 0 3 0]
# [0 0 0 4]]
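The k parameter and the two-dimensional case described above can be sketched as follows (the arrays are illustrative):

```python
import numpy as np

# k=1 places v on the first diagonal above the main one (the result is 3x3)
upper = np.diag([1, 2], k=1)
print(upper)
# [[0 1 0]
#  [0 0 2]
#  [0 0 0]]

# Given a 2-D array, diag extracts a diagonal instead of building a matrix
m = np.arange(9).reshape(3, 3)
print(np.diag(m))        # [0 4 8]
print(np.diag(m, k=-1))  # [3 7]
```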

7) eye function

  • eye(n) creates an n x n identity matrix: ones on the main diagonal and zeros everywhere else. n is the number of rows (and columns).
kk = np.eye(4)
print(kk)
#Output
#[[1. 0. 0. 0.]
# [0. 1. 0. 0.]
# [0. 0. 1. 0.]
# [0. 0. 0. 1.]]

1.3 Attributes and data conversion of ndarray objects

The main attributes of ndarray objects include shape and size, as listed in the following table:

Attribute   Description
ndim        Rank, i.e. the number of axes (dimensions)
shape       The size of the array along each dimension, as a tuple
size        The total number of elements in the array
dtype       The data type of the elements
itemsize    The size in bytes of each element
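The attributes in this table can be read directly from any array. A minimal sketch on an illustrative 2x3 int32 array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(arr.ndim)      # 2
print(arr.shape)     # (2, 3)
print(arr.size)      # 6
print(arr.dtype)     # int32
print(arr.itemsize)  # 4 (bytes per int32 element)
```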

NumPy data types:

No.  Data type   Description
1    bool_       Boolean value (True or False) stored as one byte
2    int_        Default integer type (same as C long; usually int32 or int64)
3    intc        Same as C int (usually int32 or int64)
4    intp        Integer used for indexing (same as C ssize_t; usually int32 or int64)
5    int8        8-bit integer (-128 to 127)
6    int16       16-bit integer (-32768 to 32767)
7    int32       32-bit integer (-2147483648 to 2147483647)
8    int64       64-bit integer (-9223372036854775808 to 9223372036854775807)
9    uint8       8-bit unsigned integer (0 to 255)
10   uint16      16-bit unsigned integer (0 to 65535)
11   uint32      32-bit unsigned integer (0 to 4294967295)
12   uint64      64-bit unsigned integer (0 to 18446744073709551615)
13   float_      Shorthand for float64 (alias removed in NumPy 2.0)
14   float16     Half-precision float: sign bit, 5-bit exponent, 10-bit mantissa
15   float32     Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa
16   float64     Double-precision float: sign bit, 11-bit exponent, 52-bit mantissa
17   complex_    Shorthand for complex128 (alias removed in NumPy 2.0)
18   complex64   Complex number represented by two 32-bit floats (real and imaginary parts)
19   complex128  Complex number represented by two 64-bit floats (real and imaginary parts)
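The heading of this part also mentions data conversion. One standard way to convert an array to another type from this table is the astype method, which returns a new array and leaves the original unchanged. A minimal sketch:

```python
import numpy as np

f = np.array([1.7, -2.3, 3.9])
i = f.astype(np.int32)           # float -> int truncates toward zero
print(i)                         # [ 1 -2  3]

b = np.array([0, 1, 2]).astype(bool)  # zero -> False, nonzero -> True
print(b)                         # [False  True  True]

print(f.dtype)                   # float64 (astype did not modify f)
```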

2. Generate random numbers

The numpy.random module provides a variety of random number generation functions. For example, randint generates random integers in a given range and returns them as an array of the specified shape.
Usage: np.random.randint(low, high=None, size=None) generates an array of random integers, where low is the lowest value (inclusive), high is the upper bound (exclusive), and size is the output shape (an int or a tuple).

# Generate random integers
kk = np.random.randint(100,200,size=(2,4)) # 2x4 array of random integers in [100, 200)
print(kk)
#Output (random, so it differs on every run)
#[[167 189 168 135]
# [116 188 157 102]]

Similar functions include:
1). np.random.randn(d0, d1, ...): draws samples from the standard normal (Gaussian) distribution, which has mean 0 and standard deviation 1, so about 68% of the values fall within [-1, 1]. It has no fixed parameters; each additional integer argument adds a dimension to the output.

2). np.random.random(size): generates floats in the half-open interval [0, 1); size gives the output shape.

3). np.random.rand(d0, d1, ...): also generates floats in [0, 1), like np.random.random, but instead of a size tuple the dimensions are passed directly as separate integer arguments.

4). np.random.normal(loc, scale, size): draws samples from a normal distribution, where loc is the mean, scale is the standard deviation (the spread), and size is the output shape.
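The sketch below calls each of these four functions with an illustrative 2x3 shape. Because the values are random, it only checks the output shapes and the [0, 1) range:

```python
import numpy as np

a = np.random.randn(2, 3)                           # standard normal, dims as arguments
b = np.random.random((2, 3))                        # uniform [0, 1), shape as a tuple
c = np.random.rand(2, 3)                            # uniform [0, 1), dims as arguments
d = np.random.normal(loc=10, scale=2, size=(2, 3))  # mean 10, standard deviation 2

print(a.shape, b.shape, c.shape, d.shape)  # (2, 3) (2, 3) (2, 3) (2, 3)
print(((b >= 0) & (b < 1)).all())          # True
```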

Function     Description
seed         Sets the seed of the random number generator
permutation  Returns a randomly permuted copy of a sequence, or a permuted range if given an integer
shuffle      Shuffles a sequence in place
binomial     Generates random numbers from a binomial distribution
normal       Generates random numbers from a normal (Gaussian) distribution
beta         Generates random numbers from a beta distribution
chisquare    Generates random numbers from a chi-square distribution
gamma        Generates random numbers from a gamma distribution
uniform      Generates random numbers uniformly distributed over [low, high), [0, 1) by default
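A few of the functions in this table in action. This is a sketch; the seed value 42 is arbitrary, and only properties that hold for any seed are checked:

```python
import numpy as np

np.random.seed(42)            # fix the seed so the run is reproducible
first = np.random.rand(3)
np.random.seed(42)            # same seed -> same sequence again
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

perm = np.random.permutation(5)       # randomly permuted copy of range(5)
print(np.sort(perm))                  # [0 1 2 3 4]

arr = np.arange(5)
np.random.shuffle(arr)                # shuffles arr in place and returns None

u = np.random.uniform(low=1, high=3, size=4)  # floats in [1, 3)
```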

3. Array transformation

3.1. Array reshaping

The reshape method changes the dimensions of an existing array. The new shape is passed as a tuple (or as separate integers), and the total number of elements must stay the same.

kk0 = np.arange(8)
print("kk0: ",kk0)
kk1 = kk0.reshape(2,4)
print("kk1: ",kk1)
#Output
#kk0:  [0 1 2 3 4 5 6 7]
#kk1:  [[0 1 2 3]
#      [4 5 6 7]]

  • One dimension passed to reshape can be -1; NumPy then infers it from the array's size and the other dimensions.
kk0 = np.arange(15)
print("kk0: ",kk0)
kk1 = kk0.reshape(5,-1)
print("kk1: ",kk1)
#Output
#kk0:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
#kk1:  [[ 0  1  2]
#       [ 3  4  5]
#       [ 6  7  8]
#       [ 9 10 11]
#       [12 13 14]]
  • The inverse of reshaping is flattening, which both ravel and flatten perform: they return a one-dimensional array.
kk0 = np.arange(15)
print("kk0: ",kk0)
kk1 = kk0.reshape(5,-1)
print("kk1: ",kk1)
kk2 = kk1.ravel()
print("kk2: ",kk2)
kk3 = kk1.flatten()
print("kk3: ",kk3)
#Output
#kk0:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
#kk1:  [[ 0  1  2]
# [ 3  4  5]
# [ 6  7  8]
# [ 9 10 11]
# [12 13 14]]
#kk2:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
#kk3:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
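  • Note that reshaping does not change the shape of the original array. However, reshape and ravel return views of the original data where possible, so writing to their results can modify the original array, whereas flatten always returns a copy.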
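The view-versus-copy difference between ravel and flatten can be checked directly. A minimal sketch, assuming a freshly created (contiguous) array, for which ravel returns a view:

```python
import numpy as np

kk1 = np.arange(6).reshape(2, 3)
r = kk1.ravel()     # a view of kk1 here, because kk1 is contiguous
f = kk1.flatten()   # always a new copy

r[0] = 100          # writes through to kk1
f[1] = 200          # does not affect kk1
print(kk1)
# [[100   1   2]
#  [  3   4   5]]
print(kk1.shape)    # (2, 3): the shape of the original is unchanged
```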

3.2. Array merging

Array merging combines multiple arrays into one. NumPy provides the hstack, vstack and concatenate functions for this.

  • hstack function: merges arrays horizontally (side by side).
  • vstack function: merges arrays vertically (one on top of the other).
  • concatenate function: merges arrays horizontally when axis=1 and vertically when axis=0.
# Horizontal merge:
kk1 = np.arange(6).reshape(3,2)
print(kk1)
print("----------")
kk2 = kk1 * 2
print(kk2)
print("----------")
kk3 = np.hstack((kk1,kk2))
print(kk3)
#Output
#[[0 1]
# [2 3]
# [4 5]]
#----------
#[[ 0  2]
# [ 4  6]
# [ 8 10]]
#----------
#[[ 0  1  0  2]
# [ 2  3  4  6]
# [ 4  5  8 10]]
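The example above covers only hstack. The sketch below applies vstack and concatenate to the same two arrays and checks that concatenate matches the stacking functions:

```python
import numpy as np

kk1 = np.arange(6).reshape(3, 2)
kk2 = kk1 * 2

v = np.vstack((kk1, kk2))                # vertical merge: shape (6, 2)
c0 = np.concatenate((kk1, kk2), axis=0)  # same result as vstack
c1 = np.concatenate((kk1, kk2), axis=1)  # same result as hstack
print(v.shape, c1.shape)                 # (6, 2) (3, 4)
print(np.array_equal(v, c0), np.array_equal(c1, np.hstack((kk1, kk2))))  # True True
```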

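3.3. Array splitting

As the inverse of merging, NumPy provides the hsplit, vsplit and split functions, which split an array horizontally, vertically, and along a specified axis, respectively.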

arr = np.arange(16).reshape(4,4)
print('Horizontal split:\n',np.hsplit(arr,2))
print('Vertical split:\n',np.vsplit(arr,2))
#Output
#Horizontal split:
# [array([[ 0,  1],
#       [ 4,  5],
#       [ 8,  9],
#       [12, 13]]), array([[ 2,  3],
#       [ 6,  7],
#       [10, 11],
#       [14, 15]])]
#Vertical split:
# [array([[0, 1, 2, 3],
#       [4, 5, 6, 7]]), array([[ 8,  9, 10, 11],
#       [12, 13, 14, 15]])]

Similarly, split splits an array horizontally when axis=1 and vertically when axis=0.
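A short sketch of split with both axis values on the same 4x4 array, plus the variant that takes a list of split positions instead of a number of equal parts:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)
left, right = np.split(arr, 2, axis=1)  # like np.hsplit(arr, 2)
top, bottom = np.split(arr, 2, axis=0)  # like np.vsplit(arr, 2)
print(left.shape, top.shape)            # (4, 2) (2, 4)

# A list of indices splits at those positions instead of into equal parts
parts = np.split(np.arange(10), [3, 7])
print([p.tolist() for p in parts])      # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```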

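3.4. Array transposition and axis swapping

Transposition is a special form of reshaping and can be done with the transpose method, which takes a tuple of axis numbers giving the new order of the axes. Besides transpose, you can also transpose an array with its T attribute.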

kk = np.arange(6).reshape(3,2)
print('Matrix:',kk)
print('-------------')
print('Transposed:',kk.transpose(1,0)) # equivalent to np.transpose(kk)

#Output
# Matrix: [[0 1]
#  [2 3]
#  [4 5]]
# -------------
# Transposed: [[0 2 4]
#  [1 3 5]]

Transposing with the T attribute:

kk = np.arange(6).reshape(3,2)
print('Matrix:',kk)
print('-------------')
print('Transposed:',kk.T)
#Output
# Matrix: [[0 1]
#  [2 3]
#  [4 5]]
# -------------
# Transposed: [[0 2 4]
#  [1 3 5]]

The swapaxes method of ndarray swaps two axes:

kk = np.arange(6).reshape(3,2)
print('Matrix:',kk)
print('-------------')
print('Swapped axes:',kk.swapaxes(0,1))
#Output
# Matrix: [[0 1]
#  [2 3]
#  [4 5]]
# -------------
# Swapped axes: [[0 2 4]
#  [1 3 5]]
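For two-dimensional arrays, transpose, T and swapaxes all give the same result. The axis tuple passed to transpose matters for arrays with three or more dimensions. The sketch below uses an illustrative 2x3x4 array:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
t = a.transpose(1, 0, 2)       # new axis order: old axis 1, old axis 0, old axis 2
print(t.shape)                 # (3, 2, 4)
print(a.T.shape)               # (4, 3, 2): T reverses all axes
print(a.swapaxes(0, 2).shape)  # (4, 3, 2): swaps only axes 0 and 2

# Element (i, j, k) of t is element (j, i, k) of a
print(t[2, 1, 3] == a[1, 2, 3])  # True
```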

4. Array indexing and slicing

4.1 Indexing one-dimensional arrays

One-dimensional arrays are indexed in the same way as Python lists.

kk = np.arange(10)
print(kk)
print(kk[2])
print(kk[-1])
print(kk[2:5])
#Output
# [0 1 2 3 4 5 6 7 8 9]
# 2
# 9
# [2 3 4]

Slicing an array returns a view of the original array rather than new data. If you need a copy of the data instead of a view, use the copy method.

kk = np.arange(10)
print(kk)
kk1 = kk[1:3].copy()
print(kk1)
#Output
#[0 1 2 3 4 5 6 7 8 9]
#[1 2]

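4.2 Indexing multidimensional arrays

  • A multidimensional array has one index per dimension, and the indices of the different dimensions are separated by commas.
  • Multidimensional arrays can also be accessed with integer arrays and Boolean arrays as indices.
kk = np.arange(12).reshape(3,4)
print(kk)
print(kk[0,1:3]) # row 0, columns 1 to 2
print(kk[ :,2]) # all rows of column 2
print(kk[:1,:1]) # row 0, column 0 (as a 1x1 array)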
#Output
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# [1 2]
# [ 2  6 10]
# [[0]]

4.3 Accessing multidimensional arrays

kk = np.arange(12).reshape(3,4)
print(kk)
print('Result 1: ',kk[(0,1),(1,3)]) # pairs up the two sequences position by position to form the indices kk[0,1] and kk[1,3]
print('Result 2: ',kk[1:2,(0,2,3)]) # columns 0, 2 and 3 of row 1
#Output
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# Result 1:  [1 7]
# Result 2:  [[4 6 7]]
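Boolean indexing is mentioned in 4.2 but not demonstrated. A minimal sketch on the same 3x4 array:

```python
import numpy as np

kk = np.arange(12).reshape(3, 4)
mask = kk > 6          # Boolean array with the same shape as kk
print(kk[mask])        # [ 7  8  9 10 11]: matching elements as a 1-D array

# A 1-D Boolean array on the first axis selects whole rows
rows = np.array([True, False, True])
print(kk[rows])
# [[ 0  1  2  3]
#  [ 8  9 10 11]]
```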

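5. Data statistics and analysis in NumPy

In NumPy, array operations are concise and usually much faster than equivalent pure-Python code, especially for statistical computation and analysis on arrays.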

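5.1 Sorting

NumPy supports direct and indirect sorting. Direct sorting sorts the data values themselves; indirect sorting sorts a dataset according to one or more keys and returns indices. In NumPy, direct sorting uses the sort function, and indirect sorting uses the argsort and lexsort functions.

  • sort sorts the data directly. The array method a.sort() sorts in place and returns None, while the function np.sort(a) returns a sorted copy and leaves a unchanged.
    Format: numpy.sort(a, axis=-1, kind=None, order=None)
    The main parameters are described in the following table:
Parameter  Description
a          The array to sort
kind       Sorting algorithm; the default is 'quicksort'
order      For structured arrays, the field name(s) to sort by; default None
axis       The axis along which to sort. axis=1 sorts along the horizontal axis (within each row), axis=0 along the vertical axis (within each column), and axis=None flattens the array before sorting. The default is -1 (the last axis)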

Example 1: sorting with sort.

kk = np.array([1,4,3,2,5,6,7,8,9])
print("Original array:",kk)
kk.sort()
print("Sorted array:",kk)
#Output
#Original array: [1 4 3 2 5 6 7 8 9]
#Sorted array: [1 2 3 4 5 6 7 8 9]
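The example above uses the in-place method. The function form np.sort behaves differently, as this short sketch shows:

```python
import numpy as np

kk = np.array([3, 1, 2])
s = np.sort(kk)      # returns a sorted copy
print(kk, s)         # [3 1 2] [1 2 3]

result = kk.sort()   # sorts kk in place
print(kk, result)    # [1 2 3] None
```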

Example 2: sort with the axis parameter.

kk = np.array([[4,2,9,5],[6,4,8,3],[1,6,2,4]])
print("Original array:",kk)
kk.sort(axis=1) # sort within each row
print("Sorted array:",kk)
#Output
# Original array: [[4 2 9 5]
#  [6 4 8 3]
#  [1 6 2 4]]
# Sorted array: [[2 4 5 9]
#  [3 4 6 8]
#  [1 2 4 6]]

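np.argsort and np.lexsort sort a dataset according to one or more keys.

  • np.argsort(): returns the indices that would sort the array in ascending order.
  • np.lexsort(): sorts by several keys, using the last key passed as the primary key and the earlier keys to break ties; it also returns indices.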

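Example 3: sorting with argsort.

kk = np.array([1,4,3,2,5,6,7,8,9])
print("Original array:",kk)
ll = kk.argsort()
print("Array after argsort:",kk)
print("Indices:",ll) # the indices that would sort kk
#Output
#Original array: [1 4 3 2 5 6 7 8 9]
#Array after argsort: [1 4 3 2 5 6 7 8 9]
#Indices: [0 3 2 1 4 5 6 7 8]

As you can see, argsort only returns the indices of the sorted order; the original array is unaffected.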
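The indices returned by argsort are typically used to index the original array. A short sketch, using the same data as Example 3:

```python
import numpy as np

kk = np.array([1, 4, 3, 2, 5, 6, 7, 8, 9])
ll = kk.argsort()
print(kk[ll])        # [1 2 3 4 5 6 7 8 9]: indexing with the indices sorts kk
print(kk[ll[::-1]])  # [9 8 7 6 5 4 3 2 1]: reversed indices give descending order
```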

Example 4: sorting with lexsort.

a = [2,5,8,4,3,7,6]
b = [9,4,0,4,0,2,1]
c = np.lexsort((a,b))
print(c)
#Output
#[4 2 6 5 3 1 0]

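lexsort works in three steps:
1. Pair each index with the corresponding elements of a and b.
2. Since b is the last key in lexsort((a,b)), sort by b first and carry the indices along.
3. When b contains equal elements, order their indices by the corresponding values in a.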

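5.2 Duplicate data and deduplication

In statistical analysis, duplicate data often has to be removed first. In NumPy, the unique function finds the unique values of an array and returns them sorted.

Removing duplicates from an array with unique(a):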

a = np.array(['red','blue','yellow','red','red','white'])
print("Original array:",a)
print("Deduplicated array:",np.unique(a))
#Output
#Original array: ['red' 'blue' 'yellow' 'red' 'red' 'white']
#Deduplicated array: ['blue' 'red' 'white' 'yellow']
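When you also need to know how often each value occurred, unique can return the counts alongside the sorted unique values. A minimal sketch on the same data:

```python
import numpy as np

a = np.array(['red', 'blue', 'yellow', 'red', 'red', 'white'])
values, counts = np.unique(a, return_counts=True)
print(values)  # ['blue' 'red' 'white' 'yellow']
print(counts)  # [1 3 1 1]
```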

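In statistical analysis, data sometimes needs to be repeated several times; the tile and repeat functions do this.

  • Format of tile: np.tile(A, reps)

Here A is the array to repeat and reps is the number of repetitions.

  • Format of repeat: np.repeat(a, repeats, axis=None)

Here a is the array whose elements are repeated, repeats is the number of repetitions, and axis specifies the axis along which to repeat: axis=0 repeats each row and axis=1 repeats each column. With the default axis=None, the array is flattened first.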

# Repeat data with tile
kk = np.array([1,2,3])
print(kk)
ll = np.tile(kk,3) # repeat the whole array kk three times
print(ll)
#Output
#[1 2 3]
#[1 2 3 1 2 3 1 2 3]
# Repeat data with repeat
kk = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(kk)
print('------------')
ll = np.repeat(kk,2,axis=1)
print(ll)
print('------------')
ss = np.repeat(kk,2,axis=0)
print(ss)
#Output
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# ------------
# [[1 1 2 2 3 3]  # each column repeated (axis=1)
#  [4 4 5 5 6 6]
#  [7 7 8 8 9 9]]
# ------------
# [[1 2 3]        # each row repeated (axis=0)
#  [1 2 3]
#  [4 5 6]
#  [4 5 6]
#  [7 8 9]
#  [7 8 9]]
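Both functions accept more than a single integer. The sketch below shows tile with a tuple of repetitions per axis, repeat with the default axis=None, and repeat with a per-row list of counts (the arrays are illustrative):

```python
import numpy as np

kk = np.array([[1, 2], [3, 4]])
t = np.tile(kk, (2, 3))   # 2 copies down the rows, 3 copies across the columns
print(t.shape)            # (4, 6)

r = np.repeat(kk, 2)      # axis=None: flattens first, then repeats each element
print(r)                  # [1 1 2 2 3 3 4 4]

r2 = np.repeat(kk, [1, 2], axis=0)  # row 0 once, row 1 twice
print(r2)
# [[1 2]
#  [3 4]
#  [3 4]]
```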

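5.3 Common statistical functions

NumPy provides many functions for statistical analysis, such as sum, mean, std, var, min and max.
For two-dimensional arrays, almost all of these functions require attention to the axis argument: axis=0 computes along the vertical axis (down each column), and axis=1 along the horizontal axis (across each row). Usage: np.function(array).

Function                  Description
np.sum(a)                 Sum of all elements
np.sum(a, axis=0)         Sums along the vertical axis (per column)
np.sum(a, axis=1)         Sums along the horizontal axis (per row)
np.mean(a)                Mean of all elements
np.mean(a, axis=0)        Means along the vertical axis (per column)
np.mean(a, axis=1)        Means along the horizontal axis (per row)
np.std(a)                 Standard deviation
np.var(a)                 Variance
np.average(a, weights=w)  Weighted average (a plain mean if weights is omitted)
np.percentile(a, q)       The q-th percentile
np.ptp(a)                 Range (peak to peak: max minus min)
np.median(a)              Median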
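Several of the functions in this table applied to a small illustrative 2x3 array. The weights passed to np.average are arbitrary example values:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(a))             # 21
print(np.sum(a, axis=0))     # [5 7 9]  (down each column)
print(np.sum(a, axis=1))     # [ 6 15]  (across each row)
print(np.mean(a, axis=1))    # [2. 5.]
print(np.ptp(a))             # 5 (max - min)
print(np.median(a))          # 3.5
print(np.percentile(a, 50))  # 3.5 (the 50th percentile is the median)
print(np.average([1, 2, 3], weights=[3, 1, 0]))  # 1.25 = (1*3 + 2*1 + 3*0) / 4
print(np.isclose(np.std(a) ** 2, np.var(a)))     # True: std is the square root of var
```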


Origin blog.csdn.net/chenjh027/article/details/127936067