Foreword
Some time ago I used Python for a large amount of data visualization work and chose the numpy module for the data processing. Here I have collected some of the numpy operations I used.
Contents
- 1. Reading a CSV file with numpy
- 2. Data manipulation
  - 2.1 Data Slicing
  - 2.2 Converting data types
  - 2.3 Combining columns
  - 2.4 Finding extreme values
  - 2.5 Generating evenly spaced arrays
  - 2.6 Constructing a coordinate grid
  - 2.7 Addition, subtraction, multiplication and division
  - 2.8 Trigonometric functions
  - 2.9 Summing
  - 2.10 Finding the mean
  - 2.11 Random numbers
  - 2.12 Reversing (flipping) arrays
  - 2.13 Transpose
1. Reading a CSV file with numpy
My CSV file is a comma-separated text file whose first line is a header row (站点 (station), Lon, Lat, ..., VIS, PRS_Sea), followed by rows of numeric data. It can be opened directly with numpy's loadtxt method, and by adjusting loadtxt's parameters it can be read in many different ways.
1.1 Direct reading
import numpy as np
p = r'E:\Dev\python\numpy-test\202007150000.txt'
# Read the formatted text file directly, keeping every value as a string
npData = np.loadtxt(p, str, delimiter=",")
print(npData)
[['绔欑偣' 'Lon' 'Lat' ... 'Avg_2' 'VIS' 'PRS_Sea']
['54065' ' 125.65' ' 44.53' ... ' 2.10' ' 4000.00' ' 1004.60']
['58631' ' 118.50' ' 28.90' ... ' 0.60' ' 3100.00' ' 1005.30']
...
['59559' ' 120.75' ' 22.00' ... '999999.00' ' 15000.00' ' 1008.10']
['58974' ' 122.07' ' 25.63' ... '999999.00' ' 30000.00' ' 1007.00']
['58968' ' 121.52' ' 25.03' ... '999999.00' ' 25000.00' ' 1006.40']]
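1.2 Fixing garbled Chinese characters
In the output above, the Chinese header 站点 (station) comes out garbled as 绔欑偣: the file is UTF-8 encoded, but it was decoded with the system's default encoding (apparently GBK on this machine). Opening the file explicitly as UTF-8 and passing the file object to loadtxt fixes this:
# Open the file as UTF-8 so the Chinese header decodes correctly
with open(p, encoding='utf-8') as f:
    npData = np.loadtxt(f, str, delimiter=",")
print(npData)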
[['站点' 'Lon' 'Lat' ... 'Avg_2' 'VIS' 'PRS_Sea']
['54065' ' 125.65' ' 44.53' ... ' 2.10' ' 4000.00' ' 1004.60']
['58631' ' 118.50' ' 28.90' ... ' 0.60' ' 3100.00' ' 1005.30']
...
['59559' ' 120.75' ' 22.00' ... '999999.00' ' 15000.00' ' 1008.10']
['58974' ' 122.07' ' 25.63' ... '999999.00' ' 30000.00' ' 1007.00']
['58968' ' 121.52' ' 25.03' ... '999999.00' ' 25000.00' ' 1006.40']]
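loadtxt also has its own encoding parameter, so the file does not have to be opened by hand. A minimal sketch, assuming a small hypothetical UTF-8 file written to a temporary path in place of the real data file:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real data file: UTF-8 text with a Chinese header row
sample = "站点,Lon,Lat\n54065,125.65,44.53\n58631,118.50,28.90\n"

with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Let loadtxt decode the file as UTF-8 itself instead of passing an open file object
    npData = np.loadtxt(path, dtype=str, delimiter=",", encoding="utf-8")
finally:
    os.remove(path)  # clean up the temporary file

print(npData)
```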
1.3 Skip the first line
skiprows = 1
# Skip the first line (the header): skiprows = 1
with open(p, encoding='utf-8') as f:
    npData = np.loadtxt(f, str, delimiter=",", skiprows=1)
print(npData)
[['54065' ' 125.65' ' 44.53' ... ' 2.10' ' 4000.00' ' 1004.60']
['58631' ' 118.50' ' 28.90' ... ' 0.60' ' 3100.00' ' 1005.30']
['57671' ' 112.37' ' 28.85' ... ' 0.90' ' 1200.00' ' 1005.30']
...
['59559' ' 120.75' ' 22.00' ... '999999.00' ' 15000.00' ' 1008.10']
['58974' ' 122.07' ' 25.63' ... '999999.00' ' 30000.00' ' 1007.00']
['58968' ' 121.52' ' 25.03' ... '999999.00' ' 25000.00' ' 1006.40']]
Similarly, skiprows = 2 skips the first two rows, and so on.
1.4 Reading values as float
# Read as float; float is also loadtxt's default dtype
with open(p, encoding='utf-8') as f:
    npData = np.loadtxt(f, float, delimiter=",", skiprows=1)
print(npData)
[[5.40650e+04 1.25650e+02 4.45300e+01 ... 2.10000e+00 4.00000e+03
1.00460e+03]
[5.86310e+04 1.18500e+02 2.89000e+01 ... 6.00000e-01 3.10000e+03
1.00530e+03]
[5.76710e+04 1.12370e+02 2.88500e+01 ... 9.00000e-01 1.20000e+03
1.00530e+03]
...
[5.95590e+04 1.20750e+02 2.20000e+01 ... 9.99999e+05 1.50000e+04
1.00810e+03]
[5.89740e+04 1.22070e+02 2.56300e+01 ... 9.99999e+05 3.00000e+04
1.00700e+03]
[5.89680e+04 1.21520e+02 2.50300e+01 ... 9.99999e+05 2.50000e+04
1.00640e+03]]
If the first line is not skipped, loading fails with an error, because the header row contains text (站点, Lon, ...) that cannot be converted to float.
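To see both cases side by side, here is a small sketch using an in-memory hypothetical sample (io.StringIO stands in for the real file) shaped like the data above:

```python
import io

import numpy as np

# Hypothetical sample shaped like the real file: a text header row, then numbers
sample = "站点,Lon,Lat\n54065, 125.65, 44.53\n58631, 118.50, 28.90\n"

try:
    # dtype defaults to float, so the header text cannot be converted
    np.loadtxt(io.StringIO(sample), delimiter=",")
    headerError = None
except ValueError as exc:
    headerError = exc

# Skipping the header row lets every remaining value be parsed as float
npData = np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=1)
print(type(headerError).__name__)
print(npData)
```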
1.5 Loading from a list of lines
# Load a list of formatted lines instead of a file
with open(p, encoding='utf-8') as f:
    lines = f.readlines()
headerLines = lines[:1]  # the header row
dataLines = lines[1:]    # all data rows; slice further (e.g. lines[1:300]) to load only some of them
print(type(dataLines))
npData = np.loadtxt(dataLines, delimiter=",")
print(npData)
<class 'list'>
[[5.40650e+04 1.25650e+02 4.45300e+01 ... 2.10000e+00 4.00000e+03
1.00460e+03]
[5.86310e+04 1.18500e+02 2.89000e+01 ... 6.00000e-01 3.10000e+03
1.00530e+03]
[5.76710e+04 1.12370e+02 2.88500e+01 ... 9.00000e-01 1.20000e+03
1.00530e+03]
...
[5.95590e+04 1.20750e+02 2.20000e+01 ... 9.99999e+05 1.50000e+04
1.00810e+03]
[5.89740e+04 1.22070e+02 2.56300e+01 ... 9.99999e+05 3.00000e+04
1.00700e+03]
[5.89680e+04 1.21520e+02 2.50300e+01 ... 9.99999e+05 2.50000e+04
1.00640e+03]]
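Because loadtxt accepts a list of strings, slicing the list decides which rows are parsed. A minimal sketch with a few hypothetical lines in the form f.readlines() returns them:

```python
import numpy as np

# Hypothetical lines: header first, then data rows
lines = [
    "站点,Lon,Lat\n",
    "54065, 125.65, 44.53\n",
    "58631, 118.50, 28.90\n",
    "57671, 112.37, 28.85\n",
]

# Parse only the first two data rows by slicing the list before loading
firstTwo = np.loadtxt(lines[1:3], delimiter=",")
print(firstTwo)
```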
1.6 Specify the data type of each column
dtype parameter
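# Specify the dtype of each column; the result is a one-dimensional array whose elements are tuples
# https://numpy.org/doc/1.18/reference/generated/numpy.loadtxt.html
with open(p, encoding='utf-8') as f:
    lines = f.readlines()
header = lines[0]  # the first line holds the column names
data = lines[1:]
# The header is comma-separated, so split on ',' and strip spaces and the trailing newline
fields = [name.strip() for name in header.split(',')]
# First column as int, the remaining columns as float32
types = ['int'] + ['f4'] * (len(fields) - 1)
npData = np.loadtxt(data, dtype={'names': fields, 'formats': types}, delimiter=",")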
print (npData)
[(54065, 125.65, 44.53, 1.69100e+02, 985.2, 25.3, 7.90000e+01, 3.00000e-01, 1.90000e+01, 2.10000e+00, 4000., 1004.6)
(58631, 118.5 , 28.9 , 1.37000e+02, 989.7, 27.6, 8.20000e+01, 0.00000e+00, 1.09000e+02, 6.00000e-01, 3100., 1005.3)
(57671, 112.37, 28.85, 3.70000e+01, 1000.9, 25.2, 9.90000e+01, 0.00000e+00, 3.46000e+02, 9.00000e-01, 1200., 1005.3)
...
(59559, 120.75, 22. , 9.99999e+05, 1005.3, 26.6, 9.99999e+05, 9.99999e+05, 9.99999e+05, 9.99999e+05, 15000., 1008.1)
(58974, 122.07, 25.63, 1.04600e+02, 995.2, 29. , 9.99999e+05, 9.99999e+05, 9.99999e+05, 9.99999e+05, 30000., 1007. )
(58968, 121.52, 25.03, 7.10000e+00, 1003. , 31.1, 9.99999e+05, 9.99999e+05, 9.99999e+05, 9.99999e+05, 25000., 1006.4)]
With a structured dtype like this, the result is a one-dimensional array with one record per data row, and each record is displayed as a tuple.
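The payoff of a structured dtype is that columns can be picked out by name. A minimal sketch, assuming hypothetical lines with an illustrative ASCII column name "Station":

```python
import numpy as np

# Hypothetical lines; "Station" is an illustrative column name
lines = [
    "Station,Lon,Lat\n",
    "54065, 125.65, 44.53\n",
    "58631, 118.50, 28.90\n",
]

fields = [name.strip() for name in lines[0].split(",")]   # column names from the header
types = ["i4"] + ["f4"] * (len(fields) - 1)               # int first, float32 for the rest
records = np.loadtxt(lines[1:], dtype={"names": fields, "formats": types}, delimiter=",")

print(records.shape)      # one-dimensional: one record per data row
print(records["Lon"])     # a whole column, selected by its field name
print(records[0])         # a single record, printed as a tuple
```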
1.7 Reading specific columns
usecols parameter
# Read only the listed columns (0-based indices) with usecols
npData = np.loadtxt(p,delimiter = ",", skiprows = 1, usecols=(0, 2))
print(npData)
[[5.4065e+04 4.4530e+01]
[5.8631e+04 2.8900e+01]
[5.7671e+04 2.8850e+01]
...
[5.9559e+04 2.2000e+01]
[5.8974e+04 2.5630e+01]
[5.8968e+04 2.5030e+01]]
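Combined with unpack=True, usecols can hand back each selected column as its own 1-D array. A small sketch on an in-memory hypothetical sample:

```python
import io

import numpy as np

# Hypothetical sample shaped like the real file
sample = "站点,Lon,Lat\n54065, 125.65, 44.53\n58631, 118.50, 28.90\n"

# unpack=True transposes the result, so each selected column can be assigned to its own name
lon, lat = np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=1, usecols=(1, 2), unpack=True)
print(lon)
print(lat)
```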
2. Data manipulation
2.1 Data Slicing
# Slicing
npData = np.loadtxt(p, delimiter=",", skiprows=1)
# First column of the 2-D array (indices start at 0)
col0 = npData[:, 0]
print('----------Column 1----------')
print(col0)
# Second row of the 2-D array
row2 = npData[1, :]
print('----------Row 2----------')
print(row2)
# A single element: row index first, then column index
n, m = 3, 2
item = npData[n, m]  # the element in row 4, column 3
print('----------Element at row 4, column 3----------')
print(item)
----------Column 1----------
[54065. 58631. 57671. ... 59559. 58974. 58968.]
----------Row 2----------
[5.8631e+04 1.1850e+02 2.8900e+01 1.3700e+02 9.8970e+02 2.7600e+01
8.2000e+01 0.0000e+00 1.0900e+02 6.0000e-01 3.1000e+03 1.0053e+03]
----------Element at row 4, column 3----------
26.17
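Slices can also take ranges and negative indices. A small sketch on a toy array (not the CSV data) that also shows that basic slices are views of the original array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # rows: 0..3, 4..7, 8..11

block = a[0:2, 1:3]   # rows 0-1 and columns 1-2 (the stop index is excluded)
lastCol = a[:, -1]    # negative indices count from the end
print(block)
print(lastCol)

# Basic slices are views: writing through block also changes a
block[0, 0] = 100
print(a[0, 1])
```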
2.2 Converting data types
astype method
# Data type conversion
npData = np.loadtxt(p, str, delimiter=",", skiprows=1)
# Take the third column
col2 = npData[:, 2]
# Convert the third column to float32
newCol2 = col2.astype(np.float32)
print(col2)
print('----------str to float----------')
print(newCol2)
[' 44.53' ' 28.90' ' 28.85' ... ' 22.00' ' 25.63' ' 25.03']
----------str to float----------
[44.53 28.9 28.85 ... 22. 25.63 25.03]
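astype works the same way for integers, with one thing to watch: converting floats to int drops the fractional part rather than rounding. A small sketch with hypothetical values:

```python
import numpy as np

ids = np.array(['54065', '58631', '57671'])
idInts = ids.astype(np.int64)                 # numeric strings convert directly to integers

values = np.array([2.7, -2.7, 0.6])
truncated = values.astype(np.int64)           # float -> int truncates toward zero
rounded = np.round(values).astype(np.int64)   # round first if rounding is what you want
print(idInts, truncated, rounded)
```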
2.3 Combining columns
c_ method
# Combine two 1-D arrays column by column with np.c_
npData = np.loadtxt(p,delimiter = ",", skiprows = 1)
x = npData[:,1]
y = npData[:,2]
points = np.c_[x,y]
print (points)
[[125.65 44.53]
[118.5 28.9 ]
[112.37 28.85]
...
[120.75 22. ]
[122.07 25.63]
[121.52 25.03]]
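np.column_stack gives the same column-wise result as np.c_ through an ordinary function call, while np.r_ joins arrays end to end. A small sketch with a few hypothetical coordinates:

```python
import numpy as np

x = np.array([125.65, 118.5, 112.37])
y = np.array([44.53, 28.9, 28.85])

pointsC = np.c_[x, y]                    # pair up values column-wise: shape (3, 2)
pointsStack = np.column_stack((x, y))    # same result as a plain function call
joined = np.r_[x, y]                     # r_ concatenates 1-D arrays: shape (6,)

print(np.array_equal(pointsC, pointsStack))
print(joined.shape)
```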
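2.4 Finding extreme values
# Minimum and maximum with np.min / np.max
# x and y are 1-D, so axis=0 is their only axis; on a 2-D array, axis=0 reduces each column and axis=1 each row
minX = np.min(x, axis=0)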
minY=np.min(y, axis=0)
maxX=np.max(x, axis=0)
maxY=np.max(y, axis=0)
print(minX)
print(minY)
print(maxX)
print(maxY)
75.25
18.65
134.42
53.47
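The axis argument matters once the array is 2-D, for example the combined points array from 2.3. A sketch on a few hypothetical (lon, lat) rows, also using np.argmin to find where the minimum sits:

```python
import numpy as np

# Hypothetical (lon, lat) pairs, shaped like the points array from 2.3
points = np.array([[125.65, 44.53],
                   [118.50, 28.90],
                   [112.37, 28.85]])

colMin = np.min(points, axis=0)   # minimum of each column: [min lon, min lat]
colMax = points.max(axis=0)       # method form, maximum of each column
rowMax = np.max(points, axis=1)   # maximum within each row
iMin = np.argmin(points[:, 1])    # row index of the smallest latitude
print(colMin, colMax, rowMax, iMin)
```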
2.5 Generating evenly spaced arrays
linspace method
# linspace: the first argument is the start value, the second the end value (included by default), the third the number of values
r1 = np.linspace(10, 30, 5)
r2 = np.linspace(10, 30, 8)
print(r1)
print(r2)
[10. 15. 20. 25. 30.]
[10. 12.85714286 15.71428571 18.57142857 21.42857143 24.28571429
27.14285714 30. ]
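The spacing linspace uses is (stop - start) / (num - 1), which is why r1 steps by 5 and r2 by 20/7. retstep=True returns that step, endpoint=False leaves out the end value, and np.arange fixes the step instead of the count:

```python
import numpy as np

values, step = np.linspace(10, 30, 5, retstep=True)  # also return the spacing
print(values, step)  # step = (30 - 10) / (5 - 1) = 5.0

halfOpen = np.linspace(10, 30, 5, endpoint=False)    # exclude 30: step becomes 20 / 5 = 4.0
byStep = np.arange(10, 31, 5)   # arange takes a step instead of a count; the stop value is excluded
print(halfOpen, byStep)
```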
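2.6 Constructing a coordinate grid
meshgrid
# Build a coordinate grid (a pair of matrices) from two 1-D arrays
x = np.linspace(100, 140, 5)
y = np.linspace(30, 60, 10)
grid = np.meshgrid(x, y)
print(grid)
X, Y = grid
print('----------X----------')
print(X)
print(X.shape)
print('----------Y----------')
print(Y)
print(Y.shape)
print('----------Grid coordinates at row 3, column 4----------')
rowIndex = 2  # 0-based row index (moves along y)
colIndex = 3  # 0-based column index (moves along x)
print(X[rowIndex, colIndex])
print(Y[rowIndex, colIndex])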
[array([[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.],
[100., 110., 120., 130., 140.]]), array([[30. , 30. , 30. , 30. , 30. ],
[33.33333333, 33.33333333, 33.33333333, 33.33333333, 33.33333333],
[36.66666667, 36.66666667, 36.66666667, 36.66666667, 36.66666667],
[40. , 40. , 40. , 40. , 40. ],
[43.33333333, 43.33333333, 43.33333333, 43.33333333, 43.33333333],
[46.66666667, 46.66666667, 46.66666667, 46.66666667, 46.66666667],
[50. , 50. , 50. , 50. , 50. ],
[53.33333333, 53.33333333, 53.33333333, 53.33333333, 53.33333333],
[56.66666667, 56.66666667, 56.66666667, 56.66666667, 56.66666667],
[60. , 60. , 60. , 60. , 60. ]])]
----------X----------
[[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]
[100. 110. 120. 130. 140.]]
(10, 5)
----------Y----------
[[30. 30. 30. 30. 30. ]
[33.33333333 33.33333333 33.33333333 33.33333333 33.33333333]
[36.66666667 36.66666667 36.66666667 36.66666667 36.66666667]
[40. 40. 40. 40. 40. ]
[43.33333333 43.33333333 43.33333333 43.33333333 43.33333333]
[46.66666667 46.66666667 46.66666667 46.66666667 46.66666667]
[50. 50. 50. 50. 50. ]
[53.33333333 53.33333333 53.33333333 53.33333333 53.33333333]
[56.66666667 56.66666667 56.66666667 56.66666667 56.66666667]
[60. 60. 60. 60. 60. ]]
(10, 5)
----------Grid coordinates at row 3, column 4----------
130.0
36.666666666666664
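A typical use of the grid is evaluating a function at every point at once. The sketch below uses an illustrative function (not from the original) and also shows indexing='ij', which swaps the output shape to (len(x), len(y)):

```python
import numpy as np

x = np.linspace(100, 140, 5)
y = np.linspace(30, 60, 10)

X, Y = np.meshgrid(x, y)                     # default indexing='xy': shape (len(y), len(x))
Z = X * 0.1 + Y                              # an illustrative function evaluated at every grid point
Xij, Yij = np.meshgrid(x, y, indexing='ij')  # matrix indexing: shape (len(x), len(y))

print(X.shape, Z.shape, Xij.shape)
print(Z[2, 3])  # combines X[2, 3] and Y[2, 3]
```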
2.7 Addition, subtraction, multiplication and division
# Addition, subtraction, multiplication and division
a = np.arange(1, 10)
print(a)
print('----------Add 1 to every element of a---------')
print(a + 1)
print('----------Multiply every element of a by 2---------')
print(a * 2)
print('----------a + b----------')
b = np.arange(2, 20, 2)
print(b)
c = a + b
print(c)
print('----------a - b----------')
c = a - b
print(c)
print('----------a * b----------')
c = a * b
print(c)
print('----------a / b----------')
c = a / b
print(c)
print('----------2-D array addition----------')
d = np.random.randint(1, 10, size=(4, 4))
print(d)
print('------------')
e = np.random.randint(1, 10, size=(4, 4))
print(e)
print('----------d + e----------')
print(d + e)
[1 2 3 4 5 6 7 8 9]
----------Add 1 to every element of a---------
[ 2 3 4 5 6 7 8 9 10]
----------Multiply every element of a by 2---------
[ 2 4 6 8 10 12 14 16 18]
----------a + b----------
[ 2 4 6 8 10 12 14 16 18]
[ 3 6 9 12 15 18 21 24 27]
----------a - b----------
[-1 -2 -3 -4 -5 -6 -7 -8 -9]
----------a * b----------
[ 2 8 18 32 50 72 98 128 162]
----------a / b----------
[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
----------2-D array addition----------
[[7 4 5 3]
[3 1 4 8]
[8 6 1 1]
[5 4 3 4]]
------------
[[5 1 5 6]
[2 1 4 1]
[7 5 7 1]
[3 4 4 4]]
----------d + e----------
[[12 5 10 9]
[ 5 2 8 9]
[15 11 8 2]
[ 8 8 7 8]]
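Arrays of different shapes can also be combined when their shapes are compatible (numpy's broadcasting), and floor division keeps integer results. A small sketch on toy arrays:

```python
import numpy as np

d = np.arange(1, 7).reshape(2, 3)   # rows [1, 2, 3] and [4, 5, 6]
row = np.array([10, 20, 30])        # shape (3,) is repeated for both rows
col = np.array([[100], [200]])      # shape (2, 1) is repeated across all columns

print(d + row)    # values [[11, 22, 33], [14, 25, 36]]
print(d + col)    # values [[101, 102, 103], [204, 205, 206]]
print(d // 2)     # floor division keeps integers: [[0, 1, 1], [2, 2, 3]]
print(d / 2)      # true division always returns floats
```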
2.8 Trigonometric functions
# Trigonometric functions: sin, cos
angle = np.array([0, 30, 45, 90])
print(angle)
# Convert degrees to radians
print('----------Degrees to radians----------')
rad = np.deg2rad(angle)
print(rad)
# Sine
print('----------Sine----------')
sin = np.sin(rad)
print(sin)
# Cosine of the same radian values
print('----------Cosine----------')
cos = np.cos(rad)
print(cos)
[ 0 30 45 90]
----------Degrees to radians----------
[0. 0.52359878 0.78539816 1.57079633]
----------Sine----------
[0. 0.5 0.70710678 1. ]
----------Cosine----------
[1.00000000e+00 8.66025404e-01 7.07106781e-01 6.12323400e-17]
The last cosine is about 6.1e-17 rather than exactly 0 because π/2 cannot be represented exactly as a floating-point number.
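Going the other way, the inverse functions (np.arcsin, np.arccos, ...) return radians, which np.rad2deg converts back to degrees. A small sketch:

```python
import numpy as np

values = np.array([0.0, 0.5, 1.0])
rad = np.arcsin(values)    # inverse sine, result in radians
deg = np.rad2deg(rad)      # radians back to degrees: approximately [0, 30, 90]
print(deg)

print(np.allclose(np.deg2rad(deg), rad))  # deg2rad and rad2deg undo each other
```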
2.9 Summing
# Sum with np.sum
t = np.arange(1,10)
np.sum(t)
45
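On a 2-D array, np.sum can also add along a single axis, and np.cumsum gives running totals. A small sketch on a toy array:

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)   # rows [1, 2, 3] and [4, 5, 6]
total = np.sum(m)                   # all elements: 21
colSums = np.sum(m, axis=0)         # down each column: [5, 7, 9]
rowSums = m.sum(axis=1)             # across each row: [6, 15]
runningTotal = np.cumsum(np.arange(1, 10))  # running sums of 1..9; the last one is 45
print(total, colSums, rowSums, runningTotal)
```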
2.10 Finding the mean
# Mean with np.mean
np.mean(t)
5.0
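In the data loaded earlier, 999999.00 appears to act as a placeholder for missing values (an assumption based on the printed output), and a plain mean would be badly skewed by it. A sketch of excluding such a sentinel first, using hypothetical values and an assumed sentinel constant MISSING:

```python
import numpy as np

MISSING = 999999.0  # assumed placeholder for missing observations
values = np.array([2.1, 0.6, MISSING, 0.9])

naive = np.mean(values)                     # skewed by the placeholder
valid = values[values != MISSING]           # boolean mask keeps only real readings
cleanMean = np.mean(valid)                  # (2.1 + 0.6 + 0.9) / 3
withNan = np.where(values == MISSING, np.nan, values)
nanMean = np.nanmean(withNan)               # same result: nanmean ignores NaN entries
print(naive, cleanMean, nanMean)
```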
2.11 Random numbers
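# Random numbers
print('----------Random floats----------')
a = np.random.rand(3, 2)  # uniform floats in [0, 1), shape (3, 2)
print(a)
print('----------Random integers----------')
a = np.random.randint(2, 5, size=(3, 2))  # integers from 2 to 4: the upper bound 5 is excluded
print(a)
----------Random floats----------
[[0.822719 0.82149577]
[0.25140903 0.83072816]
[0.67028644 0.73376812]]
----------Random integers----------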
[[3 4]
[3 4]
[4 3]]
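Newer numpy versions also offer a Generator object created with np.random.default_rng; seeding it makes the "random" numbers reproducible between runs. A minimal sketch (the seed 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)           # seeded generator: same numbers on every run
floats = rng.random((3, 2))               # uniform floats in [0, 1)
ints = rng.integers(2, 5, size=(3, 2))    # integers from 2 up to, but not including, 5

print(floats)
print(ints)
```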
2.12 Reversing (flipping) arrays
# Reverse the order of the elements
a = np.arange(1,10)
print(a)
np.flipud(a)
[1 2 3 4 5 6 7 8 9]
array([9, 8, 7, 6, 5, 4, 3, 2, 1])
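For a 1-D array, a[::-1] and np.flip give the same result as np.flipud. On a 2-D array, flipud reverses the row order and fliplr the column order:

```python
import numpy as np

a = np.arange(1, 10)
print(a[::-1])        # slicing with a step of -1 reverses a 1-D array
print(np.flip(a))     # np.flip reverses along every axis unless axis= is given

m = np.arange(1, 7).reshape(2, 3)   # rows [1, 2, 3] and [4, 5, 6]
print(np.flipud(m))   # reverse the row order: rows [4, 5, 6] and [1, 2, 3]
print(np.fliplr(m))   # reverse the column order: rows [3, 2, 1] and [6, 5, 4]
```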
2.13 Transpose
# Transpose
a = np.random.randint(1, 10, size=(2, 5))
print(a)
print('----------Transpose----------')
b = a.T
print(b)
[[2 9 4 2 3]
[8 9 8 2 9]]
----------Transpose----------
[[2 8]
[9 9]
[4 8]
[2 2]
[3 9]]
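One detail worth knowing: .T leaves a 1-D array unchanged, so turning a row of values into a column needs an extra axis. For arrays with more dimensions, np.transpose accepts an explicit axis order. A small sketch:

```python
import numpy as np

v = np.arange(3)
print(v.T.shape)                 # (3,): transposing a 1-D array changes nothing
colVec = v[:, np.newaxis]        # add an axis to get a column vector, shape (3, 1)
print(colVec.T.shape)            # (1, 3)

t = np.zeros((2, 3, 4))
print(np.transpose(t, (2, 0, 1)).shape)  # reorder the axes explicitly: (4, 2, 3)
```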