Data manipulation with the NumPy module

Foreword

Some time ago I used Python for a large amount of data visualization work and chose the NumPy module for the data processing side. This post organizes and shares some of the NumPy operations I used along the way.

1. Reading a CSV file with NumPy

First, look at the content of my CSV file:
[Image: contents of the CSV file]
The file can be opened directly with NumPy's loadtxt function, and adjusting the parameters of loadtxt lets us read it in several different ways.

1.1 Direct reading

import numpy as np

p = r'E:\Dev\python\numpy-test\202007150000.txt'

# Read the formatted text file directly
npData = np.loadtxt(p, str, delimiter=",")
print(npData)
[['绔欑偣' 'Lon' 'Lat' ... 'Avg_2' 'VIS' 'PRS_Sea']
 ['54065' ' 125.65' '  44.53' ... '    2.10' '   4000.00' '    1004.60']
 ['58631' ' 118.50' '  28.90' ... '    0.60' '   3100.00' '    1005.30']
 ...
 ['59559' ' 120.75' '  22.00' ... '999999.00' '  15000.00' '    1008.10']
 ['58974' ' 122.07' '  25.63' ... '999999.00' '  30000.00' '    1007.00']
 ['58968' ' 121.52' '  25.03' ... '999999.00' '  25000.00' '    1006.40']]

1.2 Handling garbled Chinese characters

# Fix the garbled Chinese header ('绔欑偣' above) by opening the file explicitly as UTF-8
with open(p,encoding = 'utf-8') as f:
    npData = np.loadtxt(f,str,delimiter = ",")
    print (npData)
[['站点' 'Lon' 'Lat' ... 'Avg_2' 'VIS' 'PRS_Sea']
 ['54065' ' 125.65' '  44.53' ... '    2.10' '   4000.00' '    1004.60']
 ['58631' ' 118.50' '  28.90' ... '    0.60' '   3100.00' '    1005.30']
 ...
 ['59559' ' 120.75' '  22.00' ... '999999.00' '  15000.00' '    1008.10']
 ['58974' ' 122.07' '  25.63' ... '999999.00' '  30000.00' '    1007.00']
 ['58968' ' 121.52' '  25.03' ... '999999.00' '  25000.00' '    1006.40']]
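
As an alternative to opening the file yourself, loadtxt also accepts an encoding argument. The sketch below is only an illustration: it assumes a small temporary file whose three lines are a made-up sample (two rows copied from the output above), and the names sample, tmp_path and encoded are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample: a UTF-8 file with a Chinese header, mimicking the layout above
sample = "站点,Lon,Lat\n54065, 125.65,  44.53\n58631, 118.50,  28.90\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    # Pass the encoding to loadtxt instead of wrapping the call in open()
    encoded = np.loadtxt(tmp_path, dtype=str, delimiter=",", encoding="utf-8")
finally:
    os.remove(tmp_path)

print(encoded[0])
```

Either approach should give the same decoded header; passing encoding simply keeps the call on one line.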

1.3 Skip the first line

skiprows parameter

# Skip the first row: skiprows = 1
with open(p,encoding = 'utf-8') as f:
    npData = np.loadtxt(f,str, delimiter = ",", skiprows = 1)
    print (npData)
[['54065' ' 125.65' '  44.53' ... '    2.10' '   4000.00' '    1004.60']
 ['58631' ' 118.50' '  28.90' ... '    0.60' '   3100.00' '    1005.30']
 ['57671' ' 112.37' '  28.85' ... '    0.90' '   1200.00' '    1005.30']
 ...
 ['59559' ' 120.75' '  22.00' ... '999999.00' '  15000.00' '    1008.10']
 ['58974' ' 122.07' '  25.63' ... '999999.00' '  30000.00' '    1007.00']
 ['58968' ' 121.52' '  25.03' ... '999999.00' '  25000.00' '    1006.40']]

Similarly, skiprows = 2 skips the first two rows.

1.4 Reading as float

# Read as float; float is also the default data type used by loadtxt
with open(p,encoding = 'utf-8') as f:
    npData = np.loadtxt(f,float,delimiter = ",", skiprows = 1)
    print (npData)
[[5.40650e+04 1.25650e+02 4.45300e+01 ... 2.10000e+00 4.00000e+03
  1.00460e+03]
 [5.86310e+04 1.18500e+02 2.89000e+01 ... 6.00000e-01 3.10000e+03
  1.00530e+03]
 [5.76710e+04 1.12370e+02 2.88500e+01 ... 9.00000e-01 1.20000e+03
  1.00530e+03]
 ...
 [5.95590e+04 1.20750e+02 2.20000e+01 ... 9.99999e+05 1.50000e+04
  1.00810e+03]
 [5.89740e+04 1.22070e+02 2.56300e+01 ... 9.99999e+05 3.00000e+04
  1.00700e+03]
 [5.89680e+04 1.21520e+02 2.50300e+01 ... 9.99999e+05 2.50000e+04
  1.00640e+03]]

If the first row is not skipped, loadtxt raises an error, because the header row contains text (including Chinese characters) that cannot be converted to float.
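
A minimal sketch of this failure, assuming a made-up two-line sample read through io.StringIO instead of the real file (the header name station is a stand-in for the Chinese column name):

```python
import io

import numpy as np

# Hypothetical sample: a text header followed by one numeric row
sample = "station,Lon,Lat\n54065, 125.65,  44.53\n"

try:
    np.loadtxt(io.StringIO(sample), float, delimiter=",")
    header_failed = False
except ValueError as err:
    # The header fields cannot be parsed as float
    header_failed = True
    print("loadtxt failed:", err)

# Skipping the header row lets the same text load cleanly
values = np.loadtxt(io.StringIO(sample), float, delimiter=",", skiprows=1)
print(values)
```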

1.5 Loading from a list of formatted lines

# Load from a list of already formatted lines
with open(p, encoding='utf-8') as f:
    lines = f.readlines()
    headerLines = lines[:1]
    dataLines = lines[1:300]
    print(type(dataLines))
    npData = np.loadtxt(dataLines, delimiter=",")
    print(npData)
<class 'list'>
[[5.40650e+04 1.25650e+02 4.45300e+01 ... 2.10000e+00 4.00000e+03
  1.00460e+03]
 [5.86310e+04 1.18500e+02 2.89000e+01 ... 6.00000e-01 3.10000e+03
  1.00530e+03]
 [5.76710e+04 1.12370e+02 2.88500e+01 ... 9.00000e-01 1.20000e+03
  1.00530e+03]
 ...
 [5.95590e+04 1.20750e+02 2.20000e+01 ... 9.99999e+05 1.50000e+04
  1.00810e+03]
 [5.89740e+04 1.22070e+02 2.56300e+01 ... 9.99999e+05 3.00000e+04
  1.00700e+03]
 [5.89680e+04 1.21520e+02 2.50300e+01 ... 9.99999e+05 2.50000e+04
  1.00640e+03]]

1.6 Specify the data type of each column

dtype parameter

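# Specify the data type of each column; the result is a one-dimensional
# array whose elements are tuples
# https://numpy.org/doc/1.18/reference/generated/numpy.loadtxt.html
with open(p, encoding='utf-8') as f:
    lines = f.readlines()
    header = lines[0]  # the header row holds the column names
    data = lines[1:]
    # the file is comma-delimited, so split the header on commas
    fields = [name.strip() for name in header.split(',') if name.strip() != '']
    # the first column is int, the remaining columns are float
    types = ['int'] + ['f4'] * (len(fields) - 1)

    npData = np.loadtxt(data, dtype={'names': fields, 'formats': types}, delimiter=",")
    print(npData)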
[(54065, 125.65, 44.53, 1.69100e+02,  985.2, 25.3, 7.90000e+01, 3.00000e-01, 1.90000e+01, 2.10000e+00,  4000., 1004.6)
 (58631, 118.5 , 28.9 , 1.37000e+02,  989.7, 27.6, 8.20000e+01, 0.00000e+00, 1.09000e+02, 6.00000e-01,  3100., 1005.3)
 (57671, 112.37, 28.85, 3.70000e+01, 1000.9, 25.2, 9.90000e+01, 0.00000e+00, 3.46000e+02, 9.00000e-01,  1200., 1005.3)
 ...
 (59559, 120.75, 22.  , 9.99999e+05, 1005.3, 26.6, 9.99999e+05, 9.99999e+05, 9.99999e+05, 9.99999e+05, 15000., 1008.1)
 (58974, 122.07, 25.63, 1.04600e+02,  995.2, 29. , 9.99999e+05, 9.99999e+05, 9.99999e+05, 9.99999e+05, 30000., 1007. )
 (58968, 121.52, 25.03, 7.10000e+00, 1003. , 31.1, 9.99999e+05, 9.99999e+05, 9.99999e+05, 9.99999e+05, 25000., 1006.4)]

Read this way, the result is a one-dimensional structured array in which each element is a tuple-like record holding one row.
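
One practical benefit of such a structured array is that columns can be selected by field name. The following is a sketch under assumptions: the two rows are a cut-down sample and the field names station, Lon and Lat are chosen for illustration rather than read from the real header.

```python
import io

import numpy as np

# Hypothetical two-row sample with three columns
sample = "54065, 125.65,  44.53\n58631, 118.50,  28.90\n"

records = np.loadtxt(
    io.StringIO(sample),
    dtype={"names": ["station", "Lon", "Lat"], "formats": ["int", "f4", "f4"]},
    delimiter=",",
)

print(records.shape)          # one-dimensional: one record per row
print(records["Lon"])         # a whole column, selected by field name
print(records[0]["station"])  # a single field of the first record
```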

1.7 Read the specified column

usecols parameter

# Read only the specified columns with usecols (here columns 0 and 2)
npData = np.loadtxt(p,delimiter = ",", skiprows = 1, usecols=(0, 2))
print(npData)
[[5.4065e+04 4.4530e+01]
 [5.8631e+04 2.8900e+01]
 [5.7671e+04 2.8850e+01]
 ...
 [5.9559e+04 2.2000e+01]
 [5.8974e+04 2.5630e+01]
 [5.8968e+04 2.5030e+01]]
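
usecols combines well with the unpack argument of loadtxt, which returns each selected column as its own array. A short sketch, assuming a made-up sample in place of the real file (the header name station is a stand-in):

```python
import io

import numpy as np

# Hypothetical sample with a header row and two data rows
sample = "station,Lon,Lat\n54065, 125.65,  44.53\n58631, 118.50,  28.90\n"

# unpack=True transposes the result so each column can be bound to a name
lon, lat = np.loadtxt(
    io.StringIO(sample), delimiter=",", skiprows=1, usecols=(1, 2), unpack=True
)

print(lon)
print(lat)
```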

2. Data manipulation

2.1 Data Slicing

# Slicing
npData = np.loadtxt(p, delimiter=",", skiprows=1)

# First column of the 2-D array
col0 = npData[:, 0]  # indices start at 0
print('----------column 1----------')
print(col0)

# Second row of the 2-D array
row2 = npData[1, :]
print('----------row 2----------')
print(row2)

# Element at a specific position
n, m = 3, 2
item = npData[n, m]  # row index first, then column index: the element at row 4, column 3
print('----------element at row 4, column 3----------')
print(item)
----------column 1----------
[54065. 58631. 57671. ... 59559. 58974. 58968.]
----------row 2----------
[5.8631e+04 1.1850e+02 2.8900e+01 1.3700e+02 9.8970e+02 2.7600e+01
 8.2000e+01 0.0000e+00 1.0900e+02 6.0000e-01 3.1000e+03 1.0053e+03]
----------element at row 4, column 3----------
26.17
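
The same indexing syntax also accepts ranges, negative indices and steps. A small sketch on a synthetic array (not the weather data), so the selected values are easy to follow:

```python
import numpy as np

# Synthetic 3 x 4 array: rows [0 1 2 3], [4 5 6 7], [8 9 10 11]
grid = np.arange(12).reshape(3, 4)

block = grid[0:2, 1:3]     # rows 0-1 and columns 1-2
last_col = grid[:, -1]     # negative index counts from the end
every_other = grid[:, ::2] # every second column

print(block)
print(last_col)
print(every_other)
```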

2.2 Convert data type

astype method

# Data type conversion

npData = np.loadtxt(p, str, delimiter=",", skiprows=1)
# Take the third column
col2 = npData[:, 2]
# Convert the third column to float32
newCol2 = col2.astype(np.float32)
print(col2)
print('----------str to float----------')
print(newCol2)
['  44.53' '  28.90' '  28.85' ... '  22.00' '  25.63' '  25.03']
----------str to float----------
[44.53 28.9  28.85 ... 22.   25.63 25.03]
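
astype returns a new array and leaves the original untouched, and converting float to int drops the fractional part. A minimal sketch with a few values copied from the output above:

```python
import numpy as np

text = np.array(["  44.53", "  28.90", "  22.00"])

as_float = text.astype(np.float64)  # padded strings are parsed as numbers
as_int = as_float.astype(np.int64)  # the fractional part is truncated

print(text.dtype, as_float.dtype, as_int.dtype)
print(as_int)
```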

2.3 Combination

np.c_ object

# Combine 1-D arrays as the columns of a 2-D array with np.c_
npData = np.loadtxt(p,delimiter = ",", skiprows = 1)
x = npData[:,1]
y = npData[:,2]
points = np.c_[x,y]
print (points)
[[125.65  44.53]
 [118.5   28.9 ]
 [112.37  28.85]
 ...
 [120.75  22.  ]
 [122.07  25.63]
 [121.52  25.03]]
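
For two 1-D arrays, np.c_ gives the same result as np.column_stack, which some readers may find more explicit. A sketch using three coordinate pairs copied from the output above:

```python
import numpy as np

x = np.array([125.65, 118.5, 112.37])
y = np.array([44.53, 28.9, 28.85])

points = np.c_[x, y]               # index-style syntax with square brackets
stacked = np.column_stack((x, y))  # function-style equivalent

print(points.shape)
print(np.array_equal(points, stacked))
```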

2.4 Finding the extreme value

# Minimum and maximum values: min, max
minX=np.min(x, axis=0) # axis=0 reduces down the columns, axis=1 along the rows; x is 1-D, so axis=0 is its only axis
minY=np.min(y, axis=0)
maxX=np.max(x, axis=0)
maxY=np.max(y, axis=0)
print(minX)
print(minY)
print(maxX)
print(maxY)
75.25
18.65
134.42
53.47
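
The axis argument matters more on a 2-D array. A short sketch on a small points array (three rows copied from the combination example) showing both directions:

```python
import numpy as np

points = np.array([[125.65, 44.53],
                   [118.5, 28.9],
                   [112.37, 28.85]])

col_min = points.min(axis=0)  # one value per column
col_max = points.max(axis=0)
row_max = points.max(axis=1)  # one value per row

print(col_min, col_max)
print(row_max)
```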

2.5 Generate arrays with equal steps

linspace method

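# Generate an evenly spaced array with linspace: the first argument is the start value, the second is the end value (included), and the third is the number of elements in the result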
r1 = np.linspace(10, 30, 5) 
r2 = np.linspace(10, 30, 8)

print(r1)
print(r2)
[10. 15. 20. 25. 30.]
[10.         12.85714286 15.71428571 18.57142857 21.42857143 24.28571429
 27.14285714 30.        ]
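
Since the end value is included, the step between elements is (stop - start) / (num - 1). The sketch below checks this with the retstep argument and also shows endpoint=False, which leaves the end value out:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(10, 30, 5, retstep=True)
print(values, step)

# endpoint=False excludes the end value, so the step becomes (30 - 10) / 5
half_open = np.linspace(10, 30, 5, endpoint=False)
print(half_open)
```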

2.6 Constructing Matrix

meshgrid

# Build a coordinate grid
x = np.linspace(100, 140, 5)
y = np.linspace(30, 60, 10)

grid = np.meshgrid(x, y)
print(grid)

X,Y = grid
print('----------X----------')
print(X)
print(X.shape)
print('----------Y----------')
print(Y)
print(Y.shape)

print('----------coordinates at row 3, column 4 of the grid----------')
# the first index selects the row, the second selects the column
xIndex =2
yIndex= 3
print(X[xIndex,yIndex])
print(Y[xIndex,yIndex])
[array([[100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.],
       [100., 110., 120., 130., 140.]]), array([[30.        , 30.        , 30.        , 30.        , 30.        ],
       [33.33333333, 33.33333333, 33.33333333, 33.33333333, 33.33333333],
       [36.66666667, 36.66666667, 36.66666667, 36.66666667, 36.66666667],
       [40.        , 40.        , 40.        , 40.        , 40.        ],
       [43.33333333, 43.33333333, 43.33333333, 43.33333333, 43.33333333],
       [46.66666667, 46.66666667, 46.66666667, 46.66666667, 46.66666667],
       [50.        , 50.        , 50.        , 50.        , 50.        ],
       [53.33333333, 53.33333333, 53.33333333, 53.33333333, 53.33333333],
       [56.66666667, 56.66666667, 56.66666667, 56.66666667, 56.66666667],
       [60.        , 60.        , 60.        , 60.        , 60.        ]])]
----------X----------
[[100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]
 [100. 110. 120. 130. 140.]]
(10, 5)
----------Y----------
[[30.         30.         30.         30.         30.        ]
 [33.33333333 33.33333333 33.33333333 33.33333333 33.33333333]
 [36.66666667 36.66666667 36.66666667 36.66666667 36.66666667]
 [40.         40.         40.         40.         40.        ]
 [43.33333333 43.33333333 43.33333333 43.33333333 43.33333333]
 [46.66666667 46.66666667 46.66666667 46.66666667 46.66666667]
 [50.         50.         50.         50.         50.        ]
 [53.33333333 53.33333333 53.33333333 53.33333333 53.33333333]
 [56.66666667 56.66666667 56.66666667 56.66666667 56.66666667]
 [60.         60.         60.         60.         60.        ]]
(10, 5)
----------coordinates at row 3, column 4 of the grid----------
130.0
36.666666666666664
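
With the default indexing, the grids have shape (len(y), len(x)), so X[i, j] equals x[j] and Y[i, j] equals y[i]. A sketch that checks this relationship and contrasts it with indexing='ij', which swaps the two axes:

```python
import numpy as np

x = np.linspace(100, 140, 5)
y = np.linspace(30, 60, 10)

# Default 'xy' indexing: rows follow y, columns follow x
X, Y = np.meshgrid(x, y)
row, col = 2, 3
print(X[row, col] == x[col], Y[row, col] == y[row])

# 'ij' indexing: the first axis follows x, the second follows y
Xij, Yij = np.meshgrid(x, y, indexing="ij")
print(X.shape, Xij.shape)
```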

2.7 Addition, subtraction, multiplication and division

# Addition, subtraction, multiplication and division
a = np.arange(1, 10)
print(a)
print('----------add 1 to every element of a---------')
print(a + 1)
print('----------multiply every element of a by 2---------')
print(a * 2)
print('----------a + b----------')
b = np.arange(2,20,2)
print(b)
c = a + b
print(c)

print('----------a - b----------')
c = a - b
print(c)

print('----------a * b----------')
c = a * b
print(c)

print('----------a / b----------')
c = a / b
print(c)

print('----------2-D array addition----------')
d = np.random.randint(1,10,size=(4,4))
print(d)
print('------------')
e = np.random.randint(1,10,size=(4,4))
print(e)
print('----------d + e----------')
print (d+e)

[1 2 3 4 5 6 7 8 9]
----------add 1 to every element of a---------
[ 2  3  4  5  6  7  8  9 10]
----------multiply every element of a by 2---------
[ 2  4  6  8 10 12 14 16 18]
----------a + b----------
[ 2  4  6  8 10 12 14 16 18]
[ 3  6  9 12 15 18 21 24 27]
----------a - b----------
[-1 -2 -3 -4 -5 -6 -7 -8 -9]
----------a * b----------
[  2   8  18  32  50  72  98 128 162]
----------a / b----------
[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
----------2-D array addition----------
[[7 4 5 3]
 [3 1 4 8]
 [8 6 1 1]
 [5 4 3 4]]
------------
[[5 1 5 6]
 [2 1 4 1]
 [7 5 7 1]
 [3 4 4 4]]
----------d + e----------
[[12  5 10  9]
 [ 5  2  8  9]
 [15 11  8  2]
 [ 8  8  7  8]]
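
All of these operators work element by element, including * on 2-D arrays. The sketch below uses two small fixed arrays (not the random ones above) to contrast element-wise multiplication with the matrix product, written with the @ operator:

```python
import numpy as np

d = np.array([[1, 2], [3, 4]])
e = np.array([[5, 6], [7, 8]])

elementwise = d * e     # multiplies matching positions
matrix_product = d @ e  # row-by-column matrix multiplication

print(elementwise)
print(matrix_product)
```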

2.8 Trigonometric functions

# Trigonometric functions: sin, cos
angle = np.array([0, 30, 45, 90])
print(angle)

# Convert degrees to radians
print('----------degrees to radians----------')
rad = np.deg2rad(angle)
print(rad)

# Sine
print('----------sine----------')
sin = np.sin(rad)
print(sin)

# Cosine
print('----------cosine----------')
cos = np.cos(rad)
print(cos)
[ 0 30 45 90]
----------degrees to radians----------
[0.         0.52359878 0.78539816 1.57079633]
----------sine----------
[0.         0.5        0.70710678 1.        ]
----------cosine----------
[1.00000000e+00 8.66025404e-01 7.07106781e-01 6.12323400e-17]
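
np.sin and np.cos expect radians, so degree values need np.deg2rad first. Because of floating-point rounding, the cosine of 90 degrees comes out as a tiny non-zero number, so comparisons are better done with a tolerance. A small sketch:

```python
import numpy as np

angle = np.array([0, 30, 45, 90])
rad = np.deg2rad(angle)

sin = np.sin(rad)
cos = np.cos(rad)

# cos(90 degrees) is not exactly 0 in floating point, so compare with a tolerance
print(np.isclose(cos[-1], 0.0))

# Sanity check: sin^2 + cos^2 should be 1 for every angle
identity = sin ** 2 + cos ** 2
print(np.allclose(identity, 1.0))
```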

2.9 Summing

# Sum: np.sum
t = np.arange(1,10)
np.sum(t)
45

2.10 Finding the mean

# Mean: np.mean
np.mean(t)
5.0
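
Both functions also take an axis argument, which is useful on 2-D data. A minimal sketch on a small synthetic array:

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)  # rows [1 2 3] and [4 5 6]

total = m.sum()            # sum of all elements
col_sum = m.sum(axis=0)    # one sum per column
row_mean = m.mean(axis=1)  # one mean per row

print(total, col_sum, row_mean)
```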

2.11 Random numbers

# Random numbers
print('----------random floats----------')
a = np.random.rand(3,2)
print(a)
print('----------random integers----------')
a = np.random.randint(2,5,size=(3,2))
print(a)
----------random floats----------
[[0.822719   0.82149577]
 [0.25140903 0.83072816]
 [0.67028644 0.73376812]]
----------random integers----------
[[3 4]
 [3 4]
 [4 3]]
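
The values above change on every run. When repeatable results are needed, a seeded generator can be used. The sketch below uses np.random.default_rng, a newer interface than the rand and randint calls in this post; the seed 42 is an arbitrary choice. As with randint, the upper bound of integers is exclusive.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

floats_a = rng_a.random((3, 2))  # floats in [0, 1)
floats_b = rng_b.random((3, 2))
print(np.array_equal(floats_a, floats_b))

# Integers from 2 up to, but not including, 5
ints = rng_a.integers(2, 5, size=(3, 2))
print(ints)
```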

2.12 Reversing

# Reverse the order of the elements
a = np.arange(1,10)
print(a)
np.flipud(a)
[1 2 3 4 5 6 7 8 9]
array([9, 8, 7, 6, 5, 4, 3, 2, 1])
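
For a 1-D array, np.flipud gives the same result as the slice a[::-1]. On a 2-D array it reverses the order of the rows, while np.fliplr reverses the columns. A short sketch on synthetic data:

```python
import numpy as np

a = np.arange(1, 10)
reversed_slice = a[::-1]  # same values as np.flipud(a)
print(np.array_equal(reversed_slice, np.flipud(a)))

m = np.arange(6).reshape(2, 3)  # rows [0 1 2] and [3 4 5]
flipped_rows = np.flipud(m)     # rows in reverse order
flipped_cols = np.fliplr(m)     # columns in reverse order
print(flipped_rows)
print(flipped_cols)
```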

2.13 Transpose

# Transpose
a = np.random.randint(1,10, size=(2,5))
print(a)
print('----------transpose----------')
b = a.T
print(b)
[[2 9 4 2 3]
 [8 9 8 2 9]]
----------transpose----------
[[2 8]
 [9 9]
 [4 8]
 [2 2]
 [3 9]]
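
Two details about .T are easy to miss: it returns a view that shares memory with the original array, and it has no effect on a 1-D array. A small sketch on synthetic data, using reshape to get a column from a 1-D array:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
b = a.T

print(b.shape)                 # the axes are swapped
print(b[3, 1] == a[1, 3])      # element (i, j) moves to (j, i)
print(np.shares_memory(a, b))  # the transpose is a view, not a copy

v = np.arange(3)
print(v.T.shape)               # unchanged for a 1-D array
column = v.reshape(-1, 1)      # explicit column vector instead
print(column.shape)
```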


Origin blog.csdn.net/u012413551/article/details/108297905