NumPy: using Python for data analysis

Introduction to NumPy

The output shown below may look slightly different in IPython and in Jupyter. This article shares a selection of the features most commonly used in day-to-day work. All examples assume NumPy has been imported as np (import numpy as np).

NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank (available as the ndim attribute).
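
As a small illustration of axes, the sketch below builds an array with three axes and inspects it. The name cube is made up for this example.

```python
import numpy as np

# An array with 3 axes: 2 blocks, each with 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim)      # number of axes
print(cube.shape)     # length of each axis
# An element is addressed by a tuple of integers, one per axis
print(cube[1, 2, 3])  # last element of the array
```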

The multidimensional array object: ndarray

NumPy's array class is called ndarray, and is often referred to simply as an array. Note that numpy.array is not the same as the standard Python library class array.array, which only handles one-dimensional arrays and offers much less functionality. Commonly used attributes are shown below:


In [10]: np.arange(16).reshape((4,4))
Out[10]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])


In [11]: type(np.arange(16).reshape((4,4)))
Out[11]: numpy.ndarray



data=np.arange(16).reshape((4,4))
data.dtype  
#dtype('int32')  (the default integer dtype is platform-dependent; int64 is also common)


data.shape
#(4, 4)

data.size
#16

Creating an ndarray

There are several ways to create arrays. The most commonly used is the array function, which creates an array from a regular Python list or tuple (or a nested sequence of them). The dtype of the resulting array is deduced from the types of the elements in the original sequence.

In [8]: np.array([1,2,3,4,5])
Out[8]: array([1, 2, 3, 4, 5])


In [7]: np.array([[1,2,3,4,5,6],[1,2,3,4,5,6]])
Out[7]: 
array([[1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6]])

In [14]: np.array([(1,2,3,4,5,6),[1,2,3,4,5,6]])
Out[14]: 
array([[1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6]])

In [16]: np.array(((1,2,3,4,3,6),(1,2,3,4,5,6)))
Out[16]: 
array([[1, 2, 3, 4, 3, 6],
       [1, 2, 3, 4, 5, 6]])

# Specify the dtype explicitly
In [17]: np.array(((1,2,3,4,3,6),(1,2,3,4,5,6)),dtype=float)
Out[17]: 
array([[ 1.,  2.,  3.,  4.,  3.,  6.],
       [ 1.,  2.,  3.,  4.,  5.,  6.]])
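
To make the dtype deduction mentioned above concrete, here is a minimal sketch. The variable names are illustrative, and the exact integer width depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all Python ints -> an integer dtype
floats = np.array([1, 2.5, 3])   # a single float promotes the whole array to float64
texts = np.array(['a', 'bc'])    # strings -> a fixed-width Unicode dtype

print(ints.dtype, floats.dtype, texts.dtype)

# An explicit dtype overrides the deduction
forced = np.array([1, 2, 3], dtype=np.float64)
print(forced.dtype)
```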

Commonly used NumPy creation functions

  • ones creates an array filled with ones
  • zeros creates an array filled with zeros
  • empty creates an array without initializing it; its contents are arbitrary and depend on the state of memory
  • arange works like Python's range but returns an array instead of a list:
In [20]: np.ones((2,3))
Out[20]: 
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [21]: np.zeros((2,3))
Out[21]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
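
Because empty does not initialize its contents, a reasonable pattern is to fill the array before reading from it. The sketch below shows that, together with the start/stop/step form of arange. The names buf and steps are made up for the example.

```python
import numpy as np

# empty only allocates memory; the values are whatever was there before,
# so assign to every element before reading any of them
buf = np.empty((2, 3))
buf[:] = 7.0
print(buf)

# arange accepts start, stop (exclusive) and step, like Python's range
steps = np.arange(10, 30, 5)
print(steps)
```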
       
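# One-dimensional arrays are printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.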
In [24]: np.arange(6)
Out[24]: array([0, 1, 2, 3, 4, 5])

In [27]: np.arange(12).reshape(2,6)

Out[27]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])


In [29]: np.arange(12).reshape(3,2,2)
Out[29]: 
array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]]])
# If an array is too large to print, NumPy automatically skips the middle and prints only the corners
In [31]: np.arange(100000)
Out[31]: array([    0,     1,     2, ..., 99997, 99998, 99999])
# To disable this behavior and force NumPy to print the whole array, change the print options with np.set_printoptions.
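
As a sketch of how the print options can be changed, the example below uses the np.printoptions context manager so the setting is only temporary; np.set_printoptions accepts the same threshold argument and changes it for the rest of the session. The array name big is illustrative.

```python
import numpy as np

big = np.arange(2000)

# Default: arrays with more than 1000 elements are summarized with "..."
short_text = repr(big)

# Temporarily raise the threshold so that every element is printed
with np.printoptions(threshold=big.size + 1):
    full_text = repr(big)

print(len(short_text), len(full_text))
```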

Basic operations

Arithmetic operations on arrays are element-wise. A new array is created and filled with the results.

data1=np.arange(10)
data2=np.arange(10)
data1*data2
#array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Unlike in many matrix languages, the * operator in NumPy operates element-wise. The matrix product is computed with the dot function (or the @ operator); a separate matrix class also exists, but plain arrays are generally preferred. For two one-dimensional arrays, dot returns their inner product:

np.dot(data1,data2)
#285
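
With two-dimensional arrays the difference between * and the matrix product is easier to see. A minimal sketch, using two small made-up arrays a and b:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

elementwise = a * b          # multiplies matching entries
matrix_prod = np.dot(a, b)   # rows of a times columns of b
same_prod = a @ b            # the @ operator gives the same matrix product

print(elementwise)
print(matrix_prod)
```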

Some operators, such as += and *=, modify an existing array in place instead of creating a new one.

data=np.ones(15)
data
#out
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.])
data*=2
data
#out
array([ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
        2.,  2.])
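
One way to see the in-place behavior is to keep a second name for the same array. In the sketch below (names are illustrative), a *= 2 changes the shared data, while a = a * 2 builds a new array and leaves the old one alone.

```python
import numpy as np

a = np.ones(3)
alias = a            # a second name for the same array object

a *= 2               # in place: alias sees the change, both hold 2.0
print(alias)

a = a * 2            # creates a new array and rebinds the name a
print(alias)         # alias still holds 2.0 everywhere
print(a)             # a now holds 4.0 everywhere
```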

Many operations accept an axis parameter, which applies the operation along the specified axis of the array.

data=np.arange(18).reshape((3,6))
data
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])
data.sum(axis=0)
array([18, 21, 24, 27, 30, 33])
data.sum(axis=1)
array([15, 51, 87])
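
The same axis convention applies to other reductions. A rough rule of thumb is that the axis you name is the one that disappears from the result's shape. The sketch below uses a small made-up array m.

```python
import numpy as np

m = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

col_max = m.max(axis=0)    # collapses the rows: one value per column
row_mean = m.mean(axis=1)  # collapses the columns: one value per row

print(col_max, col_max.shape)
print(row_mean, row_mean.shape)
```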

Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced, and iterated over, just like lists and other Python sequences.
Multidimensional arrays can have one index per axis, given as a comma-separated tuple.
When fewer indices are provided than the number of axes, the missing indices are treated as complete slices. A slice is a view of the original data, so modifying the slice also modifies the original array; to get independent data, call copy() on the slice.

data=np.arange(18).reshape((3,6))
data
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])
data[1:3,3:]
array([[ 9, 10, 11],
       [15, 16, 17]])
       
       
# To operate on every element of an array, use the flat attribute, which is an iterator over all of its elements
for i in data.flat:
    print(i)
    
    
data[1:3,3:]=12

data
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8, 12, 12, 12],
       [12, 13, 14, 12, 12, 12]])

data=data[1:3,3:].copy()
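
The view-versus-copy distinction and the "missing index" rule can be checked directly. This is a small sketch with made-up names (grid, view, snapshot); np.shares_memory reports whether two arrays use the same underlying data.

```python
import numpy as np

grid = np.arange(18).reshape((3, 6))

# A missing index means "the whole axis": grid[1] is the same as grid[1, :]
print(np.array_equal(grid[1], grid[1, :]))

view = grid[1:3, 3:]              # a slice shares memory with grid
snapshot = grid[1:3, 3:].copy()   # an independent copy

view[:] = -1                      # writes through to grid
print(grid[1, 3])                 # -1
print(snapshot[0, 0])             # 9, unaffected by the assignment
```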

Boolean indexing


names=np.array(['bob','lee','lee','leslee','sally','silly','alis'])
data=np.random.randn(7,4)
data
array([[-1.13573027, -0.68479345,  0.59825133,  1.78172432],
       [-1.1516828 ,  0.89823945, -0.12296042,  0.12370584],
       [ 1.42724922,  0.84648497, -0.6145136 , -1.92440901],
       [ 0.8897498 , -0.13524427, -0.13473049,  0.22418047],
       [-0.12076329, -0.71757068,  0.22619757, -0.31316627],
       [ 0.13114028,  1.00729055, -0.3865038 ,  1.00018106],
       [ 0.18532823, -1.00441648, -1.04649557, -1.16575243]])
# Each name corresponds to one row of data; select the rows whose name equals 'lee'
names=='lee'
array([False,  True,  True, False, False, False, False], dtype=bool)

data[names=='lee']
array([[-1.1516828 ,  0.89823945, -0.12296042,  0.12370584],
       [ 1.42724922,  0.84648497, -0.6145136 , -1.92440901]])

data[names=='lee',2:]
array([[-0.12296042,  0.12370584],
       [-0.6145136 , -1.92440901]])
       
data[names!='lee',2:]
array([[ 0.59825133,  1.78172432],
       [-0.13473049,  0.22418047],
       [ 0.22619757, -0.31316627],
       [-0.3865038 ,  1.00018106],
       [-1.04649557, -1.16575243]])
       
data[(names=='lee')|(names=='bob'),2:]
array([[ 0.59825133,  1.78172432],
       [-0.12296042,  0.12370584],
       [-0.6145136 , -1.92440901]])
       
data[(names=='lee')&(names=='bob'),2:]
array([], shape=(0, 2), dtype=float64)
# The Python keywords and / or do not work on boolean arrays; use & and | instead

data[data>0]

array([ 0.59825133,  1.78172432,  0.89823945,  0.12370584,  1.42724922,
        0.84648497,  0.8897498 ,  0.22418047,  0.22619757,  0.13114028,
        1.00729055,  1.00018106,  0.18532823])

data[names!='bob']=7        
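
The sketch below illustrates why & and | are needed: the keyword or asks for the truth value of a whole array, which NumPy refuses to guess. The labels array is made up for the example.

```python
import numpy as np

labels = np.array(['bob', 'lee', 'lee', 'sally'])
is_lee = labels == 'lee'
is_bob = labels == 'bob'

either = is_lee | is_bob     # element-wise OR
neither = ~either            # element-wise NOT
print(either, neither)

# The keyword `or` needs a single True/False for each operand,
# which is ambiguous for an array with several elements
try:
    is_lee or is_bob
    keyword_failed = False
except ValueError:
    keyword_failed = True
print(keyword_failed)
```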

Fancy indexing

data=np.empty((4,4))
data

array([[  6.23042070e-307,   3.56043053e-307,   1.60219306e-306,
          2.44763557e-307],
       [  1.69119330e-306,   1.78022342e-306,   1.05700345e-307,
          1.11261027e-306],
       [  1.11261502e-306,   1.42410839e-306,   7.56597770e-307,
          6.23059726e-307],
       [  8.90104239e-307,   6.89804133e-307,   1.69118923e-306,
          8.45593934e-307]])
data[[1,2,3]]
array([[  1.69119330e-306,   1.78022342e-306,   1.05700345e-307,
          1.11261027e-306],
       [  1.11261502e-306,   1.42410839e-306,   7.56597770e-307,
          6.23059726e-307],
       [  8.90104239e-307,   6.89804133e-307,   1.69118923e-306,
          8.45593934e-307]])
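# data[[1,2,3]] selects the 2nd, 3rd and 4th rows (indexing starts at 0)
# np.ix_ converts two one-dimensional integer arrays into an indexer that selects a rectangular region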
data[np.ix_([1,2,3],[0,1,2])]
array([[  1.69119330e-306,   1.78022342e-306,   1.05700345e-307],
       [  1.11261502e-306,   1.42410839e-306,   7.56597770e-307],
       [  8.90104239e-307,   6.89804133e-307,   1.69118923e-306]])
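
It may help to compare np.ix_ with passing the two index lists directly, since the results differ. The sketch uses a made-up array table with predictable values instead of uninitialized memory.

```python
import numpy as np

table = np.arange(16).reshape((4, 4))

# Two index lists are paired element by element: (1,0), (2,1), (3,2)
pairs = table[[1, 2, 3], [0, 1, 2]]

# np.ix_ takes every combination of the listed rows and columns
block = table[np.ix_([1, 2, 3], [0, 1, 2])]
print(pairs)
print(block.shape)

# Unlike slices, fancy indexing returns a copy, so table is unchanged here
block[:] = 0
print(table[1, 0])
```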

Data processing with arrays

np.arange(1,2,0.01)
array([ 1.  ,  1.01,  1.02,  1.03,  1.04,  1.05,  1.06,  1.07,  1.08,
        1.09,  1.1 ,  1.11,  1.12,  1.13,  1.14,  1.15,  1.16,  1.17,
        1.18,  1.19,  1.2 ,  1.21,  1.22,  1.23,  1.24,  1.25,  1.26,
        1.27,  1.28,  1.29,  1.3 ,  1.31,  1.32,  1.33,  1.34,  1.35,
        1.36,  1.37,  1.38,  1.39,  1.4 ,  1.41,  1.42,  1.43,  1.44,
        1.45,  1.46,  1.47,  1.48,  1.49,  1.5 ,  1.51,  1.52,  1.53,
        1.54,  1.55,  1.56,  1.57,  1.58,  1.59,  1.6 ,  1.61,  1.62,
        1.63,  1.64,  1.65,  1.66,  1.67,  1.68,  1.69,  1.7 ,  1.71,
        1.72,  1.73,  1.74,  1.75,  1.76,  1.77,  1.78,  1.79,  1.8 ,
        1.81,  1.82,  1.83,  1.84,  1.85,  1.86,  1.87,  1.88,  1.89,
        1.9 ,  1.91,  1.92,  1.93,  1.94,  1.95,  1.96,  1.97,  1.98,  1.99])

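The np.where function is a vectorized version of the ternary expression x if condition else y.

result = np.where(cond,xarr,yarr)

Each element of the result is taken from xarr where cond is true and from yarr otherwise. It is commonly used to build a new array from an existing one.

Example: suppose we have a matrix of random numbers and want to replace every positive value with 2 and every negative value with -2.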

# numpy.random.randn(d0, d1, ..., dn) is one of NumPy's commonly used random number functions.
# It draws samples from the standard normal distribution; d0, d1, ..., dn are the
# dimensions of the result and should be non-negative integers.
arr = np.random.randn(4,4)
arr
Out[51]: 
array([[ 0.04150406,  1.27790573, -0.25917274, -1.25604622],
       [ 0.8797799 ,  1.84828821, -1.21709272, -0.41767649],
       [-0.71758894, -0.70595454,  1.72330333,  0.18559916],
       [-2.19529605,  2.11615947, -0.13563148, -1.41532576]])

np.where(arr>0,2,-2)
Out[52]: 
array([[ 2,  2, -2, -2],
       [ 2,  2, -2, -2],
       [-2, -2,  2,  2],
       [-2,  2, -2, -2]])
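# Nested np.where calls express multi-branch logic (cond1 and cond2 are boolean arrays)
np.where(cond1&cond2,0,np.where(cond1,1,np.where(cond2,2,3)))
# For each element this is equivalent to
if cond1 and cond2:
    0
elif cond1:
    1
elif cond2:
    2
else:
    3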
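
To check the equivalence on concrete data, the sketch below runs the nested np.where next to the plain if/elif chain applied one element at a time. The two condition arrays are made up so that every branch is hit once.

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

# Reference result from the if/elif chain, one element at a time
looped = []
for c1, c2 in zip(cond1, cond2):
    if c1 and c2:
        looped.append(0)
    elif c1:
        looped.append(1)
    elif c2:
        looped.append(2)
    else:
        looped.append(3)

print(nested)
print(looped)
```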

Mathematical and Statistical Methods


data=np.random.randn(5,4)
data

array([[-0.44582163, -1.84127166, -0.31569774,  1.36470645],
       [-0.11506653, -1.03561913, -0.97670808, -1.05951855],
       [-1.24155893, -0.99854379, -0.77521176,  0.96576693],
       [-0.07880383, -1.05389831, -0.98544118,  0.347693  ],
       [ 0.50354977,  1.30615654, -0.39931607,  0.99116404]])
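data.sum()        # sum of all elements
data.sum(axis=0)  # column sums
data.sum(axis=1)  # row sums
# Other methods are not listed here; look them up in the NumPy documentation
# Summing a boolean array counts its True values
(data>0).sum()

(data>0).sum(axis=1)
array([1, 0, 1, 1, 3])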

# Sorting: sort orders the array in place along the given axis
data.sort(axis=1)
data
array([[-1.84127166, -0.44582163, -0.31569774,  1.36470645],
       [-1.05951855, -1.03561913, -0.97670808, -0.11506653],
       [-1.24155893, -0.99854379, -0.77521176,  0.96576693],
       [-1.05389831, -0.98544118, -0.07880383,  0.347693  ],
       [-0.39931607,  0.50354977,  0.99116404,  1.30615654]])
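
Since the sort method works in place, the original order is lost. When the original order is still needed, np.sort returns a sorted copy instead. A minimal sketch with a made-up array scores:

```python
import numpy as np

scores = np.array([[9, 1, 8],
                   [3, 7, 2]])

ordered = np.sort(scores, axis=1)   # sorted copy, each row ordered
kept = scores[0].tolist()           # the original is still [9, 1, 8]
print(ordered)

scores.sort(axis=0)                 # in place, ordering each column
print(scores)
```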

Unique and common set logic

data=np.array(['lee','les','bob','hello','hello'])
np.unique(data)
array(['bob', 'hello', 'lee', 'les'],
      dtype='<U5')
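value=np.array([1,2,3,4,5,6,7,8])
# np.in1d(x, y) tests, element by element, whether each value of x is in y
# (newer NumPy versions recommend np.isin for the same purpose)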
np.in1d(value,[1,2,3,4,5])
array([ True,  True,  True,  True,  True, False, False, False], dtype=bool)

value[np.in1d(value,[1,2,3,4,5])]
array([1, 2, 3, 4, 5])
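
A few other set routines follow the same pattern. The sketch below uses np.isin together with intersect1d, union1d and setdiff1d on two made-up arrays, left and right.

```python
import numpy as np

left = np.array([1, 2, 3, 4, 5, 6, 7, 8])
right = np.array([1, 2, 3, 4, 5, 10])

mask = np.isin(left, right)             # membership test, same shape as left
common = np.intersect1d(left, right)    # sorted values present in both
merged = np.union1d(left, right)        # sorted values present in either
only_left = np.setdiff1d(left, right)   # values in left but not in right

print(common, merged, only_left)
```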
