Python Data Analysis - NumPy

NumPy Features

NumPy is a widely used Python library for scientific computing. Its main characteristics are:

  • It provides an N-dimensional array (matrix) type with fast, efficient vectorized math;
  • Indexing is efficient and needs no explicit loops, because the underlying implementation is written in C.

Common methods of arrays and matrices

Creating arrays and matrices and inspecting their dimensions
  • numpy.array()

    import numpy
    import numpy as np  # the examples below use both spellings

    ## Create an array
    vector = numpy.array([1,2,3,4])
    
    ## Create a matrix
    matrix = numpy.array([
        [1,2,3],
        [4,5,6],
        [7,8,9]
    ])
  • shape

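    ## Print the dimensions of the array (shape is an attribute, not a method)
    vector.shape  # -> (4,): the array holds 4 elements
    
    ## Print the dimensions of the matrix
    matrix.shape  # -> (3, 3): three rows and three columns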
  • reshape

    eg:
    a = np.arange(15).reshape(3, 5) # create a 3-row, 5-column matrix holding 0..14
    Out:
       [[ 0  1  2  3  4]
        [ 5  6  7  8  9]
        [10 11 12 13 14]]
    a.ndim # returns the number of dimensions, here 2

    note:

    • reshape returns a new array object with the requested shape; the shape of the original array is not changed.

    • reshape(-1): the new shape must be compatible with the original one. When a dimension is given as -1, NumPy works out its value from the number of elements and the remaining dimensions.

      eg:
      z = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12],
                [13, 14, 15, 16]])
      z.shape
        Out:(4, 4)
      z.reshape(-1)
        Out:array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])
    • reshape(-1, 1): converts the data into a single column without knowing in advance how many rows there will be

      eg:
      z.reshape(-1,1)
      Out:array([[ 1],
              [ 2],
              [ 3],
              [ 4],
              [ 5],
              [ 6],
              [ 7],
              [ 8],
              [ 9],
              [10],
              [11],
              [12],
              [13],
              [14],
              [15],
              [16]])
      
    • reshape(-1, 2): converts the data into 2 columns without knowing in advance how many rows there will be

      eg:
      z.reshape(-1, 2)
      Out:array([[ 1,  2],
              [ 3,  4],
              [ 5,  6],
              [ 7,  8],
              [ 9, 10],
              [11, 12],
              [13, 14],
              [15, 16]])
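The three reshape(-1, ...) cases above can be checked side by side. This is a minimal sketch; the names flat, column and pairs are illustrative, and the shares_memory check assumes a contiguous array, for which reshape returns a view rather than a copy.

```python
import numpy as np

z = np.arange(1, 17).reshape(4, 4)

flat = z.reshape(-1)        # 1-D, length inferred as 16
column = z.reshape(-1, 1)   # one column, row count inferred as 16
pairs = z.reshape(-1, 2)    # two columns, row count inferred as 8

print(flat.shape, column.shape, pairs.shape)  # (16,) (16, 1) (8, 2)
print(z.shape)                                # (4, 4): the original keeps its shape

# For a contiguous array such as z, the reshaped result shares z's data.
print(np.shares_memory(z, pairs))             # True
```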
  • linspace

    • Building a Python sequence first and converting it with the array function is not efficient. The arange function instead creates a one-dimensional array directly from a start value, a stop value and a step (the array does not include the stop value).

    • The linspace function creates a one-dimensional array from a start value, a stop value and the number of elements. The endpoint keyword controls whether the stop value is included; by default it is included.

      eg:
      np.linspace(0, 1, 10) # the step is 1/9
      Out: array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444, 0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
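As a rough illustration of the difference between arange and linspace, the sketch below builds the same grid three ways. The variable names are illustrative only.

```python
import numpy as np

with_stop = np.linspace(0, 1, 10)                     # 10 points, step 1/9, stop included
without_stop = np.linspace(0, 1, 10, endpoint=False)  # 10 points, step 1/10, stop excluded
stepped = np.arange(0, 1, 0.1)                        # start, stop, step; stop excluded

print(with_stop[-1])                       # 1.0
print(len(without_stop), len(stepped))     # 10 10
print(np.allclose(without_stop, stepped))  # True
```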
      
      
  • logspace

    logspace is similar to linspace, but it creates a geometric sequence.

    eg:
    np.logspace(0, 2, 20) # a geometric sequence of 20 elements from 1 (10^0) to 100 (10^2)
    
    array([ 1. ,    1.27427499,   1.62377674,    2.06913808,
    2.6366509 ,   3.35981829,   4.2813324 ,    5.45559478,
    6.95192796,  8.8586679 ,    11.28837892, 14.38449888,
    18.32980711, 23.35721469, 29.76351442, 37.92690191,
    48.32930239, 61.58482111, 78.47599704, 100. ])
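One way to see that logspace produces a geometric sequence is to compare it with powers of 10 taken over a linspace grid and to check that consecutive ratios are constant. A minimal sketch:

```python
import numpy as np

geo = np.logspace(0, 2, 20)          # 20 values from 10**0 to 10**2
same = 10 ** np.linspace(0, 2, 20)   # the same values built by hand
ratios = geo[1:] / geo[:-1]          # constant in a geometric sequence

print(np.allclose(geo, same))          # True
print(np.allclose(ratios, ratios[0]))  # True
```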
  • zeros(), ones(), empty(): create an array of a given shape and type

    zeros_like(), ones_like(), empty_like() and similar functions create an array with the same shape and type as their argument. Thus "zeros_like(a)" has the same effect as "zeros(a.shape, a.dtype)".

    1. zeros(shape, dtype) and zeros_like(): matrix of zeros
    eg:
    np.zeros(4, float) # the element type defaults to float, so it can be omitted here
    array([ 0., 0., 0., 0.])
    
    2. ones(): matrix whose elements are all 1
    eg:
    numpy.ones((3,4), int) # 3 rows and 4 columns, element type int, all elements 1
    array([[1, 1, 1, 1],
           [1, 1, 1, 1],
           [1, 1, 1, 1]])
    
    3. empty(): uninitialized matrix
    eg:
    np.empty((2,3), int) # only allocates memory without initializing it, so the values are arbitrary
    array([[ 32571594, 32635312, 505219724],
           [ 45001384, 1852386928, 665972]])
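The equivalence between zeros_like(a) and zeros(a.shape, a.dtype) can be checked directly. A small sketch, with an arbitrary int32 array chosen for illustration:

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

z1 = np.zeros_like(a)            # same shape and dtype as a
z2 = np.zeros(a.shape, a.dtype)  # the explicit equivalent

print(z1.shape, z1.dtype)        # (2, 3) int32
print(np.array_equal(z1, z2))    # True
print(np.ones_like(a).sum())     # 6
```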
    
Accessing and retrieving elements
  • Slicing (selecting by rows and columns)
    Note: in a slice, the first index is where you want to start, the second is the first index you do not want included, and the optional third parameter is the step.

    ## Get elements from an array
    vector[0:3]  # starts at the first element and takes three elements, returns [1,2,3]
    
    ## Get elements from a matrix
    matrix[1:,0:2] # from the second row onward, takes the first and second columns, returns ([
        [4,5],
        [7,8]
    ])
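  • Selecting by condition

    eg:
    a = vector[vector > 3] # selects all elements greater than 3, returns [4]
    a = vector[vector == 4] # selects the elements equal to 4; returns an empty array if there are none
    
    b = matrix[matrix > 5] # selects all elements greater than 5; the result is a one-dimensional array, [6,7,8,9]
    b = (matrix == 9) # the comparison on its own returns a boolean matrix, [
        [False,False,False],
        [False,False,False],
        [False,False,True]
    ]
    • Easily confused cases

      1. Comparing an array and selecting from an array by condition return different results
      eg:
      print(vector == 3) # compares the elements one by one and returns a boolean array
      
      print(vector[vector==3]) # returns an array of the elements that satisfy the condition
      
      2. Comparing a matrix and selecting from a matrix by condition return different results
      eg:
      print(matrix == 3) # compares the elements one by one and returns a boolean matrix
      
      print(matrix[matrix==3]) # returns a one-dimensional array of the elements that satisfy the condition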
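The distinction between a comparison and a conditional selection can be made explicit by naming the intermediate mask. This is a small sketch; mask and selected are illustrative names.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

mask = matrix > 5          # boolean matrix with the same shape as matrix
selected = matrix[mask]    # 1-D array of the elements where mask is True

print(mask.shape)          # (3, 3)
print(selected)            # [6 7 8 9]
print(mask.sum())          # 4, the number of True entries
```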
      
  • Indexing with an integer sequence

    When an integer sequence is used to access array elements, each integer in the sequence is used as a subscript. The sequence may be a list or an array. An array obtained with an integer sequence as the index does not share data space with the original array, so changing the new array does not change the original. Like the previous two approaches, this is an effective way to access elements.

    eg:
    a = numpy.linspace(0,1,10,endpoint=False)
    a
    >> array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
    
    # Use an integer sequence to pick the 4th, 6th, 8th and 10th elements and form a new array
    b = a[[3,5,7,9]]
    b
    >> array([0.3, 0.5, 0.7, 0.9])
    
    # Modify the values at the positions given by a sequence
    b[[0,1,2]]= -1,-2,-3
    b
    >> array([-1. , -2. , -3. ,  0.9])
  • Using Boolean arrays

    When a Boolean array b is used as the subscript of an array x, the result collects every element of x whose position corresponds to a True value in b. An array obtained with a Boolean array as the index does not share data space with the original array. Note that older NumPy versions accept only a Boolean array here, not a Boolean list; recent versions treat a Boolean list as a mask as well.

  • Note:

    • Unlike slices of a Python list, a new array obtained with a subscript range (a slice) is a view of the original array. It shares the same data space as the original, so changing an element of the new array also changes the original array.
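The view-versus-copy behaviour described above can be observed by modifying each result and looking at the original array. A minimal sketch with illustrative variable names:

```python
import numpy as np

x = np.arange(10)

s = x[2:5]        # slice: a view on x
s[0] = -1
print(x[2])       # -1, the original changed

f = x[[3, 5, 7]]  # integer sequence: a copy
f[0] = 100
print(x[3])       # 3, the original is untouched

m = x[x > 6]      # Boolean array: also a copy
m[:] = 0
print(x[7:])      # [7 8 9]
```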
Common operations
  • sum

    eg:
     # Sum of an array
     vector.sum() # 10 for vector = [1,2,3,4]
    
     # Sum of a matrix
     # The axis dictates which dimension we perform the operation on
     # 1 means that we want to perform the operation on each row, and 0 means on each column
     matrix = numpy.array([
                     [5, 10, 15], 
                     [20, 25, 30],
                     [35, 40, 45]
                  ])
     matrix.sum(axis=1) # axis=1 sums each row, axis=0 sums each column
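For the matrix above, the effect of the axis argument can be seen by printing all three variants:

```python
import numpy as np

matrix = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

print(matrix.sum())         # 225, sum of all elements
print(matrix.sum(axis=1))   # [ 30  75 120], one sum per row
print(matrix.sum(axis=0))   # [60 75 90], one sum per column
```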
  • product

    eg:
    #The matrix product can be performed using the dot function or method
    A = numpy.array([
        [1,2],
        [3,4]
    ])
    B = numpy.array([
        [1,1],
        [4,6]
    ])
    
    # Element-wise product
    multi = A * B
    print(multi)
    >> [[ 1  2]
        [12 24]]
    
    ## Dot (matrix) product
    resultdot = numpy.dot(A,B)
    print(resultdot)
    >> [[ 9 13]
        [19 27]]
    
    # Flatten the array
    print(A.ravel())
    >> [1 2 3 4]
    
    # Vertical stacking (vstack) and horizontal stacking (hstack)
    print(numpy.vstack((A,B)))
    >> [[1 2]
        [3 4]
        [1 1]
        [4 6]]
    print(numpy.hstack((A,B)))
    >> [[1 2 1 1]
        [3 4 4 6]]
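As a complement to the example above, the @ operator gives the same matrix product as dot for two-dimensional arrays, while * stays element-wise. A short sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 1], [4, 6]])

print(np.array_equal(A @ B, np.dot(A, B)))       # True: matrix product
print(np.array_equal(A * B, np.multiply(A, B)))  # True: element-wise product
print(np.vstack((A, B)).shape)                   # (4, 2)
print(np.hstack((A, B)).shape)                   # (2, 4)
```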
    
  • Mean and variance

    • mean() computes the average of the array. The axis parameter specifies the axis along which to average, and the out parameter specifies an output array. Unlike sum(), for an integer array mean() computes in double-precision floating point, whereas for arrays of other types the accumulator has the same type as the array elements.

    • average() also computes the average of the array. It has no out or dtype parameters, but it has a weights parameter that specifies the weight of each element.

    • std() and var() compute the standard deviation and the variance of the array; they take axis, out and dtype parameters.

      eg:
      f = numpy.array([[ 0,  1,  2,  3,  4,  5],
             [10, 11, 12, 13, 14, 15],
             [20, 21, 22, 23, 24, 25],
             [30, 31, 32, 33, 34, 35],
             [40, 41, 42, 43, 44, 45],
             [50, 51, 52, 53, 54, 55]])
       ## Mean
      numpy.mean(f, axis=1) # integer arrays are averaged in double-precision floating point
      >> array([ 2.5, 12.5, 22.5, 32.5, 42.5, 52.5])
      
      ## Variance
      numpy.var(f,axis=1)
      >> array([2.91666667, 2.91666667, 2.91666667, 2.91666667, 2.91666667,
             2.91666667])
      
      ## Standard deviation
      numpy.std(f,axis=1)
      >> array([1.70782513, 1.70782513, 1.70782513, 1.70782513, 1.70782513,
             1.70782513])
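The weights parameter of average() is not exercised in the example above. The sketch below uses made-up scores and weights (both are assumptions for illustration) to show the difference from mean().

```python
import numpy as np

scores = np.array([80, 90, 100])   # hypothetical integer data
weights = np.array([1, 1, 2])      # hypothetical weights, one per element

print(scores.mean())                         # 90.0
print(scores.mean().dtype)                   # float64, even though scores is an integer array
print(np.average(scores, weights=weights))   # 92.5
print(np.isclose(np.std(scores) ** 2, np.var(scores)))  # True
```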
  • Three transpose operations: T, transpose, swapaxes

    • T is the ordinary transpose; it reverses the order of the axes
    • transpose reorders the axes according to a given permutation
    • swapaxes exchanges two axes of the array
    arr = numpy.arange(24).reshape((2, 3, 4))
    arr
    >> array([[[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11]],
    
           [[12, 13, 14, 15],
            [16, 17, 18, 19],
            [20, 21, 22, 23]]])
    ## Transpose
    arr.T
    >>
    array([[[ 0, 12],
            [ 4, 16],
            [ 8, 20]],
    
           [[ 1, 13],
            [ 5, 17],
            [ 9, 21]],
    
           [[ 2, 14],
            [ 6, 18],
            [10, 22]],
    
           [[ 3, 15],
            [ 7, 19],
            [11, 23]]])
    
    ## Swap axis 1 and axis 0 and leave axis 2 in place; the original order is (0,1,2)
    arr.transpose(1,0,2)
    >> array([[[ 0,  1,  2,  3],
            [12, 13, 14, 15]],
    
           [[ 4,  5,  6,  7],
            [16, 17, 18, 19]],
    
           [[ 8,  9, 10, 11],
            [20, 21, 22, 23]]])
    
    ## Swap axis 1 and axis 0
    arr.swapaxes(1, 0)
    >> array([[[ 0,  1,  2,  3],
            [12, 13, 14, 15]],
    
           [[ 4,  5,  6,  7],
            [16, 17, 18, 19]],
    
           [[ 8,  9, 10, 11],
            [20, 21, 22, 23]]])
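The relationship between the three operations can be summarized by comparing shapes and results on the same array. A minimal sketch:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

print(arr.T.shape)                    # (4, 3, 2): all axes reversed
print(arr.transpose(1, 0, 2).shape)   # (3, 2, 4): axes 0 and 1 exchanged

# T is the same as transposing with the reversed axis order.
print(np.array_equal(arr.T, arr.transpose(2, 1, 0)))               # True
# swapaxes(1, 0) matches transpose(1, 0, 2) for a 3-D array.
print(np.array_equal(arr.swapaxes(1, 0), arr.transpose(1, 0, 2)))  # True
```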
  • Extreme values and sorting

    • max() and min() find the maximum and the minimum

    • ptp() computes the difference between the maximum and the minimum

    • argmax() and argmin() find the indices of the maximum and the minimum. Without an axis parameter, they return the index into the flattened array

    • The array's sort() method sorts the array in place, so it changes the contents of the array. The sort() function returns a new array without changing the original. The axis parameter of both defaults to -1, i.e. they sort along the last axis of the array.
      The axis parameter of the sort() function may also be set to None, in which case the result is the flattened array in sorted order.

    • argsort() returns the indices that would sort the array; its axis parameter defaults to -1

    • median() returns the median of the array, i.e. the value in the middle position after sorting; when the length is even, it is the average of the two middle values. It also accepts axis and out parameters

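    • eg:
      >>> a2 = np.floor(10*np.random.random((2,2)))
      >>> a2 
      array([[ 1., 1.],
             [ 5., 8.]])
      >>> np.min(a2) # minimum
      1.0
      >>> np.max(a2) # maximum
      8.0
      >>> np.ptp(a2) # difference between the maximum and the minimum
      7.0
      ## a below is a 4x5 integer array whose definition is not shown
      >>> np.argmax(a) # index of the maximum in the flattened a; with several maxima, the index of the first one
      2
      >>> idx = np.argmax(a, axis=1)
      >>> idx
      array([2, 3, 0, 0])
      ## Use range() to select the maximum of each row
      >>> a[range(a.shape[0]),idx]
      array([9, 8, 9, 9])
      
      >>> np.sort(a, axis=0) # sort the data in each column
      array([[5, 1, 1, 4, 0],
             [7, 1, 3, 6, 0],
             [9, 5, 9, 7, 2],
             [9, 8, 9, 8, 3]])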
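Since the array a used above is not defined in the text, here is a self-contained sketch of the same operations on a small made-up array. It shows the in-place sort() method next to the sort() function, plus argsort and median.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])   # hypothetical data for illustration

idx = np.argmax(a, axis=1)                # column index of each row's maximum
row_max = a[np.arange(a.shape[0]), idx]   # pick one element per row
print(idx, row_max)                       # [0 0] [3 9]

sorted_flat = np.sort(a, axis=None)       # new array, flattened and sorted
print(sorted_flat)                        # [1 2 3 7 8 9]
print(a[0])                               # [3 1 2], a itself is unchanged

a.sort()                                  # in place, along the last axis
print(a[0])                               # [1 2 3]
print(np.median(a))                       # 5.0, the average of 3 and 7
print(np.argsort([30, 10, 20]))           # [1 2 0]
```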
  • Repeating an array along axes: tile

    eg:
    a = numpy.arange(0, 40, 10)
    a
    >> array([ 0, 10, 20, 30])
    
    ## Repeat the array as a block, 3 times along the rows and 5 times along the columns
    b = numpy.tile(a, (3, 5)) 
    b
    >> array([[ 0, 10, 20, 30,  0, 10, 20, 30,  0, 10, 20, 30,  0, 10, 20, 30,
             0, 10, 20, 30],
           [ 0, 10, 20, 30,  0, 10, 20, 30,  0, 10, 20, 30,  0, 10, 20, 30,
             0, 10, 20, 30],
           [ 0, 10, 20, 30,  0, 10, 20, 30,  0, 10, 20, 30,  0, 10, 20, 30,
             0, 10, 20, 30]])
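Because the 4-element array is repeated as a whole block, the result above has 3 rows and 5 * 4 = 20 columns. A short check:

```python
import numpy as np

a = np.arange(0, 40, 10)
b = np.tile(a, (3, 5))

print(b.shape)         # (3, 20)
print(np.tile(a, 2))   # [ 0 10 20 30  0 10 20 30]
```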
Function Module
  • The numpy.linalg module contains linear algebra functions. With it you can compute inverse matrices, eigenvalues and determinants, solve systems of linear equations, and so on.

    • Matrix inversion

      The inv function of the numpy.linalg module computes the inverse of a matrix. Multiplying the original matrix by the computed inverse should give the identity matrix, which is a way to check the result.

      eg:
      A = np.mat("0 1 2;1 0 3;4 -3 8") # create the matrix with the mat function
      ## Invert
      inverse = np.linalg.inv(A)
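The check described above, multiplying the matrix by its inverse, might look like the following. This sketch uses a plain ndarray instead of np.mat, which is a substitution made here for illustration and not the form used in the example.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [1, 0, 3],
              [4, -3, 8]], dtype=float)

inverse = np.linalg.inv(A)

# Up to floating-point rounding, A times its inverse is the identity matrix.
print(np.allclose(A @ inverse, np.eye(3)))   # True
```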
    • The solve function of numpy.linalg solves linear systems of the form Ax = b, where A is a matrix, b is a one- or two-dimensional array, and x is the unknown
      eg:
      A = np.mat("0 1 2;1 0 3;4 -3 8") # create the matrix with the mat function
      b = np.array([0, 8, -9])
      ## Solve
      x = np.linalg.solve(A, b)
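A solution returned by solve can be verified by substituting it back into Ax = b. A minimal sketch with the same numbers, again using a plain ndarray rather than np.mat:

```python
import numpy as np

A = np.array([[0, 1, 2],
              [1, 0, 3],
              [4, -3, 8]], dtype=float)
b = np.array([0, 8, -9], dtype=float)

x = np.linalg.solve(A, b)

print(x.shape)                  # (3,)
print(np.allclose(A @ x, b))    # True: x satisfies the system
```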
    • Eigenvalues and eigenvectors: eigvals and eig

      An eigenvalue is a root of the equation Ax = ax and is a scalar, where A is a two-dimensional matrix and x is a one-dimensional vector. An eigenvector is the vector x that corresponds to an eigenvalue. In the numpy.linalg module, the eigvals function computes the eigenvalues of a matrix, and the eig function returns a tuple containing the eigenvalues and the corresponding eigenvectors.
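The defining relation Ax = ax can be checked numerically for each eigenpair returned by eig. The sketch below uses a small diagonal matrix chosen for illustration; eig returns the eigenvectors as the columns of its second result.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # hypothetical matrix with eigenvalues 2 and 3

values = np.linalg.eigvals(A)   # eigenvalues only
w, v = np.linalg.eig(A)         # eigenvalues w, eigenvectors in the columns of v

# Each column v[:, i] satisfies A v = w[i] v.
checks = [np.allclose(A @ v[:, i], w[i] * v[:, i]) for i in range(len(w))]
print(checks)                                        # [True, True]
print(np.allclose(np.sort(values), [2.0, 3.0]))      # True
```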

    • Singular value decomposition
      The svd function of the numpy.linalg module computes the singular value decomposition of a matrix. It returns three results, U, Sigma and V, where U and V are orthogonal matrices and Sigma contains the singular values of the input matrix.
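In practice, numpy.linalg.svd returns the singular values as a one-dimensional array and the third factor already transposed, so the input is recovered as U times diag(Sigma) times that factor. A minimal sketch on a made-up 2x3 matrix, assuming full_matrices=False to get the compact form:

```python
import numpy as np

M = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])   # hypothetical input matrix

U, sigma, Vh = np.linalg.svd(M, full_matrices=False)

print(U.shape, sigma.shape, Vh.shape)             # (2, 2) (2,) (2, 3)
print(np.allclose(U @ np.diag(sigma) @ Vh, M))    # True: the factors rebuild M
print(np.allclose(U.T @ U, np.eye(2)))            # True: U has orthonormal columns
```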
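Summary
  • Attribute information

    X.flags    # information about how the array is stored
    
    X.shape    # a tuple giving the number of rows, columns, ... of the array
    
    X.ndim   # the number of dimensions of the array, a single number
    
    X.size    # the number of elements in the array
    
    X.itemsize    # the memory size in bytes of one element of the array
    
    X.dtype    # the data type
    
    X.T   # if X is a matrix, returns the transpose of X
    
    X.trace()    # computes the trace of X
    
    np.linalg.det(a)   # returns the determinant of matrix a
    
    np.linalg.norm(a,ord=None)    # computes the norm of matrix a
    
    np.linalg.eig(a)    # eigenvalues and eigenvectors of matrix a
    
    np.linalg.cond(a,p=None)    # condition number of matrix a
    
    np.linalg.inv(a)    # inverse of matrix a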
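  • Indexing

    x=np.arange(10)
    
    print(x[2])    # a single element, indexed from the front; subscripts start at 0
    
    print(x[-2])    # indexed from the back; the last element has subscript -1
    
    print(x[2:5])    # several elements; the start is included, the stop is excluded, the default step is 1
    
    print(x[:-7])    # several elements; the stop position is counted from the back, with the default step
    
    print(x[1:7:2])   # with an explicit step
    
    x.shape=(2,5)    # assigns a new shape to x; the number of elements must stay the same, 2*5=10
    
    print(x[1,3])    # a single element of a two-dimensional array: row 2, column 4
    
    print(x[0])   # all elements of the first row
    
    y=np.arange(35).reshape(5,7)    # reshape() changes the dimensions of the array
    
    print(y[1:5:2,::2])    # selects part of a two-dimensional array: rows 1 and 3, every second column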
    


Origin www.cnblogs.com/cecilia-2019/p/11368223.html