Python Libraries for Machine Learning and Their Common Features


1 NumPy

  NumPy provides the linear algebra operations that are used throughout machine learning. It supports high-dimensional arrays, matrices, and other data operations, and offers a large number of related functions. Its internal operations are implemented in C, so they run very fast. To use it, first import the NumPy library; by convention it is imported under the alias np:

    import numpy as np

  NumPy has two important data types: the array and the matrix. First, create a two-dimensional array (numpy.ndarray) and print its element type, shape, and size:

    array=np.arange(6).reshape(2,3)
    print(array)
    print(array.dtype)
    print(array.shape)
    print(array.size)

  NumPy arrays support operations on whole batches of data. In the following arithmetic between an array and a scalar, the operation is applied to each element of the array:

    array1=array+2
    array2=array**2
    print(array1)
    print(array2)
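
  Arithmetic between two arrays of the same shape follows the same element-by-element rule. The sketch below is only an illustration; the names total, product, and looped are hypothetical and do not come from the original text:

```python
import numpy as np

array = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Arithmetic between two arrays of the same shape is element by element.
total = array + array
product = array * array              # element-wise product, not a matrix product

# The vectorized result matches an explicit Python loop over the elements.
looped = np.array([[x * x for x in row] for row in array])

print(total)
print(product)
print(np.array_equal(product, looped))
```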

  NumPy arrays can also be indexed and sliced. For a one-dimensional array, indexing and slicing work the same way as for a Python list. For a higher-dimensional array, you can access an element with one index per dimension, separated by commas:

    print(array[1,2])

  If you omit the trailing indices, the result is an array with one fewer dimension:

    print(array[1])
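
  The examples above cover indexing only. A minimal sketch of slicing, assuming the same 2x3 array as before (the variable names are chosen here for illustration):

```python
import numpy as np

array = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
vector = np.arange(5)                # [0, 1, 2, 3, 4]

middle = vector[1:4]                 # same slice syntax as a Python list
second_column = array[:, 1]          # every row, column 1: one-dimensional
row_tail = array[0, 1:]              # first row, from the second element on
left_block = array[:, :2]            # first two columns, still two-dimensional

print(middle)
print(second_column)
print(row_tail)
print(left_block)
```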

  A two-dimensional array can be treated as a matrix. Create a two-dimensional array:

    arr=np.array([[2,4,1],[1,2,4],[2,1,1]])

  Multiply the two-dimensional array by its transpose (matrix product):

    arr1=arr.T
    print(np.dot(arr,arr1))
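
  It is easy to confuse the matrix product with element-wise multiplication. As a small sketch: for two-dimensional arrays the @ operator gives the same result as np.dot, while * multiplies matching entries only. The names matrix_product and elementwise are illustrative:

```python
import numpy as np

arr = np.array([[2, 4, 1], [1, 2, 4], [2, 1, 1]])

matrix_product = arr @ arr.T      # same as np.dot(arr, arr.T) for 2-D arrays
elementwise = arr * arr.T         # multiplies matching entries only

print(matrix_product)
print(elementwise)
```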

  numpy.linalg contains common linear algebra functions, such as computing the determinant, finding the inverse of a matrix, and computing the singular value decomposition (SVD). For example:

    from numpy.linalg import inv
    a=np.array([[3,1],[2,4]])
    b=inv(a)
    print(a)
    print(b)
    print(np.dot(a,b))
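
  The determinant and the SVD can be tried on the same matrix. This is a rough sketch using np.linalg.det and np.linalg.svd; the reconstruction step is added here only to check the decomposition:

```python
import numpy as np

a = np.array([[3, 1], [2, 4]])

determinant = np.linalg.det(a)       # 3*4 - 1*2, up to floating-point error
u, s, vt = np.linalg.svd(a)          # a == u @ diag(s) @ vt

# Rebuild the matrix from its factors to confirm the decomposition.
reconstructed = u @ np.diag(s) @ vt

print(determinant)
print(s)
print(np.allclose(reconstructed, a))
```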

2 Pandas

  Pandas is built on NumPy and makes it quick and easy to work with large data sets. It supports relational operations on data sets and handles time series and missing data flexibly.
  First import the Pandas library; by convention it is imported under the alias pd:

    import pandas as pd

  Pandas has two important data structures: Series and DataFrame.
  A Series consists of a set of data together with an index for that data. The following creates a Series from a list of values:

    series=pd.Series([4,1,2,4])
    print(series)

  The printed Series shows both the data and the index. Because no index was specified at creation, a default integer index starting from 0 was generated automatically. You can pass the index parameter to Series to set the index at creation, or change the index afterwards as needed:

    myindex=['a','b','c','d']
    series.index=myindex
    print(series)
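
  Setting the index at creation time might look like the following sketch, which uses the same values and labels as above:

```python
import pandas as pd

# Pass the labels through the index parameter instead of assigning them later.
series = pd.Series([4, 1, 2, 4], index=['a', 'b', 'c', 'd'])

print(series)
print(series.index.tolist())
```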

  Indexing, filtering, and sorting the data:

    print(series['a'])
    print(series[series>1])
    print(series.sort_values())

  NumPy functions can also be applied to the whole Series at once:

    print(np.exp(series))

  Besides Series, Pandas also provides DataFrame, a table-like data structure. First create a DataFrame from a dictionary:

    df=pd.DataFrame({'name':['vivi','cici','gigi','cici'],'age':[17,19,18,19],'height':[1.6,1.7,1.8,1.7]})
    print(df)

  A DataFrame can be seen as a collection of Series columns that share one index. Use a column name to get a single Series:

    print(df['age'])
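
  The shared index can be checked directly. In this sketch, the names age and subset are illustrative; selecting with a list of column names returns a DataFrame rather than a Series:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['vivi', 'cici', 'gigi', 'cici'],
    'age': [17, 19, 18, 19],
    'height': [1.6, 1.7, 1.8, 1.7],
})

age = df['age']                      # a single column is a Series
subset = df[['name', 'age']]         # a list of names gives a DataFrame

print(type(age).__name__)
print(age.index.equals(df.index))    # the column shares the DataFrame's index
print(subset)
```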

  Pandas can be used for data cleaning. The pd.read_csv() function reads a table saved as a .csv file; you can specify parameters such as the file path and the file encoding:

    df=pd.read_csv(r"test.csv",encoding="utf-8")

  After reading the data, use the head() function to view the first few rows:

    print(df.head())
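
  The contents of test.csv are not shown here, so the following sketch reads hypothetical CSV text from memory instead of from disk. The column names id, name, age, and height are assumptions based on the columns used elsewhere in this section:

```python
import io
import pandas as pd

# Hypothetical file contents; a real test.csv may look different.
csv_text = (
    "id,name,age,height\n"
    "1,vivi,17,1.6\n"
    "2,cici,19,1.7\n"
    "3,gigi,18,1.8\n"
    "4,cici,19,1.7\n"
)

# read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))     # head() takes the number of rows to show
print(df.dtypes)
```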

  Use the drop_duplicates() function to remove duplicate samples:

    # Remove samples with the same name, keep the last one, and update the DataFrame in place
    df.drop_duplicates(subset=["name"],inplace=True,keep="last")
    print(df)

  Use the drop() function to remove a specified column:

    # Remove the id column and update the DataFrame in place
    df.drop(columns=['id'],inplace=True)
    print(df)

  You can also filter the data as needed:

    # Select the rows whose name field starts with 'ci'
    sub_df1=df[df['name'].str.startswith('ci')]
    # Select the rows where age is 18 and height is 1.7
    sub_df2=df[(df['age']==18)&(df['height']==1.7)]
    print(sub_df1)
    print(sub_df2)
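
  The cleaning steps above can also be chained without inplace=True, which leaves the original DataFrame untouched. This is a minimal sketch on hypothetical data standing in for test.csv. It compares the float column with np.isclose, a swapped-in alternative to ==, and it filters on age 19 rather than 18 so that the sample data returns a row:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the contents of test.csv.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['vivi', 'cici', 'gigi', 'cici'],
    'age': [17, 19, 18, 19],
    'height': [1.6, 1.7, 1.8, 1.7],
})

# Each call returns a new DataFrame, so df itself is not modified.
cleaned = df.drop_duplicates(subset=['name'], keep='last').drop(columns=['id'])

starts_with_ci = cleaned[cleaned['name'].str.startswith('ci')]

# np.isclose tolerates floating-point rounding when comparing heights.
age_19_height_1_7 = cleaned[(cleaned['age'] == 19) & np.isclose(cleaned['height'], 1.7)]

print(cleaned)
print(starts_with_ci)
print(age_19_height_1_7)
```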

3 Matplotlib

  matplotlib.pyplot is a collection of command-style functions that make Matplotlib work like MATLAB.
  First import matplotlib.pyplot; by convention it is imported under the alias plt:

    import matplotlib.pyplot as plt

  Draw a simple plot:

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0,2*np.pi,100)
    y = np.sin(x)
    plt.plot(x,y)
    plt.show()

Plotting can be summed up in five steps:

  • Create a figure
  • Create one or more plotting areas in the figure (also called subplots or axes)
  • Draw markers, lines, and other elements in the plotting area
  • Add decorations to the plot, such as labels or axis lines
  • Apply any other customization
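
  The five steps can be sketched with Matplotlib's object-oriented interface. The labels, title, and figure size below are illustrative choices, and the Agg backend is an assumption made so that the sketch runs without a display; remove that line to open an interactive window with plt.show():

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure(figsize=(6, 4))           # 1. create the figure
ax = fig.add_subplot(1, 1, 1)              # 2. create a plotting area (axes)
ax.plot(x, np.sin(x), label="sin(x)")      # 3. draw lines in the plotting area
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")                         # 4. add decorations
ax.set_ylabel("y")
ax.set_title("Sine and cosine")
ax.legend()
ax.grid(True)                              # 5. any other customization
```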

For details, refer to the documentation: Python - Matplotlib (basic usage)

Origin blog.csdn.net/weixin_33728268/article/details/91010203