A partial list of Python libraries

Networking

urllib

Part of the Python 3 standard library; the Python 2 modules urllib and urllib2 were merged into it.
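
A minimal sketch of the Python 3 layout, where URL handling lives in submodules such as urllib.parse and urllib.request. The URL, query parameters and User-Agent string are placeholders chosen for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint and parameters, used only for illustration.
base_url = "http://example.com/search"
query = urlencode({"q": "python", "page": 2})
full_url = base_url + "?" + query

# Build a request object with a custom header; nothing is sent yet.
req = Request(full_url, headers={"User-Agent": "demo-crawler/0.1"})

parts = urlsplit(req.full_url)
print(parts.netloc)  # example.com
print(parts.query)   # q=python&page=2
```

To actually fetch the page, the request object would be passed to urllib.request.urlopen().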

beautifulsoup4

Used to parse the fetched HTML content.

http.cookiejar

Named http.cookiejar in Python 3; in Python 2 the same module was called cookielib.
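
The rename can be handled with an import fallback, sketched below. The names jar and opener are illustrative choices, and no network request is made here.

```python
try:
    # Python 3 module names
    import http.cookiejar as cookiejar
    from urllib.request import build_opener, HTTPCookieProcessor
except ImportError:
    # Python 2 module names
    import cookielib as cookiejar
    from urllib2 import build_opener, HTTPCookieProcessor

# The jar collects cookies set by responses and replays them on later requests
# made through the opener.
jar = cookiejar.CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))

# No request has been made yet, so the jar is still empty.
print(len(jar))  # 0
```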

PySpider

A web crawler framework. Its main functional requirements are:

  • Crawl, update and schedule specific pages across multiple sites
  • Extract structured information from pages
  • Be flexible and scalable, stable and monitorable

requests

A library for sending HTTP requests.

re

Regular expressions.
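
A short sketch of how re might be used in a crawler to pull links out of a page. The HTML snippet and the pattern are made up for illustration; for anything beyond small fixed patterns, an HTML parser such as beautifulsoup4 is usually more robust.

```python
import re

# Hypothetical HTML snippet, used only for illustration.
html = '<a href="/page/1">First</a> <a href="/page/2">Second</a>'

# Capture the href value and the link text of every anchor tag.
pattern = re.compile(r'<a href="([^"]+)">([^<]+)</a>')
links = pattern.findall(html)
print(links)  # [('/page/1', 'First'), ('/page/2', 'Second')]
```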

chardet

A module for detecting the character encoding of data, such as utf-8.

Graphics

PIL

PIL (Python Imaging Library) is an image processing library that provides general image processing functions and a large number of useful basic image operations, such as scaling, cropping, rotation and color conversion.
It can be downloaded from http://www.pythonware.com/products/pil/. On Python 3, the maintained fork Pillow (pip install Pillow) is normally used instead.

opencv

Image processing and computer vision.

matplotlib

A 2D drawing library that produces publication-quality charts.

http://matplotlib.org/

scikit-image

Provides a set of image processing algorithms that make it easy to filter an image, which makes it well suited to image preprocessing.

pip install scikit-image --upgrade

Machine learning

sklearn

sklearn is a machine learning library built on numpy and scipy. It is elegantly designed: every algorithm is exposed through the same interface, so different algorithms are called in the same way.

Data

json

A module for decoding (parsing) and encoding (serializing) JSON.
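
A minimal round trip with the standard json module; the record contents are invented for illustration.

```python
import json

record = {"name": "numpy", "tags": ["math", "array"], "stars": None}

# Encode: serialize a Python object to a JSON string.
text = json.dumps(record, sort_keys=True)
print(text)  # {"name": "numpy", "stars": null, "tags": ["math", "array"]}

# Decode: parse the JSON string back into Python objects.
restored = json.loads(text)
print(restored == record)  # True
```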

numpy

A very general mathematical calculation library, often used in machine learning.
http://www.numpy.org/

loadtxt

Loads the text content of a txt or csv file; mostly used to import tabular data exported from Excel.
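
A small sketch of numpy.loadtxt on comma-separated data. The column names and values are invented, and an in-memory buffer stands in for a real file so the example is self-contained; a file path could be passed instead.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file on disk.
csv_text = io.StringIO("height,weight\n1.70,65\n1.82,80\n")

# skiprows=1 skips the header line; delimiter="," splits the columns.
table = np.loadtxt(csv_text, delimiter=",", skiprows=1)
print(table.shape)   # (2, 2)
print(table[1, 1])   # 80.0
```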

dot()

Returns the dot product of two arrays.

# For one-dimensional arrays, the result is the inner product of the two arrays
In: d = np.arange(0,9)
Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In : e = d[::-1]
Out: array([8, 7, 6, 5, 4, 3, 2, 1, 0])

In : np.dot(d,e) 
Out: 84
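# For two-dimensional arrays (matrices), the result is the matrix product: each element of the result is obtained by multiplying, pair by pair, the row of the first matrix that has the element's row index with the column of the second matrix that has the element's column index, then summing the products.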
In : a = np.arange(1,5).reshape(2,2)
Out:
array([[1, 2],
       [3, 4]])

In : b = np.arange(5,9).reshape(2,2)
Out: array([[5, 6],
            [7, 8]])

In : np.dot(a,b)
Out:
array([[19, 22],
       [43, 50]])
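
The row-by-column rule for the two-dimensional case can be checked with explicit loops. This is only a sketch to illustrate the definition, reusing the same a and b as the session above; it is not how np.dot is implemented.

```python
import numpy as np

a = np.arange(1, 5).reshape(2, 2)   # [[1, 2], [3, 4]]
b = np.arange(5, 9).reshape(2, 2)   # [[5, 6], [7, 8]]

# Element (i, j) of the product is the sum over k of a[i, k] * b[k, j].
manual = np.zeros((2, 2), dtype=a.dtype)
for i in range(2):
    for j in range(2):
        manual[i, j] = sum(a[i, k] * b[k, j] for k in range(2))

print(manual.tolist())                        # [[19, 22], [43, 50]]
print(np.array_equal(manual, np.dot(a, b)))   # True
```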

Matrix operations

1. Functions for creating numpy matrices:

  • numpy.array()

  • numpy.zeros()

    Generate a matrix array of all 0s.

  • numpy.ones()

    Generate a matrix array of all ones.

  • numpy.eye()

    Generate an identity matrix (ones on the diagonal, zeros elsewhere).

2. Functions for building numpy matrices from sequences:

  • numpy.array()
  • numpy.row_stack() (an alias of numpy.vstack(), deprecated in NumPy 2.0)
  • numpy.column_stack()
  • numpy.reshape()
>>> import numpy  
>>> numpy.eye(3)  
array([[ 1.,  0.,  0.],  
       [ 0.,  1.,  0.],  
       [ 0.,  0.,  1.]])  
>>> numpy.zeros(3)  
array([ 0.,  0.,  0.])  
>>> numpy.ones(3)  
array([ 1.,  1.,  1.])  
>>> x1 = numpy.array((1, 2, 3))  
>>> x1  
array([1, 2, 3])  
>>> x2 = numpy.array([4, 5, 6])  
>>> x2  
array([4, 5, 6])  
>>> x3 = numpy.array((x1, x2))  
>>> x3  
array([[1, 2, 3],  
       [4, 5, 6]])  
>>> x4 = x3.reshape(2, 3)  
>>> x4  
array([[1, 2, 3],  
       [4, 5, 6]])  
>>> x4 = x3.reshape(3, 2)  
>>> x4  
array([[1, 2],  
       [3, 4],  
       [5, 6]])  
>>> x5 = numpy.row_stack((x1, x2))  
>>> x5  
array([[1, 2, 3],  
       [4, 5, 6]])  
>>> x6 = numpy.row_stack([x1, x2])  
>>> x6  
array([[1, 2, 3],  
       [4, 5, 6]])  
>>> x7 = numpy.row_stack((x6, x2))  
>>> x7  
array([[1, 2, 3],  
       [4, 5, 6],  
       [4, 5, 6]])  
>>> x7[0]  
array([1, 2, 3])  
>>> x7[1]  
array([4, 5, 6])  
>>> x7[2]  
array([4, 5, 6])  
>>> x8 = numpy.column_stack([x1, x2, x1, x2])  
>>> x8  
array([[1, 4, 1, 4],  
       [2, 5, 2, 5],  
       [3, 6, 3, 6]])  
>>> x8[0]  
array([1, 4, 1, 4])  
>>> x8[1]  
array([2, 5, 2, 5])  
>>> x8[2]  
array([3, 6, 3, 6])  
>>> x8[0][3]  
4  
>>>

pandas

A Python data analysis library that provides DataFrames and other data structures. http://pandas.pydata.org/

Learning materials: http://pandas.pydata.org/pandas-docs/stable/10min.html

scikit-learn

A general-purpose machine learning library for data analysis and data mining (imported as sklearn); the algorithms it covers include k-nearest neighbors.

http://scikit-learn.org/stable/

scipy

Learning materials: http://www.scipy-lectures.org/

Theano

A library for efficiently defining, optimizing and evaluating mathematical expressions that involve multi-dimensional arrays.

Multimedia

pdfkit

A module for saving HTML web pages as PDF.

librosa

A third-party Python library for audio feature extraction. It offers many ways to extract audio features.

nltk

The module ships with a large number of corpora and makes it easy to carry out many natural language processing tasks, including word segmentation, part-of-speech tagging, named entity recognition (NER) and syntactic analysis.

Origin blog.csdn.net/alvinlyb/article/details/103797211