Networking
urllib
Part of the standard library in Python 3.
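As a rough illustration of the standard-library module, the sketch below builds a query string and a request object without touching the network. The endpoint, query parameters, and User-Agent value are made up for the example.

```python
from urllib.parse import urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint and query parameters, used only for illustration.
base_url = "http://example.com/search"
query = urlencode({"q": "python crawler", "page": 2})
full_url = base_url + "?" + query

# Build the request object; nothing is sent until urlopen() is called.
request = Request(full_url, headers={"User-Agent": "demo-crawler/0.1"})

parts = urlparse(request.full_url)
print(parts.netloc)  # example.com
print(parts.query)   # q=python+crawler&page=2
```

Passing `request` to `urllib.request.urlopen()` would actually fetch the page.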
beautifulsoup4
Used to parse the HTML content that has been fetched.
http.cookiejar
Cookie handling. The module is named http.cookiejar in Python 3; in Python 2 it was called cookielib.
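A minimal sketch of how a cookie jar is usually wired into urllib, assuming Python 3. No request is sent here, and the URL in the comments is hypothetical.

```python
import http.cookiejar
import urllib.request

# The jar collects cookies from responses and replays them on later requests.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)

# No request has been made yet, so the jar is still empty.
print(len(cookie_jar))  # 0

# Calling opener.open(url) would send the request and fill the jar, e.g.:
# response = opener.open("http://example.com/")  # hypothetical URL
# for cookie in cookie_jar:
#     print(cookie.name, cookie.value)
```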
PySpider
A crawler framework built around these main requirements:
- Crawl, update, and schedule specific pages across multiple sites
- Extract structured information from pages
- Be flexible and scalable, stable, and easy to monitor
requests
An HTTP library for sending requests.
re
Regular expressions.
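In a crawler, regular expressions are often used for quick extraction from page text. The sketch below pulls links out of a made-up HTML fragment; for real pages a proper HTML parser is usually more robust.

```python
import re

# A small, made-up HTML fragment standing in for a downloaded page.
html = '<a href="/page/1">First</a> <a href="/page/2">Second</a>'

# Capture the href value and the link text of every anchor tag.
link_pattern = re.compile(r'<a\s+href="([^"]+)">([^<]*)</a>')
links = link_pattern.findall(html)
print(links)  # [('/page/1', 'First'), ('/page/2', 'Second')]
```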
chardet
A module that detects the character encoding of text, such as utf-8.
Graphics
PIL
PIL (Python Imaging Library) provides general image processing functions and many useful basic image operations, such as scaling, cropping, rotation, and color conversion.
It can be downloaded from http://www.pythonware.com/products/pil/.
opencv
Image processing and computer vision.
matplotlib
A 2D drawing library that produces publication-quality charts.
http://matplotlib.org/
scikit-image
A collection of image processing algorithms that makes it easy to filter a picture, which is well suited to image preprocessing.
pip install scikit-image --upgrade
Machine learning
sklearn
Sklearn (scikit-learn) is a machine learning library built on numpy and scipy. Its design is elegant: every algorithm is exposed through the same interface, so different algorithms can be called in the same way.
Data
json
Module for parsing and serializing JSON.
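A short round trip shows both directions, serializing a Python object to a JSON string and parsing it back. The record contents are made up for the example.

```python
import json

record = {"name": "numpy", "tags": ["math", "arrays"], "stable": True}

# Serialize to a JSON string; sort_keys makes the output order predictable.
text = json.dumps(record, sort_keys=True)
print(text)  # {"name": "numpy", "stable": true, "tags": ["math", "arrays"]}

# Parse the string back into Python objects.
restored = json.loads(text)
print(restored == record)  # True
```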
numpy
A very general mathematical calculation library, often used in machine learning.
http://www.numpy.org/
loadtxt
Loads the contents of a txt or csv file; mostly used to import tabular data exported from Excel.
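As a small sketch, the snippet below reads comma-separated data with loadtxt. An in-memory string stands in for a file on disk, and the column names and values are invented for the example.

```python
import io
import numpy as np

# Made-up CSV content; in practice this would be a file path.
csv_text = "height,weight\n1.70,65\n1.82,80\n1.65,58\n"

# loadtxt accepts a file path or any file-like object.
# skiprows=1 skips the header line, delimiter="," splits on commas.
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(table.shape)   # (3, 2)
print(table[:, 1])   # the weight column
```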
dot()
Returns the dot product of two arrays.
# For one-dimensional arrays, the result is the inner product of the two arrays
In: d = np.arange(0,9)
Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In : e = d[::-1]
Out: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
In : np.dot(d,e)
Out: 84
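# For two-dimensional arrays (matrices), the result is the matrix product. Each element of the result is obtained by taking the row of the first matrix with the same row index and the column of the second matrix with the same column index, multiplying their elements pairwise, and summing the products.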
In : a = np.arange(1,5).reshape(2,2)
Out:
array([[1, 2],
       [3, 4]])
In : b = np.arange(5,9).reshape(2,2)
Out:
array([[5, 6],
       [7, 8]])
In : np.dot(a,b)
Out:
array([[19, 22],
       [43, 50]])
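To make the row-times-column rule concrete, here is a rough sketch that computes the same product with explicit loops and compares it with np.dot. The helper name matmul_by_hand is hypothetical and only meant for illustration; in real code np.dot is the right tool.

```python
import numpy as np

def matmul_by_hand(a, b):
    """Matrix product computed element by element, for illustration only."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    assert inner == inner_b, "inner dimensions must match"
    result = np.zeros((rows, cols), dtype=a.dtype)
    for i in range(rows):
        for j in range(cols):
            # Row i of a and column j of b, multiplied pairwise and summed.
            result[i, j] = sum(a[i, k] * b[k, j] for k in range(inner))
    return result

a = np.arange(1, 5).reshape(2, 2)
b = np.arange(5, 9).reshape(2, 2)
print(matmul_by_hand(a, b))
print(np.array_equal(matmul_by_hand(a, b), np.dot(a, b)))  # True
```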
Matrix operations
1. Functions for creating a numpy matrix:
- numpy.array()
Create an array from a list or tuple.
- numpy.zeros()
Generate an array of all zeros.
- numpy.ones()
Generate an array of all ones.
- numpy.eye()
Generate an identity matrix (ones on the diagonal, zeros elsewhere).
2. Functions for building a numpy matrix by combining or reshaping existing arrays:
- numpy.array()
- numpy.row_stack() (an alias of numpy.vstack())
- numpy.column_stack()
- numpy.reshape()
>>> import numpy
>>> numpy.eye(3)
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> numpy.zeros(3)
array([ 0., 0., 0.])
>>> numpy.ones(3)
array([ 1., 1., 1.])
>>> x1 = numpy.array((1, 2, 3))
>>> x1
array([1, 2, 3])
>>> x2 = numpy.array([4, 5, 6])
>>> x2
array([4, 5, 6])
>>> x3 = numpy.array((x1, x2))
>>> x3
array([[1, 2, 3],
[4, 5, 6]])
>>> x4 = x3.reshape(2, 3)
>>> x4
array([[1, 2, 3],
[4, 5, 6]])
>>> x4 = x3.reshape(3, 2)
>>> x4
array([[1, 2],
[3, 4],
[5, 6]])
>>> x5 = numpy.row_stack((x1, x2))
>>> x5
array([[1, 2, 3],
[4, 5, 6]])
>>> x6 = numpy.row_stack([x1, x2])
>>> x6
array([[1, 2, 3],
[4, 5, 6]])
>>> x7 = numpy.row_stack((x6, x2))
>>> x7
array([[1, 2, 3],
[4, 5, 6],
[4, 5, 6]])
>>> x7[0]
array([1, 2, 3])
>>> x7[1]
array([4, 5, 6])
>>> x7[2]
array([4, 5, 6])
>>> x8 = numpy.column_stack([x1, x2, x1, x2])
>>> x8
array([[1, 4, 1, 4],
[2, 5, 2, 5],
[3, 6, 3, 6]])
>>> x8[0]
array([1, 4, 1, 4])
>>> x8[1]
array([2, 5, 2, 5])
>>> x8[2]
array([3, 6, 3, 6])
>>> x8[0][3]
4
pandas
Python data analysis library that provides DataFrames and other data structures. http://pandas.pydata.org/
Learning materials: http://pandas.pydata.org/pandas-docs/stable/10min.html
scikit-learn
A general-purpose machine learning library for data analysis and data mining. It covers many algorithms, including k-nearest neighbors.
http://scikit-learn.org/stable/
scipy
Learning materials: http://www.scipy-lectures.org/
Theano
Lets you efficiently define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays.
Multimedia
pdfkit
A module that saves HTML web pages as PDF files.
librosa
A third-party Python library for audio feature extraction. It offers many ways to extract audio features.
nltk
Provides access to a large number of corpora and makes many natural language processing tasks easy, including word segmentation (tokenization), part-of-speech tagging, named entity recognition (NER), and syntactic analysis.