Explanation and application of sklearn's pipeline.Pipeline and preprocessing.PolynomialFeatures

pipeline.Pipeline
recommended blog link: https://blog.csdn.net/lanchunhui/article/details/50521648
1. Using Pipeline in sklearn

(1) Introduction

When we apply various preprocessing operations to the training set (feature normalization, principal component analysis, etc.),
we also need to apply the same fitted parameters to the test set.

Pipeline packages all of these steps into a single workflow, so that the parameters can easily be reused on a new data set.

Pipeline can be used for the following:

Modular feature transforms: only a small amount of new code is needed to apply updated feature transforms to the training set.

Automated grid search: once the candidate models and parameters are set in advance, the best model and parameters can be searched for and recorded automatically (a sketch follows the example below).

Automated ensemble generation: periodically use the existing best K models to build an ensemble.

(2) Examples:

Note that every intermediate step of the pipeline must be a transformer, i.e. it must implement both fit and transform methods (or fit_transform).

The final step is an estimator: the model in the last step only needs a fit method and does not need a transform method.
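As a minimal sketch of that contract (the ColumnDoubler class and its doubling behavior are hypothetical, purely for illustration), any object exposing fit and transform can serve as an intermediate step:

from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

class ColumnDoubler(BaseEstimator, TransformerMixin):
    # hypothetical transformer: multiplies every feature by 2
    def fit(self, X, y=None):
        # nothing to learn here, but fit must still return self
        return self

    def transform(self, X):
        return np.asarray(X) * 2

Such an object could be placed at any position in a Pipeline except the last.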

Then train on the training set with Pipeline.fit: pipe_lr.fit(X_train, y_train);
then predict and score the test set directly with Pipeline.score: pipe_lr.score(X_test, y_test).

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris 
 
# load the iris dataset
iris = load_iris()
X_data = iris.data
y_data = iris.target
 
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data,
                                                    test_size=0.25, random_state=1)
 
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
 
# build the pipeline: standardize -> PCA (2 components) -> logistic regression
pipe_lr = Pipeline([('sc', StandardScaler()),
                    ('pca', PCA(n_components=2)),
                    ('clf', LogisticRegression(random_state=1))
                    ])
pipe_lr.fit(X_train, y_train)
print('Test accuracy: %.3f' % pipe_lr.score(X_test, y_test))
 
Test accuracy: 0.842
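
As a follow-up to the automated grid-search use mentioned above, here is a minimal sketch reusing the pipe_lr just built; the '<step name>__<parameter>' naming convention is sklearn's, while the candidate values for pca__n_components and clf__C are illustrative assumptions:

from sklearn.model_selection import GridSearchCV

# parameters of pipeline steps are addressed as '<step name>__<parameter>'
param_grid = {'pca__n_components': [2, 3],
              'clf__C': [0.1, 1.0, 10.0]}   # illustrative candidate values

grid = GridSearchCV(pipe_lr, param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
print('Best CV accuracy: %.3f' % grid.best_score_)

Because the whole pipeline is refit inside each cross-validation fold, the scaler and PCA are never fit on validation data, which is the main point of combining Pipeline with grid search.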

2. Constructing features with preprocessing.PolynomialFeatures
sklearn.preprocessing.PolynomialFeatures is used to construct polynomial features.

It builds polynomial combinations of the input features: given two features a and b, the degree-2 polynomial features are (1, a, b, a^2, ab, b^2).

PolynomialFeatures has three parameters:

degree: controls the degree of the polynomial.

interaction_only: defaults to False. If set to True, features combining a variable with itself are excluded, so the quadratic terms a^2 and b^2 above would not appear.

include_bias: defaults to True. If True, the output includes the bias column of ones (the leading 1 above).

from sklearn.preprocessing import PolynomialFeatures

# single feature, degree 2: columns are 1, x0, x0^2
X_train = [[1], [2], [3], [4]]
quadratic_featurizer_2 = PolynomialFeatures(degree=2)
X_train_quadratic_2 = quadratic_featurizer_2.fit_transform(X_train)
print("feature names")
# note: in sklearn >= 1.0, use get_feature_names_out() instead
print(quadratic_featurizer_2.get_feature_names())
print(X_train_quadratic_2)

# single feature, degree 3: adds the x0^3 column
quadratic_featurizer_3 = PolynomialFeatures(degree=3)
X_train_quadratic_3 = quadratic_featurizer_3.fit_transform(X_train)
print("feature names")
print(quadratic_featurizer_3.get_feature_names())
print(X_train_quadratic_3)

# two features, degree 2: columns are 1, x0, x1, x0^2, x0*x1, x1^2
X_train = [[1, 3], [2, 6], [3, 7], [4, 8]]
quadratic_featurizer_2 = PolynomialFeatures(degree=2)
X_train_quadratic_2 = quadratic_featurizer_2.fit_transform(X_train)
print("feature names")
print(quadratic_featurizer_2.get_feature_names())
print(X_train_quadratic_2)

# two features, degree 3
quadratic_featurizer_3 = PolynomialFeatures(degree=3)
X_train_quadratic_3 = quadratic_featurizer_3.fit_transform(X_train)
print("feature names")
print(quadratic_featurizer_3.get_feature_names())
print(X_train_quadratic_3)

Output

feature names
['1', 'x0', 'x0^2']
[[  1.   1.   1.]
 [  1.   2.   4.]
 [  1.   3.   9.]
 [  1.   4.  16.]]
feature names
['1', 'x0', 'x0^2', 'x0^3']
[[  1.   1.   1.   1.]
 [  1.   2.   4.   8.]
 [  1.   3.   9.  27.]
 [  1.   4.  16.  64.]]
feature names
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
[[  1.   1.   3.   1.   3.   9.]
 [  1.   2.   6.   4.  12.  36.]
 [  1.   3.   7.   9.  21.  49.]
 [  1.   4.   8.  16.  32.  64.]]
feature names
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2', 'x0^3', 'x0^2 x1', 'x0 x1^2', 'x1^3']
[[   1.    1.    3.    1.    3.    9.    1.    3.    9.   27.]
 [   1.    2.    6.    4.   12.   36.    8.   24.   72.  216.]
 [   1.    3.    7.    9.   21.   49.   27.   63.  147.  343.]
 [   1.    4.    8.   16.   32.   64.   64.  128.  256.  512.]]
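
The interaction_only and include_bias parameters described above are not exercised by this example; here is a quick sketch, assuming the same two-feature input:

from sklearn.preprocessing import PolynomialFeatures

X = [[1, 3], [2, 6], [3, 7], [4, 8]]

# interaction_only=True drops the pure powers x0^2 and x1^2
interactions = PolynomialFeatures(degree=2, interaction_only=True)
print(interactions.fit_transform(X))   # columns: 1, x0, x1, x0*x1

# include_bias=False drops the leading column of ones
no_bias = PolynomialFeatures(degree=2, include_bias=False)
print(no_bias.fit_transform(X))        # columns: x0, x1, x0^2, x0*x1, x1^2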

