Original link: https://blog.csdn.net/weixin_39175124/article/details/79463993
Data preprocessing often involves standardization: mapping the existing data into a new space according to some relationship. The most common approach is to subtract the mean and divide by the standard deviation, which maps the data into a space where each feature has zero mean and unit variance. The scaler records the mean and standard deviation of each input feature, so the original data can be restored later.
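As a minimal sketch of this idea, the transform and its inverse can be written directly in NumPy (this reproduces what the scaler does internally under default settings):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Standardize each column: subtract its mean, divide by its standard deviation.
mean = x.mean(axis=0)   # per-feature mean
std = x.std(axis=0)     # per-feature (population) standard deviation
z = (x - mean) / std

print(z.mean(axis=0))   # ~[0, 0]
print(z.std(axis=0))    # ~[1, 1]

# The recorded mean and std let us restore the original data.
restored = z * std + mean
print(np.allclose(restored, x))  # True
```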
Many ML algorithms require their inputs to have zero mean and variance of the same order, for example: linear regression, RBF-kernel SVM, and models with L1 or L2 regularization.
sklearn.preprocessing.StandardScaler implements the above functionality conveniently.
It is used as follows. First, create a scaler object:

ss = sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)

The parameters copy, with_mean, and with_std all default to True.
copy: if False, the scaler tries to normalize in place, replacing the original values; however, if the input is not a NumPy array or a scipy.sparse CSR matrix, a copy may still be made and the original data will not be replaced.
with_mean: when dealing with sparse CSR or CSC matrices, this must be set to False; otherwise centering would densify the matrix and can exhaust memory.
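A short sketch of the sparse case: with with_mean=False the scaler only divides by the standard deviation, so the matrix stays sparse (passing a sparse matrix with with_mean=True raises an error instead).

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# A sparse CSR matrix: centering (with_mean=True) would densify it,
# so we scale by the standard deviation only.
x = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]))

ss = StandardScaler(with_mean=False)
y = ss.fit_transform(x)

print(sparse.issparse(y))   # True: the output stays sparse
print(ss.scale_)            # per-feature std used for scaling
```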
Queryable attributes:
scale_: the per-feature scaling factor, i.e. the standard deviation
mean_: the mean of each feature
var_: the variance of each feature
n_samples_seen_: the number of samples seen so far; can be increased via partial_fit
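To illustrate the last attribute, a small sketch: partial_fit updates the statistics incrementally batch by batch, and n_samples_seen_ grows accordingly.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()

# Feed the data in two batches; the statistics are updated incrementally.
ss.partial_fit(np.array([[1.0], [2.0]]))
print(ss.n_samples_seen_)   # 2

ss.partial_fit(np.array([[3.0], [4.0]]))
print(ss.n_samples_seen_)   # 4
print(ss.mean_)             # [2.5], same as fitting all 4 samples at once
```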
For example:
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# data = pd.read_csv("C:/学习/python/creditcard/creditcard.csv")
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape((3, 3))
ss = StandardScaler()
print(x)
ss.fit(X=x)
print(ss.n_samples_seen_)
print(ss.mean_)
print(ss.var_)
print(ss.scale_)
y = ss.fit_transform(x)
print(y)
z = ss.inverse_transform(y)
print(z)
```
The output shows n_samples_seen_ = 3, per-column means [4. 5. 6.], variances [6. 6. 6.], and scale_ ≈ [2.449 2.449 2.449] (√6); y has zero-mean, unit-variance columns, and z recovers the original x.
The following methods are available:
fit(X, y=None): computes the mean, standard deviation, and scaling factor of each feature of the input data, so that transform() can later be applied using these statistics.
X: the training set
y: ignored; accepted only for compatibility with Pipeline
fit_transform(X, y=None, **fit_params): fits the scaler on X (optionally adjusted by fit_params) and returns the transformed X, in which each feature has zero mean and unit variance.
X: the training set (array)
y: labels (ignored)
Returns the transformed X.
get_params(deep=True): returns the parameters the StandardScaler object was configured with.
inverse_transform(X, copy=None): as the name suggests, reverses the scaling to restore the original data.
transform(X, y='deprecated', copy=None): normalizes new data according to the rules already fitted on the existing object.
fit_transform() can be regarded as fit() followed by transform().
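A quick sketch confirming this equivalence on a small array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# fit_transform(x) gives the same result as fit(x) followed by transform(x).
a = StandardScaler().fit_transform(x)
b = StandardScaler().fit(x).transform(x)
print(np.allclose(a, b))  # True
```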