sklearn.preprocessing.StandardScaler data standardization

Original link: https://blog.csdn.net/weixin_39175124/article/details/79463993

When first processing data, standardization is usually involved: the existing data is mapped, through some relationship, into a new space. The most common form subtracts the mean and divides by the standard deviation, mapping the data into a standard space with mean 0 and standard deviation 1. The scaler records the mean and standard deviation of each input feature, so the original data can later be restored.
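As an illustration of the mapping described above, here is a minimal NumPy sketch of per-column standardization and restoration (the same computation StandardScaler performs, using the population standard deviation, ddof=0):

```python
import numpy as np

# Toy data: 3 samples, 2 features.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Standardize each column: subtract its mean, divide by its standard deviation.
mean = x.mean(axis=0)
std = x.std(axis=0)
z = (x - mean) / std

# Because mean and std were recorded, the original data can be restored.
restored = z * std + mean

print(z.mean(axis=0))            # each column now has mean 0
print(z.std(axis=0))             # each column now has std 1
print(np.allclose(restored, x))  # True
```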

Many machine learning algorithms require input features with zero mean and variance of the same order, for example: linear regression, RBF-kernel SVM, and L1/L2 regularization.

sklearn.preprocessing.StandardScaler implements this conveniently.

It is used as follows. First, create an object:

ss = sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)

The three parameters copy, with_mean, and with_std all default to True.

copy: if False, normalization is performed in place, replacing the original values; however, if the input is not a NumPy array or a scipy.sparse CSR matrix, a copy may still be made and the original data will not be replaced.

with_mean: when dealing with sparse CSR or CSC matrices this must be set to False, otherwise centering would densify the matrix and exhaust memory.
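A short sketch of the sparse case (assuming scipy is available): with with_mean=False, only the division by the standard deviation is applied, so the matrix stays sparse:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# A sparse CSR matrix: subtracting the mean would densify it,
# so with_mean must be False for sparse input.
x = sparse.csr_matrix(np.array([[0.0, 2.0],
                                [0.0, 4.0],
                                [6.0, 0.0]]))

ss = StandardScaler(with_mean=False)
y = ss.fit_transform(x)       # scales each column by its std only

print(ss.scale_)              # per-column standard deviations
print(sparse.issparse(y))     # True: the result is still sparse
```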

Queryable attributes:

scale_: the per-feature scaling factor, i.e. the standard deviation

mean_: the mean of each feature

var_: the variance of each feature

n_samples_seen_: the number of samples processed, which can be increased incrementally via partial_fit
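A small example of incremental fitting with partial_fit, showing n_samples_seen_ accumulating across batches:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()

# Fit incrementally on two batches; n_samples_seen_ accumulates across calls.
ss.partial_fit(np.array([[1.0], [2.0]]))
print(ss.n_samples_seen_)   # 2
ss.partial_fit(np.array([[3.0]]))
print(ss.n_samples_seen_)   # 3
print(ss.mean_)             # [2.]  -- running mean over all 3 samples
```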

for example:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# data = pd.read_csv("C:/学习/python/creditcard/creditcard.csv")
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape((3, 3))  # 3 samples, 3 features
ss = StandardScaler()
print(x)
ss.fit(X=x)                 # learn per-feature mean, variance, and scale
print(ss.n_samples_seen_)   # number of samples seen
print(ss.mean_)             # per-feature means
print(ss.var_)              # per-feature variances
print(ss.scale_)            # per-feature standard deviations
y = ss.fit_transform(x)     # refit and standardize in one step
print(y)
z = ss.inverse_transform(y) # undo the scaling to recover x
print(z)

Running this prints the original matrix, then n_samples_seen_ = 3, mean_ = [4. 5. 6.], var_ = [6. 6. 6.], scale_ ≈ [2.449 2.449 2.449], the standardized matrix (each column with mean 0 and variance 1), and finally the restored original matrix.

Callable methods:

fit(X, y=None): computes the mean, standard deviation, and scaling factor of each feature of the input data, so that transform() can be called afterwards on that basis.
X: training set
y: ignored; accepted only for compatibility with Pipeline

fit_transform(X, y=None, **fit_params): fits the scaler on X (adjusted by fit_params) and returns the transformed X, so that each feature has zero mean and unit variance.
X: training set (array)
y: labels (ignored)
Returns the transformed X.

get_params(deep=True): returns the parameters set on the StandardScaler object.

inverse_transform(X, copy=None): as the name suggests, reverses the scaling rule to restore the original data.

transform(X, y='deprecated', copy=None): normalizes new data according to the rules already fitted on the object.
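A sketch of applying an already-fitted scaler to new data: transform() uses the statistics learned during fit(), not the new data's own mean and std:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
ss.fit(np.array([[1.0], [2.0], [3.0]]))  # learns mean_ = 2, scale_ = sqrt(2/3)

# New data is normalized with the training-set statistics.
new = np.array([[2.0], [4.0]])
print(ss.transform(new))  # ≈ [[0.], [2.4495]]
```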

fit_transform() can be regarded as fit() followed by transform().
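This equivalence can be checked directly:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.arange(1.0, 10.0).reshape(3, 3)

# fit_transform(x) gives the same result as fit(x) followed by transform(x).
a = StandardScaler().fit_transform(x)
b = StandardScaler().fit(x).transform(x)
print(np.allclose(a, b))  # True
```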


Origin www.cnblogs.com/loubin/p/11299116.html