Feature Scaling | Normalization and Standardization (Part 2)

The previous post, Feature Scaling | Normalization and Standardization (Part 1), gave a brief introduction to what feature scaling and normalization are. This post focuses mainly on standardization and summarizes a few points about feature scaling.

 

What is standardization?

Standardization is another form of feature scaling. It rescales the data to a distribution with mean 0 and variance 1.

For example, take a set of samples with 3 features and 10 samples (a 3 × 10 matrix). After standardization, feature 1 has mean 0 and variance 1; feature 2 has mean 0 and variance 1; and the same holds for feature 3.

Comparing the results, the difference between normalization and standardization is: after normalization, all the data are mapped into the range [0, 1], while after standardization the data have mean 0 and variance 1 (as in the example above).
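As a quick sketch of the normalization side (a minimal example, assuming min-max scaling with NumPy; the function name min_max_scaler is mine, not from the original post):

import numpy as np

def min_max_scaler(X):
    '''Map each feature column into the range [0, 1] (illustrative helper).'''
    assert X.ndim == 2, 'X must be a 2-D array'
    for n in range(X.shape[1]):
        n_min = np.min(X[:, n])
        n_max = np.max(X[:, n])
        X[:, n] = (X[:, n] - n_min) / (n_max - n_min)  # (x - min) / (max - min)
    return X

x_samples = np.linspace(0, 15, 15).reshape(5, 3)
print(min_max_scaler(x_samples))  # every value now lies in [0, 1]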

 

Standardization formula: subtract the mean of the feature from each feature value, then divide by the standard deviation of the feature:

x' = (x - μ) / σ

where μ is the feature's mean and σ is the feature's standard deviation.

 

A simple implementation in Python:

import numpy as np

def standard_scaler(X):
    '''Standardize each feature column to mean 0 and variance 1.'''
    assert X.ndim == 2, 'X must be ....'

    n_feature = X.shape[1]

    for n in range(n_feature):
        n_mean = np.mean(X[:, n])  # mean of column n
        n_std = np.std(X[:, n])    # standard deviation of column n
        X[:, n] = (X[:, n] - n_mean) / n_std

    return X

x_samples = np.linspace(0, 15, 15).reshape(5, 3)  # generate sample data
print('sample set:\n', x_samples)
x_samples = standard_scaler(x_samples)
print('standardized:\n', x_samples)

n_feature = x_samples.shape[1]
for n in range(n_feature):
    print('column {}: mean {}, variance {}'.format(n, np.mean(x_samples[:, n]), np.var(x_samples[:, n])))

Output:

sample set:
 [[ 0.          1.07142857  2.14285714]
 [ 3.21428571  4.28571429  5.35714286]
 [ 6.42857143  7.5         8.57142857]
 [ 9.64285714 10.71428571 11.78571429]
 [12.85714286 13.92857143 15.        ]]
standardized:
 [[-1.41421356e+00 -1.41421356e+00 -1.41421356e+00]
 [-7.07106781e-01 -7.07106781e-01 -7.07106781e-01]
 [ 0.00000000e+00  1.95389284e-16  0.00000000e+00]
 [ 7.07106781e-01  7.07106781e-01  7.07106781e-01]
 [ 1.41421356e+00  1.41421356e+00  1.41421356e+00]]
column 0: mean 1.3322676295501878e-16, variance 0.9999999999999997
column 1: mean -4.4408920985006264e-17, variance 0.9999999999999997
column 2: mean -4.4408920985006264e-17, variance 0.9999999999999997
As the output shows, the means are all very close to 0 and the variances are all very close to 1.


The sklearn API:
from sklearn.preprocessing import StandardScaler
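A minimal usage sketch of StandardScaler (the variable names are mine, not part of the original post):

import numpy as np
from sklearn.preprocessing import StandardScaler

x_samples = np.linspace(0, 15, 15).reshape(5, 3)
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x_samples)  # learn mean/std per column, then transform
print(x_scaled.mean(axis=0))  # per-column means, all ~0
print(x_scaled.std(axis=0))   # per-column standard deviations, all ~1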

 

To sum up:

1. Normalization scales by the extreme values (the minimum and maximum), so it is easily disturbed by extreme values. Its output range is [0, 1] (see the sketch after this list).

2. Standardization scales by the mean and variance, so its output range runs from minus infinity to plus infinity.

3. When the data contain outliers or noisy values, standardization may be used.
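To illustrate points 1 and 3, here is a small sketch (the toy data are made up for illustration): a single extreme value squeezes the rest of the min-max-normalized data toward 0, while standardization keeps them better spread:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is an extreme value

# min-max normalization: the outlier pins the max at 1 and crowds the rest near 0
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)  # [0.     0.0101 0.0202 0.0303 1.    ]

# standardization: the scale depends on mean and std, not on the extremes alone
x_std = (x - x.mean()) / x.std()
print(x_std)   # approximately [-0.54 -0.51 -0.49 -0.46  2.00]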

 


Source: www.cnblogs.com/qiutenglong/p/10960230.html