Feature scaling | normalization and standardization (Part 1)

What is feature scaling?

  It means mapping all features onto the same scale. For example:

  Suppose X_train is a training set:

  [figure: the matrix X_train of raw feature values]

  After applying some feature-scaling transform, new values are obtained:

  [figure: X_train after feature scaling]

  Clearly, after feature scaling the feature values are smaller and lie on the same scale.

Why do we need feature scaling?

Some features have a bounded range, such as age or body weight. Others can grow without bound, such as a count.

A large gap between feature magnitudes can adversely affect the model. For example:

[figure: a sample set whose two features differ greatly in scale]

In such a sample set, because the features have different scales (dimensions), the model is dominated by the large-valued 'count' feature. Without preprocessing, the model may be biased and fail to properly reflect the relative importance of the features. (There are other reasons too; for instance, scaling favors optimization.)
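A quick sketch of this dominance effect (the feature names and value ranges below are made up for illustration): for a distance-based model such as KNN, the unscaled large-valued feature swamps the small one, while after min-max scaling both contribute on comparable terms:

import numpy as np

# Two hypothetical samples with features (age in years, count):
a = np.array([30.0, 10000.0])
b = np.array([60.0, 10200.0])

# Unscaled Euclidean distance is dominated by the count feature:
print(np.linalg.norm(a - b))          # ~202.2 -- the 30-year age gap barely registers

# After min-max scaling (assume age spans 0-100 and count spans 0-20000
# over the whole dataset), both features contribute on the same footing:
a_scaled = np.array([30 / 100, 10000 / 20000])
b_scaled = np.array([60 / 100, 10200 / 20000])
print(np.linalg.norm(a_scaled - b_scaled))  # ~0.30 -- now driven mostly by age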

 

Feature scaling classification:

         (There are more than two kinds, but standardization is the most commonly used.)

1. First, normalization (min-max scaling)

 Normalization maps each value into the range 0-1.

 Formula:

          X_scale = (X(i) - X(min)) / (X(max) - X(min))

  Here X(i) is one value of the feature, X(min) is the minimum of that feature over all samples, and X(max) is the maximum of that feature over all samples.

  That is, (feature value minus the feature's minimum) divided by (the feature's maximum minus its minimum) gives the value after normalization. For example, a feature column [1, 3, 5] normalizes to [0, 0.5, 1].

   A simple Python implementation:

import numpy as np

def min_max_scaler(X):
    '''Min-max normalization: scale each feature column to [0, 1].'''
    X = np.array(X, dtype=float)
    assert X.ndim == 2, 'X must be a 2-D array'
    n_feature = X.shape[1]
    for n in range(n_feature):
        min_feature = np.min(X[:, n])
        max_feature = np.max(X[:, n])
        # (value - column min) / (column max - column min);
        # note: a constant column (max == min) would divide by zero here
        X[:, n] = (X[:, n] - min_feature) / (max_feature - min_feature)
    return X

x = np.random.randint(0,100,(25,4))
print(min_max_scaler(x))

'''
[[0.89247312 0.11494253 0.17857143 0.29347826]
 [0.09677419 0.74712644 0.10714286 0.63043478]
 [0.         0.87356322 0.95238095 0.67391304]
 .......
 [0.2688172  0.4137931  0.33333333 0.89130435]
 [0.11827957 0.7816092  0.55952381 0.15217391]
 [1.         0.57471264 0.70238095 0.45652174]
 [0.16129032 1.         0.75       0.23913043]]
'''

  The corresponding API in sklearn: from sklearn.preprocessing import MinMaxScaler
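  A minimal usage sketch (the variable names here are illustrative): fit learns each column's min and max from the training set, and transform applies that same mapping, which is also how you would scale test data with the training set's statistics:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.randint(0, 100, (25, 4)).astype(float)

scaler = MinMaxScaler()       # default feature_range=(0, 1)
scaler.fit(X_train)           # learn per-column min and max from the training set
X_scaled = scaler.transform(X_train)

print(X_scaled.min(axis=0))   # [0. 0. 0. 0.] (assuming no constant columns)
print(X_scaled.max(axis=0))   # [1. 1. 1. 1.]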

