What is feature scaling:
It maps all of the data onto the same scale. For example:
Take a training set X_train as follows:
(x_train)
After some form of feature scaling is applied, new values are obtained:
Clearly, the feature values are smaller after scaling.
Why scale features?
Some features have a bounded range of values, such as age and body weight. Others can grow without limit, such as counts.
A large gap between the numeric ranges of different features can therefore adversely affect the model. For example:
In this sample set, because the features are on different scales, the model is dominated by the 'count' feature. Without preprocessing, the data can therefore introduce bias and fail to reflect how important each feature really is. There are other reasons as well, such as making optimization easier.
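To make the effect concrete, here is a minimal sketch with made-up numbers (the two features, "age" and "count", and all values are assumptions for illustration, not data from this article). It compares Euclidean distances between samples before and after each column is rescaled to the range 0-1, which is one possible scaling:

```python
import numpy as np

# Hypothetical samples: column 0 is age (bounded), column 1 is a count (unbounded).
X = np.array([[25.0, 10000.0],
              [60.0, 10100.0],
              [26.0, 20000.0]])


def euclidean(a, b):
    """Euclidean distance between two 1-D vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Raw data: the distance is driven almost entirely by the count column.
d_age_gap = euclidean(X[0], X[1])    # ages differ a lot, counts are close
d_count_gap = euclidean(X[0], X[2])  # ages are close, counts differ a lot

# Rescale every column to [0, 1] so both features are on the same scale.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
s_age_gap = euclidean(X_scaled[0], X_scaled[1])
s_count_gap = euclidean(X_scaled[0], X_scaled[2])

print(d_age_gap, d_count_gap)  # the second distance is far larger
print(s_age_gap, s_count_gap)  # the two distances are now comparable
```

On the raw data, a 35-year age difference barely registers next to a difference in counts. After rescaling, both differences contribute about equally.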
Types of feature scaling:
(There are more than two kinds, but standardization is among the most commonly used)
1. Normalization (min-max scaling)
Normalization maps the values into the range 0-1.
Formula:
X(scaled) = (X(i) - X(min)) / (X(max) - X(min))
where X(i) is one value of a feature, X(min) is the minimum of all values of that feature, and X(max) is the maximum of all values of that feature.
That is, (the feature value minus the feature's minimum) divided by (the feature's maximum minus the feature's minimum).
This gives the normalized value of the feature.
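Before the full implementation, the calculation described above can be checked by hand on a single feature. The values below are hypothetical, chosen only so the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical values of one feature.
x = np.array([10.0, 20.0, 25.0, 50.0])

x_min = x.min()  # 10.0
x_max = x.max()  # 50.0

# (value - minimum) / (maximum - minimum), applied to every value at once.
x_scaled = (x - x_min) / (x_max - x_min)

print(x_scaled)  # values: 0, 0.25, 0.375, 1
```

The smallest value always maps to 0 and the largest to 1. Everything else lands in between, in proportion to its position in the original range.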
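A simple implementation in Python:
import numpy as np


def min_max_scaler(X):
    '''Min-max normalization: scale each column to the range [0, 1].'''
    # Convert first, so lists are accepted and the caller's array is not modified
    X = np.array(X, dtype=float)
    assert X.ndim == 2, 'X must be a 2-D array'
    n_feature = X.shape[1]
    for n in range(n_feature):
        min_feature = np.min(X[:, n])
        max_feature = np.max(X[:, n])
        # Note: a constant column (max == min) would divide by zero here
        X[:, n] = (X[:, n] - min_feature) / (max_feature - min_feature)
    return X


# 25 samples with 4 features each, random integers in [0, 100)
x = np.random.randint(0, 100, (25, 4))
print(min_max_scaler(x))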
'''
[[0.89247312 0.11494253 0.17857143 0.29347826]
[0.09677419 0.74712644 0.10714286 0.63043478]
[0. 0.87356322 0.95238095 0.67391304]
.......
[0.2688172 0.4137931 0.33333333 0.89130435]
[0.11827957 0.7816092 0.55952381 0.15217391]
[1. 0.57471264 0.70238095 0.45652174]
[0.16129032 1. 0.75 0.23913043]]
'''
The corresponding API in sklearn: from sklearn.preprocessing import MinMaxScaler
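A minimal usage sketch of MinMaxScaler follows. The arrays X_train and X_test are hypothetical data made up for this example. The scaler is fitted on the training set only, and the same minimum and maximum are then reused for the test set:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 2 features on very different scales.
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])
X_test = np.array([[2.0, 300.0],
                   [4.0, 800.0]])

scaler = MinMaxScaler()

# fit_transform learns each column's min and max from X_train, then scales it.
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the training min and max; nothing is re-learned from X_test.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```

Because the test set is scaled with the training set's minimum and maximum, test values outside the training range can fall outside 0-1. Here the second test row maps to 1.5 in both columns.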