Why standardize data? When is data standardization needed, and when is it not?

Question 1:

  • Why standardize data?

In practice, a target variable (y) is usually influenced by several feature variables (x), and those features often differ in units and in order of magnitude. For example, with x1 = 10000, x2 = 1 and x3 = 0.5, x1 is several orders of magnitude larger than x2 and x3. In a model that is sensitive to feature scale, x1 then dominates the prediction while x2 and x3 have little influence: the target is effectively controlled by x1 alone, so any problem with the values of x1 directly distorts the predicted value, which makes the predictions risky. Standardization puts all features on a comparable scale (that is, it keeps each feature's values within a similar range), so that the target can be influenced by several features on an equal footing. It also means that when the parameters are learned by gradient descent, the different features have a similar degree of influence on the parameter updates. For example, normalizing the input data when training a neural network speeds up the convergence of the weights.
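A minimal sketch of the two most common ways to put features on the same scale, assuming NumPy and made-up data that mimics x1 = 10000, x2 = 1, x3 = 0.5: z-score standardization (each column gets mean 0 and standard deviation 1) and min-max scaling (each column is mapped into [0, 1]). The helper names `z_score` and `min_max` are illustrative, not taken from any library.

```python
import numpy as np

# Hypothetical toy data: three features on very different scales,
# echoing x1 ~ 10000, x2 ~ 1, x3 ~ 0.5 from the text.
X = np.array([
    [10000.0, 1.0, 0.5],
    [12000.0, 2.0, 0.4],
    [ 9000.0, 1.5, 0.7],
    [11000.0, 0.5, 0.6],
])

def z_score(X):
    """Standardize each column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

def min_max(X):
    """Rescale each column into the range [0, 1]."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X - lo) / span

Z = z_score(X)
M = min_max(X)
print(np.round(Z.mean(axis=0), 6))   # each column mean is ~0
print(np.round(Z.std(axis=0), 6))    # each column std is ~1
print(M.min(axis=0), M.max(axis=0))  # each column spans [0, 1]
```

In practice, scikit-learn's `StandardScaler` and `MinMaxScaler` (in `sklearn.preprocessing`) perform the same transformations; fit them on the training data only and reuse the fitted statistics on the test data.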
In short: standardization removes the scale differences between features so that the weights of all features can be learned on an equal footing.
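To illustrate the convergence point, here is a small sketch, assuming NumPy and a hypothetical synthetic linear-regression problem. It runs plain batch gradient descent on mean squared error, once on the raw features and once on standardized features, giving each run the step size 1/L, where L is the largest eigenvalue of the loss Hessian (a standard safe choice for this convex loss). `gd_iterations` is a hypothetical helper, not a library function.

```python
import numpy as np

# Hypothetical synthetic data: x1 in the thousands, x2 between 0 and 2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(5000, 15000, n)
x2 = rng.uniform(0, 2, n)
y = 0.001 * x1 + 3.0 * x2 + rng.normal(0, 0.1, n)

def gd_iterations(X, y, tol=1e-6, max_iter=20_000):
    """Batch gradient descent on mean squared error for y ~ X @ w + b.
    Returns the iteration at which the gradient norm drops below tol,
    or max_iter if it never does."""
    A = np.column_stack([X, np.ones(len(X))])  # append a bias column
    H = 2.0 / len(A) * A.T @ A                  # Hessian of the MSE loss
    lr = 1.0 / np.linalg.eigvalsh(H).max()      # step size 1/L, L = largest eigenvalue
    w = np.zeros(A.shape[1])
    for i in range(1, max_iter + 1):
        grad = 2.0 / len(A) * A.T @ (A @ w - y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter

X_raw = np.column_stack([x1, x2])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

it_raw = gd_iterations(X_raw, y)
it_std = gd_iterations(X_std, y)
print("iterations, raw features:         ", it_raw)
print("iterations, standardized features:", it_std)
```

On the raw data, the huge gap between the scales of x1 and x2 makes the problem badly conditioned: the step size has to be tiny to stay stable for x1, so progress on the other parameters is extremely slow and the loop uses up its iteration budget. After standardization the same loop converges in a handful of steps.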

Question 2:

  • When is data standardization needed, and when is it not?

From the answer to Question 1, when the features of the original data differ in scale or units, standardization is needed as a preprocessing step; when they are already on the same scale, it is not.
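One way to turn this rule into a quick check, as a rough sketch assuming NumPy: compare the spread (standard deviation) of each feature. `scale_ratio`, `needs_standardization` and its `threshold` of 10 are hypothetical choices for illustration, not an established criterion.

```python
import numpy as np

def scale_ratio(X):
    """Ratio of the largest to the smallest per-feature standard deviation."""
    std = X.std(axis=0)
    std = std[std > 0]  # constant columns carry no scale information
    if std.size == 0:
        return 1.0
    return float(std.max() / std.min())

def needs_standardization(X, threshold=10.0):
    # Hypothetical rule of thumb: flag the data when feature spreads differ
    # by more than `threshold` times. The value 10 is an arbitrary choice.
    return scale_ratio(X) > threshold

X_mixed = np.array([[10000.0, 1.0], [12000.0, 2.0], [9000.0, 1.5]])
X_similar = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
print(needs_standardization(X_mixed))    # → True
print(needs_standardization(X_similar))  # → False
```

A spread check like this only looks at numeric scale; it says nothing about whether the units themselves are comparable.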
The following types of problems generally require data standardization:

a. Regression
b. Other machine learning algorithms
c. Training neural networks
d. Clustering
e. Classification
f. Principal component analysis (PCA)
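To see why clustering, classification and PCA in particular are sensitive to scale, here is a sketch on synthetic data (an assumption; NumPy only). It measures how much each feature contributes to the average pairwise squared Euclidean distance, which distance-based methods such as k-means and k-NN rely on, and computes the first principal component via SVD, before and after z-score standardization. The helpers `distance_shares` and `first_pc` are hypothetical, and the sign of a principal component is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Hypothetical data: x1 in the thousands; x2 and x3 near 1 and strongly
# correlated with each other, so they share real structure.
x1 = rng.normal(10000, 1000, n)
x2 = rng.normal(1.0, 0.5, n)
x3 = 0.5 * x2 + rng.normal(0.0, 0.05, n)
X = np.column_stack([x1, x2, x3])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

def distance_shares(X):
    """Fraction of the average pairwise squared Euclidean distance
    contributed by each feature."""
    diffs = X[:, None, :] - X[None, :, :]  # all pairwise differences, shape (n, n, d)
    per_feature = (diffs ** 2).mean(axis=(0, 1))
    return per_feature / per_feature.sum()

def first_pc(X):
    """Loadings of the first principal component, via SVD of centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

print("distance shares, raw:         ", np.round(distance_shares(X), 4))
print("distance shares, standardized:", np.round(distance_shares(X_std), 4))
print("PC1 loadings, raw:            ", np.round(first_pc(X), 3))
print("PC1 loadings, standardized:   ", np.round(first_pc(X_std), 3))
```

On the raw data, x1 accounts for nearly all of the distance and dominates the first principal component, hiding the strong x2/x3 relationship. After standardization each feature contributes equally to the distances, and the first component picks up the shared x2/x3 structure.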

For more on data standardization methods, see my other post, "python (classic) data normalization methods, clustering, classification summary": https://blog.csdn.net/data_bug/article/details/81586412

Source: blog.csdn.net/data_bug/article/details/87695229