Feature Engineering - Data Normalization and Standardization

Definition

Normalization:

$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$

Standardization:

$$X_i' = \frac{X_i - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the sample, $X_{\max}$ is the maximum value, and $X_{\min}$ is the minimum value.
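A minimal Python sketch of the two formulas, using only the standard library. The function names are hypothetical. The sketch also assumes that $\sigma$ is the population standard deviation (`statistics.pstdev`, dividing by n), because the text does not say whether the population or the sample (n − 1) estimate is meant.

```python
from statistics import mean, pstdev


def min_max_normalize(xs):
    """Normalization: map each value to (x - min) / (max - min), so results lie in [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("all values are equal; max - min is zero")
    return [(x - lo) / (hi - lo) for x in xs]


def z_score_standardize(xs):
    """Standardization: map each value to (x - mu) / sigma.

    Assumption: sigma is the population standard deviation (divide by n).
    """
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        raise ValueError("standard deviation is zero")
    return [(x - mu) / sigma for x in xs]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(min_max_normalize(data))    # first value 0.0, last value 1.0
print(z_score_standardize(data))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

For this data, μ = 5 and σ = 2. Each z-score therefore states how many standard deviations a value lies from the mean.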

Nature

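In essence, both normalization and standardization are linear transformations: each subtracts a constant from every value and then divides by another positive constant.
Linear transformations have useful properties. In particular, when the scale factor is positive, they preserve the ordering of the values and the relative spacing between them. These properties explain why rescaling the data does not make it "fail" (no information is lost), and why rescaling can instead improve how models perform on the data.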
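To make the order- and spacing-preserving property concrete, here is a small sketch. It writes both transforms as affine maps x → a·x + b with a > 0 and checks that the ordering of the points and the ratio of two gaps are unchanged. The helpers `affine`, `rank_order`, and `spacing_ratio` are illustrative names, and the data are made up.

```python
from statistics import mean, pstdev


def affine(xs, a, b):
    """Apply the linear (affine) map x -> a*x + b to every value."""
    return [a * x + b for x in xs]


def rank_order(xs):
    """Indices of xs sorted by value, i.e. the ordering of the points."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


def spacing_ratio(xs, i, j, k):
    """Ratio of the gap x_k - x_i to the gap x_j - x_i."""
    return (xs[k] - xs[i]) / (xs[j] - xs[i])


data = [3.0, 10.0, 7.0, 1.0, 4.0]

# Min-max normalization as an affine map: a = 1/(max-min), b = -min/(max-min).
lo, hi = min(data), max(data)
normalized = affine(data, 1 / (hi - lo), -lo / (hi - lo))

# Z-score standardization as an affine map: a = 1/sigma, b = -mu/sigma.
mu, sigma = mean(data), pstdev(data)
standardized = affine(data, 1 / sigma, -mu / sigma)

for transformed in (normalized, standardized):
    # Ordering is unchanged because a > 0.
    assert rank_order(transformed) == rank_order(data)
    # Relative spacing is unchanged: the factor a cancels in the ratio of two gaps.
    assert abs(spacing_ratio(transformed, 3, 1, 4) - spacing_ratio(data, 3, 1, 4)) < 1e-12
print("order and relative spacing preserved")
```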

Difference between the two

Normalization scaling is "flat": it maps the data onto one fixed interval, and the mapping is determined only by the extreme values. Standardization scaling is more "elastic" and "dynamic", and it depends heavily on the distribution of the whole sample.

  • Normalization: The scaling depends only on the difference between the maximum and minimum values.
  • Standardization: The scaling depends on every point, through the mean and the variance. Unlike normalization, all data points contribute (via the mean and standard deviation).
  • Normalization: The output range is [0, 1].
  • Standardization: The output range is unbounded, from negative infinity to positive infinity.
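The sketch below contrasts how the two methods react when a single interior value changes. Because min-max scaling depends only on the extremes, the other points keep their scaled values. Z-scores depend on every point through μ and σ, so all of them shift, and they are not confined to [0, 1]. The data are made up for illustration, and σ is assumed to be the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(xs):
    """Scale to [0, 1] using only the minimum and maximum."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score_standardize(xs):
    """Scale using the mean and (population) standard deviation of all points."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


before = [0.0, 2.0, 5.0, 10.0]
after = [0.0, 2.0, 8.0, 10.0]  # only an interior value changed (5 -> 8)

# Min-max: min and max did not move, so the unchanged points keep their scaled values.
print(min_max_normalize(before))  # [0.0, 0.2, 0.5, 1.0]
print(min_max_normalize(after))   # [0.0, 0.2, 0.8, 1.0]

# Z-score: mean and standard deviation both shift, so every scaled value changes,
# and values can fall below 0 or above 1.
print(z_score_standardize(before))
print(z_score_standardize(after))
```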

Application scenarios

  • If the output must fall within a specific range, use normalization.
  • If the data are relatively stable, with no extreme maximum or minimum values, use normalization. (If the sample contains features measured in different units, it is better to normalize.)
  • If the data contain outliers and a lot of noise, use standardization. Centering the data (subtracting the mean) indirectly reduces the influence of outliers and extreme values.
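These scenarios usually involve several features, and each feature is scaled on its own column. The sketch below applies either method column by column to a hypothetical table of (height in cm, annual income in dollars). The last income is an extreme value, which illustrates why the "no extreme values" condition matters for normalization: under min-max scaling, the other four incomes are squeezed into a narrow band near 0. The data and the helper `scale_columns` are assumptions made for illustration, and σ is again assumed to be the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(xs):
    """Rescale one feature to [0, 1] using its own minimum and maximum."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score_standardize(xs):
    """Rescale one feature to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def scale_columns(rows, scaler):
    """Apply `scaler` to each column (feature) of a row-major table independently."""
    columns = list(zip(*rows))
    scaled = [scaler(list(col)) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Hypothetical samples: (height in cm, annual income in dollars).
# The last sample's income is an extreme value.
rows = [
    [160.0, 30000.0],
    [170.0, 35000.0],
    [175.0, 40000.0],
    [180.0, 45000.0],
    [185.0, 1000000.0],
]

for name, scaler in (("min-max", min_max_normalize), ("z-score", z_score_standardize)):
    print(name)
    for row in scale_columns(rows, scaler):
        print(["%.3f" % v for v in row])
```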
