Understanding Normalization || Standardization || Feature Scaling

  • Feature scaling

    Feature scaling is a method used to normalize the range of independent variables or features of data.

    In some machine learning algorithms, objective functions will not work properly without normalization.

    Gradient descent converges much faster with feature scaling than without it.

    • Rescaling (min-max normalization)

      This method rescales the range of the features to [0, 1] or [-1, 1].

      The general formula for a min-max of [0, 1] is given as:
      $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
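      As a minimal sketch (assuming NumPy; `min_max_scale` is a hypothetical helper name), the formula maps directly to code:

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D feature to the [0, 1] range (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

feature = np.array([10.0, 20.0, 40.0, 100.0])
print(min_max_scale(feature))  # [0.         0.11111111 0.33333333 1.        ]
```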

  • Mean normalization

    $x' = \frac{x - \text{average}(x)}{\max(x) - \min(x)}$
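    A matching sketch (again assuming NumPy; `mean_normalize` is a hypothetical helper):

```python
import numpy as np

def mean_normalize(x):
    """Center a 1-D feature on its mean, then divide by its range."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min())

feature = np.array([10.0, 20.0, 40.0, 100.0])
print(mean_normalize(feature))  # values centered on 0 and bounded within [-1, 1]
```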

  • Standardization (Z-score Normalization)

    $x' = \frac{x - \bar{x}}{\sigma}$, where $\bar{x}$ is the mean of $x$.
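    A sketch of the same formula (`standardize` is a hypothetical helper; note that `np.std` defaults to the population standard deviation):

```python
import numpy as np

def standardize(x):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

feature = np.array([10.0, 20.0, 40.0, 100.0])
z = standardize(feature)
print(z)  # approx. [-0.931 -0.645 -0.072  1.647]; z has mean ~0 and std 1
```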

  • Scaling to unit length

    $x' = \frac{x}{\lVert x \rVert}$
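    A sketch using the Euclidean (L2) norm (`scale_to_unit_length` is a hypothetical helper):

```python
import numpy as np

def scale_to_unit_length(x):
    """Divide a vector by its L2 norm so the result has length 1."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

v = np.array([3.0, 4.0])
u = scale_to_unit_length(v)
print(u, np.linalg.norm(u))  # [0.6 0.8] 1.0
```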

  • Why Should We Use Feature Scaling?

    Gradient-descent-based algorithms such as linear regression, logistic regression, and neural networks use gradient descent as an optimization technique and require the data to be scaled; features with very different ranges lead to uneven weight updates and slower convergence.

    Distance-based algorithms such as KNN, K-means, and SVM are the most affected by the range of the features, because a feature with a larger range dominates the distance computation (a small numerical illustration follows this list).

    Tree-based algorithms are fairly insensitive to the scale of the features: a decision tree splits a node on a single feature, and that split is not influenced by the scale of the other features.
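    The illustration below is a minimal sketch (assuming NumPy, with made-up "age" and "income" values) of why distance-based algorithms need scaling: before scaling, the Euclidean distance is dominated by the feature with the larger range.

```python
import numpy as np

# Two samples: feature 0 is an age (tens), feature 1 is an income (tens of thousands).
a = np.array([25.0, 50_000.0])
b = np.array([45.0, 52_000.0])

# Unscaled: the income difference swamps the age difference.
print(np.linalg.norm(a - b))  # ~2000.1; the age gap of 20 barely registers

# Min-max scale both features using assumed [min, max] ranges per feature.
ranges = np.array([[20.0, 60.0], [30_000.0, 80_000.0]])
a_s = (a - ranges[:, 0]) / (ranges[:, 1] - ranges[:, 0])
b_s = (b - ranges[:, 0]) / (ranges[:, 1] - ranges[:, 0])
print(np.linalg.norm(a_s - b_s))  # ~0.50; both features now contribute comparably
```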

  • Normalization vs. Standardization

    These are two of the most commonly used feature scaling techniques in machine learning, but they are often confused.

    Standardization can be helpful when the data follows a Gaussian distribution. Unlike normalization, standardization does not have a bounding range, so outliers in your data are not compressed into a fixed interval the way they are under min-max normalization.

    Normalization is a good choice when you know that the distribution of your data does not follow a Gaussian distribution.

  • Normalization

    Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
    $X' = \frac{X - X_{min}}{X_{max} - X_{min}}$

  • Standardization

    Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation.
    $X' = \frac{X - \mu}{\sigma}$
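    In practice you rarely hand-roll these; as a brief sketch, scikit-learn's `MinMaxScaler` and `StandardScaler` implement the two formulas above (assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0], [20.0], [40.0], [100.0]])  # one feature as a column

print(MinMaxScaler().fit_transform(X).ravel())    # bounded in [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # mean 0, unit variance; unbounded
```

    Note that in a real pipeline the scaler is fit on the training split only and then applied to the test split, so test-set statistics do not leak into the model.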

  • References

  1. Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization


Reposted from blog.csdn.net/The_Time_Runner/article/details/109171542