Understanding Normalization || Standardization || Feature Scaling

  • Feature scaling

    Feature scaling is a method used to normalize the range of independent variables or features of data.

    In some machine learning algorithms, objective functions will not work properly without normalization.

    Gradient descent converges much faster with feature scaling than without it.

    • Rescaling (min-max normalization)

      This method rescales the range of the features to [0, 1] or [-1, 1].

      The general formula for a min-max of [0, 1] is given as:
      $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
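      As a minimal sketch (assuming NumPy; `min_max_scale` is a hypothetical helper name), the formula maps directly to code:

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D feature to the [0, 1] range (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

feature = np.array([10.0, 20.0, 40.0, 100.0])
print(min_max_scale(feature))  # [0.         0.11111111 0.33333333 1.        ]
```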

  • Mean normalization

    $x' = \frac{x - \text{average}(x)}{\max(x) - \min(x)}$
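    A matching sketch (again assuming NumPy; `mean_normalize` is a hypothetical helper):

```python
import numpy as np

def mean_normalize(x):
    """Center a 1-D feature on its mean, then divide by its range."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min())

feature = np.array([10.0, 20.0, 40.0, 100.0])
print(mean_normalize(feature))  # values centered on 0 and bounded within [-1, 1]
```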

  • Standardization (Z-score Normalization)

    $x' = \frac{x - \bar{x}}{\sigma}$, where $\bar{x}$ is the mean of $x$.
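    A sketch of the same formula (`standardize` is a hypothetical helper; note that `np.std` defaults to the population standard deviation):

```python
import numpy as np

def standardize(x):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

feature = np.array([10.0, 20.0, 40.0, 100.0])
z = standardize(feature)
print(z)  # approx. [-0.931 -0.645 -0.072  1.647]; z has mean ~0 and std 1
```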

  • Scaling to unit length

    $x' = \frac{x}{\lVert x \rVert}$
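    A sketch using the Euclidean (L2) norm (`scale_to_unit_length` is a hypothetical helper):

```python
import numpy as np

def scale_to_unit_length(x):
    """Divide a vector by its L2 norm so the result has length 1."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

v = np.array([3.0, 4.0])
u = scale_to_unit_length(v)
print(u, np.linalg.norm(u))  # [0.6 0.8] 1.0
```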

  • Why Should We Use Feature Scaling?

    Gradient-descent-based algorithms such as linear regression, logistic regression, and neural networks use gradient descent as an optimization technique and require the data to be scaled; features with very different ranges lead to uneven weight updates and slower convergence.

    Distance-based algorithms such as KNN, K-means, and SVM are the most affected by the range of the features, because a feature with a larger range dominates the distance computation (a small numerical illustration follows this list).

    Tree-based algorithms are fairly insensitive to the scale of the features: a decision tree splits a node on a single feature, and that split is not influenced by the scale of the other features.
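    The illustration below is a minimal sketch (assuming NumPy, with made-up "age" and "income" values) of why distance-based algorithms need scaling: before scaling, the Euclidean distance is dominated by the feature with the larger range.

```python
import numpy as np

# Two samples: feature 0 is an age (tens), feature 1 is an income (tens of thousands).
a = np.array([25.0, 50_000.0])
b = np.array([45.0, 52_000.0])

# Unscaled: the income difference swamps the age difference.
print(np.linalg.norm(a - b))  # ~2000.1; the age gap of 20 barely registers

# Min-max scale both features using assumed [min, max] ranges per feature.
ranges = np.array([[20.0, 60.0], [30_000.0, 80_000.0]])
a_s = (a - ranges[:, 0]) / (ranges[:, 1] - ranges[:, 0])
b_s = (b - ranges[:, 0]) / (ranges[:, 1] - ranges[:, 0])
print(np.linalg.norm(a_s - b_s))  # ~0.50; both features now contribute comparably
```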

  • Normalization vs. Standardization

    These are two of the most commonly used feature scaling techniques in machine learning, but they are often confused.

    Standardization can be helpful when the data follows a Gaussian distribution. Unlike normalization, standardization does not have a bounding range, so outliers in your data are not compressed into a fixed interval the way they are under min-max normalization.

    Normalization is a good choice when you know that the distribution of your data does not follow a Gaussian distribution.

  • Normalization

    Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
    $X' = \frac{X - X_{min}}{X_{max} - X_{min}}$

  • Standardization

    Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation.
    $X' = \frac{X - \mu}{\sigma}$
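    In practice you rarely hand-roll these; as a brief sketch, scikit-learn's `MinMaxScaler` and `StandardScaler` implement the two formulas above (assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0], [20.0], [40.0], [100.0]])  # one feature as a column

print(MinMaxScaler().fit_transform(X).ravel())    # bounded in [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # mean 0, unit variance; unbounded
```

    Note that in a real pipeline the scaler is fit on the training split only and then applied to the test split, so test-set statistics do not leak into the model.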

  • References

  1. Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization


Reposted from blog.csdn.net/The_Time_Runner/article/details/109171542