Talking about Feature Scaling

Definition: Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step. (Source: Wikipedia)


Simply put, it is mainly used to map the value ranges of all features into the same range, such as (0,1), (-1,1), (-0.5,0.5), etc.


 

Feature scaling (data normalization) is a commonly used step in data mining and machine learning. This step sometimes has a huge impact on the efficiency and accuracy of the algorithm.

 

Impact on accuracy: The necessity of this step depends on the characteristics of the data. If there are two or more features and their value ranges differ greatly, feature scaling is necessary. For example, in credit card fraud detection, this step is unnecessary if we only use the user's income as a learning feature. However, if we use both the user's income and the user's age, scaling before modeling is likely to improve the detection accuracy: the income feature may take values in [50000, 60000] or an even wider range, while the age feature can only fall roughly within [20, 100]. If we then use the k-nearest-neighbors method for detection, the income feature will influence the similarity computation, and therefore the detection result, far more than the age feature, even though the two features may in fact be equally important for fraud detection. If we normalize both features before running the detection, they can be treated truly equally by the detection method, as the sketch below illustrates.
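A minimal sketch of this point (the income and age numbers below are made up for illustration): standardizing both features puts them on a comparable scale before nearest-neighbor distances are computed.

```python
import numpy as np

# Hypothetical users: each row is [income, age] (made-up numbers).
X = np.array([
    [52000.0, 25.0],
    [58000.0, 61.0],
    [50500.0, 43.0],
    [59000.0, 33.0],
])

# Standardize each feature: subtract its mean, divide by its standard deviation.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

# Distances from a query user to every known user, before and after scaling.
query = np.array([55000.0, 40.0])
dist_raw = np.linalg.norm(X - query, axis=1)                   # income dominates
dist_scaled = np.linalg.norm(X_scaled - (query - mean) / std, axis=1)

print("nearest (raw):   ", dist_raw.argmin())
print("nearest (scaled):", dist_scaled.argmin())
```

On the raw values, age differences of tens of years contribute almost nothing next to income differences of thousands, so the nearest neighbor is chosen by income alone; after standardization, the reported nearest neighbor changes because age finally carries equal weight.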


Impact on efficiency: Another example comes from Andrew Ng's machine learning course.

In this example, we want to use linear regression to predict the house price from the size of the house and the number of bedrooms, using batch gradient descent as the optimization method. If these two features are not scaled when building the model, the optimization proceeds over a very skewed region (in the course's figure, the cost-function contours are elongated ellipses), which makes batch gradient descent converge slowly. After scaling, the problem is transformed into optimization over a roughly circular region, and the convergence speed of batch gradient descent improves greatly.
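A minimal sketch of this effect (the house data, learning rates, and step counts below are made-up assumptions, not the course's actual numbers): run batch gradient descent on the raw features and on standardized features and compare the final cost after the same number of steps.

```python
import numpy as np

def batch_gd(X, y, lr, steps):
    """Batch gradient descent for linear regression; returns the final half-MSE cost."""
    Xb = np.c_[np.ones(len(X)), X]          # prepend an intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (Xb @ theta - y) / len(y)
        theta -= lr * grad
    return np.mean((Xb @ theta - y) ** 2) / 2

# Made-up houses: [size in square feet, number of bedrooms].
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0],
              [1416.0, 2.0], [3000.0, 4.0]])
# Price (in $1000s) depends on both features.
y = 0.1 * X[:, 0] + 100.0 * X[:, 1]

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# On raw features, any learning rate large enough to move the intercept and
# bedroom weights makes the size direction diverge, so lr must stay tiny and
# the cost stalls at a high value.  On scaled features a far larger rate is
# stable and the cost drops close to the minimum.
print("raw:   ", batch_gd(X, y, lr=1e-7, steps=1000))
print("scaled:", batch_gd(X_scaled, y, lr=0.1, steps=1000))
```

This is exactly the skewed-contour picture in numbers: the size feature is thousands of times larger than the bedroom count, so the stable learning rate is dictated by size alone and the other directions barely move until the features are put on the same scale.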

 

Practice:

Commonly used feature scaling methods are as follows:

xi' = (xi - a) / b

where a can be, for example, the mean of the feature xi, and b can be the maximum value of xi, the range (maximum - minimum), the standard deviation, etc.
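For concreteness, here is a minimal NumPy sketch of three common instances of this formula (the income values are made up): min-max scaling (a = minimum, b = range), mean normalization (a = mean, b = range), and standardization (a = mean, b = standard deviation).

```python
import numpy as np

x = np.array([50000.0, 52000.0, 58000.0, 60000.0])   # made-up income values

# Min-max scaling into [0, 1]: a = min(x), b = max(x) - min(x).
min_max = (x - x.min()) / (x.max() - x.min())

# Mean normalization into roughly (-1, 1): a = mean(x), b = max(x) - min(x).
mean_norm = (x - x.mean()) / (x.max() - x.min())

# Standardization (zero mean, unit variance): a = mean(x), b = std(x).
standard = (x - x.mean()) / x.std()

print(min_max)
print(mean_norm)
print(standard)
```

Which variant to use depends on the model: min-max scaling keeps values in a fixed interval, while standardization is less sensitive to a single extreme value stretching the range.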

 

Summary:

The principle and methods of this step are very simple, but if this step is missing in data mining or machine learning, it can sometimes have a huge impact on learning efficiency and accuracy. Therefore, before building a model, it is worth carefully considering whether to perform feature scaling.


References:

http://blog.csdn.net/memray/article/details/9023737

http://en.wikipedia.org/wiki/Feature_scaling

https://class.coursera.org/ml/

If there are any errors, please correct me.

Email: [email protected]
