Min-max, a common method of data standardization

1. The purpose of data standardization

  Data standardization scales data into a small, specific interval. Its usual purpose is to remove the units from the data and turn it into pure, dimensionless values, so that indicators with different units or orders of magnitude can be compared and weighted. Normalization is a typical case.

1.1 Normalization of data

  1. Convert each number to a decimal in the interval [0,1]
  2. Convert a dimensional expression to a dimensionless one
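
  The second point can be checked with a small sketch. It uses the min-max formula defined in section 2; the helper name to_unit_interval and the height values are made up for illustration. The same heights expressed in metres and in centimetres should give the same normalized values, because the unit cancels out.

```python
import numpy as np

def to_unit_interval(values):
    """Min-max scale a 1-D sequence (hypothetical helper, assumes max != min)."""
    x = np.asarray(values, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Illustrative data: the same four heights in two different units.
heights_m = np.array([1.50, 1.65, 1.80, 1.95])
heights_cm = heights_m * 100.0

scaled_m = to_unit_interval(heights_m)
scaled_cm = to_unit_interval(heights_cm)

# The unit no longer matters after scaling (up to floating-point rounding).
print(np.allclose(scaled_m, scaled_cm))
```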

1.2 Benefits of normalization

  In a multi-index evaluation system, the evaluation indexes usually differ in dimension and order of magnitude because of their nature. When the levels of the indicators differ greatly and the original values are used directly for analysis, indicators with higher values dominate the comprehensive analysis, while indicators with lower values are relatively weakened. To keep the results reliable, the original data therefore needs to be standardized.
  Empirically, normalization makes features from different dimensions numerically comparable, which can improve the accuracy of a classifier, particularly one that relies on distances between samples.
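
  A rough illustration of this effect, assuming a distance-based comparison: the four samples below are invented (height in metres, income in dollars), and min_max and distance are hypothetical helpers. On the raw values the income column decides which sample is "nearest"; after column-wise min-max scaling the height column also counts.

```python
import numpy as np

# Invented samples: each row is (height in metres, income in dollars).
samples = np.array([
    [1.60, 30000.0],  # A
    [1.90, 30100.0],  # B: very different height, similar income
    [1.61, 30400.0],  # C: similar height, somewhat different income
    [1.75, 32000.0],  # D: widens the income range
])

def min_max(columns):
    """Scale each column of a 2-D array to [0, 1] independently."""
    col_min = columns.min(axis=0)
    col_max = columns.max(axis=0)
    return (columns - col_min) / (col_max - col_min)

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

# Raw values: the income difference swamps the height difference.
raw_ab = distance(samples[0], samples[1])
raw_ac = distance(samples[0], samples[2])

# Scaled values: both features contribute on the same [0, 1] scale.
scaled = min_max(samples)
scaled_ab = distance(scaled[0], scaled[1])
scaled_ac = distance(scaled[0], scaled[2])

print("raw:    A-B < A-C ?", raw_ab < raw_ac)
print("scaled: A-C < A-B ?", scaled_ac < scaled_ab)
```

  With these particular numbers, B is the nearest neighbour of A on the raw data, while C becomes the nearest after scaling. This only shows how the ranking can change; it is not a measurement of classifier accuracy.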

2. Min-max standardization

2.1 Definition

  For the sequence $x_1, x_2, \ldots, x_n$, apply the transformation:
  $$y_i = \frac{x_i - \min_{1 \le j \le n}\{x_j\}}{\max_{1 \le j \le n}\{x_j\} - \min_{1 \le j \le n}\{x_j\}}$$
  Then the new sequence $y_1, y_2, \ldots, y_n \in [0,1]$ is dimensionless. For general data, normalization can be considered as a first processing step.
  Min-max standardization performs a linear transformation on the original data and maps the values to [0,1]. The formula requires that the values are not all equal; otherwise the denominator is zero.
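
  As a small worked instance of the definition, the sketch below scales the sequence 10, 20, 40, 50. The function name min_max_scale is hypothetical, and returning all zeros for a constant sequence is only one possible convention, since the definition itself does not cover that case.

```python
import numpy as np

def min_max_scale(values):
    """Apply y_i = (x_i - min) / (max - min) to a 1-D sequence."""
    x = np.asarray(values, dtype=float)
    x_min = x.min()
    x_range = x.max() - x_min
    if x_range == 0:
        # Assumption: all values are equal, so the formula would divide by zero.
        # Returning zeros is a convention chosen here, not part of the definition.
        return np.zeros_like(x)
    return (x - x_min) / x_range

scaled = min_max_scale([10, 20, 40, 50])
print(scaled.tolist())  # [0.0, 0.25, 0.75, 1.0]
```

  Here min = 10 and max = 50, so each value is shifted by 10 and divided by 40: the smallest value maps to 0, the largest to 1, and the spacing between values is preserved proportionally.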

2.2 Implementation code

import numpy as np

# np.around(arr, decimals=n) rounds each element of arr to n decimal places

class DataNormalization:
    def __init__(self):
        self.arr = np.array([1,2,3,4,5,6,7,8,9])
        self.x_max = self.arr.max() 
        self.x_min = self.arr.min()
        self.x_mean = self.arr.mean()
        self.x_std = self.arr.std()

    def Min_MaxNorm(self):
        # (x - min) / (max - min), rounded to 4 decimal places
        arr = np.around((self.arr - self.x_min) / (self.x_max - self.x_min), decimals=4)
        print("Min_MaxNorm:{}".format(arr))

if __name__ == "__main__":
    a = DataNormalization()
    a.Min_MaxNorm()

Running the code prints the following:

Min_MaxNorm:[0.    0.125 0.25  0.375 0.5   0.625 0.75  0.875 1.   ]

Reference link for this article: Common methods of data standardization (Min-Max standardization, Z-Score standardization, etc.)

Origin blog.csdn.net/weixin_43624728/article/details/113546847