Standardized Methods of Data Processing

Normalization

        1. Scale the data to a decimal between 0 and 1. The main purpose is to simplify later processing: values confined to a common range are more convenient and faster to work with.

        2. Transform a dimensional quantity into a dimensionless one, that is, a pure number. The normalized data are on the same order of magnitude, which removes the influence of dimensions and units across indicators and makes different indicators easier to compare.

        Main algorithms:

        1. Linear transformation, that is, min-max normalization (common method)

        y = (x-min) / (max-min)

        2. Logarithmic function transformation

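        y = log10(x)

        (On its own this only compresses the scale. To land in [0, 1], divide by log10(max), which requires every x >= 1.)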

        3. Arctangent (inverse tangent) function transformation

        y = atan(x) * 2 / PI

        Values x >= 0 map into [0, 1); negative values map into (-1, 0).
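
The three normalization formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names `min_max`, `log_scale`, and `atan_scale` are made up for this example, and the log variant divides by log10(max) so that results fall in [0, 1], which goes beyond the bare formula y = log10(x).

```python
import math


def min_max(values):
    """Linear scaling: y = (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all values are equal")
    return [(x - lo) / (hi - lo) for x in values]


def log_scale(values):
    """Logarithmic scaling: y = log10(x) / log10(max).

    The division by log10(max) is an added assumption so that results fall
    in [0, 1]; it requires every x >= 1 and max > 1.
    """
    hi = max(values)
    if min(values) < 1 or hi <= 1:
        raise ValueError("log scaling here needs all x >= 1 and max > 1")
    return [math.log10(x) / math.log10(hi) for x in values]


def atan_scale(values):
    """Arctangent scaling: y = atan(x) * 2 / pi, so x >= 0 maps into [0, 1)."""
    return [math.atan(x) * 2 / math.pi for x in values]


print(min_max([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(log_scale([1, 10, 100]))
print(atan_scale([0, 1, 1000]))
```

Min-max scaling depends on the observed minimum and maximum, so a single extreme value compresses everything else; the arctangent form needs no knowledge of the data range.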

 

Standardization


        Standardization rescales the data so that the values fall within a small, specific range.

        Main methods:

        1. z-score standardization, that is, zero-mean standardization (common method)

        y = (x-μ) / σ

        Here μ is the mean and σ is the standard deviation of the data. The transformed data have a mean of 0 and a standard deviation of 1. The method is motivated by the normal distribution (normally distributed data become standard normal), but it does not change the shape of the distribution, and it can be used even if the data do not follow a normal distribution. It is especially useful when the maximum and minimum values of the data are unknown, or when there are outliers.


        When processing, values that lie more than 3 standard deviations from the mean (|y| > 3) are commonly treated as outliers and removed.


        2. Decimal scaling standardization

        y = x / 10^j (j is the smallest integer such that max(|y|) < 1)

        This normalizes the data by shifting the decimal point of x.

        3. Logistic model (sigmoid function)

        y=1/(1+e^(-x))
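
The three standardization formulas above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the function names are hypothetical, `z_score` uses the population standard deviation (`statistics.pstdev`) because the text does not say which estimator is meant, and `logistic` is written in a branch form that avoids overflow for large negative x.

```python
import math
import statistics


def z_score(values):
    """z-score: y = (x - mu) / sigma, using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return [(x - mu) / sigma for x in values]


def decimal_scale(values):
    """Decimal scaling: y = x / 10**j, with the smallest j >= 0 giving max(|y|) < 1."""
    peak = max(abs(x) for x in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in values], j


def logistic(values):
    """Logistic function: y = 1 / (1 + e^(-x)), mapping any real x into (0, 1)."""
    out = []
    for x in values:
        if x >= 0:
            out.append(1 / (1 + math.exp(-x)))
        else:
            # Algebraically identical form that never exponentiates a large positive number.
            e = math.exp(x)
            out.append(e / (1 + e))
    return out


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
print(z_score(data))
print(decimal_scale([-991, 45, 120]))
print(logistic([-2, 0, 2]))
```

With the sample data, the z-scores run from -1.5 to 2.0, so no value would be dropped by a 3-standard-deviation cutoff.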

 

Regularization

        A regularization method approximates the solution of an ill-posed problem by the solutions of a family of well-posed problems "adjacent" to it. How to construct an effective regularization method is an important part of the research on ill-posed problems in the field of inverse problems. Common regularization methods include Tikhonov regularization, which is based on variational principles, various iterative methods, and other improved methods.
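
As a small illustration of the idea, Tikhonov regularization replaces the least-squares problem min ||Ax - b||^2 with the nearby well-posed problem min ||Ax - b||^2 + λ||x||^2, whose solution satisfies (AᵀA + λI)x = Aᵀb. The sketch below assumes a matrix with exactly two columns so the system can be solved by hand with Cramer's rule; the function name `tikhonov_2d`, the nearly singular matrix, and the value λ = 0.01 are all chosen for this example only.

```python
def tikhonov_2d(A, b, lam):
    """Solve min ||A x - b||^2 + lam * ||x||^2 for a matrix A with two columns."""
    # Normal equations: (A^T A + lam * I) x = A^T b, written out for the 2x2 case.
    m00 = sum(row[0] * row[0] for row in A) + lam
    m01 = sum(row[0] * row[1] for row in A)
    m11 = sum(row[1] * row[1] for row in A) + lam
    v0 = sum(row[0] * bi for row, bi in zip(A, b))
    v1 = sum(row[1] * bi for row, bi in zip(A, b))
    det = m00 * m11 - m01 * m01
    if det == 0:
        raise ValueError("singular system; use lam > 0")
    # Cramer's rule for the symmetric 2x2 system.
    return [(m11 * v0 - m01 * v1) / det, (m00 * v1 - m01 * v0) / det]


# Nearly singular system. With b = [2.0, 2.0001] the exact solution is x = [1, 1];
# here the second measurement carries a small error (2.001 instead of 2.0001).
A = [[1.0, 1.0], [1.0, 1.0001]]
b_noisy = [2.0, 2.001]

print(tikhonov_2d(A, b_noisy, 0.0))   # unregularized: swings far from [1, 1]
print(tikhonov_2d(A, b_noisy, 0.01))  # regularized: stays close to [1, 1]
```

The price of the stability is a small bias toward zero, which is why the choice of λ matters in practice.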

 

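        In summary, normalization removes the dimensions (units) of different data so that they can be compared and processed together; in neural networks, for example, normalization can speed up the convergence of training. Standardization is a scaling or similar transformation applied to make the next processing step easier, not to make the data easier to process or compare together with other data; for example, after zero-mean standardization, the properties of the standard normal distribution are easier to apply. Regularization, by contrast, uses prior knowledge to introduce a regularization factor (regularizer) into the processing, which acts as a guiding constraint; for example, using regularization in logistic regression can effectively reduce overfitting.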


