What is linear normalization? Use python to implement linear normalization of data.

Linear Normalization is a common data preprocessing method, also known as Min-Max normalization. It scales the original data to a specific range by linearly transforming it. Commonly used is scaling the data to the range of [0, 1] or [-1, 1].

Specifically, for the original data XXX , its linear normalization can be expressed as:

X n o r m = X − X m i n X m a x − X m i n X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} Xnorm=XmaxXminXXmin

Among them, X min X_{min}Xmin X m a x X_{max} Xmaxrepresent the minimum and maximum values ​​in the original data respectively. Through this formula, the original data XXX scales to[0, 1] [0, 1][0,1 ] , so that the new normalized dataX norm X_{norm}XnormSatisfies 0 ≤ X norm ≤ 1 0 \leq X_{norm} \leq 10Xnorm1

It should be noted that linear normalization assumes that the distribution of data is uniform and is sensitive to extreme values ​​and may be affected by outliers. Therefore, when using linear normalization for data preprocessing, it is necessary to evaluate the specific problem and consider the use of other data preprocessing methods.

python implementation

In Python, you can use the NumPy library to implement linear normalization of data. Here is a simple example code:

import numpy as np

def linear_normalization(data):
    # 计算最大值和最小值
    min_val = np.min(data)
    max_val = np.max(data)

    # 线性归一化
    normalized_data = (data - min_val) / (max_val - min_val)

    return normalized_data

# 示例数据
data = np.array([1, 2, 3, 4, 5])

# 线性归一化
normalized_data = linear_normalization(data)

print(normalized_data)

Running the above code will output the normalized data [0. 0.25 0.5 0.75 1. ].

In the above code, linear_normalizationthe function accepts a NumPy array containing raw data as input, and uses the np.min()and np.max()functions to calculate the minimum and maximum values ​​of the data respectively. Then, (data - min_val) / (max_val - min_val)perform linear normalization calculation to obtain the normalized data normalized_dataand return it.

Note that in actual applications, the data may need to be processed, such as converting to floating point type or processing multi-dimensional data. In addition, if the data set is large, you can also consider using more efficient methods to calculate the minimum and maximum values ​​to increase the calculation speed.

Guess you like

Origin blog.csdn.net/weixin_45277161/article/details/133065564