Machine Learning Feature Transformation (using spark.ml)

pyspark.ml.feature provides many tools for working with features.

When dealing with feature values, there is the concept of normalization (scaling), which transforms the values into a uniform range of measurement.

Here are a few methods:

1.MinMaxScaler

Scales the data to a given minimum and maximum, usually between 0 and 1.
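A minimal sketch of fitting a MinMaxScaler (the app name and sample data here are illustrative):

```python
from pyspark.ml.feature import MinMaxScaler
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MinMaxScalerExample").getOrCreate()

# Illustrative feature vectors
df = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.1, -1.0])),
    (1, Vectors.dense([2.0, 1.1, 1.0])),
    (2, Vectors.dense([3.0, 10.1, 3.0])),
], ["id", "features"])

# Rescale each feature to [0, 1] (the defaults; min/max are configurable)
scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
model = scaler.fit(df)  # learns the per-feature min and max
model.transform(df).select("features", "scaledFeatures").show(truncate=False)
```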

2.MaxAbsScaler

Scales each feature by its largest absolute value, so the training set is mapped into [-1, 1] by dividing by the maximum absolute value. The data is not shifted or centered, so data that is already zero-centered, or sparse data with very many zeros, keeps that structure.
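A minimal sketch along the same lines (sample data is illustrative):

```python
from pyspark.ml.feature import MaxAbsScaler
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MaxAbsScalerExample").getOrCreate()

df = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.1, -8.0])),
    (1, Vectors.dense([2.0, 1.0, -4.0])),
    (2, Vectors.dense([4.0, 10.0, 8.0])),
], ["id", "features"])

# Divide each feature by its maximum absolute value, mapping into [-1, 1];
# there is no shifting, so sparsity is preserved
scaler = MaxAbsScaler(inputCol="features", outputCol="scaledFeatures")
model = scaler.fit(df)  # learns the per-feature max absolute value
model.transform(df).select("features", "scaledFeatures").show(truncate=False)
```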

3.StandardScaler

Standardization performs poorly when individual features are highly discrete or clearly do not follow a Gaussian (normal) distribution. In practice, the distribution shape of the feature data is often ignored: the mean of each feature is removed and the values are divided by the feature's standard deviation, which centers and rescales the data.
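A minimal sketch of centering and scaling with StandardScaler (sample data is illustrative; note that withMean=True densifies sparse vectors):

```python
from pyspark.ml.feature import StandardScaler
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StandardScalerExample").getOrCreate()

df = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.5, -1.0])),
    (1, Vectors.dense([2.0, 1.0, 1.0])),
    (2, Vectors.dense([4.0, 10.0, 2.0])),
], ["id", "features"])

# withStd=True divides by the standard deviation;
# withMean=True also subtracts the per-feature mean (off by default)
scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures",
                        withMean=True, withStd=True)
model = scaler.fit(df)  # computes per-feature mean and standard deviation
model.transform(df).select("features", "scaledFeatures").show(truncate=False)
```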
