pyspark.ml.feature provides many tools for feature processing. When working with feature values, normalization (scaling) transforms them into a uniform range so that features measured on different scales become comparable.
Here are a few methods:
1. MinMaxScaler
Scales each feature to a given range [min, max], most commonly [0, 1].
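As a sketch of the rescaling that MinMaxScaler performs per feature column (plain Python rather than the Spark API; the helper name is made up for illustration):

```python
# Min-max scaling: x' = (x - x_min) / (x_max - x_min) * (max - min) + min
# pyspark.ml.feature.MinMaxScaler applies this to each feature column.

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: Spark maps it to the midpoint of the range
        return [(new_min + new_max) / 2.0] * len(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

print(min_max_scale([1.0, 2.0, 3.0, 5.0]))  # -> [0.0, 0.25, 0.5, 1.0]
```

Note that zero values will generally be mapped to non-zero values, so this transformation does not preserve sparsity.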
2. MaxAbsScaler
Scales each feature to the range [-1, 1] by dividing by the largest absolute value of that feature. It does not shift or center the data, so it is best suited to data that is already centered at zero or that is sparse (with very many zeros), since zero entries remain zero.
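A sketch of the per-feature division that MaxAbsScaler performs (again plain Python, with a made-up helper name):

```python
# Max-abs scaling: x' = x / max(|x|); zeros stay zero, so sparsity is preserved.
# pyspark.ml.feature.MaxAbsScaler applies this to each feature column.

def max_abs_scale(values):
    """Scale a list of numbers into [-1, 1] without centering."""
    m = max(abs(v) for v in values)
    if m == 0.0:  # an all-zero feature stays all zero
        return list(values)
    return [v / m for v in values]

print(max_abs_scale([-4.0, 0.0, 2.0]))  # -> [-1.0, 0.0, 0.5]
```

Because no offset is subtracted, a sparse feature vector keeps exactly the same zero entries after scaling.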
3. StandardScaler
Standardizes features by removing the mean and scaling to unit variance. Many models perform poorly when individual features are on very different scales or clearly do not follow a Gaussian (normal) distribution. In practice, the exact shape of a feature's distribution is often ignored: the mean of each feature is subtracted and the result is divided by the feature's standard deviation, which centers the data and puts all features on a comparable scale.
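The centering and variance scaling can be sketched as follows (plain Python with an illustrative helper name; the `with_mean`/`with_std` switches mirror the corresponding StandardScaler parameters, and the use of the sample standard deviation is an assumption):

```python
import math

# Standardization: x' = (x - mean) / std, applied per feature column.
def standard_scale(values, with_mean=True, with_std=True):
    """Center a list of numbers and/or scale it to unit variance."""
    n = len(values)
    mean = sum(values) / n
    # Sample (corrected) standard deviation, dividing by n - 1.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    out = [(v - mean) if with_mean else v for v in values]
    if with_std and std > 0.0:
        out = [v / std for v in out]
    return out

print(standard_scale([1.0, 2.0, 3.0]))  # -> [-1.0, 0.0, 1.0]
```

Note that centering densifies sparse data (zeros become non-zero), which is why Spark's StandardScaler leaves mean subtraction off by default (withMean=False) while scaling by the standard deviation (withStd=True).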