Feature Engineering (2) - Data Preprocessing: Interval Scaling Method

https://www.deeplearn.me/1383.html


Principle of the interval scaling method

The most common approach uses the maximum and minimum values of each feature. The formula is as follows:

y = (x - min) / (max - min)

In the above formula, min represents the minimum value of the data and max represents the maximum value, so the scaled result always falls in the interval [0, 1].
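
To see the formula in action, here is a minimal NumPy sketch that applies it column by column; the 3x2 matrix is a made-up example, not data from the article.

  import numpy as np

  # Made-up example matrix: each column is one feature
  data = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 40.0]])

  # y = (x - min) / (max - min), computed per column
  col_min = data.min(axis=0)
  col_max = data.max(axis=0)
  scaled = (data - col_min) / (col_max - col_min)
  print(scaled)
  # [[0.         0.        ]
  #  [0.5        0.33333333]
  #  [1.         1.        ]]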

In scikit-learn, the MinMaxScaler class implements this directly:

  from sklearn.datasets import load_iris
  from sklearn.preprocessing import MinMaxScaler

  irisdata = load_iris()
  tmp = MinMaxScaler().fit_transform(irisdata.data)
  print(tmp[0:5])

Partial results are as follows:
[[0.22222222 0.625      0.06779661 0.04166667]
 [0.16666667 0.41666667 0.06779661 0.04166667]
 [0.11111111 0.5        0.05084746 0.04166667]
 [0.08333333 0.45833333 0.08474576 0.04166667]
 [0.19444444 0.66666667 0.06779661 0.04166667]]
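
MinMaxScaler also accepts a feature_range argument when a target interval other than the default [0, 1] is wanted, and inverse_transform maps scaled values back. A brief sketch; the (-1, 1) range here is just an illustrative choice:

  from sklearn.datasets import load_iris
  from sklearn.preprocessing import MinMaxScaler

  irisdata = load_iris()
  # Scale every feature into [-1, 1] instead of the default [0, 1]
  scaler = MinMaxScaler(feature_range=(-1, 1))
  tmp = scaler.fit_transform(irisdata.data)
  # scaler.inverse_transform(tmp) recovers the original values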

Interval scaling method in Spark

    >>> from pyspark.ml.feature import MinMaxScaler
    >>> from pyspark.mllib.linalg import Vectors
    >>> from pyspark.sql import SQLContext
    >>> sqlContext = SQLContext(sc)
    >>> df = sqlContext.createDataFrame([(Vectors.dense([0.0]),), (Vectors.dense([2.0]),)], ["a"])
    >>> mmScaler = MinMaxScaler(inputCol="a", outputCol="scaled")
    >>> model = mmScaler.fit(df)
    >>> model.transform(df).show()
    +-----+------+
    |    a|scaled|
    +-----+------+
    |[0.0]| [0.0]|
    |[2.0]| [1.0]|
    +-----+------+
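
pyspark.ml's MinMaxScaler likewise exposes min and max parameters for a custom target range. A minimal sketch continuing the session above (mmScaler2 is just an illustrative name, and the output is written out by hand rather than captured from a live shell):

    >>> mmScaler2 = MinMaxScaler(min=-1.0, max=1.0, inputCol="a", outputCol="scaled")
    >>> mmScaler2.fit(df).transform(df).show()
    +-----+------+
    |    a|scaled|
    +-----+------+
    |[0.0]|[-1.0]|
    |[2.0]| [1.0]|
    +-----+------+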
