https://www.deeplearn.me/1383.html
Principle of interval scaling method
The most common approach scales each feature using its maximum and minimum values. The formula is as follows:
y = (x − min) / (max − min)
In the above formula, min represents the minimum value of the data, and max represents the maximum value of the data.
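Before turning to library calls, the formula can be applied by hand. A minimal sketch with NumPy, using a small hypothetical array (not the iris data), computing the column-wise min and max and applying y = (x − min) / (max − min):

```python
import numpy as np

# Hypothetical toy data: 3 samples, 2 features.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])

col_min = x.min(axis=0)  # per-column minimum
col_max = x.max(axis=0)  # per-column maximum

# Min-max scaling applied column-wise: each column now spans [0, 1].
y = (x - col_min) / (col_max - col_min)
print(y)
```

Each column's minimum maps to 0 and its maximum to 1; intermediate values land proportionally in between.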
```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

irisdata = load_iris()
tmp = MinMaxScaler().fit_transform(irisdata.data)
print(tmp[0:5])
```
Partial results are as follows:
```
[[0.22222222 0.625      0.06779661 0.04166667]
 [0.16666667 0.41666667 0.06779661 0.04166667]
 [0.11111111 0.5        0.05084746 0.04166667]
 [0.08333333 0.45833333 0.08474576 0.04166667]
 [0.19444444 0.66666667 0.06779661 0.04166667]]
```

Interval scaling in Spark
```python
>>> from pyspark.ml.feature import MinMaxScaler
>>> from pyspark.mllib.linalg import Vectors
>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> df = sqlContext.createDataFrame([(Vectors.dense([0.0]),), (Vectors.dense([2.0]),)], ["a"])
>>> mmScaler = MinMaxScaler(inputCol="a", outputCol="scaled")
>>> model = mmScaler.fit(df)
>>> model.transform(df).show()
+-----+------+
|    a|scaled|
+-----+------+
|[0.0]| [0.0]|
|[2.0]| [1.0]|
+-----+------+
```
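Both libraries default to scaling into [0, 1], but the target interval is configurable: sklearn's `MinMaxScaler` takes a `feature_range` parameter, and Spark's `MinMaxScaler` has the analogous `setMin`/`setMax` setters. A sketch with sklearn scaling a toy single-column array into [−1, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-column data (hypothetical values).
data = np.array([[0.0], [1.0], [2.0]])

# Scale into [-1, 1] instead of the default [0, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)
print(scaled.ravel())  # the minimum maps to -1, the maximum to 1
```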