Spark ML: Binarizer

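  • Binarizer (binarization transformer)
  • Binarization is the process of thresholding numerical features into binary (0/1) features.
  • Binarizer is the binarization transformer provided by Spark ML. It takes three parameters: inputCol, outputCol, and threshold. Feature values greater than the threshold are binarized to 1.0; values less than or equal to the threshold are binarized to 0.0. inputCol supports both Vector and Double types.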
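
As a rough illustration of the rule described above, here is a minimal plain-Scala sketch with no Spark dependency. The names BinarizeSketch, binarize, and binarizeAll are made up for this illustration and are not part of the Spark API; the sketch only mirrors the strict greater-than comparison against the threshold.

```scala
// Hypothetical helper object, for illustration only (not part of Spark).
object BinarizeSketch {

  // A value strictly greater than the threshold maps to 1.0;
  // a value less than or equal to the threshold maps to 0.0.
  def binarize(value: Double, threshold: Double): Double =
    if (value > threshold) 1.0 else 0.0

  // Applies the same rule to every element, which is how a vector input
  // can be thought of: each component is thresholded independently.
  def binarizeAll(values: Seq[Double], threshold: Double): Seq[Double] =
    values.map(v => binarize(v, threshold))

  def main(args: Array[String]): Unit = {
    println(binarizeAll(Seq(0.1, 0.8, 0.2), 0.5))
  }
}
```

Note that a value exactly equal to the threshold falls on the 0.0 side, since the comparison is strict.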

Example:

import org.apache.spark.ml.feature.Binarizer
import org.apache.spark.sql.SparkSession

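/**
  * Binarizer example: thresholds a numeric column into 0.0/1.0 values.
  *
  * @author wangjuncheng
  */
// Named BinarizerExample so the object does not share a name with the imported Binarizer class.
object BinarizerExample {

  // An explicit main method is used instead of extending scala.App,
  // as recommended for Spark applications.
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("ml_learn")
      // .enableHiveSupport()
      .getOrCreate()

    // Sample data: (id, feature)
    val data = Array((0, 0.1), (1, 0.8), (2, 0.2))
    val dataframe = spark.createDataFrame(data).toDF("id", "feature")

    // Configure the Binarizer: values > 0.5 become 1.0, all others 0.0
    val binarizer = new Binarizer()
      .setInputCol("feature")
      .setOutputCol("binarized_feature")
      .setThreshold(0.5)

    val binarizerDataFrame = binarizer.transform(dataframe)

    // Print the threshold and the transformed DataFrame
    println(binarizer.getThreshold)
    binarizerDataFrame.show()

    spark.stop()
  }
}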

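Output:
The program prints the threshold, 0.5, followed by the DataFrame with the added binarized_feature column: 0.0 for the rows whose feature is 0.1 and 0.2, and 1.0 for the row whose feature is 0.8.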
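
Since inputCol is also described as accepting Vector columns, the following sketch shows what that might look like. It assumes Spark ML is on the classpath, as in the example above; the object name BinarizerVectorExample, the helper binarizeVectors, the column names, and the sample values are all chosen for illustration only.

```scala
import org.apache.spark.ml.feature.Binarizer
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the names here are illustrative assumptions.
object BinarizerVectorExample {

  // Binarizes a small vector column and returns each row as a plain array,
  // ordered by id, so the result is easy to inspect.
  def binarizeVectors(spark: SparkSession, threshold: Double): Array[Array[Double]] = {
    val data = Seq(
      (0, Vectors.dense(0.1, 0.8, 0.5)),
      (1, Vectors.dense(0.9, 0.2, 0.6))
    )
    val df = spark.createDataFrame(data).toDF("id", "features")

    // The threshold is applied to every component of the vector.
    val binarizer = new Binarizer()
      .setInputCol("features")
      .setOutputCol("binarized_features")
      .setThreshold(threshold)

    binarizer.transform(df)
      .orderBy("id")
      .collect()
      .map(_.getAs[Vector]("binarized_features").toArray)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("binarizer_vector")
      .getOrCreate()

    binarizeVectors(spark, 0.5).foreach(v => println(v.mkString("[", ", ", "]")))

    spark.stop()
  }
}
```

The component 0.5 in the first row is equal to the threshold rather than greater than it, so it ends up as 0.0.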

Reposted from blog.csdn.net/qq_33891419/article/details/103804499