Spark UDF: User-Defined Functions

This post defines a custom function that returns the length of a string. First, create a test DataFrame:

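    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Local session for testing
    val spark = SparkSession.builder().master("local").appName("UDF").getOrCreate()
    val nameList: List[String] = List("zhangsan", "lisi", "wangwu", "zhaoliu", "tianqi")
    import spark.implicits._ // enables toDF on local collections
    val nameDF: DataFrame = nameList.toDF("name")
    // Expose the DataFrame to SQL as the temporary view "students"
    nameDF.createOrReplaceTempView("students")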

Register the function, specifying its name and parameter types:

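    // One-argument version: returns the length of the string
    spark.udf.register("STRLEN", (name: String) => {
      name.length
    })
    // Two-argument alternative: adds an offset i to the length
    //    spark.udf.register("STRLEN", (name: String, i: Int) => {
    //      name.length + i
    //    })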
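
The lambda above calls name.length directly, so a null value in the name column would raise a NullPointerException inside the UDF. A minimal sketch of a null-tolerant variant follows. The names NullSafeStrLen, strLenSafe and STRLEN_SAFE are hypothetical and not part of the original post; the sketch relies on Spark mapping a Scala Option return value to a nullable column.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only for this illustration
object NullSafeStrLen {
  // Plain Scala function: None for a null input, Some(length) otherwise.
  // Spark turns None into SQL NULL when the function is used as a UDF.
  val strLenSafe: String => Option[Int] = name => Option(name).map(_.length)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("UDF").getOrCreate()

    // Register under a separate (assumed) name so STRLEN stays untouched
    spark.udf.register("STRLEN_SAFE", strLenSafe)

    // One non-null and one null argument
    spark.sql(
      "select STRLEN_SAFE('zhangsan') as length, " +
        "STRLEN_SAFE(cast(null as string)) as missing"
    ).show()

    spark.stop()
  }
}
```

Keeping the function as a plain Scala value means it can be checked without starting a Spark session.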

Use the custom function:

    spark.sql("select name, STRLEN(name) as length from students order by length desc").show(100)
    // With the two-argument version:
    //    spark.sql("select name, STRLEN(name, 10) as length from students order by length desc").show(100)
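
The same function can also be applied without SQL text. The sketch below wraps the lambda with org.apache.spark.sql.functions.udf and uses it as a column expression. The object name StrLenColumnApi and the helper withLength are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical wrapper object, used only for this illustration
object StrLenColumnApi {
  // Column-level UDF built from the same lambda; no SQL name is registered
  private val strLen = udf((name: String) => name.length)

  // Adds a "length" column computed from the "name" column
  def withLength(df: DataFrame): DataFrame =
    df.withColumn("length", strLen(col("name")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("UDF").getOrCreate()
    import spark.implicits._

    val nameDF: DataFrame =
      List("zhangsan", "lisi", "wangwu", "zhaoliu", "tianqi").toDF("name")

    // Equivalent of the SQL query: longest names first
    withLength(nameDF).orderBy(col("length").desc).show(100)

    spark.stop()
  }
}
```

A UDF created this way is only usable through the DataFrame API; to call it from spark.sql it still has to be registered with spark.udf.register.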

Reposted from blog.csdn.net/qq_36299025/article/details/97821558