Flink from Beginner to Pro (6): Implementing UDF Functions for Finer-Grained Control over the Stream

Flink provides a variety of built-in data transformation operations, but real business logic usually involves custom data structures and rules that the built-ins cannot cover, so you have to write your own processing code. This is where Flink's function classes (Function Classes) come in.

Flink exposes an interface (as a Java interface or abstract class) for every UDF type, such as MapFunction, FilterFunction, ProcessFunction, and so on.

A small example: suppose you want to keep only the records whose id starts with sensor3.
Create a new Scala object UdfTest1 in com.mafei.apitest. The
rest of the code is the same as before: read the file and do some simple processing. Here we define a custom function class MyFilterFunction; to use it, just pass an instance to the .filter method.

package com.mafei.apitest

import org.apache.flink.api.common.functions.{FilterFunction, ReduceFunction, RichFilterFunction}
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation}

// Case class for sensor readings

case class SensorReadingTest1(id: String, timestamp: Long, temperature: Double)

object UdfTest1 {
  def main(args: Array[String]): Unit = {
    // Create the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val inputStream = env.readTextFile("/opt/java2020_study/maven/flink1/src/main/resources/sensor.txt")
    env.setParallelism(1)

//    inputStream.print()
    // First convert each line into the case class
    val dataStream = inputStream
      .map(data => {
        val arr = data.split(",") // split the line on commas
        // Build a sensor reading; toLong/toDouble are needed because split yields strings
        SensorReadingTest1(arr(0), arr(1).toLong, arr(2).toDouble)
        //      }).filter(new MyFilterFunction)
        //      }).filter(_.id.startsWith("sensor1"))   // for very simple logic, a lambda like this works exactly like a function class

        //      }).filter(new RichFilterFunction[SensorReadingTest1] {
        //        override def filter(t: SensorReadingTest1): Boolean =
        //          t.id.startsWith("sensor3")
        //      })   // an anonymous class achieves the same result as the two variants above

      }).filter(new KeywordFilterFunction("sensor3"))  // the keyword to filter on can also be passed in as a constructor argument

    dataStream.print()
    env.execute("udf test")

  }

}

// A custom function class for filtering: just implement the interface's filter method
class MyFilterFunction extends FilterFunction[SensorReadingTest1] {

  override def filter(t: SensorReadingTest1): Boolean = t.id.startsWith("sensor3")

}

// Same as above, but the keyword to match is passed in as a constructor parameter
class KeywordFilterFunction(keyword: String) extends FilterFunction[SensorReadingTest1] {
  override def filter(t: SensorReadingTest1): Boolean =
    t.id.startsWith(keyword)
}

Code structure and running effect diagram


RichMapFunction

Rich functions are useful when processing needs setup and teardown around the per-record work. The code below shows the difference between MapperDemo (a plain MapFunction) and RichMapDemo (a RichMapFunction), along with the run output.

https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/functions/RichMapFunction.html

package com.mafei.apitest

import org.apache.flink.api.common.functions.{FilterFunction, MapFunction, RichMapFunction}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation}

// Case class for sensor readings

case class SensorReadingTest2(id: String, timestamp: Long, temperature: Double)

object UdfTest2 {
  def main(args: Array[String]): Unit = {
    // Create the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val inputStream = env.readTextFile("/opt/java2020_study/maven/flink1/src/main/resources/sensor.txt")
    env.setParallelism(1)

//    inputStream.print()
    // First convert each line into the case class
    val dataStream = inputStream
      .map(data => {
        val arr = data.split(",") // split the line on commas
        // toLong/toDouble are needed because split yields strings
        SensorReadingTest2(arr(0), arr(1).toLong, arr(2).toDouble)
      }).map(new RichMapDemo())

    dataStream.print()
    env.execute("udf test")
  }

}

// A plain MapFunction, for comparison with the rich variant below
class MapperDemo extends MapFunction[SensorReadingTest2, String] {
  override def map(t: SensorReadingTest2): String = t.id + " test plus some string"
}

// A rich function: unlike the plain version it also has open and close methods,
// which makes it suitable for things like managing a database connection
class RichMapDemo extends RichMapFunction[SensorReadingTest2, String] {

  // Initialization hook: called exactly once when the task starts, much like
  // class-initialization code; the right place for one-off setup such as a database connection
  override def open(parameters: Configuration): Unit = {
    println("Opened a database connection..........")
    // The runtime context (task info, state, etc.) is also available here
    getRuntimeContext()
  }

  // Called once for every record
  override def map(in: SensorReadingTest2): String = in.id + " test rich function plus some string"

  // Counterpart of open: called once when the task stops,
  // e.g. to release the database connection
  override def close(): Unit = {
    print("Closed the database connection......")
  }
}

Run output: notice that over the whole run the database connection is opened and closed only once, while map runs per record:

Opened a database connection..........
sensor1 test rich function plus some string
sensor2 test rich function plus some string
sensor3 test rich function plus some string
sensor4 test rich function plus some string
sensor4 test rich function plus some string
sensor4 test rich function plus some string
Closed the database connection......
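The open/map/close ordering shown above can be sketched in plain Scala without any Flink dependency. This is not Flink's actual runtime code; the trait and driver below are illustrative names that just mimic the lifecycle contract a RichMapFunction gets from the framework: open once, map per record, close once.

```scala
// Minimal sketch of the rich-function lifecycle (hypothetical names, no Flink).
trait RichLifecycle[I, O] {
  def open(): Unit = ()   // one-off setup, e.g. open a DB connection
  def map(in: I): O       // called once per record
  def close(): Unit = ()  // one-off teardown, e.g. release the connection
}

class Enricher extends RichLifecycle[String, String] {
  var connected = false
  override def open(): Unit = { connected = true }    // simulate connecting
  override def map(in: String): String = in + " enriched"
  override def close(): Unit = { connected = false }  // simulate disconnecting
}

object LifecycleDemo {
  // Drives the function the way a task would: open, process every record, close.
  def run[I, O](fn: RichLifecycle[I, O], records: Seq[I]): Seq[O] = {
    fn.open()
    try records.map(fn.map) finally fn.close()
  }

  def main(args: Array[String]): Unit = {
    val out = run(new Enricher, Seq("sensor1", "sensor2"))
    println(out) // prints: List(sensor1 enriched, sensor2 enriched)
  }
}
```

The `try/finally` mirrors why close is the right place for cleanup: it runs once when processing stops, even if a record fails mid-stream.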

Origin blog.51cto.com/mapengfei/2547238