Flink from Beginner to Pro (5): Flink Data Transformation Operations (Transform)

map

DataStream -> DataStream: applies the given function to every element, producing one output element per input element.

val streamMap = stream.map { x => x * 2 }

flatMap

That is, flatMap converts a nested collection and flattens it into a non-nested collection.
For example, to break up a List whose elements contain space-separated data, you can do this:
List("a b", "c d").flatMap(line => line.split(" ")) gives the final result: List(a, b, c, d)
val streamFlatMap = stream.flatMap {
  x => x.split(" ")
}

filter

filter works like an if condition: you give it a lambda expression that returns a Boolean, and each element is kept only if the result is true.
val streamFilter = stream.filter {
  x => x == 1
}

KeyBy

DataStream -> KeyedStream: logically splits a stream into disjoint partitions; each partition contains the elements with the same key, and internally this is implemented with hash partitioning. The usual pattern is to group first with keyBy and then aggregate.

Rolling aggregation operators (Rolling Aggregation)
These operators perform a rolling aggregation on each partition of a KeyedStream.
sum()
min()
max()
minBy()
maxBy()
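
For example (a minimal sketch, assuming the SensorReadingTest case class and dataStream defined in the full example further down), keyBy followed by a rolling aggregation looks like this; min only tracks the minimum of the aggregated field, while minBy returns the whole element that holds the minimum:

val keyed = dataStream.keyBy("id")            // partition the stream by sensor id

val minTemps   = keyed.min("temperature")     // rolling minimum of the temperature field
val minRecords = keyed.minBy("temperature")   // the full record holding the minimum temperature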

Reduce

KeyedStream -> DataStream: a rolling aggregation that combines the current element with the last reduced value and emits the new value, so you can express arbitrary aggregation logic rather than aggregating a single field.
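
A minimal sketch (again assuming dataStream and SensorReadingTest from the full example below), keeping the latest timestamp while tracking the minimum temperature per sensor:

val reduced = dataStream
  .keyBy("id")
  .reduce((cur, next) =>
    SensorReadingTest(cur.id, next.timestamp, cur.temperature.min(next.temperature)))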

Split and Select

Split

DataStream -> SplitStream: splits a DataStream into two or more DataStreams according to some criterion.

Select

SplitStream -> DataStream: obtains one or more DataStreams from a SplitStream.

Requirement: split the sensor data into two streams by temperature, with 30 degrees as the boundary (sketched below; the full example later implements the same thing).
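
A minimal sketch of this requirement, using the same split/select API as the full example later in this article, and assuming the SensorReadingTest stream defined there:

val splitStream = dataStream.split(data =>
  if (data.temperature > 30.0) Seq("high") else Seq("low"))

val highStream = splitStream.select("high")        // readings above 30 degrees
val lowStream  = splitStream.select("low")         // readings at or below 30 degrees
val allStream  = splitStream.select("high", "low") // both together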

Connect and CoMap

Connect merges streams, but only two streams at a time. After they are merged, the two streams remain logically independent and do not affect each other.


DataStream, DataStream -> ConnectedStreams: connects two data streams while keeping their respective types. After the two streams are connected, they are merely placed inside the same stream; the data and form of each stream are unchanged, and the two streams stay independent of each other.
CoMap, CoFlatMap


ConnectedStreams -> DataStream: acts on a ConnectedStreams; it works the same way as map and flatMap, except that each stream inside the ConnectedStreams is processed by its own map or flatMap function.
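
A short sketch (assuming the highStream and lowStream produced by the split above; the full example later does the same thing), where each side of the connected stream gets its own mapping function:

val warningStream = highStream.map(data => (data.id, data.temperature))
val connectedStreams = warningStream.connect(lowStream)

// the first function handles the high-temperature (warning) side, the second handles the low side
val coMapResultStream = connectedStreams.map(
  warningData => (warningData._1, warningData._2, "warning"),
  lowTempData => (lowTempData.id, "healthy")
)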

Union

Union merges streams by directly combining the data of two or more streams into one stream. The element types of the streams must be the same, otherwise union cannot be used.
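
A short sketch (assuming the highStream and lowStream from the split above, which share the same element type); unlike connect, union can merge more than two streams, but it is not used in the full example below:

// both streams carry SensorReadingTest elements, so they can be unioned back into one stream
val unionStream = highStream.union(lowStream)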


Environment:

Create a new package, com.mafei.apitest, and in it create a new Scala object, TransformTest.

package com.mafei.apitest

import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, createTypeInformation}

// sensor reading data

case class SensorReadingTest(id: String,timestamp: Long, temperature: Double)

object TransformTest {
  def main(args: Array[String]): Unit = {
    // create the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val inputStream= env.readTextFile("/opt/java2020_study/maven/flink1/src/main/resources/sensor.txt")
    env.setParallelism(1)

//    inputStream.print()
    // first convert the input to the case class type
    val dataStream = inputStream
      .map(data =>{
        val arr = data.split(",")   // split the line on "," to extract the fields
        SensorReadingTest(arr(0), arr(1).toLong, arr(2).toDouble)  // build a sensor reading; toLong/toDouble are needed because split returns strings
      })
    /**
     * dataStream.print()  sample output:
      1> SensorReadingTest(sensor4,1603766240,40.1)
      4> SensorReadingTest(sensor4,1603766284,44.0)
      2> SensorReadingTest(sensor1,1603766281,41.0)
      3> SensorReadingTest(sensor3,1603766283,43.0)
      2> SensorReadingTest(sensor2,1603766282,42.0)

     */

    // group and aggregate: output the current minimum value for each sensor
    val aggStream = dataStream
      .keyBy("id")   //根据id来进行分组
//      .min("temperature")  //获取每一组中temperature 为最小的数据
      .min("temperature")  //获取每一组中temperature 为最小的数据

    /**
    aggStream.print()

      1> SensorReadingTest(sensor1,1603766281,41.0)
      2> SensorReadingTest(sensor3,1603766283,43.0)
      4> SensorReadingTest(sensor2,1603766282,42.0)
      1> SensorReadingTest(sensor4,1603766240,40.1)   // every sensor only ever outputs its minimum value
      1> SensorReadingTest(sensor4,1603766240,40.1)   // every sensor only ever outputs its minimum value
     */

    // to output the current minimum temperature together with the latest timestamp, reduce is needed
    val resultStream = dataStream
      .keyBy("id")
//      .reduce((curState, newData) =>
//        SensorReadingTest(curState.id, newData.timestamp, curState.temperature.min(newData.temperature)))
      .reduce(new MyreduceFunction)   // instead of the lambda above, you can also write your own ReduceFunction implementation; both give the same result, pick either one

    /**

    resultStream.print()
    SensorReadingTest(sensor2,1603766282,42.0)
    SensorReadingTest(sensor3,1603766283,43.0)
    SensorReadingTest(sensor4,1603766240,40.1)
    SensorReadingTest(sensor4,1603766284,40.1)  // note: sensor4's timestamp keeps updating while its temperature stays at the minimum
    SensorReadingTest(sensor4,1603766249,40.1)
     */

    // multi-stream transformation operations
    // split: divide the sensor temperature readings into a low-temperature stream and a high-temperature stream
    val splitStream = dataStream
      .split(data =>{
        if (data.temperature > 30.0 ) Seq("high") else Seq("low")
      })

    val highStream = splitStream.select("high")
    val lowStream = splitStream.select("low")

    val allStream = splitStream.select("high", "low")

    /**
     *
     * Sample output: readings above 30 all end up in "high", readings below 30 in "low"
     * highStream.print("high")
     * lowStream.print("low")
     * allStream.print("all")
     *
     * all> SensorReadingTest(sensor1,1603766281,41.0)
     * high> SensorReadingTest(sensor1,1603766281,41.0)
     * all> SensorReadingTest(sensor2,1603766282,42.0)
     * high> SensorReadingTest(sensor2,1603766282,42.0)
     * all> SensorReadingTest(sensor4,1603766284,20.0)
     * low> SensorReadingTest(sensor4,1603766284,20.0)
     * all> SensorReadingTest(sensor4,1603766249,40.2)
     * high> SensorReadingTest(sensor4,1603766249,40.2)
     * all> SensorReadingTest(sensor3,1603766283,43.0)
     * high> SensorReadingTest(sensor3,1603766283,43.0)
     * all> SensorReadingTest(sensor4,1603766240,40.1)
     * high> SensorReadingTest(sensor4,1603766240,40.1)
     */

    // merging streams: connect
    val warningStream = highStream.map(data =>(data.id, data.temperature))

    val connectedStreams = warningStream.connect(lowStream)

    // use coMap to process the two streams separately
    val coMapResultStream = connectedStreams
      .map(
        warningData =>(warningData._1,warningData._2,"warning"),
        lowTempData => (lowTempData.id, "healthy")
      )

    /**
     * coMapResultStream.print()
     *
     * (sensor1,41.0,warning)
     * (sensor4,healthy)
     * (sensor2,42.0,warning)
     * (sensor4,40.2,warning)
     * (sensor3,43.0,warning)
     * (sensor4,40.1,warning)
     */

    env.execute("stream test")

  }

}

class MyreduceFunction extends ReduceFunction[SensorReadingTest] {
  override def reduce(t: SensorReadingTest, t1: SensorReadingTest): SensorReadingTest =
    SensorReadingTest(t.id, t1.timestamp, t.temperature.min(t1.temperature))
}

Data used (sensor.txt):
sensor1,1603766281,41
sensor2,1603766282,42
sensor3,1603766283,43
sensor4,1603766240,40.1
sensor4,1603766284,44
sensor4,1603766249,40.2

The final code structure:


Origin: blog.51cto.com/mapengfei/2547236