Flink from entry to real fragrance (17): using the Flink Table API to output to a file and to Kafka

For streaming queries, you have to declare how the table is converted when it is exchanged with an external connector.
The kind of messages exchanged with the external system is specified by the update mode; depending on the output target, one of the following three modes is used. For example, when writing to a file you cannot use the retract or upsert modes, because an existing line in a file cannot be located and modified, only appended; if the sink is something like MySQL instead, those modes can be used (a short code sketch after the list shows how the first two modes surface in the Table API).

  1. Append mode (Append) - the file system sink supports only this mode.
    The table is modified only by insert operations, and only insert messages are exchanged with the external connector.
  2. Retract mode (Retract) - "delete first, then insert", used to implement updates.
    The table and the external connector exchange add (Add) and retract (Retract) messages.
    An insert is encoded as an add message; a delete is encoded as a retract message; an update is encoded as a retract message for the old row followed by an add message for the new row.
  3. Upsert mode (Upsert)
    Both update and insert are encoded as upsert messages; delete is encoded as a delete message.
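
To make the first two modes concrete, here is a minimal sketch (my own addition, not from the original post) of how they surface when a table is converted back to a DataStream with the legacy Scala Table API; the sensor field names and types are assumed to match the examples that follow.

package com.mafei.apitest.tabletest

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._

object UpdateModeSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val tableEnv = StreamTableEnvironment.create(env)

    // a tiny in-memory source standing in for the sensor data (id, temper)
    val sensorTable = tableEnv.fromDataStream(
      env.fromElements(("sensor1", 36.8), ("sensor1", 37.1)), 'id, 'temper)

    // Append mode: the query only produces inserts, so a plain append stream works
    sensorTable.toAppendStream[(String, Double)].print("append")

    // Retract mode: each element is (flag, row);
    //   flag = true  -> add message (an insert, or the new value of an update)
    //   flag = false -> retract message (a delete, or the old value of an update)
    sensorTable
      .groupBy('id)
      .select('id, 'id.count)
      .toRetractStream[(String, Long)]
      .print("retract")

    // Upsert mode is not a DataStream conversion; it only applies when the sink
    // itself understands keyed upserts (for example a MySQL or Elasticsearch sink)

    env.execute("update mode sketch")
  }
}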

Example 1: read from a file, apply a few transformations, and write to another file

/**
 *
 * @author mafei
 * @date 2020/11/22
 */

package com.mafei.apitest.tabletest

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.DataTypes
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{Csv, FileSystem, Schema}

object FileOutputTest {
  def main(args: Array[String]): Unit = {
    //1. Create the environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val tableEnv = StreamTableEnvironment.create(env)
    //2. Read the file
    val filePath = "/opt/java2020_study/maven/flink1/src/main/resources/sensor.txt"
    tableEnv.connect(new FileSystem().path(filePath))
      .withFormat(new Csv()) // the txt file is comma-separated just like CSV, so the Csv format works (OldCsv would also do)
      .withSchema(new Schema() // this schema has to match the content of the txt file
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temper", DataTypes.DOUBLE())
      ).createTemporaryTable("inputTable")
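    // Sample content assumed for sensor.txt (illustrative, not from the original post):
    //   sensor1,1606060606,36.8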

    val sensorTable = tableEnv.from("inputTable")

    // simple transformation
    val simpleTramsformTable = sensorTable
      .select("id,temper")
      .filter("id='sensor1'")

    // aggregation

    val aggTable = sensorTable
      .groupBy('id)
      .select('id, 'id.count as 'count)

    // print the results directly:
    simpleTramsformTable.toAppendStream[(String, Double)].print("simpleTramsformTable: ")

    // the aggregated result cannot use toAppendStream: an append stream adds one row per incoming record, while an aggregation has to update previously emitted results
    aggTable.toRetractStream[(String, Long)].print("aggTable")
    /**
     * Output:
     * aggTable> (true,(sensor1,1))
     * simpleTramsformTable: > (sensor1,1.0)
     * aggTable> (true,(sensor2,1))
     * aggTable> (true,(sensor3,1))
     * aggTable> (true,(sensor4,1))
     * aggTable> (false,(sensor4,1))  // false means the previous result is retracted and recomputed
     * aggTable> (true,(sensor4,2))
     * aggTable> (false,(sensor4,2))
     * aggTable> (true,(sensor4,3))
     */

    // write the result to a file
    val outputPath = "/opt/java2020_study/maven/flink1/src/main/resources/output.txt"

    tableEnv.connect(new FileSystem().path(outputPath))
        .withFormat(new Csv())
        .withSchema(
          new Schema()
            .field("id", DataTypes.STRING())
            .field("temper", DataTypes.DOUBLE())
        )
        .createTemporaryTable("outputTable")
    simpleTramsformTable.insertInto("outputTable")
    env.execute("file ouput")
  }
}
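
A note connecting this back to the update modes above (my own addition, not in the original post): the filesystem/CSV sink is append-only, so only simpleTramsformTable can be written to it; trying to emit the aggregated table is expected to fail at translation time, roughly as sketched below.

    // Not in the original example: expected to fail, because aggTable produces
    // retract (update) messages while the filesystem/CSV sink only supports append mode
    // aggTable.insertInto("outputTable")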

Code structure and runtime output (shown as screenshots in the original post)


Example 2: read from one Kafka topic and write to another topic

/**
 *
 * @author mafei
 * @date 2020/11/23
 */

package com.mafei.apitest.tabletest

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.DataTypes
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{Csv, Kafka, Schema}

object KafkaOutputTest {
  def main(args: Array[String]): Unit = {
    //1. Create the environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val tableEnv = StreamTableEnvironment.create(env)

    //2. Read data from Kafka
    tableEnv.connect(
      new Kafka()
        .version("0.11")
        .topic("sourceTopic")
        .startFromLatest()
        .property("zookeeper.connect", "localhost:2181")
        .property("bootstrap.servers", "localhost:9092")
    ).withFormat(new Csv())
      .withSchema(new Schema() // this schema has to match the content of the Kafka messages
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temperature", DataTypes.DOUBLE())
      )
      .createTemporaryTable("kafkaInputTable")
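    // Messages in sourceTopic are assumed to be CSV lines matching this schema,
    //   e.g. sensor1,1606060606,36.8 (illustrative sample, not from the original post)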

    val sensorTable = tableEnv.from("kafkaInputTable")

    // simple transformation
    val simpleTramsformTable = sensorTable
      .select("id,temperature")
      .filter("id='sensor1'")

    tableEnv.connect(
      new Kafka()
        .version("0.11")
        .topic("sinkTopic")
        .startFromLatest()
        .property("zookeeper.connect", "localhost:2181")
        .property("bootstrap.servers", "localhost:9092")
    ).withFormat(new Csv())
      .withSchema(new Schema() // this schema has to match the content of the Kafka messages
        .field("id", DataTypes.STRING())
        .field("temper", DataTypes.DOUBLE())
      )
      .createTemporaryTable("kafkaOutputTable")

    simpleTramsformTable.insertInto("kafkaOutputTable")
    env.execute("kafka sink test by table api")
  }
}

Now open two terminal windows: in one, produce messages to the topic "sourceTopic"; the Flink job reads from this topic and writes to the topic "sinkTopic". In the other, start a console consumer on "sinkTopic" to see the result.


Origin blog.51cto.com/mapengfei/2554700