Flink data sources (Source)

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/qq_40713537/article/details/102698549

 

Step 1: Create the stream processing environment

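 import org.apache.flink.streaming.api.scala._  // Scala API plus the implicit TypeInformation it needs
 val env = StreamExecutionEnvironment.getExecutionEnvironment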

Step 2: Read data

First: read data from an existing collection

 val stream1 = env.fromCollection(List(
            SensorReading("sensor_1", 1547718159, 4),
            SensorReading("sensor_2", 1547718261, 1),
            SensorReading("sensor_9", 1547718272, 6.6),
            SensorReading("sensor_10", 1547718285, 8.1)
        ))
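
The snippet above uses SensorReading without defining it. Here is a minimal sketch, assuming it is a plain case class holding a sensor id, a timestamp and a temperature. The field names and the SensorData object are my assumptions, not from the original post.

```scala
// Assumed shape of SensorReading; the field names are hypothetical.
case class SensorReading(id: String, timestamp: Long, temperature: Double)

object SensorData {
  // The same sample readings as in the snippet above. Integer literals such as
  // 1547718159 and 4 are widened to Long and Double by the compiler.
  val readings: List[SensorReading] = List(
    SensorReading("sensor_1", 1547718159, 4),
    SensorReading("sensor_2", 1547718261, 1),
    SensorReading("sensor_9", 1547718272, 6.6),
    SensorReading("sensor_10", 1547718285, 8.1)
  )
}
```

With this definition in scope, env.fromCollection(SensorData.readings) should yield a DataStream[SensorReading].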

Second: read elements of any type, given directly

env.fromElements(1, 34, "date")

Third: read data from a file

 val stream = env.readTextFile("D:\\test.txt")
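
readTextFile produces a stream of plain strings, one per line, so the lines usually still need to be parsed. The sketch below assumes each line looks like "sensor_1, 1547718159, 35.8". That file format, the SensorReading fields and the SensorParser helper are all hypothetical.

```scala
// Assumed record type; the field names are hypothetical.
case class SensorReading(id: String, timestamp: Long, temperature: Double)

object SensorParser {
  // Parse one comma-separated line into a SensorReading.
  // Throws if the line has fewer than three fields or the numbers are malformed.
  def parseLine(line: String): SensorReading = {
    val fields = line.split(",").map(_.trim)
    SensorReading(fields(0), fields(1).toLong, fields(2).toDouble)
  }
}
```

It could then be applied to the file stream with something like stream.map(SensorParser.parseLine _).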
        

Fourth: read data from Kafka

1) Add the dependency

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
            <version>1.7.2</version>
        </dependency>

2) Code 

        import java.util.Properties
        import org.apache.flink.api.common.serialization.SimpleStringSchema
        import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

        // Define the Kafka consumer configuration
        val properties = new Properties()
        properties.setProperty("bootstrap.servers", "localhost:9092")
        properties.setProperty("group.id", "consumer-group")
        properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        properties.setProperty("auto.offset.reset", "latest")
        
        val stream = env.addSource(
            new FlinkKafkaConsumer011[String]("sensor", new SimpleStringSchema(), properties)
           // [String] is the type of the records
           // "sensor" is the Kafka topic
           // new SimpleStringSchema() is the deserializer for the record type (String here)
           // properties is the consumer configuration
        )
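
As far as I know, the Flink Kafka consumer deserializes records with the schema passed to its constructor (SimpleStringSchema here) and overrides key.deserializer and value.deserializer internally, so those two properties are probably not needed. A minimal sketch of a smaller configuration under that assumption follows. KafkaConfig and consumerProps are hypothetical helper names.

```scala
import java.util.Properties

object KafkaConfig {
  // Build a minimal consumer configuration for the Flink Kafka source.
  // The helper name and its parameters are hypothetical.
  def consumerProps(bootstrapServers: String, groupId: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    props.setProperty("group.id", groupId)
    // Kafka applies this only when the group has no committed offset yet.
    props.setProperty("auto.offset.reset", "latest")
    props
  }
}
```

The result can be passed as the third constructor argument, for example new FlinkKafkaConsumer011[String]("sensor", new SimpleStringSchema(), KafkaConfig.consumerProps("localhost:9092", "consumer-group")).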
     

Fifth: custom source

1) Define the custom data source

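import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.util.Random

// Custom class extending SourceFunction; each record is an (Int, Long) tuple (adapt the type to your use case)
class MySensorSource2() extends SourceFunction[(Int, Long)] {
    // Flag indicating whether the source should keep running.
    // @volatile because cancel() is called from a different thread than run().
    @volatile var running: Boolean = true
    
    // Clear the flag so that the loop in run() exits
    override def cancel(): Unit = {
        running = false
    }
    
    
    override def run(sourceContext: SourceFunction.SourceContext[(Int, Long)]): Unit = {
        val random = new Random()
        
        // Loop until cancelled, producing a stream of random data
        while (running) {
            
            // Sleep 100 ms on each iteration so records are not generated too fast
            Thread.sleep(100)
            
            // Emit one tuple: a random integer in [0, 999) and the current timestamp in milliseconds
            sourceContext.collect((random.nextInt(999), System.currentTimeMillis()))
        }
    }
}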
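
The running flag is what lets cancel() stop the loop in run(). The sketch below reproduces that pattern without Flink so it can be tried in isolation. RandomPairGenerator and CancelDemo are hypothetical names, and the emit callback stands in for sourceContext.collect.

```scala
import scala.util.Random

// Flink-free stand-in for the custom source: same loop, same cancel flag.
class RandomPairGenerator(sleepMillis: Long) {
  // @volatile so a cancel() issued from another thread is seen by the loop.
  @volatile private var running = true
  private val random = new Random()

  def cancel(): Unit = running = false

  // Calls `emit` with (random integer in [0, 999), current time in ms) until cancelled.
  def run(emit: ((Int, Long)) => Unit): Unit = {
    while (running) {
      if (sleepMillis > 0) Thread.sleep(sleepMillis)
      emit((random.nextInt(999), System.currentTimeMillis()))
    }
  }
}

object CancelDemo {
  def main(args: Array[String]): Unit = {
    val generator = new RandomPairGenerator(sleepMillis = 100)
    var count = 0
    // Print three records, then cancel; the loop exits on its next check.
    generator.run { pair =>
      println(pair)
      count += 1
      if (count == 3) generator.cancel()
    }
  }
}
```

In a real job Flink calls cancel() itself when the job is stopped, so the flag only has to be checked in the loop condition.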

2) Read data from the custom source

val stream5 = env.addSource(new MySensorSource2())
stream5.print()
// Nothing is executed until the job is submitted
env.execute()

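Output: a continuous stream of (Int, Long) tuples, each holding a random integer and a millisecond timestamp.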
