Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reproducing it, please include a link to the original source and this statement.
Step 1: Create the stream processing environment
import org.apache.flink.streaming.api.scala._
val env = StreamExecutionEnvironment.getExecutionEnvironment
Step 2: Read data
First: read data from an existing collection
val stream1 = env.fromCollection(List(
  SensorReading("sensor_1", 1547718159, 4),
  SensorReading("sensor_2", 1547718261, 1),
  SensorReading("sensor_9", 1547718272, 6.6),
  SensorReading("sensor_10", 1547718285, 8.1)
))
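The snippet above relies on a SensorReading type that the article does not show. A minimal sketch, assuming it is a case class holding a sensor id, a timestamp, and a temperature (the field names here are hypothetical, chosen only to match the three constructor arguments):

```scala
// Hypothetical definition: the field names are assumptions, not taken from the article.
case class SensorReading(id: String, timestamp: Long, temperature: Double)

object SensorReadingDemo {
  def main(args: Array[String]): Unit = {
    // Integer literals are widened automatically: 1547718159 becomes a Long, 4 becomes a Double.
    val reading = SensorReading("sensor_1", 1547718159, 4)
    println(reading) // prints SensorReading(sensor_1,1547718159,4.0)
  }
}
```

Using a case class keeps the element type of the stream precise, so later operators can access the fields by name.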
Second: read elements of any type
env.fromElements(1, 34, "date")
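When the elements mix types as in the call above, the Scala compiler infers their closest common supertype, which here is Any. The sketch below illustrates that inference with a plain Scala Seq rather than Flink; MixedElements is a hypothetical name used only for this illustration.

```scala
object MixedElements {
  // Mixed literals: the inferred element type is the common supertype, Any.
  val mixed: Seq[Any] = Seq(1, 34, "date")

  // A single element type keeps the static type precise.
  val ints: Seq[Int] = Seq(1, 34)

  def main(args: Array[String]): Unit = {
    // With Any, each element must be pattern-matched to recover its concrete type.
    mixed.foreach {
      case i: Int    => println(s"Int: $i")
      case s: String => println(s"String: $s")
      case other     => println(s"Other: $other")
    }
  }
}
```

For the same reason, keeping one element type per stream is generally preferable, since downstream operators then know exactly what they receive.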
Third: read data from a file
val stream3 = env.readTextFile("D:\\test.txt")
Fourth: read data from Kafka
1) Add the dependency
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
  <version>1.7.2</version>
</dependency>
2) Code
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

// Define the Kafka consumer configuration
val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("group.id", "consumer-group")
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
properties.setProperty("auto.offset.reset", "latest")

// [String] is the record type
// "sensor" is the Kafka topic
// new SimpleStringSchema() is the deserializer for that record type (String here)
// properties is the consumer configuration defined above
val stream4 = env.addSource(
  new FlinkKafkaConsumer011[String]("sensor", new SimpleStringSchema(), properties)
)
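Fifth: custom source
1) Define the custom data source
import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.util.Random

// Custom class extending SourceFunction; each record is an (Int, Long) tuple (adapt the type to your use case)
class MySensorSource2() extends SourceFunction[(Int, Long)] {
  // Flag indicating whether the source should keep running
  @volatile var running: Boolean = true

  // Called to stop the source: clear the flag so the loop in run() exits
  override def cancel(): Unit = {
    running = false
  }

  override def run(sourceContext: SourceFunction.SourceContext[(Int, Long)]): Unit = {
    val random = new Random()
    // Loop until cancelled, producing a stream of random records
    while (running) {
      // Sleep 100 ms on each iteration so records are not generated too quickly
      Thread.sleep(100)
      // Emit one tuple: a random integer in [0, 999) and the current timestamp in milliseconds
      sourceContext.collect((random.nextInt(999), System.currentTimeMillis()))
    }
  }
}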
2) Read data from the custom source
val stream5 = env.addSource(new MySensorSource2())
stream5.print()
// Nothing is produced until the job is actually submitted
env.execute()
Output: