FlinkStream Custom Source
Preface
When a Flink program is compiled and optimized on the client side, operator chaining merges adjacent operators, which removes the serialization/deserialization and data-transfer overhead between them and improves performance.
1. Obtain the execution environment
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
# Internally this delegates to one of the following two methods
createLocalEnvironment()
createExecutionEnvironment()
----------------------------
public static ExecutionEnvironment getExecutionEnvironment() {
    return contextEnvironmentFactory == null ?
        createLocalEnvironment() : contextEnvironmentFactory.createExecutionEnvironment();
}
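The fallback branch above can also be taken explicitly: when no context factory is set (e.g. running inside an IDE), a local environment is created. A minimal sketch; the parallelism value here is an arbitrary example, not something the post specifies:

```scala
import org.apache.flink.streaming.api.scala._

// getExecutionEnvironment falls back to a local environment in the IDE;
// the same thing can be requested explicitly with a chosen parallelism.
val localEnv: StreamExecutionEnvironment =
  StreamExecutionEnvironment.createLocalEnvironment(2) // parallelism 2 is an arbitrary choice
```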
2. addSource (pass an implementation of the SourceFunction interface, or a function that returns a SourceFunction instance)
public class FlinkKafkaConsumer<T> extends FlinkKafkaConsumerBase<T> {
public abstract class FlinkKafkaConsumerBase<T> extends RichParallelSourceFunction<T>
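Since FlinkKafkaConsumer is itself a (rich, parallel) SourceFunction, it can be handed straight to addSource. A hedged sketch of the wiring; the topic name, group id, and broker address are placeholders, not values from the post:

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Kafka connection properties; broker address and group id are placeholders.
val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "demo-group")

val env = StreamExecutionEnvironment.getExecutionEnvironment
// The Kafka consumer is a SourceFunction, so it plugs into addSource directly.
val kafkaStream: DataStream[String] =
  env.addSource(new FlinkKafkaConsumer[String]("demo-topic", new SimpleStringSchema(), props))
```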
3. Transformation
4. Sink
Flink also supports the checkpoint mechanism for fault tolerance.
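Enabling checkpointing is a one-line configuration on the execution environment. A minimal sketch; the 5-second interval is an arbitrary example value:

```scala
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Take a checkpoint every 5 seconds (interval chosen arbitrarily for the example)
env.enableCheckpointing(5000)
// Exactly-once is the default mode; set it explicitly for clarity
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
```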
A simple custom Source example
def main(args: Array[String]): Unit = {
  val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
  val people: DataStream[People] = env.addSource(new MyPeopleSource())
  people.print().setParallelism(1)
  env.execute("myPeopleSource")
}
package com.shufang.flink.examples
import com.shufang.flink.bean
import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.util.Random
import com.shufang.flink.bean.People
/**
 * This class simulates data to serve as a custom source.
 *
 * @PublicEvolving
 * A custom Source can be modeled on built-in sources such as:
 * public class SocketTextStreamFunction implements SourceFunction<String>
 */
class MyPeopleSource extends SourceFunction[People] {
  // Flag that controls whether the source keeps running;
  // volatile because cancel() is called from a different thread than run()
  @volatile var isRunning: Boolean = true

  override def run(ctx: SourceFunction.SourceContext[People]): Unit = {
    // Create a random number generator
    val random = new Random()
    // Simulate a different score for each person
    val current_score = 1.to(10).map {
      case i =>
        ("people_" + i, 60 + random.nextGaussian() * 20)
    }
    // Generate the data stream in an endless loop
    while (isRunning) {
      current_score.foreach(
        ps => {
          val name: String = ps._1
          val score: Double = ps._2
          val people: bean.People = People(random.nextInt(10), name, score)
          ctx.collect(people)
        }
      )
      // Simulate the interval between generated records
      Thread.sleep(1500)
    }
  }
  /**
   * Called when the job is cancelled: setting isRunning to false
   * makes the loop in run() exit, shutting the source down.
   */
  override def cancel(): Unit = {
    isRunning = false
  }
}