1. Setting up Flink with Docker
1. Select the appropriate Flink version
Docker installation is not covered here. Search Docker Hub for the official Flink image and choose a suitable version: https://hub.docker.com/_/flink/tags
Pull the image with the following command:
docker pull flink:1.16.0-scala_2.12-java8
The tag 1.16.0-scala_2.12-java8 means Flink 1.16.0, built against Scala 2.12, running on Java 8.
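To make the tag layout concrete, here is a minimal sketch in plain Java that splits a tag of this shape into its three parts. The class name FlinkImageTag and its fields are hypothetical and only serve the illustration; the sketch assumes the tag always follows the pattern <flink>-scala_<scala>-java<java>, which not every tag on Docker Hub does.

```java
public class FlinkImageTag {
    final String flinkVersion;
    final String scalaVersion;
    final String javaVersion;

    FlinkImageTag(String flinkVersion, String scalaVersion, String javaVersion) {
        this.flinkVersion = flinkVersion;
        this.scalaVersion = scalaVersion;
        this.javaVersion = javaVersion;
    }

    // Splits a tag of the assumed form "<flink>-scala_<scala>-java<java>".
    static FlinkImageTag parse(String tag) {
        String[] parts = tag.split("-");
        if (parts.length != 3 || !parts[1].startsWith("scala_") || !parts[2].startsWith("java")) {
            throw new IllegalArgumentException("Unexpected tag format: " + tag);
        }
        return new FlinkImageTag(
                parts[0],
                parts[1].substring("scala_".length()),
                parts[2].substring("java".length()));
    }

    public static void main(String[] args) {
        FlinkImageTag t = parse("1.16.0-scala_2.12-java8");
        // prints: Flink 1.16.0, Scala 2.12, Java 8
        System.out.println("Flink " + t.flinkVersion + ", Scala " + t.scalaVersion + ", Java " + t.javaVersion);
    }
}
```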
It is recommended to first start plain JobManager and TaskManager containers, so that their configuration files can be copied out and mounted later.
# Create a Docker network so the JobManager and TaskManager can reach each other
docker network create flink-network
# Create the JobManager
docker run \
-itd \
--name=jobmanager \
--publish 8081:8081 \
--network flink-network \
--env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" \
flink:1.16.0-scala_2.12-java8 jobmanager
# Create the TaskManager
docker run \
-itd \
--name=taskmanager \
--network flink-network \
--env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" \
flink:1.16.0-scala_2.12-java8 taskmanager
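Once both containers have started, the web UI is available on port 8081.
Copy the configuration directories out of the containers as follows. The target directories ./JobManager and ./TaskManager must already exist.
# JobManager container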
docker cp jobmanager:/opt/flink/conf ./JobManager/
# TaskManager container
docker cp taskmanager:/opt/flink/conf ./TaskManager/
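2. Recreate the JobManager and TaskManager containers with the configuration files mounted
In JobManager/conf/flink-conf.yaml, change the web port (rest.port) to 18081.
In TaskManager/conf/flink-conf.yaml, change the number of task slots (taskmanager.numberOfTaskSlots) to 5.
Remove the temporary containers created above, since their names would otherwise conflict (for example with docker rm -f jobmanager taskmanager), then start new containers with the configuration directories mounted:
# Start the JobManager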
docker run -itd -v /root/docker/flink/JobManager/conf/:/opt/flink/conf/ --name=jobmanager --publish 18081:18081 --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" --network flink-network flink:1.16.0-scala_2.12-java8 jobmanager
# Start the TaskManager
docker run -itd -v /root/docker/flink/TaskManager/conf/:/opt/flink/conf/ --name=taskmanager --network flink-network --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" flink:1.16.0-scala_2.12-java8 taskmanager
Parameter explanation
- FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager": the RPC address, which must be set. Otherwise the JobManager and TaskManager end up with randomly generated addresses and cannot connect to each other. Alternatively, set jobmanager.rpc.address directly in the configuration file flink-conf.yaml.
Both containers start successfully: the web UI is now served on port 18081, and one TaskManager is registered with 5 task slots.
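The interaction between FLINK_PROPERTIES and flink-conf.yaml can be pictured with a simplified model in plain Java. This is only a sketch under two stated assumptions: that the properties are appended to the configuration file as extra "key: value" lines, and that a later entry overrides an earlier one with the same key. The class name FlinkPropertiesSketch and the sample file content are hypothetical; this is not Flink's actual configuration loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlinkPropertiesSketch {
    // Parses flat "key: value" lines; blank lines and '#' comments are skipped.
    // Assumption: a later entry with the same key replaces the earlier value.
    static Map<String, String> parse(String text) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int idx = trimmed.indexOf(": ");
            if (idx <= 0) {
                continue; // not a "key: value" line
            }
            conf.put(trimmed.substring(0, idx).trim(), trimmed.substring(idx + 2).trim());
        }
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical content of a mounted flink-conf.yaml
        String fileContent = "jobmanager.rpc.address: localhost\nrest.port: 18081\n";
        // Value passed via --env FLINK_PROPERTIES=...
        String flinkProperties = "jobmanager.rpc.address: jobmanager";
        Map<String, String> effective = parse(fileContent + flinkProperties + "\n");
        // prints: {jobmanager.rpc.address=jobmanager, rest.port=18081}
        System.out.println(effective);
    }
}
```

Under these assumptions the value from FLINK_PROPERTIES wins, which is consistent with the note that the address can equally be written straight into flink-conf.yaml.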
2. A simple Flink example
Official reference: https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/dev/configuration/overview/#getting-started
1. Create the project structure
Use Maven with the Flink Maven Archetype to quickly generate a Flink program skeleton that already contains the necessary dependencies. The project's groupId, artifactId, package, and other settings can be customized. The ^ line continuations below are for the Windows command prompt; on Linux or macOS use \ instead.
mvn archetype:generate ^
-DarchetypeGroupId=org.apache.flink ^
-DarchetypeArtifactId=flink-quickstart-java ^
-DarchetypeVersion=1.16.0 ^
-DgroupId=com.ye ^
-DartifactId=flink-study ^
-Dversion=0.1 ^
-Dpackage=com.ye ^
-DinteractiveMode=false
After the project has been generated, open the project directory.
Note that the run configuration (startup parameters) must be adjusted before running the job locally; otherwise startup fails because the Flink classes cannot be found. The reason is that the Flink dependencies in pom.xml are declared with <scope>provided</scope>, which means they are expected to be supplied by the production environment (the Flink cluster) rather than bundled into the jar.
<scope>provided</scope>
<scope>runtime</scope>
Older Flink versions (apparently before 1.12) required separate APIs for stream processing and batch processing. Current versions use the streaming API for both, so the examples below are written against it.
2. Simple example of batch processing
The following code counts the number of occurrences of each word.
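package com.ye;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class DataBatchJob {
    /* Counts the number of occurrences of each word */
    public static void main(String[] args) throws Exception {
        // Obtain the Flink execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Add the data source
        DataStreamSource<String> streamSource = env.fromElements("hello world", "hello flink", "flink", "hello", "world");
        // Split each incoming line into (word, 1) pairs
        SingleOutputStreamOperator<Tuple2<String, Integer>> streamOperator = streamSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            // value: the incoming record
            // Tuple2: a two-element tuple
            // out: collector for the emitted values
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] split = value.split(" ");
                for (String s : split) {
                    out.collect(Tuple2.of(s, 1));
                }
            }
        });
        // Group by field 0 of the tuple (the word); keyBy(int) is deprecated, so a key selector is used
        KeyedStream<Tuple2<String, Integer>, String> keyBy = streamOperator.keyBy(t -> t.f0);
        // Sum field 1 of the tuple (the count)
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = keyBy.sum(1);
        sum.print();
        env.execute("Word count");
    }
}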
The execution results are as follows
Result after uploading to the Flink cluster
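As a cross-check of what the word count should produce, the same split, group, and sum logic can be reproduced in plain Java without Flink. This is only a sketch: the class name WordCountSketch is hypothetical, and it processes the input sequentially, whereas a real Flink job may interleave updates for different keys and prefix each printed line with a subtask index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {
    // Final count per word, in order of first appearance.
    static Map<String, Integer> finalCounts(List<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    // One "(word,count)" update per incoming word, i.e. the running totals.
    static List<String> runningCounts(List<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        List<String> updates = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                int count = totals.merge(word, 1, Integer::sum);
                updates.add("(" + word + "," + count + ")");
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello flink", "flink", "hello", "world");
        // prints: [(hello,1), (world,1), (hello,2), (flink,1), (flink,2), (hello,3), (world,2)]
        System.out.println(runningCounts(lines));
        // prints: {hello=3, world=2, flink=2}
        System.out.println(finalCounts(lines));
    }
}
```

The running totals correspond to what a keyed sum emits record by record in streaming mode; the final counts correspond to what remains when only the last value per key is reported, as in batch execution mode.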
3. Simple example of stream processing
The following example reads integers from a socket text source and, for each 10-second window, sums the values greater than 500 and the remaining values (500 or less) separately.
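package com.ye;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataStreamJob {
    private static final Logger logger = LoggerFactory.getLogger(DataStreamJob.class);

    /* Sums the values greater than 500 and the remaining values separately */
    public static void main(String[] args) throws Exception {
        // Obtain the Flink execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Add a socket text stream as the data source
        //DataStreamSource<String> streamSource = env.fromElements("200", "100", "6000", "500", "2000", "300", "1500", "900");
        DataStreamSource<String> streamSource = env.socketTextStream("127.0.0.1", 7777);
        // Group into values greater than 500 ("ge") and all others ("lt"); 500 itself falls into "lt"
        KeyedStream<String, String> stringKeyedStream = streamSource.keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String s) throws Exception {
                int i = Integer.parseInt(s);
                return i > 500 ? "ge" : "lt";
            }
        });
        // 10-second tumbling window; each window is a half-open interval: [00:00:00, 00:00:10), [00:00:10, 00:00:20), ...
        WindowedStream<String, String, TimeWindow> windowedStream = stringKeyedStream.window(TumblingProcessingTimeWindows.of(Time.seconds(10)));
        // Window process function; the type parameters String, Integer, String, TimeWindow are, in order,
        // the input type, the output type, the key type (the type returned by keyBy), and the window type
        SingleOutputStreamOperator<Integer> outputStreamOperator = windowedStream.process(new ProcessWindowFunction<String, Integer, String, TimeWindow>() {
            /*
             * key: the grouping key
             * context: context information
             * elements: the batch of records in this window
             * out: output collector
             */
            @Override
            public void process(String key, Context context, Iterable<String> elements, Collector<Integer> out) throws Exception {
                System.out.println(key);
                int sum = 0;
                for (String item : elements) {
                    sum += Integer.parseInt(item);
                }
                out.collect(sum);
            }
        });
        // Print the result
        outputStreamOperator.print();
        env.execute("Grouped sum");
    }
}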
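The half-open 10-second intervals mentioned in the window comment can be illustrated with a little arithmetic. The sketch below, in plain Java, assumes a tumbling window with zero offset and timestamps in milliseconds; the class name TumblingWindowSketch is hypothetical, and the code mirrors the idea rather than Flink's own window assigner.

```java
public class TumblingWindowSketch {
    // Start of the window containing the timestamp (inclusive bound).
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // End of the window containing the timestamp (exclusive bound).
    static long windowEnd(long timestampMillis, long sizeMillis) {
        return windowStart(timestampMillis, sizeMillis) + sizeMillis;
    }

    public static void main(String[] args) {
        long size = 10_000L; // 10 seconds
        long[] samples = {0L, 9_999L, 10_000L, 19_999L, 20_000L};
        for (long ts : samples) {
            // e.g. prints "9999 ms -> [0, 10000)" and "10000 ms -> [10000, 20000)"
            System.out.println(ts + " ms -> [" + windowStart(ts, size) + ", " + windowEnd(ts, size) + ")");
        }
    }
}
```

A record arriving exactly on a boundary (10000 ms here) therefore belongs to the next window, not the previous one.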
To test, open a socket text stream on port 7777 on Windows or Linux (for example with nc -lk 7777 on Linux) and type one integer per line.
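To know what to expect when testing, the grouping and summing can be simulated in plain Java. The sketch assumes that all sample values from the commented-out fromElements line are typed within the same 10-second window; the class name ThresholdSumSketch is hypothetical, while the keys "ge" and "lt" are the ones used by the job's KeySelector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThresholdSumSketch {
    // Sums the inputs per key: "ge" for values greater than 500, "lt" for the rest.
    static Map<String, Integer> sumByGroup(List<String> inputs) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String s : inputs) {
            int value = Integer.parseInt(s.trim());
            String key = value > 500 ? "ge" : "lt";
            sums.merge(key, value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("200", "100", "6000", "500", "2000", "300", "1500", "900");
        // prints: {lt=1100, ge=10400}
        System.out.println(sumByGroup(inputs));
    }
}
```

If the values are spread over several windows, each window reports its own partial sums instead.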
4. Upload to the Flink cluster
Package the project. The entry class can be set in pom.xml, or passed as a parameter when submitting from the command line or when uploading through the web UI.
① Submit jobs through the web UI
Upload the jar to the Flink cluster in the web UI, then click Submit to run it.
② Submit jobs from the command line
# If the cluster (i.e. the JobManager) runs on the current server, use the following command
$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
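# If the cluster (i.e. the JobManager) is not on the current server, for example when submitting from a TaskManager server, use the following command
# -m specifies the JobManager address
# -c specifies the job's entry class
# -p specifies the parallelism
$ bin/flink run -m 192.168.1.1:8081 -c com.ye.DataStreamJob -p 2 <jarFile>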
# Cancel a job
$ bin/flink cancel <jobId>
5. Submit, view, and cancel jobs in the web UI
Batch run completed
Stream processing is running
3. To be resolved
When the Flink cluster is started with Docker, the Stdout tab of the web UI shows no print output.