Building Flink with Docker and submitting jobs

1. Building Flink with Docker

1. Select an appropriate Flink version

Docker installation is not covered here. Search Docker Hub for Flink images and pick an appropriate version: https://hub.docker.com/_/flink/tags

Pull the image with the command docker pull flink:1.16.0-scala_2.12-java8

The tag 1.16.0-scala_2.12-java8 breaks down as: Flink 1.16.0, bundled Scala 2.12, Java 8.

It is easiest to first start throwaway JobManager and TaskManager containers and copy their configuration files out, so the configuration can later be mounted as a volume.

# Create a docker network so the JobManager and TaskManager can reach each other
 docker network create flink-network

# Create the JobManager
 docker run \
  -itd \
  --name=jobmanager \
  --publish 8081:8081 \
  --network flink-network \
  --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" \
  flink:1.16.0-scala_2.12-java8 jobmanager 
  
# Create the TaskManager
 docker run \
  -itd \
  --name=taskmanager \
  --network flink-network \
  --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" \
  flink:1.16.0-scala_2.12-java8 taskmanager 

Both containers start successfully, and the web UI is reachable on port 8081. Copy out the configuration files as follows:

# from the jobmanager container
 docker cp jobmanager:/opt/flink/conf ./JobManager/
# from the taskmanager container
docker cp taskmanager:/opt/flink/conf ./TaskManager/

2. Recreate the JobManager and TaskManager containers and mount the configuration files

In JobManager/conf/flink-conf.yaml, change the web UI port to 18081.
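In flink-conf.yaml this is the rest.port key (the Flink 1.16 name for the REST/web UI port):

# JobManager/conf/flink-conf.yaml
rest.port: 18081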

In TaskManager/conf/flink-conf.yaml, set the number of task slots to 5.
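The corresponding key:

# TaskManager/conf/flink-conf.yaml
taskmanager.numberOfTaskSlots: 5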
Then start the containers with the configuration directories mounted:

# Start the jobmanager
docker run -itd  -v /root/docker/flink/JobManager/conf/:/opt/flink/conf/ --name=jobmanager --publish 18081:18081 --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" --network flink-network flink:1.16.0-scala_2.12-java8 jobmanager
# Start the taskmanager
docker run -itd  -v /root/docker/flink/TaskManager/conf/:/opt/flink/conf/ --name=taskmanager --network flink-network --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager"  flink:1.16.0-scala_2.12-java8 taskmanager

Parameter explanation:

  • FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" sets the RPC address and must be provided; otherwise the JobManager and TaskManager are each assigned a randomly generated RPC address and fail to connect to each other. Alternatively, you can set it directly in flink-conf.yaml, as shown below.
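The equivalent line in the configuration file:

# conf/flink-conf.yaml
jobmanager.rpc.address: jobmanager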

Once both containers are up, the web UI shows port 18081, and one TaskManager is registered with 5 task slots.
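You can also confirm from the host that both containers are running:

# list the running containers; the web UI is then at http://<host-ip>:18081
docker ps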

2. A simple Flink example

Reference on the official site: https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/dev/configuration/overview/#getting-started

1. Create project structure

Use the Maven archetype flink-quickstart-java to quickly generate a Flink program skeleton that contains the necessary dependencies, customizing the project's groupId, artifactId, package and other settings:

mvn archetype:generate ^
  -DarchetypeGroupId=org.apache.flink ^
  -DarchetypeArtifactId=flink-quickstart-java ^
  -DarchetypeVersion=1.16.0 ^
  -DgroupId=com.ye ^
  -DartifactId=flink-study ^
  -Dversion=0.1 ^
  -Dpackage=com.ye ^
  -DinteractiveMode=false

Once generation succeeds, open the project directory.

Note: the run configuration must be adjusted to run the job from the IDE (in IntelliJ IDEA, enable "Add dependencies with 'provided' scope to classpath"), otherwise startup fails because the classes cannot be found. The Flink dependencies in pom.xml are declared with <scope>provided</scope>, meaning they are supplied by the cluster at runtime rather than packaged with the job.
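For reference, the generated pom.xml declares the Flink dependencies roughly like this (artifact names as produced by the 1.16 archetype):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>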

In older Flink versions (before 1.12) stream and batch programs had to use separate APIs; since 1.12 the unified DataStream API covers both. Stream processing is used here.
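As a minimal sketch of the unified API (assuming Flink 1.12+; this is background only and is not used in the examples below), batch execution can be requested on the same DataStream environment:

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchModeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // BATCH mode runs a bounded pipeline as a batch job; the default is STREAMING
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        env.fromElements("a", "b", "a").print();
        env.execute("batch mode sketch");
    }
}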

2. Simple example of batch processing

The following code counts the number of occurrences of each word:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class DataBatchJob {

    /* The example below counts how many times each word occurs */
    public static void main(String[] args) throws Exception {
        // Get the flink environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Add the data source
        DataStreamSource<String> streamSource = env.fromElements("hello world", "hello flink", "flink", "hello", "world");
        // Split each incoming line into (word, 1) pairs
        SingleOutputStreamOperator<Tuple2<String, Integer>> streamOperator = streamSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            // value: the incoming line
            // out: collector for the emitted Tuple2 (word, count) pairs
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] split = value.split(" ");
                for (String s : split) {
                    out.collect(Tuple2.of(s, 1));
                }
            }
        });
        // Group by position 0 of the tuple (the word); the positional keyBy(0)
        // is deprecated, so a key selector is used instead
        KeyedStream<Tuple2<String, Integer>, String> keyBy = streamOperator.keyBy(value -> value.f0);
        // Sum position 1 of the tuple (the count)
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = keyBy.sum(1);
        sum.print();
        env.execute("Count word occurrences");
    }
}

The execution results are as follows; the job can then be uploaded to the Flink cluster (see section 4).
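Run locally, the printed running counts look roughly like this (the N> prefix is the subtask that produced the record, so the prefixes and interleaving will differ from run to run):

3> (hello,1)
8> (world,1)
3> (hello,2)
7> (flink,1)
3> (hello,3)
8> (world,2)
7> (flink,2)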

3. Simple example of stream processing

The following example reads a socket text source and separately sums the values greater than 500 and those of 500 or less:

import java.util.concurrent.atomic.AtomicInteger;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataStreamJob {

    private static final Logger logger = LoggerFactory.getLogger(DataStreamJob.class);

    /* The example below separately sums the values greater than 500 and those of 500 or less */
    public static void main(String[] args) throws Exception {
        // Get the flink environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Add the socket text stream source
        //DataStreamSource<String> streamSource = env.fromElements("200", "100", "6000", "500", "2000", "300", "1500", "900");
        DataStreamSource<String> streamSource = env.socketTextStream("127.0.0.1", 7777);

        // Split the values into two groups around 500
        KeyedStream<String, String> stringKeyedStream = streamSource.keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String s) throws Exception {
                int i = Integer.parseInt(s);
                // note: a value of exactly 500 falls into the "lt" group
                return i > 500 ? "ge" : "lt";
            }
        });
        // 10-second tumbling windows, one batch of data every 10 seconds:
        // [00:00:00, 00:00:10), [00:00:10, 00:00:20), ... half-open intervals
        WindowedStream<String, String, TimeWindow> windowedStream = stringKeyedStream.window(TumblingProcessingTimeWindows.of(Time.seconds(10)));

        // Window process function; the generics String, Integer, String, TimeWindow are, in order,
        // the input type, the output type, the key type (what keyBy returns) and the window type
        SingleOutputStreamOperator<Integer> outputStreamOperator = windowedStream.process(new ProcessWindowFunction<String, Integer, String, TimeWindow>() {
            /*
             * key: the grouping key
             * context: context information
             * elements: the batch of data in this window
             * out: the output collector
             */
            @Override
            public void process(String key, ProcessWindowFunction<String, Integer, String, TimeWindow>.Context context, Iterable<String> elements, Collector<Integer> out) throws Exception {
                System.out.println(key);
                AtomicInteger sum = new AtomicInteger();
                elements.forEach(item -> sum.addAndGet(Integer.parseInt(item)));
                out.collect(sum.get());
            }
        });
        // Print the per-window sums
        outputStreamOperator.print();
        env.execute("Group and sum");
    }
}

Open a socket text stream on Windows or Linux to test.
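One way to open the socket source, assuming netcat is installed (on Windows, e.g. ncat shipped with Nmap):

# Linux/macOS: listen on port 7777 and keep listening across connections
nc -lk 7777
# Windows with ncat
ncat -lk 7777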

4. Upload to the Flink cluster

Package the project. The startup class can be set in pom.xml, passed as a parameter on the command line, or set when uploading through the web UI.
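A sketch of the packaging step: the quickstart pom already contains a maven-shade-plugin section whose ManifestResourceTransformer sets the jar's Main-Class, so after filling in <mainClass> the build is just:

# build the fat jar; it lands in target/
mvn clean package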

①. Submit the job through the web UI

Upload the jar to the Flink cluster through the web UI, then click Submit to run it.


②. Submit the job from the command line

# If the cluster (i.e. the JobManager) is on the current server, use:
	$ bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
# If the cluster (i.e. the JobManager) is not on the current server, submit the job from the TaskManager server with:
	# -m specifies the JobManager address
	# -c specifies the entry class of the job
	# -p specifies the parallelism
	$ bin/flink run -m 192.168.1.1:8081 -c com.ye.StreamWordCount -p 2 <jarFile>
# Cancel a job
	$ bin/flink cancel <jobId>
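To find the <jobId> to cancel, the running jobs can be listed with:

	$ bin/flink list
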
5. Submit, view and cancel jobs in the web UI

The batch job runs to completion, while the stream-processing job stays in the running state.

3. To be resolved

With the Flink cluster started via Docker, the stdout tab of the web UI shows no print output.
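One unverified lead: the Docker images log to the container console rather than to the taskmanager .out file that the UI's stdout tab reads, so the print output may still be visible directly on the container:

# output of print()/System.out from tasks runs inside the taskmanager container
docker logs -f taskmanager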


Origin: blog.csdn.net/qq_41538097/article/details/129113866