Flink Series 3: Flink in Practice

Table of contents

Foreword

1. Install the Flink environment

2. Create the first Flink demo in IDEA

2.1. Execute the following Maven command

2.2. Fill in 'groupId', 'artifactId', 'version', 'package'

2.3. Select Y to generate the project

3. Develop the first Flink program

3.1. Develop a simple statistical program

3.2. Compile directly to get the jar package

4. Start the environment

4.1. Start the downloaded Flink environment

4.2. Create a server-side TCP listener

4.3. Open the calculation log

4.4. Enter text in the nc listening session

4.5. Statistics appear in the output log


Foreword

        Flink is a stream computing framework that can also be used for batch processing, that is, for processing static and historical data sets. In stream processing it handles real-time data streams and produces results in real time: as long as data keeps arriving, Flink keeps computing. I will not go into a detailed introduction in this article; interested readers can refer to the previous articles: Flink Series 2 Overview of Flink Stateful Stream Processing, Flink Series 1 Development Machine Installation. This article is the third in the Flink series, in which we install and run Flink locally.

1. Install the Flink environment

        First, install apache-flink in your local environment. On a Mac with Homebrew, executing the following command is enough; installing with Docker is another convenient option.

brew install apache-flink

2. Create the first Flink demo in IDEA

2.1. Execute the following Maven command

        Execute the following command to create a project. It uses Maven to generate a Java quickstart project from the template (archetype) provided by Apache Flink. The dependency packages it needs are downloaded during execution.

mvn archetype:generate                               \
-DarchetypeGroupId=org.apache.flink              \
-DarchetypeArtifactId=flink-quickstart-java      \
-DarchetypeVersion=1.8.0     \
-DarchetypeCatalog=local

The parts of the command mean the following:

  • mvn is Maven's command-line tool.
  • archetype:generate indicates that a new project is generated from an archetype (project template).
  • -DarchetypeGroupId specifies the group ID of the project template; org.apache.flink is the group ID under which the Apache Flink team publishes its templates.
  • -DarchetypeArtifactId specifies the artifact ID of the project template; flink-quickstart-java is the Java quickstart template provided by the Apache Flink team.
  • -DarchetypeVersion specifies the version number of the project template.
  • -DarchetypeCatalog specifies which archetype catalog to search; local means the catalog of the local Maven repository.
  • The backslash (\) is the shell's line-continuation character: the command is a single command that is split across several lines only for readability.

2.2. Fill in 'groupId', 'artifactId', 'version', 'package'

Define value for property 'groupId': com.lly.flink.java
Define value for property 'artifactId': flink-traning
Define value for property 'version' 1.0-SNAPSHOT: : 1.0.0
Define value for property 'package' com.lly.flink.java: : 
Confirm properties configuration:
groupId: com.lly.flink.java
artifactId: flink-traning
version: 1.0.0
package: com.lly.flink.java

2.3. Select Y to generate the project

         Note that you must enter "Y" here to confirm the configuration; otherwise the project is not generated.

 Y: : Y
 [INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: flink-quickstart-java:1.8.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.lly.flink.java
[INFO] Parameter: artifactId, Value: flink-traning
[INFO] Parameter: version, Value: 1.0.0
[INFO] Parameter: package, Value: com.lly.flink.java
[INFO] Parameter: packageInPathFormat, Value: com/lly/flink/java
[INFO] Parameter: package, Value: com.lly.flink.java
[INFO] Parameter: version, Value: 1.0.0
[INFO] Parameter: groupId, Value: com.lly.flink.java
[INFO] Parameter: artifactId, Value: flink-traning
[WARNING] CP Don't override file /Users/liluyang/flink-traning/src/main/resources
[INFO] Project created from Archetype in dir: /Users/liluyang/flink-traning
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:17 min
[INFO] Finished at: 2020-11-05T12:42:42+08:00
[INFO] ------------------------------------------------------------------------

3. Develop the first Flink program

3.1. Develop a simple statistical program

package com.lly.flink.java;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * @author lly
 * @date 2020-11-05
 **/
public class SocketTextStreamWordCount {
    public static void main(String[] args) throws Exception {
        // Check the command-line arguments
        if (args.length != 2) {
            System.err.println("USAGE:\nSocketTextStreamWordCount <hostname> <port>");
            return;
        }

        String hostname = args[0];
        int port = Integer.parseInt(args[1]);


        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read the input data from the socket
        DataStreamSource<String> stream = env.socketTextStream(hostname, port);

        // Count the occurrences of each word
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new LineSplitter())
                .keyBy(0)
                .sum(1);

        sum.print();

        env.execute("Java WordCount from SocketTextStream Example");
    }

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
            String[] tokens = s.toLowerCase().split("\\W+");

            for (String token : tokens) {
                if (token.length() > 0) {
                    collector.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}

        Here is a brief explanation of the code, so that readers who are just starting out can understand it more deeply. The program uses Flink to read data from a network stream and count the number of occurrences of each word. The implementation details are as follows:

  1. Declare a SocketTextStreamWordCount class and define its main method as the entry point of the program.

  2. In the main method, first check the incoming command-line parameters; if the number of parameters is not 2, print the usage instructions and return.

  3. Then read the host address and port number that are used to establish the socket connection.

  4. Create a StreamExecutionEnvironment, the environment object for Flink stream processing, which is used to configure the execution environment and create data streams.

  5. Call the socketTextStream method to get a DataStreamSource<String> object, which reads the data stream from the socket connection.

  6. Apply flatMap to the data stream, using the LineSplitter class as the converter that splits each line of text into words and converts each word into a (word, 1) tuple for the subsequent statistics.

  7. Call keyBy on the transformed data stream to group it by the first field, which is the word.

  8. Call sum on the grouped stream to add up the second field (the number of occurrences); this returns a result stream of type SingleOutputStreamOperator<Tuple2<String, Integer>>.

  9. Call the print method to print the result stream to standard output. When the job runs on a cluster, this output ends up in the TaskExecutor's .out log file, as shown in section 4.5.

  10. Finally, call the execute method, passing in the string "Java WordCount from SocketTextStream Example" as the job name, to start executing the whole Flink application.

  11. Declare a static inner class LineSplitter that implements Flink's FlatMapFunction interface and overrides the flatMap method. This method converts the input line to lower case, splits it on non-word characters (such as spaces and commas), and converts each word into a tuple whose first field is the word and whose second field is 1, indicating that the word occurred once.
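
The splitting rule in LineSplitter can be tried out without a Flink cluster. The following is a minimal plain-Java sketch of the same logic; the class name LineSplitterSketch and the tokenize helper are illustrative assumptions and are not part of the generated project.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the splitting logic in LineSplitter (no Flink required).
class LineSplitterSketch {

    // Lower-cases the line, splits it on runs of non-word characters,
    // and drops empty tokens, mirroring the body of flatMap.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.length() > 0) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A separator at the start of the line yields an empty first token,
        // which is why the length check is needed.
        System.out.println(tokenize(", Hello Flink, hello!")); // prints [hello, flink, hello]
    }
}
```

Note that in Java's default regex mode \W treats everything except ASCII letters, digits and the underscore as a separator, so digits such as 1 are counted as words, while text written entirely in non-ASCII characters produces no tokens at all.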

3.2. Compile directly to get the jar package

        Run mvn clean package in the root directory of the project. Maven compiles the code and writes the jar packages to the target directory.

4. Start the environment

4.1. Start the downloaded Flink environment

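The local Flink cluster must already be running; it is started with the start-cluster.sh script in the bin directory of the Flink installation. Then submit the job with flink run, passing the fully qualified name of the business class, the path of the jar package, the IP and the port:

flink run -c <business class package path> <jar package path> <IP> <port>

Example: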
flink run -c com.lly.flink.java.SocketTextStreamWordCount /Users/liluyang/flink-traning/target/original-flink-traning-1.0.0.jar 127.0.0.1 9000

A JobID is printed once the job has been submitted successfully:

Job has been submitted with JobID b04bad9f4c05efd67344179ee676b513

After the startup is successful, visit http://localhost:8081/ to open Flink's web dashboard, where you can see the execution status of the job at a glance and perform basic operations on it.
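
The dashboard gets its data from Flink's REST API, which is served on the same port, so the job list can also be fetched from code. The following is a rough sketch, assuming Java 11 or later (for java.net.http.HttpClient) and the /jobs/overview endpoint of the REST API; the class name JobOverviewSketch and the helper overviewUri are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: ask the local Flink REST endpoint for an overview of all jobs.
class JobOverviewSketch {

    // Builds the URI of the job overview endpoint for the given host and port.
    static URI overviewUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/jobs/overview");
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(overviewUri("localhost", 8081)).GET().build();
        // The body is a JSON document describing the jobs known to the cluster.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The program only works while the local cluster is running. If the connection is refused, check that the cluster is up and that the port matches the one used by the dashboard.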

4.2. Create a server-side TCP listener

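Create a server that listens on port 9000 and accepts connections. The job connects to this port as soon as it starts, so the listener should already be running when the job is submitted.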

nc -l 9000
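
On a machine without nc, a small Java program can play the same role. The following is only a sketch under a few assumptions: the class name LineServer is hypothetical, it serves a single client, and it sends each line terminated by a newline character, the delimiter that socketTextStream(hostname, port) splits on by default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for `nc -l 9000`: accepts one client and forwards input lines to it.
class LineServer {

    // Waits for a single connection on the server socket, then sends every
    // line read from `input` to that client, each terminated by '\n'.
    static void serve(ServerSocket server, BufferedReader input) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = input.readLine()) != null) {
                out.print(line + "\n");
                out.flush(); // send each line immediately so the job sees it right away
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The port defaults to 9000, the value used in this article.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9000;
        try (ServerSocket server = new ServerSocket(port);
             BufferedReader stdin = new BufferedReader(
                     new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            System.out.println("Listening on port " + port);
            serve(server, stdin);
        }
    }
}
```

Start it before submitting the job, then type text into its console just as with nc. Ending the input (Ctrl+D on macOS and Linux) closes the connection.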

4.3. Open the calculation log

cd /usr/local/Cellar/apache-flink/1.10.0/libexec/log

4.4. Enter text in the nc listening session

liluyang@liluyangdeMacBook-Pro ~ % nc -l 9000



cda
cda
dsas
assgasg
nihao 
nihao 
nihao
nihao
1
1
1
1
1
1
1

4.5. Statistics appear in the output log

liluyang@liluyangdeMacBook-Pro log % tail -100f flink-liluyang-taskexecutor-0-liluyangdeMacBook-Pro.local.out
(cda,1)
(cda,2)
(dsas,1)
(assgasg,1)
(nihao,1)
(nihao,2)
(nihao,3)
(nihao,4)
(1,1)
(1,2)
(1,3)
(1,4)
(1,5)
(1,6)
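
The log shows a growing count for each word rather than one final total, because sum emits an updated result for every record it receives. A minimal plain-Java sketch of that behaviour follows; no Flink is involved, and the class name RunningCountSketch and the HashMap standing in for Flink's per-key state are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of keyBy(0).sum(1): one running total per word,
// with the updated total emitted for every incoming record.
class RunningCountSketch {

    static List<String> runningCounts(List<String> words) {
        Map<String, Integer> totals = new HashMap<>(); // stands in for Flink's keyed state
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            // Add 1 to the total of this word (starting from 1 if it is new).
            int total = totals.merge(word, 1, Integer::sum);
            emitted.add("(" + word + "," + total + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (String line : runningCounts(Arrays.asList("cda", "cda", "dsas", "nihao", "nihao"))) {
            System.out.println(line);
        }
    }
}
```

For the input above it prints (cda,1), (cda,2), (dsas,1), (nihao,1) and (nihao,2), one per line, which has the same shape as the log.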

At this point we have installed Flink on a Mac and run it, and used a simple Flink program to show how to build and run a Flink job.

Origin blog.csdn.net/lly576403061/article/details/131653579