Table of contents
1. Install the Flink environment
2. Create the first Flink demo in IDEA
2.1. Execute the following Maven command
2.2. Fill in 'groupId', 'artifactId', 'version', 'package'
2.3. Select "Y" to generate the project
3. Develop the first Flink program
3.1. Develop a simple word-count program
3.2. Compile to get the jar package
4. Start the environment
4.1. Start the downloaded Flink environment
4.2. Create a server-side TCP listener
4.3. Open the computation log
4.4. Enter text at the nc listening port
4.5. Statistics appear in the output log
Foreword
As a stream-processing framework, Flink can be used for batch processing, that is, processing static and historical data sets; it can also be used for stream processing, that is, processing real-time data streams and producing results in real time. As long as data keeps arriving, Flink keeps computing. I will not go into a detailed introduction here; interested readers can refer to the previous articles: Flink Series 2: Overview of Flink Stateful Stream Processing, and Flink Series 1: Development Machine Installation. This article, the third in the Flink series, installs and runs Flink locally.
1. Install the Flink environment
First, install apache-flink in your local environment. On macOS you can simply run the Homebrew command below; installing via Docker is also a convenient alternative.
brew install apache-flink
2. Create the first Flink demo in IDEA
2.1. Execute the following Maven command
Execute the following command to create a project. It uses a Maven archetype to generate a Java quickstart project template for Apache Flink; the required dependencies are downloaded during execution.
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.8.0 \
-DarchetypeCatalog=local
The specific meaning of each part:
- mvn: Maven's command-line tool.
- archetype:generate: generates a new project from an archetype (project template).
- -DarchetypeGroupId: specifies the group ID of the project template, the default template group ID provided by the Apache Flink team.
- -DarchetypeArtifactId: specifies the artifact ID of the project template, the default template artifact ID provided by the Apache Flink team.
- -DarchetypeVersion: specifies the version of the project template.
- -DarchetypeCatalog: specifies the local archetype catalog.
- The backslash (\) is the line-continuation character: the command is one continuous line, split across multiple lines for readability.
2.2. Fill in 'groupId', 'artifactId', 'version', 'package'
Define value for property 'groupId': com.lly.flink.java
Define value for property 'artifactId': flink-traning
Define value for property 'version' 1.0-SNAPSHOT: : 1.0.0
Define value for property 'package' com.lly.flink.java: :
Confirm properties configuration:
groupId: com.lly.flink.java
artifactId: flink-traning
version: 1.0.0
package: com.lly.flink.java
2.3. Select "Y" to generate the project
Pay special attention: you must enter "Y" here to confirm and generate the project.
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: flink-quickstart-java:1.8.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.lly.flink.java
[INFO] Parameter: artifactId, Value: flink-traning
[INFO] Parameter: version, Value: 1.0.0
[INFO] Parameter: package, Value: com.lly.flink.java
[INFO] Parameter: packageInPathFormat, Value: com/lly/flink/java
[INFO] Parameter: package, Value: com.lly.flink.java
[INFO] Parameter: version, Value: 1.0.0
[INFO] Parameter: groupId, Value: com.lly.flink.java
[INFO] Parameter: artifactId, Value: flink-traning
[WARNING] CP Don't override file /Users/liluyang/flink-traning/src/main/resources
[INFO] Project created from Archetype in dir: /Users/liluyang/flink-traning
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:17 min
[INFO] Finished at: 2020-11-05T12:42:42+08:00
[INFO] ------------------------------------------------------------------------
3. Develop the first flink program
3.1. Develop a simple word-count program
package com.lly.flink.java;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * @author lly
 * @date 2020-11-05
 **/
public class SocketTextStreamWordCount {

    public static void main(String[] args) throws Exception {
        // check the command-line arguments
        if (args.length != 2) {
            System.err.println("USAGE:\nSocketTextStreamWordCount <hostname> <port>");
            return;
        }
        String hostname = args[0];
        Integer port = Integer.parseInt(args[1]);

        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // read the data from the socket
        DataStreamSource<String> stream = env.socketTextStream(hostname, port);

        // count the words
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new LineSplitter())
                .keyBy(0)
                .sum(1);
        sum.print();

        env.execute("Java WordCount from SocketTextStream Example");
    }

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
            // split the line on non-word characters and emit (word, 1) for each token
            String[] tokens = s.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    collector.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
Let me briefly walk through this code for readers who are just getting started with Flink. It reads text from a network stream and counts how many times each word occurs. The implementation details are as follows:
- Declare a SocketTextStreamWordCount class and define its main method as the entry point of the program.
- In main, first check the command-line arguments; if there are not exactly 2 of them, print the usage instructions and return.
- Then read the host address and port number used to establish the socket connection.
- Create a StreamExecutionEnvironment, the Flink stream-processing environment object used to configure the execution environment and create data streams.
- Call socketTextStream to obtain a DataStreamSource<String> object that reads the data stream from the socket connection.
- Apply flatMap to the stream, using the LineSplitter class as the converter to split each line of text into words and turn each word into a ("word", 1) tuple for subsequent counting.
- Call keyBy on the transformed stream to group by the first tuple field, i.e. the word.
- Call sum on the grouped stream to sum the second field (the occurrence count), returning a result stream of type SingleOutputStreamOperator<Tuple2<String, Integer>>.
- Call print to print the results to the console.
- Finally, call execute, passing the string "Java WordCount from SocketTextStream Example" as the job name, to start the whole Flink application.
- The static inner class LineSplitter implements Flink's FlatMapFunction interface and overrides the flatMap method. It splits each input line on non-word characters (spaces, commas, and so on) and converts each word into a tuple whose first field is the word and whose second field is 1, meaning the word occurred once.
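As a quick illustration of the splitting step, the following standalone snippet (a hypothetical helper class for demonstration only, not part of the generated project) shows what the expression s.toLowerCase().split("\\W+") used in LineSplitter produces for a sample line:

```java
import java.util.Arrays;

// Demonstration only: the tokenization step used inside LineSplitter.flatMap.
// Non-word characters (spaces, commas, punctuation) act as separators,
// and all tokens are lowercased before counting.
public class SplitDemo {
    public static void main(String[] args) {
        String line = "Hello, Flink! Hello world";
        // Same expression as in LineSplitter.flatMap
        String[] tokens = line.toLowerCase().split("\\W+");
        System.out.println(Arrays.toString(tokens));
        // prints [hello, flink, hello, world]
    }
}
```

Each of these tokens would then be emitted as a (word, 1) tuple, so "hello" appears twice and is later summed to (hello, 2) by keyBy plus sum.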
3.2. Compile directly to get the jar package
Run mvn clean package in the project root directory; the jar package is generated under the target directory.
4. Start the environment
4.1. Start the downloaded Flink environment
First start the local Flink cluster (for a Homebrew install, run the start-cluster.sh script under libexec/bin), then submit the job. The command format is:
flink run -c <full path of the business class> <path of the jar package> <IP> <port>
Example:
flink run -c com.lly.flink.java.SocketTextStreamWordCount /Users/liluyang/flink-traning/target/original-flink-traning-1.0.0.jar 127.0.0.1 9000
A Job ID is generated after a successful submission:
Job has been submitted with JobID b04bad9f4c05efd67344179ee676b513
After startup, visit http://localhost:8081/ to open the Flink web dashboard, where you can intuitively see the execution status of the job and perform basic operations.
4.2. Create a server-side TCP listener
Use nc to create a server that listens for and accepts connections:
nc -l 9000
4.3. Open the computation log
Change into the log directory of the installed Flink version:
cd /usr/local/Cellar/apache-flink/1.10.0/libexec/log
4.4. Enter text at the established nc listening port
liluyang@liluyangdeMacBook-Pro ~ % nc -l 9000
cda
cda
dsas
assgasg
nihao
nihao
nihao
nihao
1
1
1
1
1
1
1
4.5. Statistics appear in the output log
liluyang@liluyangdeMacBook-Pro log % tail -100f flink-liluyang-taskexecutor-0-liluyangdeMacBook-Pro.local.out
(cda,1)
(cda,2)
(dsas,1)
(assgasg,1)
(nihao,1)
(nihao,2)
(nihao,3)
(nihao,4)
(1,1)
(1,2)
(1,3)
(1,4)
(1,5)
(1,6)
Note that the counts are incremental running totals: each new occurrence of a word emits an updated sum, e.g. (nihao,1) through (nihao,4). So far, we have installed Flink on a Mac and gotten it running, and then used a simple Flink program to show how to build and run a Flink job.