Environment Setup
- Installing on Linux:
You can download the latest version of Flink from https://flink.apache.org/downloads.html.
After installing, go into the bin directory and run ./start-cluster.sh.
If it starts successfully, you can open the web UI and take a look, e.g. at http://192.168.10.45:8081/ (use your own host's IP).
- Creating a project in IDEA
Create a Maven project; you can use my dependencies as a reference:
<dependencies>
<!-- Apache Flink dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- log4j -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!-- Use the maven-shade plugin to build a fat JAR containing all necessary dependencies -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<!-- Note: be sure to change this to the main class of your own Job -->
<mainClass>com.msy.main.streamMain</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
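Note that the pom above references the placeholders ${flink.version}, ${scala.binary.version} and ${java.version} without defining them. A typical <properties> block would look like the following — the version numbers here are illustrative, so pick the ones matching your cluster:

```xml
<properties>
    <flink.version>1.9.0</flink.version>
    <scala.binary.version>2.11</scala.binary.version>
    <java.version>1.8</java.version>
</properties>
```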
Writing the Code
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class streamMain {
    public static void main(String[] args) throws Exception {
        // create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(ParameterTool.fromArgs(args));
        env.fromElements(WORDS)
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] splits = value.toLowerCase().split("\\W+");
                        for (String split : splits) {
                            if (split.length() > 0) {
                                out.collect(new Tuple2<>(split, 1));
                            }
                        }
                    }
                })
                .keyBy(0)
                .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
                        return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
                    }
                })
                .print();
        // a streaming program must call execute(), otherwise the job never runs and produces no results
        env.execute("word count streaming demo");
    }

    private static final String[] WORDS = new String[]{
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer"
    };
}
Running
- Run locally: just execute the main method directly.
- Package and run from the web UI: after packaging, open the UI, click Submit New Job, upload the jar, and fill in the fully qualified name of the main class (other parameters can be left unset for now).
- You can check the job submission status under Job Manager.
- You can see the logs printed by the job under Task Manager.
(The JobManager is responsible for scheduling jobs, while the TaskManager executes them; a TaskManager holds Tasks, and each Task runs in a Slot, as covered in the previous chapter.)
Code Analysis
- Create the StreamExecutionEnvironment (the runtime environment for the streaming program):
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
- Set global configuration for the environment (parsed from the args parameters):
env.getConfig().setGlobalJobParameters(ParameterTool.fromArgs(args));
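ParameterTool.fromArgs turns command-line pairs such as --input foo into a key/value map that every operator can then read through the global job parameters. Its parsing can be approximated in plain Java — the ArgsSketch class below is purely illustrative and is not Flink's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class ArgsSketch {
    // roughly what ParameterTool.fromArgs does: pair up "--key value" arguments
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                params.put(args[i].substring(2), args[i + 1]);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = fromArgs(new String[]{"--input", "words.txt", "--parallelism", "2"});
        System.out.println(params.get("input")); // words.txt
    }
}
```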
- Build the data source; WORDS is a String array:
env.fromElements(WORDS)
- Split each string and collect the pieces as (word, 1) tuples, where 1 means the word has occurred once:
flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
        String[] splits = value.toLowerCase().split("\\W+");
        for (String split : splits) {
            if (split.length() > 0) {
                out.collect(new Tuple2<>(split, 1));
            }
        }
    }
})
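The tokenization inside flatMap can be tried on its own: split("\\W+") breaks the line on any run of non-word characters, which is why the commas, colons and "--" in the sample lines disappear. A minimal sketch (the class name SplitDemo is just for illustration):

```java
public class SplitDemo {
    // same tokenization as the flatMap above: lowercase, then split on non-word runs
    static String[] tokenize(String line) {
        return line.toLowerCase().split("\\W+");
    }

    public static void main(String[] args) {
        for (String token : tokenize("To be, or not to be,--that is the question:--")) {
            System.out.println(token);
        }
        // prints: to be or not to be that is the question (one word per line)
    }
}
```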
- Group by the word key (0 means grouping on the first field of the tuple, i.e. the word; operators are covered in detail in a later chapter):
keyBy(0)
- Count the occurrences of each word:
reduce(new ReduceFunction<Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
        return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
    }
})
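Worth noting: on a stream, this reduce emits an updated running total every time another element arrives for a key, rather than a single final count. The rolling behavior for one key can be sketched in plain Java — RollingCount is a hypothetical stand-in, no Flink required:

```java
public class RollingCount {
    // mirrors the ReduceFunction above: keep the word, sum the counts
    static int[] runningTotals(int[] ones) {
        int[] totals = new int[ones.length];
        int acc = 0;
        for (int i = 0; i < ones.length; i++) {
            acc += ones[i];          // value1.f1 + value2.f1
            totals[i] = acc;
        }
        return totals;
    }

    public static void main(String[] args) {
        // the word "to" appears three times, i.e. three (to, 1) tuples arrive
        for (int total : runningTotals(new int[]{1, 1, 1})) {
            System.out.println("(to," + total + ")");
        }
        // prints (to,1) (to,2) (to,3): one updated result per incoming element
    }
}
```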
- Print the resulting stream as (word, count) pairs, where count is the number of times the word has appeared:
print()
- Start the Job:
env.execute("word count streaming demo");
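To sanity-check the expected final counts without a Flink cluster, the same split/group/sum pipeline can be reproduced with plain Java streams — WordCountSketch is a hypothetical helper, not part of the job:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    // split on non-word characters, drop empties, group by word, count per group
    static Map<String, Long> count(String[] lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(new String[]{
                "To be, or not to be,--that is the question:--",
                "Whether 'tis nobler in the mind to suffer"
        });
        System.out.println(counts.get("to")); // 3: twice in the first line, once in the second
        System.out.println(counts.get("be")); // 2
    }
}
```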