Flink Basics (Part 2): Getting Started with WordCount

Environment Setup

  • Installing on Linux:
    You can download the latest Flink release from https://flink.apache.org/downloads.html.
    After unpacking it, go into the bin directory and run ./start-cluster.sh. If it reports success, you can open the web UI to take a look; the address is http://192.168.10.45:8081/ (substitute your own host; 8081 is the default port).
  • Creating the project in IDEA:
    Create a Maven project; you can model the dependencies on mine, as follows:
<dependencies>
    <!-- Apache Flink dependencies -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    
	<!-- log4j -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.7</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
        <scope>runtime</scope>
    </dependency>
</dependencies>
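
<!-- The snippets above and below reference Maven properties that are assumed to be
     declared near the top of the pom; the versions here are examples from the
     Flink 1.10 era, not prescriptions - use whatever matches your setup. -->
<properties>
    <java.version>1.8</java.version>
    <scala.binary.version>2.11</scala.binary.version>
    <flink.version>1.10.0</flink.version>
</properties>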

<build>
    <plugins>
        <!-- Java Compiler -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>${java.version}</source>
                <target>${java.version}</target>
            </configuration>
        </plugin>

        <!-- Use the maven-shade plugin to build a fat jar containing all necessary dependencies -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <excludes>
                                <exclude>org.apache.flink:force-shading</exclude>
                                <exclude>com.google.code.findbugs:jsr305</exclude>
                                <exclude>org.slf4j:*</exclude>
                                <exclude>log4j:*</exclude>
                            </excludes>
                        </artifactSet>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <!-- Note: be sure to change this to the entry class containing your own job's main method -->
                                <mainClass>com.msy.main.streamMain</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
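
Packaging note: with this setup, running mvn clean package should produce the shaded fat jar under target/; that jar is what gets uploaded in the web UI step described below.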

Writing the Code

package com.msy.main;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class streamMain {
    public static void main(String[] args) throws Exception {
        // Create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(ParameterTool.fromArgs(args));
        env.fromElements(WORDS)
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] splits = value.toLowerCase().split("\\W+");

                        for (String split : splits) {
                            if (split.length() > 0) {
                                out.collect(new Tuple2<>(split, 1));
                            }
                        }
                    }
                })
                .keyBy(0)
                .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
                        return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
                    }
                })
                .print();
        // A streaming program must call execute() to actually start; otherwise nothing runs and no results appear
        env.execute("word count streaming demo");
    }

    private static final String[] WORDS = new String[]{
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer"
    };
}

Running

  • Run locally: simply execute the main method.
  • Package and run from the web UI: after packaging, open the UI, click Submit New Job, upload the jar, and fill in the fully qualified name of the main class to run (other parameters can be left unset for now).
  • The job submission status can be seen under Job Manager.
  • The log output printed by the job can be seen under Task Manager.
    (The JobManager is responsible for scheduling jobs while the TaskManagers execute them; a TaskManager provides task slots, and tasks run inside those slots, as covered in the previous chapter.)

Code Analysis

  • Create the StreamExecutionEnvironment (the runtime environment for the streaming program); getExecutionEnvironment() automatically picks a local or cluster environment depending on how the job is launched

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
  • Register a global configuration on the environment (parsed from the args array); a sketch of reading the values back follows the snippet

    env.getConfig().setGlobalJobParameters(ParameterTool.fromArgs(args));
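
    Registering them globally means any Rich*Function in the job can read them back at runtime; a minimal sketch (the parameter name myKey and its default are hypothetical):

    ParameterTool params = (ParameterTool) getRuntimeContext()
            .getExecutionConfig().getGlobalJobParameters();
    String myValue = params.get("myKey", "defaultValue");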
    
  • Build the data source; WORDS is a String array, and fromElements turns each element into one record of the stream

    env.fromElements(WORDS)
    
  • Split each line into words and emit a (word, 1) pair per word, where 1 means the word occurred once; the length check skips the empty leading token that split("\\W+") produces when a line starts with a non-word character

    flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            String[] splits = value.toLowerCase().split("\\W+");
    
            for (String split : splits) {
                if (split.length() > 0) {
                    out.collect(new Tuple2<>(split, 1));
                }
            }
        }
    })
    
  • Key the stream by the word (0 means keying on the first tuple field, i.e. the word; operators get a detailed treatment in a later chapter). A note on the newer key-selector form follows the snippet.

    keyBy(0)
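
    Note: from roughly Flink 1.11 onward, the index-based keyBy(0) is deprecated in favor of an explicit key selector; the equivalent form is:

    keyBy(value -> value.f0)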
    
  • Count occurrences per word: reduce merges each incoming (word, 1) with the running total for that key. A shorter built-in alternative follows the snippet.

    reduce(new ReduceFunction<Tuple2<String, Integer>>() {
        @Override
        public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
            return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
        }
    })
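
    For a plain count like this, the hand-written reduce could also be replaced by Flink's built-in aggregation on the keyed stream, which sums the tuple field at position 1:

    keyBy(0)
    sum(1)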
    
  • Print the resulting stream; records have the form (word, count), where count is how many times the word has been seen so far

    print()
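
    Because this is a streaming reduce, print() emits a running update for every incoming record rather than one final total: the three occurrences of "to" in WORDS appear as (to,1), (to,2) and (to,3) over time, and when parallelism is greater than 1 each line is prefixed with the printing subtask's index, e.g. 3> (to,3).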
    
  • Start executing the job

    env.execute(" word count streaming demo");
    