[Flink Course] ---- 9.1 Setting Up a Flink Cluster with Ambari

1 Introduction

1.1 Overview

   Stateful Computations over Data Streams

   Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.

The following outlines the important aspects of Flink's architecture.

Processing Unbounded and Bounded Data

Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, and user interactions on a website or mobile application: all of this data is generated as a stream.

Data can be processed as unbounded or bounded streams.

  1. Unbounded streams have a start but no defined end. They do not terminate, and they keep providing data as it is generated. Unbounded streams must be processed continuously, i.e. events must be handled promptly after they are ingested; it is not possible to wait for all input to arrive, because the input is unbounded and will never be complete at any point in time. Processing unbounded data typically requires that events are ingested in a specific order (for example, the order in which they occurred) so that the completeness of the results can be reasoned about.
  2. Bounded streams have a defined start and end. A bounded stream can be processed by ingesting all of its data before performing any computation. Ordered ingestion is not required, because a bounded data set can always be sorted. Processing of bounded streams is also known as batch processing.

Apache Flink excels at processing both unbounded and bounded data sets. Precise control of time and state allows Flink's runtime to run any kind of application on unbounded streams. Bounded streams are processed internally by algorithms and data structures that are designed for fixed-size data sets, which yields excellent performance.
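To make the distinction concrete, here is a minimal, hedged Java sketch (the class name, file path, host and port are illustrative assumptions, not part of this guide): the DataSet API consumes a finite input as a whole, while the DataStream API keeps processing events as they arrive.

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedVsUnbounded {
    public static void main(String[] args) throws Exception {
        // Bounded: the DataSet API reads a finite file; results are produced after all input has been ingested.
        ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
        batchEnv.readTextFile("hdfs:///path/to/finite/input").first(10).print();

        // Unbounded: the DataStream API keeps processing events as they arrive (here from a socket).
        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        streamEnv.socketTextStream("localhost", 9999).print();
        streamEnv.execute("unbounded-example");
    }
}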

You can convince yourself by exploring the use cases that have been built on top of Flink.

Deploy Applications Anywhere

Apache Flink is a distributed system and needs compute resources to execute applications. Flink integrates with all common cluster resource managers such as Hadoop YARN, Apache Mesos, and Kubernetes, but it can also be set up to run as a standalone cluster.

Flink is designed to work well with each of the resource managers listed above. This is achieved through resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way.

When deploying a Flink application, Flink automatically identifies the required resources based on the application's configured parallelism and requests them from the resource manager. If a failure occurs, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls, which eases the integration of Flink into many environments.

Run Applications at any Scale

Flink is designed to run stateful streaming applications at any scale. Applications are parallelized into potentially thousands of tasks that are distributed and executed concurrently across a cluster. An application can therefore leverage virtually unlimited amounts of CPU, main memory, disk, and network I/O. Moreover, Flink easily maintains very large application state. Its asynchronous and incremental checkpointing algorithm ensures minimal impact on processing latency while guaranteeing exactly-once state consistency.

Users have reported impressive scalability numbers for Flink applications running in their production environments, for example:

  • applications processing multiple trillions of events per day,
  • applications maintaining multiple terabytes of state, and
  • applications running on thousands of cores.

Leverage In-Memory Performance

Stateful Flink applications are optimized for local state access. Task state is always kept in memory or, if the state size exceeds the available memory, in access-efficient on-disk data structures. Tasks therefore perform all computations by accessing local, usually in-memory, state, which yields very low processing latency. Flink guarantees exactly-once state consistency in case of failures by periodically and asynchronously checkpointing the local state to durable storage.
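As a hedged illustration of the checkpointing behaviour described above (the interval, HDFS path, source and class name are assumptions for illustration only), a Flink 1.6 streaming job can enable periodic, asynchronous checkpoints to durable storage roughly like this:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Snapshot local state every 60 seconds; exactly-once is the default checkpointing mode.
        env.enableCheckpointing(60000);
        // Working state stays on the TaskManager heap; snapshots are written asynchronously to HDFS.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints", true));
        env.socketTextStream("localhost", 9999).print();
        env.execute("checkpoint-sketch");
    }
}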

1.2 Architecture

1.3 Flink Features

  1. Supports both batch processing and data stream processing
  2. Elegant and fluent Java and Scala APIs (Flink itself is mostly written in Java)
  3. High throughput and low latency at the same time
  4. Support for event time and out-of-order processing in the DataStream API, based on the Dataflow model
  5. Flexible windows (time, count, session, custom triggers) across different time semantics (event time, processing time); see the sketch after this list
  6. Exactly-once fault-tolerance guarantees
  7. Automatic backpressure handling
  8. Libraries for graph processing (batch), machine learning (batch), and complex event processing (streaming)
  9. Built-in support for iterative programs (BSP) in the DataSet (batch) API
  10. Efficient custom memory management, with robust switching between in-memory and out-of-core processing
  11. Compatibility with Hadoop MapReduce and Storm
  12. Integration with YARN, HDFS, HBase, and other components of the Hadoop ecosystem
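The following is a hedged sketch of the event-time windows mentioned in item 5 above (the host, port and the "word,epochMillis" input format are assumptions for illustration only): it counts words in 1-minute event-time tumbling windows with the Flink 1.6 DataStream API.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Windows are driven by the timestamp carried in each event, not by the operator's wall clock.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        env.socketTextStream("localhost", 9999)
                // Assumed input format: one "word,epochMillis" pair per line.
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String line) {
                        String[] parts = line.split(",");
                        return Tuple2.of(parts[0], Long.parseLong(parts[1]));
                    }
                })
                // Extract the event timestamp; this simple extractor assumes timestamps arrive in ascending order.
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                        return element.f1;
                    }
                })
                // Replace the timestamp with a count of 1 per word; the event timestamp stays attached to the record.
                .map(new MapFunction<Tuple2<String, Long>, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(Tuple2<String, Long> event) {
                        return Tuple2.of(event.f0, 1);
                    }
                })
                .keyBy(0)                       // group by word
                .timeWindow(Time.minutes(1))    // 1-minute event-time tumbling window
                .sum(1)                         // count occurrences per window
                .print();

        env.execute("event-time-window-sketch");
    }
}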

1.4 Application Scenarios

  1. Multiple (sometimes unreliable) data sources: when data is produced by millions of different users or devices, it is safe to assume that some events will arrive in a different order than they were produced, and that, due to upstream failures, some events may arrive hours late. This late data still needs to be processed so that the results remain accurate.
  2. Application state management: once an application becomes more complex than simple filtering or data enrichment, managing its state (for example counters, windows over past data, state machines, embedded databases) becomes hard. Flink provides tools to keep this state efficient, fault tolerant, and controllable, so you do not have to build these capabilities yourself.
  3. Fast data processing: there is a focus on real-time and near-real-time use cases, where data should be available from the moment it is generated. Where necessary, Flink is fully capable of meeting these latency requirements.
  4. Processing data at scale: such programs need to be distributed across many nodes to reach the required scale. Flink runs on large clusters just as seamlessly as on small ones.

2 Installation

2.1 Installing Flink

2.1.1 Preparation

2.1.1.1 Basic Requirements

2.1.1.1.1 Operating System

CentOS 7.2 or later

 

2.1.1.1.2 JDK, Maven, and HDP Versions

  • jdk-1.8.0_141 or later
  • maven-3.3.5 or later
  • hdp-2.6.4.0-91

2.1.1.1.3 Obtaining the Flink lib Packages

Option 1: build flink-1.6.0 yourself (recommended; time-consuming, but with the best compatibility)

  1. # wget https://github.com/apache/flink/archive/release-1.6.0.tar.gz
  2. # tar -zxvf release-1.6.0.tar.gz    (the archive unpacks to flink-release-1.6.0; here it is placed under /data02/maven/soft/)
  3. # cd /data02/maven/soft/flink-release-1.6.0
  4. # mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.3.2.4.0-91

Build output directory: /data02/maven/soft/flink-release-1.6.0/

Build environment:

  • jdk-1.8
  • flink-1.6.0
  • hdp-hadoop-2.7.3.2.4.0-91
  • maven-3.5.4

Build time: about 1 hour

Option 2: download a pre-built package (recommended; fast, but compatibility may be worse)

# cd /opt/

# wget http://www.us.apache.org/dist/flink/flink-1.6.0/flink-1.6.0-bin-hadoop27-scala_2.11.tgz

# tar -zxvf flink-1.6.0-bin-hadoop27-scala_2.11.tgz

# mv flink-1.6.0 flink

# chown -R hdfs:hdfs flink/*

Note: every node of the Flink cluster needs these packages.

 

2.1.1.1.4 Adjust Directory Permissions

chmod  755  /var/run

chmod  755  /etc/profile

2.1.1.1.5 Installation User

The installation needs to be performed as the root user.

2.1.1.1.6 Every Node Must Have the Host/IP Mappings of All Nodes

172.16.5.117 bdp03nn01

172.16.5.118 bdp03nn02

172.16.5.119 bdp03dn01

2.1.1.2 Version Selection

Based on the Flink release notes, compatibility and scalability need to be taken into account.

Apache Flink: 1.6.0

2.1.1.3 Role Planning

Host                        Role                        Notes

172.16.5.117 bdp03nn01      Flink gateway and client

172.16.5.118 bdp03nn02      Flink gateway

172.16.5.119 bdp03dn01      Flink gateway

2.1.2 Installation Steps

Download the Flink service definition on the Ambari Server host.

2.1.2.1 Download ambari-flink-service

1) Run on the host where Ambari Server is installed:

#VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`

#sudo git clone https://github.com/highfei2011/ambari-flink-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/FLINK

Note:

https://github.com/highfei2011/ambari-flink-service.git

This is the author's GitHub repository; if you want to use a different Flink version, the corresponding parameters in this repository need to be updated as well.

2) Change to the directory:

#cd /opt/

3) Download on every cluster node:

#wget http://www.us.apache.org/dist/flink/flink-1.6.0/flink-1.6.0-bin-hadoop27-scala_2.11.tgz

4) Extract to /opt on every cluster node:

tar -zxvf flink-1.6.0-bin-hadoop27-scala_2.11.tgz

mv /opt/flink-1.6.0  /opt/flink

chown  -R flink:flink /opt/flink/

chmod 777 -R /opt/flink/

sudo mkdir -p /opt/flink/conf

 

5) Add environment variables on every node

Append the following environment variables to /etc/profile on every node:

export HADOOP_CLASSPATH=`hadoop classpath`

export CLASSPATH=$CLASSPATH:$HADOOP_CLASSPATH

export FLINK_HOME=/opt/flink/

export PATH=$FLINK_HOME/bin:$PATH

export PATH

source /etc/profile

 

2.1.2.2 Restart ambari-server

Run on the Ambari Server host:

#sudo systemctl restart ambari-server

or

#sudo service ambari-server restart

2.1.2.3 Install Flink

In Ambari, choose Actions -> Add Service.

Select Flink 1.6.0.

 

Select the host on which Flink should run.

The service is now added.

2.1.2.4 Restart the Cluster

2.1.2.5 Parameter Tuning

Adjust the Flink parameters in Ambari.

A. Network buffer size

If you run Flink with very high parallelism, you may need to increase the number of network buffers. By default, Flink uses 10% of the JVM heap size as network buffer memory, with a minimum of 64 MB and a maximum of 1 GB; this can be configured with the parameters below. Why are network buffers needed? See:

https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#configuring-the-network-buffers

taskmanager.network.memory.min    minimum network buffer memory, in bytes (default 64 MB)

taskmanager.network.memory.max    maximum network buffer memory, in bytes (default 1 GB)

taskmanager.network.memory.fraction    fraction of JVM memory used for network buffers (default 0.1)

Defaults:

taskmanager.network.memory.fraction: 0.1

taskmanager.network.memory.min: 67108864

taskmanager.network.memory.max: 1073741824

Startup exception: org.apache.flink.configuration.IllegalConfigurationException: Invalid configuration value for (taskmanager.network.memory.fraction, taskmanager.network.memory.min, taskmanager.network.memory.max) : (0.1, 67108864, 1073741824) - Network buffer memory size too large: 67108864 >= 8388608 (total JVM memory size)
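Reading the numbers in this message: 67,108,864 bytes is the configured taskmanager.network.memory.min (64 MB), while 8,388,608 bytes (8 MB) is what Flink reports as the total JVM memory of the TaskManager. The configured minimum already exceeds the whole JVM, so the check fails. The fix is either to give the TaskManager more memory or to lower taskmanager.network.memory.min (and possibly taskmanager.network.memory.fraction) so that the network buffer memory fits into the JVM.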

B. YARN parameters

yarn.nodemanager.resource.memory-mb=30G    maximum memory available to each NodeManager

yarn.scheduler.maximum-allocation-mb=30G    maximum memory a single container may request

yarn.scheduler.minimum-allocation-mb=1024M    minimum memory for a single container

containerized.heap-cutoff-min=400
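A rough sketch of how this cutoff interacts with the YARN container size (assuming Flink's default containerized.heap-cutoff-ratio of 0.25): Flink reserves max(containerized.heap-cutoff-min, ratio × container memory) of each container for non-heap use. For a 1024 MB TaskManager container with the cutoff-min of 400 set above, that is max(400, 0.25 × 1024 = 256) = 400 MB reserved, leaving roughly 624 MB for the JVM heap.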

Reference: https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html

C. Flink-on-YARN parameters

With the following setting, Flink can run on YARN directly:

<property>

      <name>yarn.client.failover-proxy-provider</name>

      <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>

</property>

 

With the following setting, Flink cannot run on YARN:

<property>

      <name>yarn.client.failover-proxy-provider</name>

      <value>org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider</value>

</property>

After changing this parameter, restart the YARN cluster.

 

D. Add environment variable

export HADOOP_CLASSPATH=`hadoop classpath`

2.2 Testing Flink

2.2.1 Command-Line Tests

2.2.1.1 flink run job

Resource manager: Flink on YARN

Host: bdp03nn01

Submitting user: hdfs

Command:

flink run --jobmanager yarn-cluster \

-yn 1 \

-ytm 768 \

-yjm 768 \

/opt/flink/examples/batch/WordCount.jar \

--input hdfs://bdp03nn01:8020/user/hdfs/demo/input/word  \

--output hdfs://bdp03nn01:8020/user/hdfs/demo/output/wc/
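In this command, --jobmanager yarn-cluster starts a dedicated YARN session for the job, -yn 1 requests one YARN container for TaskManagers, and -yjm 768 / -ytm 768 set the JobManager and TaskManager memory to 768 MB; --input and --output are arguments of the WordCount example itself.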

 

 

 

 

 

Wait for the job to finish.

Check the output:

# hdfs dfs -cat /user/hdfs/demo/output/wc

While the job is running, you can open the web UI at:

http://host:8081

Port 8081 must be reachable; add these host/IP mappings if you access the UI by hostname:

172.16.5.117 bdp03nn01

172.16.5.118 bdp03nn02

172.16.5.119 bdp03dn01
 

2.2.1.2 Flink SQL Client

# sql-client.sh embedded
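As a quick smoke test (taken from the SQL Client documentation linked below, so treat the exact statement as an assumed example), type a trivial query at the Flink SQL> prompt:

SELECT 'Hello World';

The query is executed as a Flink job and prints a single row with the constant value.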

Reference:

https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/sqlClient.html#starting-the-sql-client-cli

2.2.2 Programming Test (Batch)

Development environment: IntelliJ IDEA 2018.1.1, jdk-1.8, flink-1.6.0, maven-3.5.4

Prerequisites for building the project:

https://ci.apache.org/projects/flink/flink-docs-release-1.6/quickstart/scala_api_quickstart.html#maven

https://flink.apache.org/downloads.html
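A common way to bootstrap such a project (from the quickstart linked above; the archetype coordinates are standard, but the interactive prompts will ask for your own groupId/artifactId, which are assumptions here) is the Maven archetype:

mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.6.0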
 

2.2.2.1 The pom.xml File

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.acewell</groupId>
    <artifactId>dev-flink-1</artifactId>
    <version>1.0-SNAPSHOT</version>
    <!--  ====================================================================  -->
    <!--  ===============             Properties                ===============  -->
    <!--  ====================================================================  -->
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <!-- jvm options -->
        <PermGen>128m</PermGen>
        <MaxPermGen>1024m</MaxPermGen>
        <CodeCacheSize>1024m</CodeCacheSize>
        <!--add  maven release-->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <!--maven-scala-plugin-->
        <maven.scala.plugin>2.10</maven.scala.plugin>
        <!-- log4j / slf4j logging dependency versions -->
        <slf4j.version>1.7.7</slf4j.version>
        <log4j.version>1.2.17</log4j.version>

        <!--jvm version-->
        <jvm.version>1.8</jvm.version>
        <!--flink version-->
        <apache.flink.version>1.6.0</apache.flink.version>
        <!--scala version-->
        <scala.version>2.11</scala.version>
        <!--kafka-flink version-->
        <apache.kafka.flink.version>0.10_2.11</apache.kafka.flink.version>
        <!--Alibaba fastjson-->
        <fastjson.version>1.2.47</fastjson.version>
        <!-- roundeights version-->
        <roundeights.version>1.2.0</roundeights.version>
        <!-- joda-time version-->
        <joda-time.version>2.9.1</joda-time.version>
        <!-- scalaj version-->
        <scalaj.version>2.3.0</scalaj.version>
        <!-- json4s version-->
        <json4s.version>3.3.0</json4s.version>
        <!-- flinkspector version -->
        <flinkspector.version>0.9.1</flinkspector.version>

    </properties>
    <repositories>
        <!--  ====================================================================  -->
        <!--  =============== Repository addresses (for resolving vendor CDH/HDP dependency jars) ===============  -->
        <!--  ====================================================================  -->
        <repository>
            <id>horton-works-releases</id>
            <url>http://repo.hortonworks.com/content/groups/public/</url>
        </repository>
        <repository>
            <id>apache maven</id>
            <url>https://repo.maven.apache.org/maven2/</url>
        </repository>
        <repository>
            <id>mvn repository</id>
            <url>https://mvnrepository.com/artifact/</url>
        </repository>
        <repository>
            <id>CDH</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <organization>
        <name>Acewill</name>
    </organization>

    <developers>
        <developer>
            <name>Jeff Yang</name>
            <email>[email protected]</email>
        </developer>
    </developers>
    <!-- =================================================================== -->
    <!-- ===================== Project dependencies ======================== -->
    <!-- =================================================================== -->
    <dependencies>
        <!--  ====================================================================  -->
        <!--  ===============        Flink test dependencies (flinkspector)        ==========  -->
        <!--  ====================================================================  -->
        <dependency>
            <groupId>io.flinkspector</groupId>
            <artifactId>flinkspector-datastream_2.11</artifactId>
            <version>${flinkspector.version}</version>
        </dependency>
        <dependency>
            <groupId>io.flinkspector</groupId>
            <artifactId>flinkspector-dataset_2.11</artifactId>
            <version>${flinkspector.version}</version>
        </dependency>
        <!--  ====================================================================  -->
        <!--  ===============        Flink Kafka connector dependency              ==========  -->
        <!--  ====================================================================  -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
            <version>${apache.flink.version}</version>
        </dependency>
        <!--  ====================================================================  -->
        <!--  ===============        Logging dependencies                          ==========  -->
        <!--  ====================================================================  -->
        <!-- log start -->
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <!-- slf4j API, for convenient log formatting -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <!--  ====================================================================  -->
        <!--  ===============        Flink core dependencies (matching version)    ==========  -->
        <!--  ====================================================================  -->

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${apache.flink.version}</version>
        </dependency>

        <!-- flink-table -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table_2.11</artifactId>
            <version>${apache.flink.version}</version>
        </dependency>
        <!--  ====================================================================  -->
        <!--  ===============        Flink web/monitoring dependency (matching version) ==========  -->
        <!--  ====================================================================  -->

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web_2.11</artifactId>
            <version>${apache.flink.version}</version>

        </dependency>

    </dependencies>

    <!--  ====================================================================  -->
    <!--  ===============              Maven build / packaging   ===============  -->
    <!--  ====================================================================  -->
    <build>
        <finalName>dev-flink-1.6</finalName>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <outputDirectory>target/java-${java.version}/classes</outputDirectory>
        <testOutputDirectory>target/java-${java.version}/test-classes</testOutputDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.21.0</version>
                <executions>
                    <!--execute all the unit tests-->
                    <execution>
                        <id>default-test</id>
                        <phase>test</phase>
                        <goals>
                            <goal>test</goal>
                        </goals>
                        <configuration>
                            <includes>
                                <include>**/*Test.*</include>
                            </includes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>${jvm.version}</source>
                    <target>${jvm.version}</target>
                </configuration>
            </plugin>

            <!-- ================================== Package the dependencies into the jar as well (fat jar) =======================================-->
           <!-- <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>-->

        </plugins>
    </build>

</project>

2.2.2.2 Writing the Test Class and Utility Class

WordCount.java

package cn.acewill.flink.batch;

import cn.acewill.flink.utils.WordCountData;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.util.Collector;

/**
 * @author Created by yangjf on 20180920.
 * Update date:
 * Time: 18:12
 * Project: dev-flink-1.6
 * Package: cn.acewell.flink.batch
 * Describe :
 * Frequency:
 * Result of Test: test ok
 * Command:
 * <p>
 * Email:  [email protected]
 * Status:Using online
 * <p>
 * Please note:
 * Always verify that the configuration file is correct before every submission!
 * Data is priceless! Deleting it by accident has serious consequences!
 */
public class WordCount {
    // *************************************************************************
    //     PROGRAM
    // *************************************************************************

    public static void main(String[] args) throws Exception {

        final ParameterTool params = ParameterTool.fromArgs(args);

        // set up the execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // make parameters available in the web interface
        env.getConfig().setGlobalJobParameters(params);


        // get input data
        DataSet<String> text;
        if (params.has("input")) {
            // read the text file from given input path
            text = env.readTextFile(params.get("input"));
        } else {
            // get default test text data
            System.out.println("Executing WordCountWindow example with default input data set.");
            System.out.println("Use --input to specify file input.");
            text = WordCountData.getDefaultTextLineDataSet(env);
        }

        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines in pairs (2-tuples) containing: (word,1)
                text.flatMap(new Tokenizer())
                        // group by the tuple field "0" and sum up tuple field "1"
                        .groupBy(0)
                        .sum(1)
                        .setParallelism(2)
                // Parallelism of this pipeline stage: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html
                // With setParallelism(2), operators of this kind (transformations/operators, data sources, and sinks) run with 2 parallel instances.
                ;

        // emit result
        if (params.has("output")) {
            counts.writeAsCsv(params.get("output"), "\n", " ");
            // execute program
            env.execute("WordCountWindow Example");
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            counts.print();
        }

    }

    // *************************************************************************
    //     USER FUNCTIONS
    // *************************************************************************

    /**
     * Implements the string tokenizer that splits sentences into words as a user-defined
     * FlatMapFunction. The function takes a line (String) and splits it into
     * multiple pairs in the form of "(word,1)" ({@code Tuple2<String, Integer>}).
     */
    public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");

            // emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
                // Artificial 1-second delay per emitted token, presumably to keep the demo job running long enough to watch in the web UI.
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

WordCountData.java

package cn.acewill.flink.utils;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
/**
 * @author Created by yangjf on 20180920.
 * Update date:
 * Time: 18:14
 * Project: dev-flink-1.6
 * Package: cn.acewell.flink.utils
 * Describe :
 *    Provides the default data sets used for the WordCountWindow example program.
 *    The default data sets are used, if no parameters are given to the program.
 * Frequency: Calculate once a day.
 * Result of Test: test ok
 * Command:
 * <p>
 * Email:  [email protected]
 * Status:Using online
 * <p>
 * Please note:
 * Always verify that the configuration file is correct before every submission!
 * Data is priceless! Deleting it by accident has serious consequences!
 */


public class WordCountData {

    public static final String[] WORDS = new String[]{
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer",
            "The slings and arrows of outrageous fortune",
            "Or to take arms against a sea of troubles,",
            "And by opposing end them?--To die,--to sleep,--",
            "No more; and by a sleep to say we end",
            "The heartache, and the thousand natural shocks",
            "That flesh is heir to,--'tis a consummation",
            "Devoutly to be wish'd. To die,--to sleep;--",
            "To sleep! perchance to dream:--ay, there's the rub;",
            "For in that sleep of death what dreams may come,",
            "When we have shuffled off this mortal coil,",
            "When we have shuffled off this mortal coil,",
            "When we have shuffled off this mortal coil,",
            "When we have shuffled off this mortal coil,",
            "Must give us pause: there's the respect",
            "Must give us pause: there's the respect",
            "Must give us pause: there's the respect",
            "Must give us pause: there's the respect",
            "That makes calamity of so long life;",
            "For who would bear the whips and scorns of time,",
            "The oppressor's wrong, the proud man's contumely,",
            "The pangs of despis'd love, the law's delay,",
            "The insolence of office, and the spurns",
            "The insolence of office, and the spurns",
            "The insolence of office, and the spurns",
            "That patient merit of the unworthy takes,",
            "When he himself might his quietus make",
            "With a bare bodkin? who would these fardels bear,",
            "To grunt and sweat under a weary life,",
            "But that the dread of something after death,--",
            "The undiscover'd country, from whose bourn",
            "No traveller returns,--puzzles the will,",
            "And makes us rather bear those ills we have",
            "Than fly to others that we know not of?",
            "Thus conscience does make cowards of us all;",
            "And thus the native hue of resolution",
            "Is sicklied o'er with the pale cast of thought;",
            "And enterprises of great pith and moment,",
            "With this regard, their currents turn awry,",
            "And lose the name of action.--Soft you now!",
            "The fair Ophelia!--Nymph, in thy orisons",
            "Be all my sins remember'd."
    };

    public static DataSet<String> getDefaultTextLineDataSet(ExecutionEnvironment env) {
        return env.fromElements(WORDS);
    }
}

2.2.2.3 Running WordCount.java

View the word-count results:
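When run without --input/--output, the program falls back to the built-in WordCountData text and prints one (word, count) tuple per line to the console; for example, a word that appears only once in the default text shows up as (question,1).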

3 Adding Monitoring

Grafana + Prometheus

3.1 Installing Grafana

Ambari ships with Grafana, so it only needs to be configured.
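A hedged sketch of wiring Flink metrics into Prometheus (the property names follow the Flink metrics documentation referenced in section 5; the port range is an arbitrary example): copy the flink-metrics-prometheus jar shipped in the Flink distribution's opt/ directory into lib/, add the following to the Flink configuration, point a Prometheus scrape job at the JobManager/TaskManager hosts, and then import a Flink dashboard (for example the Grafana dashboard linked in section 5) into Grafana.

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

metrics.reporter.prom.port: 9249-9260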

4 FAQ

4.1 Can Flink installed through Ambari be used in production?

The author of the service definition does not recommend it for production use, but many companies have already run it in production.

4.2 Ways to Start Flink

cd ${FLINK_HOME}

Run a YARN session in the background:

yarn-session.sh -n 1 -s 1 -jm 768 -tm 1024 -qu default -nm flinkapp-from-ambari -d >> /var/log/flink/flink-test.log
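Here -n 1 requests one TaskManager container, -s 1 gives it one task slot, -jm 768 and -tm 1024 set the JobManager and TaskManager memory in MB, -qu default selects the YARN queue, -nm sets the application name shown in YARN, and -d detaches the session so it keeps running in the background.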

 

./bin/flink run --jobmanager yarn-cluster  --yarnqueue offline --yarnjobManagerMemory 1024  --yarncontainer 2  --yarntaskManagerMemory 1024 --yarnslots 3 ./examples/batch/WordCount.jar --input hdfs:///user/hdfs/demo/data/wc.txt --output hdfs:///user/hdfs/demo/result/wc
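This is the same kind of submission as in section 2.2.1.1, written with the long option names: --yarncontainer corresponds to -yn, --yarnjobManagerMemory to -yjm, --yarntaskManagerMemory to -ytm, --yarnslots to -ys, and --yarnqueue to -yqu.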
 

5 References

Building Flink:

https://community.hortonworks.com/articles/2659/exploring-apache-flink-with-hdp.html

http://doc.flink-china.org/1.1.0/setup/building.html

 

Flink tutorial: https://ci.apache.org/projects/flink/flink-docs-release-1.6/quickstart/setup_quickstart.html

Pre-built packages:

http://www.gtlib.gatech.edu/pub/apache/flink/flink-1.6.0/

http://www.us.apache.org/dist/flink/flink-1.6.0/

 

Installation reference: https://community.hortonworks.com/articles/2659/exploring-apache-flink-with-hdp.html

Run on YARN: https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/deployment/yarn_setup.html

Flink examples:

https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/batch/examples.html#word-count

https://ci.apache.org/projects/flink/flink-docs-release-1.6/examples/

 

Flink training: http://training.data-artisans.com/

Flink reference project: https://github.com/highfei2011/flink-training-exercises

 

Flink metrics: https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/metrics.html#system-metrics

Grafana plugins: https://grafana.com/dashboards/5151

 

Flink configuration: https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html


Reposted from blog.csdn.net/high2011/article/details/90272331