Flink Basic API

  Flink represents data sets with DataSet and DataStream. DataSet is used for batch processing and represents bounded data; DataStream is used for stream processing and represents unbounded data. Data in a dataset is immutable: elements cannot be added or removed. We create a DataSet or DataStream from a data source, then apply conversion (transform) operations such as map and filter to produce new datasets. A Flink program consists of these steps:

  Obtain the execution environment

  Create the input data

  Apply conversion operations to the data sets (hereinafter collectively referred to as: transform)

  Output the data

  Trigger program execution

  Below we introduce the basic Flink APIs involved in writing such programs.

  Input and output

  First, you need to obtain an execution environment. Flink provides the following three ways:

  getExecutionEnvironment()

  createLocalEnvironment()

  createRemoteEnvironment(String host, int port, String... jarFiles)

  The following code creates an execution environment using the first method.

  Batch:

  

 
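The original code block was not preserved. A minimal sketch of what it likely contained, assuming the words.txt input file used later in the article and the Flink 1.x DataSet API:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchJob {
    public static void main(String[] args) throws Exception {
        // Obtain the batch execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Create the input data from a text file
        DataSet<String> text = env.readTextFile("words.txt");
        // Output the data set to the console (print() triggers execution for batch jobs)
        text.print();
    }
}
```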

  Stream processing:

  

 
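The streaming code block is likewise missing; a sketch under the same assumptions (input file words.txt, Flink 1.x DataStream API), matching the print/execute calls described below:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // Obtain the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Create the input data
        DataStream<String> text = env.readTextFile("words.txt");
        // Output the data to the console
        text.print();
        // Streaming programs must call execute() explicitly to trigger execution
        env.execute("streaming job");
    }
}
```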

  Contents of the words.txt file:

  

 
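The original file contents were lost in extraction; any small text file works. A hypothetical example:

```
hello flink
hello world
```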

  The code above creates an execution environment and uses env to create an input source. You can call print on a dataset to output its data to the console, or call other methods such as writeAsText to output the data to other media. The last line of the stream-processing code calls the execute method; stream processing programs must call this method explicitly to trigger execution.

  There are two ways to run the above code. One is to execute it directly in the IDE, like a normal Java program; Flink will start a local execution environment. The other is to package it and submit it to a Flink cluster. The examples above contain the basic skeleton of a Flink program but no further transform operations on the datasets. Below we briefly introduce the basic transform operations.

  map operation

  The map operation here is similar to the map in MapReduce: it parses and processes the data. Examples are as follows.

  Batch:

  

 
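The batch code block was not preserved. Based on the description below (each word becomes a (word, 1) tuple), a sketch assuming words.txt holds one word per line:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class BatchMapJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Assumption: one word per line in words.txt
        DataSet<String> words = env.readTextFile("words.txt");
        // map each word into a (word, 1) tuple
        DataSet<Tuple2<String, Integer>> tuples =
            words.map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String word) {
                    return new Tuple2<>(word, 1);
                }
            });
        tuples.print();
    }
}
```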

  Stream processing:

  

 
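The streaming code block is also missing; a sketch with the same per-record logic, where only the environment and dataset types differ from the batch version:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamMapJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Assumption: one word per line in words.txt
        DataStream<String> lines = env.readTextFile("words.txt");
        // Identical map logic to the batch version
        DataStream<Tuple2<String, Integer>> words =
            lines.map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String word) {
                    return new Tuple2<>(word, 1);
                }
            });
        words.print();
        // Streaming programs must trigger execution explicitly
        env.execute("stream map");
    }
}
```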

  Apart from the different types of the batch datasets and data streams, the code is written identically. Each map turns a word into a (word, 1) tuple. Besides map there are similar transforms such as filter, which filters out unwanted records; readers can try them on their own.

  Specifying the key

  Big data processing often needs to process data along some dimension, which requires specifying a key. Use groupBy to specify the key on a DataSet, and keyBy on a DataStream. Here we use keyBy as the example.

  Flink's data model is not based on key-value pairs; the key is virtual and can be seen as a function defined over the data.

  Defining the key on a Tuple

  KeyedStream<Tuple2<String, Integer>, Tuple> keyed = words.keyBy(0); // 0 refers to the first element of the Tuple2

  KeyedStream<Tuple2<String, Integer>, Tuple> keyed = words.keyBy(0, 1); // 0,1 means the first and second tuple elements together form the key

  For nested tuples:

  DataStream<Tuple3<Tuple2<Integer, Float>, String, Long>> ds;

  ds.keyBy(0) will use the whole Tuple2<Integer, Float> as the key.

  Specifying the key with a field expression

  

 
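The code block here was not preserved. The text below only tells us that the word field of a WC object is used as the key; a hypothetical sketch (the WC class and its fields are assumptions):

```java
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FieldExpressionExample {
    // Hypothetical POJO with a public 'word' field
    public static class WC {
        public String word;
        public int count;
        public WC() {}
        public WC(String word, int count) { this.word = word; this.count = count; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<WC> words = env.fromElements(new WC("hello", 1), new WC("world", 1));
        // Key the stream by the 'word' field using a field expression
        KeyedStream<WC, Tuple> keyed = words.keyBy("word");
        keyed.print();
        env.execute("field expression example");
    }
}
```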

  Here the word field of the WC object is specified as the key. Field expression syntax is as follows:

  Use the field name of a Java object as the key, as in the example above.

  For Tuple types, use the field name (f0, f1, ...) or an offset (starting from 0) to specify the key; for example, f0 and f5 refer to the first and sixth fields of the Tuple.

  Nested fields of Java objects and Tuples can serve as the key; for example, f1.user.zip means the zip field of the user object inside the Tuple's second field is used as the key.

  The wildcard * selects the entire type as the key.

  Examples of field expressions:

  

 
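The class definitions these examples refer to were lost in extraction. Judging by the field expressions listed below, they appear to be the example classes from the official Flink documentation, roughly:

```java
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.hadoop.io.IntWritable;

public class FieldExpressionClasses {
    public static class WC {
        public ComplexNestedClass complex; // nested POJO
        private int count;
        // getter/setter make the private 'count' a valid POJO field
        public int getCount() { return count; }
        public void setCount(int c) { this.count = c; }
    }

    public static class ComplexNestedClass {
        public Integer someNumber;
        public float someFloat;
        public Tuple3<Long, Long, String> word;
        public IntWritable hadoopCitizen;
    }
}
```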

  

  count: the count field of the WC class

  complex: all fields of complex (recursively)

  complex.word.f2: the third field of the Tuple3 word in ComplexNestedClass

  complex.hadoopCitizen: the hadoopCitizen field of the complex class

  Specifying the key with a Key Selector

  A key selector function specifies the key: it takes each element as input and outputs that element's key. An example follows.

  

 
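The code block was not preserved. Given that the text says the effect equals keyBy(0), a sketch of a KeySelector returning the first tuple field:

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeySelectorExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Integer>> words =
            env.fromElements(new Tuple2<>("hello", 1), new Tuple2<>("world", 1));
        // The selector receives each element and returns its key;
        // returning f0 is equivalent to keyBy(0)
        KeyedStream<Tuple2<String, Integer>, String> keyed =
            words.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                @Override
                public String getKey(Tuple2<String, Integer> t) {
                    return t.f0;
                }
            });
        keyed.print();
        env.execute("key selector example");
    }
}
```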

  As you can see, the effect is the same as keyBy(0).

  These are the ways to specify a key in Flink.

  Summary

  This article introduced the basic skeleton of a Flink program: obtaining the environment, creating an input source, applying transforms to the datasets, and producing output. Since data processing often aggregates along different dimensions (different keys), this article focused on how to specify a key in Flink. Subsequent posts will continue to cover the Flink API.

  


Origin blog.csdn.net/qianfeng_dashuju/article/details/94738777