[flink] Flink introductory tutorial demo

What is Flink? If you are new to the term it can be hard to grasp, and online tutorials tend to repeat the same official boilerplate. If you are already familiar with streams, it may help to think of Flink simply as stream processing. I am also just getting started, so this is a brief introductory summary; follow-up articles will keep improving on it.

A plain-language explanation of what Flink is and where it is used

Flink is a high-performance stream processing framework. In plain terms, it turns data into streams for processing; jobs can execute across multiple processes, and the distributed architecture supports cluster deployment.

So what are the actual application scenarios? To give everyday examples: we can use Flink to read the contents of a text file as a stream and run statistics over it, which is the most basic use; we can monitor a server port and continuously pull data from it for processing; we can consume messages from a message queue; and it also works fine in IoT scenarios. For example, a social networking site that wants to rank likes in real time can do so with Flink. In short, wherever there is data, Flink can process it. The port-monitoring case takes only a few lines, as sketched below.
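A minimal sketch of that port-monitoring case, assuming a hypothetical localhost:7777 source (for testing you could feed it with `nc -lk 7777`):

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SocketStreamDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Treat each line arriving on the port as one element of an unbounded stream
        // (hypothetical host and port, for illustration only)
        DataStreamSource<String> lines = env.socketTextStream("localhost", 7777);
        lines.print();
        env.execute("socket-demo");
    }
}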

Flink is memory-based, which is what makes it fast;
but, as with most components, in-memory state alone is not safe, so Flink provides a persistence mechanism: the checkpoint (sketched below);
and since Flink serves big-data workloads, it supports cluster deployment to mitigate the risk of downtime.
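A minimal sketch of enabling that checkpoint mechanism (the 10-second interval and the storage path are illustrative assumptions; setCheckpointStorage requires Flink 1.13+):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Snapshot operator state every 10 seconds so the job can recover after a failure
env.enableCheckpointing(10_000L);
// Where the snapshots are persisted (hypothetical local path; use a durable store in production)
env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");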

Of course, that would be using a sledgehammer to crack a nut; Flink is generally worth using only when the data volume is large.

Flink's processing flow and core APIs

Before that, let's look at the previous generation of architecture that existed before Flink emerged:
(figure: the Lambda architecture)
batch processing: ordered, low speed
stream processing: out of order, high speed
The Lambda architecture maintains two separate processing paths; Flink's arrival makes unified batch and stream processing possible.


Flink's four-layer API

  • Both stream processing and batch processing are built on DataStream and DataSet.
  • Early Flink batch processing was based on the DataSet API; starting with version 1.12, the DataStream API alone can handle both batch and stream processing, as the sketch below shows.
    (figure: Flink's layered APIs: SQL, Table API, DataStream/DataSet, stateful stream processing)
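A minimal sketch of that unification (Flink 1.12+): the same DataStream program can run as a batch job just by switching the runtime mode.

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Options: STREAMING (the default), BATCH, or AUTOMATIC (chosen from whether the sources are bounded)
env.setRuntimeExecutionMode(RuntimeExecutionMode.BATCH);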

Flink code quick start

Below is a quick-start example of using Flink in a Spring Boot environment. Be careful not to import the wrong packages (typical Maven dependencies are sketched right after this paragraph).
Our demo business scenario is to count the number of occurrences of each word in words.txt.
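For reference, a minimal sketch of the Maven dependencies such a demo typically needs. The 1.12.7 version and the _2.12 Scala suffix are assumptions for illustration; match them to your own Flink version.

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.12.7</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>1.12.7</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.12</artifactId>
    <version>1.12.7</version>
</dependency>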

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.AggregateOperator;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.operators.FlatMapOperator;
import org.apache.flink.api.java.operators.UnsortedGrouping;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.PostConstruct;

/**
 * DataSet API batch processing (ordered, low speed)
 *
 */

/**
 * Flink's layered APIs:
 *
 *   SQL                          -- highest-level language
 *   Table API                    -- declarative domain-specific language
 *   DataStream / DataSet API     -- core APIs
 *   (stream and batch processing are built on these two; early Flink batch jobs
 *    used the DataSet API, and since version 1.12 the DataStream API alone
 *    handles both batch and stream processing)
 *   Stateful stream processing   -- lowest-level APIs
 */
@RestController
public class DataSetAPIBatchWordCount {

    @PostConstruct
    public void test() throws Exception {

        // 1. Create an execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // 2. Read data from a file
        // DataSource extends Operator, Operator extends DataSet; a DataSource is therefore a DataSet
        DataSource<String> lineDataSource = env.readTextFile("input/words.txt");

        // 3. Processing logic: split each line into words and convert each word into a 2-tuple
        FlatMapOperator<String, Tuple2<String, Long>> wordAndOneTuple = lineDataSource.flatMap(
                // Break up each line and emit the pieces into a collector
                (String line, Collector<Tuple2<String, Long>> out) -> {
                    // Tokenize one line of text
                    String[] words = line.split(" ");
                    // Turn each word into a 2-tuple for grouping
                    for (String word : words) {
                        // Each occurrence of a word counts as 1
                        out.collect(Tuple2.of(word, 1L));
                    }
                    // Generic type erasure requires specifying the return type explicitly
                }).returns(Types.TUPLE(Types.STRING, Types.LONG));

        // 4. Group by word; groupBy accepts a field index (0 = the word field of the tuple)
        UnsortedGrouping<Tuple2<String, Long>> wordAndOneGroup = wordAndOneTuple.groupBy(0);

        // 5. Sum within each group; field index 1 is the count field (word is index 0, 1L is index 1)
        AggregateOperator<Tuple2<String, Long>> sum = wordAndOneGroup.sum(1);

        // 6. Print the result
        sum.print();


    }


}
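With the words.txt shown further below, this DataSet job should print final counts along these lines (tuple order may vary across runs):

(world,1)
(hello,2)
(java,2)
(test,3)

Next, the same word count implemented with the DataStream API: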


import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.PostConstruct;

/**
 * DataStream API batch processing
 * (specify the execution mode when launching the jar; see the sketch after this class)
 */
@RestController
public class DataStreamAPIBatchWordCount {

    @PostConstruct
    public void test() throws Exception {

        // 1. Create a streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Read the file (a bounded stream)
        DataStreamSource<String> lineDataStreamSource = env.readTextFile("input/words.txt");

        // 3. Transformation: split each line into (word, 1) tuples
        SingleOutputStreamOperator<Tuple2<String, Long>> wordAndOneTuple = lineDataStreamSource.flatMap((String line, Collector<Tuple2<String, Long>> out) -> {
            String[] words = line.split(" ");
            for (String word : words) {
                out.collect(Tuple2.of(word, 1L));
            }
            // Generic type erasure requires specifying the return type explicitly
        }).returns(Types.TUPLE(Types.STRING, Types.LONG));

        // 4. Key by word; a key selector extracts the word field (keyBy(0) would group by tuple index 0)
        KeyedStream<Tuple2<String, Long>, String> wordAndOneKeyedStream = wordAndOneTuple.keyBy(item -> item.f0);

        // 5. Sum the counts (field index 1)
        SingleOutputStreamOperator<Tuple2<String, Long>> sum = wordAndOneKeyedStream.sum(1);

        // 6. Print the result
        sum.print();

        // 7. Trigger execution; the steps above only define the dataflow
        env.execute();

        // The leading number is the subtask ID (parallelism defaults to the number of
        // CPU cores; the same word is always routed to the same subtask, so its count accumulates there)
//        3> (java,1)
//        9> (test,1)
//        5> (hello,1)
//        3> (java,2)
//        5> (hello,2)
//        9> (test,2)
//        9> (world,1)
//        9> (test,3)


    }
}
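The "specify the execution mode when launching the jar" note above refers to setting the runtime mode at submission time, roughly like this (the jar and entry-class names are illustrative assumptions):

bin/flink run -Dexecution.runtime-mode=BATCH -c com.example.DataStreamAPIBatchWordCount my-flink-job.jar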

The text file words.txt is located in the input directory under the project root:


test
hello test
world
hello java
java
test

Run: start the application's main method (both demo classes run automatically at startup via @PostConstruct).


Important concepts of Flink

JobManager
TaskManager

The JobManager is the scheduling center: it takes the program submitted by the client, turns it into tasks, and distributes them to TaskManagers for execution.
The TaskManager is where tasks are actually executed.
The JobManager can be understood as the master, and the TaskManagers as the workers (slaves); a sketch of the corresponding cluster configuration follows below.
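For a standalone cluster, these two roles map onto a few well-known flink-conf.yaml settings. A minimal sketch (the host name and slot counts are illustrative assumptions):

# flink-conf.yaml
jobmanager.rpc.address: master-host     # where the JobManager (master) runs
taskmanager.numberOfTaskSlots: 4        # parallel task slots per TaskManager (worker)
parallelism.default: 4

The JobManager and TaskManagers of a standalone cluster are then started with bin/start-cluster.sh.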

Source: blog.csdn.net/qq_36268103/article/details/129304436