A Complete End-to-End Flink Stream Computing Case

Chapter 1 Introduction

In the previous article, I covered preparing the Flink environment and setting up a simple Flink development environment. This article walks through a complete end-to-end Flink computing case: client => Web API service => Kafka => Flink => MySQL. As before, the Flink Table API/SQL is used as the example, and the whole setup is deployed with docker-compose. (Only the key parts of the code are shown in this article; for the complete code, please refer to the author's GitHub, to be published later.)

 

Chapter 2 docker-compose

2.1 Add docker-compose.yml file

version: '2'
services:
  jobmanager:
    image: zihaodeng/flink:1.11.1
    volumes:
      - D:/21docker/flinkDeploy:/opt/flinkDeploy
    hostname: "jobmanager"
    expose:
      - "6123"
    ports:
      - "4000:4000"
    command: jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: zihaodeng/flink:1.11.1
    volumes:
      - D:/21docker/flinkDeploy:/opt/flinkDeploy
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: taskmanager
    links:
      - jobmanager:jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  zookeeper:
    container_name: zookeeper
    image: zookeeper:3.6.1
    ports:
      - "2181:2181"
  kafka:
    container_name: kafka
    image: wurstmeister/kafka:2.12-2.5.0
    volumes:
      - D:/21docker/var/run/docker.sock:/var/run/docker.sock
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      #KAFKA_ADVERTISED_HOST_NAME: kafka
      HOSTNAME_COMMAND: "route -n | awk '/UG[ \t]/{print $$2}'"
      KAFKA_CREATE_TOPICS: "order-log:1:1"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      #KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://127.0.0.1:9092
      #KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
  mysql:
    image: mysql:5.7
    container_name: mysql
    volumes:
      - D:/21docker/mysql/data/db:/var/lib/mysql/
      - D:/21docker/mysql/mysql-3346.sock:/var/run/mysql.sock
      - D:/21docker/mysql/data/conf:/etc/mysql/conf.d
    ports:
      - 3306:3306
    command:
      --default-authentication-plugin=mysql_native_password
      --lower_case_table_names=1
    environment:
      MYSQL_ROOT_PASSWORD: 123456
      TZ: Asia/Shanghai

2.2 docker-compose start

$ docker-compose up -d

Check that the containers are running
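For example, listing the compose services should show the jobmanager, taskmanager, zookeeper, kafka and mysql containers in the Up state:

$ docker-compose ps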

With this, docker-compose is ready, and starting the working environment later will be very convenient. Next, prepare the corresponding programs.

 

Chapter 3 Creating WebApi Project

3.1 Create the WebApi (RESTful API) project

Spring Boot is used to quickly build the API project; the author uses a RESTful API here. Part of the code is shown below (see the author's GitHub for the complete code).

Create a POST endpoint for the client to call

@RestController
@RequestMapping("/order")
public class OrderController {

    @Autowired
    private Sender sender;

    @PostMapping
    public String insertOrder(@RequestBody Order order) {
        sender.producerKafka(order);
        return "{\"code\":0,\"message\":\"insert success\"}";
    }
}

Create the Sender class to send data to Kafka

@Service // registered as a Spring bean so it can be injected into OrderController
public class Sender {
    @Autowired
    private KafkaTemplate<String,String> kafkaTemplate;

    private static Random rand = new Random();

    public void producerKafka(Order order){
        order.setPayTime(String.valueOf(new Timestamp(System.currentTimeMillis()+ rand.nextInt(100))));//EventTime
        kafkaTemplate.send("order-log", JSON.toJSONString(order));
    }
}
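The Order POJO is not included in the excerpt above. A minimal sketch, assuming its fields mirror the columns declared in the Kafka source DDL in Chapter 4 (payTime, orderId, goodsId, userId, amount, address), could look like this:

import java.math.BigDecimal;

public class Order {
    private String payTime;      // event time, filled in by Sender before producing
    private Long orderId;
    private Integer goodsId;
    private Integer userId;
    private BigDecimal amount;
    private String address;

    public String getPayTime() { return payTime; }
    public void setPayTime(String payTime) { this.payTime = payTime; }
    // getters and setters for the remaining fields follow the same pattern
}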

 

Chapter 4 Creating Flink Jobs

Here the Flink Table API/SQL is used to implement a tumbling-window calculation: the data read from Kafka is aggregated per window and then written to MySQL.

public class Kafka2MysqlByEnd2End {
    public static void main(String[] args) throws Exception {
        // Kafka source
        String sourceSQL="CREATE TABLE order_source (\n" +
                "   payTime VARCHAR,\n" +
                "   rt as TO_TIMESTAMP(payTime),\n" +
                "   orderId BIGINT,\n" +
                "   goodsId INT,\n" +
                "   userId INT,\n" +
                "   amount DECIMAL(23,10),\n" +
                "   address VARCHAR,\n" +
                "   WATERMARK FOR rt as rt - INTERVAL '2' SECOND\n" +
                " ) WITH (\n" +
                "   'connector' = 'kafka-0.11',\n" +
                "   'topic'='order-log',\n" +
                "   'properties.bootstrap.servers'='kafka:9092',\n" +
                "   'format' = 'json',\n" +
                "   'scan.startup.mode' = 'latest-offset'\n" +
                ")";

        //Mysql sink
        String sinkSQL="CREATE TABLE order_sink (\n" +
                "   goodsId BIGINT,\n" +
                "   goodsName VARCHAR,\n" +
                "   amount DECIMAL(23,10),\n" +
                "   rowtime TIMESTAMP(3),\n" +
                "   PRIMARY KEY (goodsId) NOT ENFORCED\n" +
                " ) WITH (\n" +
                "   'connector' = 'jdbc',\n" +
                "   'url' = 'jdbc:mysql://mysql:3306/flinkdb?characterEncoding=utf-8&useSSL=false',\n" +
                "   'table-name' = 'good_sale',\n" +
                "   'username' = 'root',\n" +
                "   'password' = '123456',\n" +
                "   'sink.buffer-flush.max-rows' = '1',\n" +
                "   'sink.buffer-flush.interval' = '1s'\n" +
                ")";

        // Create the execution environment
        EnvironmentSettings settings = EnvironmentSettings
                .newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        //TableEnvironment tEnv = TableEnvironment.create(settings);
        StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();

        //sEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(1, TimeUnit.SECONDS)));
        //sEnv.enableCheckpointing(1000);
        //sEnv.setStateBackend(new FsStateBackend("file:///tmp/chkdir",false));

        StreamTableEnvironment tEnv= StreamTableEnvironment.create(sEnv,settings);

        Configuration configuration = tEnv.getConfig().getConfiguration();
        // Set the parallelism to 1
        configuration.set(CoreOptions.DEFAULT_PARALLELISM, 1);

        // Register the source table
        tEnv.executeSql(sourceSQL);
        // Register the sink table
        tEnv.executeSql(sinkSQL);

        // Register the UDF used in this job
        tEnv.createFunction("exchangeGoods", ExchangeGoodsName.class);

        String strSQL=" SELECT " +
                "   goodsId," +
                "   exchangeGoods(goodsId) as goodsName, " +
                "   sum(amount) as amount, " +
                "   tumble_start(rt, interval '5' seconds) as rowtime " +
                " FROM order_source " +
                " GROUP BY tumble(rt, interval '5' seconds),goodsId";

        // Query the data and insert it into the sink
        // (executeInsert() already submits the job in Flink 1.11, so no separate tEnv.execute() call is needed)
        tEnv.sqlQuery(strSQL).executeInsert("order_sink");
    }
}
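The ExchangeGoodsName UDF registered above is not shown in the excerpt. A minimal sketch, assuming it simply maps a goodsId to a readable goods name (the mapping below is purely hypothetical), could look like this:

import org.apache.flink.table.functions.ScalarFunction;

public class ExchangeGoodsName extends ScalarFunction {
    // Hypothetical goodsId -> goodsName mapping; replace with a real lookup
    public String eval(Integer goodsId) {
        if (goodsId == null) {
            return "unknown";
        }
        switch (goodsId) {
            case 1:  return "cake";
            case 2:  return "bread";
            default: return "goods-" + goodsId;
        }
    }
}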

 

Chapter 5 Creating MySQL Database Tables

5.1 Create the database

mysql> create database flinkdb;

5.2 Create a table

mysql> create table good_sale(goodsId bigint primary key, goodsName varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci, amount decimal(23,10), rowtime timestamp);

Note: the primary key here must be the same as the primary key defined in the Flink job DDL. When a primary key is defined in the Flink DDL, the JDBC connector works in upsert mode: Flink decides, based on the primary key, whether to insert a new row or update an existing one, which keeps the sink idempotent (for MySQL this effectively becomes an INSERT ... ON DUPLICATE KEY UPDATE). If no primary key is defined, the connector works in append mode (plain INSERTs), so the insert fails if the primary key conflicts.

At this point, the environment and code are ready; next, run the job and verify the results.

 

Chapter 6 Running Jobs

6.1 Local debugging verification

6.1.1 Start the cluster

Using the method of Chapter 2, start our prepared cluster environment:

$ docker-compose up -d

 

6.1.2 Start job

Start the WebApi project and the Flink job project in the local IDEA.

6.1.3 Initiate a request

Here an HTTP tool is used to simulate the client, which initiates a request and sends order data in JSON format.
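For example, a request like the following posts one order to the WebApi service (the field values are made up for illustration, and the port assumes Spring Boot's default 8080 when the project is started from the IDE):

$ curl -X POST http://localhost:8080/order \
    -H "Content-Type: application/json" \
    -d '{"orderId":1001,"goodsId":1,"userId":1,"amount":9.9,"address":"Shanghai"}'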

6.1.4 View the running status of the Flink job

View the log directly in the IDEA console

6.1.5 View results

View the results in MySQL
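For example, using the database and table created in Chapter 5:

mysql> use flinkdb;
mysql> select * from good_sale;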

 

At this point, you can see that the job debugged in the local IDE has successfully pulled data from the source and, after Flink's computation, written it to MySQL. Next, we will submit the job to the cluster and run it there.

6.2 Cluster operation verification

6.2.1 Packaging

Package the WebAPI project jar

Go to the lotemall-webapi directory (the author's WebApi project) and run the packaging command:

mvn clean package -DskipTests

Prepare a Dockerfile and place it in the same directory as the jar package (a minimal sketch is shown below), then build the image with the docker build command that follows.
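A minimal Dockerfile sketch for packaging a plain Spring Boot fat jar; the base image and jar file name here are assumptions and should match your actual build output:

FROM openjdk:8-jre
# the jar name is hypothetical; use the artifact actually produced by mvn package
COPY lotemall-webapi-0.0.1.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]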

$ docker build -t lotemall-webapi .

Package the Flink job jar

Go to the flink-kafka2mysql directory (the author's Flink job project) and run the packaging command.

Note: because the job will be submitted to run inside the containers, the source and sink connection addresses in the configuration must be changed from localhost to the container names (kafka and mysql).

mvn clean package -DskipTests

Put the packaged jar into the Docker mount directory (D:/21docker/flinkDeploy, which is mounted to /opt/flinkDeploy in the containers).

6.2.2 Start the cluster

Using the method of Chapter 2, start our prepared cluster environment: 

$ docker-compose up -d

6.2.3 Start WebAPI project

Run the docker run command to start the WebAPI project

$ docker run --link kafka:kafka --net flink-online_default -e TZ=Asia/Shanghai -d -p 8090:8080 lotemall-webapi

6.2.4 Run Flink job

Enter the flink jobmanager container and run the job

$ docker exec -it flink-online_jobmanager_1 /bin/bash
$ bin/flink run -d -c sql.Kafka2MysqlByEnd2End /opt/flinkDeploy/flink-kafka2mysql-0.0.1.jar

Or submit the job jar package through the web interface

After the job is submitted, open the Flink web UI and you can see that the submitted job has started running.

6.2.5 Initiate a request

As in section 6.1.3, use the HTTP tool to simulate the client and send order data in JSON format (note that the WebAPI container is mapped to port 8090 on the host).

6.2.6 View the running status of Flink jobs

Since the job was submitted to the cluster, you can view its running status directly in the Flink web UI.

You can also check the generated watermarks there.

6.2.7 View results 

View the results in MySQL

The record in the first row, whose goodsName is "cake", is the result produced by the Flink job running on this cluster.

 

In summary, this article showed how to conveniently deploy the cluster with docker-compose and walked through a complete Flink stream computing case from end to end.


Origin blog.csdn.net/dzh284616172/article/details/109251225