Flink sink in practice, part two: Kafka

This article is the second in the "Flink sink in practice" series. The previous article, "Flink sink in practice, part one: a preliminary exploration", gave a basic understanding of sinks; this one walks through sinking data to Kafka.

Version and environment preparation

The environment and versions used for this exercise are as follows:

  1. JDK:1.8.0_211
  2. Flink: 1.9.2
  3. Maven:3.6.0
  4. Operating system: macOS Catalina 10.15.3 (MacBook Pro 13-inch, 2018)
  5. IDEA:2018.3.5 (Ultimate Edition)
  6. Kafka:2.4.0
  7. Zookeeper:3.5.5

Please ensure that the above environment and services are ready before proceeding.

Source code download

If you don't want to write the code yourself, the source code for the entire series can be downloaded from GitHub (https://github.com/zq2599/blog_demos). The addresses and links are listed below:

  1. Project homepage: https://github.com/zq2599/blog_demos (the project's homepage on GitHub)
  2. Git repository address (https): https://github.com/zq2599/blog_demos.git (repository address of the project source code, https protocol)
  3. Git repository address (ssh): [email protected]:zq2599/blog_demos.git (repository address of the project source code, ssh protocol)

This git project contains multiple folders; the application for this article is under the flinksinkdemo folder. With the preparation complete, we can start development.

Check the official documentation

Before formal coding, check the relevant information on the official website to understand the basics:

  1. Documentation address: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/kafka.html
  2. The Kafka used here is version 2.4.0; the documentation lists the matching dependency and classes for each Kafka version (for 2.4.0 it is the universal connector, flink-connector-kafka).

Prepare Kafka

  1. Create a topic named test006 with four partitions:
./kafka-topics.sh \
--create \
--bootstrap-server 127.0.0.1:9092 \
--replication-factor 1 \
--partitions 4 \
--topic test006
  2. Start a console consumer on test006 so that incoming messages are printed as they arrive:
./kafka-console-consumer.sh \
--bootstrap-server 127.0.0.1:9092 \
--topic test006
  3. From now on, any message produced to the topic will show up in this console;
  4. With Kafka ready, we can move on to coding; an optional verification command is shown after this list.
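To confirm that test006 really was created with four partitions, kafka-topics.sh can also describe the topic (an optional check, not required for the rest of this article):
./kafka-topics.sh \
--describe \
--bootstrap-server 127.0.0.1:9092 \
--topic test006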

Create project

  1. Create a Flink project with the Maven archetype command:
mvn \
archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.9.2
  2. When prompted, enter com.bolingcavalry for the groupId and flinksinkdemo for the artifactId to generate the Maven project;
  3. Add the Kafka connector dependency to pom.xml:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.11</artifactId>
  <version>1.9.0</version>
</dependency>
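The 1.9.0 connector above is the version the original demo used; if you prefer the connector version to match the Flink version, a 1.9.2 release of the same artifact should also be available on Maven Central, in which case the dependency would be:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.11</artifactId>
  <version>1.9.2</version>
</dependency>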
  4. After the project is created, start writing the code for the Flink job;

Sink for sending string messages

First, try sending messages of type String:

  1. Create an implementation of the KafkaSerializationSchema interface; an instance of it will be passed in when constructing the sink:
package com.bolingcavalry.addsink;

import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.charset.StandardCharsets;

public class ProducerStringSerializationSchema implements KafkaSerializationSchema<String> {

    private String topic;

    public ProducerStringSerializationSchema(String topic) {
        super();
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        return new ProducerRecord<byte[], byte[]>(topic, element.getBytes(StandardCharsets.UTF_8));
    }
}
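As an aside, ProducerRecord also has a constructor that takes a message key, which Kafka's partitioner uses to choose the partition: records with equal keys land in the same partition. A minimal sketch of a keyed variant of the method above (using the element itself as the key is purely an illustrative choice, not part of the original demo):
    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        // using the element as the key routes equal strings to the same partition
        byte[] key = element.getBytes(StandardCharsets.UTF_8);
        return new ProducerRecord<>(topic, key, element.getBytes(StandardCharsets.UTF_8));
    }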
  2. Create the task class KafkaStrSink. Pay attention to the parameters of the FlinkKafkaProducer constructor: FlinkKafkaProducer.Semantic.EXACTLY_ONCE requests exactly-once delivery (see the note after the code for what this requires):
package com.bolingcavalry.addsink;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class KafkaStrSink {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // set the overall parallelism to 1
        env.setParallelism(1);

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.50.43:9092");

        String topic = "test006";
        FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(topic,
                new ProducerStringSerializationSchema(topic),
                properties,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        // create a List of test strings to send (note that "aaa" appears twice)
        List<String> list = new ArrayList<>();
        list.add("aaa");
        list.add("bbb");
        list.add("ccc");
        list.add("ddd");
        list.add("eee");
        list.add("fff");
        list.add("aaa");

        // send every element of the collection to kafka; the sink runs with parallelism 4
        env.fromCollection(list)
           .addSink(producer)
           .setParallelism(4);

        env.execute("sink demo : kafka str");
    }
}
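A note on EXACTLY_ONCE: the connector implements it with Kafka transactions that are committed only when a Flink checkpoint completes, and the producer's transaction timeout must fit within the broker's transaction.max.timeout.ms (15 minutes by default). For a long-running job, a hedged sketch of the extra setup (the values here are illustrative) would be:
        // enable checkpointing so the transactional producer can commit on checkpoints
        env.enableCheckpointing(5000);
        // keep the producer's transaction timeout within the broker's 15-minute default
        properties.setProperty("transaction.timeout.ms", "900000");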
  3. Compile and build with Maven to get the file flinksinkdemo-1.0-SNAPSHOT.jar in the target directory (a sketch of the build command follows this list);
  4. Submit flinksinkdemo-1.0-SNAPSHOT.jar on the Flink web UI and specify the execution class;
  5. After a successful submission, if Flink has four available slots, the task executes immediately and the messages arrive at the console that is consuming the test006 topic;
  6. The task execution status can be checked on the web UI.
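For the build in step 3, a typical command (assuming the standard quickstart pom, which includes the shade plugin) is:
mvn clean package -DskipTests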

Sink for sending object messages

Now let's try sending messages of an object type, using Flink's commonly used Tuple2:

  1. Create an implementation of the KafkaSerializationSchema interface, which will be passed in when constructing the sink. Pay attention to the comment in the exception handler: printStackTrace() is dangerous in a production environment:
package com.bolingcavalry.addsink;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonProcessingException;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;
import javax.annotation.Nullable;

public class ObjSerializationSchema implements KafkaSerializationSchema<Tuple2<String, Integer>> {

    private String topic;
    private ObjectMapper mapper;

    public ObjSerializationSchema(String topic) {
        super();
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(Tuple2<String, Integer> stringIntegerTuple2, @Nullable Long timestamp) {
        byte[] b = null;
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        try {
            b= mapper.writeValueAsBytes(stringIntegerTuple2);
        } catch (JsonProcessingException e) {
            // Note: this is a very dangerous operation in a production environment;
            // printing too many errors can seriously degrade performance, so adjust
            // the handling to suit your production setup
            e.printStackTrace();
        }
        return new ProducerRecord<byte[], byte[]>(topic, b);
    }
}
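To see what payload this schema produces, a quick standalone check can help (this is not part of the job; the class name is hypothetical, and it assumes Jackson's default visibility rules, under which Tuple2's public fields f0 and f1 are serialized):
package com.bolingcavalry.addsink;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;

public class PayloadCheck {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // prints the JSON the sink would send, with Tuple2's fields as f0/f1,
        // e.g. something like {"f0":"aaa","f1":1}
        System.out.println(new String(mapper.writeValueAsBytes(Tuple2.of("aaa", 1))));
    }
}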
  2. Create the Flink task class:
package com.bolingcavalry.addsink;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class KafkaObjSink {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // set the overall parallelism to 1
        env.setParallelism(1);

        Properties properties = new Properties();
        // address of the kafka broker
        properties.setProperty("bootstrap.servers", "192.168.50.43:9092");

        String topic = "test006";
        FlinkKafkaProducer<Tuple2<String, Integer>> producer = new FlinkKafkaProducer<>(topic,
                new ObjSerializationSchema(topic),
                properties,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        // create a List of Tuple2 elements (note that "aaa" appears twice)
        List<Tuple2<String, Integer>> list = new ArrayList<>();
        list.add(new Tuple2<>("aaa", 1));
        list.add(new Tuple2<>("bbb", 1));
        list.add(new Tuple2<>("ccc", 1));
        list.add(new Tuple2<>("ddd", 1));
        list.add(new Tuple2<>("eee", 1));
        list.add(new Tuple2<>("fff", 1));
        list.add(new Tuple2<>("aaa", 1));

        // count occurrences of each word and sink the running totals to kafka
        env.fromCollection(list)
            .keyBy(0)
            .sum(1)
            .addSink(producer)
            .setParallelism(4);
        
        env.execute("sink demo : kafka obj");
    }
}
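As a side note, positional keyBy(0) was deprecated in later Flink releases (1.11 and up) in favor of key selectors; on a newer version, an equivalent pipeline would look like this sketch:
        env.fromCollection(list)
            .keyBy(t -> t.f0)
            .sum(1)
            .addSink(producer)
            .setParallelism(4);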
  3. Compile and build as before, submit the jar to Flink, and specify com.bolingcavalry.addsink.KafkaObjSink as the execution class;
  4. The console consuming test006 now prints the word counts as JSON messages;
  5. The execution status can be checked on the Flink web UI.

At this point, Flink has sent its computation results to Kafka as messages, and this exercise is complete. I hope it provides a useful reference; in the next article, we will continue to explore the sink capabilities that Flink provides out of the box.

Welcome to follow my WeChat official account: programmer Xinchen



Origin: blog.csdn.net/boling_cavalry/article/details/105598224