[Introduction to Flink] Flink reads Kafka data demo

Flink operator operations are mainly divided into three parts: source (data reading), transform (data processing), and sink (data output). This post walks through a short demo in which Flink reads data from Kafka and prints it to the console. Without further ado, let's go straight to the code.
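
In code, the three stages map onto the streaming API roughly as follows. This is only a minimal outline, not part of the demo itself; it uses an in-memory source purely for illustration, while the demo below uses Kafka as the source.

package com.fuyun.flink

import org.apache.flink.streaming.api.scala._

object PipelineOutline {
  def main(args: Array[String]): Unit = {
    val senv = StreamExecutionEnvironment.getExecutionEnvironment
    // source: read data (here from an in-memory collection, just for illustration)
    val source = senv.fromElements("hello", "", "flink")
    // transform: process the data
    val transformed = source.filter(_.nonEmpty)
    // sink: write the result out (here to stdout)
    transformed.print()
    senv.execute("pipeline outline")
  }
}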

pom.xml file content

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.fuyun</groupId>
    <artifactId>flinkLearning</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.12.0</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <!-- "provided" here means the dependency is only used at compile time, not at runtime or when packaging -->
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!--alibaba fastjson-->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.51</version>
        </dependency>
    </dependencies>
</project>

SourceTest code

package com.fuyun.flink

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object SourceTest {

  def main(args: Array[String]): Unit = {

    // create the stream execution environment
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka configuration
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "bigdata-training.fuyun.com:9092")
    properties.setProperty("group.id", "test")
    // read data from Kafka
    val kafkaStream = senv
      .addSource(new FlinkKafkaConsumer[String]("flinkSource", new SimpleStringSchema(), properties))

    // simple processing of the data read
    val resultDataStream = kafkaStream//.flatMap(_.split("\\s+"))
        .filter(_.nonEmpty)

    // print to the console
    resultDataStream.print()
    // trigger program execution, passing in the job name
    senv.execute("source test")
  }
}

The FlinkKafkaConsumer constructor takes three parameters. The first parameter is the name of the topic to read from.

The second parameter is a DeserializationSchema or KeyedDeserializationSchema. Messages in Kafka are stored as raw bytes, so they need to be deserialized into Java or Scala objects. The SimpleStringSchema used in the example above is a built-in DeserializationSchema that deserializes a byte array into a String. Flink also ships with schemas for Apache Avro and for text-based JSON encoding, and we can implement custom deserialization logic by implementing the DeserializationSchema or KeyedDeserializationSchema interface ourselves.

The third parameter is a Properties object, which is used to configure the Kafka client. This object must contain at least two entries, "bootstrap.servers" and "group.id".
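
For example, a custom schema could deserialize the JSON records produced later in this post straight into PlayStart objects (the Java model class shown below). The following is only a minimal sketch, assuming fastjson for parsing; the class name PlayStartDeserializationSchema is hypothetical.

package com.fuyun.flink

import java.nio.charset.StandardCharsets

import com.alibaba.fastjson.JSON
import com.fuyun.flink.model.PlayStart
import org.apache.flink.api.common.serialization.DeserializationSchema
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor

// hypothetical example: deserialize each Kafka record (a JSON string) into a PlayStart
class PlayStartDeserializationSchema extends DeserializationSchema[PlayStart] {

  override def deserialize(message: Array[Byte]): PlayStart =
    JSON.parseObject(new String(message, StandardCharsets.UTF_8), classOf[PlayStart])

  // the Kafka stream is unbounded, so no element marks the end of the stream
  override def isEndOfStream(nextElement: PlayStart): Boolean = false

  override def getProducedType: TypeInformation[PlayStart] =
    TypeExtractor.getForClass(classOf[PlayStart])
}

It would be used in place of SimpleStringSchema, e.g. new FlinkKafkaConsumer[PlayStart]("flinkSource", new PlayStartDeserializationSchema(), properties).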

Program execution flow:

Start Zookeeper first
${ZOOKEEPER_HOME}/bin/zkServer.sh start

Start kafka
${KAFKA_HOME}/bin/kafka-server-start.sh -daemon ${KAFKA_HOME}/config/server.properties

Create the topic (the replication factor must be less than or equal to the number of brokers)
${KAFKA_HOME}/bin/kafka-topics.sh --create --zookeeper bigdata-training.fuyun.com:2181/kafka --replication-factor 1 --partitions 1 --topic flinkSource

Parameter description:
--create: create a topic
--zookeeper: ZooKeeper address
--replication-factor: number of replicas
--partitions: number of partitions
--topic: topic name

Check whether the topic is created successfully
${KAFKA_HOME}/bin/kafka-topics.sh --list --zookeeper bigdata-training.fuyun.com:2181/kafka

  1. Send data to the topic via the console producer
    ${KAFKA_HOME}/bin/kafka-console-producer.sh --broker-list bigdata-training.fuyun.com:9092 --topic flinkSource

Start the program in IDEA, then type words into the Kafka console producer on the virtual machine; they are printed in the IDEA console.

  2. Generate some test data with a local program

Create a PlayStart class

package com.fuyun.flink.model;

import java.util.Map;

public class PlayStart {

    public String userID;
    public long timestamp;
    public Map<String, Object> fields;
    public Map<String, String> tags;

    public PlayStart() {
    }

    public PlayStart(String userID, long timestamp, Map<String, Object> fields, Map<String, String> tags) {
        this.userID = userID;
        this.timestamp = timestamp;
        this.fields = fields;
        this.tags = tags;
    }

    @Override
    public String toString() {
        return "PlayStart{" +
                "userID='" + userID + '\'' +
                ", timestamp='" + timestamp + '\'' +
                ", fields=" + fields +
                ", tags=" + tags +
                '}';
    }

    public String getUserID() {
        return userID;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public Map<String, Object> getFields() {
        return fields;
    }

    public Map<String, String> getTags() {
        return tags;
    }

    public void setUserID(String userID) {
        this.userID = userID;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public void setFields(Map<String, Object> fields) {
        this.fields = fields;
    }

    public void setTags(Map<String, String> tags) {
        this.tags = tags;
    }
}

Create the KafkaUtils class to send data to the topic corresponding to Kafka

package com.fuyun.flink.utils;

import com.alibaba.fastjson.JSON;
import com.fuyun.flink.model.PlayStart;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Random;

public class KafkaUtils {

    public static final String broker_list = "bigdata-training.fuyun.com:9092";
    public static final String topic = "flinkSource";  // Kafka topic; must match the topic read by the Flink job

    public static void writeToKafka() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", broker_list);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // key serializer
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // value serializer
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        PlayStart playStart = new PlayStart();
        playStart.setTimestamp(System.currentTimeMillis());
        int user = new Random().nextInt(10000000);
        playStart.setUserID("user"+user);
        Map<String, String> tags = new HashMap<>();
        Map<String, Object> fields = new HashMap<>();

        int ip = new Random().nextInt(100000)%255;
        tags.put("user_agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");
        tags.put("ip", "192.168.99."+ip);

        int program_id = new Random().nextInt(100000);
        int content_id = new Random().nextInt(100);
        int duration = new Random().nextInt(1000);
        fields.put("program_id", program_id);
        fields.put("content_id", program_id+""+content_id);
        fields.put("play_duration", duration);

        playStart.setTags(tags);
        playStart.setFields(fields);

        ProducerRecord<String, String> record = new ProducerRecord<>(topic, null, null, JSON.toJSONString(playStart));
        producer.send(record);
        System.out.println("发送数据: " + JSON.toJSONString(playStart));

        producer.flush();
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            Thread.sleep(3000);
            writeToKafka();
        }
    }
}

Start the KafkaUtils program and SourceTest locally in IDEA.
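
With both programs running, the Flink job prints the raw JSON strings produced by KafkaUtils. As an optional extension (a sketch only, not part of the original demo), the transform step in SourceTest could also parse each record into a PlayStart and filter on one of its fields; the threshold of 100 below is arbitrary.

// sketch only: assumes this snippet replaces the transform step inside SourceTest's main,
// where kafkaStream and the org.apache.flink.streaming.api.scala._ import are in scope
import com.alibaba.fastjson.JSON
import com.fuyun.flink.model.PlayStart

val playStream = kafkaStream
  .filter(_.nonEmpty)
  // parse the JSON string written by KafkaUtils into a PlayStart object
  .map(json => JSON.parseObject(json, classOf[PlayStart]))
  // keep only records whose play_duration field exceeds the (arbitrary) threshold of 100
  .filter(play => play.getFields.get("play_duration").toString.toInt > 100)

playStream.print()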
