[Flink Basics] Demo: Reading Kafka Data with Flink

Flink operators fall into three broad categories: source (reading data), transform (processing data), and sink (writing data out). This post walks through a simple demo in which Flink reads data from Kafka and prints it to the console. Without further ado, on to the code.

Contents of pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.fuyun</groupId>
    <artifactId>flinkLearning</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.12.0</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <!-- "provided" means this dependency is only used at compile time; it is not included at runtime or when packaging -->
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!--alibaba fastjson-->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.51</version>
        </dependency>
    </dependencies>
</project>

SourceTest code

package com.fuyun.flink

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object SourceTest {

  def main(args: Array[String]): Unit = {

    // create the streaming execution environment
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka connection settings
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "bigdata-training.fuyun.com:9092")
    properties.setProperty("group.id", "test")

    // read data from Kafka
    val kafkaStream = senv
      .addSource(new FlinkKafkaConsumer[String]("flinkSource", new SimpleStringSchema(), properties))

    // simple processing of the incoming records
    val resultDataStream = kafkaStream//.flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)

    // print to the console
    resultDataStream.print()

    // trigger execution, passing in the job name
    senv.execute("source test")
  }
}

The FlinkKafkaConsumer constructor takes three arguments. The first defines the name of the topic to read from.

The second argument is a DeserializationSchema or KeyedDeserializationSchema. Messages in Kafka are stored as raw bytes, so they must be deserialized into Java or Scala objects. The SimpleStringSchema used above is a built-in DeserializationSchema that deserializes a byte array into a String. Flink also ships implementations for Apache Avro and text-based JSON encodings. Custom deserialization logic can be added by implementing the public DeserializationSchema or KeyedDeserializationSchema interfaces.
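
For cases where SimpleStringSchema is not enough, a custom schema can be plugged in. Below is a minimal sketch of such a DeserializationSchema; the class name UpperCaseStringSchema and the upper-casing logic are purely illustrative and not part of the original demo:

import java.nio.charset.StandardCharsets

import org.apache.flink.api.common.serialization.DeserializationSchema
import org.apache.flink.api.common.typeinfo.{TypeInformation, Types}

// illustrative custom schema: decode each Kafka record as UTF-8 and upper-case it
class UpperCaseStringSchema extends DeserializationSchema[String] {

  override def deserialize(message: Array[Byte]): String =
    new String(message, StandardCharsets.UTF_8).toUpperCase

  // a Kafka stream is unbounded, so never signal end of stream
  override def isEndOfStream(nextElement: String): Boolean = false

  // tell Flink which type this schema produces
  override def getProducedType: TypeInformation[String] = Types.STRING
}

Such a class can then be passed to the FlinkKafkaConsumer constructor in place of SimpleStringSchema.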

The third argument is a Properties object that configures the Kafka client. It must contain at least two entries: "bootstrap.servers" and "group.id".
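
Beyond those two required entries, any standard Kafka consumer property can be passed through the same Properties object, and the start position can be controlled on the consumer itself. A sketch of what the corresponding lines of SourceTest could look like (the extra settings are illustrative and not part of the original demo):

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "bigdata-training.fuyun.com:9092")
    properties.setProperty("group.id", "test")
    // standard Kafka consumer setting: where to start when the group has no committed offsets
    properties.setProperty("auto.offset.reset", "earliest")

    val consumer = new FlinkKafkaConsumer[String]("flinkSource", new SimpleStringSchema(), properties)
    // the start position can also be set explicitly; setStartFromEarliest()/setStartFromLatest() are alternatives
    consumer.setStartFromGroupOffsets()

    val kafkaStream = senv.addSource(consumer)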

Program execution steps:

First, start ZooKeeper
${ZOOKEEPER_HOME}/bin/zkServer.sh start

Start Kafka
${KAFKA_HOME}/bin/kafka-server-start.sh -daemon ${KAFKA_HOME}/config/server.properties

Create the topic (the replication factor must be less than or equal to the number of brokers)
${KAFKA_HOME}/bin/kafka-topics.sh --create --zookeeper bigdata-training.fuyun.com:2181/kafka --replication-factor 1 --partitions 1 --topic flinkSource

Parameter description:
--create: create a topic
--zookeeper: ZooKeeper address
--replication-factor: number of replicas
--partitions: number of partitions
--topic: topic name

Check that the topic was created successfully
${KAFKA_HOME}/bin/kafka-topics.sh --list --zookeeper bigdata-training.fuyun.com:2181/kafka

  1. Send data to the topic via the console producer
    ${KAFKA_HOME}/bin/kafka-console-producer.sh --broker-list bigdata-training.fuyun.com:9092 --topic flinkSource

Start the SourceTest program in IDEA, then type words into the Kafka console producer on the virtual machine; they will be printed in the IDEA console.

  2. Generate some test data with a local program

Create a PlayStart class

package com.fuyun.flink.model;

import java.util.Map;

public class PlayStart {

    public String userID;
    public long timestamp;
    public Map<String, Object> fields;
    public Map<String, String> tags;

    public PlayStart() {
    }

    public PlayStart(String userID, long timestamp, Map<String, Object> fields, Map<String, String> tags) {
        this.userID = userID;
        this.timestamp = timestamp;
        this.fields = fields;
        this.tags = tags;
    }

    @Override
    public String toString() {
        return "PlayStart{" +
                "userID='" + userID + '\'' +
                ", timestamp=" + timestamp +
                ", fields=" + fields +
                ", tags=" + tags +
                '}';
    }

    public String getUserID() {
        return userID;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public Map<String, Object> getFields() {
        return fields;
    }

    public Map<String, String> getTags() {
        return tags;
    }

    public void setUserID(String userID) {
        this.userID = userID;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public void setFields(Map<String, Object> fields) {
        this.fields = fields;
    }

    public void setTags(Map<String, String> tags) {
        this.tags = tags;
    }
}

Create a KafkaUtils class that sends data to the corresponding Kafka topic

package com.fuyun.flink.utils;

import com.alibaba.fastjson.JSON;
import com.fuyun.flink.model.PlayStart;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Random;

public class KafkaUtils {

    public static final String broker_list = "bigdata-training.fuyun.com:9092";
    // Kafka topic; must match the topic used in the Flink program
    public static final String topic = "flinkSource";

    public static void writeToKafka() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", broker_list);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");   // key serializer
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // value serializer
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        PlayStart playStart = new PlayStart();
        playStart.setTimestamp(System.currentTimeMillis());
        int user = new Random().nextInt(10000000);
        playStart.setUserID("user" + user);
        Map<String, String> tags = new HashMap<>();
        Map<String, Object> fields = new HashMap<>();

        int ip = new Random().nextInt(100000) % 255;
        tags.put("user_agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");
        tags.put("ip", "192.168.99." + ip);

        int program_id = new Random().nextInt(100000);
        int content_id = new Random().nextInt(100);
        int duration = new Random().nextInt(1000);
        fields.put("program_id", program_id);
        fields.put("content_id", program_id + "" + content_id);
        fields.put("play_duration", duration);

        playStart.setTags(tags);
        playStart.setFields(fields);

        ProducerRecord<String, String> record = new ProducerRecord<>(topic, null, null, JSON.toJSONString(playStart));
        producer.send(record);
        System.out.println("Sent record: " + JSON.toJSONString(playStart));

        producer.flush();
        // close the producer, since a new one is created on every call
        producer.close();
    }

    public static void main(String[] args) throws InterruptedException {
        // send one record every 3 seconds
        while (true) {
            Thread.sleep(3000);
            writeToKafka();
        }
    }
}
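
KafkaUtils writes each PlayStart as a JSON string, so the Flink job can map the records back into PlayStart objects with fastjson instead of only filtering them. Below is a minimal sketch of such a variant; the object name PlayStartSourceTest is made up for illustration, and it assumes the PlayStart class above is on the classpath:

package com.fuyun.flink

import java.util.Properties

import com.alibaba.fastjson.JSON
import com.fuyun.flink.model.PlayStart
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object PlayStartSourceTest {

  def main(args: Array[String]): Unit = {
    val senv = StreamExecutionEnvironment.getExecutionEnvironment

    // same Kafka settings as SourceTest
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "bigdata-training.fuyun.com:9092")
    properties.setProperty("group.id", "test")

    // read the JSON strings and parse each one into a PlayStart object
    val playStartStream = senv
      .addSource(new FlinkKafkaConsumer[String]("flinkSource", new SimpleStringSchema(), properties))
      .filter(_.nonEmpty)
      .map(json => JSON.parseObject(json, classOf[PlayStart]))

    // PlayStart.toString is used when printing to the console
    playStartStream.print()

    senv.execute("play start source test")
  }
}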

Start the KafkaUtils program and SourceTest locally in IDEA.


Reposted from blog.csdn.net/lz6363/article/details/113994011