Getting Started with Kafka Streams Development (2)

Background

The previous article introduced the map operation in Kafka Streams. Today we look at another classic transformation, filter. As before, we walk through it by developing a concrete example.

What the demo does

This article demonstrates filter, which evaluates each message in real time against a given predicate and forwards only the messages that match. The input topic uses the following message format:

{"name": "George R. R. Martin", "title": "A Song of Ice and Fire"}

{"name": "C.S. Lewis", "title": "The Silver Chair"}

We want to select every message whose name is "George R. R. Martin" and send it to the output topic.
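
To make the selection rule concrete, here is a minimal sketch in plain Java, with no Kafka dependency, of the kind of predicate this article applies. The nested Publication class is a hypothetical stand-in for the Protobuf-generated class created later, and the names FilterPredicateSketch, byAuthor, and keepMatching are illustrative only. The contract is the same as for filter: a record is kept when the predicate returns true and dropped otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterPredicateSketch {

    // Hypothetical stand-in for the Protobuf-generated Publication class.
    static final class Publication {
        private final String name;
        private final String title;

        Publication(String name, String title) {
            this.name = name;
            this.title = title;
        }

        String getName() { return name; }
        String getTitle() { return title; }
    }

    // Builds a predicate that is true only for publications by the given author.
    // Calling equals on the known author keeps the check safe if getName() is null.
    static Predicate<Publication> byAuthor(String author) {
        return publication -> author.equals(publication.getName());
    }

    // Keeps the records for which the predicate returns true, in input order.
    static List<Publication> keepMatching(List<Publication> input, Predicate<Publication> predicate) {
        List<Publication> kept = new ArrayList<>();
        for (Publication publication : input) {
            if (predicate.test(publication)) {
                kept.add(publication);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Publication> input = Arrays.asList(
                new Publication("George R. R. Martin", "A Song of Ice and Fire"),
                new Publication("C.S. Lewis", "The Silver Chair"));

        for (Publication publication : keepMatching(input, byAuthor("George R. R. Martin"))) {
            System.out.println(publication.getName() + " -> " + publication.getTitle());
        }
    }
}
```

The comparison is an exact string match, so the author name in the predicate has to be spelled exactly as it appears in the messages, including the periods and spaces.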

Initialize the project

Create a project directory:

mkdir filter-streams
cd filter-streams/

Configure the project

Create a build.gradle file in the filter-streams directory with the following content:

buildscript {

    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:4.0.2'
    }
}

plugins {
    id 'java'
    id "com.google.protobuf" version "0.8.10"
}
apply plugin: 'com.github.johnrengelman.shadow'


repositories {
    mavenCentral()
    jcenter()

    maven {
        url 'https://packages.confluent.io/maven'
    }
}

group 'huxihx.kafkastreams'

sourceCompatibility = 1.8
targetCompatibility = '1.8'
version = '0.0.1'

dependencies {
    implementation 'org.slf4j:slf4j-simple:1.7.26'
    implementation 'org.apache.kafka:kafka-streams:2.3.0'
    implementation 'com.google.protobuf:protobuf-java:3.9.1'

    testCompile group: 'junit', name: 'junit', version: '4.12'
}

protobuf {
    generatedFilesBaseDir = "$projectDir/src/"
    protoc {
        artifact = 'com.google.protobuf:protoc:3.0.0'
    }
}

jar {
    manifest {
        attributes(
                'Class-Path': configurations.compile.collect { it.getName() }.join(' '),
                'Main-Class': 'huxihx.kafkastreams.FilteredStreamsApp'
        )
    }
}

shadowJar {
    archiveName = "kstreams-transform-standalone-${version}.${extension}"
}

Then run the following command to generate the Gradle wrapper:

gradle wrapper

Next, create a folder named configuration in the filter-streams directory to hold our configuration parameters:

mkdir configuration

Create a file named dev.properties in that folder with the following content:

application.id=filtering-app
bootstrap.servers=localhost:9092

input.topic.name=publications
input.topic.partitions=1
input.topic.replication.factor=1

output.topic.name=filtered-publications
output.topic.partitions=1
output.topic.replication.factor=1
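
As a rough illustration of how an application can consume these settings, the sketch below loads the same keys with java.util.Properties and converts the partition count and replication factor to the int and short values that Kafka's NewTopic constructor takes. EnvPropertiesSketch, TopicSettings, load, and settingsFor are hypothetical names introduced for this example; they are not part of the project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class EnvPropertiesSketch {

    // Hypothetical value holder for the settings of one topic.
    static final class TopicSettings {
        final String name;
        final int partitions;
        final short replicationFactor;

        TopicSettings(String name, int partitions, short replicationFactor) {
            this.name = name;
            this.partitions = partitions;
            this.replicationFactor = replicationFactor;
        }
    }

    // Parses properties-file text; a real application would read the file instead.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads "<prefix>.topic.name", "<prefix>.topic.partitions" and
    // "<prefix>.topic.replication.factor" and converts the numeric values.
    static TopicSettings settingsFor(Properties envProps, String prefix) {
        String base = prefix + ".topic.";
        return new TopicSettings(
                envProps.getProperty(base + "name"),
                Integer.parseInt(envProps.getProperty(base + "partitions")),
                Short.parseShort(envProps.getProperty(base + "replication.factor")));
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "input.topic.name=publications",
                "input.topic.partitions=1",
                "input.topic.replication.factor=1",
                "output.topic.name=filtered-publications",
                "output.topic.partitions=1",
                "output.topic.replication.factor=1");

        TopicSettings output = settingsFor(load(text), "output");
        System.out.println(output.name + " partitions=" + output.partitions
                + " replication=" + output.replicationFactor);
    }
}
```

Properties reads each value up to the end of its line, so every key needs a line of its own; a value with extra text after the number would make Integer.parseInt throw NumberFormatException.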

Create the message schema

The next step is to create a schema for the input and output messages. Because we only filter today, the input and output share the same format, so a single schema is enough. First, run the following command in the filter-streams directory to create a folder for the schema:

mkdir -p src/main/proto

Then create a publication.proto file in that folder with the following content:

syntax = "proto3";

package huxihx.kafkastreams.proto;

message Publication {
    string name = 1;
    string title = 2;
}

After saving the file, run the following command to generate the corresponding Java classes:

./gradlew build

At this point, you should see the generated Java class PublicationOuterClass in src/main/java/huxihx/kafkastreams/proto.

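Create the Serdes

The Serdes in this step are the same as in the previous article (ProtobufSerializer, ProtobufDeserializer, and ProtobufSerdes), so they are not explained again here. Create the package directory for them: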

mkdir -p src/main/java/huxihx/kafkastreams/serdes

Develop the main application

Create a FilteredStreamsApp.java file in src/main/java/huxihx/kafkastreams:

package huxihx.kafkastreams;

import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerdes;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

public class FilteredStreamsApp {

    private Properties buildStreamsProperties(Properties envProps) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, envProps.getProperty("application.id"));
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, envProps.getProperty("bootstrap.servers"));
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    private void preCreateTopics(Properties envProps) throws Exception {
        Map<String, Object> config = new HashMap<>();
        config.put("bootstrap.servers", envProps.getProperty("bootstrap.servers"));
        try (AdminClient client = AdminClient.create(config)) {
            Set<String> existingTopics = client.listTopics().names().get();

            List<NewTopic> topics = new ArrayList<>();
            String inputTopic = envProps.getProperty("input.topic.name");
            if (!existingTopics.contains(inputTopic)) {
                topics.add(new NewTopic(inputTopic,
                        Integer.parseInt(envProps.getProperty("input.topic.partitions")),
                        Short.parseShort(envProps.getProperty("input.topic.replication.factor"))));

            }

            String outputTopic = envProps.getProperty("output.topic.name");
            if (!existingTopics.contains(outputTopic)) {
                topics.add(new NewTopic(outputTopic,
                        Integer.parseInt(envProps.getProperty("output.topic.partitions")),
                        Short.parseShort(envProps.getProperty("output.topic.replication.factor"))));
            }

            client.createTopics(topics);
        }
    }

    private Properties loadEnvProperties(String filePath) throws IOException {
        Properties envProps = new Properties();
        try (FileInputStream input = new FileInputStream(filePath)) {
            envProps.load(input);
        }
        return envProps;
    }

    private Topology buildTopology(Properties envProps, final Serde<PublicationOuterClass.Publication> publicationSerde) {
        final StreamsBuilder builder = new StreamsBuilder();

        final String inputTopic = envProps.getProperty("input.topic.name");
        final String outputTopic = envProps.getProperty("output.topic.name");

        builder.stream(inputTopic, Consumed.with(Serdes.String(), publicationSerde))
                .filter((key, publication) -> "George R. R. Martin".equals(publication.getName()))
                .to(outputTopic, Produced.with(Serdes.String(), publicationSerde));
        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new IllegalArgumentException("Environment configuration file must be specified.");
        }

        FilteredStreamsApp app = new FilteredStreamsApp();
        Properties envProps = app.loadEnvProperties(args[0]);
        Properties streamProps = app.buildStreamsProperties(envProps);

        app.preCreateTopics(envProps);

        Topology topology = app.buildTopology(envProps, new ProtobufSerdes<>(PublicationOuterClass.Publication.parser()));

        final KafkaStreams streams = new KafkaStreams(topology, streamProps);
        final CountDownLatch latch = new CountDownLatch(1);

        Runtime.getRuntime().addShutdownHook(new Thread("streams-jvm-shutdown-hook") {
            @Override
            public void run() {
                streams.close();
                latch.countDown();
            }
        });

        try {
            streams.start();
            latch.await();
        } catch (Exception e) {
            // Report the failure before exiting with a non-zero status.
            e.printStackTrace();
            System.exit(1);
        }
        System.exit(0);
    }
}
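
One detail of the listing above worth isolating is preCreateTopics: it asks the broker for the names of the existing topics and only requests the ones that are missing, so it never asks for a topic that is already there. Without the AdminClient calls, that decision reduces to the sketch below. MissingTopicsSketch and missingTopics are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissingTopicsSketch {

    // Returns the wanted topic names that are not in the existing set,
    // preserving the order in which they were requested.
    static List<String> missingTopics(Set<String> existingTopics, List<String> wantedTopics) {
        List<String> missing = new ArrayList<>();
        for (String topic : wantedTopics) {
            if (!existingTopics.contains(topic)) {
                missing.add(topic);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assume the input topic already exists on the broker but the output topic does not.
        Set<String> existing = new HashSet<>(Arrays.asList("publications"));
        List<String> wanted = Arrays.asList("publications", "filtered-publications");

        System.out.println(missingTopics(existing, wanted)); // prints [filtered-publications]
    }
}
```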

Write the test Producer and Consumer

Create TestProducer.java and TestConsumer.java in src/main/java/huxihx/kafkastreams/tests with the following contents, in that order:

package huxihx.kafkastreams.tests;

import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class TestProducer {

    // Test input events
    private static final List<PublicationOuterClass.Publication> TEST_PUBLICATIONS = Arrays.asList(
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("A Song of Ice and Fire").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("C.S. Lewis").setTitle("The Silver Chair").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("C.S. Lewis").setTitle("Perelandra").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("Fire & Blood").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("J. R. R. Tolkien").setTitle("The Hobbit").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("J. R. R. Tolkien").setTitle("The Lord of the Rings").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("A Dream of Spring").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("J. R. R. Tolkien").setTitle("The Fellowship of the Ring").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("The Ice Dragon").build());

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", new ProtobufSerializer<PublicationOuterClass.Publication>().getClass());

        try (final Producer<String, PublicationOuterClass.Publication> producer = new KafkaProducer<>(props)) {
            TEST_PUBLICATIONS.stream()
                    .map(publication -> new ProducerRecord<String, PublicationOuterClass.Publication>("publications", publication))
                    .forEach(producer::send);
        }
    }
}

package huxihx.kafkastreams.tests;

import com.google.protobuf.Parser;
import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufDeserializer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class TestConsumer {

    public static void main(String[] args) {
        // Build a protobuf deserializer for the output events
        Deserializer<PublicationOuterClass.Publication> deserializer = new ProtobufDeserializer<>();
        Map<String, Parser<PublicationOuterClass.Publication>> config = new HashMap<>();
        config.put("parser", PublicationOuterClass.Publication.parser());
        deserializer.configure(config, false);

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("auto.offset.reset", "earliest");
        KafkaConsumer<String, PublicationOuterClass.Publication> consumer = new KafkaConsumer<>(props, new StringDeserializer(), deserializer);
        consumer.subscribe(Arrays.asList("filtered-publications"));
        while (true) {
            ConsumerRecords<String, PublicationOuterClass.Publication> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, PublicationOuterClass.Publication> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }
    }
}

Test

First, build the project by running the following command:

./gradlew shadowJar

Then start the Kafka cluster and run the Kafka Streams application:

java -jar build/libs/kstreams-transform-standalone-0.0.1.jar configuration/dev.properties

Next, start TestProducer to send the test events:

java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestProducer

Finally, start TestConsumer to verify that Kafka Streams kept only the specified Publication messages:

java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestConsumer

.......

offset = 0, key = null, value = name: "George R. R. Martin"
title: "A Song of Ice and Fire"

offset = 1, key = null, value = name: "George R. R. Martin"
title: "Fire & Blood"

offset = 2, key = null, value = name: "George R. R. Martin"
title: "A Dream of Spring"

offset = 3, key = null, value = name: "George R. R. Martin"
title: "The Ice Dragon"

Summary

The next article covers rekeying, that is, changing the key of each message in real time.