Background
In the previous article we introduced map, the message transformation operation in Kafka Streams. Today we present another classic transformation, filter, again using a concrete example to walk through development.
Demo functionality
This article demonstrates filter, which checks each message in real time against a given condition or filtering logic. The messages in today's input topic have the following format:
{"name": "George R. R. Martin", "title": "A Song of Ice and Fire"}
{"name": "C.S. Lewis", "title": "The Silver Chair"}
We want to keep only the messages whose name is "George R. R. Martin" and send them to the output topic; all other messages are dropped.
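In Kafka Streams, filter keeps a record only when its predicate returns true and drops the rest. The plain-Java sketch below illustrates that predicate without Kafka, using the two sample messages above. SamplePublication is a hypothetical stand-in for the generated PublicationOuterClass.Publication class, and FilterPredicateSketch is an illustrative name, not part of the project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical plain-Java stand-in for the generated PublicationOuterClass.Publication.
class SamplePublication {
    final String name;
    final String title;

    SamplePublication(String name, String title) {
        this.name = name;
        this.title = title;
    }
}

class FilterPredicateSketch {
    static final String AUTHOR = "George R. R. Martin";

    // Same test as the topology's filter: keep the record only if the name matches exactly.
    // Calling equals on the constant also returns false (instead of throwing) for a null name.
    static final Predicate<SamplePublication> KEEP = p -> AUTHOR.equals(p.name);

    // Applies the predicate to a batch and returns the titles of the records that pass.
    static List<String> keptTitles(List<SamplePublication> input) {
        return input.stream()
                .filter(KEEP)
                .map(p -> p.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SamplePublication> sample = Arrays.asList(
                new SamplePublication("George R. R. Martin", "A Song of Ice and Fire"),
                new SamplePublication("C.S. Lewis", "The Silver Chair"));
        System.out.println(keptTitles(sample)); // prints [A Song of Ice and Fire]
    }
}
```

The match is an exact string comparison, so a differently punctuated name such as "George RR Martin" would not pass.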
Initialize the project
Create a project directory:
mkdir filter-streams
cd filter-streams/
Configure the project
Create a build.gradle file in the filter-streams directory with the following content:
buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:4.0.2'
    }
}

plugins {
    id 'java'
    id "com.google.protobuf" version "0.8.10"
}

apply plugin: 'com.github.johnrengelman.shadow'

repositories {
    mavenCentral()
    jcenter()
    maven { url 'https://packages.confluent.io/maven' }
}

group 'huxihx.kafkastreams'
sourceCompatibility = 1.8
targetCompatibility = '1.8'
version = '0.0.1'

dependencies {
    implementation 'org.slf4j:slf4j-simple:1.7.26'
    implementation 'org.apache.kafka:kafka-streams:2.3.0'
    implementation 'com.google.protobuf:protobuf-java:3.9.1'
    testCompile group: 'junit', name: 'junit', version: '4.12'
}

protobuf {
    // Generated Java sources are written under src/main/java
    generatedFilesBaseDir = "$projectDir/src/"
    protoc {
        artifact = 'com.google.protobuf:protoc:3.0.0'
    }
}

jar {
    manifest {
        attributes(
            'Class-Path': configurations.compile.collect { it.getName() }.join(' '),
            'Main-Class': 'huxihx.kafkastreams.FilteredStreamsApp'
        )
    }
}

shadowJar {
    archiveName = "kstreams-transform-standalone-${version}.${extension}"
}
Then run the following command to generate the Gradle wrapper:
gradle wrapper
Next, create a folder named configuration in the filter-streams directory to hold our configuration files:
mkdir configuration
In it, create a file named dev.properties with the following content:
application.id=filtering-app
bootstrap.servers=localhost:9092
input.topic.name=publications
input.topic.partitions=1
input.topic.replication.factor=1
output.topic.name=filtered-publications
output.topic.partitions=1
output.topic.replication.factor=1
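Each key must sit on its own line: java.util.Properties treats everything after the first "=" on a line as the value, so a line such as bootstrap.servers=localhost:9092input.topic.name=publications would silently swallow the next key. The plain-Java sketch below loads the configuration roughly the way the application's loadEnvProperties method does, but from a string instead of a file so it runs standalone; EnvConfigSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class EnvConfigSketch {
    // Like loadEnvProperties(), but reads the properties text from a string
    // instead of configuration/dev.properties on disk.
    static Properties load(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties env = load(String.join("\n",
                "application.id=filtering-app",
                "bootstrap.servers=localhost:9092",
                "input.topic.name=publications",
                "input.topic.partitions=1",
                "input.topic.replication.factor=1"));
        // Numeric settings are stored as strings and parsed where they are used.
        int partitions = Integer.parseInt(env.getProperty("input.topic.partitions"));
        short replication = Short.parseShort(env.getProperty("input.topic.replication.factor"));
        System.out.println(env.getProperty("input.topic.name") + " " + partitions + " " + replication); // prints publications 1 1

        // A fused line defines only bootstrap.servers; input.topic.name is lost.
        Properties fused = load("bootstrap.servers=localhost:9092input.topic.name=publications");
        System.out.println(fused.getProperty("input.topic.name")); // prints null
    }
}
```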
Create the message schema
The next step is to create a schema for the incoming and outgoing messages. Because today we only filter, the input and output messages share the same format, so a single schema is enough. First, create a folder under filter-streams to hold the schema:
mkdir -p src/main/proto
Then create a publication.proto file in that folder with the following content:
syntax = "proto3";
package huxihx.kafkastreams.proto;
message Publication {
string name = 1;
string title = 2;
}
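For intuition about what the Serdes will put on the wire, here is a hand-rolled sketch of the protobuf binary encoding for this two-field message: each field is written as a tag (field_number << 3) | wire_type, with wire type 2 for strings, then a varint length, then the UTF-8 bytes; proto3 omits singular fields that hold the default empty string. This is for illustration only. In the project, the generated class's toByteArray() and parser() do this work, and PublicationWireSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class PublicationWireSketch {
    static final int NAME_TAG = (1 << 3) | 2;  // field 1 (name), length-delimited: 0x0A
    static final int TITLE_TAG = (2 << 3) | 2; // field 2 (title), length-delimited: 0x12

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static void writeString(ByteArrayOutputStream out, int tag, String s) {
        if (s.isEmpty()) return; // proto3 does not serialize default (empty) values
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, tag);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    static byte[] encode(String name, String title) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, NAME_TAG, name);
        writeString(out, TITLE_TAG, title);
        return out.toByteArray();
    }

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static int readVarint(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Returns {name, title}; absent fields decode to "" as in proto3.
    // Only length-delimited fields are handled; other wire types are rejected in this sketch.
    static String[] decode(byte[] buf) {
        String[] fields = {"", ""};
        int[] pos = {0};
        while (pos[0] < buf.length) {
            int tag = readVarint(buf, pos);
            int fieldNumber = tag >>> 3;
            int wireType = tag & 0x7;
            if (wireType != 2) throw new IllegalArgumentException("unsupported wire type " + wireType);
            int len = readVarint(buf, pos);
            String value = new String(buf, pos[0], len, StandardCharsets.UTF_8);
            pos[0] += len;
            if (fieldNumber == 1) fields[0] = value;
            else if (fieldNumber == 2) fields[1] = value;
            // unknown length-delimited fields are skipped
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] decoded = decode(encode("George R. R. Martin", "A Song of Ice and Fire"));
        System.out.println(decoded[0] + " / " + decoded[1]); // prints George R. R. Martin / A Song of Ice and Fire
    }
}
```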
Save the file and run the following command to generate the corresponding Java classes:
./gradlew build
At this point, you should see the generated Java class PublicationOuterClass in src/main/java/huxihx/kafkastreams/proto.
Create the Serdes
The Serdes used here (ProtobufSerializer, ProtobufDeserializer and ProtobufSerdes) are the same as in the previous article. They live in the following package:
mkdir -p src/main/java/huxihx/kafkastreams/serdes
Develop the main stream processing application
Create the FilteredStreamsApp.java file in src/main/java/huxihx/kafkastreams:
package huxihx.kafkastreams;

import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerdes;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

public class FilteredStreamsApp {

    private Properties buildStreamsProperties(Properties envProps) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, envProps.getProperty("application.id"));
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, envProps.getProperty("bootstrap.servers"));
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    // Create the input and output topics if they do not exist yet
    private void preCreateTopics(Properties envProps) throws Exception {
        Map<String, Object> config = new HashMap<>();
        config.put("bootstrap.servers", envProps.getProperty("bootstrap.servers"));
        try (AdminClient client = AdminClient.create(config)) {
            Set<String> existingTopics = client.listTopics().names().get();

            List<NewTopic> topics = new ArrayList<>();
            String inputTopic = envProps.getProperty("input.topic.name");
            if (!existingTopics.contains(inputTopic)) {
                topics.add(new NewTopic(inputTopic,
                        Integer.parseInt(envProps.getProperty("input.topic.partitions")),
                        Short.parseShort(envProps.getProperty("input.topic.replication.factor"))));
            }

            String outputTopic = envProps.getProperty("output.topic.name");
            if (!existingTopics.contains(outputTopic)) {
                topics.add(new NewTopic(outputTopic,
                        Integer.parseInt(envProps.getProperty("output.topic.partitions")),
                        Short.parseShort(envProps.getProperty("output.topic.replication.factor"))));
            }

            // createTopics is asynchronous; wait so the topics exist before the app starts
            client.createTopics(topics).all().get();
        }
    }

    private Properties loadEnvProperties(String filePath) throws IOException {
        Properties envProps = new Properties();
        try (FileInputStream input = new FileInputStream(filePath)) {
            envProps.load(input);
        }
        return envProps;
    }

    private Topology buildTopology(Properties envProps,
                                   final Serde<PublicationOuterClass.Publication> publicationSerde) {
        final StreamsBuilder builder = new StreamsBuilder();
        final String inputTopic = envProps.getProperty("input.topic.name");
        final String outputTopic = envProps.getProperty("output.topic.name");

        builder.stream(inputTopic, Consumed.with(Serdes.String(), publicationSerde))
                // Keep only publications by George R. R. Martin; everything else is dropped
                .filter((key, publication) -> "George R. R. Martin".equals(publication.getName()))
                .to(outputTopic, Produced.with(Serdes.String(), publicationSerde));

        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new IllegalArgumentException("Environment configuration file must be specified.");
        }

        FilteredStreamsApp app = new FilteredStreamsApp();
        Properties envProps = app.loadEnvProperties(args[0]);
        Properties streamProps = app.buildStreamsProperties(envProps);
        app.preCreateTopics(envProps);

        Topology topology = app.buildTopology(envProps,
                new ProtobufSerdes<>(PublicationOuterClass.Publication.parser()));

        final KafkaStreams streams = new KafkaStreams(topology, streamProps);
        final CountDownLatch latch = new CountDownLatch(1);

        // Close the Streams instance cleanly on Ctrl+C / JVM shutdown
        Runtime.getRuntime().addShutdownHook(new Thread("streams-jvm-shutdown-hook") {
            @Override
            public void run() {
                streams.close();
                latch.countDown();
            }
        });

        try {
            streams.start();
            latch.await();
        } catch (Exception e) {
            System.exit(1);
        }
        System.exit(0);
    }
}
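The preCreateTopics method only requests topics that are missing from the broker's current topic list. The sketch below isolates that decision in plain Java so it can run without a broker; plain strings stand in for NewTopic, and TopicPlanSketch and topicsToCreate are hypothetical names, not part of the project or the Kafka API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

class TopicPlanSketch {
    // Describes each topic that would be created: only topics missing from
    // `existing` are included, mirroring the checks in preCreateTopics().
    static List<String> topicsToCreate(Properties env, Set<String> existing) {
        List<String> plan = new ArrayList<>();
        for (String prefix : Arrays.asList("input", "output")) {
            String name = env.getProperty(prefix + ".topic.name");
            if (!existing.contains(name)) {
                // Same parsing as the app: partitions as int, replication factor as short
                int partitions = Integer.parseInt(env.getProperty(prefix + ".topic.partitions"));
                short replication = Short.parseShort(env.getProperty(prefix + ".topic.replication.factor"));
                plan.add(name + " partitions=" + partitions + " rf=" + replication);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Properties env = new Properties();
        env.setProperty("input.topic.name", "publications");
        env.setProperty("input.topic.partitions", "1");
        env.setProperty("input.topic.replication.factor", "1");
        env.setProperty("output.topic.name", "filtered-publications");
        env.setProperty("output.topic.partitions", "1");
        env.setProperty("output.topic.replication.factor", "1");

        // Suppose the input topic already exists on the broker
        Set<String> existing = new HashSet<>(Arrays.asList("publications"));
        System.out.println(topicsToCreate(env, existing)); // prints [filtered-publications partitions=1 rf=1]
    }
}
```

Checking existence first keeps the application restartable: on a second run both topics already exist and nothing is requested.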
Write a test producer and consumer
Create TestProducer.java and TestConsumer.java in src/main/java/huxihx/kafkastreams/tests with the following content:
package huxihx.kafkastreams.tests;

import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class TestProducer {

    // Test input events
    private static final List<PublicationOuterClass.Publication> TEST_PUBLICATIONS = Arrays.asList(
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("A Song of Ice and Fire").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("C.S. Lewis").setTitle("The Silver Chair").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("C.S. Lewis").setTitle("Perelandra").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("Fire & Blood").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("J. R. R. Tolkien").setTitle("The Hobbit").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("J. R. R. Tolkien").setTitle("The Lord of the Rings").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("A Dream of Spring").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("J. R. R. Tolkien").setTitle("The Fellowship of the Ring").build(),
            PublicationOuterClass.Publication.newBuilder()
                    .setName("George R. R. Martin").setTitle("The Ice Dragon").build());

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", ProtobufSerializer.class);

        // Closing the producer (try-with-resources) flushes any pending sends
        try (final Producer<String, PublicationOuterClass.Publication> producer = new KafkaProducer<>(props)) {
            // Records are sent without a key, so the consumer will print key = null
            TEST_PUBLICATIONS.stream()
                    .map(publication -> new ProducerRecord<String, PublicationOuterClass.Publication>("publications", publication))
                    .forEach(producer::send);
        }
    }
}
package huxihx.kafkastreams.tests;

import com.google.protobuf.Parser;
import huxihx.kafkastreams.proto.PublicationOuterClass;
import huxihx.kafkastreams.serdes.ProtobufDeserializer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class TestConsumer {

    public static void main(String[] args) {
        // Build a protobuf deserializer for the output events
        Deserializer<PublicationOuterClass.Publication> deserializer = new ProtobufDeserializer<>();
        Map<String, Parser<PublicationOuterClass.Publication>> config = new HashMap<>();
        config.put("parser", PublicationOuterClass.Publication.parser());
        deserializer.configure(config, false);

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("auto.offset.reset", "earliest");

        KafkaConsumer<String, PublicationOuterClass.Publication> consumer =
                new KafkaConsumer<>(props, new StringDeserializer(), deserializer);
        consumer.subscribe(Arrays.asList("filtered-publications"));

        // Poll forever and print every filtered record; stop with Ctrl+C
        while (true) {
            ConsumerRecords<String, PublicationOuterClass.Publication> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, PublicationOuterClass.Publication> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
        }
    }
}
Test
First, build the project by running the following command:
./gradlew shadowJar
Then start the Kafka cluster and run the Kafka Streams application:
java -jar build/libs/kstreams-transform-standalone-0.0.1.jar configuration/dev.properties
Next, run TestProducer to send the test events:
java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestProducer
Finally, run TestConsumer to verify that Kafka Streams kept only the Publication messages by the specified author:
java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestConsumer
.......
offset = 0, key = null, value = name: "George R. R. Martin"
title: "A Song of Ice and Fire"
offset = 1, key = null, value = name: "George R. R. Martin"
title: "Fire & Blood"
offset = 2, key = null, value = name: "George R. R. Martin"
title: "A Dream of Spring"
offset = 3, key = null, value = name: "George R. R. Martin"
title: "The Ice Dragon"
Summary
Next time we will introduce rekey, that is, changing the key of each message in real time based on its value.