KafkaStreams study notes-03

Chapter 3 Developing Kafka Streams

A lot of the code in the book uses lambda expressions, so let's first fill in that background.

lambda expression

Java is an object-oriented language: you cannot pass a block of code around directly. If you want to reuse a piece of code, you have to wrap the method in a class or interface and construct an instance of it, which is cumbersome, so lambda expressions were introduced. A lambda expression looks like passing the code itself, which leans toward functional programming.
Format:
(parameters) -> expression
The expression can also be expanded into a code block inside curly braces {}.
If there are no parameters, the parentheses cannot be omitted; they can only be dropped when there is a single parameter whose type the compiler can infer.
The return type of the expression does not need to be declared.
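
As a quick illustration of these forms (my own sketch, not from the book), using Comparator and Runnable:

import java.util.Comparator;

public class LambdaForms {
	public static void main(String[] args) {
		// explicit parameter types
		Comparator<String> byLength = (String a, String b) -> a.length() - b.length();
		// parameter types inferred by the compiler
		Comparator<String> inferred = (a, b) -> a.length() - b.length();
		// block body with an explicit return
		Comparator<String> blockBody = (a, b) -> { return a.length() - b.length(); };
		// no parameters: the empty parentheses are required
		Runnable noArgs = () -> System.out.println("no parameters");

		System.out.println(byLength.compare("kafka", "streams"));
		System.out.println(inferred.compare("a", "bb"));
		System.out.println(blockBody.compare("x", "y"));
		noArgs.run();
	}
}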

Some interfaces in Java encapsulate only a single method (functional interfaces). To call that method you normally have to supply an instance of the interface [previously this was usually done with an anonymous inner class]. In these cases a lambda expression can be used instead, which looks much more concise.

Thread thread1 = new Thread(new Runnable() {
	@Override
	public void run() {
		System.out.println("this is a java thread running");
	}
});

This can be rewritten with a lambda expression:

Thread thread2 = new Thread(()->System.out.println("this is a java thread-lambda running"));

The lambda form does not mention the Runnable interface at all, because the compiler can infer from the context that a Runnable instance is being supplied: the Thread constructor takes a Runnable.

My impression is that a lambda expression's type is an instance of some functional interface, and the explicit instantiation of the interface that wraps the method is omitted because it can be derived from the context. So elsewhere Java can accept a lambda where it looks like a method is being passed directly, and a lambda expression can also be assigned to an interface-typed variable.
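
For example (my own tiny sketch), a lambda can be assigned directly to a variable of the functional-interface type and called later:

Runnable greet = () -> System.out.println("hello from a lambda");
greet.run(); // prints: hello from a lambda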

A method reference abbreviates a lambda expression even further.
Double colon operator ::
ClassName::methodName
It is often used where a lambda would be: the lambda's parameter becomes the object the method is called on, and the method's return value is the result.
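
A small sketch of my own (not from the book) showing the progression from anonymous class to lambda to method reference, using java.util.function.Function:

import java.util.function.Function;

public class MethodRefDemo {
	public static void main(String[] args) {
		// anonymous inner class: the interface and the overridden method are spelled out
		Function<String, String> upper1 = new Function<String, String>() {
			@Override
			public String apply(String s) {
				return s.toUpperCase();
			}
		};
		// lambda expression: the compiler infers the interface from the context
		Function<String, String> upper2 = s -> s.toUpperCase();
		// method reference: the parameter becomes the object toUpperCase() is called on
		Function<String, String> upper3 = String::toUpperCase;

		System.out.println(upper1.apply("kafka"));
		System.out.println(upper2.apply("streams"));
		System.out.println(upper3.apply("yelling"));
	}
}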

In general, with each JDK release the compiler gets smarter and can infer more and more, so less has to be spelled out in the source code and things can be abbreviated. Lambda expressions are one example of this.

Understanding the development steps

The core of the Kafka Streams API is the KStream object. Many of its methods form a fluent interface, and many of the parameters are functional interfaces, so lambda expressions can be used.
The core idea of a fluent interface is that the object returned by a method can immediately have further methods called on it, which makes chained processing very convenient; I find it a more intuitive way to transform an object step by step.
The difference in the Kafka Streams API is that each time a method is called on a KStream, it returns a new KStream instance, not the original object. [Why? What are the advantages of this design? Where does the original object go? How do I know the returned object is a new one rather than the original?]

General development steps

  1. Define Kafka Streams configuration
  2. Create Serde instances, either custom or the provided defaults
  3. Create the processor topology
  4. Create the KafkaStreams instance and start it

The book uses the Yelling App as its example, but explains the steps in the order 3-1-2-4. I read it twice before the meaning sank in, so I am writing these notes in order, which should make the logic clearer when I review them.

Kafka Streams configuration

A Kafka Streams program is highly configurable, and two configuration items are required. They are set with props.put as follows:

props.put(StreamsConfig.APPLICATION_ID_CONFIG,"yelling_app_id");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");

In these two statements, props.put stores configuration values under keys that are defined as final constants in the StreamsConfig class (StreamsConfig is essentially a holder of configuration constants; the constants themselves are never modified by the program). APPLICATION_ID_CONFIG is the unique identifier of the application and must be unique within the cluster. The host:port value after BOOTSTRAP_SERVERS_CONFIG tells the application where to reach the Kafka cluster.
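
A note of my own to make this concrete: the constants are just static final String keys ("application.id" and "bootstrap.servers"), so the two lines above are equivalent to the following, only the literal-string version is less readable and more error-prone:

Properties props = new Properties();
// same key as StreamsConfig.APPLICATION_ID_CONFIG
props.put("application.id", "yelling_app_id");
// same key as StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
props.put("bootstrap.servers", "localhost:9092");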

Serde instance creation

Kafka topics store messages as byte arrays, the data in the book's examples is transmitted as JSON, and the processors work with objects. So a message (byte array) has to be converted to JSON and then into an object before a processor can handle it. This conversion is done by Serde objects, which handle serialization and deserialization. Serdes for the basic types are provided by factory methods on the Serdes class, for example:

Serde<String> stringSerde = Serdes.String();

The Serde interface is generic; its type parameter is the type of object to be serialized or deserialized, and an instance for that type is created, here returned by a method of the Serdes class. So Serdes is a class that provides serdes for the common types; for custom classes you need to build a custom Serde. A Serde instance is the container that holds both the serializer and the deserializer for its type.
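
For a custom type, a Serde can be assembled from a Serializer and a Deserializer via Serdes.serdeFrom. A minimal sketch of my own, assuming a hypothetical Purchase class and using Gson to go through JSON (not the book's exact code):

import com.google.gson.Gson;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class PurchaseSerdeSketch {

	// hypothetical domain class, used only for illustration
	public static class Purchase {
		public String itemName;
		public double price;
	}

	public static Serde<Purchase> purchaseSerde() {
		final Gson gson = new Gson();

		Serializer<Purchase> serializer = new Serializer<Purchase>() {
			@Override
			public void configure(Map<String, ?> configs, boolean isKey) { }
			@Override
			public byte[] serialize(String topic, Purchase purchase) {
				// object -> JSON string -> byte array
				return purchase == null ? null : gson.toJson(purchase).getBytes(StandardCharsets.UTF_8);
			}
			@Override
			public void close() { }
		};

		Deserializer<Purchase> deserializer = new Deserializer<Purchase>() {
			@Override
			public void configure(Map<String, ?> configs, boolean isKey) { }
			@Override
			public Purchase deserialize(String topic, byte[] bytes) {
				// byte array -> JSON string -> object
				return bytes == null ? null : gson.fromJson(new String(bytes, StandardCharsets.UTF_8), Purchase.class);
			}
			@Override
			public void close() { }
		};

		// the Serde is just the container holding both halves
		return Serdes.serdeFrom(serializer, deserializer);
	}
}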

Topology of Yelling App

The app converts every incoming message to uppercase.
Two topics serve as the data source and the data sink, and
three processors are involved: a source processor, an uppercase-mapping processor, and a sink processor.
These three, chained together, form the whole topology that needs to be built with the API.

source processor

KStream<String, String> simpleFirstStream = builder.stream("src-topic", Consumed.with(stringSerde, stringSerde));

From this statement you can see that KStream is generic [is it a key-value pair?]. It is created by builder.stream(), which takes two parameters: the first is the topic name, the second is built from Serde objects via the Consumed.with method, whose parameters are themselves Serde instances, which again fits the fluent-interface style. Note that the Serde objects are what serialize and deserialize the messages; the Serdes class provides serdes for the basic types, and special types need custom ones (by implementing the interfaces).
The two classes Consumed and Produced can be understood like input and output in an IO stream, or read and write: reading messages (input) corresponds to consuming, and writing messages (output) corresponds to producing.

Uppercase character processor

KStream<String, String> upperCasedStream = simpleFirstStream.mapValues(String::toUpperCase);

In this statement the mapValues method is called on simpleFirstStream (the source processor) and returns a new KStream object, which acts as the uppercase processor; this processor is a new KStream derived from the source processor, not the same object. What mapValues does here is convert each message value to uppercase.
The argument to mapValues is written as a method reference to the toUpperCase method of the String class. This is terse: very clear to whoever wrote the program, and a little opaque to whoever reads it. In fact mapValues expects an instance of the ValueMapper<V, V1> interface, which has a single apply method; in other words it is a functional interface whose job is to map one value to another value (literally, map). Without lambda expressions you would instantiate this interface with an anonymous class and override apply; here a lambda can be used instead, and the overridden apply simply calls String.toUpperCase. The full lambda would be s -> s.toUpperCase(), which the code abbreviates further with the double-colon method reference String::toUpperCase.
Because mapValues must receive such an instance, the lambda saves you from writing all that boilerplate; the interface and the override do not appear in the code, but I think underneath it is still realized as an interface implementation with an overridden method.
Probably because I am a beginner, I feel that writing lambdas with no interface in sight is "a joy to write, a pain to read later"... [just kidding!]
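
Here is a sketch of my own (not the book's code) of the three equivalent ways to pass the mapper to mapValues; the interface involved is org.apache.kafka.streams.kstream.ValueMapper, whose single method is apply:

// all three are equivalent; simpleFirstStream is the KStream<String, String> built above,
// and ValueMapper comes from org.apache.kafka.streams.kstream.ValueMapper

// 1. anonymous inner class: the interface and the overridden apply method are explicit
KStream<String, String> upper1 = simpleFirstStream.mapValues(new ValueMapper<String, String>() {
	@Override
	public String apply(String value) {
		return value.toUpperCase();
	}
});

// 2. lambda expression: the interface is inferred from mapValues' parameter type
KStream<String, String> upper2 = simpleFirstStream.mapValues(value -> value.toUpperCase());

// 3. method reference: the shortest form, used in the book's code
KStream<String, String> upper3 = simpleFirstStream.mapValues(String::toUpperCase);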

sink processor writes the processed message to the specified topic

upperCasedStream.to("out-topic",Produed.with(stringSerde,stringSerde));

Notice that this statement does not create a new KStream object: no further processing node follows, so to() simply writes the messages out. The to method takes two parameters, a bit like builder.stream in the source processor: the first is the output topic name, and the second wraps the Serde instances, this time via Produced.with.

At this point the simple topology of the Yelling App is complete. The construction steps were: create a source processor from a topic, call mapValues to get an uppercase processor, and call to() to write to the output topic. Since each intermediate step returns a new KStream, you can use method chaining and condense the topology-building code to:

builder.stream("src-topic, Consumed.with(stringSerde,stringSerde))
.mapValues(String::toUpperCase)
.to("out-topic",Produced.with(stringSerde, stringSerde));

Program source code

/*
 * Copyright 2016 Bill Bejeck
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package bbejeck.chapter_3;

import bbejeck.clients.producer.MockDataProducer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Consumed;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.Produced;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Properties;

public class KafkaStreamsYellingApp {
	//log info
    private static final Logger LOG = LoggerFactory.getLogger(KafkaStreamsYellingApp.class);

    public static void main(String[] args) throws Exception {


        //Used only to produce data for this application, not typical usage
        MockDataProducer.produceRandomTextData();
		
		 //use properties class to configure application
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "yelling_app_id");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsConfig streamsConfig = new StreamsConfig(props);
		 
		 //Serde instance
        Serde<String> stringSerde = Serdes.String();

		 //use builder to build topology
        StreamsBuilder builder = new StreamsBuilder();

        //source processor,read from topic
        KStream<String, String> simpleFirstStream = builder.stream("src-topic", Consumed.with(stringSerde, stringSerde));

		 //upperCase processor
        KStream<String, String> upperCasedStream = simpleFirstStream.mapValues(String::toUpperCase);

		 //sink processor, write to topic
        upperCasedStream.to( "out-topic", Produced.with(stringSerde, stringSerde));
        
		 //print the records to the console (System.out) with the label "Yelling App"
        upperCasedStream.print(Printed.<String, String>toSysOut().withLabel("Yelling App"));

		 //build kafkaStream app
        KafkaStreams kafkaStreams = new KafkaStreams(builder.build(),streamsConfig);
        LOG.info("Hello World Yelling App Started");
		 
		 //start app
        kafkaStreams.start();
        Thread.sleep(35000);
        LOG.info("Shutting down the Yelling APP now");
  		 
  		 //close resources
        kafkaStreams.close();
        MockDataProducer.shutdown();

    }
}

Summary

The general steps of the Kafka Streams application:

  1. Create a StreamsConfig instance - program configuration
  2. Create Serde objects - serializers/deserializers
  3. Build the processor topology - the KStream nodes
  4. Start the Kafka Streams application

Things that have not been figured out:

  • The role of the Serde class and how serialization/deserialization is actually carried out
  • The two type parameters of KStream<K, V>: I do not understand what they are. Are they the key and value types? What do they correspond to when the object is created? [Are they related to the arguments of the later builder.stream() call, where the first is the topic name (so a String) and the second is the type of message being processed, String in the Yelling App example and a Purchase class in the ZMart example?]
  • This simple example gives me no feel for distribution: how it relates to the physical cluster, how the brokers establish distributed processing; it does not show messaging between brokers or fault handling.
  • How does each node, i.e. each KStream object, actually consume and produce messages? For example, when there are many messages, are they read one by one, and what is the size of each read? Is a KStream object and its topology created anew for every read? [Clearly that is not the case, but I cannot find the answer in the code yet; the topology should be stable once built. I am comparing this with ordinary Java IO, where you open a channel between the program and a file and then transfer data at some rate. Here it feels like I only specified the topics; there seems to be no statement that reads or writes data, and I do not understand that part yet.]