Kafka - Preliminary Learning for Producers


1. Kafka producer components

[Figure: Kafka producer component diagram]

  1. We start by creating a ProducerRecord object, which needs to contain the target topic and what to send. We can also specify keys or partitions. When sending a ProducerRecord object, the producer first serializes the key and value objects into byte arrays so they can be transmitted over the network.
  2. Next, the data is passed to the partitioner. If the partition was previously specified in the ProducerRecord object, then the partitioner will not do anything and return the specified partition directly. If no partition is specified, the partitioner chooses a partition based on the key of the ProducerRecord object.
  3. After the partition is selected, the producer knows which topic and partition to send the record to. Next, the record is added to a record batch, and all messages in this batch are sent to the same topic and partition. A separate thread is responsible for sending batches of these records to the corresponding broker.
  4. The server returns a response when it receives these messages. If the message was successfully written to Kafka, it returns a RecordMetadata object containing the topic and partition information and the offset recorded in the partition. If the write fails, an error is returned; the producer retries sending the message, and if it still fails after several attempts, an error is returned to the application.
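The partition-selection logic in step 2 can be sketched as follows. This is a simplified illustration, not Kafka's actual implementation: the real default partitioner hashes the key with murmur2, while here we use Java's hashCode() purely for demonstration.

```java
public class PartitionSketch {

    // Step 2 above: if a partition was specified, return it unchanged;
    // otherwise derive one from the key (simplified: hashCode instead of murmur2).
    static int choosePartition(Integer explicitPartition, String key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;  // specified partition is returned directly
        }
        // Mask the sign bit so the hash is non-negative, then take it mod the partition count
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(choosePartition(3, "USA", 6));    // explicit partition wins
        System.out.println(choosePartition(null, "USA", 6)); // derived from the key
    }
}
```

Because the partition is a deterministic function of the key, all records with the same key always land in the same partition.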

2. Create a producer using the Java API

2.1 pom file dependencies

<properties>
    <java.version>1.8</java.version>
    <kafka.version>1.1.0</kafka.version>
</properties>


<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>${kafka.version}</version>
    </dependency>
</dependencies>

At the time of writing, the latest stable version of Kafka is 1.1.0, so we use the matching client API version here.

2.2 Required properties for connection

To write messages to Kafka, first create a producer object and set some properties. Kafka producers have 3 required properties.

  1. bootstrap.servers : This property specifies the list of broker addresses, in the format host:port. The list does not need to include every broker address; the producer will discover the other brokers from any broker given. In practice, however, it is recommended to provide at least two brokers, so that the producer can still connect to the cluster if one of them goes down.
  2. key.serializer : The broker expects to receive byte arrays. The producer interface allows the use of parameterized types, so Java objects can be sent to the broker as keys and values, which keeps the code readable. But the producer itself does not know how to serialize Java objects into byte arrays, so this property must be set to a class that implements the org.apache.kafka.common.serialization.Serializer interface; the producer uses this class to serialize key objects into byte arrays.
  3. value.serializer : Like key.serializer, the class specified by value.serializer serializes the value. If the key and the value have the same data type, the same serializer class can be used for both; if the types differ, use different serializers.

Note:
The Kafka client currently only provides the following serializers:

  1. ByteArraySerializer
  2. StringSerializer
  3. IntegerSerializer

There is no need to write a custom serializer when only these common types are used, but for anything else you need to implement one yourself. Also note that the key.serializer property must be set even if you only intend to send values.
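As a sketch of what "implement it yourself" means, the serialize() body of a custom serializer might pack an object into a byte array like this. Customer and its wire layout are invented for illustration; a real implementation would also implement the org.apache.kafka.common.serialization.Serializer interface with its configure(), serialize(), and close() methods.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CustomerSerializerSketch {

    // Hypothetical value class used only for this example
    static class Customer {
        final int id;
        final String name;
        Customer(int id, String name) { this.id = id; this.name = name; }
    }

    // What a custom Serializer<Customer>.serialize(topic, data) might do:
    // 4 bytes for the id, 4 bytes for the name length, then the UTF-8 name bytes.
    static byte[] serialize(Customer data) {
        byte[] name = data.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length);
        buf.putInt(data.id);
        buf.putInt(name.length);
        buf.put(name);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Customer(42, "Precision Products"));
        System.out.println(bytes.length);  // 4 + 4 + 18 name bytes = 26
    }
}
```

In practice a format like Avro, JSON, or Protobuf is usually a better choice than a hand-rolled byte layout, because it tolerates schema changes.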

The following code snippet sets only the required properties:

import java.util.Properties;

Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "master:9092");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

3. Kafka sending methods

Next, instantiate the producer object and you can start sending messages. There are three main ways to send a message:

  1. Fire-and-forget
  2. Synchronous send
  3. Asynchronous send

3.1 Fire-and-forget

We send the message to the server, but don't care if it arrives normally. Most of the time, messages arrive normally because Kafka is highly available and producers automatically try to resend. However, some messages are sometimes lost using this method.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public static void main(String[] args) {
    Properties kafkaProps = new Properties();
    kafkaProps.put("bootstrap.servers", "master:9092");
    kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // Parameterize the producer with the key and value types
    KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

    ProducerRecord<String, String> record = new ProducerRecord<>("test", "Precision Products", "USA");

    try {
        producer.send(record);  // fire-and-forget: the returned Future is ignored
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        producer.close();  // flush buffered records and release resources
    }
}
  1. The producer's send() method takes a ProducerRecord object as a parameter, so we need to create a ProducerRecord object first.
  2. We send the ProducerRecord object using the producer's send() method. As you can see from the producer's architecture diagram, the message is first put into the buffer and then sent to the server using a separate thread. The send() method returns a Future object containing the RecordMetadata, but since we ignore the return value, we have no way of knowing whether the message was sent successfully. If you don't care about sending results, you can use this sending method. For example, log Twitter messages, or less important application logs.
  3. We can ignore errors that may occur when sending the message or errors that may occur on the server side, but other exceptions may occur on the producer before the message is sent. These exceptions may be SerializationException (indicating a failure to serialize the message), BufferExhaustedException or TimeoutException (indicating that the buffer is full), or InterruptException (indicating that the sending thread was interrupted).

3.2 Synchronous send

We use the send() method to send a message; it returns a Future object. Calling get() on that Future blocks until Kafka responds, which tells us whether the message was sent successfully.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public static void main(String[] args) {
    Properties kafkaProps = new Properties();
    kafkaProps.put("bootstrap.servers", "master:9092");
    kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

    ProducerRecord<String, String> record = new ProducerRecord<>("test", "Precision Products", "USA");

    try {
        // get() blocks until Kafka responds, then returns the RecordMetadata
        RecordMetadata metadata = producer.send(record).get();
        System.out.println("offset: " + metadata.offset());
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        producer.close();
    }
}
  1. Here producer.send() returns a Future object, and we call the Future's get() method to wait for the Kafka response. get() throws an exception if the server returned an error; otherwise we get a RecordMetadata object, from which we can read the offset of the message.
  2. An exception is thrown if any error occurs before or during sending, for example if the broker returns an error that does not allow retransmission, or if the retry limit has been exhausted. Here we simply print the exception.

3.3 Asynchronous send

We call the send() method and pass in a callback, which is invoked when the broker's response comes back.

Assume that a message takes 10ms to go back and forth between the application and the Kafka cluster. If you wait for a response after each message is sent, it takes 1 second to send 100 messages. But if you just send the message and don't wait for a response, it takes a lot less time to send 100 messages.
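The arithmetic behind that estimate can be written out as a trivial model; the 10 ms round trip is the assumption from the paragraph above.

```java
public class LatencyMath {

    // Total time to send `messages` messages when each send waits for its response
    static int syncSendTimeMs(int messages, int roundTripMs) {
        return messages * roundTripMs;
    }

    public static void main(String[] args) {
        // 100 messages * 10 ms round trip = 1000 ms = 1 second, as stated above.
        System.out.println(syncSendTimeMs(100, 10));
        // Asynchronous sends are pipelined: the total is roughly one round trip
        // plus a small per-message enqueue cost, so far less than 1 second.
    }
}
```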

Most of the time, we don't need to wait for a response - although Kafka will send back the destination topic, partition information, and the offset of the message, it's not necessary for the sending application. However, when encountering a message sending failure, we need to throw an exception, record an error log, or write the message to an "error message" file for later analysis.

In order to handle exceptions while sending messages asynchronously, the producer provides callback support. Below is an example of using callbacks.

import java.util.Properties;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public static void main(String[] args) {
    Properties kafkaProps = new Properties();
    kafkaProps.put("bootstrap.servers", "master:9092");
    kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

    ProducerRecord<String, String> record = new ProducerRecord<>("test", "Precision Products", "USA");

    try {
        // Pass a callback object; it is invoked when the broker responds
        producer.send(record, new DemoProducerCallback());
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        producer.close();
    }
}

private static class DemoProducerCallback implements Callback {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            exception.printStackTrace();  // the send failed; handle properly in production
        }
    }
}
  1. To use callbacks, you need a class that implements the org.apache.kafka.clients.producer.Callback interface, which has a single method, onCompletion().
  2. If Kafka returns an error, onCompletion() is invoked with a non-null exception. Here we simply print it, but production code should handle it more robustly.
  3. Pass the callback object in when sending the message.

3.4 Notes

KafkaProducer errors generally fall into two categories.

  • Retryable errors, which can be resolved by resending the message. For example, a connection error can be resolved by re-establishing the connection, and a "no leader" error is resolved once a new leader has been elected for the partition. KafkaProducer can be configured to retry automatically; if the problem is still unresolved after the configured number of retries, the application receives a retry exception.
  • Errors that cannot be resolved by retrying, such as a "message too large" exception. For these, KafkaProducer does not retry and throws the exception immediately.
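The retry behavior for the first category is tuned through producer configuration. The property names below ("retries", "retry.backoff.ms") are standard producer configs; the values here are illustrative, not recommendations.

```java
import java.util.Properties;

public class RetryConfigDemo {

    static Properties producerProps() {
        Properties kafkaProps = new Properties();
        kafkaProps.put("bootstrap.servers", "master:9092");
        kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Retryable errors: ask the producer to resend automatically.
        kafkaProps.put("retries", "3");            // give up after 3 automatic retries
        kafkaProps.put("retry.backoff.ms", "100"); // wait 100 ms between attempts
        return kafkaProps;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("retries"));
    }
}
```

Non-retryable errors such as "message too large" surface immediately regardless of these settings, so the application must still handle exceptions.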
