Kafka producer - write data to Kafka

(1) Producer overview

(1) Different application scenarios place different requirements on messages: whether message loss or duplication is acceptable, how much latency is tolerable, and what throughput is required. These requirements directly affect how the Kafka producer API is used and configured.

Example 1: A credit card transaction processing system must not lose or duplicate messages, must keep latency under 500 ms, and requires high throughput.

Example 2: A system that stores website click information can tolerate a small amount of message loss and duplication, and latency can be somewhat higher (as long as pages still load promptly after the user clicks a link); throughput depends on how heavily the website is used.

(2) The main steps for Kafka to send messages

Message format: each message is a ProducerRecord object, which must specify the Topic the message belongs to and the message value (Value). Optionally, it can also specify the Partition the message belongs to and the message Key.
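As a sketch, these are the common ProducerRecord constructors (the topic, key, and value strings here are illustrative):

// Topic and Value only: the Key is null and the partitioner chooses the Partition
ProducerRecord<String, String> r1 = new ProducerRecord<>("CustomCountry", "France");

// Topic, Key, and Value: the Key determines the Partition
ProducerRecord<String, String> r2 = new ProducerRecord<>("CustomCountry", "Precision Products", "France");

// Topic, explicit Partition, Key, and Value: the partitioner is bypassed
ProducerRecord<String, String> r3 = new ProducerRecord<>("CustomCountry", 0, "Precision Products", "France");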

1: The producer serializes the key and value of the ProducerRecord into byte arrays.

2: If a Partition is specified in the ProducerRecord, the Partitioner does nothing; otherwise, the Partitioner chooses a Partition based on the message Key. At this point the producer knows which Partition of which Topic to send the message to.

3: The message is added to the corresponding batch, and a separate thread sends these batches to the Broker.

4: The Broker returns a response when it receives the message. If the message was successfully written to Kafka, it returns a RecordMetadata object containing the Topic, the Partition, and the Offset of the message within the Partition; if the write failed, it returns an error.

(3) Kafka's ordering guarantee. Kafka guarantees that messages within a single partition are ordered: if the producer sends messages in a certain order, the broker writes them to the partition in that order, and consumers read them in the same order.

Example: depositing 100 into an account and then withdrawing it is completely different from withdrawing 100 and then depositing it, so this kind of scenario is very sensitive to ordering.

If a scenario requires ordered messages, setting retries to 0 is not recommended, because that sacrifices reliability. Instead, set max.in.flight.requests.per.connection to 1, so that a failed and retried batch can never be overtaken by a later batch. This seriously reduces producer throughput, but it guarantees strict ordering.
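A minimal configuration sketch of this trade-off (the broker address is illustrative):

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("retries", "3"); // keep retries > 0 for reliability
props.put("max.in.flight.requests.per.connection", "1"); // only one unacknowledged request at a time, so retries cannot reorder messages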

(2) Create a Kafka producer

To write messages to Kafka, you need to create a Producer and set some properties.

Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:port1,broker2:port2");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

A Kafka producer has the following three required properties:

(1) bootstrap.servers: the list of broker addresses (host:port pairs) the producer uses to establish its initial connection to the cluster. It does not need to contain all brokers, but at least two are recommended so the producer can still connect if one goes down.

(2) key.serializer: must be a class that implements the org.apache.kafka.common.serialization.Serializer interface, used to serialize the key into a byte array. Note: key.serializer must be set even if the messages contain no key.

(3) value.serializer: serializes the value into a byte array.

(3) Send messages to Kafka

(1) Send messages synchronously

ProducerRecord<String, String> record = new ProducerRecord<>("CustomCountry", "Precision Products", "France"); // Topic, Key, Value
try {
    Future<RecordMetadata> future = producer.send(record);
    future.get(); // blocks until the broker responds; omit this line if you do not care whether the send succeeded
} catch (Exception e) {
    // connection errors and "no leader" errors can be resolved by retrying;
    // for errors such as "message too large", KafkaProducer does not retry and throws the exception directly
    e.printStackTrace();
}

(2) Send messages asynchronously

ProducerRecord<String, String> record = new ProducerRecord<>("CustomCountry", "Precision Products", "France"); // Topic, Key, Value
// pass a callback object when sending; it must implement the org.apache.kafka.clients.producer.Callback interface
producer.send(record, new DemoProducerCallback());

private class DemoProducerCallback implements Callback {
    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) { // if Kafka returned an error, onCompletion receives a non-null exception
            e.printStackTrace(); // handle the exception; here it is simply printed
        }
    }
}

(4) Configuration of the producer

(1) acks specifies how many partition replicas must receive the message before the producer considers the write successful.

      acks=0: the producer does not wait for any response from the server and sends messages as fast as the network allows. Throughput is high, but if a broker fails to receive a message, the producer will not know.

      acks=1: the producer receives a success response from the server as soon as the leader replica receives the message.

      acks=all: the producer receives a success response from the server only after all in-sync replicas have received the message.

(2) buffer.memory sets the size of the producer's memory buffer, which the producer uses to buffer messages waiting to be sent to the server.

(3) compression.type: by default messages are sent uncompressed; this parameter can be set to snappy, gzip, or lz4 to compress messages before they are sent to the broker.

(4) retries: the number of times the producer resends a message after receiving a temporary error from the server.

(5) batch.size: messages destined for the same partition are first stored in a batch. This parameter specifies the amount of memory, in bytes, that a single batch may use. The producer does not necessarily wait for a batch to fill before sending it.

(6) linger.ms: how long the producer waits before sending a batch, allowing more messages to accumulate in it. The batch is sent when it is full or when linger.ms elapses, whichever comes first.

(7) max.in.flight.requests.per.connection: the number of requests the producer may send on a connection before receiving responses from the server. A combined configuration sketch follows this list.
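As a sketch, tuning these parameters together might look like this (the values are illustrative, not recommendations):

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");                // wait for all in-sync replicas
props.put("buffer.memory", "33554432");  // 32 MB producer-side buffer
props.put("compression.type", "snappy"); // compress batches before sending
props.put("retries", "3");               // retry temporary errors up to 3 times
props.put("batch.size", "16384");        // 16 KB per batch
props.put("linger.ms", "5");             // wait up to 5 ms for more messages to join a batch
props.put("max.in.flight.requests.per.connection", "5");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);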

(5) Serializer

The producer must be configured with serializers for the message key and value. It is recommended to use a serialization framework such as Avro, Thrift, or Protobuf rather than writing a serializer yourself.

Before using Avro, you need to define a schema, which is usually written in JSON.

(1) Create a class to represent the customer as the value of the message

class Customer {
    private int customerID;
    private String customerName;

    public Customer(int customerID, String customerName) {
        this.customerID = customerID;
        this.customerName = customerName;
    }

    public int getCustomerID() {
        return customerID;
    }

    public String getCustomerName() {
        return customerName;
    }
}

(2) Define schema

{
    "namespace": "customerManagement.avro",
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"}
    ]
}

(3) Generate Avro objects and send them to Kafka

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", schemaUrl); // schema.registry.url points to where the schema is stored
String topic = "CustomerContacts";
Producer<String, Customer> producer = new KafkaProducer<String, Customer>(props);

// keep generating messages and sending them
while (true) {
    // Customer here is the class generated from the Avro schema above
    Customer customer = CustomerGenerator.getNext();
    ProducerRecord<String, Customer> record = new ProducerRecord<>(topic, customer.getId(), customer);
    producer.send(record); // customer is sent as the message value; KafkaAvroSerializer handles the rest
}

(6) Partition

A ProducerRecord can contain only the Topic and the message value, in which case the Key defaults to null; however, most applications set a Key, which serves two purposes:

(1) It acts as additional information attached to the message.

(2) It determines which partition of the Topic the message is written to; messages with the same Key are written to the same partition.

If the key is null, Kafka uses the default partitioner, which distributes messages evenly across the partitions using a round-robin algorithm.

If the key is not null, Kafka hashes the key (using its own hashing algorithm, so the result does not change when the Java version changes), and the same key is always mapped to the same partition, as the sketch below illustrates. Note that the key-to-partition mapping remains stable only as long as the number of partitions does not change.
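A small sketch, reusing the producer and the "CustomCountry" topic from earlier: two records with the same key should report the same partition in their RecordMetadata:

ProducerRecord<String, String> first = new ProducerRecord<>("CustomCountry", "Precision Products", "France");
ProducerRecord<String, String> second = new ProducerRecord<>("CustomCountry", "Precision Products", "Spain");
try {
    RecordMetadata m1 = producer.send(first).get();
    RecordMetadata m2 = producer.send(second).get();
    // the key is the same, so both records land in the same partition
    System.out.println(m1.partition() + " == " + m2.partition());
} catch (Exception e) {
    e.printStackTrace();
}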

Kafka also allows users to implement their own partitioner; a user-defined partitioner needs to implement the Partitioner interface, as in the sketch below.
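A minimal sketch of a user-defined partitioner, modeled on the BananaPartitioner example from the book (the special-cased key "Banana" is illustrative):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class BananaPartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null || !(key instanceof String)) {
            throw new IllegalArgumentException("All messages are expected to have a String key");
        }
        if (((String) key).equals("Banana")) {
            return numPartitions - 1; // "Banana" always goes to the last partition
        }
        // all other keys are hashed to the remaining partitions
        return Math.abs(Utils.murmur2(keyBytes)) % (numPartitions - 1);
    }

    @Override
    public void close() {}
}

To use it, set the producer's partitioner.class property to the fully qualified class name of the implementation.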

 

Reference: "Kafka: The Definitive Guide"
