Kafka Producer Principles Analysis (Part 1)

When sending a message with KafkaProducer, the user first wraps the message in a ProducerRecord. The send method returns a Future object, a typical use of the Future design pattern. A Callback can also be supplied at send time, to be invoked once the send completes.

1, ProducerRecord class diagram

 

[Figure: ProducerRecord class diagram]

 

We first look at the core attributes of ProducerRecord, the six core elements that make up a message:

  • String topic: the topic the message belongs to.
  • Integer partition: the partition (queue) within the topic that the message goes to. It can be specified manually. If it is not specified but a key is, the partition is chosen by hashing the key and taking the result modulo the total number of partitions. If neither is specified, the partitions of the topic are selected in round-robin fashion.
  • Headers headers: additional attributes of the message, stored separately from the message body.
  • K key: the message key. If specified, a hash of the key modulo the number of partitions is used to select the partition.
  • V value: the message body.
  • Long timestamp: the message timestamp. Which value is stored depends on the topic's message.timestamp.type configuration:
    • CreateTime: the client-side timestamp at the time the message is sent.
    • LogAppendTime: the timestamp at which the broker appends the message.

Headers is a series of key-value pairs.

With ProducerRecord understood, we can start exploring Kafka's message sending process.

2, Kafka message append flow

The send method of KafkaProducer does not send the message directly to the broker. Sending in Kafka is asynchronous and split into two steps: the send method only appends the message to memory (the buffer queue of its partition), and a dedicated Sender thread then asynchronously sends the buffered messages to the Kafka broker in batches.

Message append entry point: KafkaProducer#send

public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {  
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);                // @1
    return doSend(interceptedRecord, callback);                                                                     // @2
}

Code @1: First run the message interceptors. Interceptors are configured through interceptor.classes, of type List<String>, where each element is the fully qualified class name of an interceptor.

Code @2: Call the doSend method. Later on we need to pay attention to when the Callback is invoked.

Next we look at the doSend method.

2.1 doSend

KafkaProducer#doSend

ClusterAndWaitTime clusterAndWaitTime;
try {
    clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
} catch (KafkaException e) {
    if (metadata.isClosed())
        throw new KafkaException("Producer closed while send in progress", e);
    throw e;
}
long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);

Step1: Get the partition list of the topic. If the topic's partition information is not available locally, it has to be fetched from a remote broker. The method returns the time spent fetching the metadata, and that time is deducted from the maximum time the send is allowed to block (maxBlockTimeMs, i.e. max.block.ms).

Tips: This article does not go into this method in depth. A dedicated follow-up article will analyze Kafka's metadata synchronization mechanism, which plays a role similar to that of the NameServer in RocketMQ.

KafkaProducer#doSend

byte[] serializedKey;
try {
    serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
} catch (ClassCastException cce) {
    throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in key.serializer", cce);
}

Step2: Serialize the key. Note: although the serialize method is also passed the topic and the headers, only the key itself takes part in the serialization.

KafkaProducer#doSend

byte[] serializedValue;
try {
    serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
} catch (ClassCastException cce) {
    throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in value.serializer", cce);
}

Step3: Serialize the message body.

KafkaProducer#doSend

int partition = partition(record, serializedKey, serializedValue, cluster);
tp = new TopicPartition(record.topic(), partition);

Step4: Compute the target partition of the message according to the partitioning (load-balancing) algorithm. A partition specified on the record is used as-is. Otherwise the partitioner is consulted; the default implementation class is DefaultPartitioner, whose routing algorithm is as follows:

  • If a key is specified, a hash of the serialized key (murmur2, not Java's hashCode) is taken modulo the number of partitions.
  • If no key is specified, the partitions are selected in round-robin fashion.
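The routing rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not Kafka's code: PartitionSketch and choosePartition are hypothetical names, and Arrays.hashCode stands in for the murmur2 hash that Kafka applies to the serialized key bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionSketch {
    // Counter used for round-robin selection when a record has no key.
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    /**
     * Picks a partition: explicit partition first, then key hash, then round-robin.
     * Assumption: Arrays.hashCode is only a stand-in for Kafka's murmur2 hash.
     */
    public static int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;
        }
        if (keyBytes != null) {
            // Mask the sign bit so the modulo result is never negative.
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
        return (COUNTER.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(choosePartition(2, key, 3));     // explicit partition wins
        System.out.println(choosePartition(null, key, 3));  // same key always maps to the same partition
        System.out.println(choosePartition(null, null, 3)); // no key: next partition in turn
    }
}
```

The point to take away is that records with the same key always land in the same partition as long as the number of partitions does not change.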

KafkaProducer#doSend

setReadOnly(record.headers());
Header[] headers = record.headers().toArray();

Step5: If the message carries header information (RecordHeaders), set it to read-only.

KafkaProducer#doSend

int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(apiVersions.maxUsableProduceMagic(),
                    compressionType, serializedKey, serializedValue, headers);
ensureValidRecordSize(serializedSize);

Step5: Estimate an upper bound for the message size according to the message format version in use, and check whether it exceeds the allowed size. If it does, an exception is thrown.
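A minimal sketch of this size check follows. The assumptions are labeled: RecordSizeCheckSketch and its fields are hypothetical names, the two limits stand in for the producer's max.request.size and buffer.memory settings, the size estimate is a simplified key + value + overhead sum, and IllegalArgumentException replaces Kafka's own exception type.

```java
public class RecordSizeCheckSketch {
    private final int maxRequestSize;   // stand-in for max.request.size
    private final long totalMemorySize; // stand-in for buffer.memory

    public RecordSizeCheckSketch(int maxRequestSize, long totalMemorySize) {
        this.maxRequestSize = maxRequestSize;
        this.totalMemorySize = totalMemorySize;
    }

    /** Simplified upper bound: key bytes + value bytes + a fixed per-record overhead (hypothetical). */
    public static int estimateUpperBound(byte[] key, byte[] value, int overhead) {
        int keyLength = key == null ? 0 : key.length;
        int valueLength = value == null ? 0 : value.length;
        return overhead + keyLength + valueLength;
    }

    /** Rejects a record that could never fit into a request or into the producer's buffer. */
    public void ensureValidRecordSize(int size) {
        if (size > maxRequestSize) {
            throw new IllegalArgumentException("record of " + size + " bytes exceeds the maximum request size");
        }
        if (size > totalMemorySize) {
            throw new IllegalArgumentException("record of " + size + " bytes exceeds the total buffer memory");
        }
    }
}
```

Doing this check before the append means an oversized record fails fast in the calling thread instead of waiting for buffer memory it can never obtain.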

KafkaProducer#doSend

long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);

Step6: Initialize the message timestamp, and wrap the Callback passed in by the caller so that it becomes part of the interceptor chain.

KafkaProducer#doSend

if (transactionManager != null && transactionManager.isTransactional())
    transactionManager.maybeAddPartitionToTransaction(tp);

Step7: If the transaction manager is not null, perform the related transaction handling. This section does not cover the implementation details of transactional messages; a follow-up article will likely analyze them.

KafkaProducer#doSend

RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, headers, interceptCallback, remainingWaitMs);
if (result.batchIsFull || result.newBatchCreated) {
    log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
    this.sender.wakeup();
}
return result.future;

Step8: Append the message to the buffer, which is the focus of this article. If the current batch is full or a new batch was created, the Sender (the message sending thread) is woken up to send the buffered messages to the broker, and finally the future is returned. This is the classic Future design pattern. It also tells us that when doSend finishes, the message has not necessarily been sent to the broker successfully.

KafkaProducer#doSend

} catch (ApiException e) {
    log.debug("Exception occurred during message send:", e);
    if (callback != null)
        callback.onCompletion(null, e);
    this.errors.record();
    this.interceptors.onSendError(record, tp, e);
    return new FutureFailure(e);
} catch (InterruptedException e) {
    this.errors.record();
    this.interceptors.onSendError(record, tp, e);
    throw new InterruptException(e);
} catch (BufferExhaustedException e) {
    this.errors.record();
    this.metrics.sensor("buffer-exhausted-records").record();
    this.interceptors.onSendError(record, tp, e);
    throw e;
} catch (KafkaException e) {
    this.errors.record();
    this.interceptors.onSendError(record, tp, e);
    throw e;
} catch (Exception e) {
    // we notify interceptor about all exceptions, since onSend is called before anything else in this method
    this.interceptors.onSendError(record, tp, e);
    throw e;
}

Step9: Handle the various exceptions and record the relevant error information.

Next we focus on how a message is appended to the producer's send buffer. The implementing class is RecordAccumulator.

2.2 RecordAccumulator append method in detail

RecordAccumulator#append

public RecordAppendResult append(TopicPartition tp,
                                     long timestamp,
                                     byte[] key,
                                     byte[] value,
                                     Header[] headers,
                                     Callback callback,
                                     long maxTimeToBlock) throws InterruptedException {

Before introducing this method, let's look at its parameters.

  • TopicPartition tp: the topic and partition, i.e. which partition of which topic the message is sent to.
  • long timestamp: the client-side timestamp at send time.
  • byte[] key: the message key.
  • byte[] value: the message body.
  • Header[] headers: the message headers, which can be understood as additional attributes of the message.
  • Callback callback: the callback.
  • long maxTimeToBlock: the timeout for appending the message.

RecordAccumulator#append

Deque<ProducerBatch> dq = getOrCreateDeque(tp);
synchronized (dq) {
    if (closed)
        throw new KafkaException("Producer closed while send in progress");
    RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    if (appendResult != null)
        return appendResult;
}

Step1: Get the deque for the topic and partition, creating one if it does not exist, then call tryAppend to append the message to the buffer. Kafka creates a message buffer for each partition of each topic. A message is first appended to this buffer, after which the send API returns immediately, and a separate Sender thread periodically sends the buffered messages to the broker. The buffer is implemented with an ArrayDeque. If the append succeeds, the result is returned.

Before explaining the rest of the process, let's first look at the storage structure of Kafka's deque:

[Figure: storage structure of the per-partition deque]

 

 

RecordAccumulator#append

int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
buffer = free.allocate(size, maxTimeToBlock);

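Step2: If the append in the first step did not succeed, there is currently no usable ProducerBatch, so a new one has to be created. Memory is first requested from the BufferPool in preparation: batch.size bytes, or the estimated size of the record if that is larger. If the BufferPool has no memory left, the call waits for at most maxTimeToBlock; if the memory cannot be obtained within that time, an exception is thrown.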

RecordAccumulator#append

synchronized (dq) {
    // Need to check if producer is closed again after grabbing the dequeue lock.
    if (closed)
        throw new KafkaException("Producer closed while send in progress");
    // part of the code omitted
    MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
    ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
    FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));
    dq.addLast(batch);
    incomplete.add(batch);
    // Don't deallocate this buffer in the finally block as it's being used in the record batch
    buffer = null;
    return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
}

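Step3: Create a new batch (ProducerBatch), write the message into it, and return the append result. The key points are:

  • Create the ProducerBatch. It holds a MemoryRecordsBuilder, which is responsible for writing messages into memory, i.e. into the buffer held by the ProducerBatch, whose size is batch.size.
  • Append the message to the ProducerBatch.
  • Add the newly created ProducerBatch to the tail of the deque.
  • Add the batch to the incomplete collection, which holds the batches that have not yet been completely sent to the broker. Once the Sender thread has sent a batch to the broker, the batch is removed and its memory released.
  • Return the append result.

Looking at the RecordAccumulator append flow as a whole, it basically takes a ProducerBatch that is not yet full from the deque and tries to write the message into it (into the cache, in memory). If that append fails, it creates a new ProducerBatch and appends the message there.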
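The overall flow can be condensed into a small, self-contained sketch. It is a simplification under stated assumptions rather than Kafka's implementation: AccumulatorSketch and Batch are hypothetical classes, a String stands in for TopicPartition, there is no BufferPool or timeout, and everything happens under a single lock instead of the lock, allocate, re-lock sequence shown above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AccumulatorSketch {
    /** Simplified batch with a fixed byte capacity (hypothetical, not Kafka's ProducerBatch). */
    private static final class Batch {
        private final int capacity;
        private int used;
        private final List<byte[]> records = new ArrayList<>();

        Batch(int capacity) {
            this.capacity = capacity;
        }

        /** Returns false when the record does not fit, mirroring tryAppend returning null. */
        boolean tryAppend(byte[] value) {
            if (used + value.length > capacity) {
                return false;
            }
            records.add(value);
            used += value.length;
            return true;
        }
    }

    // One deque of batches per topic-partition, created lazily.
    private final ConcurrentMap<String, Deque<Batch>> batches = new ConcurrentHashMap<>();
    private final int batchSize;

    public AccumulatorSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Appends a record and returns true when a new batch had to be created for it. */
    public boolean append(String topicPartition, byte[] value) {
        Deque<Batch> dq = batches.computeIfAbsent(topicPartition, tp -> new ArrayDeque<>());
        synchronized (dq) {
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(value)) {
                return false; // the record fit into the batch at the tail of the deque
            }
            // No batch yet, or the tail batch is full: size the new batch so the record always fits.
            Batch batch = new Batch(Math.max(batchSize, value.length));
            batch.tryAppend(value);
            dq.addLast(batch);
            return true;
        }
    }

    /** Number of batches currently queued for the given topic-partition. */
    public int batchCount(String topicPartition) {
        Deque<Batch> dq = batches.get(topicPartition);
        if (dq == null) {
            return 0;
        }
        synchronized (dq) {
            return dq.size();
        }
    }
}
```

With a batch size of 10, appending records of 6, 4 and 1 bytes to the same partition fills the first batch exactly and opens a second one for the third record.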

Next we look at how a message is written into a ProducerBatch.

2.3 ProducerBatch tryAppend method in detail

ProducerBatch#tryAppend

public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
    if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {  // @1
        return null;
    } else {
        Long checksum = this.recordsBuilder.append(timestamp, key, value, headers);                    // @2
        this.maxRecordSize = Math.max(this.maxRecordSize, AbstractRecords.estimateSizeInBytesUpperBound(magic(),
                    recordsBuilder.compressionType(), key, value, headers));               // @3
        this.lastAppendTime = now;
        FutureRecordMetadata future = new FutureRecordMetadata(this.produceFuture, this.recordCount,
                                                                   timestamp, checksum,
                                                                   key == null ? -1 : key.length,
                                                                   value == null ? -1 : value.length,
                                                                   Time.SYSTEM);                                        // @4
        // we have to keep every future returned to the users in case the batch needs to be
        // split to several new batches and resent.
        thunks.add(new Thunk(callback, future));                                                           // @5
        this.recordCount++;
        return future;                                                                            
    }
}

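Code @1: First check whether the ProducerBatch can still hold the current message. If the remaining space is insufficient, null is returned directly, in which case the caller will try to create a new ProducerBatch.

Code @2: Use MemoryRecordsBuilder to write the message into memory in the Kafka message format, i.e. into the ByteBuffer that was allocated when the ProducerBatch was created. This article does not describe the message formats of the various Kafka versions; a dedicated follow-up article will cover them.

Code @3: Update the maxRecordSize and lastAppendTime fields of the ProducerBatch, which hold the size of the largest message in the batch and the time of the most recent append.

Code @4: Build the FutureRecordMetadata object. This is the typical Future pattern. It mainly contains the produceFuture of the batch the message belongs to, the index of the message within the batch, the message timestamp and checksum, the length of the key, the length of the message body, and the system clock (Time.SYSTEM).

Code @5: Add the callback and the message's receipt (its Future) to the batch's thunks. This collection stores the send receipts of all messages in a batch.

At this point the send method of KafkaProducer has finished, and what is returned to the caller is a FutureRecordMetadata object.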
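To make the relationship between the batch-level future and the per-record futures more concrete, here is a minimal sketch. It rests on assumptions: BatchFutureSketch, Thunk and done are hypothetical names, a CompletableFuture<Long> carrying the base offset stands in for produceFuture, a BiConsumer stands in for Callback, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

public class BatchFutureSketch {
    /** Pairs a user callback (may be null) with the future handed back for one record. */
    private static final class Thunk {
        final BiConsumer<Long, Throwable> callback;
        final CompletableFuture<Long> future;

        Thunk(BiConsumer<Long, Throwable> callback, CompletableFuture<Long> future) {
            this.callback = callback;
            this.future = future;
        }
    }

    // Completed once for the whole batch, with the base offset assigned to it.
    private final CompletableFuture<Long> produceFuture = new CompletableFuture<>();
    private final List<Thunk> thunks = new ArrayList<>();
    private int recordCount = 0;

    /** Registers one record and returns a future for its offset: base offset + index in the batch. */
    public CompletableFuture<Long> tryAppend(BiConsumer<Long, Throwable> callback) {
        final int relativeOffset = recordCount++;
        CompletableFuture<Long> future = produceFuture.thenApply(base -> base + relativeOffset);
        thunks.add(new Thunk(callback, future));
        return future;
    }

    /** Called when the batch is acknowledged: completes every record future, then runs the callbacks. */
    public void done(long baseOffset) {
        produceFuture.complete(baseOffset);
        for (Thunk thunk : thunks) {
            if (thunk.callback != null) {
                thunk.callback.accept(thunk.future.join(), null);
            }
        }
    }
}
```

The design choice worth noticing is that only one result arrives per batch, yet every caller still gets an individual future, because each record future is derived from the shared one plus the record's position in the batch.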

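Reading source code is rather dry, so next a flow chart briefly illustrates the key elements of the message append, with a focus on the various Futures.

2.4 Kafka message append flow chart and summary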

 

[Figure: message append flow chart]

 

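The "message sending" described above is more aptly called "message appending", because Kafka's send method does not send the message directly to the broker but first appends it to the producer's in-memory cache. The in-memory structure is ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches. From this we can see that the Kafka producer maintains a separate queue, an ArrayDeque, for every partition of every topic. Its elements are ProducerBatch objects, each representing one batch, which means Kafka sends messages in batches. The cache structure is shown below: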

[Figure: producer cache structure]

 

The send method of KafkaProducer ultimately returns a FutureRecordMetadata, which is an implementation of Future, i.e. the Future pattern. So how does Kafka support sending a message asynchronously versus synchronously?

The answer lies in the return value of send. If the caller needs a synchronous send, it takes the result returned by send and calls its get() method. If the message has not yet been sent to the broker, get() blocks until the broker returns the result, at which point the caller is woken up and receives the send result. For an asynchronous send, it is recommended to use send(ProducerRecord<K, V> record, Callback callback) and not to call get(). The Callback is invoked once the broker's response has been received, and interceptors are supported as well.
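The two calling styles can be tried out without a broker using a small stand-in. Everything here is an assumption made for illustration: SendModesSketch is a hypothetical class, its send method only imitates the shape of the producer's send (it returns a CompletableFuture, which implements Future), and a single-threaded executor plays the role of the Sender thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

public class SendModesSketch {
    // Plays the role of the Sender thread; all acknowledgements happen on it.
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    private long nextOffset = 0; // only touched by the sender thread

    /** Hypothetical stand-in for send(record, callback): returns at once, completes later. */
    public CompletableFuture<Long> send(String value, BiConsumer<Long, Exception> callback) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        sender.execute(() -> {
            long offset = nextOffset++; // pretend the broker acknowledged the record
            future.complete(offset);
            if (callback != null) {
                callback.accept(offset, null);
            }
        });
        return future;
    }

    public void close() {
        sender.shutdown(); // already queued sends still complete
    }

    public static void main(String[] args) throws Exception {
        SendModesSketch producer = new SendModesSketch();

        // Synchronous style: block on get() until the result is available.
        long offset = producer.send("sync message", null).get();
        System.out.println("sync send acknowledged at offset " + offset);

        // Asynchronous style: pass a callback and do not call get().
        producer.send("async message",
                (off, err) -> System.out.println("async send acknowledged at offset " + off));

        producer.close();
    }
}
```

In the asynchronous style the calling thread is free to continue immediately, and the callback runs later on the sender thread, so it should stay short and must not block.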


Origin blog.csdn.net/asdfsadfasdfsa/article/details/104056910