Kafka Source Code Analysis - Sequence 5 - Producer - RecordAccumulator Queue Analysis

In Kafka source code analysis - sequence 2, we mentioned the architecture diagram of the entire Producer client, as shown below:

We have talked about several other components before, and today we will talk about the last component, RecordAccumulator.

Batch sending

In the earlier Kafka client, each message was called a "Message"; in the Java client it is called a "Record". Because this component accumulates records and sends them in batches, it is called RecordAccumulator.

The defining feature of RecordAccumulator is batching: multiple messages placed in the queue may be grouped into a single RecordBatch, which the Sender then sends in one go.

A queue for each TopicPartition

Below is the internal structure of the RecordAccumulator. Each TopicPartition has its own message queue, and only messages for the same TopicPartition can be batched together.
public final class RecordAccumulator {
    private final ConcurrentMap<TopicPartition, Deque<RecordBatch>> batches;

   ...
}
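
To make the per-partition layout concrete, here is a minimal sketch under simplifying assumptions. It is not the client's implementation: the class name PerPartitionQueues, the use of plain strings for partitions and records, and the record-count cap are all hypothetical (the real RecordBatch is bounded by bytes), and computeIfAbsent stands in for the client's own lookup helper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of "one deque of batches per partition".
// A "batch" here is just a list of records capped by record count (an assumption).
class PerPartitionQueues {
    private final ConcurrentMap<String, Deque<List<String>>> batches = new ConcurrentHashMap<>();
    private final int maxRecordsPerBatch;

    PerPartitionQueues(int maxRecordsPerBatch) {
        this.maxRecordsPerBatch = maxRecordsPerBatch;
    }

    // Append to the last batch of this partition's deque,
    // opening a new batch when there is none or the last one is full.
    void append(String partition, String record) {
        Deque<List<String>> dq = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        synchronized (dq) {
            List<String> last = dq.peekLast();
            if (last == null || last.size() >= maxRecordsPerBatch) {
                last = new ArrayList<>();
                dq.addLast(last);
            }
            last.add(record);
        }
    }

    // Number of batches currently queued for the partition (0 if it has no deque yet).
    int batchCount(String partition) {
        Deque<List<String>> dq = batches.get(partition);
        if (dq == null)
            return 0;
        synchronized (dq) {
            return dq.size();
        }
    }

    public static void main(String[] args) {
        PerPartitionQueues acc = new PerPartitionQueues(2);
        acc.append("orders-0", "a");
        acc.append("orders-1", "b"); // different partition: never shares a batch with "a"
        acc.append("orders-0", "c"); // joins "a" in the same batch
        acc.append("orders-0", "d"); // first batch is full, so a second batch is opened
        System.out.println(acc.batchCount("orders-0")); // prints 2
        System.out.println(acc.batchCount("orders-1")); // prints 1
    }
}
```

Records for orders-0 and orders-1 never end up in the same batch, because each partition only ever looks at its own deque.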

Batch strategy

When is a message batched, and when is it not? Let's start from the send method of KafkaProducer:
// KafkaProducer
    public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
        try {
            // first make sure the metadata for the topic is available
            long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);

            ...

            RecordAccumulator.RecordAppendResult result = accumulator.append(tp, serializedKey, serializedValue, callback, remainingWaitMs); //Core function: put the message into the queue

            if (result.batchIsFull || result.newBatchCreated) {
                log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
                this.sender.wakeup();
            }
            return result.future;

As you can see from the above code, the batching logic lives entirely in the accumulator.append function:
    public RecordAppendResult append(TopicPartition tp, byte[] key, byte[] value, Callback callback, long maxTimeToBlock) throws InterruptedException {
        appendsInProgress.incrementAndGet();
        try {
            if (closed)
                throw new IllegalStateException("Cannot send after the producer is closed.");
            Deque<RecordBatch> dq = dequeFor(tp); // Find the message queue corresponding to the TopicPartition
            synchronized (dq) {
                RecordBatch last = dq.peekLast(); // Peek at the last batch in the queue (it is not removed)
                if (last != null) {
                    // A last RecordBatch exists: try to append the Record to it
                    FutureRecordMetadata future = last.tryAppend(key, value, callback, time.milliseconds());
                    if (future != null)
                        return new RecordAppendResult(future, dq.size() > 1 || last.records.isFull(), false);
                }
            }

            int size = Math.max(this.batchSize, Records.LOG_OVERHEAD + Record.recordSize(key, value));
            log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
            ByteBuffer buffer = free.allocate(size, maxTimeToBlock);
            synchronized (dq) {
                // Need to check if producer is closed again after grabbing the dequeue lock.
                if (closed)
                    throw new IllegalStateException("Cannot send after the producer is closed.");
                RecordBatch last = dq.peekLast();
                if (last != null) {
                    FutureRecordMetadata future = last.tryAppend(key, value, callback, time.milliseconds());
                    if (future != null) {
                        // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
                        free.deallocate(buffer);
                        return new RecordAppendResult(future, dq.size() > 1 || last.records.isFull(), false);
                    }
                }

                // No RecordBatch in the queue, or the last one could not take the Record: create a new one and put the Record in
                MemoryRecords records = MemoryRecords.emptyRecords(buffer, compression, this.batchSize);
                RecordBatch batch = new RecordBatch(tp, records, time.milliseconds());
                FutureRecordMetadata future = Utils.notNull(batch.tryAppend(key, value, callback, time.milliseconds()));

                dq.addLast(batch);
                incomplete.add(batch);
                return new RecordAppendResult(future, dq.size() > 1 || batch.records.isFull(), true);
            }
        } finally {
            appendsInProgress.decrementAndGet();
        }
    }

    private Deque<RecordBatch> dequeFor(TopicPartition tp) {
        Deque<RecordBatch> d = this.batches.get(tp);
        if (d != null)
            return d;
        d = new ArrayDeque<>();
        Deque<RecordBatch> previous = this.batches.putIfAbsent(tp, d);
        if (previous == null)
            return d;
        else
            return previous;
    }

From the above code we can see the batching strategy:
1. If messages are sent synchronously (the caller waits for each send to complete before issuing the next), the queue is empty every time append is called. Nothing is batched, and each Record forms its own RecordBatch.

2. Producer enqueue rate < Sender dequeue rate && lingerMs = 0: messages will not be batched.

3. Producer enqueue rate > Sender dequeue rate: messages will be batched.

4. lingerMs > 0: the Sender waits until lingerMs has elapsed, the queue holds more than one RecordBatch, or the first RecordBatch is full, and only then sends. (Memory exhaustion, a closed producer, or a flush in progress also make a batch sendable.) This logic is in the ready function of RecordAccumulator.
    public ReadyCheckResult ready(Cluster cluster, long nowMs) {
        Set<Node> readyNodes = new HashSet<Node>();
        long nextReadyCheckDelayMs = Long.MAX_VALUE;
        boolean unknownLeadersExist = false;

        boolean exhausted = this.free.queued() > 0;
        for (Map.Entry<TopicPartition, Deque<RecordBatch>> entry : this.batches.entrySet()) {
            TopicPartition part = entry.getKey();
            Deque<RecordBatch> deque = entry.getValue();

            Node leader = cluster.leaderFor(part);
            if (leader == null) {
                unknownLeadersExist = true;
            } else if (!readyNodes.contains(leader)) {
                synchronized (deque) {
                    RecordBatch batch = deque.peekFirst();
                    if (batch != null) {
                        boolean backingOff = batch.attempts > 0 && batch.lastAttemptMs + retryBackoffMs > nowMs;
                        long waitedTimeMs = nowMs - batch.lastAttemptMs;
                        long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
                        long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
                        boolean full = deque.size() > 1 || batch.records.isFull();
                        boolean expired = waitedTimeMs >= timeToWaitMs;
                        boolean sendable = full || expired || exhausted || closed || flushInProgress(); // Key line: any one of these conditions makes the batch sendable
                        if (sendable && !backingOff) {
                            readyNodes.add(leader);
                        } else {

                            nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
                        }
                    }
                }
            }
        }

        return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeadersExist);
    }
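
The decision inside ready can be isolated as a pure function, which makes the four cases above easy to try out. This is only a sketch: the class SendableCheck and the method readyToSend are hypothetical names, and the batch state is passed in as plain parameters rather than read from a RecordBatch. The boolean logic itself follows the excerpt above.

```java
// Hypothetical helper that mirrors only the boolean logic of ready().
// waitedTimeMs stands for nowMs - batch.lastAttemptMs in the excerpt above.
class SendableCheck {
    static boolean readyToSend(int dequeSize, boolean firstBatchFull, long waitedTimeMs,
                               long lingerMs, long retryBackoffMs, int attempts,
                               boolean exhausted, boolean closed, boolean flushInProgress) {
        // A batch that has already been attempted must wait out the retry backoff.
        boolean backingOff = attempts > 0 && waitedTimeMs < retryBackoffMs;
        long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
        // "Full" means either a second batch is queued behind it or the batch itself is full.
        boolean full = dequeSize > 1 || firstBatchFull;
        boolean expired = waitedTimeMs >= timeToWaitMs;
        boolean sendable = full || expired || exhausted || closed || flushInProgress;
        return sendable && !backingOff;
    }

    public static void main(String[] args) {
        // lingerMs = 0: a lone, non-full batch is sendable immediately.
        System.out.println(readyToSend(1, false, 0, 0, 100, 0, false, false, false));  // prints true
        // lingerMs = 5, waited 2 ms, nothing else queued: keep accumulating.
        System.out.println(readyToSend(1, false, 2, 5, 100, 0, false, false, false));  // prints false
        // Same timing, but a second batch is queued: send now.
        System.out.println(readyToSend(2, false, 2, 5, 100, 0, false, false, false));  // prints true
        // A full batch that is still inside its retry backoff is held back.
        System.out.println(readyToSend(2, true, 50, 5, 100, 1, false, false, false));  // prints false
    }
}
```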

Why Deque?

We saw above that the message queue uses a "double-ended queue" instead of a normal queue.
One end produces and the other end consumes. Isn't it enough to use an ordinary queue? Why do you need "double ends"?

This is to handle retries after a failed send: when a message fails to be sent and needs to be resent, it must be put back at the head of the queue so that it goes out next. That requires a double-ended queue, because the insertion happens at the head rather than at the tail.

Of course, even so, the order in which messages are sent out can differ from the order in which the Producer put them in: a later batch may already have been sent while an earlier one is waiting to be retried.
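
A small simulation of the retry case, as a sketch only. The class RetryAtHead, the method deliveryOrder and the batch names are hypothetical, and it assumes two batches can be in flight at once. It uses java.util.ArrayDeque to show both points: a failed batch goes back in at the head, and the final delivery order can still differ from the enqueue order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation: batches are plain strings, and "sending" is just polling the deque.
class RetryAtHead {
    // Returns the order in which batches are finally delivered when the first send of batch-1 fails.
    static List<String> deliveryOrder() {
        Deque<String> dq = new ArrayDeque<>(Arrays.asList("batch-1", "batch-2", "batch-3"));
        List<String> delivered = new ArrayList<>();

        String first = dq.pollFirst();   // the sender takes batch-1 ...
        String second = dq.pollFirst();  // ... and batch-2 while batch-1 is still in flight (assumption)

        delivered.add(second);           // batch-2 succeeds
        dq.addFirst(first);              // batch-1 fails: back in at the HEAD, ahead of batch-3

        while (!dq.isEmpty())
            delivered.add(dq.pollFirst()); // remaining batches go out in deque order
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(deliveryOrder()); // prints [batch-2, batch-1, batch-3]
    }
}
```

With a tail-only queue, batch-1 would have been retried after batch-3; inserting at the head keeps the retry as close to its original position as possible, although batch-2 has still overtaken it.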


Origin http://10.200.1.11:23101/article/api/json?id=327059849&siteId=291194637