
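I had long planned to write a series of articles analyzing the Kafka source code, but kept putting it off until I came across the blog of a colleague, 编程小梦, and felt thoroughly humbled. This colleague finished a bachelor's degree only last year and should be about two years younger than me, yet is extremely capable, having become an Apache Kylin committer just half a year into the job. With a colleague this talented who also works this hard (see 编程小梦-我的书单, the blog's reading list), I have no excuse not to put in the effort myself. So I opened an issue for myself on github, Kafka 源码分析系列 (Kafka source code analysis series), with the goal of publishing at least one Kafka source analysis article every two weeks over the next six months. This article is the first in the series: the Producer's sending model (based on Kafka 0.10.2).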

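Preface

Kafka is currently the most widely used message queue in the big data field, and its internal design and implementation contain a lot that is worth studying in depth.

In Kafka 0.10.2, the client is implemented in Java and the server in Scala. Because the client is the first part of Kafka that users come into contact with, this source analysis series starts on the client side, beginning with the Producer. This article covers how the Producer's sending model is implemented.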

Using the Producer

Before analyzing the Producer's sending model, let's look at how a user writes data to Kafka with the Producer. Below is about the simplest possible Producer example.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

/**
 * Created by matt on 16/7/26.
 */
public class ProducerTest {
    private static String topicName;
    private static int msgNum;
    private static int key;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        topicName = "test";
        msgNum = 10; // number of messages to send

        Producer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < msgNum; i++) {
            String msg = i + " This is matt's blog.";
            producer.send(new ProducerRecord<String, String>(topicName, msg));
        }
        producer.close();
    }
}

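As the code above shows, Kafka gives users a very simple API. Using it takes only two steps:

Initialize a KafkaProducer instance;
Call the send interface to send data.

The rest of this article is about how the Producer implements the send interface internally.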

Producer data sending flow

Below, we walk through the source code of send step by step to dissect how the Producer sends data.
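As orientation before the source listings, the flow they implement (serialize the record, pick a partition, append to a per-partition batch, and wake the sender when a batch fills up or a new one is created) can be imitated in plain Java. This is a deliberately simplified, single-threaded sketch and not Kafka's implementation: `MiniAccumulator`, its `AppendResult`, the `BATCH_SIZE` of three records, and the `Arrays.hashCode`-based partition choice are all assumptions made for illustration, and keys are assumed to be non-null. Kafka itself bounds batches by bytes rather than by record count, and its default partitioner hashes the serialized key with murmur2.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, single-threaded model of the append path. None of this is Kafka code.
class MiniAccumulator {
    static final int BATCH_SIZE = 3; // assumption: a batch holds 3 records

    // Loosely modelled on the two flags doSend() checks after appending.
    static final class AppendResult {
        final boolean batchIsFull;
        final boolean newBatchCreated;

        AppendResult(boolean batchIsFull, boolean newBatchCreated) {
            this.batchIsFull = batchIsFull;
            this.newBatchCreated = newBatchCreated;
        }
    }

    private final int numPartitions;
    private final Map<Integer, List<byte[]>> openBatches = new HashMap<>();
    int wakeups = 0; // how many times the sender thread would have been woken

    MiniAccumulator(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Use the caller's partition if given, otherwise derive one from the key bytes.
    int partition(Integer explicitPartition, byte[] keyBytes) {
        if (explicitPartition != null) {
            return explicitPartition;
        }
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Append to the partition's open batch, creating the batch if there is none.
    AppendResult append(int partition, byte[] value) {
        List<byte[]> batch = openBatches.get(partition);
        boolean created = (batch == null);
        if (created) {
            batch = new ArrayList<>();
            openBatches.put(partition, batch);
        }
        batch.add(value);
        boolean full = batch.size() >= BATCH_SIZE;
        if (full) {
            openBatches.remove(partition); // stands in for the sender draining the batch
        }
        return new AppendResult(full, created);
    }

    // Serialize, pick a partition, append, then decide whether to wake the sender.
    int send(Integer explicitPartition, String key, String value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
        int partition = partition(explicitPartition, keyBytes);
        AppendResult result = append(partition, valueBytes);
        if (result.batchIsFull || result.newBatchCreated) {
            wakeups++;
        }
        return partition;
    }

    public static void main(String[] args) {
        MiniAccumulator accumulator = new MiniAccumulator(4);
        for (int i = 0; i < 5; i++) {
            accumulator.send(0, "key", "msg-" + i);
        }
        System.out.println("wakeups = " + accumulator.wakeups); // prints "wakeups = 3"
    }
}
```

With five records sent to one partition and a batch size of three, the sender would be woken three times in this model: when the first batch is created, when it fills up, and when the second batch is created.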

Producer's send implementation

Users send data by calling producer.send() directly, so let's first look at how the send() interface is implemented.

// Asynchronously send a record to a topic
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record) {
    return send(record, null);
}

// Asynchronously send a record to a topic and invoke the callback once the send has been acknowledged
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    return doSend(interceptedRecord, callback);
}

In both cases, the actual sending is ultimately delegated to the Producer's private doSend() method.
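The delegation chain in the listing above can be imitated in plain Java without any Kafka dependency. What follows is a toy model only, not Kafka's code: `MiniProducer` and its nested `Callback` are hypothetical stand-ins, records are plain strings, a `UnaryOperator<String>` plays the role of the interceptors, and `doSend` completes immediately instead of handing the record to an accumulator and a sender thread. It illustrates two points from the listing: the one-argument `send` simply passes a `null` callback, and the interceptor gets to modify the record before `doSend` sees it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.UnaryOperator;

// Toy model of the send() -> doSend() delegation. None of these types are Kafka classes.
class MiniProducer {
    // Hypothetical stand-in for a completion callback.
    @FunctionalInterface
    interface Callback {
        void onCompletion(String sentRecord, Exception error);
    }

    private final UnaryOperator<String> interceptor; // may be null, like this.interceptors

    MiniProducer(UnaryOperator<String> interceptor) {
        this.interceptor = interceptor;
    }

    // The one-argument overload only forwards with a null callback.
    CompletableFuture<String> send(String record) {
        return send(record, null);
    }

    // Run the interceptor (if any) first, then hand the result to doSend.
    CompletableFuture<String> send(String record, Callback callback) {
        String intercepted = interceptor == null ? record : interceptor.apply(record);
        return doSend(intercepted, callback);
    }

    // Simplification: the "send" succeeds at once, so the callback fires
    // and the returned future is already completed.
    private CompletableFuture<String> doSend(String record, Callback callback) {
        if (callback != null) {
            callback.onCompletion(record, null);
        }
        return CompletableFuture.completedFuture(record);
    }

    public static void main(String[] args) {
        MiniProducer producer = new MiniProducer(r -> "[intercepted] " + r);
        // prints "[intercepted] hello"
        System.out.println(producer.send("hello").join());
        // prints "callback saw: [intercepted] world"
        producer.send("world", (value, error) -> System.out.println("callback saw: " + value));
    }
}
```

`CompletableFuture` is used here because it implements `Future` and can be completed by hand; in the real Producer the future is completed later, once the send has been acknowledged.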

Producer's doSend implementation

Below is the concrete implementation of doSend():

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // 1. Make sure the metadata for the topic the record is being sent to is available
        ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;

        // 2. Serialize the record's key and value
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }

        // 3. Determine the record's partition (either specified explicitly or computed by the partitioner)
        int partition = partition(record, serializedKey, serializedValue, cluster);
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        ensureValidRecordSize(serializedSize); // throws RecordTooLargeException if the record exceeds the size limit or the memory limit
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp(); // record timestamp, defaulting to the current time
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);

        // 4. Append the record to the accumulator
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);

        // 5. If the batch is full or a new batch was created, wake up the sender thread to send the data
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
    } catch (ApiException e) {
        log.debug("Exception occurred during message send:", e);
        if (callback != null)
            callback.onCompletion(null, e);
        this.errors.record();