Partitioning and Buffered Sending of Messages in the Kafka Java Client

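When the Kafka producer sends a message, once the key and value have been serialized it must decide which partition the message goes to. If the record does not specify a partition, a Partitioner makes the choice; by default this is DefaultPartitioner.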

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    if (keyBytes == null) {
        int nextValue = nextValue(topic);
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return availablePartitions.get(part).partition();
        } else {
            // no partitions are available, give a non-available partition
            return Utils.toPositive(nextValue) % numPartitions;
        }
    } else {
        // hash the keyBytes to choose a partition
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}

private int nextValue(String topic) {
    AtomicInteger counter = topicCounterMap.get(topic);
    if (null == counter) {
        counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
        AtomicInteger currentCounter = topicCounterMap.putIfAbsent(topic, counter);
        if (currentCounter != null) {
            counter = currentCounter;
        }
    }
    return counter.getAndIncrement();
}

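The method first gets the partition count of the target topic. If the message has a key, the partition is the murmur2 hash of the serialized key modulo the total number of partitions, so records with the same key land on the same partition as long as the partition count does not change. If there is no key, nextValue() supplies a per-topic AtomicInteger counter that increases by one for each message; taking it modulo the number of available partitions (or the total number, if none are available) spreads messages across partitions in round-robin fashion.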
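
The selection logic can be reduced to a small sketch that runs without the Kafka library. The class name PartitionChooser, the single counter starting at 0 (the client keeps one randomly seeded counter per topic), and the use of Arrays.hashCode in place of murmur2 are assumptions made for illustration; this is not the client's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionChooser {
    // Hypothetical stand-in for the per-topic counter; the real client seeds it randomly.
    private final AtomicInteger counter = new AtomicInteger(0);

    // Clears the sign bit so the result is never negative
    // (Math.abs would still return a negative value for Integer.MIN_VALUE).
    public static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    /**
     * @param keyBytes      serialized key, or null when the record has no key
     * @param numPartitions total number of partitions of the topic
     * @param available     ids of the partitions that are currently available
     */
    public int choose(byte[] keyBytes, int numPartitions, List<Integer> available) {
        if (keyBytes != null) {
            // Keyed record: hash the key and take it modulo the total partition count.
            // Assumption: Arrays.hashCode stands in for Kafka's murmur2.
            return toPositive(Arrays.hashCode(keyBytes)) % numPartitions;
        }
        // Unkeyed record: round-robin driven by an incrementing counter.
        int next = toPositive(counter.getAndIncrement());
        if (!available.isEmpty()) {
            return available.get(next % available.size());
        }
        // Nothing is available: fall back to any partition of the topic.
        return next % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser();
        List<Integer> available = Arrays.asList(0, 2); // partition 1 is unavailable
        for (int i = 0; i < 4; i++) {
            System.out.println("no key -> partition " + chooser.choose(null, 3, available));
        }
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        System.out.println("keyed  -> partition " + chooser.choose(key, 3, available));
    }
}
```

With partitions 0 and 2 available, the unkeyed calls alternate between 0 and 2, while a given key always maps to the same partition.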

The message is then placed in the client's buffer to wait for sending.

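In the Kafka client, the memory that holds messages is abstracted as a MemoryRecordsBuilder, which stores the messages in their serialized form. Each MemoryRecordsBuilder is wrapped in a ProducerBatch, which represents a group of messages for the same topic and partition. The ProducerBatch instances for one topic partition are kept in a queue until they are picked up and sent.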

This logic is implemented in the client's RecordAccumulator.

Deque<ProducerBatch> dq = getOrCreateDeque(tp);
synchronized (dq) {
    if (closed)
        throw new KafkaException("Producer closed while send in progress");
    RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    if (appendResult != null)
        return appendResult;
}

Each topic partition has its own queue of ProducerBatch instances. When a new message arrives, the client tries to append it to the ProducerBatch at the tail of the queue.

MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

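If the message does not fit in the remaining space of that ProducerBatch (or the queue is empty), the client allocates a new block of memory from the buffer pool, creates a new ProducerBatch around it, appends the message, and adds the batch to the tail of the queue to wait for sending.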
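
The append path described above (try the tail batch first, open a new batch when the message does not fit) can be sketched roughly as follows. MiniAccumulator, Batch, and the String partition key are hypothetical stand-ins for RecordAccumulator, ProducerBatch, and TopicPartition. The sketch also assumes a new batch is sized to the larger of the configured batch size and the record itself, and it skips the buffer pool and the retry that the real client performs after allocating memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MiniAccumulator {
    /** Hypothetical stand-in for ProducerBatch: a fixed-capacity group of records. */
    public static final class Batch {
        public final int capacity;
        public int usedBytes = 0;
        public final List<byte[]> records = new ArrayList<>();

        Batch(int capacity) {
            this.capacity = capacity;
        }

        // Returns false when the record does not fit, so the caller must open a new batch.
        boolean tryAppend(byte[] record) {
            if (usedBytes + record.length > capacity) {
                return false;
            }
            records.add(record);
            usedBytes += record.length;
            return true;
        }
    }

    private final int batchSize;
    // One queue of batches per topic partition (keyed by a plain string in this sketch).
    private final Map<String, Deque<Batch>> batches = new ConcurrentHashMap<>();

    public MiniAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    public Deque<Batch> dequeFor(String topicPartition) {
        return batches.computeIfAbsent(topicPartition, tp -> new ArrayDeque<>());
    }

    public void append(String topicPartition, byte[] record) {
        Deque<Batch> dq = dequeFor(topicPartition);
        synchronized (dq) {
            // 1. Try the batch at the tail of the queue.
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(record)) {
                return;
            }
            // 2. No batch yet, or the tail is full: open a new batch at the tail.
            //    Assumption: it is sized so that even an oversized record fits.
            Batch fresh = new Batch(Math.max(batchSize, record.length));
            fresh.tryAppend(record);
            dq.addLast(fresh);
        }
    }

    public static void main(String[] args) {
        MiniAccumulator acc = new MiniAccumulator(10);
        for (int i = 0; i < 3; i++) {
            acc.append("demo-0", new byte[4]); // two records fill batch 1, the third opens batch 2
        }
        acc.append("demo-0", new byte[25]);    // larger than batchSize: gets its own batch
        for (Batch b : acc.dequeFor("demo-0")) {
            System.out.println(b.records.size() + " record(s), "
                    + b.usedBytes + "/" + b.capacity + " bytes");
        }
    }
}
```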

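Because new batches are added at the tail, the client pulls batches from the head of the queue when it is time to send, so the oldest batch goes first.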

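When a send actually takes place, the client looks at the first ProducerBatch in the queue. If the batch's size fits within the space remaining in the request being assembled, the whole batch is added to the request and removed from the queue.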
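
A minimal sketch of this drain step is shown below. DrainSketch and its drain method are hypothetical, and each batch is reduced to an Integer holding its size in bytes. Two details are assumptions that go beyond the description above: at most one batch is taken from each partition's queue per pass, and the first batch is always accepted even if it alone exceeds the limit, so that an oversized batch cannot block sending indefinitely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class DrainSketch {
    /**
     * Collects batches for one request.
     * Each Integer stands in for a ProducerBatch and holds only its size in bytes.
     */
    public static List<Integer> drain(List<Deque<Integer>> partitionQueues, int maxRequestSize) {
        List<Integer> ready = new ArrayList<>();
        int size = 0;
        for (Deque<Integer> dq : partitionQueues) {
            synchronized (dq) {
                Integer first = dq.peekFirst(); // look at the head without removing it
                if (first == null) {
                    continue; // nothing buffered for this partition
                }
                // Stop when the head batch no longer fits in the remaining space,
                // unless the request is still empty.
                if (size + first > maxRequestSize && !ready.isEmpty()) {
                    break;
                }
                ready.add(dq.pollFirst()); // take the whole batch and remove it from the queue
                size += first;
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        Deque<Integer> p0 = new ArrayDeque<>(Arrays.asList(40, 10));
        Deque<Integer> p1 = new ArrayDeque<>(Arrays.asList(50));
        Deque<Integer> p2 = new ArrayDeque<>(Arrays.asList(30));
        List<Deque<Integer>> queues = Arrays.asList(p0, p1, p2);

        System.out.println(drain(queues, 100)); // first request
        System.out.println(drain(queues, 100)); // next request picks up what was left behind
    }
}
```

In this example the first call takes the 40-byte and 50-byte batches and leaves the 30-byte batch in its queue, because adding it would push the request past 100 bytes; the second call picks up the remaining 10-byte and 30-byte batches.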

tydhot
