Kafka Producer Flow: How the Client Sends Messages

Kafka source code version covered in this article: 1.0.0.

I. KafkaProducer Overview

1. KafkaProducer introduction and usage

KafkaProducer is Kafka's client-side message producer, used to publish messages to a Kafka cluster. KafkaProducer is thread-safe, and sharing a single producer instance across threads is generally faster than giving each thread its own instance.

Usage example:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++)
    producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));

producer.close();
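
send() is asynchronous and returns a Future<RecordMetadata>; in practice a Callback is usually supplied so that failures surface without blocking. A minimal sketch continuing the example above:

// same producer as above; the lambda implements org.apache.kafka.clients.producer.Callback
producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
    if (exception != null)
        exception.printStackTrace();   // the send failed (after any configured retries)
    else
        System.out.printf("sent to partition %d at offset %d%n",
                metadata.partition(), metadata.offset());
});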

2. KafkaProducer fields

    // producer client id
    private final String clientId;
    // internal metrics module, responsible for managing Sensor objects
    final Metrics metrics;
    // a single operation needs several measurements, e.g. average request time and error count
    private final Sensor errors;

    // partition selector, used to route messages to partitions
    private final Partitioner partitioner;
    // maximum request size, covering the message headers plus the serialized key and value
    private final int maxRequestSize;
    // total memory available for buffering records awaiting send (buffer.memory)
    private final long totalMemorySize;
    // metadata for the whole Kafka cluster, shared by all client threads
    private final Metadata metadata;
    // collects records waiting to be sent
    private final RecordAccumulator accumulator;
    // the task that sends messages, executed on the ioThread
    private final Sender sender;
    // the thread that runs the Sender task
    private final Thread ioThread;
    // compression algorithm used to compress messages
    private final CompressionType compressionType;
    // time-related utilities
    private final Time time;
    // key and value serializers; custom serializer classes can be set via ProducerConfig
    private final ExtendedSerializer<K> keySerializer;
    private final ExtendedSerializer<V> valueSerializer;
    // configuration object used to initialize the KafkaProducer
    private final ProducerConfig producerConfig;
    // maximum time to block while waiting for cluster metadata updates
    private final long maxBlockTimeMs;
    // maximum time from sending a message to receiving its ACK response
    private final int requestTimeoutMs;
    // interceptors that pre-process messages before sending and on callback
    private final ProducerInterceptors<K, V> interceptors;
    // API versions per node, used internally by Kafka
    private final ApiVersions apiVersions;
    // transaction management
    private final TransactionManager transactionManager;

3. KafkaProducer constructor

The KafkaProducer constructor we use most often is:

public KafkaProducer(Properties properties) {
    this(new ProducerConfig(properties), null, null);
}

new ProducerConfig() initializes the configuration definitions and their defaults:

ProducerConfig(Map<?, ?> props) {
    super(CONFIG, props);
} 

static {
    CONFIG = new ConfigDef().define(BOOTSTRAP_SERVERS_CONFIG, Type.LIST, Importance.HIGH, CommonClientConfigs.BOOTSTRAP_SERVERS_DOC)
                            .define(BUFFER_MEMORY_CONFIG, Type.LONG, 32 * 1024 * 1024L, atLeast(0L), Importance.HIGH, BUFFER_MEMORY_DOC)
    ...
}

The values we set in Properties then override these defaults.
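
How this override works can be seen with ConfigDef alone. A small standalone sketch, mirroring the BUFFER_MEMORY_CONFIG definition above (the class name and values here are illustrative):

import java.util.Properties;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class ConfigDefDemo {
    public static void main(String[] args) {
        // a toy ConfigDef with one key and a default, built the same way as ProducerConfig.CONFIG
        ConfigDef def = new ConfigDef()
                .define("buffer.memory", Type.LONG, 32 * 1024 * 1024L, Importance.HIGH, "total buffer memory");

        Properties props = new Properties();   // user-supplied configuration
        props.put("buffer.memory", "1048576"); // overrides the 32 MB default

        // parse() converts types and falls back to the default for unset keys
        System.out.println(def.parse(props).get("buffer.memory")); // prints 1048576
    }
}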

During construction, KafkaProducer initializes the Partitioner, the Serializers, the interceptors, the cluster metadata, the RecordAccumulator that collects records, the NetworkClient, the Sender, and so on:

private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
    try {
        Map<String, Object> userProvidedConfigs = config.originals();
        this.producerConfig = config;
        this.time = Time.SYSTEM;
        String clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
        if (clientId.length() <= 0)
            clientId = "producer-" + PRODUCER_CLIENT_ID_SEQUENCE.getAndIncrement();
        this.clientId = clientId;
        Map<String, String> metricTags = Collections.singletonMap("client-id", clientId);
        MetricConfig metricConfig = new MetricConfig().samples(config.getInt(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG))
                .timeWindow(config.getLong(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG), TimeUnit.MILLISECONDS)
                .recordLevel(Sensor.RecordingLevel.forName(config.getString(ProducerConfig.METRICS_RECORDING_LEVEL_CONFIG)))
                .tags(metricTags);
        List<MetricsReporter> reporters = config.getConfiguredInstances(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
                MetricsReporter.class);
        reporters.add(new JmxReporter(JMX_PREFIX));
        this.metrics = new Metrics(metricConfig, reporters, time);
        ProducerMetrics metricsRegistry = new ProducerMetrics(this.metrics);
        // instantiate the Partitioner via reflection (a custom Partitioner can be configured)
        this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
        long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);

        // instantiate keySerializer and valueSerializer via reflection (custom Serializers can be configured)
        if (keySerializer == null) {
            // instantiate the keySerializer; ensureExtended adapts a plain (possibly custom) Serializer
            this.keySerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                                                                                          Serializer.class));
            // configure the keySerializer
            this.keySerializer.configure(config.originals(), true);
        } else {
            config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
            this.keySerializer = ensureExtended(keySerializer);
        }
        if (valueSerializer == null) {
            this.valueSerializer = ensureExtended(config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                                                                                       Serializer.class));
            this.valueSerializer.configure(config.originals(), false);
        } else {
            config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
            this.valueSerializer = ensureExtended(valueSerializer);
        }

        // load interceptors and make sure they get clientId
        // interceptor configuration
        userProvidedConfigs.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
        List<ProducerInterceptor<K, V>> interceptorList = (List) (new ProducerConfig(userProvidedConfigs, false)).getConfiguredInstances(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
                ProducerInterceptor.class);
        this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors<>(interceptorList);
        // cluster metadata, bootstrapped with an initial update
        ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keySerializer, valueSerializer, interceptorList, reporters);
        this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG),
                true, true, clusterResourceListeners);
        List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
        this.metadata.update(Cluster.bootstrap(addresses), Collections.<String>emptySet(), time.milliseconds());

        ...

        // create the RecordAccumulator that collects records for sending
        this.accumulator = new RecordAccumulator(logContext,
                config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
                this.totalMemorySize,
                this.compressionType,
                config.getLong(ProducerConfig.LINGER_MS_CONFIG),
                retryBackoffMs,
                metrics,
                time,
                apiVersions,
                transactionManager);

        // create the NetworkClient, used for network I/O
        NetworkClient client = new NetworkClient(
                new Selector(config.getLong(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG),
                        this.metrics, time, "producer", channelBuilder, logContext),
                this.metadata,
                clientId,
                maxInflightRequests,
                config.getLong(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG),
                config.getLong(ProducerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG),
                config.getInt(ProducerConfig.SEND_BUFFER_CONFIG),
                config.getInt(ProducerConfig.RECEIVE_BUFFER_CONFIG),
                this.requestTimeoutMs,
                time,
                true,
                apiVersions,
                throttleTimeSensor,
                logContext);
        
        // the send task and the I/O thread that runs it
        this.sender = new Sender(logContext,
                client,
                this.metadata,
                this.accumulator,
                maxInflightRequests == 1,
                config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
                acks,
                retries,
                metricsRegistry.senderMetrics,
                Time.SYSTEM,
                this.requestTimeoutMs,
                config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG),
                this.transactionManager,
                apiVersions);
        String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
        this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
        this.ioThread.start();
        
    } catch (Throwable t) {
        throw new KafkaException("Failed to construct kafka producer", t);
    }
}
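
Because the Partitioner is looked up reflectively, a custom implementation can be plugged in through configuration. A minimal sketch (the class and its routing rule are our own illustration, loosely modeled on the default murmur2 hashing):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// hypothetical partitioner: hash keyed records, send keyless records to partition 0
public class KeyHashPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null)
            return 0;
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() { }
}

It would then be registered with props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, KeyHashPartitioner.class.getName()).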

II. KafkaProducer message send flow

Sending a message through KafkaProducer proceeds in the following steps:

  1. Initialization: the KafkaProducer loads the default configuration plus the user-supplied parameters and starts the network I/O thread;
  2. The interceptor logic runs to pre-process the record (skipped when no interceptor is configured; a sketch of a custom interceptor follows this list);
  3. The cluster metadata is fetched;
  4. Serializer.serialize() is called to serialize the record's key and value;
  5. partition() is called to choose a partition for the record according to the partitioning strategy;
  6. The record is buffered in the RecordAccumulator;
  7. The Sender thread is woken up and groups the pending batches by broker (broker id => batch list);
  8. Network connections are established to the relevant brokers, and each broker's pending batch list is sent out.
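
For step 2, an interceptor is any class implementing ProducerInterceptor that is registered under interceptor.classes. A hypothetical sketch that tags each value before it is serialized:

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// hypothetical interceptor: prefixes every value on the send path
public class PrefixInterceptor implements ProducerInterceptor<String, String> {

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // runs on the calling thread, before serialization and partitioning
        return new ProducerRecord<>(record.topic(), record.partition(), record.timestamp(),
                record.key(), "intercepted-" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // runs when the broker acknowledges the record or the send fails
    }

    @Override
    public void close() { }
}

It is enabled with props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, PrefixInterceptor.class.getName()).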

III. Common classes used by KafkaProducer

Several classes appear repeatedly in the KafkaProducer send path; understanding them makes the source code easier to follow.

1. Cluster

Cluster holds the node, topic, and partition information of the Kafka cluster, such as how many topics and brokers the cluster currently has:

  • mapping between broker.id and Node
  • mapping between topic and its partitions
  • mapping between Node and the partitions it hosts
/**
 * A representation of a subset of the nodes, topics, and partitions in the Kafka cluster.
 */
public final class Cluster {

    private final boolean isBootstrapConfigured;
    // nodes in the cluster
    private final List<Node> nodes;
    // topics the client is not authorized to access
    private final Set<String> unauthorizedTopics;
    // built-in (internal) topics
    private final Set<String> internalTopics;
    // the controller node
    private final Node controller;
    // detailed information per partition
    private final Map<TopicPartition, PartitionInfo> partitionsByTopicPartition;
    // mapping from topic to its partitions
    private final Map<String, List<PartitionInfo>> partitionsByTopic;
    // mapping from topic to its currently available partitions
    private final Map<String, List<PartitionInfo>> availablePartitionsByTopic;
    // mapping from node id to the partitions hosted on that node
    private final Map<Integer, List<PartitionInfo>> partitionsByNode;
    // mapping from node id to Node
    private final Map<Integer, Node> nodesById;
    private final ClusterResource clusterResource;
}
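
Client code never builds a Cluster directly, but KafkaProducer.partitionsFor() exposes the PartitionInfo entries cached in it. A small sketch (topic name and bootstrap address are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.PartitionInfo;

public class ClusterInfoDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // each PartitionInfo comes from the Cluster's partitionsByTopic map
            for (PartitionInfo info : producer.partitionsFor("my-topic")) {
                System.out.printf("partition=%d leader=%s replicas=%d%n",
                        info.partition(), info.leader(), info.replicas().length);
            }
        }
    }
}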

2. Metadata

Metadata holds a subset of topic-related data and is shared by all client threads and the background Sender thread. When the metadata for a requested topic is missing, a metadata update is triggered.
The main data structure inside Metadata is a Cluster object plus bookkeeping fields; Cluster records the topic and cluster information. The fields are as follows.

public final class Metadata {

    private static final Logger log = LoggerFactory.getLogger(Metadata.class);

    public static final long TOPIC_EXPIRY_MS = 5 * 60 * 1000;
    private static final long TOPIC_EXPIRY_NEEDS_UPDATE = -1L;
    // minimum interval before metadata is refreshed again after a failed update, to avoid frequent retries
    private final long refreshBackoffMs;
    // metadata expiry time
    private final long metadataExpireMs;
    // metadata version, incremented on every update; used to tell whether metadata has changed
    private int version;
    // time of the most recent update attempt (including failures)
    private long lastRefreshMs;
    // time of the most recent successful update
    private long lastSuccessfulRefreshMs;
    // SASL authentication error; when set, updating stops
    private AuthenticationException authenticationException;
    // topic and cluster information
    private Cluster cluster;
    // whether a metadata update is needed
    private boolean needUpdate;
    /* Topics with expiry time */
    private final Map<String, Long> topics;
    // listeners notified when metadata is updated
    private final List<Listener> listeners;
    // listeners notified of cluster resource changes on metadata updates
    private final ClusterResourceListeners clusterResourceListeners;
    // whether to force-fetch metadata for all topics rather than only tracked ones
    private boolean needMetadataForAllTopics;
    // if true, the broker auto-creates a topic found missing during a metadata update
    private final boolean allowAutoTopicCreation;
    // defaults to true for producers, which periodically expire unused topics; consumers set it to false
    private final boolean topicExpiryEnabled;
}
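
The update is driven by a version/needUpdate handshake: a caller thread marks the metadata stale and blocks, while the sender thread performs the actual refresh. A simplified paraphrase of the pattern used by KafkaProducer.waitOnMetadata() (retry loop and error handling omitted):

import org.apache.kafka.clients.Metadata;
import org.apache.kafka.clients.producer.internals.Sender;

// simplified paraphrase, not the full implementation
final class MetadataWait {
    static void awaitFreshMetadata(Metadata metadata, Sender sender, long maxBlockTimeMs)
            throws InterruptedException {
        int version = metadata.requestUpdate();        // set needUpdate and note the current version
        sender.wakeup();                               // wake the sender's NetworkClient poll loop
        metadata.awaitUpdate(version, maxBlockTimeMs); // block until the version advances or time runs out
    }
}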

3. Metrics

Metrics is the module that records measurement statistics. A Sensor aggregates, over a time window, the values of the several Metric objects associated with it, such as average, maximum, and minimum.

public class Metrics implements Closeable {

    private final MetricConfig config;
    private final ConcurrentMap<MetricName, KafkaMetric> metrics;
    private final ConcurrentMap<String, Sensor> sensors;
    private final ConcurrentMap<Sensor, List<Sensor>> childrenSensors;
    private final List<MetricsReporter> reporters;
    private final Time time;
    ...
}  
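
Metrics and Sensor can be exercised standalone, which makes their relationship easy to see. A minimal sketch (sensor and metric names are illustrative):

import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;

public class MetricsDemo {
    public static void main(String[] args) {
        // a standalone registry, built from the same classes KafkaProducer wires up internally
        try (Metrics metrics = new Metrics()) {
            Sensor sizes = metrics.sensor("record-sizes");
            sizes.add(metrics.metricName("size-avg", "demo"), new Avg());
            sizes.add(metrics.metricName("size-max", "demo"), new Max());

            sizes.record(100);   // one Sensor.record() feeds all associated metrics
            sizes.record(300);

            for (KafkaMetric metric : metrics.metrics().values())
                System.out.println(metric.metricName() + " = " + metric.value());
        }
    }
}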

4. RecordAccumulator

RecordAccumulator can be thought of as a bounded, queue-like buffer that holds records in memory until they are sent.

public final class RecordAccumulator {

    private final Logger log;
    private volatile boolean closed;
    private final AtomicInteger flushesInProgress;
    private final AtomicInteger appendsInProgress;
    // target size of each batch (batch.size)
    private final int batchSize;
    private final CompressionType compression;
    // how long a non-full batch may wait before becoming sendable (linger.ms)
    private final long lingerMs;
    private final long retryBackoffMs;
    // BufferPool that manages the buffer.memory byte budget
    private final BufferPool free;
    private final Time time;
    private final ApiVersions apiVersions;
    // the core structure: one Deque of batches per partition
    private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;
    // batches that have been created but not yet acknowledged or failed
    private final IncompleteBatches incomplete;
    // The following variables are only accessed by the sender thread, so we don't need to protect them.
    private final Set<TopicPartition> muted;
    private int drainIndex;
    private final TransactionManager transactionManager;
}
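
To make the batches structure concrete, here is a toy illustration (our own sketch, not the real implementation) of the append path: one deque per partition, appending to the last batch and starting a new batch when it is full:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// toy stand-in: String keys instead of TopicPartition, StringBuilder instead of ProducerBatch
final class ToyAccumulator {
    private final ConcurrentMap<String, Deque<StringBuilder>> batches = new ConcurrentHashMap<>();
    private final int batchSize;

    ToyAccumulator(int batchSize) { this.batchSize = batchSize; }

    void append(String partition, String record) {
        Deque<StringBuilder> dq = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        synchronized (dq) { // the real accumulator also synchronizes on the per-partition deque
            StringBuilder last = dq.peekLast();
            if (last == null || last.length() + record.length() > batchSize) {
                last = new StringBuilder(); // stands in for allocating a new ProducerBatch
                dq.addLast(last);
            }
            last.append(record);
        }
    }
}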

5. Sender

Sender encapsulates the message-sending logic. It implements Runnable and runs on the ioThread: each loop iteration drains ready batches from the RecordAccumulator, turns them into produce requests, and pushes them through the NetworkClient.
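
A heavily simplified paraphrase of the Sender loop, with internal types and error handling stripped away (a reading aid rather than runnable code):

// condensed paraphrase of Sender.run() / Sender.run(long now) in 1.0.0
public void run() {
    while (running) {
        long now = time.milliseconds();
        // 1. which nodes have sendable batches (full, or linger.ms expired)?
        RecordAccumulator.ReadyCheckResult ready = accumulator.ready(metadata.fetch(), now);
        // 2. drain up to max.request.size bytes of batches for each ready node
        Map<Integer, List<ProducerBatch>> batches =
                accumulator.drain(metadata.fetch(), ready.readyNodes, maxRequestSize, now);
        // 3. turn each node's batch list into a ProduceRequest and queue it
        sendProduceRequests(batches, now);
        // 4. perform the actual network I/O and invoke completion callbacks
        client.poll(pollTimeout, now);
    }
}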
