Kafka Producer Study Notes 2: Cluster Metadata Updates

1 Introduction

   The previous post walked through the producer's core method doSend(). Apart from running the interceptors, the first step of a send is to fetch the cluster metadata:

       clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);

       Since doSend() itself is a long flow, this post focuses on Kafka's cluster metadata: what it is for, which fields it holds, its core methods, and how the update flow is implemented. The book《Apache Kafka源码剖析》covers this material in sections 2.2.2 (Metadata) and 2.4.4 (MetadataUpdater).

2 Role and Fields of the Metadata

      As mentioned in the earlier post on Kafka fundamentals, every topic has multiple partitions, and the partition replicas can be spread across different brokers; from the producer's point of view, both the number of partitions and the placement of replicas change dynamically. Moreover, when sending a message the producer usually specifies only the topic name, not an explicit partition number. To make this work, the producer needs to know the address and port of the broker hosting the leader replica of the target partition, so that it can establish a connection and deliver the message to Kafka. For this reason the Kafka producer maintains cluster metadata, which covers every partition of every topic: leader, leader_epoch, controller_epoch, isr, replicas, and so on.
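This is also why the partitioner depends on the metadata: given only a topic name, choosing a partition requires the partition count from the current Cluster. Below is a minimal sketch of the keyed-record case (my own reduction, class name hypothetical; the real DefaultPartitioner additionally spreads keyless records round-robin over the available partitions):

import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class KeyHashPartitionerSketch {
    // hash the key and map it onto the partition count found in the metadata
    static int partitionFor(byte[] keyBytes, String topic, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}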

     The related classes:

   Node (org.apache.kafka.common): represents a single node in the cluster. Its fields:

/**
 * Information about a Kafka node (a cluster node)
 */
public class Node {
    private final int id;
    private final String idString;
    private final String host;
    private final int port;
    private final String rack; // rack information

TopicPartition: represents a single partition of a topic.

/**
 * A topic name and partition number
 */
public final class TopicPartition implements Serializable {
    private static final long serialVersionUID = -613627415771699627L;

    private int hash = 0;
    private final int partition; // partition number within the topic
    private final String topic;  // topic name

 PartitionInfo: the detailed information about one partition.

public class PartitionInfo {
    private final String topic;
    private final int partition;
    // node hosting the leader replica
    private final Node leader;
    // nodes hosting all replicas
    private final Node[] replicas;
    // nodes hosting the replicas in the ISR set
    private final Node[] inSyncReplicas;
    // nodes hosting the offline replicas
    private final Node[] offlineReplicas;

Combined, these three classes fully describe the cluster metadata a Kafka producer needs; the result is held in org.apache.kafka.common.Cluster.

/**
 * A representation of a subset of the nodes, topics, and partitions in the Kafka cluster.
 * All fields are private final and only query methods are exposed, so the object is
 * immutable and therefore thread-safe.
 */
public final class Cluster {

    private final boolean isBootstrapConfigured;
    private final List<Node> nodes;
    private final Set<String> unauthorizedTopics;
    private final Set<String> invalidTopics;
    private final Set<String> internalTopics;
    private final Node controller;
    // TopicPartition -> PartitionInfo
    private final Map<TopicPartition, PartitionInfo> partitionsByTopicPartition;
    // topic name -> list of PartitionInfo
    private final Map<String, List<PartitionInfo>> partitionsByTopic;
    // topic name -> list of PartitionInfo, restricted to partitions that have a leader
    private final Map<String, List<PartitionInfo>> availablePartitionsByTopic;
    // node id -> list of PartitionInfo hosted on that node
    private final Map<Integer, List<PartitionInfo>> partitionsByNode;
    private final Map<Integer, Node> nodesById;
    private final ClusterResource clusterResource;

    Cluster mainly exposes a variety of query methods for looking up cluster metadata. Note that it is immutable and therefore thread-safe.
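To make that query surface concrete, here is a small self-contained sketch (my own demo class) that assembles a Cluster by hand and queries it. It assumes kafka-clients 2.x on the classpath; the Cluster and PartitionInfo constructor signatures vary slightly between versions:

import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

import java.util.Arrays;
import java.util.Collections;

public class ClusterQueryDemo {
    public static void main(String[] args) {
        Node n0 = new Node(0, "broker0", 9092);
        Node n1 = new Node(1, "broker1", 9092);
        // topic "demo" with two partitions, leaders spread across the two nodes
        PartitionInfo p0 = new PartitionInfo("demo", 0, n0, new Node[]{n0, n1}, new Node[]{n0, n1});
        PartitionInfo p1 = new PartitionInfo("demo", 1, n1, new Node[]{n1, n0}, new Node[]{n1});
        Cluster cluster = new Cluster("demo-cluster",
                Arrays.asList(n0, n1),
                Arrays.asList(p0, p1),
                Collections.emptySet(),   // unauthorizedTopics
                Collections.emptySet());  // internalTopics
        // read-only queries; Cluster itself is immutable, hence thread-safe
        System.out.println(cluster.partitionCountForTopic("demo"));           // 2
        System.out.println(cluster.leaderFor(new TopicPartition("demo", 1))); // n1
        System.out.println(cluster.availablePartitionsForTopic("demo"));
    }
}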

   Metadata wraps the Cluster object and a set of listeners, and also records when the cluster data was last updated, its version number, whether an update is needed, and similar bookkeeping.

Its fields:

public class Metadata implements Closeable {

    private static final Logger log = LoggerFactory.getLogger(Metadata.class);

    public static final long TOPIC_EXPIRY_MS = 5 * 60 * 1000;
    private static final long TOPIC_EXPIRY_NEEDS_UPDATE = -1L;
    // minimum interval between two metadata updates (avoids refreshing too frequently)
    private final long refreshBackoffMs;
    // refresh interval; defaults to 5 minutes
    private final long metadataExpireMs;
    // version number, incremented on every update; used to tell whether an update has completed
    private int version;
    // timestamp of the last update attempt (including failed ones)
    private long lastRefreshMs;
    // timestamp of the last successful update
    private long lastSuccessfulRefreshMs;
    // authentication failure, if any
    private AuthenticationException authenticationException;
    // the cluster metadata itself
    private Cluster cluster;
    // flag marking that a forced update of the cluster is needed
    private boolean needUpdate;
    /* Topics with expiry time */
    private final Map<String, Long> topics;
    // listeners notified on metadata updates
    private final List<Listener> listeners;
    // ClusterResourceListeners invoked when a metadata update is received
    private final ClusterResourceListeners clusterResourceListeners;
    // whether metadata for all topics should be fetched
    private boolean needMetadataForAllTopics;
    // whether to auto-create a topic when it does not exist
    private final boolean allowAutoTopicCreation;
    // defaults to true; the producer periodically removes expired topics
    private final boolean topicExpiryEnabled;
    // whether this Metadata instance has been closed
    private boolean isClosed;

Now look at the methods on Metadata that the main (user) thread calls: requestUpdate() and awaitUpdate().

    /**
     * Request an update of the current cluster metadata info, return the current version before the update
     */
    public synchronized int requestUpdate() {
        // true: mark that a forced cluster update is needed
        this.needUpdate = true;
        // return the current version of the cluster metadata
        return this.version;
    }

    /**
     * Wait for metadata update until the current version is larger than the last version we know of
     */
    public synchronized void awaitUpdate(final int lastVersion, final long maxWaitMs) throws InterruptedException {
        if (maxWaitMs < 0)
            throw new IllegalArgumentException("Max time to wait for metadata updates should not be < 0 milliseconds");

        long begin = System.currentTimeMillis();
        long remainingWaitMs = maxWaitMs;
        // compare versions: loop until the update succeeds (version is bumped) and Metadata has not been closed
        while ((this.version <= lastVersion) && !isClosed()) {
            // surface (and clear) any authentication error raised during the update
            AuthenticationException ex = getAndClearAuthenticationException();
            if (ex != null)
                throw ex;
            if (remainingWaitMs != 0)
                // as the wait() shows, the main thread synchronizes with the sender thread
                // via wait/notify; the actual metadata update is done by the sender thread
                wait(remainingWaitMs);
            long elapsed = System.currentTimeMillis() - begin;
            if (elapsed >= maxWaitMs)//timeout
                throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
            remainingWaitMs = maxWaitMs - elapsed;
        }
        if (isClosed())
            throw new KafkaException("Requested metadata update after close");
    }

 requestUpdate() just sets the needUpdate flag to force a refresh; the metadata is then updated the next time the sender thread runs.

awaitUpdate() keeps the data consistent by comparing version numbers, much like optimistic locking: the main thread stays blocked until the Sender thread has successfully updated the metadata. Note that Metadata's fields are read by the main thread and written by the sender thread; the two coordinate through the wait/notify mechanism, and together with the synchronized modifier this keeps Metadata thread-safe.
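The handshake is easier to see in isolation. Below is a stripped-down toy model of the versioned wait/notify pattern (my own sketch, not the real Metadata class): the caller records the version, a second thread bumps it and calls notifyAll(), and the caller loops in wait() until the version moves past the recorded one:

public class VersionedHandshake {
    private int version = 0;
    private boolean needUpdate = false;

    public synchronized int requestUpdate() {
        needUpdate = true;
        return version;                  // caller remembers this version
    }

    public synchronized void awaitUpdate(int lastVersion, long maxWaitMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (version <= lastVersion) { // loop guards against spurious wakeups
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                throw new RuntimeException("timeout");
            wait(remaining);
        }
    }

    public synchronized void update() {  // played here by the "sender" thread
        needUpdate = false;
        version += 1;
        notifyAll();                     // wake every thread blocked in awaitUpdate
    }

    public static void main(String[] args) throws Exception {
        VersionedHandshake h = new VersionedHandshake();
        int v = h.requestUpdate();
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            h.update();
        }).start();
        h.awaitUpdate(v, 1000);          // returns once update() has run
        System.out.println("updated");
    }
}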

   Here we are only looking at the producer in the clients module. The brokers in the core module also maintain a MetadataCache, fetched and updated through KafkaApis; that code is not covered in this post and is left for a later write-up.

3 Update Request Flow

     Let's return to the doSend() call from the beginning of this post. The source is in KafkaProducer.

3.1 Sending the Request

    /**
     * Wait for cluster metadata including partitions for the given topic to be available.
     * @param topic The topic we want metadata for
     * @param partition A specific partition expected to exist in metadata, or null if there's no preference
     * @param maxWaitMs The maximum time in ms for waiting on the metadata
     * @return The cluster containing topic metadata and the amount of time we waited in ms
     * @throws KafkaException for all Kafka-related exceptions, including the case where this method is called after producer close
     */
    private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long maxWaitMs) throws InterruptedException {
        // add topic to metadata topic list if it is not there already and reset expiry
        Cluster cluster = metadata.fetch();
        // throw if the topic is known to be invalid
        if (cluster.invalidTopics().contains(topic))
            throw new InvalidTopicException(topic);
        // add the topic to the metadata's topic set if it is not there yet
        metadata.add(topic);
        // partition count for this topic, from the cached cluster
        Integer partitionsCount = cluster.partitionCountForTopic(topic);
        // Return cached metadata if we have it, and if the record's partition is either undefined
        // or within the known partition range (in that case return a ClusterAndWaitTime immediately)
        if (partitionsCount != null && (partition == null || partition < partitionsCount))
            return new ClusterAndWaitTime(cluster, 0);

        long begin = time.milliseconds();
        // remaining wait budget
        long remainingWaitMs = maxWaitMs;
        long elapsed;
        // Issue metadata requests until we have metadata for the topic or maxWaitTimeMs is exceeded.
        // In case we already have cached metadata for the topic, but the requested partition is greater
        // than expected, issue an update request only once. This is necessary in case the metadata
        // is stale and the number of partitions for this topic has increased in the meantime.
        do {
            log.trace("Requesting metadata update for topic {}.", topic);
            metadata.add(topic);
            // request a metadata update; returns the current version, i.e. the one before the update
            int version = metadata.requestUpdate();
            // wake up the sender thread
            sender.wakeup();
            try { // wait until the metadata version grows past the one we recorded
                metadata.awaitUpdate(version, remainingWaitMs);
            } catch (TimeoutException ex) {
                // Rethrow with original maxWaitMs to prevent logging exception with remainingWaitMs
                throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
            }
            // fetch the cluster again after the update
            cluster = metadata.fetch();
            // elapsed time so far
            elapsed = time.milliseconds() - begin;
            // past the maximum wait time: give up and report the failed update
            if (elapsed >= maxWaitMs)
                throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
            // throw if the cluster marks this topic as unauthorized
            if (cluster.unauthorizedTopics().contains(topic))
                throw new TopicAuthorizationException(topic);
            // throw if the cluster marks this topic as invalid
            if (cluster.invalidTopics().contains(topic))
                throw new InvalidTopicException(topic);
            remainingWaitMs = maxWaitMs - elapsed;
            // re-read the topic's partition count
            partitionsCount = cluster.partitionCountForTopic(topic);
        } while (partitionsCount == null); // loop until the partition count is known

        if (partition != null && partition >= partitionsCount) {
            throw new KafkaException(
                    String.format("Invalid partition given with record: %d is not in the range [0...%d).", partition, partitionsCount));
        }
        // return the cluster plus how long we waited
        return new ClusterAndWaitTime(cluster, elapsed);
    }

The main steps:

1) Check whether the metadata already contains the target topic; if not, add the topic to the topics set, so that the next update fetches its metadata from the server.

2) Try to read the topic's partition details from the cache; on failure, request a metadata update. As long as the metadata has not been refreshed, the method keeps spinning in the do ... while loop, and it throws a TimeoutException on timeout.

Inside the loop:

  • metadata.requestUpdate() sets the metadata's needUpdate flag to true (forcing an update) and returns the current version; the version number is how we tell whether the update has completed;
  • sender.wakeup() wakes up the sender thread, which in turn drives the NetworkClient to refresh the stored metadata;
  • metadata.awaitUpdate(version, remainingWaitMs) waits for the sender thread to finish the update.

Inside Metadata.awaitUpdate(), the calling thread blocks on wait() in the while loop until the metadata update succeeds or the call times out.
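Seen from the application, this entire wait is bounded by the producer's max.block.ms setting, which is where maxWaitMs comes from. A minimal usage sketch (broker address and topic name are placeholders): if the topic's metadata cannot be fetched within the limit, the send fails with a TimeoutException, surfaced through the returned Future or the callback:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class MaxBlockDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker0:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // waitOnMetadata() may block for at most this long
        props.put("max.block.ms", "5000");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}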

3.2 Updating the Metadata

So how does the metadata actually get updated? As mentioned above, sender.wakeup() wakes up the sender thread, which indirectly drives the NetworkClient. Looking at the Sender source, its run() method calls KafkaClient.poll(), and the concrete implementation of KafkaClient is NetworkClient. The source:

    public List<ClientResponse> poll(long timeout, long now) {
        if (!abortedSends.isEmpty()) {
            // If there are aborted sends because of unsupported version exceptions or disconnects,
            // handle them immediately without waiting for Selector#poll.
            List<ClientResponse> responses = new ArrayList<>();
            handleAbortedSends(responses);
            completeResponses(responses);
            return responses;
        }
        // decide whether the metadata needs updating (and, if so, queue a MetadataRequest)
        long metadataTimeout = metadataUpdater.maybeUpdate(now);
        try { // do the actual network I/O via the selector's poll
            this.selector.poll(Utils.min(timeout, metadataTimeout, defaultRequestTimeoutMs));
        } catch (IOException e) {
            log.error("Unexpected error during I/O", e);
        }

        // process completed actions
        long updatedNow = this.time.milliseconds();
        List<ClientResponse> responses = new ArrayList<>();
        handleCompletedSends(responses, updatedNow);
        // handle any completed receives (this is where a MetadataResponse lands)
        handleCompletedReceives(responses, updatedNow);
        handleDisconnections(responses, updatedNow);
        handleConnections();
        handleInitiateApiVersionRequests(updatedNow);
        handleTimedOutRequests(responses, updatedNow);
        //invoke callback
        completeResponses(responses);

        return responses;
    }

The key steps in this method:

  • metadataUpdater.maybeUpdate(now): decide whether the metadata needs updating; if so, first establish a connection to a broker, then send the metadata update request;
  • handle the server's responses; the one to watch here is handleCompletedReceives(responses, updatedNow), which processes the Metadata result returned by the server.

3.3 MetadataUpdater

   Before getting into metadataUpdater.maybeUpdate() itself, a word on what MetadataUpdater is.

/**
 * The interface used by `NetworkClient` to request cluster metadata info to be updated and to retrieve the cluster nodes
 * from such metadata. This is an internal class.
 * <p> Called by NetworkClient; implementations: DefaultMetadataUpdater (an inner class of NetworkClient) and ManualMetadataUpdater (a no-op).
 * This class is not thread-safe!
 */
public interface MetadataUpdater extends Closeable {

    List<Node> fetchNodes();
  
    boolean isUpdateDue(long now);

    long maybeUpdate(long now);
...
}

MetadataUpdater is an interface that helps NetworkClient update the metadata. It has two implementations: ManualMetadataUpdater (essentially a no-op) and DefaultMetadataUpdater (the default, an inner class of NetworkClient).

    class DefaultMetadataUpdater implements MetadataUpdater {

        /* the current cluster metadata */
        private final Metadata metadata;

        /* true iff there is a metadata request that has been sent and for which we have not yet received a response */
        private boolean metadataFetchInProgress;

Now the maybeUpdate() method:

        @Override
        // core method: decide whether the current metadata needs an update
        public long maybeUpdate(long now) {
            // should we update our metadata?
            // time until the next metadata update: 0 for a forced update,
            // otherwise derived from the metadata expiry time
            long timeToNextMetadataUpdate = metadata.timeToNextUpdate(now);
            // if a metadata request is already in flight, wait defaultRequestTimeoutMs (30s by default)
            long waitForMetadataFetch = this.metadataFetchInProgress ? defaultRequestTimeoutMs : 0;
            // how long until we may send the next metadata request
            long metadataTimeout = Math.max(timeToNextMetadataUpdate, waitForMetadataFetch);

            if (metadataTimeout > 0) { // not time yet; return when the next update is due
                return metadataTimeout;
            }

            // Beware that the behavior of this method and the computation of timeouts for poll() are
            // highly dependent on the behavior of leastLoadedNode.
            // pick the least-loaded node; returns null if no node is available
            Node node = leastLoadedNode(now);
            if (node == null) {
                log.debug("Give up sending metadata request since no node is available");
                return reconnectBackoffMs;
            }
            // build and queue the MetadataRequest; it is actually sent by a later poll()
            return maybeUpdate(now, node);
        }
        /**
         * Add a metadata request to the list of sends if we can make one
         */
        private long maybeUpdate(long now, Node node) {
            String nodeConnectionId = node.idString();
            // check whether we may currently send a request to this node
            if (canSendRequest(nodeConnectionId, now)) {
                // about to send: mark the metadata fetch as in progress
                this.metadataFetchInProgress = true;
                MetadataRequest.Builder metadataRequest; // build the metadata request
                if (metadata.needMetadataForAllTopics()) // fetch metadata for all topics
                    metadataRequest = MetadataRequest.Builder.allTopics();
                else // only fetch the topics tracked in metadata (the list is populated via metadata.add())
                    metadataRequest = new MetadataRequest.Builder(new ArrayList<>(metadata.topics()),
                            metadata.allowAutoTopicCreation());


                log.debug("Sending metadata request {} to node {}", metadataRequest, node);
                // send the MetadataRequest
                sendInternalMetadataRequest(metadataRequest, nodeConnectionId, now);
                return defaultRequestTimeoutMs;
            }

            // If there's any connection establishment underway, wait until it completes. This prevents
            // the client from unnecessarily connecting to additional nodes while a previous connection
            // attempt has not been completed.
            if (isAnyNodeConnecting()) { // if a connection to any node is underway, wait for it to complete
                // Strictly the timeout we should return here is "connect timeout", but as we don't
                // have such application level configuration, using reconnect backoff instead.
                return reconnectBackoffMs;
            }
            // no connection to this node yet, so initiate one
            if (connectionStates.canConnect(nodeConnectionId, now)) {
                // we don't have a connection to this node right now, make one
                log.debug("Initialize connection to node {} for sending metadata request", node);
                initiateConnect(node, now);
                return reconnectBackoffMs;
            }

            // connected, but can't send more OR connecting
            // In either case, we just need to wait for a network event to let us know the selected
            // connection might be usable again.
            return Long.MAX_VALUE;
        }

        If an update is needed, a MetadataRequest is sent; before sending, metadataFetchInProgress is set to true. The request goes to the node with the smallest load, where load is judged by the number of unacknowledged requests in the node's InFlightRequests queue: the more unacknowledged requests, the higher the load. From then on the request is handled like any ordinary request.
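The load comparison itself is straightforward. Here is a simplified sketch of the idea behind leastLoadedNode (my own reduction, class name hypothetical; the real implementation also weighs connection state and reconnect backoff when picking candidates):

import org.apache.kafka.common.Node;

import java.util.List;
import java.util.Map;

public class LeastLoadedSketch {
    // fewest unacknowledged in-flight requests wins; null when there is no candidate
    static Node leastLoaded(List<Node> nodes, Map<Integer, Integer> inFlightCounts) {
        Node best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Node n : nodes) {
            int load = inFlightCounts.getOrDefault(n.id(), 0);
            if (load < bestLoad) {
                bestLoad = load;
                best = n;
            }
        }
        return best;
    }
}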

      Note the cases the code distinguishes when deciding what to do:

  1. If the node can accept a request, send the request right away;
  2. if the node is still establishing a connection, simply return and wait;
  3. if there is no connection to the node yet, initiate one to the broker.

As a result:

  • on the sender thread's first poll() call, the connection to the node is initiated;
  • on the second poll() call, the Metadata request is sent;
  • on the third poll() call, the MetadataResponse is received and the metadata is updated.

Only after these three poll() calls by the sender thread is the requested metadata refreshed; at that point the producer thread unblocks and starts sending messages.

After NetworkClient receives the MetadataResponse, it first calls handleCompletedReceives():

    private void handleCompletedReceives(List<ClientResponse> responses, long now) {
        for (NetworkReceive receive : this.selector.completedReceives()) { // iterate over completed receives
            String source = receive.source();
            // match the receive with its in-flight request
            InFlightRequest req = inFlightRequests.completeNext(source);
            // parse the response payload
            Struct responseStruct = parseStructMaybeUpdateThrottleTimeMetrics(receive.payload(), req.header,
                throttleTimeSensor, now);
            if (log.isTraceEnabled()) {
                log.trace("Completed receive from node {} for {} with correlation id {}, received {}", req.destination,
                    req.header.apiKey(), req.header.correlationId(), responseStruct);
            }
            // If the received response includes a throttle delay, throttle the connection.
            AbstractResponse body = AbstractResponse.parseResponse(req.header.apiKey(), responseStruct);
            maybeThrottle(body, req.header.apiVersion(), req.destination, now);
            // is this an internal MetadataResponse?
            if (req.isInternalRequest && body instanceof MetadataResponse)
                metadataUpdater.handleCompletedMetadataResponse(req.header, now, (MetadataResponse) body);
            else if (req.isInternalRequest && body instanceof ApiVersionsResponse) // ApiVersionsResponse
                handleApiVersionsResponse(responses, req, now, (ApiVersionsResponse) body);
            else // any other response
                responses.add(req.completed(body, now));
        }
    }

        @Override
        // handle the server's response to our MetadataRequest
        public void handleCompletedMetadataResponse(RequestHeader requestHeader, long now, MetadataResponse response) {
            this.metadataFetchInProgress = false; // the fetch is no longer in progress
            // build the Cluster object from the response
            Cluster cluster = response.cluster();

            // If any partition has leader with missing listeners, log a few for diagnosing broker configuration
            // issues. This could be a transient issue if listeners were added dynamically to brokers.
            List<TopicPartition> missingListenerPartitions = response.topicMetadata().stream().flatMap(topicMetadata ->
                topicMetadata.partitionMetadata().stream()
                    .filter(partitionMetadata -> partitionMetadata.error() == Errors.LISTENER_NOT_FOUND)
                    .map(partitionMetadata -> new TopicPartition(topicMetadata.topic(), partitionMetadata.partition())))
                .collect(Collectors.toList());
            if (!missingListenerPartitions.isEmpty()) {
                int count = missingListenerPartitions.size();
                log.warn("{} partitions have leader brokers without a matching listener, including {}",
                        count, missingListenerPartitions.subList(0, Math.min(10, count)));
            }

            // check if any topics' metadata failed to get updated
            Map<String, Errors> errors = response.errors();
            if (!errors.isEmpty())
                log.warn("Error while fetching metadata with correlation id {} : {}", requestHeader.correlationId(), errors);

            // don't update the cluster if there are no valid nodes...the topic we want may still be in the process of being
            // created which means we will get errors and no nodes until it exists
            if (cluster.nodes().size() > 0) { // update the metadata
                this.metadata.update(cluster, response.unavailableTopics(), now);
            } else { // failed update: only lastRefreshMs is advanced
                log.trace("Ignoring empty metadata response with correlation id {}.", requestHeader.correlationId());
                this.metadata.failedUpdate(now, null);
            }
        }

This is where the metadata is actually updated:

    /**
     * Updates the cluster metadata. If topic expiry is enabled, expiry time
     * is set for topics if required and expired topics are removed from the metadata.
     *
     * @param newCluster the cluster containing metadata for topics with valid metadata
     * @param unavailableTopics topics which are non-existent or have one or more partitions whose
     *        leader is not known
     * @param now current time in milliseconds
     */
    public synchronized void update(Cluster newCluster, Set<String> unavailableTopics, long now) {
        Objects.requireNonNull(newCluster, "cluster should not be null");
        if (isClosed())
            throw new IllegalStateException("Update requested after metadata close");

        this.needUpdate = false;
        this.lastRefreshMs = now;
        this.lastSuccessfulRefreshMs = now;
        this.version += 1;

        if (topicExpiryEnabled) {
            // Handle expiry of topics from the metadata refresh set.
            for (Iterator<Map.Entry<String, Long>> it = topics.entrySet().iterator(); it.hasNext(); ) {
                Map.Entry<String, Long> entry = it.next();
                long expireMs = entry.getValue();
                if (expireMs == TOPIC_EXPIRY_NEEDS_UPDATE)
                    entry.setValue(now + TOPIC_EXPIRY_MS);
                else if (expireMs <= now) {
                    it.remove();
                    log.debug("Removing unused topic {} from the metadata list, expiryMs {} now {}", entry.getKey(), expireMs, now);
                }
            }
        }

        for (Listener listener: listeners) // notify anyone listening for metadata updates
            listener.onMetadataUpdate(newCluster, unavailableTopics);

        String previousClusterId = cluster.clusterResource().clusterId();

        if (this.needMetadataForAllTopics) {
            // the listener may change the interested topics, which could cause another metadata refresh.
            // If we have already fetched all topics, however, another fetch should be unnecessary.
            this.needUpdate = false;
            this.cluster = getClusterForCurrentTopics(newCluster);
        } else {
            this.cluster = newCluster;
        }

        // The bootstrap cluster is guaranteed not to have any useful information
        if (!newCluster.isBootstrapConfigured()) {
            String newClusterId = newCluster.clusterResource().clusterId();
            if (newClusterId == null ? previousClusterId != null : !newClusterId.equals(previousClusterId))
                log.info("Cluster ID: {}", newClusterId);
            clusterResourceListeners.onUpdate(newCluster.clusterResource());
        }

        notifyAll(); // wake up all producer threads blocked in awaitUpdate()
        log.debug("Updated cluster metadata version {} to {}", this.version, this.cluster);
    }

As noted earlier, the producer thread wait()s for the metadata update; once the update completes here, notifyAll() wakes it up.
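The Listener callback in the loop above is also how other client components react to refreshes. A sketch of registering one (Metadata is an internal client class and the Listener signature varies across kafka-clients versions; the ~2.x shape quoted above is assumed, so treat this as illustration only):

import org.apache.kafka.clients.Metadata;
import org.apache.kafka.common.Cluster;

import java.util.Set;

public class MetadataListenerSketch {
    static void watch(Metadata metadata) {
        metadata.addListener(new Metadata.Listener() {
            @Override
            public void onMetadataUpdate(Cluster cluster, Set<String> unavailableTopics) {
                // invoked inside Metadata.update(), before notifyAll() unblocks producers
                System.out.println("metadata refreshed, nodes: " + cluster.nodes());
            }
        });
    }
}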

4 Metadata Update Strategy

  Metadata is updated in two situations:

  1. A forced update when KafkaProducer sends its first message, and periodic updates from then on, driven by Metadata's lastRefreshMs and lastSuccessfulRefreshMs fields;
  2. a forced update on invalidation: calling Metadata.requestUpdate() sets needUpdate to true to force a refresh.

Every call to NetworkClient's poll() method checks both update mechanisms; if either condition is met, an update is triggered.
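Both mechanisms meet in Metadata.timeToNextUpdate(), which the maybeUpdate() shown earlier consults; in the 2.x sources it is roughly:

    public synchronized long timeToNextUpdate(long nowMs) {
        // 0 if a forced update is pending, otherwise the time left until expiry
        long timeToExpire = needUpdate ? 0 : Math.max(this.lastSuccessfulRefreshMs + this.metadataExpireMs - nowMs, 0);
        // honor the refresh backoff regardless of how the update was triggered
        long timeToAllowUpdate = this.lastRefreshMs + this.refreshBackoffMs - nowMs;
        return Math.max(timeToExpire, timeToAllowUpdate);
    }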

So when is the metadata judged invalid? Quite a few code paths make that call; broadly, anything abnormal triggers a forced update:

  • when initiateConnect() is called to initialize a connection;
  • when poll() calls handleDisconnections() to handle dropped connections, which triggers a forced update;
  • when poll() calls handleTimedOutRequests() to handle timed-out requests;
  • when sending a message, if no leader can be found for the target partition;
  • when handling a produce response (handleProduceResponse), if it carries a metadata-staleness exception, e.g. there is no metadata for the topic-partition, or the client is not authorized to fetch its metadata.

References:

《Apache Kafka源码剖析》, Chapter 2
https://www.jianshu.com/p/bb7c332eac25
