Follow me to learn the detailed explanation of Kafka's Controller controller (1)

Our kafka source code sharing has been carried out for many periods, and the main content is almost shared, so in the future sharing, we will mainly focus on kafka performance optimization and use

One of the Brokers in the Kafka cluster will be elected as the Controller, which is mainly responsible for Partition management and replica state management, and also performs management tasks such as redistributing Partitions. If the current Controller fails, the Controller will be re-elected from other normal brokers.

Enter the KafkaController.scala file and see the following code:

class KafkaController(val config : KafkaConfig, zkClient: ZkClient, val brokerState: BrokerState) extends Logging with KafkaMetricsGroup {
  this.logIdent = "[Controller " + config.brokerId + "]: "
  private var isRunning = true
  private val stateChangeLogger = KafkaController.stateChangeLogger
  val controllerContext = new ControllerContext(zkClient, config.zkSessionTimeoutMs)
  val partitionStateMachine = new PartitionStateMachine(this)
  val replicaStateMachine = new ReplicaStateMachine(this)
  private val controllerElector = new ZookeeperLeaderElector(controllerContext, ZkUtils.ControllerPath, onControllerFailover,
    onControllerResignation, config.brokerId)
  // have a separate scheduler for the controller to be able to start and stop independently of the
  // kafka server
  private val autoRebalanceScheduler = new KafkaScheduler(1)
  var deleteTopicManager: TopicDeletionManager = null
  val offlinePartitionSelector = new OfflinePartitionLeaderSelector(controllerContext, config)
  private val reassignedPartitionLeaderSelector = new ReassignedPartitionLeaderSelector(controllerContext)
  private val preferredReplicaPartitionLeaderSelector = new PreferredReplicaPartitionLeaderSelector(controllerContext)
  private val controlledShutdownPartitionLeaderSelector = new ControlledShutdownLeaderSelector(controllerContext)
  private val brokerRequestBatch = new ControllerBrokerRequestBatch(this)

  private val partitionReassignedListener = new PartitionsReassignedListener(this)
  private val preferredReplicaElectionListener = new PreferredReplicaElectionListener(this)

Many properties are defined in the KafkaController class. Let's focus on the following PartitionLeaderSelector object, mainly to elect the leader broker for the partition. This trait only defines a method selectLeader, which receives a TopicAndPartition object and a LeaderAndIsr object. TopicAndPartition indicates the partition to elect the leader, and the second parameter indicates the current leader and ISR records of the partition saved in zookeeper. This method returns a tuple containing the elected leader and ISR and a set of replicas that need to receive the LeaderAndISr request.

trait PartitionLeaderSelector {  
    def selectLeader(topicAndPartition: TopicAndPartition, currentLeaderAndIsr: LeaderAndIsr): (LeaderAndIsr, Seq[Int])
}

Through our code above, we can see that there are five selector electors defined in KafkaController:

  • 1、NoOpLeaderSelector
  • 2、OfflinePartitionLeaderSelector
  • 3、ReassignedPartitionLeaderSelector
  • 4、PreferredReplicaPartitionLeaderSelector
  • 5、ControlledShutdownLeaderSelector

Before we explain these five selectors, let's first understand the four states of Partition in Kafka:

  • NonExistentPartition - This state indicates that the partition was either not created or was created but was later deleted.
  • NewPartition —— 分区创建之后就处于NewPartition状态。在这个状态中,分区应该已经分配了副本,但是还没有选举出leader和ISR。
  • OnlinePartition —— 一旦分区的leader被推选出来,它就处于OnlinePartition状态。
  • OfflinePartition —— 如果leader选举出来后,leader broker宕机了,那么该分区就处于OfflinePartition状态。

四种状态的转换关系如下:

NonExistentPartition -> NewPartition

  1. 首先将第一个可用的副本broker作为leader broker并把所有可用的副本对象都装入ISR,然后写leader和ISR信息到zookeeper中保存
  2. 对于这个分区而言,发送LeaderAndIsr请求到每个可用的副本broker,以及UpdateMetadata请求到每个可用的broker上

OnlinePartition, OfflinePartition -> OnlinePartition

为该分区选取新的leader和ISR以及接收LeaderAndIsr请求的一组副本,然后写入leader和ISR信息到zookeeper中保存。

NewPartition, OnlinePartition -> OfflinePartition

标记分区状态为离线(offline)。

OfflinePartition -> NonExistentPartition

离线状态标记为不存在分区,表示该分区失败或者被删除。

在介绍完最基本的概念之后,下面我们将重点介绍上面提到过的五种选举器:
1、ReassignedPartitionLeaderSelector
从可用的ISR中选取第一个作为leader,把当前的ISR作为新的ISR,将重分配的副本集合作为接收LeaderAndIsr请求的副本集合。
2、PreferredReplicaPartitionLeaderSelector
如果从assignedReplicas取出的第一个副本就是分区leader的话,则抛出异常,否则将第一个副本设置为分区leader。
3、ControlledShutdownLeaderSelector
将ISR中处于关闭状态的副本从集合中去除掉,返回一个新新的ISR集合,然后选取第一个副本作为leader,然后令当前AR作为接收LeaderAndIsr请求的副本。
4、NoOpLeaderSelector
原则上不做任何事情,返回当前的leader和isr。
5、OfflinePartitionLeaderSelector
从活着的ISR中选择一个broker作为leader,如果ISR中没有活着的副本,则从assignedReplicas中选择一个副本作为leader,leader选举成功后注册到Zookeeper中,并更新所有的缓存。

所有的leader选择完成后,都要通过请求把具体的request路由到对应的handler处理。目前kafka并没有把handler抽象出来,而是每个handler都是一个函数,混在KafkaApi类中。
其实也就是如下的代码:

def handle(request: RequestChannel.Request) {  
  try{  
    trace("Handling request: " + request.requestObj + " from client: " + request.remoteAddress)  
    request.requestId match {  
      case RequestKeys.ProduceKey => handleProducerRequest(request)  // producer  
      case RequestKeys.FetchKey => handleFetchRequest(request)       // consumer  
      case RequestKeys.OffsetsKey => handleOffsetRequest(request)  
      case RequestKeys.MetadataKey => handleTopicMetadataRequest(request)  
      case RequestKeys.LeaderAndIsrKey => handleLeaderAndIsrRequest(request) //成为leader或follower设置同步副本组信息  
      case RequestKeys.StopReplicaKey => handleStopReplicaRequest(request)  
      case RequestKeys.UpdateMetadataKey => handleUpdateMetadataRequest(request)  
      case RequestKeys.ControlledShutdownKey => handleControlledShutdownRequest(request)  //shutdown broker  
      case RequestKeys.OffsetCommitKey => handleOffsetCommitRequest(request)  
      case RequestKeys.OffsetFetchKey => handleOffsetFetchRequest(request)  
      case requestId => throw new KafkaException("Unknown api code " + requestId)  
    }  
  } catch {  
    case e: Throwable =>  
      request.requestObj.handleError(e, requestChannel, request)  
      error("error when handling request %s".format(request.requestObj), e)  
  } finally  
    request.apiLocalCompleteTimeMs = SystemTime.milliseconds  
}

这里面的每个请求在上面给出的链接的文章中都有过解释说明,在这里不多解释。

RequestKeys.LeaderAndIsr详细分析
在上面的代码中咱们看到ReequestKeys.LeaderAndlst对应的方法其实是KeyhandleLeaderAndIsrRequest。

def handleLeaderAndIsrRequest(request: RequestChannel.Request) {
    // ensureTopicExists is only for client facing requests
    // We can't have the ensureTopicExists check here since the controller sends it as an advisory to all brokers so they
    // stop serving data to clients for the topic being deleted
    val leaderAndIsrRequest = request.requestObj.asInstanceOf[LeaderAndIsrRequest]
    try {
      val (response, error) = replicaManager.becomeLeaderOrFollower(leaderAndIsrRequest, offsetManager)
      val leaderAndIsrResponse = new LeaderAndIsrResponse(leaderAndIsrRequest.correlationId, response, error)
      requestChannel.sendResponse(new Response(request, new BoundedByteBufferSend(leaderAndIsrResponse)))
    } catch {
      case e: KafkaStorageException =>
        fatal("Disk error during leadership change.", e)
        Runtime.getRuntime.halt(1)
    }
  }

将request.requestObj转换成LeaderAndIstRequest对象类型。


Sample Flowchart Template.png

流程图说明

1、如果请求中controllerEpoch小于当前最新的controllerEpoch,则直接返回ErrorMapping.StaleControllerEpochCode。

2、如果partitionStateInfo中的leader epoch大于当前ReplicManager中存储的(topic, partitionId)对应的partition的leader epoch,则:

2.1、如果当前brokerid(或者说replica id)在partitionStateInfo中,则将该partition及partitionStateInfo存入一个名为partitionState的HashMap中。
否则说明该Broker不在该Partition分配的Replica list中,将该信息记录于log中

3、如果partitionStateInfo中的leader epoch小于当前ReplicManager则将相应的Error code(ErrorMapping.StaleLeaderEpochCode)存入Response中。

4、筛选出partitionState中Leader与当前Broker ID相等的所有记录存入partitionsTobeLeader中,其它记录存入partitionsToBeFollower中。
如果partitionsTobeLeader不为空,则对其执行makeLeaders方。
如果partitionsToBeFollower不为空,则对其执行makeFollowers方法。

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=327075039&siteId=291194637