Kafka Source Code Analysis and Illustrated Principles: the Broker Side

I. Introduction

  https://www.cnblogs.com/GrimMjx/p/11354987.html

  As mentioned in the previous article, every message queue prototype has three parts: the message producer (Producer), the message consumer (Consumer), and the supporting service (called the Broker in Kafka). The previous article covered some details of the Kafka producer side; this one covers some of the design principles of the broker side.

  First, let's look at how a topic is created in Kafka:

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

  There are several arguments:

  • --zookeeper: the zookeeper address
  • --replication-factor: the replication factor (number of replicas)
  • --partitions: the number of partitions (default 1)
  • --topic: the topic name

II. What Is a Partition

  A topic can have multiple partitions, and the messages in each partition are different. Although partitioning provides higher throughput, more partitions are not always better: the partition count generally should not exceed the number of machines in the Kafka cluster, because more partitions consume more memory and file handles. A common setting is 3-10 partitions. For example, suppose the cluster has three machines and we create a topic named test with 2 partitions, as shown:

  A partition is an ordered, immutable sequence of records that is continuously appended to a log file. Each message in a partition is assigned a sequential id called the offset, which uniquely identifies the record within the partition. The official figure illustrates this (my own drawing would not be as good):

2.1 Relationship between the producer side and partitions

  Given the figure above, which partition does the producer send a message to? This is controlled by the partitioner.class parameter mentioned in the previous article. The default partitioning logic is: if the message has a key, compute the key's hash with the murmur2 algorithm and take it modulo the total number of partitions; if there is no key, partitions are chosen by round-robin (see org.apache.kafka.clients.producer.internals.DefaultPartitioner#partition). Of course, we can also customize the partitioning strategy by implementing the org.apache.kafka.clients.producer.Partitioner interface; a sketch of a custom partitioner follows the default implementation below:

/**
 * Compute the partition for the given record.
 *
 * @param topic The topic name
 * @param key The key to partition on (or null if no key)
 * @param keyBytes serialized key to partition on (or null if no key)
 * @param value The value to partition on or null
 * @param valueBytes serialized value to partition on or null
 * @param cluster The current cluster metadata
 */
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    if (keyBytes == null) {
        int nextValue = nextValue(topic);
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return availablePartitions.get(part).partition();
        } else {
            // no partitions are available, give a non-available partition
            return Utils.toPositive(nextValue) % numPartitions;
        }
    } else {
        // hash the keyBytes to choose a partition
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}
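
  For illustration, here is a minimal sketch of a custom partitioner (the class name and routing rule are my own, not from this article): it sends keyless messages to partition 0 and hashes keyed messages the same way the default partitioner does. To enable it, set partitioner.class (ProducerConfig.PARTITIONER_CLASS_CONFIG) to this class name in the producer configuration.

import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class FixedFallbackPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            // illustrative rule: route all keyless messages to partition 0
            return 0;
        }
        // same idea as the default partitioner: murmur2 hash of the key, modulo partition count
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}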

2.2 Relationship between the consumer side and partitions

  First, look at the official definition of a consumer group: Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

  In other words: consumers mark themselves with a consumer group name, and each message published to a topic is delivered to exactly one consumer instance within every consumer group that subscribes to it.

  The consumer group is the mechanism Kafka uses to achieve high scalability and high fault tolerance on the consumer side. If a consumer crashes or a new consumer joins, the consumer group rebalances; the rebalancing mechanism will be explained in the consumer article and is not covered in this section. Continuing from the figure above, the consumer side looks like this:

  This is the ideal case: two partitions map to the two consumers in the group. Now consider: what happens if a consumer group has more consumers than partitions? Or fewer?

  If a consumer group has more consumers than partitions, the extra consumers are simply wasted: they are not assigned any partition and cannot consume messages.

  If a consumer group has fewer consumers than partitions, a partition assignment strategy decides which consumer handles which partitions. One strategy is Range (the default), another is RoundRobin; you can also implement a custom strategy. The idea is simply to balance the workload across consumers. The consumer article will explain the details, so they are not repeated here.

  Recommendation: configure the number of partitions to be an integer multiple of the number of consumers.
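
  To make the consumer-group behaviour concrete, here is a minimal consumer sketch (the bootstrap server, group id and deserializer choices are placeholders, not from this article). Two processes running this code with the same group.id would split the test topic's partitions between them.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        props.put("group.id", "demo-group");                  // consumers sharing this id form one group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test")); // the topic created earlier
            while (true) {
                // each partition's records are delivered to only one consumer in the group
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}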

III. Replica Design and the ISR

3.1 What is a replica

  When creating a topic, the --replication-factor parameter sets the number of replicas. Kafka keeps multiple identical backups of the same partition to provide high availability; in Kafka such a backup is called a replica. Replicas fall into three categories:

  • leader replica: responds to read and write requests from the producer side
  • follower replica: backs up the leader replica's data and does not respond to read or write requests from the producer side!
  • ISR replica set: contains the leader replica and all follower replicas that are in sync with it (there may be no follower replicas)

  Kafka distributes all replicas of a partition evenly across the brokers of the cluster, elects one of them as the leader replica, and makes the others follower replicas. If the broker hosting the leader replica goes down, one of the follower replicas becomes the new leader. The leader replica handles read and write requests from the producer side, while follower replicas only fetch data from the leader and never accept write requests themselves!

3.2 Replica synchronization mechanism

  As mentioned above, the ISR is a dynamically maintained set of in-sync replicas, and the leader replica is always a member of it. Only replicas in the ISR are eligible to be elected leader. When the producer-side acks parameter is set to all (-1), a message is treated as committed only after every replica in the ISR has received the write. As the previous article noted, the acks parameter must be combined with the broker-side min.insync.replicas parameter (default 1), which controls how many ISR replicas must receive a write for it to succeed. If the ISR contains fewer replicas than min.insync.replicas, the client gets an exception: org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected since there are fewer in-sync replicas than required.
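
  As a hedged illustration of these settings (the broker address and topic name are placeholders), a producer that requires acknowledgement from all ISR replicas could be configured as below; note that min.insync.replicas is a broker/topic setting, not a producer setting.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait until every ISR replica has the write
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // if the ISR shrinks below min.insync.replicas (a broker/topic-level setting),
            // the callback receives NotEnoughReplicasException
            producer.send(new ProducerRecord<>("test", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("written to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        }
    }
}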

  To understand the replica synchronization mechanism, we first need a few terms:

  • High Watermark: the replica's high watermark, abbreviated HW. Messages below the HW are considered "already backed up"; note that the HW points at the next message, not at an existing one! The HW of the leader replica determines how many messages a consumer can poll: a consumer can only consume messages with offsets below the HW!
  • LEO: log end offset, the offset of the next message to be written. The position the LEO points at holds no message yet!
  • remote LEO: strictly speaking, this is a set. The broker hosting the leader replica keeps a Partition object in memory that holds the partition's information, and that Partition maintains a Replica list with one object per replica in the partition. The LEOs of the Replica objects other than the leader are called the remote LEOs.

  Below is a concrete example (taken from Hu Xi's blog, see the references). The topic in this example has a single partition with a replication factor of 2, i.e. one leader replica and one follower replica, and the ISR contains both. Let's first look at what happens to the replica objects on the leader and follower brokers when the producer sends a message, and how the partition HW gets updated. This is the initial state:

  Now the producer sends a message to this topic partition. The state at this point is shown below:

  As the figure shows, the producer's send succeeds (assuming acks = 1, i.e. the leader returns success as soon as it has written the message), and the follower sends a new FETCH request, still asking for the data at fetchOffset = 0. The difference from last time is that now there is data to read; the whole flow is as follows:

  Clearly, both the leader and the follower now hold this message at offset 0, but neither side's HW has been updated yet; that happens when the next round of FETCH requests is processed, as shown below:

  A brief explanation: in the second FETCH round, the follower sends a FETCH request with fetchOffset = 1, because the message at fetchOffset = 0 has already been written to the follower's local log, so this time it asks for the data at fetchOffset = 1. On receiving the FETCH request, the leader-side broker first updates the LEO of the remote replicas, i.e. sets the remote LEO to 1, and then updates the partition HW to 1 (see the update rule explained above). Having done that, it wraps the current partition HW value (1) into the FETCH response and sends it to the follower. On receiving the FETCH response, the follower-side broker extracts the current partition HW value 1, compares it with its own LEO, and updates its own HW to 1. At that point a complete HW/LEO update cycle is finished.
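
  The leader-side rule can be summed up as "HW = the minimum LEO across the leader and its in-sync followers". Below is only a conceptual sketch of that rule applied to the example above, not actual Kafka source code.

import java.util.Arrays;

public class HwUpdateSketch {
    public static void main(String[] args) {
        long leaderLeo = 1L;       // the leader wrote the message at offset 0, so its LEO is 1
        long[] remoteLeos = {1L};  // after the second FETCH round the follower's LEO is known to be 1

        // HW = min(leader LEO, all remote LEOs of the in-sync followers)
        long hw = leaderLeo;
        for (long remoteLeo : remoteLeos) {
            hw = Math.min(hw, remoteLeo);
        }
        System.out.println("partition HW = " + hw + ", remote LEOs = " + Arrays.toString(remoteLeos));
        // prints "partition HW = 1": the consumer can now consume the message at offset 0
    }
}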

3.3 ISR maintenance

  Since version 0.9.0.0, there is only one parameter, replica.lag.time.max.ms, that decides whether a replica should remain in the ISR set; its default value is 10s. It means that if a follower replica takes more than 10s to respond to the leader replica, Kafka considers the replica to have fallen behind and removes it from the in-sync replica list.

IV. Log Design

  Every Kafka topic is isolated from the others; each topic can have one or more partitions, and each partition has log files that record the message data:

  The figure shows a topic named demo-topic with 8 partitions; each partition has a message log directory named [topic-partition]. Inside a partition's log directory you can see several files that share the same prefix but have different extensions, for example the three files in the figure (00000000000000000000.index, 00000000000000000000.timeindex, 00000000000000000000.log). Together these are called a LogSegment (log segment).

4.1 LogSegment

  Take a concrete example from a test environment: a topic named ALC.ASSET.EQUITY.SUBJECT.CHANGE. Let's look at the log files of partition 0:

  Each LogSegment consists of a set of files that share the same file name. The name is a fixed 20-digit number: if the name is 00000000000000000000, the offset of the first message in this LogSegment is 0; if the name is 00000000000000000097, the offset of the first message in this LogSegment is 97. The log directory contains files with several extensions; the ones to focus on are the .index, .timeindex and .log files (a small sketch of the naming rule follows the list below):

  • .index: offset index file
  • .timeindex: time index file
  • .log: log file
  • .snapshot: snapshot file
  • .swap: temporary file produced during log compaction
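
  As a small illustration of the naming rule (this is plain string formatting, not Kafka code): padding a segment's base offset to 20 digits yields the file name prefix.

public class SegmentNameSketch {
    public static void main(String[] args) {
        long[] baseOffsets = {0L, 97L};
        for (long baseOffset : baseOffsets) {
            // pad the base offset to 20 digits, matching the segment file names above
            String prefix = String.format("%020d", baseOffset);
            System.out.println(prefix + ".log  " + prefix + ".index  " + prefix + ".timeindex");
        }
        // prints:
        // 00000000000000000000.log  00000000000000000000.index  00000000000000000000.timeindex
        // 00000000000000000097.log  00000000000000000097.index  00000000000000000097.timeindex
    }
}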

4.2 Index and log files

  Kafka has two kinds of index files. The first is the offset index file, i.e. the files ending in .index. The second is the timestamp index file, i.e. the files ending in .timeindex.

  We can use kafka-run-class.sh to view the contents of an offset index file:
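
  The exact invocation is not shown in the original (the output was a screenshot); a typical way to dump an index file with the built-in DumpLogSegments tool looks roughly like this (the path is a placeholder):

kafka-run-class.sh kafka.tools.DumpLogSegments --files /path/to/ALC.ASSET.EQUITY.SUBJECT.CHANGE-0/00000000000000000000.index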

  Each output line has the form offset: xxx position: xxxx. The two values have no direct relationship to each other.

  • offset: relative offset
  • position: physical position (byte address) in the log file

  So what does the first line, offset: 12 position: 4423, mean? It means that the messages with offsets 0-12 occupy physical positions 0-4423 in the log file.

  Likewise, the meaning of the second line, offset: 24 position: 8773, is easy to guess: the messages with offsets 13-24 occupy physical positions 4424-8773.

  We can use kafka-run-class.sh again to look at the contents of the .log file; pay attention to the baseOffset and position values and check whether they match what was described above.
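
  Again as a hedged example (the path is a placeholder), dumping the .log file with the message payloads printed could look like this:

kafka-run-class.sh kafka.tools.DumpLogSegments --files /path/to/ALC.ASSET.EQUITY.SUBJECT.CHANGE-0/00000000000000000000.log --print-data-log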

4.3 Looking up a message by offset

  Following the example above, how do we find the message with offset 60? (A minimal sketch of this lookup follows the steps below.)

  1. Use the offset to locate the corresponding LogSegment; here that is 00000000000000000000.index.
  2. Binary-search the index for the largest entry whose offset is not greater than 60; here that is offset: 24 position: 8773.
  3. Open 00000000000000000000.log and scan sequentially from position 8773 until the message with offset = 60 is found.
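
  Here is a rough sketch of step 2 (the index entries are hard-coded from the example above; real Kafka reads them from the memory-mapped .index file). The search returns the largest index entry whose offset does not exceed the target:

import java.util.Arrays;

public class OffsetLookupSketch {
    // index entries from the example above: {relative offset, physical position}
    static final long[][] INDEX = {
            {12, 4423},
            {24, 8773},
    };

    // binary search for the largest entry whose offset is <= targetOffset
    static long[] floorEntry(long targetOffset) {
        int lo = 0, hi = INDEX.length - 1;
        long[] result = null;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (INDEX[mid][0] <= targetOffset) {
                result = INDEX[mid];
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] entry = floorEntry(60);
        System.out.println(Arrays.toString(entry));
        // prints [24, 8773]: scanning 00000000000000000000.log sequentially from
        // position 8773 (step 3) eventually reaches the message with offset 60
    }
}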

 

 

 

References:

http://kafka.apachecn.org/documentation.html#introduction

https://www.cnblogs.com/huxi2b/p/9579681.html

《Apache Kafka实战》

 
