Kafka's storage mechanism and reliability

1. Kafka's storage mechanism

    Kafka stores data by topic. Each topic is divided into partitions, each partition can have multiple replicas, and the interior of each partition is further subdivided into segments.

    A partition is, in fact, a folder created in Kafka's storage directory. The folder is named with the topic name followed by the partition number, and the numbering starts from 0.

1. Segment

    A segment is a pair of files generated inside the partition's folder.

    A partition is divided into several segments of equal size. On the one hand, this splits the partition's data across multiple files so that no single file grows excessively large; on the other hand, historical data can be deleted segment by segment, which improves efficiency.

    A segment consists of a .log file and a .index file.

1) .log

    The .log file is the data file; it stores the segment's actual message data.

2) .index

    The .index file stores index information for the corresponding .log file.

    By searching the .index file, you can find the position in the .log file where a given offset stored in the current segment begins. Each log record has a fixed format that includes the offset, the record length, the key length, and other metadata. Because of this fixed format, once a record's start position is known, its end position can be determined, and the data can be read out.

3) Naming convention

    The naming rule for these two files is:

    The first segment of a partition starts from 0, and each subsequent segment file is named after the offset of the last message of the previous segment file.
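The file names are the base offset zero-padded to 20 digits, which is Kafka's actual convention; the sample offsets below are made up for illustration:

```python
def segment_file_names(base_offset):
    """Return the .log/.index file names for a segment starting at base_offset.
    Kafka zero-pads the offset to 20 digits."""
    stem = f"{base_offset:020d}"
    return f"{stem}.log", f"{stem}.index"

print(segment_file_names(0))       # first segment of the partition
print(segment_file_names(368769))  # a later segment
```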

2. Reading data

    To read the record at a given offset in a specified partition, Kafka first compares the offset against the names of all segments in the partition to determine which segment holds the record, then searches that segment's .index file to find the record's starting position in the .log data file, and finally reads the data file from that position, using the fixed record format to determine where the record ends and obtain the complete data.
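Because segment files are named by their base offsets, locating the right segment amounts to a binary search over the sorted file names. A minimal sketch (the base offsets are made-up examples):

```python
import bisect

def find_segment(base_offsets, target_offset):
    """Given the sorted base offsets of a partition's segments,
    return the base offset of the segment containing target_offset."""
    # bisect_right finds the first base offset greater than the target;
    # the segment we want is the one just before it.
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    return base_offsets[i]

segments = [0, 368769, 737337]        # hypothetical segment base offsets
print(find_segment(segments, 170410))  # falls in the first segment
print(find_segment(segments, 368769))  # first message of the second segment
```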

2. Reliability guarantee

1. AR

    Kafka maintains an AR (Assigned Replicas) list for each partition, containing all of the partition's replicas. The AR is further divided into the ISR (In-Sync Replicas) and the OSR (Out-of-Sync Replicas).

    AR = ISR + OSR.

    The AR and ISR information is maintained in ZooKeeper; the LEO and HW are offsets maintained by the brokers themselves.

1) ISR

    Replicas in the ISR must stay synchronized with the leader's data. Only after the data has been replicated to all ISR replicas is it considered successfully committed, and only committed data can be read by the outside world.

    While this synchronization is in progress, the data cannot be read externally even though it has already been written to the leader. This is implemented through the LEO/HW mechanism.

2) OSR

    Whether the replicas in the OSR have synchronized the leader's data does not affect whether the data is committed. Followers in the OSR synchronize from the leader on a best-effort basis, so their data may lag behind.

    Initially, all replicas are in the ISR. While Kafka is running, if a replica falls behind the leader for longer than the threshold specified by replica.lag.time.max.ms, it is removed from the ISR and placed in the OSR; once it catches up again, it can rejoin the ISR.
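The ISR/OSR split described above can be sketched as a simple classification by lag time (the broker names and lag values below are made up; the threshold mirrors the role of replica.lag.time.max.ms):

```python
def classify_replicas(lag_ms_by_replica, max_lag_ms):
    """Split replicas into ISR and OSR by how long they have lagged
    behind the leader (mirrors the replica.lag.time.max.ms threshold)."""
    isr = {r for r, lag in lag_ms_by_replica.items() if lag <= max_lag_ms}
    osr = set(lag_ms_by_replica) - isr
    return isr, osr

# Hypothetical lag measurements with a 10-second threshold.
lags = {"broker-1": 0, "broker-2": 1500, "broker-3": 42000}
isr, osr = classify_replicas(lags, max_lag_ms=10000)
print(sorted(isr))  # broker-3 has fallen too far behind and moves to the OSR
print(sorted(osr))
```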

3) LEO

    LogEndOffset: the offset of the latest data in the partition. As soon as data is written to the leader, the leader's LEO advances to the new position. It acts as a marker for the most recently written data.

4) HW

    HighWatermark: data is considered committed only after it has been synchronized to all replicas in the ISR, at which point the HW is advanced to that position. Only data before the HW can be read by consumers, which guarantees that data not yet fully synchronized is never exposed. The HW acts as a marker for data that every ISR replica holds.

    After the leader goes down, a new leader can only be chosen from the ISR list. Whichever ISR replica becomes the new leader is guaranteed to hold all data before the HW, so after the leader switch, consumers can continue reading the data that was committed before the HW.

    In short, the LEO marks the latest written position, and the HW marks the fully synchronized position; only data before the HW is visible to the outside world.
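The relationship can be sketched directly: the HW is the minimum LEO across the ISR replicas, and only offsets strictly below the HW are visible to consumers (the LEO values here are made up):

```python
def high_watermark(isr_leos):
    """The HW is the smallest LEO among the ISR replicas: everything
    below it has been replicated to every in-sync replica."""
    return min(isr_leos.values())

def visible_to_consumers(offset, hw):
    """Consumers may only read offsets strictly below the HW."""
    return offset < hw

leos = {"leader": 12, "follower-1": 10, "follower-2": 11}  # hypothetical LEOs
hw = high_watermark(leos)
print(hw)                            # 10
print(visible_to_consumers(9, hw))   # True: already committed
print(visible_to_consumers(11, hw))  # False: written but not yet committed
```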

5) HW truncation mechanism

    If the leader goes down and a new leader is elected, the new leader cannot guarantee that it holds all of the previous leader's data, only the data before the HW. The remaining followers therefore truncate their logs to the HW position and then synchronize from the new leader, which guarantees data consistency.

    When the failed leader recovers and finds that its data is inconsistent with the new leader's, it truncates its own log to the HW position it recorded before going down, and then synchronizes from the new leader like an ordinary follower, again ensuring data consistency.
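A sketch of the truncation step with toy logs: the recovering replica cuts its log back to the HW and then copies the new leader's suffix, discarding any uncommitted records it wrote before the crash:

```python
def truncate_and_resync(replica_log, hw, leader_log):
    """Truncate a replica's log to the HW, then re-synchronize the rest
    of the new leader's log to restore consistency."""
    truncated = replica_log[:hw]        # keep only committed records
    return truncated + leader_log[hw:]  # copy everything past the HW

old_leader = ["a", "b", "c", "x"]  # "x" was written but never committed
new_leader = ["a", "b", "c", "d"]  # elected from the ISR; holds all pre-HW data
hw = 3
print(truncate_and_resync(old_leader, hw, new_leader))  # ['a', 'b', 'c', 'd']
```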

 

2. Producer reliability levels

    The mechanisms above guarantee reliability inside the Kafka cluster, but when a producer sends data to the cluster, the data travels over the network, which is unreliable: data may be lost due to network delays, interruptions, and other causes.

    Kafka therefore provides producers with the following three reliability levels, each offering a different guarantee through a different strategy.

    In effect, this setting configures when the leader considers a message successfully received and responds to the client.

    It is configured through the request.required.acks parameter:

    1 : The producer sends data to the leader, and the leader returns a success response as soon as it has received the data. If the producer receives the response, it considers the send successful; otherwise it considers the send failed and automatically resends the data.

    If the leader goes down before followers have synchronized the data, data may be lost.

    0 : The producer keeps sending data to the leader without the leader returning any response.

    This mode has the highest throughput and the lowest reliability: data may be lost in transit or when the leader goes down.

    -1 : The producer sends data to the leader. After receiving the data, the leader waits until all replicas in the ISR have synchronized it before returning a success response. If the producer does not receive the response, it considers the send failed and automatically resends the data.

    This mode has the highest reliability, but if the ISR shrinks until only the leader remains, data may still be lost when the leader goes down.

    In that case, min.insync.replicas can be configured to require a minimum number of replicas in the ISR. Its default value is 1, and it should be raised to 2 or greater.

    With this setting, if the producer sends data while only the leader remains in the ISR, it receives an exception indicating that the write failed; the data cannot be written at that moment, which guarantees it is never lost.

    Although data is not lost, duplicates may be produced. For example: the producer sends data to the leader, the leader synchronizes it to the followers in the ISR, and then the leader goes down before responding. A new leader is elected that may already hold the committed data; the producer, having received no success response, resends the data, the new leader accepts it again, and the data is duplicated.
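Putting the two settings together, a configuration for the strongest guarantee might look like the following sketch. It uses the setting names from this article; request.required.acks is the legacy producer's name for this parameter (called acks in newer clients), and min.insync.replicas is set on the broker or topic:

```properties
# producer side: wait until all ISR replicas have acknowledged the write
request.required.acks=-1

# broker/topic side: refuse writes unless the ISR contains at least 2 replicas
min.insync.replicas=2
```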

3. Leader election

    When the leader goes down, a follower from the ISR is chosen as the new leader. But what if all the replicas in the ISR have gone down as well?

    The following configuration can solve this problem:

    unclean.leader.election.enable=false

    Strategy 1: wait until a replica from the ISR list comes back alive, then choose it as the new leader and continue working.

    unclean.leader.election.enable=true

    Strategy 2: choose any surviving replica as the new leader and continue working, even though this replica may not be in the ISR.

    Strategy 1 guarantees reliability but lowers availability: after the leader goes down, Kafka can recover only once an ISR replica comes back up.

    Strategy 2 gives high availability but does not guarantee reliability: any surviving replica lets the cluster keep working, but data may end up inconsistent.
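The two strategies can be sketched as a single election function controlled by the unclean flag (the broker names are made up):

```python
def elect_leader(isr, alive, unclean_election_enabled):
    """Pick a new leader after the old one fails.
    Prefer a live ISR replica; fall back to any live replica only if
    unclean leader election is enabled, otherwise wait (return None)."""
    live_isr = sorted(r for r in isr if r in alive)
    if live_isr:
        return live_isr[0]
    if unclean_election_enabled and alive:
        return sorted(alive)[0]
    return None  # unavailable until an ISR replica comes back

isr = {"broker-1"}    # hypothetical ISR before the crash
alive = {"broker-3"}  # only an out-of-sync replica survived
print(elect_leader(isr, alive, unclean_election_enabled=False))  # None
print(elect_leader(isr, alive, unclean_election_enabled=True))   # broker-3
```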

4. Kafka's delivery guarantee

    At most once: Messages may be lost, but never repeated.

    At least once: The message is never lost, but may be transmitted repeatedly.

    Exactly once: Each message will definitely be transmitted once and only once.

    Kafka itself guarantees at-least-once delivery: data is not lost, but it may be duplicated. To eliminate duplicates, a unique identifier and a deduplication mechanism must be introduced. Kafka provides a GUID as a unique identifier for each message but offers no built-in deduplication; developers must deduplicate according to their own business rules.
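Since deduplication is left to the application, a common approach is to track the identifiers of messages that have already been processed. A minimal sketch (the "guid" field name is illustrative):

```python
def deduplicate(messages):
    """Drop messages whose unique id has already been seen, keeping the
    first occurrence (turns at-least-once into effectively-once)."""
    seen = set()
    unique = []
    for msg in messages:
        if msg["guid"] not in seen:
            seen.add(msg["guid"])
            unique.append(msg)
    return unique

# A retry after a missed acknowledgment delivered guid "m2" twice.
stream = [{"guid": "m1", "value": 1},
          {"guid": "m2", "value": 2},
          {"guid": "m2", "value": 2}]
print([m["guid"] for m in deduplicate(stream)])  # ['m1', 'm2']
```

In a real system the seen-set would need to be bounded (e.g. a TTL cache) and persisted alongside consumer offsets to survive restarts.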

 
