One article to help you master Kafka — don't let your fundamentals fall behind (6)

Author: Big Bowl Wide Noodles

This article is based on summaries from ChatGPT, supplemented by posts shared on various experts' blogs and personal sites, and finishes with my own plain-language summaries, striving to build a complete and systematic collection of Kafka theory. It is divided into five modules: basic terms, advanced terms, Kafka mechanisms, scenario analysis, and finally a brief walk through the Kafka source code packages, aiming for both depth and breadth.

Kafka basic terminology

What is Kafka, and what are Broker, Topic, Partition, Producer, and Consumer?

Kafka is a high-performance, scalable distributed message queuing system commonly used to process massive amounts of data and real-time data streams.

  • Broker: A Kafka cluster consists of one or more independent server nodes; each node is called a Broker. Each Broker is responsible for storing, receiving, and forwarding messages. Together they form a distributed messaging system.

  • Topic: A topic is a category of messages or the name of a data stream in Kafka. Messages are categorized and published through topics. You can think of a topic as a message queue with a unique identifier. For example, you could have one topic that receives log messages and another topic that receives user activity messages.

  • Partition : Each topic can be divided into one or more partitions. Partitions are physical divisions of topics and are used to implement parallel processing and distributed storage of data. Each partition is an ordered, immutable sequence of messages. Partitions are stored as files on the disk and are managed by the Broker.

  • Producer : A producer is an application or system that generates messages and publishes them to a Kafka topic. Producers are responsible for sending messages to specific topics and can optionally send messages to specific partitions. The producer sends the message to the broker, which then persists the message and replicates it to other brokers according to the configured replication policy.

  • Consumer: A consumer is an application or system that reads messages from a Kafka topic. A consumer subscribes to one or more topics and reads messages from one or more partitions of those topics. Each consumer belongs to a consumer group, identified by a group ID, which lets multiple consumers consume a topic's messages in parallel.

A brief summary: Kafka is a widely used, high-performance distributed message queue built around the concepts of Broker, Topic, Partition, Producer, and Consumer. A Broker is a server node in the Kafka cluster, responsible for storing, receiving, and forwarding messages. Kafka is a publish/subscribe messaging system, and the topic is the bridge between producers and consumers. Each topic can be divided into one or more partitions; each partition is an ordered sequence of messages, and the number of partitions also sets the maximum degree of parallelism. Producers are applications or systems that generate messages and publish them to Kafka topics; consumers are applications or systems that read messages from Kafka topics.
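To make these terms concrete, here is a minimal sketch using the standard Java client. The broker address, topic name, and group ID are assumptions for illustration, not anything prescribed by Kafka:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickStart {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // Broker address (assumed)
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Producer: publishes a message to a topic
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked"));
        }

        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "activity-readers"); // consumer group ID (assumed)
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Consumer: subscribes to the topic and polls messages from its assigned partitions
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of("user-activity"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("partition=%d offset=%d value=%s%n",
                    r.partition(), r.offset(), r.value()));
        }
    }
}
```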

A few lesser-known details:

  1. Partition data files (offset, MessageSize, data). Each Message in a partition contains three attributes: offset, MessageSize, and data. The offset represents the Message's position in the partition; it is not the Message's physical location in the partition's data file but a logical value that uniquely identifies a Message within the partition — you can think of the offset as the Message's ID in that partition. MessageSize is the size of the message content, and data is the Message's actual content.

  2. Data file segmentation (sequential reads and writes, segment naming, binary search). A partition is physically composed of multiple segment files of equal maximum size that are read and written sequentially. Each segment data file is named after the smallest offset it contains, with the extension .log. This way, when searching for a Message with a given offset, binary search can quickly locate the segment file in which the Message is stored.

  3. Data file indexes (segmented index, sparse storage). Kafka creates an index file for each segment data file, with the same name as the data file but the extension .index. The index file does not index every Message in the data file; instead it uses sparse storage, creating an index entry every fixed number of bytes. This keeps the index file small enough to stay in memory.
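To make the segment lookup concrete, here is a toy sketch of the binary search over segment base offsets. The offsets are made up for illustration; this is not Kafka's actual implementation:

```java
import java.util.Arrays;

public class SegmentLookup {
    // Base offsets of segment files, e.g. 00000000000000000000.log,
    // 00000000000000368769.log, 00000000000000737337.log (hypothetical values)
    static long[] segmentBaseOffsets = {0L, 368769L, 737337L};

    // Binary search for the segment whose base offset is the largest one <= target.
    static long findSegment(long targetOffset) {
        int idx = Arrays.binarySearch(segmentBaseOffsets, targetOffset);
        // On a miss, binarySearch returns -(insertionPoint) - 1;
        // the containing segment is the entry just before the insertion point.
        if (idx < 0) idx = -idx - 2;
        return segmentBaseOffsets[idx];
    }

    public static void main(String[] args) {
        // Offset 500000 lives in the segment named 00000000000000368769.log
        System.out.println(findSegment(500000L)); // prints 368769
    }
}
```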

Kafka advanced terminology

Controller

When Kafka starts, each Broker registers its information in ZooKeeper, and whichever Broker registers first becomes the Controller. Using ZooKeeper's watch mechanism, the Controller reads the data of the registered Brokers, generates the cluster's metadata, and distributes it to the other servers so that every member is aware of the rest of the cluster.

Coordinator

Kafka's Coordinator is the key component responsible for coordinating and managing transactions between consumer groups and producers. It ensures that the consumer group is able to allocate and consume partitions correctly, and manages the correct commit and recovery of producer transactions .

  1. Consumer Group Coordinator (Group Coordinator) : Each consumer group has a consumer group coordinator. It is a Kafka broker responsible for coordinating and managing various activities of the consumer group. The consumer group coordinator performs the following tasks:

  • Register a new consumer : When a consumer joins a consumer group, it registers itself with the consumer group coordinator and receives the allocation policy for the consumer group.

  • Assign partitions : When the number of consumers in the consumer group changes (such as new consumers joining or old consumers leaving), the consumer group coordinator is responsible for redistributing partitions to consumers. It uses the Partition Assignment Strategy to decide how to allocate partitions to consumers to achieve load balancing and fault tolerance.

  • Managing the offset of the consumer group : The consumer group coordinator is also responsible for tracking and managing the consumption offset of the consumer group on each partition. It commits the consumer group's offset into Kafka to ensure that the consumer group can correctly continue consuming from the last offset after disconnecting or rebalancing.

  2. Producer Transaction Coordinator (Transaction Coordinator): Kafka also has a transaction coordinator, which is responsible for managing the coordination of Kafka transactions. The transaction coordinator performs the following tasks:

  • Initializing a transaction: When a producer starts a transaction, it interacts with the transaction coordinator to register its transactional ID; the coordinator is the Broker hosting the leader replica of the corresponding __transaction_state partition.

  • Commit the transaction : The producer sends the transaction's commit request to the producer transaction coordinator when the transaction is completed. The coordinator ensures that all messages in the transaction are successfully written and writes the transaction's commit status to the Kafka log.

  • Transaction recovery : If the producer fails or crashes during a transaction, the producer transaction coordinator is responsible for coordinating the recovery process to ensure the integrity of the transaction.
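These steps surface in the Java client as the transactional producer API. A minimal sketch — the broker address, topic names, and transactional ID are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Identifies this producer to the transaction coordinator (assumed name)
        props.put("transactional.id", "order-service-tx-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions(); // registers with the coordinator, fences older instances
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("orders", "order-1", "created"));
                producer.send(new ProducerRecord<>("payments", "order-1", "charged"));
                producer.commitTransaction(); // both messages become visible atomically
            } catch (Exception e) {
                producer.abortTransaction(); // coordinator marks the transaction aborted
                throw e;
            }
        }
    }
}
```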

Consumer Group

Consumer Group is used to organize and manage message consumers. According to the official introduction, the consumer group is a scalable and fault-tolerant consumer mechanism provided by Kafka. A consumer group contains multiple consumer instances, each of which can consume messages in parallel. The message queue system distributes messages across the instances in a group for high throughput and load balancing. Note that consumers in excess of the partition count have nothing to consume, so the number of consumers is generally set at or below the number of partitions; the extra consumers can, however, serve as standbys for failover.

It is worth mentioning that Kafka uses the consumer group design to implement two message queue models at the same time. If all instances belong to the same Group, it is a queue model. If all instances belong to different Groups, then it implements the publish/subscribe model (one-to-many).
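In the Java client the difference between the two models is just the group.id. A minimal sketch — the broker address and group names are assumptions:

```java
import java.util.Properties;

public class GroupModels {
    static Properties consumerConfig(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Queue model: run several instances with the SAME group.id;
        // each message is delivered to only one instance in the group.
        Properties queue = consumerConfig("billing");

        // Publish/subscribe model: give each subscriber its OWN group.id;
        // every group independently receives the full message stream.
        Properties audit = consumerConfig("audit");
        Properties analytics = consumerConfig("analytics");
        System.out.println(queue + "\n" + audit + "\n" + analytics);
    }
}
```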

Rebalancing

Rebalancing refers to the process in which the message queue system redistributes messages distributed to each consumer instance when the consumer instances in the consumer group change. The addition of new consumer instances, or the exit of old instances due to failures or heartbeat timeouts, will cause rebalancing to occur . The rebalancing process takes some time and may result in some messages not being processed during the reallocation period.

The Rebalance process is divided into two steps: Join and Sync

  1. In the Join phase, all consumer instances send join requests to the Group Coordinator on the Broker. After collecting all the requests, the coordinator selects one consumer instance to become the leader and sends the group membership and topic information to it. The Consumer Leader is responsible for formulating the consumption allocation plan.

  2. In the Sync phase, the Consumer Leader draws up the consumption plan, that is, which Consumer is responsible for which Topics and Partitions. Once the allocation is complete, the Leader wraps the plan in a SyncGroup request and sends it to the Coordinator. The non-leaders also send SyncGroup requests, but with empty content. After receiving the allocation plan, the Coordinator places it in the SyncGroup response and sends it to each Consumer. This way, every member of the group learns which partitions it should consume.
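On the client side, this rebalance lifecycle surfaces through the ConsumerRebalanceListener callback. A minimal sketch — topic and group names are assumed:

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceAware {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "log-readers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("app-logs"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called before a rebalance takes partitions away:
                    // commit offsets / flush local state here.
                    System.out.println("Revoked: " + partitions);
                }
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Called after the Sync phase, once the coordinator has distributed the plan.
                    System.out.println("Assigned: " + partitions);
                }
            });
            consumer.poll(Duration.ofSeconds(1)); // joining the group triggers the Join/Sync flow
        }
    }
}
```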

Message

A Kafka Message consists of a fixed-length header and a variable-length message body.

The header consists of one byte of magic (format version) and four bytes of CRC32 (used to verify that the message body is intact). When magic is 1, there is one extra byte between magic and crc32: attributes (holding properties such as whether the message is compressed and the compression format); when magic is 0, the attributes byte does not exist.

The body consists of N bytes and carries the actual key/value payload, for example:

  • Key (optional): The key of the message, used for partitioning and message ordering. If a key is provided, Kafka routes the message to a specific partition based on the key's hash, ensuring that messages with the same key are written to and read from the same partition.

  • Value: The actual message content, usually a byte array or string. This is data that needs to be transferred and processed.

  • Offset: The unique identifier of the message in the partition, indicating the position of the message in the partition. Consumers can use offsets to track the progress of their consumption, ensuring no messages are missed.

  • Partition (optional): The partition number to which the message belongs. If no partition is specified, Kafka allocates one according to the partitioning policy configured by the producer.

  • Timestamp (optional): The timestamp of the message, indicating the time when the message was produced. The timestamp can be the time the message was actually produced, or it can be a custom time explicitly set when the producer sends the message.

  • Headers (optional): A set of key-value pairs used to store metadata related to the message. The header can contain various custom information, such as the source, type, version of the message, etc.
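These fields map directly onto the Java client's ProducerRecord. A sketch — the topic, key, value, and header contents are assumptions:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MessageFields {
    public static void main(String[] args) {
        // ProducerRecord(topic, partition, timestamp, key, value) — partition,
        // timestamp, and key are all optional.
        ProducerRecord<String, String> record = new ProducerRecord<>(
                "user-activity",            // topic
                null,                       // partition: null lets the partitioner choose (by key hash)
                System.currentTimeMillis(), // timestamp
                "user-42",                  // key: same key -> same partition -> ordered
                "page-view");               // value: the actual payload
        // Headers: free-form metadata key/value pairs
        record.headers().add("source", "web-frontend".getBytes(StandardCharsets.UTF_8));
        record.headers().add("schema-version", "2".getBytes(StandardCharsets.UTF_8));
        System.out.println(record);
        // Note: the offset is NOT set by the producer; the broker assigns it on append,
        // and it comes back in the RecordMetadata / ConsumerRecord.
    }
}
```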

The topic of topics (__consumer_offsets)

In Kafka, __consumer_offsets is an internal topic used to store offset information of consumer groups. It is used to track the progress of consumers consuming in the partition of a specific topic. Each message in the __consumer_offsets topic contains the following information:

  • consumer group ID

  • topic name

  • Partition ID

  • The offset of the consumer within the consumer group (offset)

  • Timestamp of message submission (timestamp)

By maintaining this internal topic, Kafka can track and manage the consumer group's progress within each partition, ensuring that consumers process messages correctly and can recover from failures. When a consumer starts, it reads offset information from this topic to determine at which offset to begin consuming. As messages are processed and consumers commit offsets, this information is updated accordingly. Meanwhile, Kafka's consumer coordinator is responsible for updating and maintaining the __consumer_offsets topic: it handles requests for consumers to join and leave the consumer group, as well as offset commit and fetch operations.
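You can inspect what the coordinator has stored in __consumer_offsets for a group through the admin client. A minimal sketch — the broker address and group ID are assumptions:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class GroupOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Reads the committed offsets that the group coordinator stored in __consumer_offsets
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("activity-readers")
                         .partitionsToOffsetAndMetadata().get();
            offsets.forEach((tp, om) ->
                    System.out.printf("%s-%d -> offset %d%n", tp.topic(), tp.partition(), om.offset()));
        }
    }
}
```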

Data files and Log files

Data Files:

In Kafka, a data file usually refers to the file on the Kafka server where the actual message data is stored . Kafka uses segmented storage to manage messages. Each topic is divided into multiple partitions, and each partition is subdivided into a series of data files. These data files are used to store the content of the message. When messages are sent to the Kafka cluster by the producer, they are appended to the appropriately partitioned data files. Consumers can read messages from these data files.

Log Files:

In Kafka, log files usually refer to files on the Kafka Broker used to persist messages . Kafka uses logs for message persistence and uses an append-based log storage model. Each partition of each topic has a corresponding log file that stores messages in sequence. Messages in the log file are written and read in an appended manner. Once a log file reaches a certain size limit or time limit, Kafka closes it and creates a new log file in order to continue persisting messages.

Log index

Kafka can support TB-scale data at the log level for two reasons: sequential, append-only log writes plus a sparse index (the same idea as the data-file indexes described earlier). Kafka's sparse index is a data structure that records the offsets of key messages at fixed intervals, speeding up message location and retrieval while saving memory.

When messages are written to Kafka's log files, they are appended in order to the end of the partition's log. Each partition has a corresponding sparse index that records some important message offsets. The index does not record the offset of every message; it records offsets at fixed intervals (typically one entry per stretch of consecutive messages). The benefit of a sparse index is that it dramatically reduces the index size, saving memory.

When a consumer needs to read a message at a specific offset, it first finds the index entry closest to the target offset (via binary search over the index), then linearly scans the log file until the target message is found. Although this is slower than a fully indexed lookup, the sparse index keeps the overall search very efficient.

Kafka mechanism analysis

Let’s explain in detail Kafka’s multiple replica (Replica) mechanism.

Kafka's replica (Replica) provides a data redundancy and failure recovery mechanism to ensure that data will not be lost when a Kafka node fails, and supports high-throughput data read and write operations.

Each partition can have multiple copies, and multiple copies are distributed on different Broker nodes. The first replica of a partition is called the leader replica (Leader Replica), and the other replicas are called follower replicas (Follower Replica).

Leader replica : The leader replica of each partition is responsible for handling all read and write requests for that partition. Producers send messages to the leader replica, and consumers read messages from the leader replica. The leader replica is also responsible for replicating data to the follower replicas.

Follower Replica: A follower replica is a copy of the leader replica. Followers synchronize data from the leader; they do not handle client read or write requests and exist only to provide redundancy and failover. (It is worth noting that many distributed systems, when designing similar replica concepts, at least let replicas serve reads to relieve pressure on the master node; by comparison, Kafka's replica mechanism is stricter.)

Data replication : The leader replica writes the message to its local log (Log), and then copies the message to the follower replica's log through the replication mechanism. Replication can use two modes: synchronous replication and asynchronous replication.

  • Synchronous replication: The leader replica waits for all follower replicas to confirm that the message has been successfully replicated before considering the message as committed. This mode provides the strongest data guarantees, but has some impact on write latency.

  • Asynchronous replication: The leader replica returns a success response immediately after writing the message to the local log without waiting for acknowledgment from the follower replica. This mode provides lower write latency, but data loss may occur under certain circumstances.
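On the producer side, the acks setting is the user-facing knob for this trade-off between durability and latency. A configuration sketch — the broker address is assumed:

```java
import java.util.Properties;

public class AckModes {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // acks=0  : fire-and-forget, no broker acknowledgment (lowest latency, weakest guarantee)
        // acks=1  : the leader replica acknowledges after writing to its own log
        // acks=all: the leader waits for all in-sync replicas (ISR) before acknowledging
        props.put("acks", "all");
        // (On the broker/topic side, min.insync.replicas additionally controls how many
        //  ISR members must be present for acks=all writes to succeed.)
        System.out.println(props);
    }
}
```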

Replica synchronization : In order to maintain the consistency of the follower copy and the leader copy, Kafka uses a log-based replication mechanism. Follower replicas copy log entries from the leader replica and append them to their own logs in order. The replication process uses an efficient incremental pull method, which only pulls log segments that have not yet been copied.

Failover: When the leader replica fails, Kafka automatically elects a follower replica as the new leader. All replicas of a partition are collectively called the AR (Assigned Replicas). During election, Kafka uses the partition's ISR (In-Sync Replicas), the set of follower replicas that are keeping up with the leader. Lagging followers are placed in the OSR (Out-of-Sync Replicas) list, and newly added followers also start out in the OSR. AR = ISR + OSR. Only follower replicas in the ISR are eligible to become the new leader.

  • If the leader replica fails, Kafka will select a follower replica from the ISR to become the new leader. Note that if all ISRs are down, the first responding replica from AR will be selected as the leader, thus causing message loss or duplication.

  • If a follower replica fails, Kafka will remove it from the ISR and continue to synchronize with other follower replicas.

To briefly summarize: Kafka's replica mechanism provides data redundancy and failure recovery. Each partition can have multiple replicas spread across different nodes. Replicas are divided into leaders and followers: the leader handles all actual reads and writes and synchronizes data to the followers, in either synchronous or asynchronous mode, while followers exist only for redundancy and failover. All replicas together are the AR; the set of followers in sync with the leader is the ISR, and the rest form the OSR. If the leader fails, a replica from the ISR is chosen as the new leader; if the whole ISR is down, the first replica in the AR to respond is chosen, which may lose or duplicate messages.

Replica assignment rules

Kafka's default replica assignment spreads replicas across Brokers round-robin, and you can also specify a custom assignment. (Note: the RangeAssignor and RoundRobinAssignor often mentioned in this context are actually consumer-side partition assignment strategies.) The default placement works as follows:

  • Sort all Brokers (assuming n Brokers in total) and Partitions to be allocated

  • Assign the i-th Partition to the (i mod n)-th Broker

  • Assign the j-th Replica of the i-th Partition to the ((i + j) mod n)-th Broker
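The rules above are pure modular arithmetic. A toy sketch, with made-up broker/partition counts — not Kafka's actual code, which also randomizes the starting position:

```java
public class ReplicaPlacement {
    public static void main(String[] args) {
        int brokers = 3;    // n brokers (assumed)
        int partitions = 4; // partitions to place (assumed)
        int replicas = 2;   // replicas per partition (assumed)

        for (int i = 0; i < partitions; i++) {
            StringBuilder sb = new StringBuilder("partition " + i + " -> brokers [");
            for (int j = 0; j < replicas; j++) {
                // j-th replica of the i-th partition goes to broker (i + j) mod n;
                // j = 0 is the leader, placed on broker (i mod n)
                sb.append((i + j) % brokers).append(j < replicas - 1 ? ", " : "");
            }
            System.out.println(sb.append("]"));
        }
    }
}
```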

What are the benefits of Kafka's multi-partition and multi-replica mechanisms?

  1. Improved throughput : The multi-partition mechanism allows messages to be processed in parallel on multiple partitions, thereby improving overall throughput. Each partition can be processed concurrently on an independent consumer, thereby improving the parallelism and processing capabilities of the system.

  2. Achieve horizontal scalability : Kafka can easily scale horizontally by distributing data across multiple partitions. Each partition can be deployed on different servers to achieve load balancing and horizontal expansion to meet the needs of high throughput and large-scale data processing.

  3. Improved fault tolerance : The multi-copy mechanism allows each partition to be replicated across multiple replicas. If one replica fails, Kafka can automatically switch read and write operations to other available replicas to achieve high availability and fault tolerance. In addition, the multi-copy mechanism also provides data redundancy. Even if one copy is damaged, the data is still available.

  4. Achieve data persistence : The multi-copy mechanism ensures that data is replicated on multiple copies, thereby achieving data persistence. Even in the event that one copy becomes corrupted or malfunctions, the data can still be accessed and recovered through the other copies.

  5. Support message sequence and partitioning : The multi-partition mechanism can allocate messages to different partitions according to business needs, and each partition can maintain the order of messages. This is important for applications that need to guarantee message ordering. At the same time, partitions can also be used to group and isolate messages to better control message flow.

Do you know the role of Zookeeper in Kafka?

  1. Controller election : Zookeeper is responsible for electing the controller (Controller) of the Kafka cluster; the Controller and Coordinator roles can both be served by the same Broker at the same time. The controller manages the status and metadata of the entire Kafka cluster and allocates partitions to the Brokers for load balancing. (The Group Coordinator, by contrast, is not elected through Zookeeper; it is the Broker hosting the relevant __consumer_offsets partition.)

  2. Configuration management : Kafka's cluster configuration information (such as topics, partitions, replicas, etc.) and consumer group offsets and other metadata are stored in Zookeeper. Each Broker and consumer of Kafka obtains the latest cluster configuration information by interacting with Zookeeper.

  3. Broker registration and discovery : Kafka's Broker will register its own information with Zookeeper when it starts, including host name, port number, etc. At the same time, consumers can discover available Broker nodes through Zookeeper.

  4. Partition allocation and rebalancing : When a new Broker joins the cluster or an old Broker goes offline, Zookeeper assists the controller in redistributing and rebalancing partitions. It maintains the partition allocation plan and notifies each Broker of this information to ensure high data availability and load balancing.

  5. Replica management : Zookeeper tracks and manages replica information of Kafka partitions. It monitors the status of replicas and is responsible for reallocation and recovery of replicas in the event of replica failure.

  6. Client session management : Kafka consumers and producers maintain session connections with the cluster through Zookeeper. Zookeeper can detect client activity and handle connection failure and recovery.

In general, Zookeeper plays important roles in Kafka such as coordination, configuration management, partition allocation, copy management and client session management, ensuring the stable operation and data consistency of the Kafka cluster.

Why is Kafka so fast?

  1. Distributed architecture : Kafka adopts a distributed architecture that can distribute data and load across multiple nodes to achieve parallel processing and scalability. Messages are split into multiple partitions, and each partition can be processed on a different node in the cluster, allowing for horizontal scaling and load balancing.

  2. Zero-copy technology : Kafka uses zero-copy techniques to avoid unnecessary data copies when reading and writing messages. For transferring log data to consumers it relies on the operating system's sendfile mechanism (exposed in Java as FileChannel.transferTo), which moves data from the page cache straight to the network socket, and it memory-maps (mmap) its index files. Both avoid copying data back and forth between kernel space and user space, improving read and write performance.

  3. Batch processing : Kafka supports batch processing of messages. Producers can send multiple messages to Kafka together, and consumers can also pull multiple messages in batches for processing. Batch processing reduces network overhead and the number of system calls, improving throughput and efficiency.

  4. Efficient sequential disk writes : Kafka appends messages to disk rather than writing randomly. Sequential writes make disk I/O more efficient, reduce seek time and fragmentation, and improve disk utilization and performance. Since modern operating systems provide read-ahead and write-behind optimizations, sequential disk writes are in most cases even faster than random writes to memory.

  5. Memory-based storage and caching : Kafka uses the operating system's page cache to cache messages to improve read and write performance. Popular messages will be cached in memory, reducing the number of disk accesses and speeding up message reading. If Kafka's write rate is similar to its consumption rate, then the entire production and consumption process will not go through disk IO. All are memory operations.

  6. Efficient replication mechanism : Kafka's replication mechanism adopts streaming replication, which improves replication efficiency and throughput through asynchronous replication and batch replication. At the same time, Kafka also uses multiple replicas and ISR (In-Sync Replicas) mechanisms to ensure data reliability and high availability.

To briefly summarize: first, the multi-partition distributed structure makes full use of parallelism. Second, zero-copy and the operating system's file mapping mechanism reduce copy overhead between kernel and user space, improving read/write performance. Third, batch processing is supported on both the producer and consumer sides, cutting network overhead and the number of requests. Fourth, sequential disk writes are far more efficient than random writes. Fifth, hot messages stay in the page cache, speeding up reads and writes.
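The zero-copy path corresponds to the OS sendfile call, exposed in Java as FileChannel.transferTo, which Kafka uses to move log segments to the network without passing through user-space buffers. A minimal sketch — the file name and destination address are assumptions:

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySend {
    public static void main(String[] args) throws Exception {
        try (FileChannel log = new FileInputStream("00000000000000000000.log").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0, remaining = log.size();
            while (remaining > 0) {
                // transferTo hands the copy to the kernel (sendfile): page cache -> NIC,
                // skipping the usual read-into-user-buffer / write-back-to-kernel round trip
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```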

Share some unpopular knowledge

Why doesn't Kafka manage its own cache instead of using page cache?

  1. Everything in the JVM is an object, and storing data as objects introduces object overhead and wastes space.

  2. If the JVM manages the cache, it will be affected by GC, and an excessively large heap will also drag down the efficiency of GC and reduce throughput.

  3. If the process crashes, any cache managed inside the JVM is lost, whereas the OS page cache survives a process restart.

"Kafka Performance: Why is Kafka so 'fast'?" [1] — I read this article and thought it was very well written, so I link it here. If you want the details, follow the link and take a look.

Kafka scenario analysis

How does Kafka ensure high availability?

  1. Distributed architecture : Kafka adopts a distributed architecture to store data distributedly on multiple Broker nodes. This allows even if a Broker node fails, other normally operating Broker nodes can still continue to provide services.

  2. Multi-copy mechanism : Kafka uses a multi-copy mechanism to ensure data redundancy and availability. Each partition can be configured with multiple replicas, which are distributed on different Broker nodes. If the Broker where a replica is located fails, other replicas can still take over the service to ensure data availability.

  3. Controller : The controller in the Kafka cluster is responsible for managing the status and metadata of the entire cluster. When a Broker node fails or goes offline, the controller will detect this change and perform corresponding operations, such as redistributing the leadership of partitions and replicas to ensure high availability and consistency of data.

  4. Automatic failure recovery : Kafka has the capability of automatic failure recovery. When a Broker node fails or goes offline, the controller automatically triggers the redistribution and recovery process of replicas and allocates replicas to other available Broker nodes to ensure data redundancy and availability.

  5. Heartbeat and session expiration : Kafka maintains client connections through a heartbeat mechanism and regularly checks client liveness. If a consumer does not send heartbeats within the session timeout, the group coordinator considers its session expired and redistributes its partitions to other active consumers.

  6. Monitoring and alarming : Kafka provides a wealth of monitoring indicators and alarm mechanisms, which can monitor the status of the cluster, the health of the partition, and the progress of the consumer in real time. This helps to detect and resolve potential problems in a timely manner, improving system availability and stability.

Simply put, there are four points. First, Kafka's distributed structure: partitions are spread across different nodes, so even if some nodes go down, service continues normally. Second, the multi-replica mechanism provides high availability through data redundancy. Third, the cluster's Controller manages the state and metadata of the whole cluster; as soon as a node is detected coming online or going offline, partition and replica leadership is redistributed. Fourth, the cluster's Coordinator guarantees transactional writes on the production side and uses the rebalancing mechanism on the consumer side to ensure data is smoothly consumed by the consumer group.

How does Kafka ensure that messages are not consumed repeatedly?

  1. Consumer Offset : Kafka maintains the offset consumed by each consumer group, that is, the position of the message that has been processed. The consumer periodically commits the current offset, indicating that it has successfully consumed the message. Kafka will save this offset and restore it after restarting, ensuring that consumers can continue consuming from the last committed offset.

  2. Consumer Group Coordinator : The consumer group coordinator in Kafka is responsible for tracking and managing the offset of the consumer group. It stores each consumer's offset information in an internal or external storage system (such as Zookeeper or Kafka's own internal topic __consumer_offsets). Through the management of the coordinator, Kafka ensures that each consumer in the consumer group can correctly continue consuming from the last offset.

  3. Committing Consumer Offsets : Consumers can choose to manually submit offsets after consuming a batch of messages, or automatically submit offsets regularly through automatic submission configuration. Manually committing offsets ensures that offsets are not committed when consumption fails or error handling occurs, thereby avoiding repeated consumption. Auto-commit offsets need to be used with caution to ensure that operations handling messages are idempotent.

  4. Exactly-Once semantics : Kafka introduces transactional producer and consumer APIs, enabling applications to achieve "exactly-once" semantics. Transactional producers can write messages to Kafka transactions and only expose the messages to consumers after confirming transaction commit. The transactional consumer will ensure the atomic submission of offsets and the consistency of message processing through transaction submission after processing the message.

Simply put, Kafka mainly relies on consumer offsets to prevent repeated consumption. Internally, Kafka uses the coordinator and the internal topic __consumer_offsets to keep consumer instances from re-consuming. Developers can commit offsets manually, so that offsets are not committed when consumption fails or an exception occurs, avoiding duplicate processing; they can also make the consumer side idempotent so that reprocessing a message is harmless.
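A minimal sketch of the manual-commit pattern described above — the broker address, topic, and group are assumptions. Offsets are committed only after the batch is processed, so a crash replays rather than skips messages, which is why processing should be idempotent:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing");
        props.put("enable.auto.commit", "false"); // take over offset management
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    process(r); // should be idempotent: a crash before commit replays this batch
                }
                consumer.commitSync(); // commit only after the whole batch succeeded
            }
        }
    }

    static void process(ConsumerRecord<String, String> r) {
        System.out.println(r.value());
    }
}
```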

How does Kafka ensure the order in which messages are consumed?

Kafka guarantees message order through partitions and the offsets within them. A topic can have multiple partitions, but the data within each partition is ordered: messages sent by producers are appended sequentially to the end of a partition, and consumers read them sequentially as well.

That is to say, if a topic has multiple partitions, the order of messages is only guaranteed within the partition, and the order of messages between different partitions cannot be guaranteed by Kafka. If the application's logic requires strict global ordering, then all related messages should be sent to the same partition, or the application performs additional processing on the consumer side to achieve global ordering.
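To keep related messages ordered, send them with the same key so they all land in one partition. A sketch — the topic and keys are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedByKey {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // All three events share the key "order-1", hash to the same partition,
            // and are therefore consumed in exactly this order.
            producer.send(new ProducerRecord<>("order-events", "order-1", "created"));
            producer.send(new ProducerRecord<>("order-events", "order-1", "paid"));
            producer.send(new ProducerRecord<>("order-events", "order-1", "shipped"));
        }
    }
}
```

One caveat: with retries enabled, resends can reorder in-flight batches unless idempotence is enabled or max.in.flight.requests.per.connection is limited.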

How does Kafka ensure that messages are not lost?

  1. Persistent storage : Kafka stores messages persistently on disk, not just in memory. Messages for each topic are written to multiple partitions, and each partition can have multiple replicas. In this way, even if a Broker or disk fails, messages can still be recovered from other copies, avoiding message loss.

  2. Replication mechanism : Kafka uses a replication mechanism to provide high availability and fault tolerance. The messages of each partition can be replicated on multiple Brokers to form a replica set. One replica in the replica set is designated as the leader and is responsible for read and write operations, while the other replicas are followers. When a leader replica fails, one of the followers is elected as the new leader, ensuring message durability and availability.

  3. Write acknowledgment (Acknowledgement) : When sending a message to Kafka, the producer can choose to wait for confirmation that the message is successfully written. The producer can configure the number of replicas to wait for, and the producer will receive confirmation only after the specified number of replicas have been successfully written. This mechanism ensures that messages are written to enough replicas to reduce the risk of message loss.

  4. Replica synchronization : Kafka uses a replica synchronization mechanism to ensure that replicas of partitions remain in sync. When a message is written, the leader replica copies the message to all follower replicas and waits for their acknowledgment. The leader replica will consider the message write successful only after all follower replicas have successfully replicated and acknowledged the message. This synchronization mechanism ensures data consistency and reduces the possibility of data loss.

  5. Consumer offset commits : As consumers process messages, they periodically commit the consumed offsets to Kafka. This way, even if a consumer fails or rejoins the consumer group, it can resume from the last committed offset. Through offset commits, Kafka tracks consumer progress and ensures messages are neither consumed twice nor lost.

To summarize briefly: Kafka's multi-replica redundancy keeps multiple copies of the data on multiple nodes, and the ACK mechanism lets the producer require that some or all replicas acknowledge a message before the request completes, somewhat like a transaction. Consumers, in turn, avoid duplicated or lost messages by explicitly committing offsets.

How to deal with message backlog

  1. Improve the consumer's processing capabilities: You can increase the consumer's concurrent processing capabilities by increasing the number of consumer instances. This allows messages to be consumed faster and reduces backlog.

  2. Adjust the consumer's processing logic: If the consumer's processing logic is complex, it may result in slower processing speed. You can evaluate and optimize the consumer's code logic to improve its processing efficiency.

  3. Increase the number of partitions: Increasing the number of partitions for a topic can improve message parallelism. In this way, the messages in each partition will be evenly distributed to different consumer instances, thereby alleviating the backlog problem.

  4. Adjust the producer's sending rate: If the backlog is caused by the producer sending too fast, you can throttle the producer to slow down message generation so that consumers have enough time to catch up.

  5. Increase the resources of the Kafka cluster: If the resources of the Kafka cluster (such as disks, network bandwidth, etc.) are insufficient, message processing may slow down. You may consider increasing the resources of the cluster to increase overall processing capabilities.

To sum up briefly: in most cases consumption is the bottleneck, and consumption speed can be raised by temporarily adding partitions and consumers. For a follow-up based on my own experience, see "Is the message backlog problem hard to solve? Code optimization ideas disclosed in full" [2], which also summarizes the project's difficulties.
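Temporarily scaling out, as item 3 above suggests, can be done with the admin client. A sketch — the broker address, topic name, and target count are assumptions; note that Kafka can only increase a topic's partition count, never decrease it:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

public class ExpandPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Raise "orders" to 12 partitions in total, then add consumer
            // instances (up to 12) so the group drains the backlog faster.
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(12))).all().get();
        }
    }
}
```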

Kafka source code

Production side

NIO network communication module : How Kafka implements a set of industrial production-grade network underlying communication modules based on Java NIO

Memory buffer pool design : How the Kafka client designs a high-throughput buffering mechanism that supports millions of concurrencies

Sender sending thread : The key point is, how does the Kafka client send batches of messages to the Kafka Broker through network communication? This involves many details of network communication, some parameter settings, and handling of network failures.

Cluster metadata pull and update mechanism : cluster metadata pull components and pull timing; how metadata is cached on the client; how to support fine-grained on-demand loading and synchronous waiting for Topic metadata.

Server

Cluster architecture : How the cluster architecture of Kafka Broker is implemented; how each Broker forms a cluster after being started; how the cluster controller is elected; how the fault recovery high-availability architecture is implemented, etc.

Server-side network communication module : How Kafka's server-side network communication module is implemented; understand the Reactor design pattern and Kafka's Reactor-based network architecture supporting ultra-high concurrency; an in-depth look at the Acceptor thread, Processor threads, RequestChannel, the I/O thread pool, and the other low-level communication components, plus the source code of the entire request-handling flow.

Partitions and replicas : How multi-replica redundancy and the high-availability architecture are implemented; how leader and follower data are synchronized, how replica data is transmitted, and how their HW and LEO change; how the system stays highly available after the leader's Broker fails and goes down; how the replica manager manages replicas; how the Broker asynchronously updates its metadata cache, and so on.

Load balancing and scaling architecture : How to ensure that data is evenly distributed on the Broker machines of the cluster; how to scale the Partition of the Topic and the Broker.

Log storage architecture : How Kafka stores efficiently; how disk reading and writing is implemented; what is the storage structure of logs; how to use OS Cache, zero copy, sparse index, sequential writing and other excellent designs to support ultra-high throughput storage architecture.

Consumer side

Consumption process : how the consumer is initialized; how to communicate with the server; how the Consumer's poll() method consumes data.

Consumer group management : Consumer Group concept, state machine flow; how a Consumer joins the Group after startup; the entire process of Consumer Group management; the design principles of Consumer Group metadata management, etc.

Coordinator mechanism : How the Consumer Coordinator works; how it elects the Consumer Leader; how the Consumer Leader formulates the partition allocation plan; how the Coordinator distributes that plan; and how the Consumer sends regular heartbeats to the Coordinator, and so on.

Message rebalancing mechanism : several rebalancing scenarios; and rebalancing source code process analysis.

__consumer_offsets : topic within topic.

Subscription status and Offset operation : How the consumer subscription status saves and tracks the correspondence between Topic Partition and Offset; and understand how to obtain and submit Offset, etc.

How to design a message queue?

When designing a message queue, you can consider the following key aspects:

  1. Define requirements and purposes: Clarify the purpose of the message queue, expected throughput, latency requirements, data size limits and other requirements. Understanding the problem you are trying to solve and the goals you want to achieve will help determine the appropriate design direction.

  2. Choose the appropriate message queue middleware: Choose the appropriate message queue middleware based on your needs and usage. There are many open source and commercial message queue middlewares to choose from, such as Kafka, RabbitMQ, ActiveMQ, etc. Consider aspects such as their features, reliability, performance, scalability, and community support.

  3. Define message format and protocol: Design the message format and protocol, including the message structure, data fields, and possible message headers and metadata. This helps with message consistency and scalability, and provides consumers with useful information.

  4. Consider message persistence and reliability: Determine whether messages need to be persisted to disk to prevent message loss. Some message queue middleware provides persistence options to ensure that messages are not lost in the event of a failure.

  5. Consider the order of messages: If the order of messages is important to your application, you need to choose a message queue middleware that supports ordered message delivery, and design appropriate message partitioning and order guarantee mechanisms according to your needs.

  6. Consider message partitioning and scaling: If you anticipate a large number of messages and high throughput, you need to consider how to partition and scale messages. This improves parallelism and scalability of the system, ensuring messages are processed efficiently.

  7. Implement appropriate security mechanisms: Consider message security, including authentication, access control, data encryption, etc. Choose appropriate security mechanisms based on your needs to protect the confidentiality and integrity of messages.

  8. Monitoring and management: Design monitoring and management mechanisms to enable real-time monitoring of message queue performance, throughput, latency and other indicators. This helps identify and troubleshoot problems promptly.

Designing a message queue requires comprehensive consideration of multiple factors and making decisions based on actual needs and application scenarios. These guidelines can help you start designing a reliable, performant, and scalable message queuing system.

A quick check: my last article went out on June 27th, and this one was written around July 21st-23rd. After tinkering with it for a long time, I am finally posting it on a Monday, almost a month later. That update frequency is admittedly a bit slow, and I will keep investing time going forward. It has been almost a month, haha, and a lot has happened: apart from the last, ordinary week, the other two weeks were full of excitement and surprises for me. New goals and motivation have been switched on, and I will balance work, life, and study from here. This world is very interesting, I'm serious.

To be honest, I have been lazy these past few weeks and haven't had enough time to study; I focused more on work and life, and last week's overtime was very annoying. As for life, maybe my mentality has shifted again. I don't know why, but recently I have read a lot of inspirational words and done things I never dared to do before. Very bold, I'm serious. The results have been pretty good; in any case I'm very happy, haha, it's a wonderful feeling, and my mentality feels more positive. Whatever the outcome, I will do better without going against my heart. Come on! I restate my life creed: may this painful, oppressive world bloom with the flowers of happiness, and I offer my blessings to this beautiful world!!

References

[1] Kafka Performance: Why is Kafka so "fast"? https://mp.weixin.qq.com/s/kMIhPW2uLdy-mgS9sF6agw

[2] Is the message backlog problem hard to solve? Code optimization ideas disclosed in full. https://juejin.cn/post/7209657931124555834
