Some knowledge about Kafka

  1. Broker: Each server in a Kafka cluster is called a Broker. Each Broker is an independent Kafka server instance, responsible for storing and forwarding messages. In a production environment, a Kafka cluster usually consists of multiple Brokers.

  2. Topic: Topic is a classification of messages, and each Topic has a unique name. Producer sends messages to Topic, and Consumer reads messages from Topic.

  3. Partition: Each Topic can be divided into multiple Partitions. A Partition is a physical concept and is the basic unit for distributing messages across Brokers. Each Partition's leader resides on exactly one Broker, while a single Broker can host many Partitions.

  4. Offset: An Offset is a monotonically increasing sequence number that uniquely identifies each message within a Partition. Consumers use Offsets to track which messages they have already consumed.

  5. Producer: Producer is a client application that sends messages to the Kafka cluster.

  6. Consumer: A Consumer is a client application that reads messages from a Kafka cluster.

  7. Consumer Group: A Consumer Group contains multiple Consumer instances, which jointly consume one or more Topics. A Partition can only be consumed by one Consumer instance in the same Consumer Group. If the number of Consumer instances in the Consumer Group is greater than the number of Partitions, some Consumer instances will remain idle.

  8. ZooKeeper: ZooKeeper is the coordinator of the Kafka cluster. It manages the state of Brokers and Consumers: Kafka relies on ZooKeeper to store metadata, perform Leader election, and register Brokers and Consumers. (Note that newer Kafka versions can run without ZooKeeper using KRaft mode, but the classic architecture described here depends on it.)
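Item 7 above is worth seeing concretely. The sketch below is a toy round-robin assignment, not Kafka's actual assignor, but it illustrates the key rule: within one Consumer Group, each Partition is owned by exactly one Consumer, and Consumers beyond the Partition count sit idle.

```python
def assign_partitions(partitions, consumers):
    """Map each partition to exactly one consumer in the group (round-robin)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# A topic with 3 partitions, a group with 4 consumers: one consumer idles.
result = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(result)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

Note that each partition appears in exactly one consumer's list, which is why adding more consumers than partitions to a group buys no extra parallelism.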

Will the previous data in kafka be lost after kafka restarts?

If the Kafka cluster was shut down cleanly and all data was successfully written to its Topics before shutdown, there should be no data loss after restarting: Kafka persists messages to disk, so a restart does not by itself discard them.

Kafka has a distributed architecture, and data is usually replicated to multiple Brokers to ensure reliability and high availability. After a restart, the Kafka cluster automatically restores each topic's partition replicas and data according to its configuration, and ensures that all messages are correctly replicated across Brokers.
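The replication behavior described above can be sketched as follows. This is a toy model, not Kafka's real replication protocol: every record appended to a partition's leader is also copied to follower Brokers (roughly the effect of acks=all), so the log survives the loss of any single Broker.

```python
class Broker:
    def __init__(self, name):
        self.name = name
        self.log = []  # this broker's copy of the partition log


def append(record, leader, followers):
    """Write to the leader, then replicate to every follower before returning."""
    leader.log.append(record)
    for f in followers:
        f.log.append(record)


b1, b2, b3 = Broker("b1"), Broker("b2"), Broker("b3")
for msg in ["m0", "m1", "m2"]:
    append(msg, b1, [b2, b3])

# Even if the leader b1 is lost, a follower still holds the complete log:
assert b2.log == ["m0", "m1", "m2"]
```

This is why a restart (or even a single-Broker failure) does not lose acknowledged data: some replica still holds the full partition log and can take over as leader.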

However, if some data was not successfully written to the Topic before Kafka shut down, for example messages still buffered in a producer or not yet acknowledged, or if other failures occur during the restart, data loss is possible. To avoid this, ensure that all data has been flushed and acknowledged before shutting down the Kafka cluster, and back up the Kafka data before restarting so that it can be restored if necessary.
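The "not successfully written" case above usually means records still sitting in a producer's in-memory batch. The sketch below uses an illustrative class, not a real client API, to show why flushing before shutdown matters: only records that reached the log before the crash survive.

```python
class BufferingProducer:
    """Toy model of a batching producer; names are illustrative, not a real API."""

    def __init__(self):
        self.buffer = []   # records batched in memory, not yet durable
        self.durable = []  # records acknowledged as written to the log

    def send(self, record):
        self.buffer.append(record)

    def flush(self):
        """Force buffered records to the log and wait for acknowledgment."""
        self.durable.extend(self.buffer)
        self.buffer.clear()


p = BufferingProducer()
p.send("m0")
p.flush()      # "m0" is now durable
p.send("m1")   # "m1" still sits in the in-memory batch
# A crash at this point loses whatever was still buffered:
survived = list(p.durable)
assert survived == ["m0"]  # "m1" never reached the log
```

Real Kafka producer clients expose an analogous flush operation for exactly this reason: call it (or close the producer cleanly) before shutting down.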

How Kafka loads data when it restarts

After a restart, Kafka automatically restores each topic's partition replicas and data according to its configuration. Specifically, Kafka reloads previously saved message data from the message logs on disk, and reconstructs in-memory metadata such as partition replica state and consumer offsets.

Each Topic in Kafka corresponds to one or more partitions, and each partition's message data is persisted on disk. When Kafka restarts, it restores the state and message data of each partition by checking replica state and committed consumer offsets, and redistributes partition leadership among the replicas. Once replica recovery completes, consumers resume reading from their last committed offsets, so consumption continues from where it left off.
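The recovery described above can be sketched with a toy model (this is not Kafka's real storage format): a partition is an append-only file on disk, and on restart the broker simply re-reads the file to rebuild its in-memory view, with the next offset being the record count.

```python
import os
import tempfile


def append_record(log_path, record):
    """Append one record to the on-disk partition log."""
    with open(log_path, "a") as f:
        f.write(record + "\n")


def recover(log_path):
    """Re-read the on-disk log after a restart to rebuild in-memory state."""
    if not os.path.exists(log_path):
        return []
    with open(log_path) as f:
        return [line.rstrip("\n") for line in f]


log_path = os.path.join(tempfile.mkdtemp(), "topic-0.log")
for msg in ["m0", "m1", "m2"]:
    append_record(log_path, msg)

# Simulated restart: rebuild state purely from the file on disk.
records = recover(log_path)
next_offset = len(records)
assert records == ["m0", "m1", "m2"] and next_offset == 3
```

Because everything needed for recovery lives in the log files themselves, nothing written and acknowledged before the restart is lost; the broker just has to re-scan them, which is why recovery time grows with log and partition count.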

Note that the recovery process may take some time, depending on factors such as cluster size, partition count, and disk speed. During this window, producers and consumers may be temporarily unable to access the Kafka cluster until it is fully restored and active again. Therefore, back up all data before restarting Kafka so it can be restored if necessary, and thoroughly test the cluster's recovery process to confirm it recovers quickly and reliably.


Origin blog.csdn.net/heihei_100/article/details/130222849