Kafka (distributed streaming system)

How Kafka guarantees sequential consumption

Sending end:
The producer cannot send asynchronously. With asynchronous sends, several batches are in flight at once, and a batch whose first attempt fails gets retried after batches that were sent later have already landed, so message order cannot be guaranteed.
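A minimal simulation (not the Kafka client API; all names here are illustrative) of why pipelined asynchronous sends break ordering: a batch whose first attempt fails is retried after later batches have already been appended to the log.

```python
def broker_log_after_sends(messages, fails_first_attempt):
    """Simulate a broker log when sends are pipelined: batches whose
    first attempt fails are retried after later batches already landed."""
    log = []
    retries = []
    for msg in messages:
        if msg in fails_first_attempt:
            retries.append(msg)   # first attempt failed; queued for retry
        else:
            log.append(msg)       # appended in arrival order
    log.extend(retries)           # retried batches land out of order
    return log

# m1's first attempt fails; its retry arrives after m2 and m3.
print(broker_log_after_sends(["m1", "m2", "m3"], {"m1"}))  # ['m2', 'm3', 'm1']
```

In real Kafka this effect is avoided by waiting for each acknowledgment before sending the next message, or by limiting the number of in-flight requests per connection to 1.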

Storage side:
(1) Messages cannot be spread across partitions; the topic must have a single queue. In Kafka this queue is called a partition; in RocketMQ it is called a queue. With multiple queues, messages for the same topic are distributed across partitions, and global order cannot be guaranteed.

(2) Even with only one queue, there is a second problem: if the machine hosting it goes down, can we switch to another machine? This is the high-availability problem.

For example, suppose the current machine crashes while it still holds messages that have not been consumed. Failing over to another machine at this point preserves availability, but the order of the messages is broken.

Guaranteeing order here requires two things: on the one hand, replication must be synchronous, not asynchronous; on the other hand, before failing over, every message on the failed machine must already have been consumed, with nothing left behind. Obviously, this is hard to achieve.
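In practice, constraint (1) is often relaxed: many applications only need order per business key, and a keyed partitioner keeps every message with the same key in the same partition, where order is preserved. A sketch of such a partitioner, using CRC32 for illustration (Kafka's actual default partitioner hashes keys with murmur2):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Keyed partitioner sketch: hash the key, mod the partition count,
    so every message with the same key lands in the same partition."""
    return zlib.crc32(key) % num_partitions

# All messages for the same order id go to one partition,
# so their relative order is preserved.
p = partition_for(b"order-42", 6)
assert p == partition_for(b"order-42", 6)
```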

The receiving end:
The receiving end cannot consume in parallel; that is, multiple threads or multiple clients must not consume the same queue, or processing can complete out of order.
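A deterministic simulation (illustrative names, not a real consumer) of why parallel consumption of one queue breaks order: results are finished in order of per-message processing time, not arrival order.

```python
def completion_order(messages, processing_seconds):
    """If each message is handed to its own worker at the same time,
    results complete in order of processing time, not arrival order."""
    finished = sorted(zip(messages, processing_seconds), key=lambda pair: pair[1])
    return [msg for msg, _ in finished]

# m1 is slow, so it finishes last even though it arrived first.
print(completion_order(["m1", "m2", "m3"], [3.0, 1.0, 2.0]))  # ['m2', 'm3', 'm1']
```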

How to ensure there is no duplication and no loss

Kafka's delivery guarantees: avoiding lost production and duplicate consumption
1) In synchronous mode there are three acknowledgment settings (acks = 0, 1, or -1) governing how safely a message is produced. If it is configured as 1 (only the leader must write successfully), and the leader fails right after acknowledging but before the followers replicate the record, the data is lost.

2) There is another case where messages can be lost. In asynchronous mode, when the client buffer fills up, if the configuration is set to 0 (when the buffer pool is full and no acknowledgment has been received, empty the buffer pool), the buffered data is discarded immediately.
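A small simulation of the acks=1 loss window described in 1): the leader acknowledges a write, then crashes before the follower replicates it, so after failover the acknowledged record is gone. This is an illustration, not the broker implementation.

```python
def acks1_failover(leader_log, follower_log, record):
    """acks=1: the leader appends and acks immediately; replication is
    asynchronous. If the leader dies before the follower copies the
    record, the follower becomes leader without it."""
    leader_log = leader_log + [record]
    acked_to_producer = True         # producer believes the write is safe
    # ... leader crashes here, before replication happens ...
    surviving_log = follower_log     # follower is elected as the new leader
    return acked_to_producer, surviving_log

acked, log = acks1_failover(["a", "b"], ["a", "b"], "c")
print(acked, "c" in log)  # True False: acknowledged, yet lost
```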

Ways to avoid data loss during production:
As long as the two situations above are avoided, messages will not be lost.
1) In synchronous mode, set the acknowledgment mechanism to -1 (acks=all), meaning a message is acknowledged only after it has been written to the leader and to all in-sync replicas.

2) In asynchronous mode, when messages have been sent but not yet acknowledged and the buffer pool is full, configure the client not to limit the blocking timeout: let the producer block indefinitely rather than discard data.

During consumption, the way to avoid data loss is: if Storm is used, enable Storm's ack/fail mechanism; if Storm is not used, update the offset only after confirming that a record has been fully processed. The low-level API requires manual control of the offset value.
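The consumer-side advice above (commit the offset only after processing) can be sketched as a simulation with illustrative names: committing before processing loses a record on crash, while committing after processing leads to redelivery (at-least-once) instead of loss.

```python
def consume_with_crash(records, commit_before_processing, crash_index):
    """Simulate a consumer that crashes at crash_index; return what was
    processed and the offset a restarted consumer would resume from."""
    committed = 0
    processed = []
    for i, rec in enumerate(records):
        if commit_before_processing:
            committed = i + 1        # offset advanced before the work is done
        if i == crash_index:
            return processed, committed   # crash: restart resumes at `committed`
        processed.append(rec)
        if not commit_before_processing:
            committed = i + 1        # commit only after successful processing
    return processed, committed

# Commit-first: record 0 is never processed, yet a restart skips it (loss).
print(consume_with_crash(["r0", "r1"], True, 0))   # ([], 1)
# Commit-after: record 0 will be redelivered and processed again (no loss).
print(consume_with_crash(["r0", "r1"], False, 0))  # ([], 0)
```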

Origin www.cnblogs.com/yyml181231/p/12693480.html