Kafka data consistency and failover

The CAP theorem.

Consistency: every read sees the most recent write. Availability: every request receives a response. Partition tolerance: the system keeps operating despite network partitions.


CA: MySQL, Oracle (give up partition tolerance)

CP: HBase, Redis, MongoDB (give up availability)

AP: Cassandra, SimpleDB (give up strong consistency in favor of weak or eventual consistency)


Consistency schemes

Master-slave replication (e.g. Hadoop)

WNR (quorum) replication: with N replicas, a write must reach W of them and a read queries R of them; if W + R > N, every read set overlaps the latest write. After a quorum read, the client must still determine which copy is the newest. The common practice is a version number or a timestamp.
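The quorum read can be sketched in a few lines. This is an illustrative model, not a real replication API; the function name and record format are made up for the example.

```python
# Minimal sketch of a W+R>N quorum read. With N replicas, writing to W and
# reading from R where W + R > N guarantees the read set overlaps the latest
# write; the client then picks the copy with the highest version number.

def quorum_read(replicas, r):
    """Read from r replicas and return the record with the newest version."""
    responses = replicas[:r]  # in practice: the first r replicas to respond
    return max(responses, key=lambda rec: rec["version"])

# N = 3 replicas; the last write (version 2) reached only W = 2 of them.
replicas = [
    {"version": 2, "value": "new"},
    {"version": 2, "value": "new"},
    {"version": 1, "value": "old"},  # stale replica missed the latest write
]

# R = 2 and W = 2, so W + R = 4 > N = 3: any 2 replicas include a fresh copy.
latest = quorum_read(replicas, r=2)
print(latest["value"])  # -> new
```

Even reading the two replicas that include the stale one would return version 2, because every R-sized read set must intersect the W-sized write set.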


In Kafka, data is read from the leader; the followers exist so that availability is preserved if the leader goes down. Followers pull data from the leader, much like a consumer does.

 

Kafka uses neither a purely synchronous nor a purely asynchronous replication mechanism; it uses the ISR (in-sync replicas) mechanism. Once data is committed, Kafka guarantees that all in-sync replicas have it.

If a follower falls too far behind the leader, it is removed from the ISR. "Too far behind" is measured by time or by number of messages:

This is a highlight of Kafka's design compared with other systems: it adopts neither synchronous nor asynchronous replication, but a dynamically controlled middle ground.
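The eviction rule above can be sketched as a small check the leader might run per follower. This is a simplified model; the thresholds mirror the ideas behind `replica.lag.time.max.ms` and (in older Kafka versions) `replica.lag.max.messages`, but the numbers and function names here are invented for illustration.

```python
# Illustrative sketch of how a leader could evict laggy followers from the ISR.
# Thresholds and names are made up; real Kafka uses broker configs such as
# replica.lag.time.max.ms (and replica.lag.max.messages in old versions).

MAX_LAG_MS = 10_000       # follower hasn't caught up within this time window
MAX_LAG_MESSAGES = 4_000  # follower is this many messages behind the leader

def in_sync(leader_end_offset, follower_offset, last_caught_up_ms, now_ms):
    """A follower stays in the ISR only if it is close in both time and offset."""
    behind = leader_end_offset - follower_offset
    stale_ms = now_ms - last_caught_up_ms
    return behind <= MAX_LAG_MESSAGES and stale_ms <= MAX_LAG_MS

now = 100_000
print(in_sync(10_000, 9_000, last_caught_up_ms=95_000, now_ms=now))  # True
print(in_sync(10_000, 1_000, last_caught_up_ms=95_000, now_ms=now))  # False: too far behind in messages
print(in_sync(10_000, 9_999, last_caught_up_ms=50_000, now_ms=now))  # False: stale for too long
```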

min.insync.replicas sets the minimum number of in-sync replicas required for a write to be accepted; 2 is usually a safer choice.

request.required.acks (acks in newer clients) controls when the producer considers a send successful:

0: the producer does not wait for any acknowledgment from the broker before sending the next (batch of) messages. This option gives the lowest latency but the weakest durability guarantee: if the leader dies, any data the broker never received is lost, and the producer does not know it.

1: the producer sends the next message after the leader has successfully received and acknowledged the data. This option gives better durability, since the client waits for the server to confirm the request; only messages written to a leader that died before replicating them can be lost.

-1 (all): the producer does not count a send as successful until the in-sync follower replicas confirm that they have received the data. This option gives the best durability: no acknowledged message is lost as long as at least one in-sync replica remains alive.
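The difference between these levels shows up exactly when the leader dies before replicating. The toy model below is not the real Kafka client; class and method names are invented to illustrate the guarantee each level makes.

```python
# Toy model of the three ack levels: what each durability guarantee means
# when the leader dies before replication. Not real Kafka code.

class Partition:
    def __init__(self):
        self.leader = []            # messages in the leader's log
        self.followers = [[], []]   # in-sync follower logs

    def produce(self, msg, acks):
        self.leader.append(msg)
        if acks == 0:
            return True             # "acked" without any broker confirmation
        if acks == 1:
            return True             # acked once the leader has the message
        # acks == -1: replicate to all in-sync followers before acking
        for f in self.followers:
            f.append(msg)
        return True

    def leader_dies(self):
        """A follower takes over; only replicated messages survive."""
        self.leader = list(self.followers[0])

p = Partition()
p.produce("M1", acks=-1)  # replicated, survives failover
p.produce("M2", acks=1)   # acked by the leader only, never replicated
p.leader_dies()
print(p.leader)  # -> ['M1']  (M2 was acknowledged to the producer but is lost)
```

With acks=-1 the producer would not have been told M2 succeeded until a follower had it, which is why that level, combined with min.insync.replicas, gives the strongest guarantee.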


Kafka consumers can only read committed data. In the original figure (not reproduced here), message M3 is lost between steps 3 and 4: the leader went down before M3 was ever committed, so M3 is gone unless the producer's retry succeeds. If the retry does succeed, M3 is appended after M5, so the order changes. In other words, Kafka's ordering is commit order, not send order, and if retries are not handled properly you can see both data loss and reordering. Once the downed node recovers, it must fetch all the data it missed until it is within the ISR threshold (e.g. 4K messages behind), at which point it is added back to the ISR list.
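The retry reordering described above can be sketched as follows. This is a simulation of the effect, not Kafka client code; the function name and failure set are illustrative.

```python
# Sketch of how a producer retry reorders messages: the send order is
# M1..M5, but if M3's first attempt fails (e.g. the leader died) and the
# retry lands after M5, the committed order differs from the send order.

def send_all(messages, fails_first_try):
    log, retry_queue = [], []
    for m in messages:
        if m in fails_first_try:
            retry_queue.append(m)   # first attempt lost; will be retried
        else:
            log.append(m)
    log.extend(retry_queue)         # retries are appended after later messages
    return log

committed = send_all(["M1", "M2", "M3", "M4", "M5"], fails_first_try={"M3"})
print(committed)  # -> ['M1', 'M2', 'M4', 'M5', 'M3']
```

Newer Kafka clients mitigate this with the idempotent producer (enable.idempotence) or by limiting in-flight requests, at some cost in throughput.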

There is an option controlling recovery when all replicas for a partition fail: either wait for a replica from the ISR to come back (the safe choice), or allow the first replica to recover, even one outside the ISR, to become leader (unclean leader election, the default in older versions, at the cost of possible data loss).

The replication factor cannot exceed the number of brokers.

By default, Kafka distributes replicas and leaders as evenly as possible across brokers. Since all reads and writes go through the leader, this keeps the load as uniform as possible.

 
