How do you ensure high availability for a message queue?

Copyright: please credit the source when reproducing this article. If anything here infringes on your rights, send me a private message and I will delete it immediately. https://blog.csdn.net/baidu_26954625/article/details/90644203

This series is reproduced from the Git project advancejava.

Interview question analysis

A question phrased this way is a good one, because the interviewer is not asking specifically how Kafka ensures high availability, or how ActiveMQ does. An interviewer who asks that is not operating at a very high level: the candidate may have used RabbitMQ and never touched Kafka, so why open with a Kafka question? That is just making things difficult for no reason.
A more thoughtful interviewer asks how high availability is guaranteed for MQ in general. Whichever MQ you have used, explain your understanding of how that MQ achieves high availability.

RabbitMQ High Availability

RabbitMQ is a representative case because its high availability is built on a master-slave (non-distributed) design, so we use RabbitMQ as the first example of how an MQ achieves high availability.
RabbitMQ has three modes: stand-alone mode, ordinary cluster mode, and mirrored cluster mode.

Stand-alone mode

Stand-alone mode is demo level. You start it locally to play around; nobody runs stand-alone mode in production.

Ordinary cluster mode (no high availability)

In ordinary cluster mode you start multiple RabbitMQ instances on multiple machines, one per machine. A queue you create lives on only one RabbitMQ instance, but every instance synchronizes the queue's metadata (think of metadata as the queue's configuration information; with it, any instance can find the instance that holds the queue). When you consume, if you are connected to a different instance, that instance pulls the data over from the instance where the queue lives.
This approach is troublesome and not very good. It is not distributed in any real sense; it is just an ordinary cluster. Consumers must either connect to a random instance each time and have it pull the data, or always connect to the instance that holds the queue. The former incurs data-pulling overhead; the latter creates a single-instance performance bottleneck.
And if the instance that holds the queue goes down, the other instances can no longer pull from it. If you have enabled message persistence so that RabbitMQ stores messages on disk, the messages are not necessarily lost, but you have to wait for that instance to recover before you can continue pulling data from the queue.
So this mode is rather awkward. It offers no high availability at all; its main purpose is to improve throughput by letting multiple nodes in the cluster serve read and write operations for a queue.
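The behavior described above can be sketched with a toy in-memory model. This is only an illustration, not RabbitMQ code: the `Node` and `OrdinaryCluster` classes, the node names, and the "hops" counter are all assumptions made up for the example.

```python
class Node:
    """One RabbitMQ instance in an ordinary (non-mirrored) cluster (toy model)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.metadata = {}  # queue name -> owner node name; copied to every node
        self.messages = {}  # queue name -> messages; present only on the owner


class OrdinaryCluster:
    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def declare_queue(self, queue, owner):
        # The messages live on one node only; the metadata goes everywhere.
        self.nodes[owner].messages[queue] = []
        for node in self.nodes.values():
            node.metadata[queue] = owner

    def _owner(self, connected_to, queue):
        # Any node can locate the owner through its local copy of the metadata.
        owner = self.nodes[self.nodes[connected_to].metadata[queue]]
        if not owner.up:
            raise ConnectionError(f"{queue!r} lives on {owner.name}, which is down")
        return owner

    def publish(self, connected_to, queue, message):
        self._owner(connected_to, queue).messages[queue].append(message)

    def consume(self, connected_to, queue):
        owner = self._owner(connected_to, queue)
        hops = 0 if owner.name == connected_to else 1  # 1 = pulled across nodes
        return owner.messages[queue].pop(0), hops


cluster = OrdinaryCluster(["rabbit1", "rabbit2", "rabbit3"])
cluster.declare_queue("orders", owner="rabbit1")
cluster.publish("rabbit2", "orders", "order-1")
cluster.publish("rabbit2", "orders", "order-2")
first = cluster.consume("rabbit3", "orders")  # served via a cross-node pull

cluster.nodes["rabbit1"].up = False  # the owner goes down
try:
    cluster.consume("rabbit3", "orders")
    outage = None
except ConnectionError as exc:
    outage = str(exc)
print(first, outage)
```

Once the owner is marked as down, the remaining message cannot be reached from any node, which is the lack of high availability this mode suffers from.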

Mirrored cluster mode (high availability)

This is RabbitMQ's so-called high-availability mode. Unlike ordinary cluster mode, in mirrored cluster mode both the metadata and the messages of a queue you create exist on multiple instances. In other words, every RabbitMQ node holds a complete mirror of the queue, containing all of the queue's data. Each time you write a message to the queue, the message is automatically synchronized to the queue on multiple instances.
So how do you enable mirrored cluster mode? It is actually simple. RabbitMQ has a good management console, and there you add a new policy: a mirrored cluster mode policy. When you define it you can require that data be synchronized to all nodes, or specify the number of nodes to synchronize to. When you then create a queue and this policy applies to it, data is automatically synchronized to the other nodes.
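The same policy can also be set from the command line instead of the console. The sketch below builds the policy definition for classic mirrored queues and renders the matching `rabbitmqctl set_policy` call. The policy names and the `^` pattern are placeholders, and the helper functions are hypothetical. Newer RabbitMQ releases have deprecated classic mirroring in favor of quorum queues, so check the documentation for your version before relying on these keys.

```python
import json
import shlex


def mirror_policy(mode="all", count=None, auto_sync=True):
    """Build a classic mirrored-queue policy definition (hypothetical helper)."""
    if mode not in ("all", "exactly"):
        raise ValueError("this sketch only covers ha-mode 'all' and 'exactly'")
    definition = {"ha-mode": mode}
    if mode == "exactly":
        if count is None:
            raise ValueError("ha-mode 'exactly' needs a node count")
        definition["ha-params"] = count  # number of nodes that hold the queue
    if auto_sync:
        definition["ha-sync-mode"] = "automatic"
    return definition


def set_policy_command(name, pattern, definition):
    """Render the rabbitmqctl call that would apply the policy (not executed)."""
    args = ["rabbitmqctl", "set_policy", name, pattern, json.dumps(definition)]
    return " ".join(shlex.quote(arg) for arg in args)


all_nodes = mirror_policy("all")               # mirror to every node
two_nodes = mirror_policy("exactly", count=2)  # mirror to exactly two nodes

print(set_policy_command("ha-all", "^", all_nodes))
print(set_policy_command("ha-two", "^", two_nodes))
```

The pattern is a regular expression matched against queue names, so `^` applies the policy to every queue. A narrower pattern limits mirroring to the queues that really need it.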
The benefit is that if any one machine goes down, it doesn't matter: the other machines (nodes) still hold the queue's complete data, and consumers can go to another node to consume. The downsides are twofold. First, the performance overhead is high: every message must be synchronized to all machines, which puts heavy pressure on network bandwidth. Second, this is not distributed, so there is no scalability to speak of. If a queue is heavily loaded and you add a machine, the new machine also holds all of that queue's data, so you cannot scale the queue linearly. And what do you do if the queue's data volume is so large that it exceeds the capacity of a single machine?
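A small in-memory model makes both the benefit and the costs concrete. It is a sketch under simplifying assumptions: the `MirroredQueue` class is invented for illustration, every live node is treated as a mirror, and a real broker's master and mirror roles are ignored.

```python
class MirroredQueue:
    """Toy model of one queue mirrored onto every node of a cluster."""

    def __init__(self, node_names):
        self.copies = {name: [] for name in node_names}  # full copy per node
        self.down = set()

    def live_nodes(self):
        return [name for name in self.copies if name not in self.down]

    def publish(self, message):
        live = self.live_nodes()
        for name in live:
            self.copies[name].append(message)
        return len(live)  # one write per mirror: the replication cost

    def consume(self):
        live = self.live_nodes()
        if not live:
            raise ConnectionError("no live mirror left")
        message = None
        for name in live:  # the message is removed from every live mirror
            message = self.copies[name].pop(0)
        return message

    def add_node(self, name):
        # A new mirror copies the whole queue, so it adds no extra capacity.
        self.copies[name] = list(self.copies[self.live_nodes()[0]])


queue = MirroredQueue(["rabbit1", "rabbit2", "rabbit3"])
writes = sum(queue.publish(f"m{i}") for i in range(4))  # 4 messages, 3 mirrors

queue.down.add("rabbit1")   # one machine fails
survivor = queue.consume()  # still served by the remaining mirrors

queue.add_node("rabbit4")   # scaling out just creates another full copy
per_node = {name: len(queue.copies[name]) for name in queue.live_nodes()}
print(writes, survivor, per_node)
```

Four published messages cost twelve writes across three mirrors, and the new node ends up holding the same backlog as every other node rather than a share of it.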

Kafka's High Availability

A basic picture of the Kafka architecture: a cluster consists of multiple brokers, each broker being one node. When you create a topic, it can be divided into multiple partitions, each partition can live on a different broker, and each partition holds part of the data.
This makes Kafka a natively distributed message queue: the data of one topic is spread across multiple machines, each holding a portion.
RabbitMQ and similar systems are not distributed message queues. They are traditional message queues that offer some clustering and HA (High Availability) mechanisms, because however you set things up, the data of a RabbitMQ queue sits on a single node, and in mirrored cluster mode every node holds the queue's complete data.
Before Kafka 0.8 there was no HA mechanism. If any broker went down, the partitions on that broker were unusable: they could be neither written nor read, so there was no high availability to speak of.
For example, suppose you create a topic with 3 partitions, one on each of three machines. If the second machine goes down, one third of this topic's data becomes unavailable, so this setup cannot be highly available.
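The three-partition example can be sketched as follows. The broker names and keys are made up, and `zlib.crc32` is only a deterministic stand-in for the hash a real Kafka producer applies to message keys.

```python
import zlib

NUM_PARTITIONS = 3

# Without replication, each partition exists on exactly one broker.
partition_to_broker = {0: "broker-1", 1: "broker-2", 2: "broker-3"}


def partition_for(key):
    """Map a message key to a partition (stand-in hash, not Kafka's own)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def split_by_availability(keys, down_brokers):
    """Separate keys whose partition is still reachable from those that are not."""
    reachable, unreachable = [], []
    for key in keys:
        broker = partition_to_broker[partition_for(key)]
        (unreachable if broker in down_brokers else reachable).append(key)
    return reachable, unreachable


keys = [f"user-{i}" for i in range(9)]
reachable, unreachable = split_by_availability(keys, down_brokers={"broker-2"})
print("still served:", reachable)
print("unavailable: ", unreachable)
```

Every key that hashes to the partition on the failed broker can be neither written nor read until that broker comes back, which is the gap the replica mechanism closes.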
Since Kafka 0.8, an HA mechanism is provided: the replica mechanism. The data of each partition is synchronized to other machines, forming multiple replicas. All replicas of a partition elect a leader; producers and consumers deal only with the leader, and the other replicas are followers. On writes, the leader is responsible for getting the data synchronized to all followers; on reads, data is read directly from the leader. Why read and write only through the leader? Simple: if you could freely read and write any follower, you would have to deal with data consistency, and the system would become too complex and error-prone. Kafka distributes the replicas of a partition evenly across different machines to improve fault tolerance.
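One way to picture "evenly distributed" is a round-robin placement. This is a simplified sketch and not Kafka's actual assignment algorithm, which also considers things like rack placement; the function name and broker names are assumptions.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Spread each partition's replicas over distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("cannot place more replicas than there are brokers")
    layout = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + offset) % len(brokers)]
            for offset in range(replication_factor)
        ]
        # The first replica starts out as leader; the rest are followers.
        layout[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return layout


layout = assign_replicas(
    num_partitions=3,
    brokers=["broker-1", "broker-2", "broker-3"],
    replication_factor=2,
)
for partition, replicas in layout.items():
    print(partition, replicas)
```

With this placement each broker leads one partition and follows another, so no single machine holds the only copy of any partition.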
With this in place you have so-called high availability. If a broker goes down, it doesn't matter: the partitions on that broker have replicas on other machines. If the failed broker held the leader of some partition, a new leader is elected from the followers, and everyone continues reading and writing through the new leader.
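The failover step can be sketched like this. It is a deliberately naive model: the first surviving follower is promoted, whereas a real Kafka cluster coordinates the election and normally only considers replicas that are in sync. The layout and names are illustrative.

```python
def fail_broker(layout, dead):
    """Return the partition layout after the broker named `dead` goes down."""
    new_layout = {}
    for partition, replicas in layout.items():
        followers = [b for b in replicas["followers"] if b != dead]
        leader = replicas["leader"]
        if leader == dead:
            # Promote a surviving follower; None means no copy is left.
            leader = followers.pop(0) if followers else None
        new_layout[partition] = {"leader": leader, "followers": followers}
    return new_layout


layout = {
    0: {"leader": "broker-1", "followers": ["broker-2", "broker-3"]},
    1: {"leader": "broker-2", "followers": ["broker-3", "broker-1"]},
    2: {"leader": "broker-3", "followers": ["broker-1", "broker-2"]},
}
after = fail_broker(layout, "broker-2")
for partition, replicas in after.items():
    print(partition, replicas)
```

Partition 1 loses its leader and gets a new one, while partitions 0 and 2 only lose a follower, so all three partitions stay readable and writable.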
When writing data, the producer writes to the leader, and the leader writes the data to its local disk. The followers then actively pull the data from the leader. Once a follower has synchronized the data, it sends an ack to the leader, and after the leader has received acks from all followers, it returns a write-success response to the producer. (Of course, this is only one of the modes; you can adjust this behavior.)
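The "one of the modes" remark corresponds to the producer's `acks` setting. The toy model below contrasts `acks=1` (the leader confirms as soon as it has the write) with `acks=all` (the leader waits for the followers). The `Partition` class is invented for illustration, followers fetch synchronously here, and a real cluster tracks in-sync replicas and also honors settings such as `min.insync.replicas`.

```python
class Partition:
    """Toy model of one partition: a leader log plus follower logs."""

    def __init__(self, follower_names):
        self.leader_log = []
        self.follower_logs = {name: [] for name in follower_names}

    def follower_fetch(self, name):
        """A follower pulls whatever it is missing from the leader."""
        log = self.follower_logs[name]
        log.extend(self.leader_log[len(log):])

    def fully_replicated(self):
        return all(
            len(log) == len(self.leader_log) for log in self.follower_logs.values()
        )

    def produce(self, message, acks="all"):
        if acks not in ("1", "all"):
            raise ValueError("this sketch only models acks='1' and acks='all'")
        self.leader_log.append(message)  # the leader writes locally first
        if acks == "1":
            return True                  # confirmed before any follower has it
        for name in self.follower_logs:  # acks='all': followers pull the data
            self.follower_fetch(name)
        return self.fully_replicated()   # success only once every copy matches


partition = Partition(["broker-2", "broker-3"])

partition.produce("m1", acks="1")
lag_after_acks_1 = {n: len(log) for n, log in partition.follower_logs.items()}

confirmed = partition.produce("m2", acks="all")
print(lag_after_acks_1, confirmed, partition.follower_logs)
```

With `acks=1` the producer is told the write succeeded while both followers are still empty, so a leader crash at that moment would lose `m1`. With `acks=all` the confirmation arrives only after every follower has caught up.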
When consuming, data is read only from the leader, and a message becomes visible to consumers only after it has been successfully synchronized to all followers and they have all returned acks.
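Kafka calls the boundary up to which messages are visible the high watermark. A minimal sketch of the rule, assuming every follower counts (a real cluster considers only in-sync replicas) and using made-up function names and log contents:

```python
def high_watermark(leader_log, follower_logs):
    """Number of messages present on the leader and on every follower."""
    return min([len(leader_log)] + [len(log) for log in follower_logs])


def consumer_read(leader_log, follower_logs, offset):
    """Read from the leader, but never past the high watermark."""
    return leader_log[offset:high_watermark(leader_log, follower_logs)]


leader = ["m1", "m2", "m3"]
followers = [["m1", "m2", "m3"], ["m1", "m2"]]  # the second follower lags

visible = consumer_read(leader, followers, offset=0)

followers[1].append("m3")  # the lagging follower catches up
visible_after = consumer_read(leader, followers, offset=2)
print(visible, visible_after)
```

While one follower is behind, `m3` exists on the leader but is hidden from consumers. It becomes readable once that follower has copied it, so consumers never see a message that could vanish in a failover.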
Having read this far, you should have a general understanding of how Kafka ensures high availability, right? You won't be left knowing nothing, and you can draw a diagram for the interviewer on the spot. If the interviewer really is a Kafka expert and keeps digging, all you can do is say sorry, you have not studied it in that much depth.
