Messaging middleware FAQ

Topics covered: availability, duplicate consumption, idempotency, reliable delivery, and message loss.

1. Kafka, RabbitMQ, ActiveMQ, and RocketMQ: choosing between them by use case
Comparison criteria: throughput, the effect of topic count on throughput, latency, availability, reliability, core features, and overall strengths and weaknesses.
ActiveMQ: throughput on the order of 10,000 messages per second. It is very mature and feature-rich, and many companies have used it in production. It occasionally loses messages with low probability, and in recent years official and community maintenance has become less and less active. It is mainly used for decoupling and asynchronous processing, and rarely in large-scale, high-throughput scenarios.
RabbitMQ: throughput on the order of 10,000 messages per second, depending on the server. It is developed in Erlang, has low latency and good performance, and provides a management interface. The open-source community is fairly active, and in recent years more Internet companies have adopted it. Because the source is written in Erlang, which few teams know well, it is harder to customize and control.
RocketMQ: single-machine throughput on the order of 100,000 messages per second. It supports hundreds to thousands of topics, and throughput drops only slightly as the number of topics grows. Alibaba uses it at large scale, processing on the order of 10 billion messages per day, so it is comparatively reliable. It is convenient to develop against, community maintenance is acceptable, and it can support complex business scenarios.
Kafka: a comparatively simple feature set that covers the main MQ functions. Latency is at the millisecond level, and availability and reliability are extremely high, but messages may be consumed more than once. It is used at large scale for real-time computing and log collection in the big-data field.

Small and medium-sized companies: RabbitMQ. Their technical strength is usually average and the technical challenges are not especially high, and the community is fairly active.
Large companies: RocketMQ, since they have stronger infrastructure R&D capability.
For real-time computing in the big-data field, log collection, and similar scenarios, Kafka is the industry standard. Its community is very active and it is unlikely to be abandoned; it is the de facto norm in this field worldwide.


2. Message queue usage scenarios: why use a message queue
Decoupling: systems are decoupled from each other through the publish-subscribe model. For example, if system B needs messages from system A, A is only responsible for sending messages to the queue and B simply consumes them. If consumers C, D, and so on are added later, A does not need to care.
Peak shaving: when system A suddenly receives a large burst of data and calls the storage system directly over RPC, the storage system is likely to be overwhelmed. Instead, the burst can be absorbed by the MQ and then consumed slowly at a pace the downstream system can handle.
Asynchronous processing: for example, when system A needs to notify or depends on several systems such as C, D, E, and F, it can send a message through the queue instead of calling each of them synchronously over RPC.
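The peak-shaving idea can be illustrated with a small in-process sketch. This is not a real MQ client: the standard-library queue.Queue stands in for the message queue, a plain list stands in for the storage system, and the function name run_peak_shaving is made up for this example.

```python
import queue
import threading


def run_peak_shaving(burst_size, workers=2):
    """Enqueue a burst all at once, then let a few consumers drain it."""
    buffer = queue.Queue()   # stands in for the MQ
    stored = []              # stands in for the storage system
    lock = threading.Lock()

    def consumer():
        while True:
            item = buffer.get()
            if item is None:             # sentinel: no more work
                buffer.task_done()
                return
            with lock:
                stored.append(item)      # the "slow" write happens here
            buffer.task_done()

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()

    # System A only enqueues; it never calls the storage system directly.
    for i in range(burst_size):
        buffer.put(i)
    for _ in threads:
        buffer.put(None)

    buffer.join()
    for t in threads:
        t.join()
    return sorted(stored)
```

The producer finishes as soon as the burst is enqueued, while the number of workers caps the load on storage.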

Disadvantages: a message queue increases system complexity; if the MQ goes down, the whole system goes down with it; and the MQ introduces problems of its own (lost messages, message ordering, and duplicate messages, since delivery without duplicates is not guaranteed).

3. High availability of RabbitMQ and Kafka
RabbitMQ is not distributed; it is clustered.
It has three modes: standalone mode, ordinary cluster mode, and mirrored cluster mode.
Ordinary cluster mode: multiple RabbitMQ instances are started on multiple machines, one per machine. A queue you create lives on only one RabbitMQ instance, but every instance synchronizes the queue's metadata. If a consumer connects to a different instance, that instance pulls the data from the instance where the queue lives. This mode is therefore not highly available: if the instance holding the queue goes down, its data is lost or unavailable. In addition, when the data volume is large, a lot of data has to be transferred between instances.
Mirrored cluster mode: the high-availability mode. Every node holds a complete mirror of the queue (metadata and message content), and each message written to the queue is automatically synchronized to the queue on the other instances.
Drawbacks: the performance overhead is high, because every message must be synchronized to all machines, which increases network bandwidth pressure; and a queue cannot be scaled linearly by adding machines, because every node stores the full queue.
How to enable it: in the management console, add a policy (mirrored cluster mode) that specifies whether data is synchronized to all nodes or to a specified number of nodes, then apply this policy when creating a queue.

Not distributed means that a single node in the cluster stores the complete data of a queue.
Kafka is distributed.
A broker is a Kafka process; one is started on each machine, and each can be regarded as a node in the Kafka cluster.
A Kafka cluster consists of multiple brokers, each of which is a node. When a topic is created, it is divided into multiple partitions; each partition can live on a different broker and stores a portion of the topic's data.
Kafka is therefore a naturally distributed message queue: the data of one topic is spread over multiple partitions, and the partitions can be distributed over different brokers.
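A toy layout function shows what "one topic, many partitions, spread over brokers" means. The round-robin rule and the name assign_partitions are assumptions made for illustration; Kafka's real assignment logic is more involved (it also places replicas and takes racks into account).

```python
def assign_partitions(num_partitions, brokers):
    """Spread partition ids over brokers round-robin (illustrative only)."""
    layout = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        owner = brokers[partition % len(brokers)]
        layout[owner].append(partition)
    return layout


# Hypothetical cluster of three brokers hosting a six-partition topic.
layout = assign_partitions(6, ["broker-0", "broker-1", "broker-2"])
```

Each broker ends up holding only part of the topic's data, which is the difference from a RabbitMQ queue that lives in full on one node.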

RabbitMQ, by contrast, is not really a distributed message queue. It is a traditional message queue that adds clustering and HA mechanisms on top, because however it is deployed, the data of one queue is stored on a single node;
in a mirrored cluster, every node stores the complete data of the queue.

HA mechanism: Kafka achieves HA through multiple replicas of each partition plus a leader election mechanism. One replica is the leader and the others are followers. Data is written to the leader, the followers synchronize data from the leader, and clients connect to the leader.
Failure case: when a machine goes down and it hosted a partition leader, the followers on other machines are still available. Kafka automatically detects that the leader has died and elects one of the followers as the new leader.
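The failover behaviour described above can be modelled in a few lines. This is a simplified sketch, not Kafka's controller logic: the Partition class and its method are hypothetical, and "promote the first remaining follower" is an assumed rule chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    leader: str
    followers: list = field(default_factory=list)  # followers in sync with the leader

    def handle_broker_failure(self, broker):
        """Promote a follower if the failed broker hosted the leader."""
        if broker == self.leader:
            if not self.followers:
                raise RuntimeError("no follower left to promote")
            self.leader = self.followers.pop(0)
        elif broker in self.followers:
            self.followers.remove(broker)


p = Partition(leader="broker-1", followers=["broker-2", "broker-3"])
p.handle_broker_failure("broker-1")  # the leader's machine goes down
```

After the call, clients would reconnect to the new leader and the remaining follower keeps replicating from it.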


4. Message retransmission and compensation: how to ensure messages are not consumed repeatedly (how to ensure idempotency)
RabbitMQ, RocketMQ, Kafka, and other messaging middleware can all deliver a message to a consumer more than once. Protecting against this is not something the MQ guarantees; the consumer side has to guarantee it itself.
Duplicate consumption in Kafka: Kafka has the concept of an offset. Every message written is assigned an offset, which represents its sequence number. After consuming data, the consumer periodically commits the offset of the messages it has consumed, meaning that it has consumed up to that point. In the ZooKeeper-based implementation, ZooKeeper records the offset the consumer has currently reached.
Cause: the consumer does not commit the offset immediately after consuming each message; it commits periodically. If the client crashes or restarts before the offset is committed, it resumes from the last committed offset when it comes back and consumes some messages a second time.
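The following simulation reproduces the cause just described. It uses a plain list as the partition log; the consume function and its crash_after parameter are invented for this sketch and do not correspond to any Kafka client API.

```python
def consume(log, start_offset, commit_every, crash_after=None):
    """Process messages from start_offset, committing every `commit_every` messages.

    Returns (processed_messages, last_committed_offset).
    """
    processed = []
    committed = start_offset
    for offset in range(start_offset, len(log)):
        if crash_after is not None and len(processed) == crash_after:
            break  # simulated crash: progress since the last commit is forgotten
        processed.append(log[offset])
        if (offset + 1 - start_offset) % commit_every == 0:
            committed = offset + 1  # next offset to read after a restart
    return processed, committed


log = ["m0", "m1", "m2", "m3", "m4", "m5"]
first_run, committed = consume(log, 0, commit_every=3, crash_after=5)
second_run, _ = consume(log, committed, commit_every=3)
duplicates = [m for m in second_run if m in first_run]
```

The first run processes m0 to m4 but only commits offset 3, so the restarted consumer sees m3 and m4 again.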
Solution (ensuring idempotency), which depends on the business:
1) When writing the data to a database, rely on a primary key or unique key, or query the database first and then filter or update. It is recommended to run this query against the primary (master) database rather than a replica.
2) Mark processed messages through a cache or a Redis distributed lock. If the data is simply written with a Redis SET, there is no problem, since SET is naturally idempotent.
3) For more complex scenarios, add a globally unique identifier to each message. After receiving a message, look the identifier up in Redis; if it is not found, process the message and then record the identifier in Redis.
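Option 1 (deduplicating on a primary key) can be sketched with the standard-library sqlite3 module. The table name orders, the column names, and the helper functions are assumptions made for this example; a production system would use its own schema and database.

```python
import sqlite3


def make_store():
    """Create an in-memory table keyed by the message's unique id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (msg_id TEXT PRIMARY KEY, amount INTEGER)")
    return conn


def handle(conn, msg_id, amount):
    """Apply a message at most once. Returns True if applied, False if duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO orders (msg_id, amount) VALUES (?, ?)",
        (msg_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the key already existed


conn = make_store()
first = handle(conn, "msg-42", 100)
second = handle(conn, "msg-42", 100)  # redelivery of the same message
```

Because the uniqueness check and the write are a single statement, two redeliveries racing each other cannot both succeed.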


Open question: how is this handled when there are multiple consumers, and how is it handled in RabbitMQ?

5. How to ensure the MQ does not lose data (not one message missing, not one message extra)
Causes of loss: the message is lost on the way from the producer before it is written to the MQ, the MQ itself loses it, or the consumer loses it during consumption.
RabbitMQ: the message is lost in transit before it reaches RabbitMQ; or RabbitMQ has received the message but fails before persisting it (it is held in memory and not yet written to disk); or the consumer has received the message and crashes before it finishes processing, while RabbitMQ believes the message was consumed successfully.
Solution:
Producer: enable transactions (blocking, which reduces throughput) or asynchronous confirms.
Broker: first, declare the queue as durable when creating it, so that RabbitMQ persists the queue's metadata; second, send each message with deliveryMode = 2, so that the message itself is persistent.
Combine the confirm mechanism with persistence, so that the producer is notified only after the message has been persisted. (Open question from these notes: is the confirm sent once the message is in memory or once it is on disk? If it is only in memory, loss cannot be completely avoided.)
Consumer: turn off automatic acknowledgement (autoAck) and acknowledge manually after processing.
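To see why turning off autoAck protects against consumer crashes, here is a toy broker that keeps every delivered message in an "unacked" set until it is acknowledged. The Broker class is a hypothetical model written for this sketch, not the RabbitMQ client API.

```python
from collections import deque


class Broker:
    """Toy broker: a delivered message stays unacked until the consumer acks it."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}
        self._next_tag = 0

    def deliver(self):
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]

    def consumer_disconnected(self):
        # Unacked messages are put back at the head of the queue, oldest first.
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))


broker = Broker(["a", "b"])
tag, body = broker.deliver()     # consumer receives "a" ...
broker.consumer_disconnected()   # ... and crashes before acking
requeued = list(broker.ready)
```

With automatic acknowledgement the message would have been dropped at delivery time; with manual acknowledgement it is redelivered, which is also why the consumer must be idempotent.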
Kafka:
Producer: if a write fails, configure retries; after more than n failed attempts, fall back to manual intervention. Alternatively, set the retry count to max.
Broker loses data: during replication of a partition, the leader has just received a message and told the producer it was saved, then goes down before the followers have synchronized it.
Solution: each partition must have at least two replicas, at least one follower must stay in sync with the leader, and the producer sets acks = all, so that a write counts as successful only after it has been written to all in-sync replicas.
Consumer loses data: the message reaches the consumer and the offset is committed automatically, but the client process fails before processing it, so the broker believes the client has consumed it. Fix: switch to manual offset commits and commit only after the message has been processed.
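The Kafka settings listed above can be collected into a small checklist. The configuration names acks, retries, min.insync.replicas, and enable.auto.commit are real Kafka settings; the function check_no_loss_settings, the replication_factor key, and the thresholds are assumptions for this sketch, and missing values are deliberately treated as unsafe so that every setting is stated explicitly.

```python
def check_no_loss_settings(topic, producer, consumer):
    """Return a list of problems with respect to the no-data-loss checklist."""
    problems = []
    if int(topic.get("replication_factor", 1)) < 2:
        problems.append("each partition needs at least two replicas")
    if int(topic.get("min.insync.replicas", 1)) < 2:
        problems.append("min.insync.replicas should be at least 2")
    if str(producer.get("acks")) not in ("all", "-1"):
        problems.append("producer should use acks=all")
    if int(producer.get("retries", 0)) < 1:
        problems.append("producer should retry failed writes")
    if consumer.get("enable.auto.commit", True):
        problems.append("consumer should commit offsets manually")
    return problems


safe = check_no_loss_settings(
    topic={"replication_factor": 3, "min.insync.replicas": 2},
    producer={"acks": "all", "retries": 2147483647},
    consumer={"enable.auto.commit": False},
)
risky = check_no_loss_settings(
    topic={"replication_factor": 1},
    producer={"acks": "1"},
    consumer={},
)
```

The safe configuration yields no problems, while the risky one fails every check.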


6. How to ensure message ordering (for example, a producer publishes binlog entries for inserts, updates, and deletes; if they are applied out of order, the data ends up wrong)
Scenarios where order is broken: 1) RabbitMQ: one queue with multiple consumers.
2) Kafka: one topic, one partition, one consumer, where the consumer processes messages with multiple threads.
ActiveMQ: use an exclusive consumer. Without it, messages are dispatched to the queue's consumers in turn; with it, one consumer is selected to consume all of the queue's messages.
RabbitMQ: split the traffic into multiple queues with one consumer per queue; or use a single queue with a single consumer, and have that consumer use internal in-memory queues to store and dispatch messages for processing.
Kafka: one topic, one partition, one consumer. The consumer reads in a single thread and writes messages into N in-memory queues, and N worker threads each consume one queue. Messages are routed to a queue by hashing the relevant business id, so messages with the same id are handled by the same thread.
Data written to a given partition is ordered. The producer can specify a key when writing, and all messages with the same key are written to the same partition, in order.
Open question: how do you configure a queue for multiple consumers or for only a single consumer?
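The Kafka approach above (one reader, N in-memory queues, one worker thread per queue, routing by business id) can be sketched as follows. The function name process_in_order and the use of zlib.crc32 as the hash are choices made for this example; any stable hash would do.

```python
import queue
import threading
import zlib


def process_in_order(messages, num_workers=3):
    """messages: iterable of (business_id, payload).

    Returns {business_id: [payloads in the order they were processed]}.
    """
    queues = [queue.Queue() for _ in range(num_workers)]
    results = {}
    lock = threading.Lock()

    def worker(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: reader is done
                return
            business_id, payload = item
            with lock:
                results.setdefault(business_id, []).append(payload)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()

    # Single reader: the same business id always maps to the same queue.
    for business_id, payload in messages:
        index = zlib.crc32(business_id.encode()) % num_workers
        queues[index].put((business_id, payload))

    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results


binlog = [
    ("order-1", "insert"), ("order-2", "insert"),
    ("order-1", "update"), ("order-2", "delete"),
    ("order-1", "delete"),
]
ordered = process_in_order(binlog)
```

Order is preserved per business id, not globally, which is usually all that binlog-style replay needs.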


7. How to handle a message backlog (what about delayed or expired messages, a queue that is full, or a backlog of several million messages that has lasted for several hours?)
Analyze the cause of the backlog:
1) If the consumer went down, restart it and add monitoring on the consumer side; also check whether downstream storage such as the database is the bottleneck.
2) If the consumer is healthy but cannot keep up, create a new topic with more partitions (for example, 10), forward the messages received from the original topic to the new topic, and then point consumers at the new partitions instead of the original topic.
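A rough estimate shows why adding partitions and consumers helps with a backlog. The function drain_seconds and all of the numbers below are hypothetical, chosen only to make the arithmetic concrete.

```python
def drain_seconds(backlog, consumers, per_consumer_rate, incoming_rate=0.0):
    """Seconds needed to clear `backlog` messages at the given rates (msgs/s)."""
    net_rate = consumers * per_consumer_rate - incoming_rate
    if net_rate <= 0:
        raise ValueError("consumers cannot keep up; the backlog will never drain")
    return backlog / net_rate


# Hypothetical numbers: 3 million queued messages, 500 msgs/s per consumer.
slow = drain_seconds(3_000_000, consumers=3, per_consumer_rate=500)
fast = drain_seconds(3_000_000, consumers=30, per_consumer_rate=500)
```

Under these assumed rates, ten times as many consumers clears the backlog in a tenth of the time, provided the downstream storage can absorb the extra write load.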

Message expiry (data lost because messages expired): disable expiration on the queue. If data has already been lost, write a program to find the missing data and re-send it manually.

8. How to design a message queue system
1) Scalability: support rapid expansion of capacity and throughput. Kafka's model is a good reference: broker -> topic -> partition, where each partition lives on one machine and stores part of the data. If the existing topic runs out of resources, add partitions, migrate data, and add machines.
2) Persistence: write data to disk sequentially. Sequential writes avoid the seek overhead of random disk reads and writes and perform very well; Kafka's approach is a good reference.
3) High availability: use a multi-replica mechanism with leader election, as Kafka does.
4) Data loss: adopt Kafka's zero-data-loss scheme.


9. Which MQ do you use, and what are its advantages and disadvantages compared with other MQs? Is an MQ connection thread-safe? What is your company's MQ service architecture?


7. Why does Kafka achieve high throughput?

8. How does Kafka differ from other message queues, and how does Kafka implement master-slave synchronization?

9. How do you achieve eventual consistency using an MQ?

10. Have you run into any problems using Kafka, and how did you solve them?

11. MQ can deliver duplicates; how do you avoid repeated consumption and make processing idempotent?

12. How do you handle MQ message delay? Can messages be given an expiration time, and how do you normally handle expired messages?


16. What is the essential difference between Kafka, ActiveMQ, and RabbitMQ?


rabbitMQ: https://www.jianshu.com/p/787d155ff4e1

Origin www.cnblogs.com/gudicao/p/11649628.html