[System Architecture] The architecture and principles of message middleware

The role of message middleware is to act as a carrier for asynchronous, concurrent processing. Beyond that, it must provide many architectural guarantees: high availability, high concurrency, scalability, reliability, integrity, ordering, and so on. These give every designer headaches, and there are also unusual requirements such as slow consumption and deduplication. The design cost is quite high, so do not blindly worship open-source gurus; many mechanisms almost have to be rebuilt from scratch. Building a private-cloud messaging platform that suits every business, is easy to use, and is general-purpose is not so simple.

If a payment system processes billions of business orders every day, the message middleware must handle on the order of ten billion messages, because many systems rely on the cluster capacity of the middleware, and no mistakes can be tolerated. So let us analyze message middleware from several architectural angles.

High availability

High availability is an eternal topic, and in finance it is also a measure of reliability. Architects in the financial industry will do everything they can to avoid losing even a single piece of data, but in practice, beyond what theory guarantees, a bit of luck is involved. That is not an exaggeration.

To give an example: in Internet data architecture, keeping at least three copies of each piece of data is considered a strong guarantee. Yet after Google's Belgian data center was struck by lightning on August 13, about 0.00001% of the data was permanently lost, and less than 0.05% of the affected disks could not be recovered. The point is that timing and circumstances matter: under extreme conditions nothing is impossible, and architectural vulnerabilities will surface. Let us look at the common approaches to MQ high availability.

The following figure shows the HA solution of ActiveMQ:

[figure: ActiveMQ master/slave HA]

ActiveMQ's HA is based on master/slave failover, and the master-slave switch can happen in several ways:

1: Use NFS or another shared-disk device to hold a shared file lock. The master is marked by holding the lock; when the master goes down, a slave acquires the shared lock and is promoted to master.

2: Manage the cluster through ZooKeeper. This approach is more common and will not be detailed here.
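The file-lock election in point 1 can be sketched with plain `java.nio` primitives. This is a minimal illustration under my own naming (`SharedLockElection`, `tryBecomeMaster` are invented for the example), not ActiveMQ's actual implementation:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of shared-lock failover: whichever broker process
// holds the exclusive lock on the shared file acts as master; the others
// stay slaves and keep retrying until the lock is released by a crash.
class SharedLockElection {
    private FileChannel channel;
    private FileLock lock;

    // Returns true if this process became master (acquired the lock).
    public boolean tryBecomeMaster(Path sharedLockFile) {
        try {
            channel = FileChannel.open(sharedLockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            lock = channel.tryLock();   // non-blocking; null if held elsewhere
            return lock != null;
        } catch (IOException | OverlappingFileLockException e) {
            return false;               // treat any failure as "stay slave"
        }
    }

    public void resign() {
        try {
            if (lock != null) lock.release();
            if (channel != null) channel.close();
        } catch (IOException ignored) { }
    }
}
```

A slave would simply call `tryBecomeMaster` in a loop; the OS releases the lock automatically when the master process dies, which is exactly what makes the scheme attractive on NFS-style shared storage.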

The following figure shows the HA solution of MetaQ:

[figure: MetaQ master/slave HA via ZooKeeper]

As shown above, MetaQ likewise manages the broker's master and slave nodes through ZooKeeper.

Of course, this is only the failover mechanism. It only guarantees that traffic is switched to the slave when the broker goes down; it cannot by itself guarantee that no message is lost along the way.

While a message flows through the broker, a crash or other hardware failure may cause it to be lost, so a storage medium is needed to safeguard it.

Let us take Kafka's storage mechanism as a reference. Message middleware depends on storage that is not only fast but also very cheap in IO cost. Kafka designed a storage mechanism that meets both requirements; here is a brief introduction.

First, a Kafka topic is divided into multiple partitions across a distributed deployment. A partition is effectively a unit of message load, routed across multiple machines. For example, a topic debit_account_msg would be split into partitions debit_account_msg_0, debit_account_msg_1, debit_account_msg_2, and so on up to N partitions, and each partition gets a local directory such as /debit_account_msg/topic.

The files inside are divided into many segments. Each segment has a defined size (for example, 500 MB) and consists of two files, an index and a log:

    00000000000000000.index
    00000000000000000.log
    00000000000065535.index
    00000000000065535.log

The number in the file name is the starting msgId of that index; the corresponding data structure looks like this:

[figure: index entry data structure]

An entry such as 1,0 means: for the message with msgId 1, its offset within the log file is 0. After reading the index, the corresponding segment log file is located and the message is read from it; the message itself is stored as a fixed-format message body:

[figure: fixed-format message body]

Obviously, a naive application of this mechanism cannot satisfy high-concurrency IO: first a binary search over the segment files, then locating the data by offset, then reading the message size, then reading the message body — at least four disk IOs, a fairly large overhead. However, since pulling uses sequential reads, the impact in practice is small.
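The first step of that lookup — picking the segment — can be sketched as follows. This is an illustrative model, not Kafka's real code: the segment is chosen by a binary search over the base offsets taken from the file names (assuming the msgId is at least the first base offset), after which a real implementation would consult the in-segment index.

```java
import java.util.Arrays;

// Illustrative segment selection: find the segment whose base offset is
// the largest one <= msgId. The base offsets come from the file names,
// e.g. {0, 65535} for the two segments shown above.
class SegmentLookup {

    // baseOffsets must be sorted ascending; msgId must be >= baseOffsets[0].
    public static long floorSegment(long[] baseOffsets, long msgId) {
        int i = Arrays.binarySearch(baseOffsets, msgId);
        if (i >= 0) return baseOffsets[i];   // msgId is itself a base offset
        int insertion = -i - 1;              // index of first base > msgId
        return baseOffsets[insertion - 1];   // previous base, i.e. <= msgId
    }
}
```

For example, with segments based at 0 and 65535, msgId 70000 falls in the 65535 segment; the relative id (70000 − 65535) is then what the `.index` file resolves to a byte offset in the `.log` file.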

In addition to the read path: before data is written to disk, reads and writes happen against the OS page cache, and an asynchronous thread flushes to disk (with an LRU-style strategy). This carries real risk, because if the OS goes down, the data in the page cache is lost — especially under slow consumption with a large backlog. Kafka's descendant MetaQ has reworked this area considerably and added a replication mechanism for the partition files (as used at Ali), so at this level the chance of losing messages to a lightning strike is relatively small. Of course, that still does not rule out the fiber into the machine room being dug up.

Having said all this, it sounds fairly complete and elegant, but the operations cost is actually quite high. Because everything is plain files, handling a problem manually is quite troublesome; and since storage is per-machine, considerable operational investment is needed for runbooks and API tooling.

So in this area we could instead store the data in a NoSQL store such as MongoDB. MySQL is also possible, but its IO capability is not on the same level as a NoSQL DB unless we have a strong transaction-processing mechanism — and finance is indeed quite strict about that requirement. This is like Alipay's adoption of MetaQ: the earlier middleware, tbnotify, became very passive under slow consumption, while MetaQ has a huge advantage there. Why? Stay tuned.

High concurrency

In the beginning, most engineers used MQ to solve problems of performance and asynchronization. In fact, on the same machine, a single IO schedule is not that resource-consuming. Without further ado, let us look at the high-concurrency aspects of MQ, starting with some background on a few well-known middlewares:

ActiveMQ was once a dedicated enterprise-grade solution, compliant with the JMS specification in JEE. Its performance was actually decent, but when dragged into Internet-scale traffic it was like a rabbit carrying a watermelon: there was nothing it could do.

RabbitMQ is written in Erlang, follows the AMQP protocol specification, and is more cross-platform; its delivery modes are richer, and it is distributed by design.

RocketMQ (today the latest version of MetaQ 3.0; Kafka is also MetaQ's predecessor, originally the log messaging system open-sourced by LinkedIn): MetaQ essentially rewrote Kafka's principles and mechanisms in Java, then went through many transformations, adding transaction support. Its development pace is very fast, and very good communities inside Ali and across China maintain it.

For a performance comparison, here are some numbers from the Internet, for reference only:

[figure: performance comparison data]

To be honest, at these levels the differences are not that dramatic, but we can analyze some commonalities: where do the main performance differences come from?

RocketMQ is the successor of MetaQ. Apart from improvements in new features and mechanisms, the performance principles are similar. Here are some highlights behind the high performance:

RocketMQ's consumption mainly uses a pull mechanism, so many consumption features need not be implemented on the broker — the consumer simply pulls the relevant data. By contrast, ActiveMQ and RabbitMQ use the older style of having the broker dispatch (push) messages, which are also the standard delivery modes of JMS and AMQP.

Files are stored sequentially, so pulling a message only needs to read segment data, and consumers drain information as fast as possible at consumption time, making a backlog unlikely. You can also set the IO scheduling algorithm — noop mode, for example, improves sequential-read performance.

Hot consumption is achieved via the page cache, by hitting data that is still in the OS cache.

MetaQ's batched disk IO and network IO try to move data in a single IO, and messages travel in batches, so IO scheduling does not consume too many resources.
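The batching idea can be sketched with a small accumulator. This is a hypothetical illustration (the class name and threshold are invented), not MetaQ's actual code: messages are buffered until a size threshold, then written out in one IO so the scheduler sees one large request instead of many small ones.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batching: buffer messages and flush them in one IO once
// the accumulated size crosses a threshold.
class MessageBatcher {
    private final int flushBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private int buffered = 0;
    public int flushCount = 0;   // how many batch writes have happened

    public MessageBatcher(int flushBytes) { this.flushBytes = flushBytes; }

    public void append(byte[] msg) {
        buffer.add(msg);
        buffered += msg.length;
        if (buffered >= flushBytes) flush();
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        // a single write()/send() call for the whole batch would go here
        flushCount++;
        buffer.clear();
        buffered = 0;
    }
}
```

With a 100-byte threshold, ten 10-byte messages cost one flush instead of ten writes — the whole point of the technique.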

NIO transport, as shown below. This is the architecture of the original MetaQ, which used high-performance NIO frameworks integrated from Taobao's internal gecko and notify-remoting to distribute messages:

[figure: original MetaQ NIO transport architecture]

Lightweight consumer queues: remember that our messaging capability is delivered through queues.

Look at the picture below:

[figure: consumer queue structure]

MetaQ adds a logical consumer queue on top of the physical storage. The disk data behind the queue is serialized, so appending to the queue adds no iowait burden from random disk writes — writes stay sequential. Reads, however, still require a random access: first the logical queue, then the disk. The page cache therefore matters a great deal; give it as much memory as possible, and that allocation will be fully used.
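The separation above can be shown with a toy model (names and structure are my own, not MetaQ's real classes): the commit log holds every message's bytes, and each queue only stores (position, size) entries pointing into it, so queue appends never turn into random writes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Toy model of a logical consume queue over a shared commit log.
class ConsumeQueueDemo {
    private final ByteArrayOutputStream commitLog = new ByteArrayOutputStream();
    private final List<int[]> queue = new ArrayList<>(); // {position, size}

    public void publish(byte[] msg) {
        queue.add(new int[]{commitLog.size(), msg.length}); // tiny index entry
        commitLog.write(msg, 0, msg.length);                // sequential append
    }

    public byte[] consume(int index) {   // logical read, then "disk" read
        int[] entry = queue.get(index);
        byte[] all = commitLog.toByteArray();
        byte[] out = new byte[entry[1]];
        System.arraycopy(all, entry[0], out, 0, entry[1]);
        return out;
    }
}
```

The second read (into the log by position) is the random access the text mentions, which is why a warm page cache is so important.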

The points above can basically keep performance at a relatively high level. But sometimes performance is not the most important thing; what matters most is striking the best balance with the other architectural properties, since other mechanisms must also be satisfied. That is why the three hardest problems in the industry are high concurrency, high availability, and consistency.

Scalability

This is a common question. Ordinary systems and middleware can usually be scaled out well, but for message middleware it has always been a hassle. Why?

Let me first talk about the limits of ActiveMQ's scalability, because scaling ActiveMQ depends on the nature of the business. As a broker, it must know the source and destination of messages, and if messages are transmitted in a distributed way this becomes complicated. Let us look at how ActiveMQ handles load:

[figure: ActiveMQ load distribution]

Assume a producer sends topicA messages. If, in the normal case, every consumer is connected to every broker, any broker that receives a producer's message can forward it to the corresponding consumer.

But what if broker2 in the figure has no corresponding consumer? Given that there may be many producer nodes (application systems) and many consumer nodes (dependent systems) for the same topic, how do we scale out? ActiveMQ can handle the normal part of the picture above, but it requires changing the corresponding producer, broker, and consumer configurations, which is quite troublesome.

Of course, ActiveMQ can also do dynamic discovery via multicast (some suggest using LVS or F5 as a load balancer, but that is a big problem for consumers, and such load configuration has no real effect on topic distribution). Still, the problem I described remains: if the topic is too large, every broker needs connections to all producers or consumers, otherwise the situation above appears. Scaling ActiveMQ is quite troublesome in this respect.

Now let us see how MetaQ does it — a picture is worth a thousand words:

[figure: MetaQ topic partitioning across brokers]

MetaQ partitions by topic. At this level we only need to configure enough topic partitions; sharding then uses the business itself as the routing rule. Generally a broker machine hosts many topics, and each topic usually has only one partition per machine; if machines run short, multiple partitions per machine are also supported. In general, we can take a business id modulo the partition count as a custom partitioner, deriving the target partition from the send parameters.

[figures omitted]
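The "business id modulo partition count" routing described above can be sketched in a few lines. This is a generic illustration (the class name is invented), not MetaQ's partitioner API: all messages for one business id land in the same partition, which also preserves their order.

```java
// Illustrative custom partition routing: business id modulo partition count.
class PartitionRouter {
    public static int route(String businessId, int partitions) {
        int h = businessId.hashCode();
        return Math.floorMod(h, partitions); // floorMod avoids negative results
    }
}
```

Because the mapping is deterministic, scaling out only requires raising the partition count and letting new brokers pick up the extra partitions, rather than rewiring every producer and consumer as in the ActiveMQ case.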

Reliability

Reliability is an essential property of message middleware. Let us see how an MQ circulates messages, taking ActiveMQ as the first reference: it is based on a push & push mechanism (producer pushes to broker, broker pushes to consumer).

How do we ensure every sent message is consumed? An ActiveMQ producer must receive the broker's ACK after sending a message to confirm receipt; the same guarantee applies from broker to consumer.

MetaQ's mechanism is the same, except that broker-to-consumer delivery is pull-based, so arrival depends on the consumer's capability; in general, though, an application-server cluster is unlikely to suffer an avalanche effect.

How do we guarantee message idempotency? At present, neither ActiveMQ nor MetaQ can guarantee it; the business has to. Once a send to the broker times out, it is retried, and a retry produces a new message even though the broker may have already persisted the first one — so in this situation there is no way to prevent the same business flow from producing two messages.
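The business-side deduplication mentioned above often looks like the following sketch. The names are invented for illustration, and the in-memory set stands in for what would in practice be a database unique constraint or the cache cluster discussed later:

```java
import java.util.HashSet;
import java.util.Set;

// Business-side dedup: remember the business keys already applied and
// drop redelivered copies of the same logical message.
class IdempotentConsumer {
    private final Set<String> applied = new HashSet<>();
    public int effects = 0;   // how many times the side effect actually ran

    // Returns true if the message was applied, false if it was a duplicate.
    public boolean consume(String businessKey) {
        if (!applied.add(businessKey)) return false; // already seen: skip
        effects++;            // the real business side effect goes here
        return true;
    }
}
```

The key point is that deduplication keys off a *business* identifier (an order number, a payment flow id), not the broker's message id, since the broker mints a fresh id on each retry.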

How do we ensure message reliability? Here ActiveMQ and MetaQ have basically the same mechanism:

Producer guarantee: after producing data to the broker, it must be persisted before the ACK is returned.

Broker guarantee: after the MetaQ server receives a message, it flushes it to disk periodically, then replicates the data to the slave synchronously or asynchronously, so consumption is unaffected after a crash.

ActiveMQ likewise persists locally, via a database or files, for local recovery.

Consumer guarantee: consumers consume messages one by one; only after successfully consuming one message do they continue to the next. If consuming a message fails (for example, with an exception), the consumer retries it (by default up to 5 times). If it still cannot be consumed after the maximum number of attempts, the message is stored on the consumer's local disk and retried by a background thread, while the main thread moves on to subsequent messages. Thus a Meta consumer continues to the next message only after the MessageListener confirms successful consumption of the current one, which ensures reliable consumption.
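The consumption loop just described can be sketched as follows. This is a simplified model of the behavior (the class name, the `Predicate` handler, and the in-memory dead-letter list are my own stand-ins, not the Meta consumer API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the retry-then-park consumption loop: retry a message up to
// maxAttempts times; if it still fails, store it for a background thread
// and let the main thread move on to the next message.
class RetryingConsumer {
    public final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    public RetryingConsumer(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // handler returns true on successful consumption of the message.
    public boolean consume(String msg, Predicate<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (handler.test(msg)) return true;
            } catch (RuntimeException ignored) {
                // an exception counts as a failed attempt
            }
        }
        deadLetters.add(msg);   // handed off to the background retry thread
        return false;
    }
}
```

Parking the poisoned message instead of blocking forever is what keeps one bad message from stalling the whole queue.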

Consistency

We discuss MQ consistency in two scenarios:

1: Guaranteeing that a message is not sent/consumed multiple times
2: Guaranteeing transactions

None of the MQs introduced above can guarantee the first scenario. Why not? The cost is relatively high. It could be achieved by modifying the source code, and the scheme is not overly complex, but the extra overhead is large — for example, an additional cache cluster to guarantee non-repetition within a time window. I believe some MQs do offer this feature.

ActiveMQ supports two kinds of transactions: JMS transactions and XA distributed transactions. With a transaction, a transactionId is generated during the interaction with the broker, and the broker implements parts of a TM to allocate transaction handling. MetaQ also supports local transactions and XA and abides by the JTA standard. Both ActiveMQ and MetaQ implement their transaction guarantees via redo logs, in basically the same way.

The distributed transaction here is only guaranteed from the broker stage onward. Before commit, the prepare message is stored in a local file; only at the commit stage is the message written to the queue. Finally, the second-phase commit is driven by the TM.
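The broker-side prepare/commit flow can be modeled with a toy class. This is a conceptual sketch only (the names are invented, and a real broker persists the prepare log to a file as the text says, not to memory): PREPARE parks the message in a prepare log, COMMIT moves it into the visible queue, and ROLLBACK discards it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of two-phase message delivery on the broker: a message is
// visible to consumers only after the TM drives the commit phase.
class TwoPhaseBroker {
    private final List<String> prepareLog = new ArrayList<>();
    public final List<String> queue = new ArrayList<>();

    public void prepare(String msg) { prepareLog.add(msg); }

    public void commit(String msg) {
        if (prepareLog.remove(msg)) queue.add(msg); // visible only from here
    }

    public void rollback(String msg) { prepareLog.remove(msg); }

    public boolean isVisible(String msg) { return queue.contains(msg); }
}
```

Note that `commit` is a no-op for a message that was never prepared (or was rolled back), which is what keeps half-finished transactions from leaking into the queue.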


Kotlin developer community


The public account of the first Kotlin developer community in China, which mainly shares and exchanges related topics such as Kotlin programming language, Spring Boot, Android, React.js / Node.js, functional programming, and programming ideas.

The more noisy the world, the more peaceful thinking is needed.



Origin blog.csdn.net/universsky2015/article/details/105531346