A Summary of the Technical Challenges and Solutions of the Spike (Flash Sale) System

Preface

What benefits does introducing message middleware bring to a system?

We should know that introducing MQ mainly solves three problems: asynchronous processing, decoupling, and peak shaving.

This article focuses on the peak-shaving scenario specifically: the technical challenges of the spike system in an e-commerce platform, and the solutions to them.

What is the bottleneck facing the system

Let's first understand what specific problems a spike system needs to solve. Look at the diagram below:

[Figure: a large number of users accessing the order system cluster, which reads from a single database]

When our system runs a spike event, a huge number of users hit our order system cluster. That alone is not the technical bottleneck: as long as we scale out the order system cluster by adding machines, it can withstand this level of concurrency, or even more.

So what is the technical bottleneck?

Now look at the database layer. No matter how many machines you add to the order system cluster, they all access a single database. Every time a spike event runs, the pressure on that database is enormous; it may well crash and bring down the entire system, with very serious consequences.

From this we can conclude that the database is the major bottleneck of the spike system.

How to solve the bottleneck of the spike system

We just identified the database as the technical bottleneck of the spike system, so how do we solve it? Should we deploy more database servers, split the data across databases and tables, and let multiple database servers share the high-concurrency load together?

The database and table sharding strategy works as follows. Suppose we currently operate on a single order table in a single database. After splitting by database, we have multiple databases, each storing only part of the order data; the routing rule can be computed from a timestamp or a hash algorithm (this is not the focus of this article; there will be a separate discussion of databases later). Splitting by table means that within one database the order table is divided into multiple tables, sharding the data once more. A minimal routing sketch is shown below.
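As a rough illustration (the shard counts and the naming convention order_db_X.order_Y are assumptions made for this sketch, not something the article prescribes), hash-based shard routing might look like this:

```java
// Minimal sketch of hash-based shard routing for orders.
// DB_COUNT and TABLE_COUNT are assumed values for illustration.
public class OrderShardRouter {

    private static final int DB_COUNT = 4;     // number of order databases
    private static final int TABLE_COUNT = 8;  // order tables per database

    /** Routes an order id to a physical shard, e.g. "order_db_1.order_5". */
    public static String route(long orderId) {
        int hash = Long.hashCode(orderId) & Integer.MAX_VALUE; // non-negative
        int dbIndex = hash % DB_COUNT;                     // which database
        int tableIndex = (hash / DB_COUNT) % TABLE_COUNT;  // which table in it
        return "order_db_" + dbIndex + ".order_" + tableIndex;
    }

    public static void main(String[] args) {
        System.out.println(route(123456789L));
    }
}
```

Every read and write for an order first runs through a router like this, so the load spreads evenly across all DB_COUNT × TABLE_COUNT shards.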

What are the benefits of this model?

The benefit is obvious: high-concurrency requests are spread evenly across multiple databases, the whole database cluster shares the pressure together, and even higher concurrency can be handled simply by extending the cluster.

Having said that, do you think the problem is now solved?

Let me be clear: this is a clumsy solution. Unless a company's engineering capability is too weak to build more reliable infrastructure, nobody would choose this last resort of simply stacking machines to absorb high-concurrency pressure.

Think about it: if our system keeps growing in popularity and the user base becomes massive, do we just keep adding servers forever?

Wouldn't the server costs become prohibitively high?

Therefore, our starting point must be right: instead of endlessly adding machines, we should design a more elegant architecture that solves the problem with limited resources. That is the best policy.

Optimizing the front-end page

Knowing that we should optimize the architecture under limited resources, let's first consider a question.

How do users interact with the system when participating in a spike event?

Take the Double Eleven rush as an example. The spike starts at 00:00, so many users open the product page on their phones a few minutes in advance and refresh it non-stop, waiting for the moment the spike begins.

So, have you ever considered where these constantly refreshed pages come from?

In fact, the pages are served by a dedicated page server whose main job is to deliver the front-end pages. The basic structure is as follows:

[Figure: a dedicated page server sitting in front of the order system, serving the front-end pages]

Therefore, the first system to absorb the high-concurrency requests is the front-end page system.

Let's think about another question. When there is no spike, different users view different products; but once a spike event starts, a huge number of users may refresh the very same page of the same product, which puts great pressure on the system.

So how should we solve this problem?

The solution introduced today is the strategy of static page data + multi-level caching.

Static page data

Let's first talk about what dynamic page data is.

If the front-end page is dynamic, then every time a user visits it, a request is sent to the page system to fetch data, and the page is rendered from that data. Generally speaking, systems start their evolution from this dynamic model, JSP pages for example.

Making the page static essentially means changing where the page gets its data: instead of querying the database through the page system on every request, it fetches pre-generated data from elsewhere, avoiding the pressure that hitting the back-end database on every visit would cause. A sketch of this idea follows.
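To make this concrete, here is a minimal sketch of pre-generating the page data, assuming a Redis cache and the Jedis client; the key scheme, product fields, and TTL are assumptions made for the example:

```java
import redis.clients.jedis.Jedis;

// Sketch: pre-render a product's page data as a JSON string and push it
// into Redis whenever the product is created or updated, so that page
// requests can be served without ever querying the database.
public class PageDataPublisher {

    private final Jedis jedis = new Jedis("localhost", 6379);

    public void publish(long productId, String name, long priceCents, int stock) {
        // Hand-built JSON keeps the sketch dependency-free; a real system
        // would use a JSON library such as Jackson.
        String json = String.format(
            "{\"productId\":%d,\"name\":\"%s\",\"priceCents\":%d,\"stock\":%d}",
            productId, name, priceCents, stock);

        // The page layers (CDN, Nginx, page server) read this key instead
        // of hitting the database on every page view.
        jedis.setex("page:product:" + productId, 300, json); // 5-minute TTL
    }
}
```

The key point is that the expensive work (querying and assembling the data) happens once at publish time, not on every page view.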

Multi-level cache

With the static idea in mind, let's look at what a multi-level cache is. The multi-level cache we are going to discuss refers to the CDN + Nginx + Redis multi-level cache architecture.

What does that mean? The page data is first placed on the CDN node closest to the user.

If you are not familiar with CDNs (content delivery networks), here is a quick primer.

For example, suppose our servers are deployed in Beijing and a user accessing our system is in Hainan. Does every request really need to travel to the Beijing server to fetch data?

No. We can deploy the static page data to a CDN node in Hainan, and users there fetch the page data from the CDN directly.

CDNs are services offered by the various cloud vendors, and the CDN is the first-level cache in our architecture.

If the CDN cannot serve the page data, because its cache has expired, for example, the request falls through to our Beijing servers. Even then the system does not query the database directly; it first checks the cache on the Nginx server.

Nginx can implement local caching based on Lua scripts, and we put the page data into this Nginx cache in advance as the second-level cache.

What if Nginx does not have the data either?

In that case, the Lua script on Nginx sends a request to the Redis cluster to load the data, and the Redis cluster serves as the third-level cache of our multi-level cache architecture.

If the data is still not found in the Redis cluster, we load it from the database and write it back into the cache. A sketch of the full read path is shown below.
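To make the read path concrete, here is a minimal Java sketch of the fallback chain, again assuming the Jedis client. In a real deployment the second level lives inside Nginx as a Lua shared dictionary; the in-process map below only stands in for it so the flow is runnable in one file, and all names are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import redis.clients.jedis.Jedis;

// Sketch of the multi-level read path: local cache -> Redis -> database.
// (The CDN level sits in front of all of this and never reaches the app.)
public class MultiLevelPageCache {

    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String getPageData(long productId) {
        String key = "page:product:" + productId;

        // Level 2: local cache (stand-in for the Nginx/Lua shared dict).
        String data = localCache.get(key);
        if (data != null) return data;

        // Level 3: the Redis cluster.
        data = jedis.get(key);
        if (data == null) {
            // Final fallback: the database. Write the result back so that
            // later requests are absorbed by the cache layers again.
            data = loadFromDatabase(productId);
            jedis.setex(key, 300, data); // 5-minute TTL (assumed)
        }
        localCache.put(key, data);
        return data;
    }

    // Placeholder for the real database query.
    private String loadFromDatabase(long productId) {
        return "{\"productId\":" + productId + "}";
    }
}
```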

With this multi-level cache architecture, the page's static data (typically a JSON string) is served from the caches, so the pressure on the page server itself stays very small.

The architecture diagram is as follows:

[Figure: the CDN → Nginx → Redis → database multi-level cache architecture]

Summary

Today we covered the drawbacks of the machine-stacking approach to high-concurrency systems and introduced some of the technical challenges in the spike scenario.

We also explained how to serve static page data through a multi-level cache architecture.

But a spike system is a complex system, and many details deserve deeper study. The main purpose here was to introduce the overall scenario of a spike system and some architectural optimization ideas, paving the way for a later discussion of how to introduce RocketMQ into the spike system to achieve traffic peak shaving.

If you found this article helpful, please like and follow to show your support. You can also follow my public account, where I share more technical articles and related resources, so we can all learn and improve together!
