Interview question: how does your system support high concurrency?

Many people hit a classic question in interviews and have no idea what to do with it: how does your system support high concurrency?

Most candidates simply freeze when asked and don't know where to start. The root of it is that they have never been through the grind of a genuinely high-concurrency system.

Without that kind of project experience, you can't distill a set of answers from first-hand practice, let alone give a systematic account of how a system you built supports high concurrency.

So this article takes that angle, cuts straight to the question, and shows you how to handle it with the simplest possible line of reasoning.

First, let's be clear about one premise: high-concurrency systems differ from one another. Compare a middleware system handling a million requests per second, a gateway system serving ten billion requests a day, and a flash-sale promotion system absorbing hundreds of thousands of requests in an instant.

Because each of these systems has different characteristics, the architectures they use to handle high concurrency are not the same.

Likewise, within an e-commerce platform, the order system, product system, and inventory system are architected differently under high-concurrency scenarios, because the business scenarios behind them differ.

So this article mainly gives you a way of thinking about how to answer this kind of question; it doesn't involve any complex architecture design. The goal is simply that when an interviewer asks, you won't end up staring at each other in awkward silence.

To answer the question really well in an interview, we suggest you take the line of thinking here, apply it to the systems you are actually responsible for, and ideally get some hands-on architecture practice of your own.

1. Start with a simple system architecture

Assume your system has just launched: it is deployed on a single machine, connected to a database that sits on its own server.

To make it concrete, say the application machine is 4 cores with 8 GB of RAM, and the database server is 16 cores with 32 GB.

Now assume the system has 100,000 registered users in total. Only a fraction of them are active; the daily-active ratio varies from system to system, so let's take a fairly typical figure of 10%, which gives 10,000 daily active users.

By the 80/20 rule, the daily peak lasts about four hours, and 80% of the active users show up in that window: 8,000 people active within 4 hours.

Say each user makes about 20 requests per day. Those 8,000 users then generate only about 160,000 requests during the peak; averaged over four hours (14,400 seconds), that is roughly 10 requests per second.

Right, so far this has absolutely nothing to do with high concurrency, does it?

Now, the system layer sees 10 requests per second, and each request triggers several database operations, CRUD and the like.

Let's say each request maps to roughly three database operations; that puts the database layer at about 30 requests per second.

With the server configuration above, the database supports this with no problem at all.
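The back-of-the-envelope math above is easy to script. A minimal sketch in Python, where every figure is an assumption taken from the worked example rather than a measurement:

```python
# Capacity estimate using the worked example's assumed figures.
registered_users = 100_000
dau_ratio = 0.10                    # assumed daily-active ratio
peak_user_share = 0.80              # 80/20 rule: actives in the peak window
peak_window_s = 4 * 3600            # 4-hour peak, in seconds
requests_per_user_per_day = 20
db_ops_per_request = 3              # assumed CRUD operations per request

peak_users = registered_users * dau_ratio * peak_user_share   # 8,000
peak_requests = peak_users * requests_per_user_per_day        # 160,000
app_qps = peak_requests / peak_window_s                       # ~11 req/s
db_qps = app_qps * db_ops_per_request                         # ~33 ops/s

print(f"app ~{app_qps:.0f} req/s, db ~{db_qps:.0f} ops/s")
```

Rerun the same script with bigger inputs and you can see each of the later stages in this article coming.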

Shown as a diagram, the system described above looks like this:

[Architecture diagram: one application server connected to one database server]

2. Deploy the system as a cluster

Now suppose the user base starts growing rapidly: registered users increase 50-fold, to 5 million.

At 10% daily active, that is 500,000 daily active users, and by the same math the peak load on the system reaches about 500 requests per second, which means roughly 1,500 requests per second at the database. What happens then?

With the machine configuration above, if the system runs fairly complex business logic, the heavyweight kind typical of business systems, then each request is relatively CPU-intensive.

At that point, once the 4-core 8 GB machine reaches 500 requests per second, its CPU load is likely to climb quite high.

At the database layer, with the configuration described above, a peak of roughly 1,500 requests per second is actually still acceptable.

The main thing is to watch the database machine's disk I/O, network I/O, CPU load, and memory usage; in our production experience, a database with that configuration holds up fine under 1,500 requests per second.

So the first thing you need to do at this point is make your system support cluster deployment.

Hang a load-balancing layer in front so requests are spread evenly across the system layer; the system can then use a cluster of machines to absorb the higher concurrency.

For example, suppose you deploy one more machine here; each machine then handles only about 250 requests per second.

With two machines, CPU load drops significantly, and this first wave of "high concurrency" is covered, isn't it?

If you don't even do this, then as the single machine's load climbs, in the extreme case it may not have enough resources to respond at all: requests start to hang, and the system may even go down.

A quick summary of this first step (a minimal sketch of the load-balancing idea follows the list):

Add a load-balancing layer that spreads requests evenly across the system layer.

Deploy the system layer as a cluster of machines to withstand the initial concurrency pressure.
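The article doesn't prescribe a particular load balancer (Nginx, LVS, and hardware appliances are all common choices); the core idea is just even distribution. A minimal round-robin sketch, with hypothetical backend addresses:

```python
import itertools

# Hypothetical application servers behind the load-balancing layer.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080"]

# Round-robin: each request goes to the next backend in turn, so two
# machines each see roughly half of the 500 req/s peak (~250 req/s each).
_ring = itertools.cycle(BACKENDS)

def pick_backend() -> str:
    return next(_ring)

if __name__ == "__main__":
    for i in range(4):
        print(f"request {i} -> {pick_backend()}")
```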

At this point the architecture diagram becomes the following:

[Architecture diagram: load balancer distributing requests across two application servers, backed by one database server]

3. Database sharding plus read/write splitting

Suppose the user base keeps growing and reaches 10 million registered users, with 1 million daily active users.

The system layer now sees about 1,000 requests per second. At that layer you can keep scaling out the cluster; the load-balancing layer in front will spread the traffic evenly.

But the database layer now receives about 3,000 requests per second, and that is where things get dicey.

Database-level concurrency has doubled, and you will certainly notice the production database's load climbing higher and higher.

Every time the peak hits, disk I/O, network I/O, memory consumption, and CPU load all come under heavy pressure, and everyone starts worrying whether the database server will hold.

Generally speaking, for an ordinarily configured production database like the one in our example, the advice is to keep combined read and write concurrency under about 3,000 requests per second.

When the database is under too much pressure, the first problem is degraded system performance at peak, because an overloaded database drags performance down.

The second problem: what if the pressure simply takes your database down?

So at this point you must shard the database and split reads from writes: break the single database into multiple databases deployed across multiple database servers, and have these act as primaries carrying the write traffic.

Then attach at least one replica to each primary and let the replicas carry the read traffic.

Assume database-level concurrency is 3,000 requests per second, of which 1,000/s are writes and 2,000/s are reads.

After sharding, you deploy primaries on two database servers to carry the writes, so each server handles 500 writes per second. Each primary gets one replica on its own server, so the two replicas each handle 1,000 reads per second.

In short, as concurrency keeps growing, the focus shifts to the database layer: sharding plus read/write splitting, as sketched below.
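To make the routing concrete, here is a minimal dispatch sketch, assuming two shards keyed by a hash of the user ID and one replica per primary; all connection strings are hypothetical:

```python
import hashlib

# Hypothetical servers: two primaries (writes), one replica each (reads).
PRIMARIES = ["db0-primary:3306", "db1-primary:3306"]
REPLICAS  = ["db0-replica:3306", "db1-replica:3306"]

def shard_index(user_id: int) -> int:
    # Stable hash so a given user always lands on the same shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % len(PRIMARIES)

def route(user_id: int, is_write: bool) -> str:
    i = shard_index(user_id)
    # Writes go to the shard's primary; reads go to its replica.
    return PRIMARIES[i] if is_write else REPLICAS[i]

print(route(42, is_write=True))   # one of the primaries
print(route(42, is_write=False))  # the matching replica
```

In production this logic usually lives in sharding middleware rather than application code, but the routing decision is the same.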

The architecture diagram now looks like this:

[Architecture diagram: load-balanced application cluster over two sharded primaries, each with a read replica]

4. Introduce a cache cluster

From here things get easier. If registered users keep growing, you can keep adding machines; add machines at the system layer, for instance, and it can carry higher concurrency.

At the database layer, if write concurrency keeps rising you add database servers, and sharding lets you scale them out; if read concurrency keeps rising, you add more replicas.

But there is a big problem here: a database is not really built to carry high concurrency. A single database machine typically handles concurrency in the low thousands per second, and database machines tend to be high-spec, expensive hardware, so the cost is steep.

Endlessly adding machines is the wrong move.

High-concurrency architectures almost always include a caching layer, and cache systems are designed precisely to carry high concurrency.

A single cache machine can handle tens of thousands, even hundreds of thousands, of requests per second: one to two orders of magnitude more than a database.

So, based on your system's traffic profile, introduce a cache cluster for the requests that are read-heavy and write-light.

Concretely: whenever you write to the database, also write a copy of the data into the cache cluster, then let the cache cluster carry most of the reads.

This way, the cache cluster lets you carry much higher concurrency with far fewer machine resources.

In the diagram above, reads are currently 2,000 per second, with each of the two replicas absorbing 1,000/s. But perhaps 1,800 of those reads per second are for rarely changing data that could be served straight from cache.

Once you introduce a cache cluster, it absorbs those 1,800 reads per second, and only 200 reads per second ever reach the database.
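As a sketch of that pattern, here is a minimal write-plus-cached-read pair in the spirit of the description above; `db` and `cache` stand for hypothetical client objects (say, a SQL client and a Redis client), and the table name and TTL are illustrative assumptions:

```python
def save_product(db, cache, product_id: str, data: dict) -> None:
    # As described above: write the database, and write a copy to the
    # cache at the same time so the cache can serve subsequent reads.
    db.update("product", product_id, data)
    cache.set(f"product:{product_id}", data, ttl=300)

def get_product(db, cache, product_id: str) -> dict:
    # Most reads (the ~1,800/s of rarely changing data) hit the cache;
    # only misses (~200/s in the example) fall through to the database.
    cached = cache.get(f"product:{product_id}")
    if cached is not None:
        return cached
    data = db.query("product", product_id)
    cache.set(f"product:{product_id}", data, ttl=300)
    return data
```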

As before, here is an architecture diagram so you can get a feel for it:

[Architecture diagram: application cluster serving most reads from a cache cluster, with the sharded database behind it]

What does this architecture buy you?

In the future your system may see tens of thousands of reads per second, but 80% to 90% of them can go through the cache cluster, and a single cache machine may support tens of thousands of reads per second. The machine cost is therefore tiny; two or three machines may well be enough.

Try the same with the database and you might have to keep adding replicas, 10 or 20 machines, to withstand tens of thousands of reads per second, at enormous cost.

So, a quick recap of the third point for carrying high concurrency:

Don't blindly scale out the database; database servers are expensive, and a database was never built to carry high concurrency in the first place.

For read-heavy, write-light traffic, introduce a cache cluster and let it absorb the bulk of the reads.


5. Introduce a message-queue cluster

Now look at write pressure on the database; the story is similar to reads.

If every write request lands directly on the primary databases, that is fine for a while, but what if write pressure keeps climbing?

Say you need to write tens of thousands of rows per second. Do you just keep adding primary machines?

You could, but as before, the machine cost is enormous; that is dictated by the nature of database systems.

For the same resources, a database is too heavy and too complex, so its concurrency capacity sits in the low thousands per second. At this point you need to bring in a different kind of technology.

Message-queue middleware, for example: an MQ cluster is very good at making writes asynchronous and smoothing out peaks (peak shaving).

Suppose you now have 1,000 writes per second, of which, say, 500 must be written to the database immediately, while the other 500 can be handled asynchronously, landing in the database tens of seconds or even minutes later.

In that case you can introduce an MQ cluster: push the 500 deferrable writes per second into the MQ, and use it to shave the peak, for example by consuming at a steady 100 per second and writing those to the database. This dramatically reduces the database's write pressure, as the sketch below shows.
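The article doesn't name a specific MQ (Kafka, RocketMQ, and RabbitMQ are common choices). To show just the peak-shaving idea, here is a minimal in-process sketch where a bursty producer is drained at a steady rate; the queue stands in for the MQ cluster and the print stands in for the actual database write:

```python
import queue
import threading
import time

mq = queue.Queue()  # stands in for the MQ cluster

def produce_burst(n: int) -> None:
    # The ~500/s of deferrable writes go to the MQ instead of the database.
    for i in range(n):
        mq.put({"op": "insert", "row": i})

def consume_steadily(rate_per_s: int = 100) -> None:
    # Drain at a fixed pace (the example's 100/s) no matter how big the
    # burst, so the database only ever sees a smooth write stream.
    while True:
        msg = mq.get()
        time.sleep(1 / rate_per_s)
        print("write to db:", msg)
        mq.task_done()

threading.Thread(target=consume_steadily, daemon=True).start()
produce_burst(500)  # a one-second burst takes ~5s to drain at 100/s
mq.join()           # wait until the burst has been absorbed
```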

The architecture diagram now becomes the following:

[Architecture diagram: deferrable writes flowing through an MQ cluster before reaching the database]

Look at the diagram above: message-queue middleware is itself built for high concurrency, so a single machine usually supports tens of thousands, even a hundred thousand, concurrent requests.

So, just like the cache system, it can carry very high concurrency with very few resources. Using it to absorb the portion of writes that can be asynchronous is no problem at all, and it takes far fewer machines than having the database absorb that traffic directly.

And after the MQ's peak shaving, if it writes to the database at a steady 100/s, then the write pressure the database layer receives becomes 500/s + 100/s = 600/s.

See how much that relieves the database?

So far, through the measures below, we have let the architecture withstand the maximum request pressure with the minimum machine resources while easing the load on the database:

Cluster the system layer.

Shard the database and split reads from writes.

For read-heavy, write-light traffic, introduce a cache cluster.

For heavy write pressure, introduce an MQ cluster.

With that, a first pass at a simple high-concurrency system is complete.

But the story is far from over.

6. Can you now handle the high-concurrency interview question?

Having read this article, do you think you can now answer the high-concurrency interview question well?

Unfortunately, no. And I don't believe a handful of articles could ever fully prepare you for it; there are many reasons why.

First, the topic of high concurrency is itself extremely complex, far beyond what a few articles can cover. The essence is that the architecture of a real high-concurrency system supporting complex business scenarios is genuinely intricate.

Think of a middleware system handling a million requests per second, a gateway system serving ten billion requests a day, a flash-sale promotion system absorbing hundreds of thousands of instantaneous requests per second, or a large e-commerce platform supporting hundreds of millions of users.

To support high-concurrency traffic, the architecture must be designed around the specific business scenarios and their characteristics, yielding all kinds of complex designs that demand substantial underlying technology and the ability to craft sophisticated structures and mechanisms.

The resulting complexity is far beyond the imagination of most students who have never worked on such systems.

And architecture that complex is hard to explain, with all its internal details and its path to production, in a handful of articles.

Second, the topic of high concurrency covers far more than the few subjects in this article: sharding, caching, messaging.

A complete, complex high-concurrency architecture involves a variety of complex infrastructure systems built in-house, sophisticated design along the full request path (hot-key cache architecture, multi-priority high-throughput MQ design, system-level concurrency performance tuning, and so on), complex combinations of techniques into an overall high-concurrency solution, plus related technologies such as NoSQL (Elasticsearch and the like), load balancing, and web servers.

So keep a healthy respect for the technology; these things are hard to express fully in a few articles.

Finally, when a system actually lands in production under high-concurrency scenarios, plenty of technical problems surface.

For example: the message middleware's throughput won't go up and needs tuning; disk writes are under so much pressure that performance degrades; memory consumption balloons until it threatens to blow up; the sharding middleware mysteriously loses data; and so on.

There are far too many questions like that to settle in a handful of articles.

