Platform: how a system supports high concurrency

Reprinted: https://zhuanlan.zhihu.com/p/57160217


(1) How does your system support high concurrency?

In this article, let's talk about a question many students have asked me about, one that leaves people at a loss in interviews: how does your system support high concurrency?

Most students who are asked this simply have no line of thinking to answer with and don't know where to start. In essence, they just haven't been through the tempering of a genuinely high-concurrency system.

Without relevant project experience, you can't distill a set of answers from your own real experience, and then systematically explain how a complex system you worked on supported high concurrency.

So this article takes that as its starting point and uses a simple line of thinking to discuss, roughly, how to answer this question.

Of course, let's first make one premise clear: high-concurrency systems differ from one another. For example: a middleware system handling millions of concurrent requests per second, a gateway system handling tens of billions of requests per day, a flash-sale promotion system handling instantaneous spikes of hundreds of thousands of requests per second.

When they handle high concurrency, each of these systems has its own characteristics, so the architectures that deal with it are not the same.

Likewise, in an e-commerce platform, the order system, the product system, and the inventory system are architected differently for high-concurrency scenarios, because the business behind each is different.

So this article mainly gives you a way of thinking for answering this kind of question; it doesn't involve any complex architecture design, so that when you're asked this in an interview, you won't end up just staring blankly at the interviewer.

To really answer this question well in an interview, we suggest you take this article's line of thinking as a reference, then think more about the systems you're responsible for, and ideally do some related hands-on architecture work.

(2) Consider the simplest system architecture

Assume your system is just starting out: it's deployed on one machine, which connects to a database deployed on a single server.

To make it concrete, say the machine your system is deployed on is 4 cores with 8 GB of RAM, and the database server is 16 cores with 32 GB.

Now assume your system has 100,000 registered users in total, which isn't many. The ratio of daily active users varies by system; taking a fairly typical 10%, that's 10,000 daily active users.

Following the 80/20 rule, take the day's peak as four hours, with 80% of the active users showing up then: 8,000 people active within those 4 hours.

Then say each of them makes about 20 requests to your system per day. The 8,000 peak-hour users therefore generate only about 160,000 requests, which, averaged over the four hours (14,400 seconds), comes to roughly 10 requests per second.

Okay! That has absolutely nothing to do with high concurrency, right?

Then at the system layer it's 10 requests per second, and each request makes several calls to the database, doing CRUD operations and the like.

Let's say one request corresponds to about three database operations. In that case, the database layer also handles only about 30 requests per second, right?

With that database server's configuration, supporting this is absolutely no problem.
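If it helps to see the arithmetic laid out, here is the same back-of-the-envelope estimate as a small runnable sketch. The numbers are the article's own; the class name is just for illustration.

```java
// Back-of-the-envelope capacity estimate, using the numbers from the text above.
public class CapacityEstimate {
    public static void main(String[] args) {
        long registeredUsers = 100_000;
        double dauRatio = 0.10;                 // 10% daily active users
        long dau = (long) (registeredUsers * dauRatio);   // 10,000

        double peakUserShare = 0.80;            // 80/20 rule: 80% of actives in the peak window
        long peakWindowSeconds = 4 * 3600;      // 4-hour peak window = 14,400 s
        long requestsPerUserPerDay = 20;

        long peakRequests = (long) (dau * peakUserShare) * requestsPerUserPerDay; // 160,000
        double systemQps = (double) peakRequests / peakWindowSeconds;             // ~11/s, call it 10
        double dbQps = systemQps * 3;           // ~3 DB operations per request -> ~30/s

        System.out.printf("system: ~%.0f req/s, database: ~%.0f req/s%n", systemQps, dbQps);
    }
}
```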

Shown as a diagram, the system described above looks like this:

[Architecture diagram: a single application server (4-core 8G) connected to a single database server (16-core 32G)]

(3) Cluster deployment of the system

Assume your user base starts growing rapidly: say registered users grow 50-fold, to 5 million.

At that point there are 500,000 daily active users, and the peak load on the system is 500 requests/s. The database then sees about 1,500 requests per second. What happens now?

With the machine configuration above, if the business logic inside your system is fairly complex (the heavy-business-logic kind of system), it will be fairly CPU-intensive.

At that point, when the 4-core 8G machine hits 500 requests/s, you will very likely find the machine's CPU load climbing quite high.

At the database level, though, with the configuration described above, a peak of roughly 1,500 requests/s is actually still bearable.

The main things to watch are the database machine's disk load, network load, CPU load, and memory load; in our production experience, that database configuration has no problem under 1,500 requests/s of pressure.

So the first thing you need to do at this point is support cluster deployment of your system.

You can put a load-balancing layer in front to spread requests evenly across the system layer, so the system can use a cluster of multiple machines to withstand higher concurrency.

For example, assume two machines are deployed at the system layer here; each machine then takes only 250 requests/s.

That way, the CPU load on the two machines drops significantly, and this initial bout of "high concurrency" is covered for now, isn't it?
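As a toy illustration of what that load-balancing layer does, here is a minimal round-robin sketch. In practice you would use something like Nginx, LVS, or a cloud load balancer rather than hand-rolled code, and the host names here are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal round-robin balancer: spreads requests evenly across app servers,
// so two servers each see ~250/s of a 500/s peak.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    public String pick() {
        long n = counter.getAndIncrement();
        return servers.get((int) (n % servers.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("app-server-1:8080", "app-server-2:8080")); // hypothetical hosts
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + lb.pick());
        }
    }
}
```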

If you don't even do this, then as that single machine's load climbs higher and higher, in extreme cases the machine your system is deployed on may not have enough resources to respond to requests; requests will hang, and the system may even go down.

So, a simple summary of the first step:

Add a load-balancing layer to spread requests evenly across the system layer, and deploy the system layer as a cluster of multiple machines, to withstand the initial concurrency pressure.

At this point, the architecture diagram becomes the following:

[Architecture diagram: a load-balancing layer in front of two application servers, backed by a single database server]

(4) Database sharding plus read/write splitting

Assume the number of users keeps growing, reaching 10 million registered users, with 1 million daily active users.

The request volume at the system layer then reaches about 1,000 requests/s. At the system layer you can keep scaling out the cluster; the load-balancing layer in front will spread the traffic evenly anyway.

However, the request volume the database takes will then reach about 3,000/s, and that is a bit of a problem.

The database layer's concurrent requests have doubled, and you will find the load on the online database rising and rising.

At every peak, disk I/O, network I/O, memory consumption, and CPU load will all be very high, and you will be quite worried about whether the database server can hold up.

In general, for the kind of ordinarily configured online database in our example above, the recommendation is that concurrent reads and writes combined not exceed about 3,000/s.

When the database is under too much pressure, the first problem is that system performance at peak may degrade, because an overloaded database drags performance down.

And another thing: what do you do if the excessive pressure makes your database fall over?

So at this point you have to apply sharding (分库分表) plus read/write splitting to the system: that is, split one database into multiple databases deployed across multiple database servers, which serve as primary libraries bearing the write requests.

Then mount at least one replica under each primary library, and let the replicas carry the read requests.

Assume the database layer's concurrent reads and writes total 3,000/s, of which writes account for 1,000/s and reads for 2,000/s.

After sharding, deploy primary libraries on two database servers to take the write requests; each server then carries 500 writes/s. Mount one replica server under each primary, and the two replicas each support 1,000 reads/s.
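To make the routing concrete, here is a minimal sketch of the idea: two shards keyed by user ID, writes routed to a shard's primary, reads to its replica. The JDBC URLs are made up, and real systems would typically use sharding middleware such as ShardingSphere or MyCat rather than hand-written routing.

```java
// Minimal sketch of shard routing plus read/write splitting:
// writes go to the shard's primary, reads go to its replica.
public class ShardRouter {
    // Hypothetical JDBC URLs; shard index = userId % 2.
    private static final String[] PRIMARIES = {
            "jdbc:mysql://db-primary-0:3306/app", "jdbc:mysql://db-primary-1:3306/app"};
    private static final String[] REPLICAS = {
            "jdbc:mysql://db-replica-0:3306/app", "jdbc:mysql://db-replica-1:3306/app"};

    static String routeWrite(long userId) {
        return PRIMARIES[(int) (userId % PRIMARIES.length)];
    }

    static String routeRead(long userId) {
        return REPLICAS[(int) (userId % REPLICAS.length)];
    }

    public static void main(String[] args) {
        System.out.println("write user 42 -> " + routeWrite(42)); // lands on primary-0
        System.out.println("read  user 42 -> " + routeRead(42));  // lands on replica-0
    }
}
```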

A brief summary: when concurrency keeps growing, the things to focus on at the database level are sharding (分库分表) and read/write splitting.

At this point, the architecture looks like the figure below:

[Architecture diagram: application cluster behind a load balancer, with two sharded primary databases, each with one read replica]

(5) Introduce a cache cluster

Then things get easier: if your registered user count keeps increasing, you can keep adding machines, for example keep adding machines at the system layer, to carry higher concurrent request volume.

Then at the database level, if write concurrency keeps rising, scale out by adding database servers; sharding supports adding machines. If read concurrency at the database level keeps rising, scale out by adding more replicas.

But there is a big problem here: the database is not really built to carry high concurrency. Generally speaking, a single database machine carries concurrency on the order of a few thousand per second, and database machines tend to be fairly high-spec, fairly expensive machines, so the cost is high.

If you just keep blindly adding machines, that is actually the wrong approach.

That is why high-concurrency architectures usually include a caching layer; cache systems are designed precisely to carry high concurrency.

A single cache machine can carry tens of thousands of concurrent requests per second, even hundreds of thousands: one to two orders of magnitude more high-concurrency capacity than a database system.

So, based on your system's business characteristics, you can absolutely introduce a cache cluster for the kind of requests that write rarely and read often.

Concretely, when you write to the database, you write a copy of the data to the cache cluster at the same time, and then let the cache cluster carry the bulk of the read requests.

That way, through the cache cluster, you can use fewer machine resources to carry higher concurrency.

For example, in the diagram above, reads are currently 2,000/s, with the two replicas each taking 1,000/s. But perhaps 1,800 of those reads per second are for data that rarely changes and could be read straight from a cache.

Once you introduce a cache cluster, it can absorb those 1,800 reads/s, and the reads that land on the database drop to 200/s.
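As a minimal sketch of the pattern just described (update the database and the cache on writes, serve reads from the cache first), here is a toy version. A ConcurrentHashMap stands in for a real cache cluster such as Redis, and the class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// On write: update the database and the cache together.
// On read: hit the cache first; only misses fall through to the database.
public class CachedProductStore {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Map<Long, String> database = new ConcurrentHashMap<>(); // stand-in for MySQL

    public void write(long id, String value) {
        database.put(id, value);  // write request lands on the primary DB
        cache.put(id, value);     // ...and a copy goes to the cache cluster
    }

    public String read(long id) {
        String v = cache.get(id);
        if (v != null) return v;          // ~1,800/s of reads served here
        v = database.get(id);             // only ~200/s fall through to the DB
        if (v != null) cache.put(id, v);  // repopulate the cache on a miss
        return v;
    }

    public static void main(String[] args) {
        CachedProductStore store = new CachedProductStore();
        store.write(1L, "iPhone");
        System.out.println(store.read(1L)); // served from the cache
    }
}
```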

As before, here is an architecture diagram so you can get a feel for it:

[Architecture diagram: a cache cluster added alongside the database replicas, absorbing most of the read requests]

What are the benefits of this architecture?

In the future, your system's reads might reach tens of thousands per second, but perhaps 80%-90% of them go through the cache cluster, and each cache machine might support tens of thousands of reads per second. So it consumes very few machine resources; maybe two or three machines are enough.

If you tried to do the same with the database, you might have to keep adding replicas, up to 10 or 20 machines, to withstand tens of thousands of reads per second, and that cost is extremely high.

All right, another brief summary. The third point to consider when carrying high concurrency:

Don't blindly scale out the database; database servers are expensive and were never built to carry high concurrency in the first place. For requests that write rarely and read often, introduce a cache cluster and let it absorb the bulk of the reads.

(6) Introduce a message middleware cluster

Next, look at the database write pressure; it's actually a similar story to the reads.

If all of your write requests land directly on the database's primary libraries, that's fine as far as it goes, but what if the write pressure keeps growing?

Say you need to write tens of thousands of rows per second. Do you then just keep adding machines under the primaries as well?

You can, of course, but by the same token you burn a great deal of machine resources; that is determined by the nature of database systems.

With the same resources, a database system is too heavy and too complex, so its concurrency capacity stays on the order of a few thousand per second. At this point you need to bring in other technologies.

For example, message middleware, that is, an MQ cluster. It is very good at handling write requests asynchronously, achieving a "peak shaving, valley filling" (削峰填谷) effect.

Say you currently have 1,000 writes/s, of which, for example, 500 must be written into the database the moment they arrive, but the other 500 can tolerate being handled asynchronously, landing in the database tens of seconds or even minutes later.

In that case you can introduce a message middleware cluster, write the 500 async-tolerant requests per second into the MQ, and use the MQ to shave the peak: for example, consume them at a steady 100/s and write them into the database. This greatly reduces the database's write pressure.
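As a toy illustration of peak shaving under these assumptions, here is a sketch where a BlockingQueue stands in for a real MQ cluster (RocketMQ, Kafka, and the like): bursty producers enqueue the async-tolerant writes, and a single consumer drains them at a steady pace toward the database.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Peak shaving in miniature: a burst of writes is absorbed by the queue,
// then flushed to the database at a steady ~100 writes/s.
public class PeakShavingDemo {
    private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Burst: 500 async-tolerant writes arrive "all at once".
        for (int i = 0; i < 500; i++) {
            queue.put("write-" + i);
        }

        // Consumer drains at a steady pace (10 ms apart, roughly 100 writes/s).
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();
                    // A hypothetical persistToDatabase(msg) would go here.
                    System.out.println("flushed " + msg + " to the database");
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        Thread.sleep(200); // let the demo run briefly, then exit
    }
}
```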

P.S.: The concept of MQ peak shaving was explained in detail in this account's earlier articles on message middleware; if you've forgotten it, you can go back and review them.

At this point, the architecture diagram becomes the following:

[Architecture diagram: a message middleware cluster added in front of the database, buffering the asynchronous portion of the writes]

Look at the architecture diagram above. Message middleware itself is also built for high concurrency, so a single machine usually supports tens of thousands or even on the order of a hundred thousand concurrent requests.

So, like a cache system, it can support very high concurrency with very few resources. Using it to carry the portion of high-concurrency writes that tolerates asynchrony is no problem, and it takes far fewer machines than having the database directly carry that portion.

And after the MQ's peak shaving, say writing to the database at a steady 100/s, the write pressure the database layer receives becomes 500/s + 100/s = 600/s, doesn't it?

See that? Haven't we lightened the database's load?

So far, through the measures below, we can already make the system architecture withstand the greatest possible request pressure with the fewest possible machine resources, while easing the database's burden:

Cluster deployment of the system; sharding plus read/write splitting at the database level; a cache cluster for requests that read often and write rarely; a message middleware cluster for heavy write pressure.

With that, a preliminary, simple account of a high-concurrency system is complete.

But in fact, the story is far from over at this point.

(7) Can you handle high-concurrency interview questions now?

Having read this article, do you think you can now answer the high-concurrency question well in interviews?

Unfortunately, the answer is no. And I think a few articles alone can absolutely not get you to fully answer this question well; there are many reasons for that.

First of all, high concurrency is itself an extremely complex topic, far more than a few articles can explain. Its essence is that real high-concurrency architectures supporting complex business scenarios are genuinely very complex.

For example: middleware systems with millions of concurrent requests per second, gateway systems with tens of billions of requests per day, flash-sale systems with instantaneous spikes of hundreds of thousands of requests per second, large-scale high-concurrency e-commerce platforms supporting hundreds of millions of users, and so on.

To support high-concurrency requests, architecture design combines the concrete business scenario and its characteristics to produce all kinds of complex architectures. That requires a great deal of underlying technology, and the ability to design subtle architectures and mechanisms.

In the end, the architectural complexity these systems exhibit far exceeds the imagination of most people who have never touched them.

If you want to see the architecture design and evolution of systems with some real complexity, you can read the column series I wrote earlier, 《亿级流量系统架构演进》.

But for architectures that complex, it is very hard for a few articles to explain all the details and the process of landing them in production.

Secondly, the topic of high concurrency covers far more than the few topics in this article: sharding, caching, and messaging.

A complete, complex high-concurrency architecture will certainly include all kinds of complex in-house infrastructure systems; all kinds of subtle architecture designs (for example, hotspot cache architecture, multi-priority high-throughput MQ architecture, full-link concurrency performance optimization, and so on); overall technical solutions composed of multiple complex systems; plus related technologies such as NoSQL (Elasticsearch and the like), load balancing, and web servers.

So please keep a healthy awe of technology; these things are very hard to convey fully through a few articles.

Finally, when you actually land things in production, your system will run into plenty of technical problems under high-concurrency scenarios.

For example: MQ throughput that won't go up and needs tuning; terrible performance from excessive disk write pressure; memory consumption so high it threatens to blow up; the sharding middleware losing data for reasons unknown; and so on.

There are very many problems of this kind, and they cannot all be explained through articles either.

(8) What should this article leave you with?

The positioning of this article is really a primer on the high-concurrency interview topic, because I found that most of the students who come to me with this question may not even have understood the most basic architecture-evolution line of thinking laid out here.

Of course, that is also because, never having actually built a high-concurrency system and lacking the relevant experience, it really is hard to understand this question well.

So this article is meant to give students with no exposure an initial feel for what high concurrency is actually about: where it puts pressure on a system, and what you need to introduce into the architecture to support fairly high concurrency pressure reasonably well.

And you can keep thinking along the lines of this article, combining it with technologies you are familiar with.

For example, if you know Elasticsearch, you can ask yourself: under a high-concurrency architecture, could the distributed architecture of ES support high-concurrency search?

The above is merely a starting point for discussion. You must think more in your daily work, draw your own diagrams, and take stock of the request pressure on the systems in your hands. Work out the request pressure spread across each middleware layer, and how best to support higher concurrent requests with the fewest machine resources.

That is what a good line of thinking about high-concurrency architecture design looks like.

If it has that effect, this article has succeeded. Beyond that, I still suggest that on the topic of high concurrency, you think more in the context of the systems you are responsible for.

For example: under your current business scenario, how much request pressure does your system face? If that pressure grew 10-fold, how would your architecture support it? If it grew 100-fold, how would your architecture support it? If it grew 1,000-fold, how would your architecture support it?

Regularly set yourself technical challenges, and push yourself to think about your own systems. Ideally, do more architectural drills, implementation, and practice; only by actually doing it yourself do you get a better feel for it.

Then in interviews, having at least thought things through to some depth and done some practice on the systems you are responsible for, you can give the interviewer a relatively clear and systematic account.

Most students may never get the chance to design a truly large-scale, ultra-high-concurrency architecture. But if this article gets you thinking more about your own projects day to day, and gives you some systematic lines of reasoning to present in interviews, then it has achieved its purpose.

Origin: www.cnblogs.com/ricoo/p/12072667.html