Evolving a tens-of-millions-PV architecture on Alibaba Cloud


Foreword

A good architecture grows through evolution rather than being produced purely by design. At the initial design stage it is impossible to account for every factor, such as high performance, high scalability, and high security, all at once. As business requirements grow and access pressure increases, the architecture evolves step by step into a mature, stable, large-scale architecture. Large websites such as Taobao and Facebook all started from small-scale architectures and gradually evolved into large-scale website architectures.

With the advent of cloud computing, the transition from the IT era to the DT era is under way. How do you build a tens-of-millions-PV architecture in the cloud? Drawing on Alibaba Cloud best practices, this article shares how a small website can evolve, stage by stage, into a tens-of-millions-PV architecture.

The original stage of the architecture: an all-purpose single machine

In the most primitive stage, a single ECS server does everything. For applications such as a traditional corporate website or forum, one ECS instance is enough: the web server, the database, static files, and other resources can all be deployed on the same ECS. Generally, at around 50,000 to 300,000 PV, and combined with kernel parameter tuning, web application performance tuning, and database tuning, a single machine can run stably.

The architecture uses a single ECS:

[Architecture diagram: single ECS]
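As a rough illustration of the kernel parameter tuning mentioned in this single-machine stage, the sketch below writes a few commonly adjusted Linux network settings to a sysctl drop-in file. The specific keys and values are illustrative assumptions, not a prescription, and need to be validated against the actual workload.

```python
# Sketch: render a few commonly tuned kernel network parameters into a sysctl
# drop-in file. Values are illustrative only and must be validated per workload.
TUNABLES = {
    "net.core.somaxconn": 1024,            # length of the TCP accept backlog
    "net.ipv4.tcp_max_syn_backlog": 4096,  # pending SYN queue size
    "net.ipv4.tcp_tw_reuse": 1,            # allow reuse of TIME_WAIT sockets
    "fs.file-max": 655350,                 # system-wide open file descriptor limit
}

def render_sysctl_conf(tunables: dict) -> str:
    return "\n".join(f"{key} = {value}" for key, value in tunables.items()) + "\n"

if __name__ == "__main__":
    conf = render_sysctl_conf(TUNABLES)
    # Apply later with: sysctl -p 99-web-tuning.conf
    with open("99-web-tuning.conf", "w") as f:
        f.write(conf)
    print(conf)
```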

Architectural foundation phase: Physical separation of web and database

When the access pressure reaches 500,000 to 1,000,000 PV, the web application and the database deployed on the same server compete for system resources such as CPU, memory, disk I/O, and bandwidth; a single machine clearly hits a performance bottleneck. We therefore separate them physically, deploying the web application and the database on different machines, which resolves the corresponding performance problems. The architecture here uses ECS + RDS:

[Architecture diagram: ECS + RDS]
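After this split, the only change the application usually needs is to point its database connection at the RDS endpoint instead of localhost. A minimal sketch using the pymysql driver follows; the endpoint, credentials, and database name are hypothetical placeholders.

```python
import pymysql

# Connect to the separately deployed RDS instance instead of a local MySQL.
# Host, credentials, and database name are placeholders for illustration.
conn = pymysql.connect(
    host="rm-xxxxxx.mysql.rds.aliyuncs.com",  # RDS connection address (placeholder)
    port=3306,
    user="app_user",
    password="app_password",
    database="app_db",
    connect_timeout=5,
)

with conn.cursor() as cur:
    cur.execute("SELECT 1")  # simple connectivity check
    print(cur.fetchone())
conn.close()
```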

Architecture dynamic and static separation stage: static cache + file storage

When the access pressure reaches 1 million to 3 million PV, the front-end web service becomes the performance bottleneck: a large number of web requests block, and the server is under pressure on CPU, disk I/O, and bandwidth. At this point we do two things: we move the site's images, JS, CSS, HTML, and other application-related static files into OSS, and we use a CDN to distribute and cache the static resources on edge nodes so that users access the nearest node. By separating dynamic requests from static requests ("dynamic/static separation"), the pressure on the server's disk I/O and bandwidth is effectively relieved.

The architecture adopts CDN + ECS + OSS + RDS:

[Architecture diagram: CDN + ECS + OSS + RDS]
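As an example of moving static files off the web server, the sketch below uploads local assets to an OSS bucket with the oss2 Python SDK; the CDN domain is then pointed at the bucket as its origin. The bucket name, endpoint, and access keys are placeholders.

```python
import os
import oss2

# Placeholders: real values come from the Alibaba Cloud console.
auth = oss2.Auth("<access_key_id>", "<access_key_secret>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-static-bucket")

# Upload every file under ./static so it is served via OSS + CDN
# instead of from the web server's local disk.
for root, _dirs, files in os.walk("static"):
    for name in files:
        local_path = os.path.join(root, name)
        object_key = local_path.replace(os.sep, "/")  # e.g. static/css/site.css
        bucket.put_object_from_file(object_key, local_path)
        print("uploaded", object_key)
```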

Architecture Distributed Phase: Load Balancing

When the access pressure reaches 3 million to 5 million PV, dynamic/static separation has effectively offloaded the static requests, but the pressure of dynamic requests has already left the server overwhelmed. The most obvious symptoms are blocked and delayed front-end access, a growing number of server processes, CPU at 100%, and frequent 502/503/504 error codes. Clearly a single web server can no longer meet the demand, so we add more web servers behind load balancing (the ECS instances can be placed in different availability zones to further improve availability). This marks the farewell to the single-machine era and the move to a distributed architecture.

The architecture adopts CDN + SLB + ECS + OSS + RDS:

[Architecture diagram: CDN + SLB + ECS + OSS + RDS]
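Once several ECS instances sit behind SLB, each web node needs a lightweight health check endpoint for the load balancer to probe, and session state should no longer live on any single node. A minimal Flask sketch follows; the /health path is an assumed convention, so the SLB health check must be configured to match it.

```python
from flask import Flask

app = Flask(__name__)

@app.route("/health")
def health():
    # SLB periodically probes this path; a 200 response marks the node healthy.
    # A real check might also verify database/cache connectivity.
    return "ok", 200

@app.route("/")
def index():
    # Keep handlers stateless: session data should live in a shared store
    # (a cache or database), not in local process memory, so any node can serve any request.
    return "hello from one of the web nodes"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```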

Architecture data caching stage: database caching

When the access pressure reaches 5 million to 10 million PV, load balancing across multiple web servers has solved the performance pressure of dynamic requests, but now the database becomes the bottleneck. Common symptoms are a rising number of blocked RDS connections, CPU at 100%, and soaring IOPS. At this point we introduce a database cache to effectively reduce the access pressure on the database and further improve performance.

The architecture adopts CDN + SLB + ECS + OSS + ApsaraDB for memcache + RDS:

[Architecture diagram: CDN + SLB + ECS + OSS + ApsaraDB for Memcache + RDS]
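The typical usage pattern for the database cache is cache-aside: read from the cache first, fall back to the database on a miss, and write the result back with a TTL. A minimal sketch with the pymemcache client is shown below; the cache endpoint, table, and query are placeholders, and error handling is omitted.

```python
import json
from pymemcache.client.base import Client

# ApsaraDB for Memcache connection address is a placeholder.
cache = Client(("xxxx.ocs.aliyuncs.com", 11211))

def get_product(product_id, db_conn, ttl=300):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database access

    # Cache miss: read from RDS, then populate the cache with a TTL.
    with db_conn.cursor() as cur:
        cur.execute("SELECT id, name, price FROM product WHERE id = %s", (product_id,))
        row = cur.fetchone()
    if row is not None:
        cache.set(key, json.dumps(row, default=str), expire=ttl)
    return row
```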

Architecture Scaling Phase: Vertical Scaling

When traffic reaches 10 million to 50 million PV, the performance problem of file storage has been solved by the OSS object storage service, and the performance problem of static resource access has been solved by the CDN. But as the pressure rises again, the web servers and the database once more become bottlenecks. Here we split the load on the web servers and the database further through vertical scaling.

"What is vertical expansion? It is divided into different servers (or databases) according to different businesses (or databases). This kind of segmentation is called vertical expansion."

The first measure of vertical scaling: business splitting

At the business layer, different functional modules can be split out and deployed on separate servers. For example, the user module, the order module, and the commodity module are each deployed on their own servers.
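One lightweight way to express this split at the application edge is a routing table that maps each business module to the host group serving it. The module names and addresses below are hypothetical.

```python
# Hypothetical mapping from business module to the servers that host it.
MODULE_UPSTREAMS = {
    "user":    ["10.0.1.11:8080", "10.0.1.12:8080"],
    "order":   ["10.0.2.11:8080"],
    "product": ["10.0.3.11:8080"],
}

def pick_upstream(path: str) -> str:
    # e.g. /user/profile -> one of the user module's servers
    module = path.strip("/").split("/", 1)[0]
    hosts = MODULE_UPSTREAMS.get(module)
    if not hosts:
        raise KeyError(f"no upstream configured for module {module!r}")
    return hosts[0]  # a real gateway would load-balance across the hosts

print(pick_upstream("/order/create"))  # -> 10.0.2.11:8080
```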

The second measure of vertical scaling: read/write separation

At the database layer, even with the database cache in place, the pressure on the database can remain high. We split and reduce this pressure further through read/write separation.
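At the application layer, read/write separation usually just means holding two connections, one to the RDS primary for writes and one to a read-only replica for reads, and choosing per statement. A minimal sketch follows; the endpoints and credentials are placeholders, and RDS can also provide this transparently through a read/write splitting address.

```python
import pymysql

def connect(host):
    # Credentials and hosts are placeholders for illustration.
    return pymysql.connect(host=host, user="app_user", password="app_password",
                           database="app_db")

primary = connect("rm-primary.mysql.rds.aliyuncs.com")   # handles INSERT/UPDATE/DELETE
replica = connect("rr-replica.mysql.rds.aliyuncs.com")   # handles SELECT

def execute(sql, args=None):
    # Very naive routing: anything starting with SELECT goes to the replica.
    conn = replica if sql.lstrip().upper().startswith("SELECT") else primary
    with conn.cursor() as cur:
        cur.execute(sql, args)
        if conn is primary:
            conn.commit()
        return cur.fetchall()
```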

The third measure of vertical scaling: splitting the database by business

Building on business splitting and read/write separation, we can also split at the database layer by module: the tables of the user module, the order module, and the commodity module are stored in separate databases, such as a user database, an order database, and a commodity database, and these databases are then deployed on different servers.
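After the split, the application keeps a small map from module to connection settings so that each module's tables are reached only through its own database. The hosts and database names below are hypothetical.

```python
import pymysql

# Hypothetical per-module database endpoints after the split.
MODULE_DATABASES = {
    "user":    {"host": "rm-user.mysql.rds.aliyuncs.com",    "database": "user_db"},
    "order":   {"host": "rm-order.mysql.rds.aliyuncs.com",   "database": "order_db"},
    "product": {"host": "rm-product.mysql.rds.aliyuncs.com", "database": "product_db"},
}

def connect_for(module: str):
    cfg = MODULE_DATABASES[module]
    # Credentials are placeholders; each module only touches its own database.
    return pymysql.connect(host=cfg["host"], database=cfg["database"],
                           user="app_user", password="app_password")
```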

The architecture adopts CDN + SLB + ECS + OSS + ApsaraDB for Memcache + RDS read/write separation:

[Architecture diagram: CDN + SLB + ECS + OSS + ApsaraDB for Memcache + RDS read/write separation]

Architecture distributed + big data stage: horizontal scaling

When traffic reaches 50 million PV and above, beyond the tens-of-millions level, vertical scaling also starts to run out of steam. For example, read/write separation only relieves the read pressure; under high traffic the write pressure on the database remains and becomes a performance bottleneck. Likewise, although database splitting spreads the load across different databases, the data volume of a single table can reach the TB level or more, which clearly exceeds the limits of a traditional relational database.

The first measure of horizontal scaling: add more web servers

After the business has been split vertically and deployed on separate servers, when the pressure increases further we add more web servers to scale horizontally.

The second measure of horizontal scaling: add more SLB instances

A single SLB instance carries the risk of a single point of failure, and it also has performance limits, for example a maximum of around 50,000 QPS. With DNS round-robin, requests are forwarded in turn to SLB instances in different availability zones, achieving horizontal scaling of the SLB layer.
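DNS round-robin here simply means publishing one A record per SLB address under the same domain, so resolvers hand back the addresses in rotating order. The small sketch below lists all IPv4 addresses a hostname currently resolves to; the domain is a placeholder.

```python
import socket

def resolve_all(hostname: str):
    # Return every IPv4 address published for the hostname; with DNS
    # round-robin there is one entry per SLB instance.
    infos = socket.getaddrinfo(hostname, 80,
                               family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# Placeholder domain: in practice this is the website's domain with one
# A record pointing at each SLB in a different availability zone.
print(resolve_all("www.example.com"))
```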

The third measure of horizontal scaling: use a distributed cache

Although ApsaraDB for Memcache is itself distributed internally, a single instance is still a single entry point with the risk of a single point of failure, and it has its own performance limits, for example a peak throughput of around 512 Mbps. We therefore deploy multiple ApsaraDB for Memcache instances and, at the code layer, use a hash algorithm to distribute the cached data across them.
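The "hash at the code layer" mentioned above can be as simple as hashing the key and taking the result modulo the number of cache instances (consistent hashing is the more robust variant, since it limits re-mapping when instances are added or removed). A minimal sketch with pymemcache follows; the instance addresses are placeholders.

```python
import zlib
from pymemcache.client.base import Client

# Placeholders: one client per ApsaraDB for Memcache instance.
CACHE_NODES = [
    Client(("cache-node-1.ocs.aliyuncs.com", 11211)),
    Client(("cache-node-2.ocs.aliyuncs.com", 11211)),
    Client(("cache-node-3.ocs.aliyuncs.com", 11211)),
]

def node_for(key: str) -> Client:
    # Simple modulo hashing: the same key always lands on the same instance.
    return CACHE_NODES[zlib.crc32(key.encode()) % len(CACHE_NODES)]

def cache_set(key, value, ttl=300):
    node_for(key).set(key, value, expire=ttl)

def cache_get(key):
    return node_for(key).get(key)
```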

The fourth measure of horizontal scaling: sharding + NoSQL

Facing the requirements of high concurrency and big data, traditional relational databases are no longer adequate. We turn to DRDS (a distributed MySQL sharding solution) together with OTS (Table Store, a column-oriented distributed database) to solve the problem at its root.
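DRDS performs the sharding transparently, but the underlying idea is the same as the sketch below: a shard key (here user_id) is hashed to decide which physical database and table a row lives in. The shard counts and naming are illustrative assumptions.

```python
# Illustration of hash sharding as performed by a sharding layer such as DRDS.
# 4 physical databases x 8 tables per database = 32 shards (illustrative numbers).
DB_COUNT = 4
TABLES_PER_DB = 8

def locate_order(user_id: int):
    shard = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index = shard // TABLES_PER_DB
    table_index = shard % TABLES_PER_DB
    return f"order_db_{db_index}", f"t_order_{table_index}"

# All orders of the same user land in the same physical table,
# so single-user queries never need to scan every shard.
print(locate_order(10086))  # -> ('order_db_0', 't_order_6')
```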

The architecture adopts CDN + DNS round-robin + SLB + ECS + OSS + ApsaraDB for Memcache + DRDS + OTS:

[Architecture diagram: CDN + DNS round-robin + SLB + ECS + OSS + ApsaraDB for Memcache + DRDS + OTS]

 
