The evolution of large-scale website architecture (after reading)

This article records my notes and thoughts after reading a post from the WeChat public account "Code Farmer Has a Way".

The technical challenges of large websites come mainly from their huge user base, high-concurrency access, and massive data. Once even a simple piece of business logic has to process tens of millions of records and serve hundreds of millions of users, the problem becomes tricky. Large-scale website architecture exists mainly to solve problems of this kind.

Website Architecture at the Beginning Stage

Every large website grows from a small one. A small website has few visitors at first, and a single server is more than enough. The architecture at this stage is shown in the following figure:

Features: the application, database, files, and all other resources live on a single server.

Application server and data service separation

With the rapid growth of the business, one server can no longer meet demand: growing user traffic degrades performance, and growing data exhausts storage space. At this point the application and the data need to be separated. After the separation, the website uses three servers: an application server, a file server, and a database server. Each has different hardware requirements based on the service it runs:

(1) The application server processes a large amount of business logic, so it needs a machine with a powerful CPU;

(2) The database server needs fast disk retrieval and a large database cache, so it needs a machine with fast disks and large memory;

(3) The file server stores a large number of user-uploaded files, so it needs a machine with a large hard disk;

As shown below:


After the application and data are separated, servers with different characteristics take on different service roles, and the website's concurrent processing capability and data storage capacity are greatly improved, supporting further business growth. However, as the number of users keeps increasing, the website faces a new challenge: excessive pressure on the database causes access delays, which in turn hurt the performance of the whole site and the user experience. At this point the architecture needs further optimization.

Using caching to improve website performance

Website access follows the 80/20 rule: 80% of business accesses are concentrated on 20% of the data. Since most accesses hit a small portion of the data, caching that portion in memory reduces the read pressure on the database, speeds up data access across the site, and frees database capacity for writes. Websites use two kinds of cache: a local cache on the application server, and a remote cache on dedicated distributed cache servers. As shown below:


(1) A local cache is faster to access, but it is limited by the application server's memory: the amount of data it can hold is small, and it contends for memory with the application itself;

(2) A remote distributed cache can be deployed as a cluster of dedicated large-memory cache servers, so in theory the cache service is not limited by the memory capacity of any single machine.
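The read path through these two cache tiers is usually the cache-aside pattern. The sketch below is a minimal illustration with hypothetical names, using in-memory dicts to stand in for the local cache, the distributed cache cluster, and the database:

```python
# Cache-aside sketch: check the local cache, then the remote cache,
# then fall through to the database. All names are hypothetical;
# plain dicts stand in for real cache servers and the database.

class CacheAside:
    def __init__(self, db):
        self.local = {}    # per-application-server cache (limited memory)
        self.remote = {}   # stands in for a distributed cache cluster
        self.db = db       # stands in for the database
        self.db_reads = 0  # count how often we actually hit the database

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            value = self.remote[key]
            self.local[key] = value   # populate the faster tier on the way back
            return value
        value = self.db[key]          # full miss: read from the database
        self.db_reads += 1
        self.remote[key] = value      # fill both tiers for later requests
        self.local[key] = value
        return value

store = CacheAside(db={"item:1": "keyboard"})
store.get("item:1")    # first read goes to the database
store.get("item:1")    # repeat reads are served from cache
print(store.db_reads)  # 1
```

Under the 80/20 rule, repeat reads of the hot 20% of keys are absorbed by the caches, which is exactly why database pressure drops.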

With caching in place, database access pressure is effectively relieved, but a single application server can handle only a limited number of request connections. During peak periods, the application server becomes the bottleneck of the whole site.

Using application server clusters

Using clusters is the standard way for websites to handle high concurrency and massive data. When one server runs out of processing power or storage, do not try to replace it with a more powerful machine: for a large website, no single server, however powerful, can keep up with continuously growing business demand. It is more appropriate to add servers to share the access and storage load. As long as adding one server relieves the load, system performance can be improved continuously by adding more servers in the same way, which is what makes the system scalable. Clustering the application servers is a relatively simple and mature form of scalable website architecture, as shown in the following figure:


A load-balancing scheduler distributes requests from users' browsers across the servers in the application server cluster. As the number of users grows, more application servers are added to the cluster to share the load, so the application server is no longer the bottleneck of the whole site.

Database read and write separation

After the website uses caching, most read operations no longer touch the database, but some reads (cache misses, cache expiry) and all writes still must. When the number of users reaches a certain scale, the database again becomes the bottleneck under the load. Most mainstream databases today provide master-slave hot backup: by configuring a master-slave relationship between two database servers, updates on one are synchronized to the other. The website uses this feature to separate database reads from writes, thereby relieving the load pressure on the database. As shown below:


When the application server writes data, it accesses the master database; the master synchronizes the updates to the slave through the master-slave replication mechanism, so that when the application server reads data, it can read from the slave. To make database access convenient after the read-write split, a dedicated data access module is usually used on the application server side, so that the separation is transparent to the application.
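Such a data access module might, at its core, do nothing more than classify each statement and pick a connection target. A rough sketch (hypothetical names; server names are placeholders, and a real module would execute the query on the chosen connection):

```python
import random

class ReadWriteRouter:
    """Route writes to the master and reads to a slave replica,
    so the split is transparent to application code."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def execute(self, sql):
        # Crude classification: treat SELECTs as reads, everything else as writes.
        if sql.lstrip().upper().startswith("SELECT"):
            target = random.choice(self.slaves)  # spread reads across replicas
        else:
            target = self.master                 # all writes go to the master
        return target, sql  # a real module would run `sql` on `target`

router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
target, _ = router.execute("INSERT INTO orders VALUES (1)")
print(target)  # master-db
```

One subtlety a real module must handle is replication lag: a read issued immediately after a write may not yet see the new data on the slave.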

Accelerate website response with reverse proxy and CDN

As the website business keeps developing, the user base grows larger and larger. Because the network environment in China is complex, users in different regions see very different speeds when accessing the site. Studies show that access latency is positively correlated with user churn: the slower the site, the more easily users lose patience and leave. To provide a better experience and retain users, websites need to speed up access. The main means are CDNs and reverse proxies. As shown below:


The basic principle of both CDNs and reverse proxies is caching. The goal of using them is to return data to users as early as possible: on the one hand this speeds up user access, and on the other it reduces the load on the back-end servers.

(1) A CDN is deployed in the network providers' data centers, so a user's requests for website resources are served from the provider's data center closest to that user;

(2) The reverse proxy is deployed in the website's central data center. When a user's request reaches the central data center, the first server it hits is the reverse proxy; if the requested resource is cached there, it is returned to the user directly.
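The caching behavior of a reverse proxy can be sketched in a few lines (hypothetical names; a lambda stands in for the application server cluster, and real proxies such as NGINX or Varnish also honor cache-control headers and expiry):

```python
class ReverseProxy:
    """Serve cached responses directly; forward misses to the back end."""

    def __init__(self, backend):
        self.backend = backend   # stands in for the application server cluster
        self.cache = {}
        self.backend_hits = 0    # how many requests actually reached the backend

    def handle(self, path):
        if path in self.cache:
            return self.cache[path]       # cached: never reaches the backend
        self.backend_hits += 1
        response = self.backend(path)     # miss: ask an application server
        self.cache[path] = response       # keep it for subsequent users
        return response

proxy = ReverseProxy(backend=lambda p: f"rendered {p}")
proxy.handle("/home")
proxy.handle("/home")        # second request is served from the proxy's cache
print(proxy.backend_hits)    # 1
```

The first user to request a page pays the full cost; everyone after them is served from the edge, which is how the proxy both accelerates responses and shields the back end.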

Using Distributed File Systems and Distributed Databases

No single server, however powerful, can meet the ever-growing business demands of a large website. Read-write separation splits the database from one server into two, but as the business develops even that cannot meet demand, and a distributed database becomes necessary. The same goes for the file system, which needs to become a distributed file system. As shown below:


A distributed database is the last resort for splitting website data, used only when the data in a single table grows extremely large. Short of that, the more common way for websites to split data is by business line: the databases of different business lines are deployed on different physical servers.
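Splitting by business line usually comes down to a static routing table that maps each business's tables to its own database server. A minimal sketch, with entirely hypothetical business names and hosts:

```python
# Hypothetical routing table: each business line owns its own database server.
BUSINESS_DBS = {
    "user":  "db-user-host",
    "order": "db-order-host",
    "item":  "db-item-host",
}

def db_for(table):
    """Pick the database server that owns a table, keyed by its business prefix.

    Assumes the (hypothetical) convention that table names start with the
    business name, e.g. 'order_detail' belongs to the 'order' database.
    """
    business = table.split("_", 1)[0]
    return BUSINESS_DBS[business]

print(db_for("order_detail"))  # db-order-host
```

The price of this split is that cross-business joins are no longer possible in SQL; they must be done in application code or avoided by design.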

Using NoSQL and Search Engines

As the website business grows more and more complex, so do the requirements for storing and retrieving data. The website needs to adopt non-relational storage technologies such as NoSQL databases and non-database query technologies such as search engines, as shown below:


Business split

To cope with increasingly complex business scenarios, large websites use divide and conquer: the whole business is split into different product lines. A large shopping site, for example, splits the home page, stores, orders, sellers, buyers, and so on into separate product lines, each owned by a different business team.

On the technical side, the website is likewise divided into many applications along product lines, each deployed independently. Applications can be related through hyperlinks (each navigation link on the home page points to a different application), can distribute data to one another through message queues, and, most often, form one complete interrelated system by accessing the same data storage systems, as shown below:
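The message-queue style of coupling can be illustrated with Python's in-process `queue.Queue` standing in for a real broker such as RabbitMQ or Kafka (function and event names are hypothetical):

```python
from queue import Queue

# Hypothetical event channel between two independently deployed applications:
# the order application publishes events, the seller application consumes them,
# and neither calls the other directly.
order_events = Queue()

def order_app_place_order(order_id):
    """Order application: record the order, then publish an event."""
    order_events.put({"type": "order_placed", "id": order_id})

def seller_app_poll():
    """Seller application: react to order events at its own pace."""
    event = order_events.get()
    return f"notify seller about order {event['id']}"

order_app_place_order(1001)
print(seller_app_poll())  # notify seller about order 1001
```

Because the producer and consumer only share the queue, either application can be redeployed, scaled, or temporarily down without breaking the other.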


Distributed service

As the split pieces of the business get smaller and the storage systems get larger, the overall complexity of the application system grows exponentially, and deployment and maintenance become harder and harder. Since every application needs to connect to every database system, in a website with tens of thousands of servers the number of connections is on the order of the square of the server count, which exhausts database connection resources and leads to denial of service.

Since every application performs many of the same business operations, such as user management and product management, these common services can be extracted and deployed independently. The reusable services connect to the database and provide the common business functions, while each application only manages its own user interface and completes concrete operations by calling the shared services through a distributed services framework, as shown in the following figure:
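The shape of this arrangement can be sketched with two plain classes: only the shared service touches the user database, and applications hold a reference to the service rather than a database connection. All names are hypothetical, and in production the service reference would be a remote RPC stub (Dubbo, gRPC, etc.) rather than a local object:

```python
class UserService:
    """Shared service owning all user data access, deployed independently."""

    def __init__(self):
        self._db = {}  # only this service connects to the user database

    def register(self, uid, name):
        self._db[uid] = name

    def get_name(self, uid):
        return self._db[uid]

class StoreApp:
    """One of many applications; it never opens a database connection."""

    def __init__(self, user_service):
        self.users = user_service  # in production, a remote RPC stub

    def greeting(self, uid):
        return f"Welcome, {self.users.get_name(uid)}"

svc = UserService()
svc.register(42, "Alice")
print(StoreApp(svc).greeting(42))  # Welcome, Alice
```

With N applications and M databases, connections drop from N x M (every app to every database) to N + M (every app to the service tier, the service tier to its databases), which is what resolves the connection exhaustion described above.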


Summary:

As traffic grows: separate the application server from the data server --> cache hot data --> cluster the application servers --> separate database reads and writes --> speed up responses with a reverse proxy and CDN --> deploy the file system and database in a distributed fashion --> adopt non-relational databases and search engines --> split by business --> distributed services. The ultimate goal is to keep each part of the business within a manageable scope.
