The evolution of large-scale website system architecture

Foreword

The system architecture of a mature large-scale website (such as Taobao or JD.com) is not designed from the start with high performance, high availability, and security all in place. It evolves and improves as the number of users grows and business functions expand. Along the way, the development model, technical architecture, and design thinking change greatly, and even the technical staff grows from a few people into a department or an entire product line. A mature system architecture is therefore perfected gradually as the business expands, not built overnight. Systems with different business characteristics also have different priorities: Taobao has to handle search, ordering, and payment across a massive product catalog; Tencent has to deliver real-time messages among hundreds of millions of users; Baidu has to process massive numbers of search requests. Each has its own business characteristics and its own system architecture. Even so, these different websites share common technologies and techniques that are widely used in the architecture of large-scale website systems. Let us get to know them by walking through the evolution of a large-scale website system.

1. The initial website architecture

In the initial architecture, applications, databases, and files are all deployed on one server, as shown in the figure:

image

2. Separation of applications, data and files

With the expansion of the business, one server can no longer meet the performance requirements. Therefore, applications, databases, and files are deployed on independent servers, and different hardware is configured according to the purpose of the server to achieve the best performance.

image

3. Use cache to improve website performance

Alongside hardware optimization, performance is also optimized through software. Most website systems use caching to improve performance. Caching works mainly because of hot data: most website traffic follows the 80/20 rule (80% of requests hit 20% of the data), so caching the hot data shortens the access path for that data and improves the user experience.

Caches are commonly implemented as local caches or distributed caches. CDNs and reverse proxies also cache content; they are discussed later. A local cache, as the name implies, stores data on the application server itself, either in memory or in files. OSCache is a commonly used local cache component. A local cache is fast, but the amount of data it can hold is limited by local space. A distributed cache can hold massive amounts of data and is easy to scale out, which is why it is often used on portal websites, but it is slower than a local cache because each lookup crosses the network. Commonly used distributed caches are Memcached and Redis.
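
The lookup order implied by the two kinds of cache can be sketched as follows. This is a minimal illustration, not a production design: the class names, the LRU eviction policy, and the `load_from_db` callback are assumptions made for the example, and a plain dictionary stands in for a real Memcached or Redis client.

```python
from collections import OrderedDict


class LocalCache:
    """Small in-process LRU cache; capacity models the limited local space."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class TwoLevelCache:
    """Check the local cache, then the distributed cache, then the database."""

    def __init__(self, local, distributed, load_from_db):
        self.local = local
        self.distributed = distributed    # a dict stands in for Memcached/Redis
        self.load_from_db = load_from_db  # hypothetical loader callback
        self.db_reads = 0

    def get(self, key):
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.distributed.get(key)
        if value is None:
            value = self.load_from_db(key)
            self.db_reads += 1
            self.distributed[key] = value
        self.local.put(key, value)
        return value


database = {"item:1": "phone", "item:2": "laptop", "item:3": "camera"}
cache = TwoLevelCache(LocalCache(capacity=2), {}, database.__getitem__)

for key in ["item:1", "item:1", "item:2", "item:3", "item:1"]:
    cache.get(key)
print("database reads:", cache.db_reads)
```

In this run the local cache holds only two entries, so the last request for `item:1` misses locally but is still served by the distributed tier without touching the database.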

4. Using clusters to improve application server performance

As the entry point of the website, the application server receives a large number of requests. We usually spread this load across an application server cluster: a load balancing server is deployed in front of the application servers to schedule user requests and distribute them to the application server nodes according to a distribution policy.

Commonly used load balancing technologies include hardware such as F5, which is relatively expensive, and software such as LVS, Nginx, and HAProxy. LVS is a layer-4 load balancer that selects an internal server based on the destination address and port. Nginx is a layer-7 load balancer, and HAProxy supports both layer 4 and layer 7; at layer 7 an internal server can be selected based on the content of the request. As a result, LVS has a shorter forwarding path than Nginx and HAProxy and delivers higher performance, while Nginx and HAProxy are more configurable. For example, they can separate dynamic and static content by choosing a static resource server or an application server according to the characteristics of the request.
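
The difference between the two layers can be shown with a toy model. The sketch below is only an illustration of the routing decision, not of how LVS, Nginx, or HAProxy are implemented: the class names, the round-robin policy, the server names, and the list of static file suffixes are all assumptions made for the example.

```python
from itertools import cycle

# Assumed set of suffixes that identify static resources.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".html")


class RoundRobinPool:
    """Hands out backend servers in rotation."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


class Layer4Balancer:
    """Chooses a backend from the destination address and port only."""

    def __init__(self, services):
        # services maps (virtual_ip, port) -> list of real servers
        self.pools = {addr: RoundRobinPool(servers) for addr, servers in services.items()}

    def route(self, dest_ip, dest_port):
        return self.pools[(dest_ip, dest_port)].pick()


class Layer7Balancer:
    """Inspects the request path, then chooses a static or application pool."""

    def __init__(self, static_servers, app_servers):
        self.static_pool = RoundRobinPool(static_servers)
        self.app_pool = RoundRobinPool(app_servers)

    def route(self, path):
        is_static = path.endswith(STATIC_SUFFIXES)
        pool = self.static_pool if is_static else self.app_pool
        return pool.pick()


balancer = Layer7Balancer(["static-1", "static-2"], ["app-1", "app-2"])
for path in ["/css/site.css", "/order/42", "/img/logo.png", "/cart"]:
    print(path, "->", balancer.route(path))
```

The layer-4 balancer never looks at the request itself, which is why its path is shorter; the layer-7 balancer pays for parsing the request but gains the ability to separate static and dynamic traffic.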

5. Database read-write separation and sharding (splitting databases and tables)

As the number of users grows, the database becomes the biggest bottleneck. Common ways to improve database performance are read-write separation and sharding (splitting databases and tables). Read-write separation sends writes to a primary database and reads to replicas that are kept in sync through replication. Sharding is either horizontal or vertical. Horizontal splitting breaks up one very large table in a database, such as the user table, across multiple tables or databases. Vertical splitting divides tables by business area, so that, for example, the tables for the user business and the commodity business are placed in different databases.
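
A minimal sketch of these routing decisions is shown below. The router class, the modulo-based shard function, and the database names are hypothetical and chosen only for illustration; real deployments usually rely on database middleware or a driver feature, and must also deal with replication lag, which this sketch ignores.

```python
from itertools import cycle


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads across the replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas)

    def route(self, sql):
        # Classify the statement by its first keyword.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)


def user_shard(user_id, shard_count):
    """Horizontal split: map an integer user id to one of the user-table shards."""
    return user_id % shard_count


# Vertical split: each business area keeps its tables in its own database.
BUSINESS_DATABASES = {"user": "user_db", "commodity": "commodity_db"}

router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT name FROM users WHERE id = 10"))
print(router.route("UPDATE users SET name = 'x' WHERE id = 10"))
print("user 10 lives in shard", user_shard(10, 4))
print("commodity tables live in", BUSINESS_DATABASES["commodity"])
```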

6. Use CDN and reverse proxy to improve website performance

If our servers are deployed in a data center in Chengdu, access is fast for users in Sichuan but slower for users in Beijing. This is because Sichuan and Beijing are regions where China Telecom and China Unicom, respectively, have the stronger networks. Requests from Beijing users have to travel a long path across Internet routers to reach the server in Chengdu, and the responses travel the same long path back, so data transmission takes relatively long. A CDN is often used to solve this problem. A CDN caches content in the carriers' data centers, and users fetch data from the nearest one, which greatly shortens the network path. Professional CDN providers include Lanxun and Wangsu.

The reverse proxy is deployed in the website's own data center. A user request first reaches the reverse proxy server, which returns cached data to the user if it has any; otherwise the request continues on to the application server. This, too, reduces the cost of obtaining data. Common reverse proxies include Squid and Nginx.
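
The request flow through a caching reverse proxy can be sketched as follows. This is a simplified model rather than how Squid or Nginx work internally: the class name, the `fetch_from_origin` callback, and the fixed time-to-live are assumptions added for the example (the text above does not say how cached entries expire).

```python
import time


class CachingReverseProxy:
    """Serves cached responses and falls back to the application server."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # hypothetical call to the app server
        self.ttl_seconds = ttl_seconds              # assumed expiry policy
        self.clock = clock
        self._cache = {}                            # url -> (expires_at, body)
        self.origin_hits = 0

    def handle(self, url):
        now = self.clock()
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the application server is not contacted
        body = self.fetch_from_origin(url)
        self.origin_hits += 1
        self._cache[url] = (now + self.ttl_seconds, body)
        return body


proxy = CachingReverseProxy(lambda url: "page for " + url, ttl_seconds=60)
proxy.handle("/index")
proxy.handle("/index")
print("requests that reached the application server:", proxy.origin_hits)
```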

7. Use a distributed file system

As the number of users and the business volume keep growing, more and more files are generated, and a single file server can no longer meet the demand, so a distributed file system is needed. A commonly used distributed file system is NFS.

8. Use NoSQL and search engines

For queries over massive data, we can achieve better performance by combining a NoSQL database with a search engine. Not all data needs to live in a relational database. Commonly used NoSQL databases include MongoDB and Redis, and search engines include Lucene.

9. Split the application server by business

As the business expands further, the application becomes very bloated. At this point we need to split it by business; Baidu, for example, is divided into news, web search, images, and other businesses. Each business application is responsible for a relatively independent set of operations. The applications communicate with each other through messages or through a shared database.
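
Message-based communication between the split applications can be illustrated with a small publish/subscribe sketch. The broker class, the topic name, and the example applications are hypothetical; the broker here runs in one process and delivers synchronously, whereas a real deployment would use a separate message queue with asynchronous delivery.

```python
class MessageBroker:
    """In-process stand-in for a message queue shared by business applications."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message


broker = MessageBroker()
reserved_stock = []
sent_notices = []

# The inventory and notification applications each react to new orders.
broker.subscribe("order.created", lambda order: reserved_stock.append(order["item"]))
broker.subscribe("order.created", lambda order: sent_notices.append(order["user"]))

# The order application publishes an event and knows nothing about its consumers.
delivered = broker.publish("order.created", {"user": "alice", "item": "phone"})
print(delivered, reserved_stock, sent_notices)
```

The point of the design is that the publishing application depends only on the topic, so further business applications can subscribe later without any change to the order application.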

10. Building distributed services

At this point, we find that every business application uses some basic services, such as the user service, order service, payment service, and security service. These services are the foundation that supports each business application. We extract them and build distributed services on top of a distributed service framework. Alibaba's Dubbo is a good choice.

Summary

The architecture of a large-scale website is continuously improved according to business needs, with specific designs and trade-offs driven by each site's business characteristics. This article describes only some of the technologies and techniques found in a conventional large-scale website.
