The Evolution of System Architecture for Large E-commerce Websites

The system architecture of a mature large-scale website (such as Taobao, Tmall, or Tencent) is not designed from the start for high performance, high availability, and high scalability. It evolves and matures gradually as the business and its features expand. Along the way, the development model, technical architecture, and design thinking also change greatly, and the technical staff grows from a handful of people to a department or even a product line. A mature system architecture is therefore built up step by step as the business expands, not overnight. Systems with different business characteristics have different priorities: Taobao has to handle search, ordering, and payment over a massive catalog of products; Tencent has to deliver real-time messages among hundreds of millions of users; Baidu has to process massive volumes of search requests. Each has its own business characteristics and its own system architecture. Even so, these different websites share a set of common techniques that are widely used in large-scale website architectures. The following walks through the evolution of a large-scale website system to introduce these techniques.

1. Initial website architecture

In the initial architecture, applications, databases, and files are all deployed on one server, as shown in the figure:

image

2. Separation of applications, data and files

With the expansion of the business, one server can no longer meet the performance requirements. Therefore, applications, databases, and files are deployed on independent servers, and different hardware is configured according to the purpose of the server to achieve the best performance.

image

3. Use cache to improve website performance

Alongside hardware optimization, performance is also optimized through software. Most website systems use caching to improve performance. Caching works mainly because hot data exists: most website traffic follows the 80/20 rule (that is, 80% of requests hit 20% of the data), so caching hot data shortens the access path to that data and improves the user experience.

image


Common caching approaches are local caching and distributed caching. There are also CDNs, reverse proxies, and so on, which are discussed later. A local cache, as the name implies, caches data locally on the application server, either in memory or in files; OSCache is a commonly used local cache component. A local cache is fast, but the amount of data it can hold is limited by local space. A distributed cache can hold massive amounts of data and is very easy to scale out, so it is often used in portal websites, but it is not as fast as a local cache. Commonly used distributed caches are Memcached and Redis.
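
The relationship between the two cache levels can be illustrated with a minimal sketch. It assumes a small in-process LRU cache in front of a distributed cache, with the database as the last resort. The names LocalCache, TwoLevelCache, and load_from_db are hypothetical, and a plain dict stands in for Memcached or Redis; this is not the API of any real cache client.

```python
from collections import OrderedDict


class LocalCache:
    """Small in-process LRU cache: fast, but bounded by local memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class TwoLevelCache:
    """Look in the local cache, then the distributed cache, then the database."""

    def __init__(self, local, remote, load_from_db):
        self.local = local
        self.remote = remote              # assumption: a dict stands in for Memcached/Redis
        self.load_from_db = load_from_db  # hypothetical loader for a cache miss

    def get(self, key):
        value = self.local.get(key)
        if value is not None:
            return value                  # fastest path: local hit
        value = self.remote.get(key)
        if value is None:
            value = self.load_from_db(key)
            self.remote[key] = value      # fill the distributed cache
        self.local.put(key, value)        # fill the local cache
        return value
```

With a local capacity of 2, reading keys 1, 2, and 3 evicts key 1 from the local cache, but a later read of key 1 is still served from the distributed level without touching the database.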

4. Using clusters to improve application server performance

As the entry point of the website, the application server bears a large number of requests. The request load is usually shared across an application server cluster: a load balancing server deployed in front of the application servers schedules user requests and distributes them to multiple application server nodes according to a distribution policy.

Architecture 4

Commonly used load balancing technologies include F5 on the hardware side, which is relatively expensive, and LVS, Nginx, and HAProxy on the software side. LVS is a layer-4 load balancer that selects an internal server based on the destination address and port. Nginx and HAProxy are layer-7 load balancers that can select an internal server based on the content of the message. As a result, LVS has a shorter distribution path than Nginx and HAProxy and higher performance. Nginx and HAProxy, on the other hand, are more configurable; for example, they can separate dynamic and static content (choosing a static resource server or an application server according to the characteristics of the request message).
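
As an illustration of a distribution policy and of dynamic/static separation, here is a minimal sketch. It assumes round-robin as the policy and uses the request path suffix as the "characteristic of the request message". RoundRobinBalancer, route, and the server names are hypothetical and do not reflect how LVS, Nginx, or HAProxy are actually configured.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers from a pool in turn (one simple distribution policy)."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


# Assumption: these suffixes identify static resources.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg")


def route(path, static_pool, app_pool):
    """Layer-7 style decision: inspect the request path, then pick a server."""
    pool = static_pool if path.endswith(STATIC_SUFFIXES) else app_pool
    return pool.pick()
```

A layer-4 balancer would make the same kind of choice without looking at the path at all, which is why its distribution path is shorter.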

5. Database read-write separation and splitting of databases and tables

As the number of users grows, the database becomes the biggest bottleneck. Common ways to improve database performance are read-write separation and splitting databases and tables. Read-write separation, as the name implies, divides the database into a read database and a write database and relies on replication between them to keep the data synchronized. Splitting databases and tables takes two forms, horizontal and vertical. Horizontal splitting breaks up a single very large table, such as the user table. Vertical splitting divides by business: tables related to the user business and tables related to the product business, for example, are placed in different databases.
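
The routing decisions involved can be sketched roughly as follows. The sketch assumes that statements starting with SELECT go to a read replica and everything else goes to the write database, and that the user table is split horizontally by user ID modulo the shard count. ReadWriteRouter, user_shard, and the database names are hypothetical.

```python
class ReadWriteRouter:
    """Sends reads to replicas in turn and all other statements to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next = 0

    def target_for(self, sql):
        if sql.lstrip().upper().startswith("SELECT"):
            replica = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return replica
        return self.primary


def user_shard(user_id, shard_count):
    """Horizontal split: one large user table spread over several tables."""
    return f"user_{user_id % shard_count}"


# Vertical split: tables grouped by business into separate databases.
# Assumption: the database names are illustrative only.
VERTICAL_SPLIT = {"user": "user_db", "product": "product_db"}
```

Because replicas are synchronized from the write database with some delay, a read that must see a write made a moment earlier may need to be sent to the write database instead.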

Architecture 3

6. Use CDN and reverse proxy to improve website performance

Suppose all our servers are deployed in a data center in Chengdu. Access is fast for users in Sichuan but slower for users in Beijing. This is because Sichuan and Beijing are regions where China Telecom and China Unicom, respectively, have the stronger networks: a user in Beijing has to travel a long path through interconnecting routers to reach the server in Chengdu, and the return path is just as long, so data transmission takes relatively long. CDNs are commonly used to solve this problem. A CDN caches content in the carriers' data centers, and users fetch data from the nearest one, which greatly shortens the network path. Well-known professional CDN operators include Lanxun and Wangsu.

The reverse proxy is deployed in the website's own data center. When a user request arrives, it reaches the reverse proxy server first, and the reverse proxy returns cached data to the user. If the data is not cached, the request continues on to the application server. This reduces the cost of obtaining data. Common reverse proxies include Squid and Nginx.
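
The hit-or-forward behavior of a reverse proxy can be sketched as below. The expiry time (ttl_seconds) is an assumption added for the example, since the text does not say how long entries are kept. ReverseProxy and fetch_from_origin are hypothetical names, not part of Squid or Nginx.

```python
import time


class ReverseProxy:
    """Answers from its cache when possible, otherwise asks the application server."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # hypothetical call to the app server
        self.ttl_seconds = ttl_seconds              # assumption: fixed expiry per entry
        self.clock = clock
        self._cache = {}                            # url -> (expires_at, body)

    def handle(self, url):
        now = self.clock()
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]                         # cache hit: app server not touched
        body = self.fetch_from_origin(url)          # miss or expired: go to the app server
        self._cache[url] = (now + self.ttl_seconds, body)
        return body
```

A CDN node behaves in a similar way, except that it sits in the carrier's data center close to the user instead of in front of the application servers.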

Architecture 5

7. Use a distributed file system

As the number of users and the business volume keep growing, more and more files are generated, and a single file server can no longer meet demand. At this point a distributed file system is needed. Commonly used distributed file systems include GFS, HDFS, and TFS.

Architecture 5.5

8. Use NoSQL and search engines

For querying and analyzing massive data, NoSQL databases combined with search engines can achieve better performance. Not all data needs to live in a relational database. Commonly used NoSQL databases include MongoDB, HBase, and Redis; search engines include Lucene, Solr, and Elasticsearch.
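
To show why a search engine can answer keyword queries without scanning every row, here is a toy inverted index, the core data structure behind engines of this kind. It is a sketch only: it splits on whitespace, matches all query terms, and does no ranking. InvertedIndex is a hypothetical class, not the API of Lucene, Solr, or Elasticsearch.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the IDs of documents that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

A query touches only the posting sets of its own terms, so lookup cost depends on how many documents match, not on the total size of the catalog.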

Architecture 6

9. Split the application servers by business

As the business expands further, the application becomes very bloated. At this point the application needs to be split by business; Baidu, for example, is split into news, web search, images, and other businesses. Each business application handles relatively independent business operations. The business applications communicate with each other through messages or through a shared database.
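
Communication through messages can be sketched with a tiny in-process publish/subscribe bus. In a real deployment a message queue running on its own servers would sit between the applications; MessageBus, the topic name, and the handlers here are hypothetical and only show the decoupling.

```python
from collections import defaultdict


class MessageBus:
    """Delivers each published message to every handler subscribed to its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher does not know which applications are listening.
        for handler in self._subscribers.get(topic, []):
            handler(message)
```

An order application can publish to a topic such as "order.created" while an inventory application subscribes to it, so neither has to call the other directly.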

Architecture 7

10. Building distributed services

At this point we find that every business application uses some basic business services, such as the user service, order service, payment service, and security service. These services are the basic building blocks that support each business application. We extract them and use a distributed service framework to build distributed services. Alibaba's Dubbo is a good choice.
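
One idea at the center of such frameworks is a registry in which service providers announce their addresses and business applications look them up. The sketch below shows only that idea. ServiceRegistry and the addresses are hypothetical, and this is not Dubbo's API (Dubbo itself is a Java framework).

```python
class ServiceRegistry:
    """Keeps track of which addresses currently provide each named service."""

    def __init__(self):
        self._providers = {}

    def register(self, service_name, address):
        self._providers.setdefault(service_name, []).append(address)

    def unregister(self, service_name, address):
        providers = self._providers.get(service_name, [])
        if address in providers:
            providers.remove(address)

    def lookup(self, service_name):
        """Return a copy of the provider list, or fail if nobody provides the service."""
        providers = self._providers.get(service_name)
        if not providers:
            raise LookupError(f"no provider for {service_name}")
        return list(providers)
```

A business application would call lookup("order-service"), pick one of the returned addresses, and send its request there, so providers can be added or removed without changing the callers.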

Architecture 8

Summary

The architecture of a large-scale website is improved continuously according to business needs, and specific design decisions depend on the characteristics of each business. This article describes only some of the technologies and techniques involved in a conventional large-scale website.

Article source: http://blog.csdn.net/u012388609/article/details/58086636
