The development history of the B/S (Browser/Server) architecture

The system architecture of a mature large website (such as Taobao or JD.com) is not designed from the start with complete high performance, high availability, security, and other characteristics; it evolves and improves as the number of users grows and business functions expand. During this process, the development model, technical architecture, and design ideas all change greatly, and even the technical team grows from a few people into a department or even a product line. A mature system architecture is therefore perfected along with business expansion rather than achieved overnight. Systems with different business characteristics also have different focuses: Taobao has to solve search, ordering, and payment for massive amounts of commodity information, Tencent has to solve real-time message delivery for hundreds of millions of users, and Baidu has to handle massive numbers of search requests. Their businesses differ, and so do their system architectures. Nevertheless, we can still identify common technologies across these different websites, and these technologies and methods can be widely applied in the architecture of large-scale website systems. Let us understand these technologies and means by walking through the evolution of a large-scale website system.
1. The initial site architecture

In the initial architecture, the application, database, and files are all deployed on a single server, as shown in the figure:

2. Separation of applications, data, and files

With the expansion of the business, a single server can no longer meet the performance requirements. Applications, databases, and files are therefore deployed on separate servers, each configured with different hardware according to its purpose, to achieve the best performance.

3. Use cache to improve website performance
While optimizing performance through hardware, we can also optimize it through software. Most website systems use caching to improve performance. Caching works mainly because hot data exists: website access tends to follow the 80/20 rule (that is, 80% of access requests ultimately fall on 20% of the data), so we can cache the hot data, shorten its access path, and improve the user experience.

Common ways to implement caching are local cache and distributed cache; there are also CDNs, reverse proxies, and so on, which are discussed later. A local cache, as the name implies, caches data locally on the application server, either in memory or in files; OSCache is a commonly used local cache component. A local cache is fast, but because local space is limited, the amount of data it can hold is also limited. A distributed cache can hold massive amounts of data and is easy to scale out, so it is often used on portal websites, although it is not as fast as a local cache. Commonly used distributed caches are Memcached and Redis.
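To make the idea concrete, here is a minimal cache-aside sketch in Java (the class and method names are made up for illustration): the application checks a local in-process cache first and only queries the database on a miss. A distributed cache such as Memcached or Redis would replace the in-process map with a network client, but the read path is the same.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal cache-aside sketch: check the local cache first and only hit the
// database on a miss. ProductCache and the loader are hypothetical names.
public class ProductCache {
    // In-process local cache; a distributed cache (Memcached/Redis) would
    // replace this map with a network client.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String getProduct(String id, Function<String, String> loadFromDatabase) {
        // computeIfAbsent loads the value once on a miss and caches it,
        // so later requests for the same hot key skip the database.
        return cache.computeIfAbsent(id, loadFromDatabase);
    }

    public static void main(String[] args) {
        ProductCache cache = new ProductCache();
        Function<String, String> db = id -> "product-row-for-" + id; // stand-in for a real query
        System.out.println(cache.getProduct("42", db)); // miss: loaded from the "database"
        System.out.println(cache.getProduct("42", db)); // hit: served from the local cache
    }
}
```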
4. Use clusters to improve application server performance

As the entry point of the website, application servers bear a large number of requests, so we usually spread the load across an application server cluster. A load balancing server is deployed in front of the application servers to schedule user requests and distribute them to multiple application server nodes according to a distribution strategy.

The commonly used hardware load balancer is F5, which is relatively expensive; commonly used software load balancers are LVS, Nginx, and HAProxy. LVS is a layer-4 load balancer that selects the backend server based on the target address and port; Nginx is a layer-7 load balancer, and HAProxy supports both layer-4 and layer-7 load balancing, so they can select the backend server based on the content of the request. Because LVS's forwarding path is shorter, its performance is higher than Nginx and HAProxy, while Nginx and HAProxy are more configurable and can, for example, separate dynamic and static content (choosing a static resource server or an application server based on characteristics of the request).
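As a rough illustration of what a distribution strategy does, here is a minimal round-robin sketch in Java (the node addresses are made up); real load balancers such as LVS, Nginx, and HAProxy implement this and other strategies far more efficiently at the network level.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin distribution strategy: each incoming request is handed
// to the next application server node in turn.
public class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    public String nextNode() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + lb.nextNode());
        }
    }
}
```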
5. Database read-write separation and sharding (splitting databases and tables)

As the number of users increases, the database becomes the biggest bottleneck. Common ways to improve database performance are read-write separation and sharding. Read-write separation, as the name implies, splits the database into a read library and a write library, with data kept in sync through primary-standby replication. Sharding can be horizontal or vertical: horizontal splitting divides a single large table, such as a user table, across multiple databases or tables; vertical splitting separates tables by business, for example putting the tables related to user business and commodity business into different databases.
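The routing logic behind these techniques can be sketched in a few lines of Java; the connection strings, shard count, and table names below are purely illustrative, not a production implementation.

```java
import java.util.List;

// Minimal routing sketch for read-write separation and horizontal sharding.
public class DbRouter {
    private static final String WRITE_DB = "jdbc:mysql://primary:3306/app";
    private static final List<String> READ_DBS = List.of(
            "jdbc:mysql://replica-1:3306/app",
            "jdbc:mysql://replica-2:3306/app");
    private static final int USER_SHARDS = 4; // user table split into 4 physical tables

    // Read-write separation: writes always hit the primary, reads pick a replica.
    static String routeConnection(boolean isWrite, long userId) {
        if (isWrite) {
            return WRITE_DB;
        }
        return READ_DBS.get((int) (userId % READ_DBS.size()));
    }

    // Horizontal split: choose the physical user table by hashing the shard key.
    static String routeUserTable(long userId) {
        return "user_" + (userId % USER_SHARDS);
    }

    public static void main(String[] args) {
        long userId = 10086L;
        System.out.println("write -> " + routeConnection(true, userId));
        System.out.println("read  -> " + routeConnection(false, userId));
        System.out.println("table -> " + routeUserTable(userId));
    }
}
```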

6. Use a CDN and reverse proxy to improve website performance
If our servers are all deployed in a data center in Chengdu, access is fast for users in Sichuan but slower for users in Beijing, because Sichuan and Beijing belong to the networks of different carriers (China Telecom and China Unicom respectively). Beijing users have to reach the server through a long path across many Internet routers, and the return path is equally long, so data transmission takes a relatively long time. A CDN is often used to solve this problem: the CDN caches content in the carriers' data centers, so a user first fetches data from the nearest carrier node, which greatly shortens the network path. Professional CDN providers include Lanxun and Wangsu.
The reverse proxy is deployed in the website's own data center. When a user's request arrives, it first reaches the reverse proxy server; if the requested data is cached there, it is returned to the user directly, which also reduces the cost of fetching data from the backend. Common reverse proxies include Squid and Nginx.

7. Use a distributed file system

As users grow day by day and business volume increases, more and more files are generated, and a single file server can no longer meet the demand; a distributed file system is needed. A commonly used distributed file system is NFS.

8. Use NoSQL and search engines
For queries over massive data, we use a NoSQL database plus a search engine to achieve better performance; not all data needs to be placed in a relational database. Commonly used NoSQL databases are MongoDB and Redis, and a commonly used search engine is Lucene.
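As a small example of the search-engine side, the sketch below uses Lucene to index two product titles in memory and run a keyword query. Exact class names vary a little between Lucene versions (this roughly follows the 8.x API), so treat it as a sketch rather than a definitive recipe.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Minimal Lucene sketch: index two product titles in memory, then search them.
public class ProductSearch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory dir = new ByteBuffersDirectory(); // in-memory index for the demo

        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            writer.addDocument(doc("Red running shoes"));
            writer.addDocument(doc("Blue hiking boots"));
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new QueryParser("title", analyzer).parse("shoes"), 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("title"));
            }
        }
    }

    private static Document doc(String title) {
        Document d = new Document();
        d.add(new TextField("title", title, Field.Store.YES));
        return d;
    }
}
```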

9. Split the application server by business
As the business further expands, the application becomes bloated, so we need to split it by business, just as Baidu is divided into news, web search, images, and other services. Each business application handles relatively independent operations, and the businesses communicate through messages or a shared database.

10. Build distributed services
At this point we find that every business application uses some common basic services, such as user, order, payment, and security services; these services are the basic building blocks that support each business application. We extract these services and build them as distributed services on top of a distributed service framework; Taobao's Dubbo is a good choice.
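The sketch below illustrates, in plain Java, the idea of extracting a shared basic service behind an interface; it is not Dubbo's actual API. With a distributed service framework, the provider would publish the implementation to a registry and each business application would call the same interface through a remote proxy.

```java
// Plain-Java sketch of extracting a shared basic service behind an interface.
// This is not Dubbo's API: with a distributed service framework, the provider
// registers UserServiceImpl in a registry and each business application
// receives a remote proxy implementing UserService.
public class DistributedServiceSketch {

    // Shared contract that every business application programs against.
    interface UserService {
        String findUserName(long userId);
    }

    // Provider-side implementation; in production this would query the user database.
    static class UserServiceImpl implements UserService {
        @Override
        public String findUserName(long userId) {
            return "user-" + userId; // stand-in for a real lookup
        }
    }

    public static void main(String[] args) {
        // Consumer side: here we call a local instance; a service framework
        // would transparently substitute a network proxy with the same interface.
        UserService userService = new UserServiceImpl();
        System.out.println(userService.findUserName(10086L));
    }
}
```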

Summary
The architecture of a large website is continuously improved according to business needs, and specific designs and trade-offs are made according to different business characteristics.



Origin blog.csdn.net/u012174809/article/details/103070733