Do you understand how the large-scale website pages of the BAT companies are made static?

When you visit large websites such as Taobao and NetEase, have you ever wondered how they serve the homepage, product detail pages and news detail pages? How can they support such heavy traffic?

Many friends will suggest a static solution: user requests get static HTML directly without accessing the database, so performance improves greatly, and it also helps the website's SEO. So today Lao Gu will talk about staticization with you, sharing the problems he ran into with static solutions in his previous work and how those solutions evolved.

Lao Gu will not cover the CDN technology used for static files here. Large-scale websites will definitely use one. If you want to know what a CDN is, you can look it up online, it is relatively simple; here we focus on the technical solutions.

Solution 1: Static HTML pages

This is the first solution Lao Gu used. Let's take a CMS system as an example, similar to NetEase's news website. The core flow chart:

[Figure: core flow chart of Solution 1]

The core idea of the picture above:

1) After the management back end successfully calls the news service to create an article, it sends a message to the message queue.

2) The static service listens for the message and makes the article static, that is, generates an HTML file.

3) A file synchronization tool is installed on the static server. This tool synchronizes only the files that have changed, that is, it does incremental synchronization (Lao Gu used it a long time ago and has forgotten the name of the tool).

4) The synchronization tool copies the HTML files to all web servers.

In this way, when users request pages that rarely change, the web server returns the HTML files directly. There is no need to access the database, and the system throughput is relatively high.
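Step 2 can be sketched roughly as follows. This is only an illustration in Python, not the code Lao Gu used: the message shape, the `PAGE` template and the function names are all assumptions, and the message queue itself is left out.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page layout; the real template is not given in the article.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><div>$body</div></body></html>"
)


def staticize(article: dict, out_dir: Path) -> Path:
    """Render one article to <out_dir>/<id>.html and return the file path."""
    page = PAGE.substitute(
        title=html.escape(article["title"]),
        body=html.escape(article["body"]),
    )
    path = out_dir / f"{article['id']}.html"
    path.write_text(page, encoding="utf-8")
    return path


def on_article_created(message: dict, out_dir: Path) -> Path:
    """Handler the static service would run for each queue message (assumed shape)."""
    return staticize(message["article"], out_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        msg = {"article": {"id": 1, "title": "Hello", "body": "First post"}}
        print(on_article_created(msg, Path(tmp)).name)
```

Note that the layout is baked into every generated file, so a layout change means running `staticize` again for every article.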

Problems with this solution:

1. The page layout is rigid and hard to modify.

If the product manager feels that the layout of the news detail page needs to be adjusted, because the current one is not attractive enough or other modules should be added, we are in trouble. We would need to regenerate every static HTML article. This is unrealistic: at the scale of NetEase the amount of news is huge, and regenerating all of it would overwhelm the system.

2. Pages can be temporarily inconsistent.

A user may see the latest news item and then find that it no longer exists after refreshing. This is because the synchronization tool takes time to reach every web server: the file has been synchronized to web server A, but not yet to web server B. Requests are load-balanced through nginx, so consecutive requests from the same user may land on different web servers. Of course, you can adjust the nginx load balancing strategy to solve this.

3. There are too many HTML files to maintain.

This is an obvious problem. There will be more and more HTML files, which require a lot of storage space, and every web server holds the same copy, which wastes disk space; future migration and maintenance will also be a lot of trouble.

4. The synchronization tool is unstable.

Once there are too many files, the synchronization tool becomes less stable.

This solution is the more traditional one (not recommended).

Solution 2: Pseudo-static

What is pseudo-static?

For example, an article is usually accessed through a link such as http://www.xxx.com/news?id=1, which requests the article with id 1. However, this kind of link is not very SEO-friendly (and SEO is very important to a website), so it is usually changed to http://www.xxx.com/news/1.html, which looks like a static page. Generally, we can use nginx to rewrite the URL. If you are interested, you can learn it yourself; it is relatively simple.

It is called pseudo-static because the request still requires dynamic processing.
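The mapping behind the rewrite can be sketched in a few lines. The article does this with an nginx rewrite rule; the Python below only illustrates the idea, and the `rewrite` function and its pattern are assumptions based on the two example URLs.

```python
import re

# Pseudo-static paths such as /news/1.html (pattern assumed from the example URLs).
PSEUDO_STATIC = re.compile(r"/news/(\d+)\.html")


def rewrite(path: str) -> str:
    """Map /news/<id>.html to /news?id=<id>; leave every other path unchanged."""
    match = PSEUDO_STATIC.fullmatch(path)
    if match is None:
        return path
    return f"/news?id={match.group(1)}"


if __name__ == "__main__":
    print(rewrite("/news/1.html"))  # /news?id=1
```

The user and the search engine see the `.html` address, while the server still handles a dynamic request for id 1.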

In response to the problems of Solution 1 above, the solution evolved further, as shown below:

[Figure: core flow chart of Solution 2]

The core idea of this solution:

1) After the management back end successfully calls the news service to create an article, it sends a message to the message queue.

2) The cache service listens for the message and writes the article content to the cache server.

3) The user initiates a request, and the web server queries the cache server directly by ID.

4) The data is returned to the user.

This solution solves one big problem of Solution 1, namely too many HTML files, because no HTML needs to be generated. Serving from the cache means the database does not need to be accessed, which improves system throughput.
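The four steps can be sketched like this. It is a minimal illustration only: `CacheServer` is an in-memory stand-in for the real cache server, and the key format, message shape and function names are assumptions.

```python
from typing import Dict, Optional, Tuple


class CacheServer:
    """In-memory stand-in for the cache server (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def article_key(article_id: int) -> str:
    """Cache key for one article (hypothetical naming scheme)."""
    return f"news:{article_id}"


def cache_article(cache: CacheServer, message: dict) -> None:
    """Step 2: the cache service stores the article content when the message arrives."""
    article = message["article"]
    cache.set(article_key(article["id"]), article["content"])


def handle_request(cache: CacheServer, article_id: int) -> Tuple[int, str]:
    """Steps 3 and 4: the web server looks the article up by ID, without the database."""
    content = cache.get(article_key(article_id))
    if content is None:
        return 404, "article not found"
    return 200, content
```

Because the cached value is the whole rendered content, any layout change means every cached entry has to be rebuilt.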

But this solution has its own problems:

1. The maintenance cost of the page layout is relatively high, because this solution still puts the entire content in the cache. If you need to modify the layout, you need to rebuild the cache.

2. The distributed cache is under great pressure. Once the cache fails, all requests will query the database, which can bring the system down.

There is also a small problem, real-time data: prices and inventory on the page need to be read from the back end. Of course, you may say this can be handled too: after the user has loaded the product content, the browser sends an asynchronous ajax request to get the inventory. This quietly adds extra requests. (This problem can be ignored.)

Many companies use something similar to this solution, for example Tongcheng Travel.

Solution 3: Layout style template

For the problems of Solution 2, we can adopt OpenResty and solve them with the http template plug-in and Lua scripts. Lao Gu will not introduce OpenResty + Lua here. Interested readers can watch the video course at https://www.roncoo.com/view/139.

As shown below:

[Figure: product detail page architecture with distribution layer nginx, application layer nginx and three-level cache]

Note that we do not need to understand everything in the picture above. It is a fairly complete product detail page solution that involves the concept of three-level caching, which Lao Gu will not cover in depth here.

We mainly look at why there are two layers of nginx at the top, the distribution layer and the application layer. What does this mean?

Application layer nginx

Lao Gu first explains what the application layer nginx means. Nginx is generally used for load balancing, but it can actually do much more, especially with the OpenResty extension and the Lua scripting language. You can think of Lua as playing a role similar to Java here: it can handle business logic dynamically, such as local cache processing, remote http access, and access to redis.

The application layer nginx renders web pages from http templates plus cached data, using Lua scripts.

http template

[Figure: http template rendering in the application layer nginx]

1) The application layer nginx first obtains the product data from its local cache through a Lua script, then renders it with the http template to form the final product detail page and returns it to the user.

2) If the local cache of the application layer nginx does not have this product data, the Lua script initiates an http request to the web server to obtain the product data.

3) The web server requests the product data from redis or the local ehcache (this is where the concept of three-level cache comes in). If the product data exists, it is returned directly; if it does not exist, the web server asks the microservice to read it from the database.

The idea is to solve the layout problem of Solution 2 through http templates. If you need to adjust the layout, you only need to change the template, which is very convenient. It also solves the real-time data problem. The nginx local cache is there to avoid accessing the database and to improve system throughput. You just need to understand the idea. If you don't know OpenResty and Lua, you can look them up online, or you can contact Lao Gu.
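The lookup-then-render flow of the application layer can be sketched as follows. In the article this logic runs in Lua inside OpenResty; the Python below only illustrates the idea, and the template text, the dict used as local cache and the `fetch_from_web_server` callback are all assumptions.

```python
import html
from string import Template
from typing import Callable, Dict, Optional

# Hypothetical layout template; a layout change only needs an edit here.
DETAIL_PAGE = Template("<h1>$name</h1><p>Price: $price</p>")

Product = Dict[str, str]
Fetcher = Callable[[int], Optional[Product]]


def get_product(product_id: int, local_cache: Dict[int, Product],
                fetch_from_web_server: Fetcher) -> Optional[Product]:
    """Try the local cache first; on a miss ask the web server and remember the result."""
    product = local_cache.get(product_id)
    if product is None:
        product = fetch_from_web_server(product_id)
        if product is not None:
            local_cache[product_id] = product
    return product


def render_detail(product_id: int, local_cache: Dict[int, Product],
                  fetch_from_web_server: Fetcher) -> Optional[str]:
    """Render the detail page from cached data plus the template; None if unknown."""
    product = get_product(product_id, local_cache, fetch_from_web_server)
    if product is None:
        return None
    return DETAIL_PAGE.substitute(
        name=html.escape(product["name"]),
        price=html.escape(product["price"]),
    )
```

Only the product data is cached, not the finished page, which is why a new layout does not invalidate anything in the cache.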

Distribution layer nginx

Why is there a distribution layer on top? Because large websites have too many products, and the local cache of an application layer nginx is limited. It is impossible to cache all product data in the local cache of one server; each application layer nginx can cache only part of the product data. By now you probably see why: a consistent hashing algorithm is used to route requests for a given product id to the same application layer nginx server.

[Figure: distribution layer nginx routing product ids to application layer nginx servers by consistent hashing]

The role of the distribution layer nginx is load balancing with a hash strategy, ensuring that each product id is routed to a fixed application layer server.
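A consistent hashing router can be sketched roughly like this. It is a generic illustration in Python, not the configuration Lao Gu used: the `HashRing` class, the server names and the number of virtual nodes are all assumptions.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Stable integer hash of a string (SHA-256 chosen for this sketch)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        # Place `replicas` virtual nodes per server on the ring, sorted by hash.
        ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in ring]
        self._servers = [s for _, s in ring]

    def route(self, product_id: int) -> str:
        """Return the server owning the first virtual node after the key's hash."""
        idx = bisect.bisect_right(self._hashes, _hash(str(product_id)))
        return self._servers[idx % len(self._hashes)]


if __name__ == "__main__":
    ring = HashRing(["app-1", "app-2", "app-3"])
    # The same product id is always routed to the same application layer server.
    print(ring.route(1001) == ring.route(1001))  # True
```

Compared with a plain `hash(id) % N`, removing one server from the ring only moves the product ids that server owned, so the local caches on the other servers stay warm.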

The three-level cache ensures the stability of the system. Even if the redis cache crashes, the other two cache levels still provide protection.

Summary:

  1. Solution 3 is a relatively complete solution, used by many major companies, and can withstand hundreds of millions of requests, but the system is more complex.

  2. If the real-time requirements are not high and the layout is not adjusted frequently, you can consider Solution 2; the system is relatively simple.

Origin blog.csdn.net/lxw1844912514/article/details/132242070