How to design a product details page system with large traffic and large data?

How do you design a fast and reliable storage architecture to support a product system?

The main job of an e-commerce product system is to create, delete, update and query product information; there is no very complicated business logic. The main page it supports is the product details page (this article also calls it the "business details" page). Even so, when designing storage for this system, you need to consider two things.

First, high concurrency. In any e-commerce system, the product details page is one of the most visited pages in the whole system. This is easy to understand: users do not necessarily buy after viewing one product. They look at many product details pages and shop around before buying, so the details page gets far more views than other pages. If the storage design ignores high concurrency, the product system behind the product details page is likely to be the first system overwhelmed by traffic during a big sale.

Second, the scale of product data. I sum up the data scale of the product details page in two phrases: large in quantity, heavy in weight.

Let us first talk about why the quantity is large. For first-tier e-commerce companies in China, the number of SKUs (stock keeping units; in e-commerce you can simply read this as "products") is on the order of hundreds of millions to billions. Of course, there are not really that many distinct products. There are several reasons for the count. The same product comes in different versions and models. For promotional purposes, merchants may also take the same product on and off the shelves repeatedly, or list the same product under different names. All of this has led to an explosion in the number of SKUs.

As for "heavy weight", open any e-commerce product details page and scroll from top to bottom to see how long it is. A details page that fits within ten screens counts as short. There is not only a lot of text but also many pictures and videos, and sometimes even AR/VR features, so every product details page is a "big fat guy".

Storing so many "big fat guys" while also supporting high concurrency makes the storage design of a product system an arduous task.

What data does the product system need to save?

Let us first look at what information a product details page needs to save. I summarized all the information on a product details page in the mind map below.

Here, the gray part on the right comes from other e-commerce systems. We will ignore these for the time being. The colored part on the left is the content that the product system needs to store.

How should so much content be stored? Could you design one product table and put all this data in it, just as you would save order data, and add a few more tables if one is not enough? Do not dismiss the idea: this is exactly what the major e-commerce companies did in their early stages. Today's complex distributed storage architectures evolved bit by bit from there.

The advantages of this approach are that it is fast to build, simple and reliable, but it cannot support much data or concurrency. If you want to build a small e-commerce site quickly and at low cost, it is a perfectly reasonable choice.

Of course, at a larger scale this no longer works. If a single database is not enough, which storage system should hold such complex product data? No single kind of storage can meet all the needs, so the solution is to divide and conquer: split the data the product system stores, according to its characteristics, into basic product information, product parameters, pictures and videos, and product introductions, and store each part separately.

How to store basic product information?

Let us first look at the basic product information, which includes the title and subtitle, price, color and the other most basic attributes of a product. These attributes are fixed and unlikely to change with requirements or from product to product, and this part of the data is not too large. So I still recommend creating a table in the database to hold basic product information.

Then add a cache in front of the database to absorb most read requests. For this cache you can use Redis or Memcached; both are in-memory KV stores and either will do.

Next, let us briefly look at how to use a front cache to cache product data.

When handling a read request for product information, look in the cache first and, if the data is there, return it directly. If it is not in the cache, query the database, return the product information to the page, and put a copy of the data in the cache.

When updating product information, delete the cached data as well as updating the database. Otherwise the data in the database may have changed while the cache still holds the old value, and the product details page keeps showing stale data.

This cache update strategy is called Cache Aside. It is the simplest and most practical cache update strategy and the most widely applicable. If you need to cache data and have no special requirements, consider this strategy first.
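
The read and update paths described above can be sketched as follows. This is a minimal illustration, not a production implementation: two Python dicts stand in for the database table and the Redis/Memcached cache, and the names `get_product` and `update_product` are hypothetical.

```python
# Hypothetical stand-ins: two dicts play the roles of the database table and
# the Redis/Memcached cache so the sketch runs without external services.
database = {1001: {"title": "Laptop", "price": 5999}}
cache = {}


def get_product(sku_id):
    """Read path: try the cache first, fall back to the database on a miss."""
    product = cache.get(sku_id)
    if product is not None:
        return product
    product = database.get(sku_id)
    if product is not None:
        cache[sku_id] = dict(product)  # keep a copy for later reads
    return product


def update_product(sku_id, **changes):
    """Write path: update the database, then delete the cached entry."""
    database[sku_id] = {**database.get(sku_id, {}), **changes}
    cache.pop(sku_id, None)  # the next read reloads the fresh row


print(get_product(1001)["price"])  # cache miss: loaded from the database
update_product(1001, price=5499)
print(get_product(1001)["price"])  # reloaded after the cache entry was deleted
```

Deleting the cache entry on update, rather than rewriting it, keeps the write path simple: the next read repopulates the cache from the database.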

In addition to Cache Aside, there are other strategies such as Read/Write Through and Write Behind, which suit different situations. I will cover them in later lessons.

One reminder when designing the basic product information table: be sure to retain every historical version of the product data. Product data can change at any time, and the product data associated with an order must be the data as it was when the order was placed. You can save a snapshot of each historical version in a history table in MySQL, or in a KV store.
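
One possible way to keep historical versions is sketched below, under the assumption that each product row carries a version number and each order records the version it saw. The dicts stand in for the product table and the history table, and all names (`save_product`, `place_order`, `product_version`) are hypothetical.

```python
import copy

# Hypothetical in-memory stand-ins for the product table and the history table.
products = {}         # sku_id -> current row, including a "version" number
product_history = {}  # (sku_id, version) -> snapshot of that version


def save_product(sku_id, **fields):
    """Write a new version of the product and snapshot it in the history."""
    current = products.get(sku_id, {})
    version = current.get("version", 0) + 1
    row = {**current, **fields, "version": version}
    products[sku_id] = row
    product_history[(sku_id, version)] = copy.deepcopy(row)
    return version


def place_order(order_id, sku_id):
    """An order records the product version current when it was placed."""
    return {"order_id": order_id, "sku_id": sku_id,
            "product_version": products[sku_id]["version"]}


save_product(1001, title="Laptop", price=5999)
order = place_order("A1", 1001)
save_product(1001, price=5499)  # a later price change

snapshot = product_history[(order["sku_id"], order["product_version"])]
print(snapshot["price"])        # the price the order was placed at
print(products[1001]["price"])  # the current price
```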

Use MongoDB to save product parameters

Now let us look at product parameters. Parameters describe the characteristics of a product, for example a computer's memory size, a phone's screen size, the alcohol content of a wine, or the shade of a lipstick. Like the basic attributes, they are structured data. The trouble is that different categories of products have completely different parameters.

If we design a single product parameter table, it will have far too many fields, and every new product category will add more fields to it. This approach does not work.

Since one table cannot solve the problem, we could create a table for each category: for example, a computer parameter table with fields such as CPU model, memory size, graphics card model and hard drive size, and a wine parameter table with fields such as alcohol content, flavor and origin. If there are relatively few categories, say fewer than 100, using dozens of tables for the parameters of different categories is feasible. But is there a better way?

Most databases require data tables to have a fixed structure. But there is a kind of database that does not have this requirement. It is particularly suitable for storing data with unfixed attributes such as "product parameters". This database is MongoDB.

MongoDB is a document-oriented NoSQL database. In MongoDB, the counterparts of tables, rows and columns are collections, documents and fields. They play much the same roles, so to keep things easy to follow I will not insist on the official terms here and will simply say "table, row, column".

The biggest feature of MongoDB is that its "table structure" does not need to be defined in advance. In fact, there is no table structure in MongoDB at all. Since there is no table structure, it allows you to put any data in the same table. You can even save data with completely different structures such as product data, order data, logistics information, etc. in one table. Moreover, it can also support querying based on a certain field of the data.

How is it done? Each row of data in MongoDB is simply converted into BSON format at the storage layer and stored. BSON is a binary encoding of JSON-like documents. Therefore, even within the same table, each row can have a different structure. Of course, such flexibility comes at a price. MongoDB does not support SQL, its support for multi-table joins and complex transactions is relatively weak, and it is not well suited to general-purpose data storage.

However, product parameters involve a large amount of data with inconsistent structure, and MongoDB handles both well. We do not need transactions or multi-table joins here, so MongoDB is practically tailor-made for storing product parameters.
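
To make the idea of differently shaped rows in one "table" concrete, here is a small sketch. It deliberately uses plain Python dicts instead of a MongoDB driver so that it runs without a server; the field names and the `find` helper are illustrative assumptions that only mimic a simple equality filter. With a real driver such as pymongo, the documents would be inserted into a collection and a filter like `{"category": "wine"}` would be passed to the collection's `find` method.

```python
import json

# Two product-parameter documents with different fields, as they might sit
# side by side in one collection. Field names are illustrative assumptions.
product_params = [
    {"sku_id": 1001, "category": "computer",
     "cpu": "8-core", "memory_gb": 16, "disk_gb": 512},
    {"sku_id": 2001, "category": "wine",
     "alcohol_pct": 13.5, "origin": "Bordeaux"},
]


def find(documents, query):
    """Return documents whose fields equal every key/value in the query.

    Mimics a simple equality filter; documents lacking a field never match.
    """
    return [doc for doc in documents
            if all(k in doc and doc[k] == v for k, v in query.items())]


for doc in find(product_params, {"category": "wine"}):
    print(json.dumps(doc))
```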

Save images and videos using object storage

Because pictures and videos take up relatively large storage space, the general storage method is to save only the ID or URL of the picture and video in the database, and the actual pictures and videos are stored separately in the form of files.

Nowadays, image and video storage technology is very mature, and the preferred approach is object storage. All major cloud vendors provide object storage services, such as Qiniu Cloud in China and AWS S3. There are also open source object storage products that can be deployed privately, such as MinIO. The APIs differ from product to product, but the functionality is similar.

Object storage can be simply understood as a KV storage of large files with unlimited capacity. Its storage unit is an object, which is actually a file. It can be a picture, a video, or any other file. Each object has a unique key, and you can use this key to access the corresponding object at any time. The basic functions are to write, access and delete objects.

Most cloud service vendors' object storage provides client APIs, which can be accessed directly from web pages or apps without having to go through back-end services. In this way, when apps and pages upload pictures and videos, they can be directly saved to the object storage, and then the corresponding key can be saved in the product system.
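
The "KV storage of files" model and the upload flow can be sketched as below. The `ObjectStore` class is a toy in-memory stand-in, not any vendor's API, and the `product_images` mapping is a hypothetical placeholder for whatever table the product system uses to keep the keys.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store: a key-value map from key to file bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        key = uuid.uuid4().hex  # unique key for the new object
        self._objects[key] = data
        return key

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)


store = ObjectStore()
product_images = {}  # sku_id -> list of object keys kept by the product system

# The client uploads the file to object storage directly ...
key = store.put(b"<jpeg bytes>")
# ... and only the returned key is saved in the product system.
product_images.setdefault(1001, []).append(key)

print(store.get(product_images[1001][0]) == b"<jpeg bytes>")
```

The product system never handles the file bytes themselves; it only stores and serves the small key.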

When accessing pictures and videos, the real pictures and video files do not need to go through the back-end service of the product system. The page is directly accessed through the URL provided by the object storage, which saves trouble and bandwidth. Moreover, almost all object storage cloud services come with CDN (Content Delivery Network) acceleration service, and the response time is shorter than that of directly requesting business servers.

Many cloud vendors in China have optimized their object storage specifically for pictures and videos. The most useful features are image scaling and video transcoding: you only need to put the pictures and videos into object storage, and you can then fetch an image at any size, while videos are automatically transcoded into versions with various formats and bit rates to suit different apps and scenarios. Anyone who has used it knows how convenient it is.

Make product introduction static

The product introduction takes up the largest share of the product details page and contains a large amount of formatted text, pictures and videos. Pictures and videos naturally go into object storage. The text of the product introduction is usually made static together with the product details page and stored in an HTML file.

What is staticization? Static pages are the opposite of dynamic pages. A typical web system deployed on Tomcat returns dynamic pages, which are generated at request time. For the product details page, for example, a web request arrives carrying the SKU ID; the details page module in Tomcat then queries various databases, calls back-end services, assembles the page dynamically and returns it to the browser.

However, hardly any system does this today. Think about it: for a given SKU, the page you generate dynamically is almost exactly the same every time. Generating it over and over wastes server resources and is slow. More importantly, the concurrency that Tomcat can withstand is not in the same order of magnitude as what Nginx can handle.

Most of the content on the product details page is the product introduction, which rarely changes. It is better to generate the page in advance and save it as static HTML, and return that HTML directly when the details page is requested. This is staticization.

After the business details page is made static, it can not only save server resources, but also use CDN acceleration to place the business details page on the CDN server closest to the user, making the business details page access faster.

Information that changes frequently, such as prices and promotions, cannot be baked into the static page. The front-end page can use AJAX to request it from the product system dynamically. This keeps the benefits of staticization while still showing up-to-date prices and other information.
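
The split between pre-generated content and dynamically fetched prices might look like the sketch below. It renders the page once with Python's `string.Template` and leaves the price element empty for the browser to fill. The template, the `render_static_page` function and the `/api/price` endpoint are all assumptions made for illustration; a real system would publish the generated file to a web server or CDN rather than a temporary directory.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template. The price element is left empty on purpose and
# is filled in by the browser; the /api/price URL is an assumed endpoint.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<div id="price" data-sku="$sku_id"></div>
<div class="intro">$intro_html</div>
<script>
fetch("/api/price?sku=$sku_id")
  .then(r => r.json())
  .then(d => { document.getElementById("price").textContent = d.price; });
</script>
</body>
</html>
""")


def render_static_page(out_dir, sku_id, title, intro_html):
    """Generate the details page once and save it as a static HTML file."""
    content = PAGE.substitute(sku_id=sku_id, title=html.escape(title),
                              intro_html=intro_html)
    path = out_dir / f"{sku_id}.html"
    path.write_text(content, encoding="utf-8")
    return path


out_dir = Path(tempfile.mkdtemp())
page = render_static_page(out_dir, 1001, "Laptop", "<p>Thin and light.</p>")
print(page.name)
```

Because the price is not part of the file, a price change does not require regenerating the page; only changes to the introduction or other static content do.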

Summary

Finally, let us review today's content. The product system needs to store basic product information, product parameters, pictures and videos, product introductions and other data. Basic information and product parameters are stored in MySQL and MongoDB respectively, with Redis as a front cache; pictures and videos are stored in object storage; and the product introduction is made static as part of the static product details page.

I drew the storage of the commodity system as the following picture:

Looking at the picture, what does this storage design achieve? Solid lines in the figure indicate data that is transferred every time the product details page is accessed; dotted lines indicate data that is transferred only when the page's data changes. When a user opens the details page of a SKU, the browser first fetches the page HTML from the CDN, then calls the product system for frequently changing information such as prices, which is served from the Redis cache. Pictures and videos are fetched from the object storage's CDN.

Looking at the result, the pictures, videos and product introductions, which make up the bulk of the data, are served from the CDN node closest to the user, which is fast and saves bandwidth. The only requests that actually reach the product system are for dynamically fetched information such as prices. A Redis query is generally enough, and almost no traffic reaches MySQL.

This storage architecture shifts most requests to cheap and fast CDN servers, so the product system can withstand a large number of concurrent requests with very few servers and little bandwidth.

Origin blog.csdn.net/moshowgame/article/details/131849868