MongoDB in practice on the comment middle platform

This article describes the technical exploration and practice of the vivo comment middle platform in database design.

1. Business background
As the company's business grew and the number of users increased, many projects built their own comment features, and the business forms of these features were largely the same. Each project was designed and implemented separately, which meant a great deal of duplicated work, and the data of different businesses sat in isolated silos that were difficult to connect. We therefore decided to build a company-level comment middle platform that gives business parties quick access to a comment service. After analyzing the comment features of the major mainstream apps, we found that most of them support comments, replies, replies to replies, and likes.

The details are shown in the figure below:

The core business concepts involved are:

[Topic] The subject being commented on, such as a product in the mall, an app in the app store, or a post in the community

[Comment] The content posted by a user on a topic

[Reply] The content posted by the user in response to a comment, including the first-level reply and the second-level reply

2. Database storage selection
During database selection and design, the team compared a variety of mainstream databases and finally narrowed the choice to MySQL and MongoDB.

Because of the nature of the comment business, the storage layer needs the following capabilities:

[Field extension] The fields stored in the comment models of different business parties differ to some degree, so dynamic, automatic field extension must be supported.

[Massive data] As a company-level middle platform service, the amount of data multiplies as more business parties connect, so fast and convenient horizontal scaling and migration are required.

[High availability] As a middle platform product, it must provide fast and stable reads and writes, with read/write splitting and automatic recovery.

The comment business does not involve user assets and has no strict transaction requirements. We therefore chose a MongoDB cluster as the underlying data store.

3. In-depth understanding of the MongoDB cluster
3.1 Cluster architecture
Because a single machine runs into disk, I/O, and CPU bottlenecks, MongoDB provides a sharded cluster deployment architecture, as shown in the figure:

It is mainly composed of the following three parts:

mongos: The routing server, responsible for managing connections from the application side. The application sends its requests to mongos, and mongos forwards each read or write request to the corresponding shard for execution. A cluster can have 1 to N mongos nodes.

config: The configuration server, which stores the metadata and configuration information of sharded collections. It must be deployed as a replica set. mongos obtains routing and metadata information from the config servers.

shard: The mongod service that stores the data of sharded collections. It must also be deployed as a replica set.

3.2 The shard key
MongoDB stores data in collections (the counterpart of MySQL tables). In a sharded cluster, a collection is divided into multiple ranges according to the shard key. Each range forms a chunk, the chunks are distributed across the shards according to the sharding rules, and the resulting metadata is registered in the config servers.
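
As a rough illustration of how a shard key value maps to a chunk and then to a shard, here is a minimal sketch in plain JavaScript. The chunk boundaries and shard names are hypothetical; in a real cluster this metadata lives in the config servers and the lookup is done by mongos.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardA" },
];

// Return the name of the shard whose chunk contains the given shard key value.
function findShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(findShard(999));  // shardA
console.log(findShard(1000)); // shardB
```

Because every chunk is a contiguous range, neighbouring key values usually land in the same chunk, which is what makes range queries on the shard key cheap.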

The shard key can only be specified when the sharded collection is created and cannot be changed afterwards. There are two main sharding strategies:

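Hashed sharding: Shard key values are run through a hash function, so the data is distributed more evenly and randomly. In MongoDB 4.0 a hashed shard key consists of a single field; compound shard keys that include a hashed field are only available in later releases.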

Ranged sharding: Data is distributed according to the values of the shard key, so adjacent key values tend to fall in the same range, which suits range query scenarios. How evenly the data spreads depends on the shard key values themselves.

3.3 Practice in the comment middle platform
3.3.1 Cluster extension
As a middle platform, the service isolates the data of different business parties by table. Take the comment table as an example: each business party that connects gets its own table, so business party A uses comment_clientA and business party B uses comment_clientB. The tables and their indexes are created at onboarding time. This design has several problems:

A single cluster cannot meet the need of some businesses for physical data isolation.

Cluster tuning (such as the time window for splits and migrations) is difficult to set differently for each business's characteristics.

Horizontal expansion scatters the data of a single business party too widely.

Therefore, we extended the MongoDB cluster architecture:

The extended comment MongoDB deployment adds the concepts of [logical cluster] and [physical cluster]. A business party belongs to one logical cluster, and a physical cluster contains multiple logical clusters.

A routing layer is added. The application extends Spring's MongoTemplate and manages the connection pools, so that each business is switched to the MongoDB cluster it belongs to.

Separate MongoDB sharded clusters make physical isolation and differentiated tuning possible.
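
The routing idea can be sketched roughly as follows. This is plain JavaScript rather than the Spring MongoTemplate extension used in the project, and every cluster name, URI, and client id below is hypothetical.

```javascript
// Hypothetical physical clusters, each a separate MongoDB sharded cluster.
const physicalClusters = {
  physical1: "mongodb://mongos-1.example.internal:27017",
  physical2: "mongodb://mongos-2.example.internal:27017",
};

// Hypothetical logical clusters; each one lives on exactly one physical cluster.
const logicalClusters = {
  logicalA: "physical1",
  logicalB: "physical1",
  logicalC: "physical2",
};

// Each business party (client) belongs to one logical cluster.
const clientToLogical = {
  clientA: "logicalA",
  clientB: "logicalC",
};

// Resolve the cluster URI and the collection that hold a client's comments.
function route(client) {
  const logical = clientToLogical[client];
  if (logical === undefined) {
    throw new Error(`unknown client: ${client}`);
  }
  const physical = logicalClusters[logical];
  return {
    logical,
    uri: physicalClusters[physical],
    collection: `comment_${client}`,
  };
}

console.log(route("clientB"));
```

In a real application the routing layer would keep one connection pool per physical cluster and reuse it for every logical cluster mapped to that cluster.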

3.3.2 Shard key selection
In a MongoDB cluster, a collection's data is spread across multiple shards and chunks. We wanted a query for one comment list to touch only a single shard, so we chose ranged sharding.

At first we used a single field as the shard key. Take the comment table as an example. The main fields are {"_id": unique id, "topicId": topic id, "text": text content, "createDate": time}. To keep the comments of one topic id as contiguous as possible, we set the shard key to topicId. During performance testing we found two serious problems:

jumbo chunk problem

Unique key problem

jumbo chunk:

According to the official documentation, the chunk size in MongoDB can be set between 1 MB and 1024 MB. The shard key value is the only basis for dividing chunks. When the data written to a chunk exceeds the configured chunk size, the MongoDB cluster automatically splits the chunk and, if needed, migrates chunks. However, documents with the same shard key value always belong to the same chunk and cannot be split apart, which leads to the jumbo chunk problem.

For example, with a chunk size of 1024 MB and an average document size of 5 KB, a single chunk holds about 210,000 documents. For a hot topic (such as WeChat comments), the number of comments may exceed 400,000, so a single chunk can easily exceed 1024 MB. A chunk that exceeds the maximum size can still serve reads and writes, but it can no longer be split or migrated, which in the long run causes data imbalance across the shards.
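
The arithmetic behind this estimate can be checked with a few lines of JavaScript. The 5 KB document size and the 400,000 comment count are the illustrative figures from the text, not measurements.

```javascript
const chunkSizeBytes = 1024 * 1024 * 1024; // 1024 MB, the maximum chunk size
const docSizeBytes = 5 * 1024;             // assumed average document size of 5 KB

// How many documents fit into one maximum-size chunk.
const docsPerChunk = Math.floor(chunkSizeBytes / docSizeBytes);
console.log(docsPerChunk); // 209715

// Data volume of a hot topic whose comments all share one topicId.
const hotTopicComments = 400000;
const hotTopicBytes = hotTopicComments * docSizeBytes;
console.log((hotTopicBytes / (1024 * 1024)).toFixed(0) + " MB"); // 1953 MB
```

With topicId as the only shard key field, those roughly 1953 MB all belong to one chunk, almost twice the maximum chunk size.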

Unique key problem:

In a sharded MongoDB cluster, unique indexes are subject to additional restrictions: a unique index must contain the shard key. If _id is not the shard key, the _id index only guarantees uniqueness within a single shard.

You cannot specify a unique constraint on a hashed index

For a to-be-sharded collection, you cannot shard the collection if the collection has other unique indexes

For an already-sharded collection, you cannot create unique indexes on other fields

So we deleted the data and the collection, then recreated the collection with a compound shard key made up of topicId and _id. This removes the chunk size limitation described above and also solves the uniqueness problem.
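
To see why the compound key helps, here is a small simulation in plain JavaScript. It is only a sketch of the reasoning, not MongoDB's actual split algorithm; the topic id, the sequential _id values, and the namespace in the comment are made up.

```javascript
// Mongo shell equivalent of the change (hypothetical namespace):
//   sh.shardCollection("commentdb.comment_clientA", { topicId: 1, _id: 1 })

// Compare two shard key tuples field by field.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Return a key that divides the keys into two non-empty ranges,
// or null when every document shares the same shard key value.
function findSplitPoint(keys) {
  const sorted = [...keys].sort(compareKeys);
  const first = sorted[0];
  const mid = sorted[Math.floor(sorted.length / 2)];
  if (compareKeys(mid, first) > 0) return mid;
  const next = sorted.find((k) => compareKeys(k, first) > 0);
  return next === undefined ? null : next;
}

// 1,000 comments on one hot topic.
const comments = Array.from({ length: 1000 }, (_, i) => ({ topicId: 42, _id: i }));

const byTopicOnly = findSplitPoint(comments.map((c) => [c.topicId]));
const byTopicAndId = findSplitPoint(comments.map((c) => [c.topicId, c._id]));
console.log(byTopicOnly);  // null
console.log(byTopicAndId); // [ 42, 500 ]
```

With topicId alone there is no value at which the chunk can be cut, whereas the (topicId, _id) key always offers a split point inside a single topic. Queries that filter on topicId still target a contiguous key range, because topicId is the prefix of the compound key.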

3.4 Migration and expansion
As data is written, when the data in a single chunk exceeds the configured chunk size (or the number of documents in the chunk exceeds a specified value), the MongoDB cluster automatically triggers a chunk split on insert or update.

Splitting leaves the collection's chunks unevenly distributed. When that happens, the MongoDB balancer migrates chunks between shards. The balancer is a background process that manages data migration: if the difference in the number of chunks between shards exceeds a threshold, the balancer migrates data automatically.

The balancer can migrate data online, but migration puts considerable load on the cluster. The general recommendation is to restrict it to off-peak hours with the following setting (see the official documentation for details):
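
The threshold check can be sketched as follows. The threshold values are the ones we recall from the MongoDB 4.0 documentation (2, 4, or 8 depending on the number of chunks in the collection); treat them as an assumption and check the documentation for your version. The shard names and chunk counts are hypothetical.

```javascript
// Migration threshold by total chunk count (assumed from the MongoDB 4.0 docs).
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// chunksPerShard maps a shard name to its chunk count for one collection.
// Balancing is needed once the gap between the fullest and the emptiest
// shard reaches the threshold.
function needsBalancing(chunksPerShard) {
  const counts = Object.values(chunksPerShard);
  const total = counts.reduce((sum, n) => sum + n, 0);
  const diff = Math.max(...counts) - Math.min(...counts);
  return diff >= migrationThreshold(total);
}

console.log(needsBalancing({ shardA: 10, shardB: 9 }));  // false
console.log(needsBalancing({ shardA: 50, shardB: 40 })); // true
```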

// Run in a mongo shell connected to mongos; times use the HH:MM format
use config
db.settings.update(
   { _id: "balancer" },
   { $set: { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } },
   { upsert: true }
)

Scaling out MongoDB is also simple. Prepare a new shard replica set and run the following in a mongo shell connected to mongos:

sh.addShard("<replica_set>/<hostname><:port>")

During expansion, chunk migration also reduces the availability of the cluster, so expansion should only be carried out during off-peak business hours.

4. Conclusion
The MongoDB cluster has been running in production for more than a year in the comment middle platform project. During that time about 10 business parties have been onboarded and more than 100 million comments and replies have been stored, with relatively stable performance. BSON's schema-free documents have also supported rapid iteration across multiple versions of our business. The storage engine's in-memory handling of hot data has greatly improved read efficiency.

For MongoDB, however, moving to a sharded cluster is an irreversible process, and it brings many restrictions on indexes and sharding strategy. For general business use, a replica set can already support TB-level storage and queries, so a sharded cluster is not always necessary.

The content above is based on MongoDB 4.0.9; details may differ slightly in the latest MongoDB versions.

Reference materials: https://docs.mongodb.com/manual/introduction/

Author: vivo official website mall development team

Origin blog.51cto.com/14291117/2642218