Database segmentation: basic ideas and common problems

Advantages:

Splitting data across several databases reduces the load on any single machine;

Splitting tables improves the efficiency of data operations, especially write operations.

 

1. Vertical segmentation

Applicable scenarios: many tables and a large volume of data.

Features: Simple rules, clear business logic, and very low business coupling. Tables used by the same business are placed in the same database. Within each group of vertically segmented tables, find the "root element" and segment horizontally by it: starting from the root element, put all the data directly or indirectly related to it into one shard. For example, on a social networking site almost all data is ultimately associated with a user, so segmenting by user is the best choice. Another example is a forum system. The user and forum modules should be split into two separate shards during vertical segmentation. For the forum module, the Forum is obviously the aggregate root, so it is natural to place all posts and replies of a Forum in the same shard as that Forum.
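A minimal sketch of the forum example, assuming a hypothetical table-to-database mapping and an assumed shard count of 4 (none of these names come from a real system). It shows vertical segmentation as a lookup by business module, and horizontal segmentation keyed on the aggregate root's id:

```python
# Hypothetical mapping of tables to databases for the forum example.
VERTICAL_MAP = {
    "user": "user_db",
    "user_profile": "user_db",
    "forum": "forum_db",
    "post": "forum_db",
    "reply": "forum_db",
}

FORUM_SHARD_COUNT = 4  # assumed number of shards for the forum module


def database_for(table: str) -> str:
    """Vertical segmentation: choose the database by business module."""
    return VERTICAL_MAP[table]


def forum_shard_for(forum_id: int) -> str:
    """Horizontal segmentation by aggregate root.

    Posts and replies carry the forum_id of the Forum they belong to,
    so routing on forum_id keeps them in the same shard as the Forum.
    """
    return f"forum_db_{forum_id % FORUM_SHARD_COUNT}"
```

Because every post and reply is routed by its forum_id rather than its own id, a query for one Forum's content only ever touches one shard.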

 

 

2. Horizontal segmentation

Applicable scenarios: few tables and a large volume of data.

Features: The splitting rules are complex, and later maintenance is complex. Tables that hold a large amount of data are split, and different rows of the same table are placed in different databases.

Advantages of segmentation: index overhead is reduced, and the table lock time of a write operation on a single table is shortened.

For example, suppose the article table holds 5000w (50 million) rows and we need to insert a new row. After the insert, the database has to update the indexes on this table, and maintaining indexes over 50 million rows carries a system overhead that cannot be ignored. Conversely, if we divide this table into 100 tables, from article_001 to article_100, the 50 million rows are spread evenly and each sub-table contains only 500,000 (50w) rows. Inserting a row into a table with only 500,000 rows reduces the indexing time by an order of magnitude, which greatly improves the runtime efficiency and the concurrency of the db.
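The arithmetic and the table naming can be sketched as follows. The text does not say how a row is assigned to one of the 100 sub-tables, so the modulo rule here is an assumption made for illustration:

```python
TOTAL_ROWS = 50_000_000  # 5000w rows in the original article table
TABLE_COUNT = 100        # article_001 .. article_100

# Rows per sub-table when the data is spread evenly.
rows_per_table = TOTAL_ROWS // TABLE_COUNT


def article_table(article_id: int) -> str:
    """Map an article id to a sub-table name.

    Assumed rule: id modulo the table count, shifted by one so the
    names run from article_001 to article_100.
    """
    index = article_id % TABLE_COUNT + 1
    return f"article_{index:03d}"
```

With this rule, ids 0 and 100 both map to article_001, and id 99 maps to article_100.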

Slicing rules:

a. Divide by id range

e.g. using id as the key: ids 1~1000 go to db1, ids 1001~2000 go to db2, ids 2001~3000 go to db3, and so on

Advantages: Partial migration possible

Disadvantage: uneven distribution of data
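A small sketch of range-based routing, assuming a fixed range size of 1000 ids per database and the hypothetical names db1, db2, db3:

```python
RANGE_SIZE = 1000  # assumed number of ids per database


def db_for_id_range(record_id: int) -> str:
    """Route an id to a database by the range it falls in.

    ids 1..1000 map to db1, 1001..2000 to db2, and so on.
    """
    if record_id < 1:
        raise ValueError("id must be >= 1")
    return f"db{(record_id - 1) // RANGE_SIZE + 1}"
```

Since each database owns a contiguous block of ids, one block can be moved to another machine without touching the rest, which is the partial migration mentioned above. The uneven distribution follows from the same property: new ids always land in the newest range.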

 

b. Hash modulo

Hash the id (or use the value of the id directly if it is numeric), and then take the result modulo a fixed number. For example, if the application needs to divide one database into 4 databases, we take the hash value of the id modulo 4, that is, id % 4. There are then four possible results for each operation: when the result is 1, the row goes to db1; when it is 2, to db2; when it is 3, to db3; and when it is 0, to db4. In this way the data is distributed very evenly across the 4 dbs.

Pros: Evenly distributed data

Disadvantages: It is troublesome to migrate data, and data cannot be allocated according to machine performance
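A minimal sketch of the modulo rule with 4 databases. Using CRC32 for non-numeric ids is an assumption of this example; any hash that is stable across processes would do (Python's built-in hash() of a string is randomized per process, so it is avoided here):

```python
import zlib

DB_COUNT = 4  # number of databases, as in the example above


def db_for_id_hash(record_id) -> str:
    """Route an id to db1..db4 by hash modulo.

    Numeric ids are used directly; other ids are hashed with CRC32
    (an assumed choice of stable hash).
    """
    if isinstance(record_id, int):
        hashed = record_id
    else:
        hashed = zlib.crc32(str(record_id).encode("utf-8"))
    remainder = hashed % DB_COUNT
    # A remainder of 0 corresponds to the last database, db4.
    return f"db{remainder if remainder != 0 else DB_COUNT}"
```

The migration problem is visible in the code: changing DB_COUNT changes the remainder for most ids, so most rows would have to move.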

 

c. Save the database configuration in an authentication database

Create a separate db that stores the mapping between user_id and db. Every time you access the data, you must first query this db to find out which db holds it, and only then can you perform the query you actually need.

Pros: Flexibility, one-to-one relationship

Disadvantages: One more query is required before each query, and the performance is greatly reduced
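A sketch of the lookup step, using an in-memory SQLite database as a stand-in for the mapping db. The table name user_db_map and the sample rows are hypothetical:

```python
import sqlite3

# In-memory stand-in for the db that stores the user_id -> db mapping.
lookup = sqlite3.connect(":memory:")
lookup.execute(
    "CREATE TABLE user_db_map (user_id INTEGER PRIMARY KEY, db_name TEXT NOT NULL)"
)
lookup.executemany(
    "INSERT INTO user_db_map VALUES (?, ?)",
    [(1, "db1"), (2, "db3"), (3, "db1")],
)
lookup.commit()


def db_for_user(user_id: int) -> str:
    """First query: find which db holds this user's data."""
    row = lookup.execute(
        "SELECT db_name FROM user_db_map WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no db mapping for user_id={user_id}")
    return row[0]
```

The extra round trip in db_for_user is the cost listed under the disadvantages; the flexibility comes from the fact that any user can be moved by updating a single row of the mapping.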

 

In practice, horizontal and vertical segmentation are usually combined: the system is first segmented vertically, and then individual large tables are segmented horizontally.

 

3. Common problems of segmentation and coping strategies

a. Transaction issues:

There are currently two feasible solutions to the transaction problem: distributed transactions, and transactions controlled jointly by the application and the database. Here is a simple comparison of the two.

Option 1: Use Distributed Transactions

   Advantages: managed by the database, simple and effective

   Disadvantage: High performance cost, especially as the number of shards grows

Option 2: Controlled by the application and the database

    Principle: Split a distributed transaction that spans multiple databases into multiple small transactions, each on a single database, and let the application coordinate these small transactions.

    Advantages: performance advantage

    Disadvantage: Requires the application to be designed flexibly around transaction control. If Spring's transaction management is in use, making this change will face certain difficulties.
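A rough sketch of Option 2, using two in-memory SQLite databases as stand-ins for two shards. The account table, the CHECK constraint, and the transfer function are all hypothetical. Each shard runs its own small transaction, and the application commits both only if both statements succeed:

```python
import sqlite3


def make_shard() -> sqlite3.Connection:
    """Create an in-memory stand-in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.commit()
    return conn


shard_a = make_shard()
shard_b = make_shard()
shard_a.execute("INSERT INTO account VALUES (1, 100)")
shard_b.execute("INSERT INTO account VALUES (2, 50)")
shard_a.commit()
shard_b.commit()


def transfer(src, dst, src_id, dst_id, amount) -> bool:
    """Run one small transaction per shard, coordinated by the application."""
    try:
        src.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, src_id),
        )
        dst.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, dst_id),
        )
    except sqlite3.Error:
        # Either statement failed: undo both small transactions.
        src.rollback()
        dst.rollback()
        return False
    # Both succeeded: commit each small transaction in turn.
    src.commit()
    dst.commit()
    return True
```

This is not atomic across shards. If the process fails between the two commits, one shard is updated and the other is not, so a real application would also need some compensation or retry logic, which is part of the "flexible design" the disadvantage refers to.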

b. The problem of cross-node Join

   As long as the data is segmented, cross-node Join queries are inevitable, although good design and a good choice of segmentation can reduce how often they occur. The common way to solve this problem is to run the query in two steps: find the ids of the associated data in the result set of the first query, then issue a second request to fetch the associated data by those ids.
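The two-step query can be sketched with two in-memory SQLite databases standing in for two nodes. The post and users tables and their rows are hypothetical:

```python
import sqlite3

# Node 1: forum data.
forum_db = sqlite3.connect(":memory:")
forum_db.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
forum_db.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, 10, "Hello"), (2, 11, "Sharding"), (3, 10, "Joins")],
)

# Node 2: user data.
user_db = sqlite3.connect(":memory:")
user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
user_db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(10, "alice"), (11, "bob"), (12, "carol")],
)


def posts_with_authors():
    """Emulate 'post JOIN users' across two nodes with two queries."""
    # First query: fetch the posts and collect the associated user ids.
    posts = forum_db.execute(
        "SELECT id, user_id, title FROM post ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in posts})
    if not user_ids:
        return []
    # Second query: fetch only the users referenced by those ids.
    placeholders = ", ".join("?" for _ in user_ids)
    names = dict(
        user_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})",
            user_ids,
        ).fetchall()
    )
    # Combine the two result sets in the application.
    return [(post_id, title, names.get(user_id)) for post_id, user_id, title in posts]
```

The join itself happens in the application, in the final list comprehension; the databases only ever answer single-node queries.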

c. Cross-node count, order by, group by and aggregation function problems

   These are a class of problems because they all require computation based on the entire set of data. Most proxies do not automatically handle merging. Solution: Similar to solving the cross-node join problem, the results are obtained on each node and merged on the application side. Unlike join, the query of each node can be executed in parallel, so it is often much faster than a single large table. But if the result set is large, the consumption of application memory is a problem.
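A sketch of application-side merging, with plain in-memory lists standing in for three nodes (the shard names and rows are made up). Each node computes its own count and its own sorted top rows in parallel, and the application merges the partial results:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds (article_id, views) rows.
SHARDS = {
    "db1": [(1, 50), (5, 7), (9, 31)],
    "db2": [(2, 12), (6, 44)],
    "db3": [(3, 8), (7, 19), (11, 3)],
}


def count_on_shard(name):
    """Per-node COUNT(*)."""
    return len(SHARDS[name])


def top_views_on_shard(name, limit):
    """Per-node ORDER BY views DESC LIMIT n."""
    return sorted(SHARDS[name], key=lambda row: row[1], reverse=True)[:limit]


def global_count():
    """Sum the per-node counts; the nodes are queried in parallel."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_on_shard, SHARDS))


def global_top_views(limit):
    """Merge the per-node sorted results and keep the global top n."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda name: top_views_on_shard(name, limit), SHARDS))
    merged = heapq.merge(*partials, key=lambda row: row[1], reverse=True)
    return list(merged)[:limit]
```

Each node only returns its own top rows, which limits how much the application has to hold in memory; a query without a limit would have to pull every row from every node, which is the memory problem noted above.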

 
