[Technical Solution Selection] MySQL Database and Table Sharding Schemes

Foreword

As the project keeps iterating, the number of users keeps growing. The data in some database tables is steadily expanding and is rapidly approaching what a single table can handle. Our team lead has therefore started to consider database and table sharding, and I am writing this article to record the evaluation.

1. What is database and table sharding?

Database sharding (sub-database): splitting a single database into multiple databases, so that the data is spread across several databases.
Table splitting (sub-table): splitting a single table into multiple tables, so that the data is spread across several tables.
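
To make the two definitions concrete, here is a minimal routing sketch in Python. It assumes a simple modulo strategy on a numeric sharding key; the database and table names (order_db_N, t_order_N) and the shard counts are hypothetical and are not taken from any real system.

```python
from typing import Tuple

DB_COUNT = 2        # hypothetical number of databases (database sharding)
TABLES_PER_DB = 4   # hypothetical number of tables per database (table splitting)


def route(order_id: int) -> Tuple[str, str]:
    """Map a sharding key to a (database, table) pair using modulo arithmetic."""
    slot = order_id % (DB_COUNT * TABLES_PER_DB)  # global slot in 0..7
    db_index = slot // TABLES_PER_DB              # which database holds the row
    table_index = slot % TABLES_PER_DB            # which table inside that database
    return f"order_db_{db_index}", f"t_order_{table_index}"
```

With these settings, `route(5)` and `route(13)` both resolve to the same database and table, because the two keys fall into the same slot.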

2. Why split databases and tables?

As the platform's business grows, the data volume may keep increasing and can even reach billions of rows. Taking MySQL as an example, a commonly cited rule of thumb is that a single database performs well while it holds fewer than 50 million rows; beyond that threshold, performance drops noticeably as the data keeps growing. Likewise, once a single table exceeds about 10 million rows, performance degrades seriously. Queries take longer, and once concurrency reaches a certain level, requests may stall and can even bring the whole system down.

3. How to choose a sharding strategy?

| Sharding scheme | Problem solved |
| --- | --- |
| Split databases only, not tables | Database read/write QPS is too high and the number of database connections is insufficient |
| Split tables only, not databases | A single table holds too much data and storage performance hits a bottleneck |
| Split both databases and tables | Insufficient connections plus the storage performance bottleneck caused by excessive data volume |

4. Problems introduced by database and table sharding

Database and table sharding effectively relieves the performance pressure brought by large data volumes and high concurrency, and it can also break through bottlenecks in network IO, hardware resources, and connection counts. However, it also introduces some problems of its own.

4.1. Transaction consistency

Because sharding distributes data across different databases, or even different servers, it inevitably leads to distributed transaction problems, and additional programming is needed to handle them.

4.2. Cross-node joins

Before sharding, we can use SQL like the following to join store information when retrieving products:

SELECT p.*, s.[store name], s.[reputation]
FROM [product information] p
LEFT JOIN [store information] s ON p.[owning store] = s.id
WHERE ... ORDER BY ... LIMIT ...

After sharding, however, [product information] and [store information] may no longer be in the same database or table, or even on the same server, so they cannot be joined in a single SQL statement. Additional programming is needed to solve this problem.
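
One common workaround is to do the join in application code: query the products first, then fetch the matching stores in one batched lookup and stitch the rows together. The sketch below assumes each product row carries the id of its store; the in-memory PRODUCT_DB and STORE_DB structures and all function names are hypothetical stand-ins for two separate databases.

```python
# Hypothetical in-memory stand-ins for two separate databases.
PRODUCT_DB = [
    {"id": 1, "name": "keyboard", "store_id": 10},
    {"id": 2, "name": "mouse", "store_id": 11},
    {"id": 3, "name": "cable", "store_id": 99},  # its store row is missing
]
STORE_DB = {
    10: {"id": 10, "store_name": "Alpha", "reputation": 5},
    11: {"id": 11, "store_name": "Beta", "reputation": 4},
}


def fetch_products():
    """Stand-in for a SELECT against the product database."""
    return list(PRODUCT_DB)


def fetch_stores_by_ids(store_ids):
    """Stand-in for one batched lookup (an IN (...) query) against the store database."""
    return {sid: STORE_DB[sid] for sid in store_ids if sid in STORE_DB}


def products_with_store():
    """Emulate the LEFT JOIN in application code."""
    products = fetch_products()
    stores = fetch_stores_by_ids({p["store_id"] for p in products})
    joined = []
    for p in products:
        store = stores.get(p["store_id"])
        joined.append({
            **p,
            # LEFT JOIN semantics: keep the product even when no store matches.
            "store_name": store["store_name"] if store else None,
            "reputation": store["reputation"] if store else None,
        })
    return joined
```

Batching the store lookup avoids issuing one query per product, which matters once the two tables live on different servers.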

4.3. Cross-node pagination, sorting, and aggregation functions

When a query spans multiple nodes and databases, LIMIT pagination, ORDER BY sorting, and aggregation functions all become more complicated. The data must first be sorted and returned by each shard node, and the result sets from the different shards must then be merged and re-sorted. For example, for a product database that has been split horizontally, to sort by ID in descending order and fetch the first page, each shard returns its own first page sorted by ID descending, and the application merges these partial results, re-sorts them, and keeps the first page.

Fetching the first page this way has little impact on performance. However, because product rows may be distributed across the databases more or less at random, fetching page N requires every node to return its first N pages of data, which are then merged and globally re-sorted. This is clearly inefficient, so the deeper the requested page, the worse the system performs.
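
The page-N cost can be seen in a small scatter-gather sketch. It assumes each shard can return its own rows sorted by ID descending; the fetch_top helper and the lists of integers that stand in for shard contents are hypothetical.

```python
import heapq
from itertools import islice


def fetch_top(shard_rows, limit):
    """Stand-in for: SELECT id FROM product ORDER BY id DESC LIMIT :limit on one shard."""
    return sorted(shard_rows, reverse=True)[:limit]


def page_across_shards(shards, page, page_size):
    """Return the given page (1-based) of IDs in descending order across all shards."""
    needed = page * page_size
    # Every shard must return its first `page` pages, because any single shard
    # might hold all of the rows that belong on the requested page.
    partials = [fetch_top(rows, needed) for rows in shards]
    # Merge the already-sorted partial results while keeping descending order.
    merged = heapq.merge(*partials, reverse=True)
    return list(islice(merged, needed - page_size, needed))
```

Note that the amount of data pulled from each shard grows linearly with the page number, even though only one page is finally returned.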

Calculations with functions such as MAX, MIN, SUM, and COUNT work much like sorting and pagination: the function is first executed on each shard, the per-shard results are then combined and recalculated, and the final result is returned.
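
As a rough illustration of that recombination step, the sketch below merges per-shard partial results. It assumes each shard has already returned its own count, sum, min, and max, and that every shard matched at least one row; the dictionary layout is a hypothetical choice for this example.

```python
def merge_aggregates(partials):
    """Combine per-shard aggregates into global ones."""
    total_count = sum(p["count"] for p in partials)
    total_sum = sum(p["sum"] for p in partials)
    return {
        "count": total_count,                     # COUNT: add the shard counts
        "sum": total_sum,                         # SUM: add the shard sums
        "min": min(p["min"] for p in partials),   # MIN: minimum of shard minimums
        "max": max(p["max"] for p in partials),   # MAX: maximum of shard maximums
        # An average must be rebuilt from sum and count; averaging the
        # per-shard averages gives a wrong answer when shard sizes differ.
        "avg": total_sum / total_count if total_count else None,
    }
```

For example, two shards with averages 15 (two rows) and 60 (one row) have a true global average of 30, not 37.5.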

4.4. Avoiding duplicate primary keys

In a sharded environment, the rows of one logical table live in several databases at once, so the usual auto-increment primary key is no longer sufficient: an ID generated by one shard is not guaranteed to be globally unique. A global primary key scheme therefore has to be designed separately to avoid duplicate keys across databases.
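
One widely used family of solutions is the Snowflake-style ID, which packs a timestamp, a worker id, and a per-millisecond sequence into a single integer. The sketch below is a simplified illustration of that idea under my own assumptions (bit widths, custom epoch, class name); it is not the implementation used by any particular middleware.

```python
import threading
import time


class SnowflakeLikeIdGenerator:
    """Sketch of a Snowflake-style ID: timestamp | worker id | sequence."""

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
    WORKER_BITS = 10              # up to 1024 generator nodes
    SEQUENCE_BITS = 12            # up to 4096 IDs per node per millisecond

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                # Same millisecond: bump the sequence, wrapping at 4096.
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS)
            return timestamp_part | (self._worker_id << self.SEQUENCE_BITS) | self._sequence
```

Each database node (or application instance) would be given its own worker_id, so IDs generated on different nodes never collide, and IDs from one node increase over time.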

After sharding, the data is scattered across different servers, databases, and tables, so it can no longer be operated on in the conventional way, and a series of problems follows. During development we need middleware to solve these problems. There are many middleware products to choose from, of which Sharding-JDBC and MyCat are the more popular.

5. Using sharding components to solve these problems

Technical solutions for database and table sharding generally fall into two categories: application-layer dependency middleware and middle-tier proxy middleware.

When choosing a solution, we mainly consider whether it is open source, development cost, learning cost, technical complexity, the size of its user base, and the amount of reference material available.

I have not used every one of these technologies myself, so what follows is only a preliminary survey and selection within the limits of my own knowledge. These components all offer fairly complete solutions to the major sharding problems and differ mainly in the details. Given the project's current positioning, only lightweight database and table sharding is needed, so I lean toward solutions with lower cost and lower complexity.

MyCat and Sharding-JDBC are currently the most widely used. MyCat is middle-tier proxy middleware, while Sharding-JDBC is application-layer dependency middleware.

5.1. Atlas

Qihoo 360
Keywords: database and table sharding Atlas
Baidu returns about 707,000 relevant results
Middle-tier proxy middleware

https://github.com/Qihoo360/Atlas
Last maintained on GitHub 4 years ago

  • Advantages

    1. Implements read/write splitting (the /*master*/ hint can force access to the primary database, and weights can be configured for read load balancing)
    2. Maintains its own connection pool, reducing the overhead of creating connections
    3. Supports bringing databases online and offline dynamically, which makes horizontal scaling easier
    4. Supports IP filtering for simple access control
    5. Can log all SQL for simple auditing
  • Disadvantages

    1. Performance loss is about 30%-35% compared with connecting directly to the database
    2. Response time is about 1.5 to 2 times that of a direct database connection
    3. Table splitting support is weak, and splitting a table across different databases is not supported
    4. The Atlas configuration does not yet support dynamic reloading. After changing the configuration, Atlas must be restarted, which may slightly affect the business (in practice this can be handled with HA or by restarting during off-peak hours, so the problem is not particularly urgent). It may still be worth trying.

5.2. Cobar

Alibaba
Keywords: database and table sharding Cobar
Baidu returns about 936,000 relevant results
Middle-tier proxy middleware

https://github.com/alibaba/cobar/
Last maintained on GitHub 3 years ago

It seems to have too many pain points to list here; see:
https://www.modb.pro/db/175242

5.3. TDDL

Alibaba
Keywords: database and table sharding TDDL
Baidu returns about 1,080,000 relevant results
Application-layer dependency middleware

https://github.com/alibaba/tb_tddl
TDDL development on GitHub has stalled, probably because some features are not open source.

TDDL depends on the diamond configuration center (diamond is a system used internally at Taobao to manage persistent configuration, and most Taobao systems currently rely on it for their configuration).

5.4. heisenberg

Baidu
Keywords: database and table sharding heisenberg
Baidu returns about 74,900 relevant results
Middle-tier proxy middleware

Documentation is scarce, so it is not considered.

5.5. Oceanus

58.com
Keywords: database and table sharding Oceanus
Baidu returns about 118,000 relevant results

https://github.com/wuba/Oceanus
Last maintained on GitHub 3 years ago

Documentation is scarce, so it is not considered.

5.6. OneProxy

Developed by Lou Fangxin, former chief architect of Alipay
Keywords: database and table sharding OneProxy
Baidu returns about 139,000 relevant results
It does not appear to be open source.

5.7. Vitess

YouTube
Keywords: database and table sharding Vitess
Baidu returns about 177,000 relevant results
Middle-tier proxy middleware

https://github.com/vitessio/vitess
Currently active on GitHub

Vitess is a database solution for deploying, scaling, and managing large clusters of MySQL instances. It is open source and has many stars on GitHub, but it has few adopters in China and little Chinese-language material. Its architecture is complex, which makes it an even heavier-weight option.

5.8. TSharding

Mogujie
Keywords: database and table sharding TSharding
Baidu returns about 100,000 relevant results

https://github.com/baihui212/tsharding
Last maintained on GitHub 5 years ago; it does not appear to be actively maintained as open source.

Documentation is very scarce, so it is not considered.

5.9. dal

Ctrip
Keywords: database and table sharding dal
Baidu returns about 315,000 relevant results
Application-layer dependency middleware

https://github.com/ctripcorp/dal
Currently active on GitHub, with some tutorials and demos (accessing them from mainland China requires a VPN)

The open-source scope includes code generators, a Java client, and a C# client.
There is little Chinese-language material, so a VPN is needed to reach most of the resources.

5.10. Zdal

Alipay
Keywords: database and table sharding zdal
Baidu returns about 30,600 relevant results
Middle-tier proxy middleware

There is little Chinese-language material, and it does not appear to be open source.

5.11. MyCat

Community open-source project based on Cobar
Keywords: database and table sharding MyCat
Baidu returns about 9,030,000 relevant results
Middle-tier proxy middleware
http://mycatone.top/
The community is currently active, and no VPN is needed for access
Plenty of material is available and it is open source, so it is worth considering.

5.11.1 Items not supported

  • DDL statements

    • Changing the sharding key is not supported
    • Views in the physical database are supported and are treated as ordinary tables
    • Foreign keys are supported only on ordinary tables
  • DML statements (DELETE, UPDATE, and SELECT, detailed below)
  • DELETE statement

    • Subqueries involving distributed operations are not supported.
    • Multi-table DELETE is not supported.
  • UPDATE statement

    • Subqueries involving distributed operations are not supported.
    • Multi-table UPDATE is not supported.
  • SELECT statement

    • For SELECT ... FOR UPDATE statements, all tables that appear in the SQL are locked.
    • Whether a row lock or a table lock is taken depends on the SQL statement.
    • SELECT INTO OUTFILE is not supported.
  • SET statement

    • SET SESSION variables are supported, but they cannot be referenced by prepared statements; only the autocommit variable has correct semantics
    • SET GLOBAL variables are not supported
    • SET USER level variables are not supported
  • SHOW statement

    • All SHOW statements are treated as compatibility SQL and sent to the prototype node for processing, so they carry no distributed semantics
  • Other

    • User-defined data types and custom functions are not supported (both require code changes)
    • Physical views are supported, but logical views in MyCat are not
    • Stored procedures have limited support
    • Cursors are not supported
    • Triggers are not supported

5.12. Sharding-JDBC

Open-sourced by Dangdang; it has since joined the Apache family
Keywords: database and table sharding Sharding-JDBC
Baidu returns about 4,240,000 relevant results
Application-layer dependency middleware

https://shardingsphere.apache.org/
The community is currently active, and no VPN is needed for access

Plenty of material is available and it is open source, so it is worth considering.
For more detail, see my earlier post on database and table sharding with Sharding-JDBC.

5.12.1 Items not supported

  • DataSource interface

    • Timeout-related operations are not supported.
  • Connection interface

    • Stored procedure, function, and cursor operations are not supported;
    • Executing native SQL is not supported;
    • Savepoint-related operations are not supported;
    • Schema/Catalog operations are not supported;
    • Custom type mappings are not supported.
  • Statement and PreparedStatement interfaces

    • Statements that return multiple result sets (that is, stored procedures and multiple non-SELECT results) are not supported;
    • Operations on internationalized characters are not supported.
  • ResultSet interface

    • Checking the position of the result set cursor is not supported;
    • Moving the result set cursor by any method other than next is not supported;
    • Modifying the result set content is not supported;
    • Retrieving internationalized characters is not supported;
    • Array is not supported.
  • JDBC 4.1

    • New features of the JDBC 4.1 interface are not supported.

6. Detailed comparison

| Main indicator | Sharding-JDBC | MyCat |
| --- | --- | --- |
| ORM support | Any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or JDBC used directly | Any |
| Transactions | Built-in XA, two-(three-)phase transactions, and BASE flexible transactions (eventual consistency) | XA transactions |
| Database sharding | Supported | Supported |
| Table splitting | Supported | Supported |
| Development | Integrates well with Spring Boot; some code intrusion (configuration classes and so on must be written) | Low development cost, little code intrusion |
| Affiliation | Open-sourced by Dangdang, joined Apache | Secondary development based on Alibaba Cobar, community-maintained |
| Database support | Any database that implements the JDBC specification; currently MySQL, PostgreSQL, Oracle, SQL Server, and any database accessible through JDBC | MySQL, Oracle, SQL Server, DB2, MongoDB |
| Activity | High | Community is very active, and some companies already use it |
| Monitoring | Yes | Yes |
| Read/write splitting | Supported | Supported |
| Reference material | Less material: GitHub, official website, online discussion posts | Plenty of material: GitHub, official website, QQ group, books |
| Operations and maintenance | Low maintenance cost | High maintenance cost |
| Limitations | Some JDBC methods are unsupported; SQL statement restrictions | SQL statement restrictions |
| Connection pool | Any third-party database connection pool, such as DBCP, C3P0, BoneCP, HikariCP | No requirement |
| Configuration difficulty | Moderate | Complex |

Summary

I picked the two most widely used options for comparison. Overall, Sharding-JDBC seems to be the lower-effort choice: no middleware needs to be deployed, and sharding is handled by a jar dependency alone, which saves some work. Every additional piece of middleware also reduces overall system stability. For now only lightweight database and table sharding is needed, without many extra features, so Sharding-JDBC is the better choice.

Origin blog.csdn.net/u011397981/article/details/131819748