Database and Table Sharding in One Article: A Quick Start (Must-Learn)

Many Java developers who are new to the field have left messages asking how to learn database and table sharding systematically, but I had never made up my mind to write about it. Now my company's project is using sharding-jdbc to refactor its existing MySQL architecture for database and table sharding, so I am taking this opportunity to write a series of articles on implementing it. The series also serves as a summary of my own architecture learning.

I have read many articles about sharding online and found that much of the information is repetitive, the knowledge points are fragmented, and detailed real-world cases are lacking. To dig deeper, I bought some paid courses on several platforms. After going through a few of them, I found they are fine for people with some experience, but quite hard going for beginners.

So that beginners can follow along, I may spend more space explaining some concepts; I hope you won't find it too long-winded. When this series on database and table sharding is finished, I will make it freely available as a PDF document. If you find any errors or inaccuracies in the text, you are welcome to get in touch and point them out.

Before getting hands-on with sharding, let's review its basic concepts.

What is database and table sharding

Strictly speaking, splitting databases and splitting tables are two separate concepts. However, the two are usually done together, so we habitually refer to them jointly as database and table sharding.

Database and table sharding addresses the steady decline in database performance caused by databases and tables holding too much data. Following certain rules, a database with a large amount of data is split into multiple separate databases, and a table with a large amount of data is split into several tables. Each individual database and table then performs at its best (fast responses), which improves overall database performance.

How to shard databases and tables

The core idea of database and table sharding is to partition the data (sharding). After partitioning, the system must be able to locate data quickly and merge query results. Sharding can be done along two dimensions: vertical and horizontal.

[Figure: Database and table sharding]


Below, we use an order-related business as an example to see how vertical and horizontal splitting are applied to databases and tables.

Vertical split

Vertical splitting takes two forms: vertical database splitting and vertical table splitting.

1. Vertical database splitting

Vertical database splitting is relatively easy to understand. Its core idea is a dedicated database for each business.

Tables are grouped by business type: the tables for orders, payments, coupons, points and so on each go into their own database. Developers cannot connect directly to another business's database; if they need its data, the owning team provides an API. This is the early form of microservices.

Vertical database splitting depends largely on how the business is divided, but the boundaries between businesses are not always clear. For example, when splitting out order data, you must consider its relationships with other businesses; it is not as simple as putting every order-related table into one database.

Vertical splitting improves database performance to some extent, but it does not solve the performance problem caused by a single table holding too much data. That requires horizontal splitting.

[Figure: Vertical database splitting]

2. Vertical table splitting

Vertical table splitting divides a table by its columns (fields), breaking a wide table into smaller ones.

For example, in an order table, frequently accessed fields such as the order amount and order number are kept in a core table. Large fields (such as blob columns) and infrequently accessed fields are moved into a separate extension table, work_extend. Each table stores only part of the original table's fields, and the split tables can then be placed in different databases.

[Figure: Vertical table splitting]

A database loads data into memory in units of rows (InnoDB, for instance, reads whole pages of rows). After the split, the core table holds mostly short, frequently accessed fields, so more rows fit in memory. This raises the cache hit rate and reduces disk I/O, which improves database performance.
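To make the column split concrete, here is a minimal sketch of the order example, assuming Java 16+ for records. The record and field names (WideOrder, OrderCore, OrderExtend, remarkBlob) are hypothetical illustrations; only work_extend comes from the text. The core and extension rows share the order number so they can be joined back together when needed.

```java
import java.math.BigDecimal;

/**
 * Illustrative only: a wide order row split vertically into a "hot" core part
 * and a "cold" extension part. All names except work_extend are hypothetical.
 */
class VerticalSplitDemo {

    // Original wide row: hot columns mixed with a large, rarely read column.
    record WideOrder(String orderNo, BigDecimal amount, int status, String remarkBlob) {}

    // Core table: short, frequently accessed columns.
    record OrderCore(String orderNo, BigDecimal amount, int status) {}

    // Extension table (work_extend in the text): large or rarely read columns,
    // linked back to the core row by the same order number.
    record OrderExtend(String orderNo, String remarkBlob) {}

    static OrderCore core(WideOrder o) {
        return new OrderCore(o.orderNo(), o.amount(), o.status());
    }

    static OrderExtend extend(WideOrder o) {
        return new OrderExtend(o.orderNo(), o.remarkBlob());
    }

    public static void main(String[] args) {
        WideOrder o = new WideOrder("W1001", new BigDecimal("99.50"), 1, "long remark text");
        System.out.println(core(o));   // hot part, stored in the core table
        System.out.println(extend(o)); // cold part, stored in work_extend
    }
}
```

Queries that only need the amount or status now touch the narrow core table; the blob is read only when someone explicitly asks for the extension row.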

Advantages of vertical splitting:

  • Decouples data between businesses, so each business's data can be maintained, monitored, and scaled independently.

  • Relieves pressure on the database to some extent in high-concurrency scenarios.

Disadvantages of vertical splitting:

  • Increases development complexity. Because services are isolated, many tables cannot be accessed directly, and data must be aggregated through interfaces.

  • Makes distributed transaction management more difficult.

  • A single table can still hold too much data. This is not fundamentally solved and requires horizontal splitting as well.


Horizontal split

As mentioned earlier, vertical splitting still leaves individual databases and tables holding too much data. When an application cannot be split vertically any further, single-database read/write and storage bottlenecks remain. Combining vertical splitting with horizontal splitting can greatly improve database performance.

1. Horizontal database splitting

Horizontal database splitting distributes the rows of the same table across different databases according to certain rules. Each database can sit on a different server, which enables horizontal scaling. It is a common way to improve database performance.

This approach often solves single-database storage and performance bottlenecks. However, because the same table is spread across different databases, data access requires extra routing logic, which increases system complexity.

For example, in the figure, the three databases Order DB_1, Order DB_2, and Order DB_3 each contain an identical order table. To access a single order, we take the order number modulo the number of database instances (order number mod 3) to determine which database to operate on: a remainder of 0, 1, or 2 selects DB_1, DB_2, or DB_3 respectively.
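The routing rule above fits in a few lines. This is a minimal sketch; the database names order_db_1 to order_db_3 are hypothetical stand-ins for Order DB_1 to DB_3 in the figure. Math.floorMod is used instead of % so that a negative key can never yield a negative index.

```java
/**
 * Minimal sketch of horizontal database routing: order number mod the number
 * of database instances. Database names are hypothetical placeholders.
 */
class OrderDbRouter {

    static final String[] DATABASES = {"order_db_1", "order_db_2", "order_db_3"};

    /** Returns the 0-based index of the database that should hold this order. */
    static int route(long orderNo, int dbCount) {
        // floorMod keeps the result in [0, dbCount) even for negative keys
        return (int) Math.floorMod(orderNo, (long) dbCount);
    }

    public static void main(String[] args) {
        for (long orderNo : new long[] {1001L, 1002L, 1003L}) {
            int idx = route(orderNo, DATABASES.length);
            System.out.println(orderNo + " -> " + DATABASES[idx]);
        }
    }
}
```

Reads and writes must use the same function, otherwise an order written to one database will be looked up in another.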

[Figure: Horizontal database splitting]

2. Horizontal table splitting

Horizontal table splitting divides a large table, according to certain rules, into multiple tables with the same structure within the same database. Each table stores only part of the original table's data.

For example, an order table with 9 million rows can be split horizontally into three tables, order_1, order_2, and order_3, each holding 3 million rows, and so on.

[Figure: Horizontal table splitting]

Although horizontal table splitting divides the table, the sub-tables remain in the same database instance. It only solves the problem of a single table holding too much data; it does not spread the split tables across different machines, so they still compete for the CPU, memory, network I/O, and so on of the same physical machine. To improve performance further, the split tables need to be distributed across different databases to achieve a truly distributed setup.
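Once tables are split and also distributed across databases, routing becomes two-level: pick a database, then a table inside it. Below is a minimal sketch of one common scheme, not necessarily the one sharding-jdbc uses by default, assuming Java 16+ and a hypothetical layout of 2 databases with 3 tables each (order_db_1/order_db_2, order_1 to order_3). Every physical table gets one global slot, and the slot number is decomposed into a database index and a table index.

```java
/**
 * Sketch of two-level routing: first pick a database, then a table inside it.
 * The layout (dbCount databases x tablesPerDb tables, 1-based names) is a
 * hypothetical example.
 */
class DbTableRouter {

    record Target(String database, String table) {}

    static Target route(long orderNo, int dbCount, int tablesPerDb) {
        // One global slot per physical table: slot in [0, dbCount * tablesPerDb)
        long slot = Math.floorMod(orderNo, (long) dbCount * tablesPerDb);
        int db = (int) (slot / tablesPerDb);    // which database (0-based)
        int table = (int) (slot % tablesPerDb); // which table within it (0-based)
        return new Target("order_db_" + (db + 1), "order_" + (table + 1));
    }

    public static void main(String[] args) {
        for (long orderNo = 0; orderNo < 6; orderNo++) {
            System.out.println(orderNo + " -> " + route(orderNo, 2, 3));
        }
    }
}
```

With consecutive order numbers, each of the six physical tables receives roughly one sixth of the rows, so both the per-table size and the per-machine load are spread out.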

[Figure: Database and table sharding]

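Advantages of horizontal splitting:

  • Solves the problem of a single database holding too much data under high concurrency, improving system stability and load capacity.

  • Requires relatively little rework in the business systems.

Disadvantages of horizontal splitting:

  • Transaction consistency across shards is hard to guarantee.

  • Cross-database join queries perform poorly.

  • Scaling out is difficult and maintenance-heavy (just imagine splitting into thousands of sub-tables).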


What are the "certain rules"?

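We have mentioned "certain rules" many times above. These rules are actually a routing algorithm that decides which table in which database a given row should be stored in.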

The common ones are the modulo algorithm and the range algorithm.

1. Modulo algorithm

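Splitting by a field modulo N (taking the remainder of its hash, hash() mod N, where N is the number of database instances or sub-tables) is the most common way to split data.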

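Taking the order table as an example again: number the databases from 0 to N-1, then take the order number field work_no of the order table modulo N to get a remainder i. Rows with i=0 go to the first database, i=1 to the second, i=2 to the third, and so on.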

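This way, all data for the same order is stored in the same database and table. When querying with work_no as the condition, applying the same rule locates the data quickly.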
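For a string key such as work_no, the hash step matters. Here is a minimal sketch that uses Java's String.hashCode() purely for illustration; real middleware may use a different hash function. Because hashCode() can be negative, Math.floorMod is used instead of %.

```java
/**
 * Minimal sketch of modulo sharding, hash(work_no) mod N, where N is the
 * number of database instances (or sub-tables). String.hashCode() is an
 * illustrative choice of hash, not necessarily what any framework uses.
 */
class ModuloSharding {

    static int shardOf(String workNo, int n) {
        // hashCode() can be negative, so use floorMod rather than %
        return Math.floorMod(workNo.hashCode(), n);
    }

    public static void main(String[] args) {
        int n = 3;
        String workNo = "W20201101001";
        // The same work_no always maps to the same shard, on write and on read.
        System.out.println("write -> shard " + shardOf(workNo, n));
        System.out.println("read  -> shard " + shardOf(workNo, n));
    }
}
```

The determinism is the whole point: as long as the write path and the query path share shardOf, a lookup by work_no goes to exactly one shard instead of all of them.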

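Advantages:

  • Data is distributed fairly evenly across shards, so requests are unlikely to all hit one database.

Disadvantages:

  • When a machine goes down, requests that should land on that database cannot be handled correctly. If the failed instance is then removed from the cluster, the algorithm becomes hash(userId) mod (N-1), and a user's data may no longer map to the database where it was stored.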

2. Range algorithm

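Data is split by time range or ID range. For example, to split a user table, we can let each database's User table store only 10,000 rows: the first database stores userIds 1 to 10,000, the second 10,001 to 20,000, the third 20,001 to 30,000, and so on. Splitting by time range works the same way.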
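Range routing is plain integer division. This minimal sketch follows the 10,000-per-database example and assumes userIds start at 1. Adding a database for ids 30,001 and up does not change where any existing id lives.

```java
/**
 * Minimal sketch of range sharding: each database holds a fixed block of
 * 10,000 userIds (1-10,000 -> db 0, 10,001-20,000 -> db 1, ...).
 * Assumes userIds start at 1.
 */
class RangeSharding {

    static final long BLOCK_SIZE = 10_000L;

    static int shardOf(long userId) {
        if (userId < 1) {
            throw new IllegalArgumentException("userId must be >= 1");
        }
        // (userId - 1) / BLOCK_SIZE maps 1..10000 to 0, 10001..20000 to 1, ...
        return (int) ((userId - 1) / BLOCK_SIZE);
    }

    public static void main(String[] args) {
        long[] ids = {1L, 10_000L, 10_001L, 25_000L};
        for (long id : ids) {
            System.out.println("userId " + id + " -> db " + shardOf(id));
        }
    }
}
```

The same property that makes scaling easy also explains the hotspot problem: newly created ids all fall into the newest range, so one database takes most of the fresh traffic.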

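Advantages:

  • The data volume of each table is controllable.

  • Horizontal scaling is simple: just add nodes, with no need to migrate data from other shards.

  • The database holding a given piece of data can be located quickly.

Disadvantages:

  • Because shards cover contiguous ranges, hotspots can form. For example, when sharding by a time field, a sudden surge of orders in a certain period makes that shard heavily read and written, while shards holding historical data are rarely queried.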

Challenges of database and table sharding

1. Distributed transactions

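Because tables are spread across different databases, cross-database transactions are unavoidable. They can generally be handled with two-phase commit or three-phase commit, but these perform poorly and require considerable development effort. The usual approach is eventual consistency: if the system does not demand real-time consistency, it only needs to reach consistency within an acceptable time window, using transaction compensation.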

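Here I use Alibaba's distributed transaction framework Seata to manage distributed transactions; later articles will cover it with practical examples.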

2. Pagination, sorting, and cross-database joins

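Pagination, sorting, and joins are used very frequently in development, but after sharding these seemingly ordinary operations become real headaches. Data must be queried from tables scattered across different databases, and all the results must then be merged and sorted before being returned to the user.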
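A common way to page across shards is to ask every shard for its first offset + limit rows in the same sort order, merge them in memory, and then cut out the requested page. Below is a minimal sketch of that approach. Shard results are plain in-memory lists of sort keys, and each list is assumed to be already sorted, as an ORDER BY ... LIMIT query on that shard would return it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of client-side merge paging across shards. Each inner list stands in
 * for one shard's result of ORDER BY ... LIMIT offset + limit and is assumed
 * to be sorted already.
 */
class CrossShardPaging {

    static List<Long> page(List<List<Long>> shardResults, int offset, int limit) {
        List<Long> merged = new ArrayList<>();
        for (List<Long> rows : shardResults) {
            // each shard contributes at most offset + limit rows
            merged.addAll(rows.subList(0, Math.min(rows.size(), offset + limit)));
        }
        merged.sort(Comparator.naturalOrder()); // global ORDER BY
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + limit, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(
                List.of(1L, 4L, 7L, 10L),
                List.of(2L, 5L, 8L, 11L),
                List.of(3L, 6L, 9L, 12L));
        System.out.println(page(shards, 2, 3)); // [3, 4, 5]
    }
}
```

The approach is correct because any row in the global top offset + limit must also rank within the top offset + limit of its own shard. Its cost grows with the offset, though: deep pages force every shard to return many rows, which is why deep pagination is expensive after sharding.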

3. Distributed primary keys

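After sharding, database auto-increment primary keys lose most of their value, because an auto-increment key on a single database instance cannot guarantee uniqueness across databases. A system that can generate globally unique IDs therefore becomes essential, and such a globally unique ID is called a distributed ID.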
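The text does not prescribe a specific distributed ID scheme. Snowflake-style IDs (originally from Twitter) are one widely used option, and this is a minimal single-process sketch of the idea. The layout (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence) follows the usual Snowflake design; the custom epoch is an arbitrary value chosen for illustration. Each generator instance must be given a worker id that is unique across the cluster.

```java
/**
 * Minimal Snowflake-style ID generator, one common way to build distributed IDs.
 * Layout: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence.
 * EPOCH is a hypothetical custom epoch.
 */
class SnowflakeIdGenerator {

    private static final long EPOCH = 1_600_000_000_000L; // hypothetical epoch (ms)
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // sequence exhausted for this millisecond: spin until the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(1);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId());
    }
}
```

The IDs are roughly time-ordered, which keeps index inserts mostly sequential. However, they depend on the clock never moving backwards, which is why the sketch refuses to generate IDs in that case.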

4. Read/write splitting

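Most mainstream relational databases offer a primary/replica architecture for high availability. We need to implement read/write splitting together with sharding, so both the read databases and the write databases must be sharded. Later articles will walk through concrete examples.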
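Within one shard, read/write splitting boils down to a routing decision: writes go to the primary, and reads can rotate over the replicas. Here is a minimal sketch in which data sources are just names (all hypothetical) and statements are classified naively by whether they start with SELECT. In a sharded setup you would first pick the shard and then apply this per-shard choice.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of read/write splitting for one shard: writes go to the primary,
 * reads rotate round-robin over replicas. Data source names are hypothetical.
 */
class ReadWriteRouter {

    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    String route(String sql) {
        // Naive classification: only statements starting with SELECT go to replicas.
        boolean isRead = sql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT");
        if (!isRead || replicas.isEmpty()) {
            return primary;
        }
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(counter.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    public static void main(String[] args) {
        ReadWriteRouter r = new ReadWriteRouter("order_db_1_primary",
                List.of("order_db_1_replica_1", "order_db_1_replica_2"));
        System.out.println(r.route("INSERT INTO order_1 VALUES (1)"));
        System.out.println(r.route("SELECT * FROM order_1"));
        System.out.println(r.route("SELECT * FROM order_1"));
    }
}
```

Replication is usually asynchronous, so a read sent to a replica right after a write may return stale data. Statements such as SELECT ... FOR UPDATE also belong on the primary. A production router needs rules for both cases, which this sketch deliberately leaves out.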

5. Data masking

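Data masking means transforming sensitive information according to masking rules so that sensitive private data is reliably protected. Personal information such as ID card numbers, mobile phone numbers, card numbers, and account passwords generally needs to be masked.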
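A masking rule is usually a small string transformation. This minimal sketch keeps a few leading and trailing characters and replaces the rest with '*'. The specific rules (first 3 and last 4 digits of a phone number, first 6 and last 4 characters of an ID card number) are common conventions chosen for illustration, not rules from the original text. It needs Java 11+ for String.repeat.

```java
/**
 * Minimal masking helpers. The keep-head/keep-tail rules are illustrative
 * conventions, not a standard.
 */
class DataMasking {

    /** Keeps `head` leading and `tail` trailing characters; masks the rest with '*'. */
    static String mask(String value, int head, int tail) {
        if (value == null) {
            return null;
        }
        if (value.length() <= head + tail) {
            // too short to keep any characters safely: mask everything
            return "*".repeat(value.length());
        }
        return value.substring(0, head)
                + "*".repeat(value.length() - head - tail)
                + value.substring(value.length() - tail);
    }

    static String maskPhone(String phone) {
        return mask(phone, 3, 4);
    }

    static String maskIdCard(String idCard) {
        return mask(idCard, 6, 4);
    }

    public static void main(String[] args) {
        System.out.println(maskPhone("13812345678"));         // 138****5678
        System.out.println(maskIdCard("110101199001011234")); // 110101********1234
    }
}
```

Passwords are a different case: they should not be masked for display at all but stored only as salted hashes.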

Sharding tools

As I always say, try not to reinvent the wheel, because your own wheel may not be so round. The industry already has many mature sharding middleware products. Choose one that fits your business needs and put more of your energy into implementing the business.

  • sharding-jdbc (Dangdang)
  • TSharding (Mogujie)
  • Atlas (Qihoo 360)
  • Cobar (Alibaba)
  • MyCAT (based on Cobar)
  • Oceanus (58.com)
  • Vitess (Google)

Why choose sharding-jdbc

sharding-jdbc is a lightweight Java framework delivered as a jar package. It is a client-side product that needs no extra deployment and acts as an enhanced JDBC driver. By contrast, server-side products such as Mycat require a separately deployed service, which is somewhat more complex. Besides, I want to focus on implementing the business rather than taking on extra operations work.

  • sharding-jdbc is highly compatible: it works with any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, and Spring JDBC Template, or with plain JDBC.
  • It is fully compatible with any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid, and HikariCP, and supports almost all relational databases.

It is easy to see that it is indeed a powerful tool, and it is minimally intrusive to the project. There is almost no need to change code or SQL statements; you only configure which tables are to be sharded across databases and tables.

Summary

This article briefly reviewed the basics of database and table sharding. The following articles will introduce sharding-jdbc's various sharding features using real projects.



Origin: blog.51cto.com/14989525/2547190