Database and Table Sharding (1): Theory


When a single table grows to tens of millions of rows, each query takes longer. It is widely accepted that a single MySQL table performs best at 10 million rows or fewer, because the height of its BTREE index then stays between 3 and 5.
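The link between row count and index height can be sketched with rough arithmetic. The figures below are assumptions for illustration, not measurements: a 16 KB page (the InnoDB default), an 8-byte key plus a 6-byte child pointer per internal entry, and an average row of 1 KB. Page headers and fill factor are ignored, so real capacities are somewhat lower.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a child-page pointer
ROW_BYTES = 1024        # assumption: average row size


def max_rows(height: int) -> int:
    """Upper bound on rows a B+tree of this height holds under the assumptions above."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per internal page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf


for height in (2, 3, 4):
    print(height, max_rows(height))
```

Under these assumptions a tree of height 3 tops out at roughly 20 million rows, which is in line with the 10-million-row rule of thumb.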

Data splitting falls into two kinds: vertical splitting and horizontal splitting.

I. Vertical splitting

Vertical splitting can be divided into vertical database splitting and vertical table splitting.

1, Vertical database splitting

Concept: Tables that have little to do with one another are stored in different databases, grouped by business domain. This resembles splitting a large system into several smaller ones, each carved out by business category, and it is similar to microservice governance: each microservice uses its own database.

Figure:

Explanation

We start with a single monolithic service, so there is only one database and all tables live in it.

Later, as the business requires, the monolith is split into microservices. The original single product database is therefore split into multiple databases, with each microservice owning one database.

2, Vertical table splitting

Concept: The fields of one table are split across multiple tables, usually by separating hot fields from cold fields: hot fields go into one table and cold fields into another. This improves database performance.

Figure:

Explanation

At first the product table contains every field of a product, but we notice:

1. The product detail and product attribute fields are long.
2. The product list does not need to show product details or attributes; they are shown only when the user opens a product.

So we can consider moving product details and product attributes into a separate table to improve query efficiency.
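The hot/cold split described above can be sketched with an in-memory SQLite database. The table names `goods` and `goods_detail` and their columns are hypothetical, chosen only to mirror the product example; a real schema would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hot fields: read on every product list page
    CREATE TABLE goods (
        goods_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        price    INTEGER NOT NULL
    );
    -- cold fields: read only when a product is opened
    CREATE TABLE goods_detail (
        goods_id   INTEGER PRIMARY KEY REFERENCES goods(goods_id),
        detail     TEXT,
        attributes TEXT
    );
""")
conn.execute("INSERT INTO goods VALUES (1, 'keyboard', 4900)")
conn.execute("INSERT INTO goods_detail VALUES (1, 'long description', 'layout=ANSI')")


def list_goods(conn):
    """List page: touches only the narrow hot table."""
    return conn.execute(
        "SELECT goods_id, name, price FROM goods ORDER BY goods_id"
    ).fetchall()


def goods_detail(conn, goods_id):
    """Detail page: joins in the cold table for one product."""
    return conn.execute(
        "SELECT g.name, d.detail, d.attributes "
        "FROM goods g JOIN goods_detail d ON d.goods_id = g.goods_id "
        "WHERE g.goods_id = ?",
        (goods_id,),
    ).fetchone()


print(list_goods(conn))
print(goods_detail(conn, 1))
```

The list query never reads the long detail columns, which is the point of the split.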

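3, Advantages and disadvantages of vertical splitting

Advantages

- Removes coupling at the business-system level and makes business boundaries clear.
- As with microservice governance, data for different businesses can be managed, maintained, monitored, and scaled separately.
- Under high concurrency, vertical splitting relieves, to some degree, the bottlenecks of I/O, database connection count, and single-machine hardware resources.

Disadvantages

- After database splitting, Join no longer works across databases; data can only be aggregated through service interfaces, which raises development complexity.
- Distributed transactions across the split databases are complex to handle.
- A single table can still hold too much data (which requires horizontal splitting).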


II. Horizontal splitting

When an application can hardly be split any further vertically, or when the row count after vertical splitting is still huge and a single database hits read/write or storage bottlenecks, horizontal splitting is needed.

Horizontal splitting can be divided into horizontal database splitting and horizontal table splitting.

1, Horizontal database splitting

Why split databases horizontally

Although the product database above has already been split into three databases, as order volume grows the QPS on a single business database also becomes too high and the database can no longer respond quickly enough. As a rough figure, a single MySQL database handles about 1000 QPS; beyond that, consider horizontal database splitting.

Figure

2, Horizontal table splitting

Concept: We generally keep a single table under 10 million rows. If a table exceeds 10 million rows and is still growing, consider splitting it.

Figure

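3, Advantages and disadvantages of horizontal splitting

Advantages

- No single-database bottleneck from excessive data volume or high concurrency, which improves system stability and load capacity.
- Little change on the application side; business modules do not need to be split.

Disadvantages

- Transactional consistency across shards is hard to guarantee.
- Cross-database Join queries perform poorly.
- Expanding the data repeatedly is very difficult and carries a heavy maintenance burden.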


III. Data sharding rules

When we split a table horizontally into multiple tables, we have to choose a sharding rule. The common ones are hash-modulo sharding, numeric range sharding, and consistent-hash sharding.

1, Hash-modulo sharding

Concept: Splitting usually takes a hash modulo. For example, suppose the table is split into 4 tables by goods_id; the remainder goods_id % 4 determines the table.
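A minimal sketch of this routing rule follows. The table naming scheme `goods_0` to `goods_3` is an assumption for illustration; only the `goods_id % 4` rule comes from the text.

```python
TABLE_COUNT = 4  # the example above splits into 4 tables


def table_for(goods_id: int, table_count: int = TABLE_COUNT) -> str:
    """Return the (hypothetical) table name that holds this goods_id."""
    return f"goods_{goods_id % table_count}"


for goods_id in (8, 9, 10, 11):
    print(goods_id, table_for(goods_id))
```

Because consecutive ids land in different tables, inserts spread evenly across all 4 tables.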

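Advantages

- Data is distributed fairly evenly across shards, so hotspots and concurrent-access bottlenecks are unlikely.

Disadvantages

- When the shard cluster is expanded later, migrating the old data is difficult.
- Cross-shard queries easily become complex. In the example above, if a frequently used query condition does not include goods_id, the shard cannot be located, so the query must be sent to all 4 databases at once, the data merged in memory, and the minimal set returned to the application. Sharding then becomes a drag rather than a help.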
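To make the expansion cost concrete, the sketch below counts how many ids change tables when the modulo goes from 4 to 5. The id range 1 to 10000 is an arbitrary assumption; any evenly spread set of ids gives a similar share.

```python
def moved_fraction(ids, old_count: int, new_count: int) -> float:
    """Share of ids whose target table changes when the table count changes."""
    ids = list(ids)
    moved = sum(1 for i in ids if i % old_count != i % new_count)
    return moved / len(ids)


fraction = moved_fraction(range(1, 10001), old_count=4, new_count=5)
print(fraction)
```

Most rows have to move for a single extra table, which is why expanding a hash-modulo cluster is painful.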

2, Numeric range sharding

Concept: Split by time interval or by ID interval. For example, records with goods_id 1 to 1000 go to the first table, 1001 to 2000 go to the second table, and so on.
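A small sketch of range routing under the same 1000-ids-per-table example. The table names `goods_1`, `goods_2`, ... and the helper names are hypothetical.

```python
ROWS_PER_TABLE = 1000  # ids 1..1000 -> table 1, 1001..2000 -> table 2, ...


def table_index(goods_id: int) -> int:
    """1-based index of the table that holds this goods_id."""
    if goods_id < 1:
        raise ValueError("goods_id must be positive")
    return (goods_id - 1) // ROWS_PER_TABLE + 1


def tables_for_span(low: int, high: int) -> list:
    """Tables that a range query on goods_id BETWEEN low AND high must visit."""
    return [f"goods_{i}" for i in range(table_index(low), table_index(high) + 1)]


print(table_index(1000), table_index(1001))
print(tables_for_span(900, 2100))
```

A range query only visits the contiguous tables that overlap the requested interval, and adding a new table for higher ids does not move existing rows.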

Figure

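Advantages

- The size of each table is controllable.
- Horizontal scaling is naturally easy: to expand the shard cluster later, just add nodes; data in the other shards does not need to be migrated.
- Range lookups on the sharding field can quickly locate the contiguous shards involved, which effectively avoids cross-shard query problems.

Disadvantages

- Hot data becomes a performance bottleneck. For example, when sharding by a time field, the shards holding recent data may be read and written frequently, while the shards holding historical data are rarely queried.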

3, Consistent hashing

Consistent hashing addresses the problem that hash-modulo sharding must migrate old data whenever the shard cluster is expanded. The principle is not covered in detail here; see the blog post "Consistent hashing algorithm (sharding, load balancing, etc.)".
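For orientation, here is a minimal sketch of a hash ring with virtual nodes. It is not taken from the referenced post: the class name `HashRing`, the use of MD5, the 100 virtual nodes per database, and the node names `db0` to `db4` are all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Sketch of a consistent-hash ring; each node owns several points on the ring."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key) -> str:
        """Walk clockwise from the key's hash to the first node point."""
        idx = bisect.bisect_left(self._ring, (self._hash(str(key)), ""))
        if idx == len(self._ring):  # wrap around the ring
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["db0", "db1", "db2", "db3"])
keys = range(1, 10001)
before = {k: ring.node_for(k) for k in keys}
ring.add("db4")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved) / len(before))
```

Only keys that now belong to the new node move; every other key stays where it was, in contrast to the modulo scheme where most keys are reassigned.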


IV. Problems introduced by sharding

Everything has two sides, and sharding is no exception: once you shard, new problems appear.

1, Distributed transactions

Use a distributed-transaction middleware solution. Whether to use strongly consistent or eventually consistent distributed transactions depends on business needs and is not covered here.

2, Cross-node Join queries

Before splitting, related data can be fetched with a Join. After splitting, the data may be spread over different nodes, which makes Joins troublesome; for performance reasons, avoid Join queries where possible.

Some ways to solve this problem:

Global tables

A global table can be thought of as a "data dictionary table": a table that every module in the system may depend on. To avoid cross-database Join queries, keep a copy of such tables in every database. This data is rarely modified, so consistency is not much of a concern.

Field redundancy

Trade space for time to avoid the Join query. For example, when the orders table stores userId, it also stores a redundant copy of userName, so viewing order details does not require querying the buyer's user table.

Data assembly

Query twice at the application level. The first query's result set yields the ids of the related data; a second request then fetches the related data by those ids. Finally, the fields from both results are assembled.
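The two-step assembly can be sketched with two separate in-memory SQLite databases standing in for two nodes. The `orders` and `users` schemas and the sample rows are hypothetical.

```python
import sqlite3

# Two independent connections stand in for two database nodes.
order_db = sqlite3.connect(":memory:")
user_db = sqlite3.connect(":memory:")

order_db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER)"
)
order_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 500), (2, 11, 750), (3, 10, 120)]
)
user_db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT)")
user_db.executemany("INSERT INTO users VALUES (?, ?)", [(10, "alice"), (11, "bob")])


def orders_with_user_names():
    # First query: fetch the orders and collect the related user ids.
    orders = order_db.execute(
        "SELECT order_id, user_id, amount FROM orders ORDER BY order_id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    # Second query: fetch the related users by id from the other node.
    placeholders = ",".join("?" * len(user_ids))
    names = dict(
        user_db.execute(
            f"SELECT user_id, user_name FROM users WHERE user_id IN ({placeholders})",
            user_ids,
        ).fetchall()
    )
    # Assemble the final rows in application memory.
    return [
        {"order_id": o, "user_name": names.get(u), "amount": a} for o, u, a in orders
    ]


print(orders_with_user_names())
```

Batching the ids into one `IN (...)` query keeps the second step to a single round trip instead of one lookup per order.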

3, Cross-node pagination, sorting, and functions

Queries that span multiple nodes and databases run into problems with Limit pagination and Order by sorting. Pagination requires sorting by a specified field. When the sort field is the sharding field, the sharding rule makes it easy to locate the specific shard. When the sort field is not the sharding field, things get more complicated: the data must first be sorted and returned within each shard node, and then the result sets from the different shards are merged and sorted again before being returned to the user.
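The sort-per-shard-then-merge step can be sketched as follows. Shards are modeled as plain lists of dicts, and the function name and the `price` sort field are hypothetical. Each shard has to return its first `offset + limit` rows, because any of them could fall on the requested page.

```python
import heapq
from itertools import islice


def page_across_shards(shards, offset: int, limit: int, key=lambda row: row["price"]):
    """Return one globally sorted page from rows spread over several shards."""
    # Each shard sorts locally and returns at most offset + limit rows.
    partials = [sorted(rows, key=key)[: offset + limit] for rows in shards]
    # Merge the already sorted partial results, then cut out the page.
    merged = heapq.merge(*partials, key=key)
    return list(islice(merged, offset, offset + limit))


shards = [
    [{"id": 1, "price": 30}, {"id": 4, "price": 10}],
    [{"id": 2, "price": 20}, {"id": 3, "price": 40}],
]
page = page_across_shards(shards, offset=1, limit=2)
print([row["id"] for row in page])
```

The cost grows with the page number, since deep pages force every shard to return more rows.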

4, Avoiding duplicate global primary keys

Auto-increment primary keys are clearly unsuitable after sharding, and a UUID cannot be ordered by primary key, so consider using a snowflake ID as the database primary key. I have written about snowflake IDs before; see my blog post "Static inner class singleton pattern implementing the snowflake algorithm".
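As a rough illustration of the idea (not the implementation from the referenced post), the sketch below packs a millisecond timestamp, a worker id, and a per-millisecond sequence into one integer. The 10-bit worker and 12-bit sequence layout follows the commonly described snowflake format; the class name and the custom epoch are assumptions.

```python
import threading
import time


class SnowflakeSketch:
    EPOCH_MS = 1_577_836_800_000  # assumption: custom epoch, 2020-01-01 UTC
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence, wait if it overflows.
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self._seq
            )


gen = SnowflakeSketch(worker_id=1)
ids = [gen.next_id() for _ in range(5000)]
print(ids[0] < ids[-1])
```

Ids from one worker increase over time, so they stay index-friendly, and distinct worker ids keep different nodes from colliding.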

5, Data migration

Use dual writes: modify the code so that every insert, delete, and update on the tables being sharded is also applied to the new database. Meanwhile, run a data-extraction service that keeps pulling data from the old database and writing it to the new one, comparing as it writes to check whether the data is the latest.
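A toy sketch of the dual-write plus backfill flow, with plain dicts standing in for the old and new databases. The class and function names are hypothetical, and the `updated_at` field used to decide which copy is newer is an assumption; the text only says the data is compared for freshness.

```python
class DualWriteRepo:
    """Applies every write to both the old store and the new store."""

    def __init__(self, old_store: dict, new_store: dict):
        self.old_store = old_store
        self.new_store = new_store

    def save(self, key, row: dict) -> None:
        self.old_store[key] = row
        self.new_store[key] = row

    def delete(self, key) -> None:
        self.old_store.pop(key, None)
        self.new_store.pop(key, None)


def backfill(old_store: dict, new_store: dict) -> int:
    """Copy rows from old to new unless the new store already has a copy at least as fresh."""
    copied = 0
    for key, row in old_store.items():
        current = new_store.get(key)
        if current is None or current["updated_at"] < row["updated_at"]:
            new_store[key] = row
            copied += 1
    return copied


old_db = {1: {"name": "legacy row", "updated_at": 1}}
new_db = {}
repo = DualWriteRepo(old_db, new_db)
repo.save(2, {"name": "written during migration", "updated_at": 5})
first_pass = backfill(old_db, new_db)
second_pass = backfill(old_db, new_db)
print(first_pass, second_pass, sorted(new_db))
```

Rows written during the migration already exist in the new store, so the backfill skips them and only carries over historical data.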







Origin www.cnblogs.com/qdhxhz/p/11608222.html