Key steps and technical difficulties of horizontal database and table sharding

In the previous article, I introduced the main forms of database and table splitting and how they are used, and focused on the problems brought by vertical partitioning and their solutions. In this article, we continue with some of the techniques involved in horizontal sharding.

The origin of sharding technology

A relational database easily becomes a system's performance bottleneck: a single machine's storage capacity, connection count, and processing power are all limited, and the "statefulness" of a database means it cannot be scaled out as easily as web or application servers. Under the pressure of massive data and highly concurrent access in the Internet industry, engineers devised the technique of splitting databases and tables, also known as sharding. Meanwhile, popular distributed middleware (such as MongoDB and Elasticsearch) offers friendly built-in support for sharding, based on similar principles and ideas.

Distributed globally unique IDs

In many small and medium-sized projects, we often generate primary key IDs directly with the database's auto-increment feature, which is simple and convenient. In a sharded environment, however, the data is spread across different shards, so primary keys can no longer come from each database's auto-increment feature; otherwise the primary keys of tables on different shards would collide. Here is a brief look at several ID generation schemes I have used or studied:

 

  1. Twitter's Snowflake (aka the "snowflake algorithm")
  2. UUID/GUID (supported by most applications and databases)
  3. MongoDB ObjectID (similar to a UUID)
  4. Ticket Server (a database-backed approach; this is what Flickr uses)

Of these, Twitter's Snowflake algorithm is the one I have used most in distributed system projects in recent years, with no duplicates or concurrency problems observed. The algorithm generates a 64-bit unique ID (1 unused sign bit + a 41-bit timestamp + a 10-bit machine ID + a 12-bit sequence counter). I will not go into more detail here; interested readers can consult the relevant references.
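
To make the layout concrete, here is a minimal Java sketch of a Snowflake-style generator. The class name, custom epoch, and error handling are my own illustrative choices, not prescribed by the original algorithm; production code also needs clock-rollback handling and a tested worker-ID assignment scheme.

```java
// Minimal Snowflake-style ID generator: 1 sign bit + 41-bit timestamp
// + 10-bit worker ID + 12-bit sequence counter. Illustration only.
public class SnowflakeIdGenerator {
    private static final long EPOCH = 1420041600000L; // custom epoch (2015-01-01), an arbitrary choice
    private static final long WORKER_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = ~(-1L << WORKER_ID_BITS); // 1023
    private static final long SEQUENCE_MASK = ~(-1L << SEQUENCE_BITS);  // 4095

    private final long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) { // sequence exhausted in this millisecond: spin until the next one
                while ((timestamp = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```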


Common sharding rules and strategies


How to choose the sharding field

Before sharding, we must first determine the sharding field (also known as the "shard key"). Many common examples and scenarios split by an ID or time field, but that is not absolute. My suggestion is to work from the actual business: statistically analyze the SQL statements executed in the system, and choose the field of the to-be-sharded table that is used most frequently, or matters most, as the shard key.

Common sharding rules

Common sharding strategies fall into random sharding and continuous (range) sharding, as shown in the figure below:

When you need to do range searches on the shard field, range sharding can quickly locate the relevant shards for efficient querying, and in most cases it effectively avoids cross-shard queries. If you later want to expand the capacity of the whole shard cluster, you only need to add nodes, with no need to migrate data on the other shards. However, range sharding is also prone to data hotspots. As in the figure's example of sharding by a time field, some nodes may come under frequent query pressure, and those hot-data nodes become the bottleneck of the whole cluster, while other nodes may hold historical data that is rarely queried.

Random sharding is not truly random; it, too, follows certain rules. Usually we split by hashing the shard key and taking the modulo, so it is sometimes called discrete sharding. Randomly sharded data is distributed fairly evenly and rarely suffers hotspot or concurrent-access bottlenecks, but expanding the shard cluster later requires migrating old data. Using a consistent hashing algorithm largely avoids this problem, which is why many sharding middleware clusters adopt it. Discrete sharding also tends to run into the complications of cross-shard queries.
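
As a hedged illustration of this routing style, the sketch below implements a consistent-hash ring with virtual nodes in Java. The shard names, replica count, and MD5-based hash are illustrative assumptions, not from the article:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Consistent-hash ring with virtual nodes: each physical shard is mapped
// onto the ring many times so keys spread evenly across shards.
public class ConsistentHashRouter {
    private static final int VIRTUAL_NODES = 160; // replicas per physical shard, a common choice
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashRouter(String... shards) {
        for (String shard : shards) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(shard + "#" + i), shard);
            }
        }
    }

    /** Route a shard-key value to the first ring entry clockwise from its hash. */
    public String route(String shardKey) {
        SortedMap<Long, String> tail = ring.tailMap(hash(shardKey));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a ring like this, adding a node only remaps the keys falling between the new node and its predecessor on the ring, so most existing data stays where it is; plain hash-modulo routing, by contrast, reshuffles nearly every key when the shard count changes.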

Data migration, capacity planning, expansion and other issues

Few projects consider sharding in their early design; preparations are generally made when rapid business growth starts to hit performance and storage bottlenecks, so migrating historical data is unavoidable. The usual approach is to read the historical data through a program and then write it out to the shard nodes according to the specified sharding rules, as sketched below.
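
A minimal sketch of that read-then-reroute loop, assuming JDBC access and a hypothetical ShardRouter that returns a shard connection for a given shard-key value. The table and column names are invented for illustration; real migrations would batch inserts per shard rather than open a connection per row.

```java
import java.sql.*;

// Stream rows from the legacy table and rewrite each one to the shard
// chosen by the sharding rule.
public class HistoryMigrator {
    interface ShardRouter { Connection connectionFor(long shardKey) throws SQLException; }

    public static void migrate(Connection legacy, ShardRouter router) throws SQLException {
        try (Statement st = legacy.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1000); // read in chunks rather than loading everything at once
            try (ResultSet rs = st.executeQuery("SELECT id, user_id, amount FROM t_order")) {
                while (rs.next()) {
                    long userId = rs.getLong("user_id"); // user_id is the assumed shard key
                    try (Connection shard = router.connectionFor(userId);
                         PreparedStatement ps = shard.prepareStatement(
                                 "INSERT INTO t_order (id, user_id, amount) VALUES (?, ?, ?)")) {
                        ps.setLong(1, rs.getLong("id"));
                        ps.setLong(2, userId);
                        ps.setBigDecimal(3, rs.getBigDecimal("amount"));
                        ps.executeUpdate();
                    }
                }
            }
        }
    }
}
```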

In addition, we need to plan capacity from the current data volume and QPS, factoring in cost, to work out roughly how many shards are needed (it is generally recommended that a single table on a single shard not exceed about 10 million rows). As a hypothetical example: if you project 400 million rows over the next year, 400M divided by 10M rows per table suggests at least 40 tables; rounding up to a power of two, which simplifies later hash-based doubling, you might provision 64 tables as 4 databases of 16 tables each.

If you use random sharding, you need to plan for later expansion, which is relatively troublesome. With range sharding, you only need to add nodes to scale out.

Cross-shard technical issues

Cross-shard sorting and paging

Generally speaking, paging must sort by a specified field. When the sort field is the shard field, the sharding rule lets us easily locate the right shard. When the sort field is not the shard field, things become much more complicated: for the final result to be accurate, the data must be sorted and returned within each shard node, then the result sets returned by the different shards must be merged and re-sorted before being returned to the user. As shown below:

What the figure above describes is only the simplest case (fetching the first page of data), which does not look like it hurts performance much. But if you want to fetch, say, page 10, the situation becomes far more complicated, as shown in the figure below:

Some readers may wonder why this cannot be handled as simply as fetching the first page (have each shard sort and return its top 10 rows, then merge and re-sort). It is not hard to see why: the data on each shard node is essentially arbitrary with respect to the global sort order, so for the result to be accurate, every shard node must sort and return its first N pages of data, and all of it must be merged and then sorted as a whole. Clearly this is resource-intensive, and the further back the user pages, the worse the system performs.
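
The merge step can be sketched as follows in Java, assuming each shard has already returned its own sorted top pageNo x pageSize rows; the generic helper and its names are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Merge step for cross-shard paging: pool every shard's top rows,
// re-sort globally, then cut out the requested page.
public class CrossShardPager {
    public static <T> List<T> page(List<List<T>> perShardTopRows, Comparator<T> order,
                                   int pageNo, int pageSize) {
        List<T> merged = new ArrayList<>();
        for (List<T> shardRows : perShardTopRows) {
            merged.addAll(shardRows); // each list holds that shard's top pageNo*pageSize rows
        }
        merged.sort(order); // global re-sort; cost grows with the page number
        int from = (pageNo - 1) * pageSize;
        int to = Math.min(from + pageSize, merged.size());
        return from >= merged.size() ? List.of() : merged.subList(from, to);
    }
}
```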

Cross-shard function processing

When using functions such as Max, Min, Sum, and Count for statistics and calculations, the function must first be executed on each shard's data source, the individual result sets must then go through a second round of processing, and only then is the final result returned. As shown in the figure below:
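
A sketch of that second round of processing in Java, assuming each shard has already computed its local aggregate. Note that an average cannot be merged from per-shard averages; it must be decomposed into per-shard sums and counts first:

```java
// Second-phase merging of per-shard aggregate results.
public class CrossShardAggregates {
    public static long count(long[] perShardCounts) {
        long total = 0;
        for (long c : perShardCounts) total += c; // COUNT = sum of per-shard counts
        return total;
    }

    public static long max(long[] perShardMaxes) {
        long m = Long.MIN_VALUE;
        for (long v : perShardMaxes) m = Math.max(m, v); // MAX = max of per-shard maxes
        return m;
    }

    // AVG must be rewritten as SUM and COUNT on each shard, merged, then divided here.
    public static double avg(long[] perShardSums, long[] perShardCounts) {
        long sum = 0, count = 0;
        for (long s : perShardSums) sum += s;
        for (long c : perShardCounts) count += c;
        return count == 0 ? 0 : (double) sum / count;
    }
}
```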

Cross-shard joins

Join is one of the most frequently used features of relational databases, but in a sharded cluster joins become very complicated. Cross-shard join queries should be avoided as much as possible (this scenario is even more complex than the cross-shard paging above, and the performance impact is severe). There are several common ways to avoid them:

Global tables

The concept of global tables came up earlier when discussing vertical partitioning. The basic idea is the same: replicate data-dictionary-like tables that are likely to appear in join queries onto every shard, thereby avoiding cross-shard joins.

ER sharding

In a relational database, tables often have relationships with one another. If we can determine these relationships in advance and store related records on the same shard, cross-shard join problems can be neatly avoided. For a one-to-many relationship, we usually choose to split by the side with more data. As shown in the figure below:

This way, the order table and the order detail table on Data Node1 can be related directly and joined locally, and the same holds on Data Node2. ER sharding of this kind effectively avoids cross-shard join problems in most business scenarios.
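
A minimal sketch of the routing idea, assuming order_id is the shared shard key and a simple modulo rule; both are illustrative choices rather than the article's prescription:

```java
// ER sharding: parent and child records are routed by the same shard key,
// so an order and its detail rows always land on the same data node and
// can be joined locally there.
public class ErShardRouter {
    private final int shardCount;

    public ErShardRouter(int shardCount) { this.shardCount = shardCount; }

    public int shardForOrder(long orderId) {
        return (int) (orderId % shardCount);
    }

    public int shardForOrderDetail(long orderId) {
        // the child row follows its parent's shard, never its own ID
        return shardForOrder(orderId);
    }
}
```

The key design point is that the child table is routed by the parent's key, not by its own primary key; as long as both writes use the same routing function, a local join is always possible.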

In-memory computing

With the rise of Spark and in-memory computing, many cross-data-source operation problems can, at least in theory, be solved: hand the data to a Spark cluster for in-memory computation, and return the computed result.
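
One hedged way to wire this up with Spark's Java API: read each shard's table over JDBC, union the pieces into one in-memory view, and run the cross-shard SQL there. The URLs, credentials, and table names below are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Push a cross-shard aggregation down to a Spark cluster.
public class SparkCrossShardQuery {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cross-shard")
                .master("local[*]") // local mode just for this sketch
                .getOrCreate();

        Dataset<Row> orders = null;
        for (String url : new String[]{"jdbc:mysql://node1/db", "jdbc:mysql://node2/db"}) {
            Dataset<Row> part = spark.read().format("jdbc")
                    .option("url", url)
                    .option("dbtable", "t_order")
                    .option("user", "reader").option("password", "***")
                    .load();
            orders = (orders == null) ? part : orders.union(part); // stitch the shards together
        }
        orders.createOrReplaceTempView("t_order_all");
        spark.sql("SELECT user_id, SUM(amount) FROM t_order_all GROUP BY user_id").show();
    }
}
```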

Cross-shard transaction problems

A cross-shard transaction is a distributed transaction; to understand distributed transactions, you need to understand the "XA interface" and "two-phase commit". It is worth noting that XA support in MySQL 5.5.x and 5.6.x was buggy and could cause master-slave data inconsistency; this was not fixed until the 5.7.x releases. Java applications can use the Atomikos framework to implement XA transactions (JTA in J2EE). Interested readers can refer to "Distributed Transaction Consistency Solutions" at:

http://www.infoq.com/cn/articles/solution-of-distributed-system-transaction-consistency
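
For illustration, here is a minimal sketch of an XA transaction spanning two shards with Atomikos. The data source setup, account table, and SQL are placeholder assumptions; the XA data source class shown is Connector/J 8's (on Connector/J 5.x it is com.mysql.jdbc.jdbc2.optional.MysqlXADataSource), and MySQL 5.7+ is assumed given the bug noted above:

```java
import com.atomikos.icatch.jta.UserTransactionImp;
import com.atomikos.jdbc.AtomikosDataSourceBean;
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.transaction.UserTransaction;

// Two-phase commit across two shards via Atomikos JTA: both updates
// commit together or roll back together.
public class XaTransferDemo {
    static AtomikosDataSourceBean xaDataSource(String name, String url) {
        AtomikosDataSourceBean ds = new AtomikosDataSourceBean();
        ds.setUniqueResourceName(name);
        ds.setXaDataSourceClassName("com.mysql.cj.jdbc.MysqlXADataSource");
        ds.getXaProperties().setProperty("URL", url);
        ds.getXaProperties().setProperty("user", "app");
        ds.getXaProperties().setProperty("password", "***");
        return ds;
    }

    public static void main(String[] args) throws Exception {
        AtomikosDataSourceBean shard1 = xaDataSource("shard1", "jdbc:mysql://node1/db");
        AtomikosDataSourceBean shard2 = xaDataSource("shard2", "jdbc:mysql://node2/db");

        UserTransaction utx = new UserTransactionImp();
        utx.begin();
        try (Connection c1 = shard1.getConnection();
             Connection c2 = shard2.getConnection();
             PreparedStatement debit = c1.prepareStatement(
                     "UPDATE account SET balance = balance - 100 WHERE id = 1");
             PreparedStatement credit = c2.prepareStatement(
                     "UPDATE account SET balance = balance + 100 WHERE id = 2")) {
            debit.executeUpdate();
            credit.executeUpdate();
            utx.commit(); // 2PC: prepare on both shards, then commit both
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
    }
}
```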

Does our system really need sharding?

After reading the above, some readers will naturally start wondering: does our system actually need to split its databases and tables?

In fact, there is no clear-cut criterion here; it depends heavily on the actual business situation and on experience. In my personal experience, a single MySQL table of around 10 million rows is generally fine (provided the application and database layers are reasonably well designed and optimized). Of course, beyond the current data volume and performance, as architects we should look six months to a year ahead at business growth, make a reasonable assessment and plan for the database servers' QPS, connection count, capacity, and so on, and do the corresponding preparation in advance. If a single machine cannot cope and little further optimization is possible elsewhere, then sharding is worth considering. In that case you can start by removing auto-increment IDs from the database, preparing ahead of time for sharding and the subsequent data migration.

Many people feel that sharding is better done sooner than later, worrying that the longer you wait, the faster the business grows, the more complex the system becomes, and the harder refactoring and scaling get... That sounds somewhat reasonable, but my view is exactly the opposite: for relational databases, I believe you should not shard unless the system truly needs it, because database sharding is neither low-cost nor free.

Here I recommend a fairly reliable transitional technique: table partitioning, which virtually all mainstream relational databases support. Different partitions are logically still one table but are physically separate; this can improve query performance to a degree, and it is transparent to the application, requiring no code changes. I once optimized a system whose main business table held roughly 80 million rows; for cost reasons we went with table partitioning at the time, the effect was quite noticeable, and the system has run very stably.
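
As a hedged example, the JDBC snippet below creates a MySQL table range-partitioned by year. The table, column, and partition names are illustrative; note that MySQL requires the partitioning column to appear in every unique key, hence the composite primary key:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Create a range-partitioned table: queries stay unchanged, MySQL prunes
// partitions by the created date automatically.
public class PartitionSetup {
    public static void main(String[] args) throws Exception {
        String ddl =
            "CREATE TABLE t_order (" +
            "  id BIGINT NOT NULL," +
            "  user_id BIGINT NOT NULL," +
            "  created DATE NOT NULL," +
            "  PRIMARY KEY (id, created)" + // partition column must be in every unique key
            ") PARTITION BY RANGE (YEAR(created)) (" +
            "  PARTITION p2014 VALUES LESS THAN (2015)," +
            "  PARTITION p2015 VALUES LESS THAN (2016)," +
            "  PARTITION pmax  VALUES LESS THAN MAXVALUE)";
        try (Connection c = DriverManager.getConnection("jdbc:mysql://localhost/db", "app", "***");
             Statement st = c.createStatement()) {
            st.execute(ddl);
        }
    }
}
```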

Summary

Finally, many readers want to know whether the community currently offers any open-source, free sharding solutions; after all, standing on the shoulders of giants saves a lot of effort. There are currently two main categories of solution:

  1. Application-level DDAL (distributed data access layer)

    Typical examples are Taobao's semi-open-source TDDL and Dangdang's open-source Sharding-JDBC. A distributed data access layer requires no hardware investment, and large companies with strong engineering teams usually build their own or customize an open-source framework. It is generally fairly invasive to the application, adding technical cost and complexity, and it usually supports only a particular language platform (mostly Java) or only particular databases and data access frameworks (typically MySQL, with JDBC, MyBatis, Hibernate, and similar frameworks).

  2. Database middleware. Typical examples include MyCAT (built on Alibaba's open-source Cobar with many optimizations and improvements; a rising star that also supports many new features), kingshard implemented in Go, and the old-guard Atlas (open-sourced by 360). These middleware products are widely used in Internet companies. In addition, the Fabric component shipped with MySQL 5.x Enterprise Edition officially claims sharding support, though few companies in China use it.

    Middleware of this kind can also be called a "transparent gateway"; the famous mysql_proxy (provided by MySQL itself and limited to read/write splitting) is arguably the ancestor of the field. Such middleware generally implements a specific database's network protocol and impersonates a real database server, hiding the real back-end servers, so the application usually just connects to the middleware directly. When SQL is executed, the middleware parses and routes the statement according to the predefined sharding rules, performs secondary computation on the result sets, and returns the final result. Introducing database middleware has a lower technical cost, is almost non-invasive to the application, and can satisfy most business needs. It does add hardware investment and operations cost, however, and the middleware itself can become a performance bottleneck and a single point of failure, so its own high availability and scalability must be ensured.

In short, whether you use a distributed data access layer or database middleware, there will be some cost and complexity, as well as some performance impact. Readers should weigh the options carefully according to their actual situation and business needs.

About the author

Ding Lang, technical architect. He focuses on high-concurrency, high-availability architecture design, with deep research and extensive hands-on experience in service-oriented systems, database and table sharding, and performance tuning, and is passionate about technical research and sharing.


Source: blog.csdn.net/dinglang_2009/article/details/53195871