Is MySQL No Longer Good Enough? Why Move from MySQL to NewSQL?

Recently, while talking with peers in the tech industry, I have often been asked how to choose between sharding (sub-database/sub-table) and a distributed database. There are many articles comparing middleware + traditional relational database (sharding) against NewSQL distributed databases, but some of their opinions and judgments strike me as extreme: it is not fair to evaluate a solution outside of its context.

This article compares the key characteristics of the two approaches, aiming to clarify their real advantages, disadvantages, and applicable scenarios as objectively and neutrally as possible.

What Makes NewSQL Databases More Advanced?

First, on the question of whether "middleware + relational database sharding" counts as a NewSQL distributed database: there is a paper on this, pavlo-newsql-sigmodrec. According to its classification, Spanner, TiDB, and OceanBase belong to the first category, new-architecture databases, while middleware solutions such as Sharding-Sphere, Mycat, and DRDS belong to the second (the paper also describes a third category, cloud databases, which this article will not cover). Is the middleware (including SDK and proxy) + traditional relational database (sharding) model a distributed architecture? I think so: storage is genuinely distributed, and horizontal scaling is achievable. But is it a "pseudo" distributed database? From the standpoint of architectural sophistication, there is some truth to that. The "pseudo" aspect shows up mainly in the duplicated SQL parsing and execution-plan generation between the middleware layer and the underlying DB, and in the B+Tree-based storage engine, which is redundant and inefficient within a distributed database architecture. To avoid a war of words over the "authenticity" of distributed databases, the term "NewSQL database" in this article refers specifically to the new-architecture category.

Compared with middleware + sharding, in what ways is a NewSQL database more advanced? Here is a simple architecture comparison:



  1. Traditional databases are disk-oriented; the memory-based storage management and concurrency control of NewSQL databases are more efficient.

  2. In the middleware mode, SQL parsing, execution-plan optimization, and similar work are duplicated in both the middleware and the database, which is relatively inefficient;

  3. Compared with XA, the distributed transactions of NewSQL databases are optimized and perform better;

  4. The storage layer of new-architecture NewSQL databases is built on multi-replica Paxos (or Raft) consensus. Compared with the traditional master-slave mode (which can still lose data when semi-synchronous replication degrades to asynchronous), it achieves real high availability and high reliability (RTO < 30s, RPO = 0).

  5. NewSQL databases natively support data sharding; data migration and scaling are automated, greatly reducing DBA workload, and are transparent to the application, which no longer needs to specify sharding keys in SQL.

Most of these are also the main selling points of NewSQL database products. But are these attractive features really as good as advertised? Below I elaborate on my understanding of each point.

Distributed Transactions

This is a double-edged sword.

CAP Constraints

Think about why the earlier NoSQL databases did not support distributed transactions (recent versions of MongoDB and others have started to). Was it a lack of theoretical or practical grounding? No. The reason is that the CAP theorem still hangs over distributed databases like a curse: guaranteeing strong consistency inevitably sacrifices availability (A) or partition tolerance (P). That is why most NoSQL systems do not provide distributed transactions.

So has the NewSQL database broken the limits of the CAP theorem? Not at all. Google Spanner, the progenitor of NewSQL databases (most distributed databases today follow the Spanner architecture), provides consistency and availability greater than five nines and claims to be "effectively CA". What this really means is that the probability of the system being in a CA state is very high, and the probability of a service outage due to a network partition is very small. The real reason is that Google built a private global network that largely eliminates partitions caused by link failures, backed by a highly capable operations team; this is also a selling point of Cloud Spanner. For details, see "Spanner, TrueTime and the CAP Theorem" by Eric Brewer, the author of the CAP theorem.

Let me also recommend an interesting article about distributed systems, "Standing on Distributed Shoulders of Giants", which observes: in a distributed system, you can know where the work is, or you can know when the work completes, but you cannot know both at once; and two-phase commit is essentially an anti-availability protocol.

Completeness

Does the two-phase commit protocol strictly support ACID, covering every failure scenario? No. If a failure occurs during the commit phase of 2PC, the behavior is in fact similar to best-effort one-phase commit, with visible anomalies: strictly speaking, atomicity (A) and consistency (C) cannot be guaranteed for a period of time (the recovery mechanism eventually restores them). Complete distributed transaction support is no simple matter: it must cope with network anomalies and with failures of NICs, disks, CPUs, memory, and power supplies, and it must pass rigorous testing. A friend once told me that every NewSQL product known to them is incomplete in its distributed transaction support; all of them have scenarios that fail. That insiders are so certain of this illustrates just how uneven the completeness of distributed transaction support actually is.

Yet distributed transactions are a critical underlying mechanism of these NewSQL databases: cross-shard DML, DDL, and more all depend on them. Any compromise in performance or completeness here greatly affects the correctness of cross-shard SQL execution at the upper layers.

Performance

Traditional relational databases also support distributed transactions via XA, so why is XA rarely used in high-concurrency scenarios? Because XA's underlying two-phase commit protocol suffers from high network overhead, long blocking times, deadlocks, and similar problems; this is why it sees little use in large-scale OLTP systems built on traditional relational databases. Distributed transactions in NewSQL databases are still mostly based on two-phase commit. For example, the Google Percolator transaction model uses a global timestamp service + MVCC + Snapshot Isolation (SI): global consistency is ensured through a TSO (Timestamp Oracle), MVCC avoids locking, and the primary-lock/secondary-lock scheme turns part of the commit into asynchronous work. Compared with XA, this does improve distributed transaction performance.
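To make the 2PC mechanics discussed above concrete, here is a minimal sketch of a two-phase commit coordinator. It is illustrative only: the `Participant` class is hypothetical, not a real database API, and real implementations also persist coordinator logs and handle timeouts.

```python
# Minimal two-phase commit sketch. Participant is a hypothetical stand-in
# for a resource manager (e.g. one database shard).

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "init"

    def prepare(self):
        # Phase 1: persist a prepare log, lock resources, and vote.
        self.state = "prepared"
        return True  # vote-commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare (the blocking phase).
    if all(p.prepare() for p in participants):
        # Phase 2: all voted yes, so commit everywhere.
        for p in participants:
            p.commit()
        return "committed"
    # Any "no" vote (or timeout) aborts the whole transaction.
    for p in participants:
        p.rollback()
    return "aborted"
```

Even this toy version shows where the costs come from: one extra network round trip plus a durable prepare log per participant before anything commits.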

SI is a form of optimistic concurrency control: under hot-spot contention, a large number of commits may fail. In addition, SI's isolation level is not exactly the same as RR: it does not produce phantom reads, but it does allow write skew.
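The write skew mentioned above can be illustrated with a toy simulation in plain Python (the snapshot/commit logic here is a simplified model, not any real database's behavior): two transactions each read a consistent snapshot, write disjoint rows, and both commit, yet together they violate an invariant.

```python
# Toy simulation of write skew under snapshot isolation.
# Invariant: at least one of the two on-call doctors must stay on call.
db = {"alice_on_call": True, "bob_on_call": True}

# Both transactions take a snapshot at the same moment.
snap_t1 = dict(db)
snap_t2 = dict(db)

# T1: "Bob is still on call in my snapshot, so Alice may go off call."
if snap_t1["bob_on_call"]:
    db["alice_on_call"] = False  # T1 writes alice_on_call only

# T2: "Alice is still on call in my snapshot, so Bob may go off call."
if snap_t2["alice_on_call"]:
    db["bob_on_call"] = False  # T2 writes bob_on_call only

# The write sets are disjoint, so SI's conflict check lets both commits
# succeed, yet the combined result violates the invariant.
print(db)  # both doctors are now off call
```

Serializable isolation (or explicit locking of the rows read) would abort one of the two transactions instead.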

However, no matter how it is optimized, compared with 1PC, 2PC's extra timestamp acquisition, network round trips, and prepare-log persistence still impose significant performance costs, especially when many nodes are involved. For example, in a banking batch-deduction scenario, one file may contain tens of thousands of accounts; no matter how this is implemented, throughput will not be high.

Distributed transaction test data published for Spanner:



Although NewSQL distributed database products are heavily advertised as supporting distributed transactions, this does not mean applications can ignore data splitting entirely. The best practices for these databases still recommend that most application scenarios avoid distributed transactions as much as possible.

Since strongly consistent transactions cost so much, we should reflect: do we really need them? Especially after microservice decomposition, many systems can no longer live in a single database anyway. Weakening the consistency requirement leads to flexible transactions: abandoning ACID (Atomicity, Consistency, Isolation, Durability) in favor of BASE (Basically Available, Soft state, Eventually consistent), using models such as Saga, TCC, or reliable messaging with eventual consistency. For large-scale, high-concurrency OLTP scenarios, I personally recommend flexible transactions rather than strongly consistent distributed transactions. I have previously written a technical component around flexible transactions, and new models and frameworks have appeared in recent years (for example Fescar, recently open-sourced by Alibaba). Due to space constraints I will not repeat them here; I may write a separate article when time allows.
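The Saga model mentioned above can be sketched in a few lines. This is a minimal illustration, not any real framework's API: each step pairs a business action with a compensating action, and on failure the completed steps are undone in reverse order, yielding eventual consistency instead of atomicity.

```python
# Minimal Saga sketch: a list of (action, compensation) callables.
# On failure, compensations for completed steps run in reverse order.

def run_saga(steps):
    done = []
    try:
        for action, compensation in steps:
            action()               # local transaction commits immediately
            done.append(compensation)
    except Exception:
        # Roll forward failed: compensate what already committed.
        for compensation in reversed(done):
            compensation()
        return "compensated"
    return "completed"
```

Note the trade-off this encodes: intermediate states are visible to other transactions before compensation runs, which is exactly the isolation that ACID would have provided and BASE gives up.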

Is two-phase commit the only way to implement distributed transactions? The idea in OceanBase 1.0 of avoiding distributed transactions through a single UpdateServer is very enlightening, although version 2.0 switched to 2PC. Two-phase commit is not the industry's only option; see, for example, "It's Time to Move on from Two Phase Commit" (if the original is inaccessible, there is a Chinese translation at https://www.jdon.com/51588).

High Availability and Geo-Distributed Active-Active

Master-slave replication is not optimal; even semi-synchronous replication can lose data in extreme cases (when it degrades to asynchronous). The industry consensus today is that a better approach is the Paxos distributed consensus protocol or a Paxos-like protocol such as Raft. Google Spanner, TiDB, CockroachDB, and OceanBase all take this route: multi-replica storage based on Paxos, following the majority-write principle, with automatic leader election. This achieves high data reliability, shortens failover time, improves availability, and in particular reduces operational workload. The technique is technically mature and is now standard in the underlying layer of NewSQL databases. Of course, it can also be applied to traditional relational databases: Alibaba, the WeChat team, and others have retrofitted MySQL storage with Paxos multi-replica support, and MySQL has released the official MySQL Group Replication. In the not-too-distant future, the master-slave mode may well become history.
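The majority-write principle behind the Paxos/Raft replication described above can be sketched as follows (a simplified model of the quorum rule, not any database's actual implementation):

```python
# Quorum rule behind Paxos/Raft replication: an entry is durable once a
# majority of replicas acknowledge it, because any two majorities of the
# same replica set always overlap in at least one replica.

def majority(n_replicas: int) -> int:
    return n_replicas // 2 + 1

def write_is_durable(acks: int, n_replicas: int) -> bool:
    # After a majority ack, any future leader election (which also needs
    # a majority) must include a replica holding the entry, so a newly
    # elected leader can never lose it.
    return acks >= majority(n_replicas)
```

This is also why these systems tolerate the loss of a minority of replicas (1 of 3, 2 of 5) with RPO = 0, unlike semi-synchronous master-slave replication.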

The distributed consensus algorithm itself is not hard, but engineering it in practice requires handling many failure cases and applying many optimizations; implementing a production-grade, reliable, mature consensus protocol is not easy. For example, real deployments must implement Multi-Paxos or Multi-Raft, and must reduce network and disk I/O overhead through batching, asynchrony, and similar techniques.

Note that many NewSQL database vendors advertise that Paxos or Raft enables geo-distributed active-active deployment. This actually has a precondition: the network latency between sites must not be too high. Take a bank's "two sites, three data centers" layout as an example: the remote site is often thousands of kilometers away, with latencies of tens of milliseconds. For true active-active, the remote replicas would have to participate in the majority acknowledgment of the database log, and almost no OLTP system can accept such latency.

**Geo-distributed active-active at the database layer is a beautiful vision, but there is currently no good solution to the latency imposed by distance.** When I spoke with the Ant Financial team, their multi-site active-active design works at the application layer: transaction information is double-written via MQ, and the remote DC keeps it in a distributed cache. Upon a site switchover, the database replication middleware reports the replication lag; the application reads the transaction information from the cache and blacklists the business objects involved in that window (such as users and accounts), removing them from the blacklist once replication catches up. Because only transaction information is double-written, rather than all database operation logs, the lag affects only data within that window. This is the multi-site active-active design I currently find most credible.

In addition, some systems have undergone unitization (cell-based architecture), which must also be factored into Paxos leader election; this is a capability many NewSQL databases currently lack.

Horizontal Scaling and Sharding

The Paxos algorithm solves high availability and high reliability, but not horizontal scaling, so sharding must also be supported. NewSQL databases have sharding built in: they automatically identify hot spots from each shard's load (disk usage, write rate, and so on), and then split, migrate, or merge shards, all transparently to the application, saving DBAs a great deal of operational work. Take TiDB as an example: it slices data into regions, and when a region reaches 64 MB the data is automatically split and migrated.
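The automatic region split described above can be sketched roughly as follows (a toy model loosely inspired by TiDB regions; the data layout and split heuristic are illustrative, not the real implementation):

```python
# Toy sketch of size-triggered range splitting. A "region" is modeled as a
# sorted list of (key, size_bytes) pairs; when its total size crosses the
# threshold, it is split near the byte midpoint.

REGION_SPLIT_BYTES = 64 * 1024 * 1024  # 64 MB threshold

def maybe_split(region):
    total = sum(size for _, size in region)
    if total < REGION_SPLIT_BYTES:
        return [region]
    # Find the key where roughly half the bytes fall on each side.
    acc, split_at = 0, len(region)
    for i, (_, size) in enumerate(region):
        acc += size
        if acc >= total // 2:
            split_at = i + 1
            break
    return [region[:split_at], region[split_at:]]
```

In a real system the two resulting regions would then be scheduled onto different nodes by the placement component, which is what makes the scaling transparent to the application.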

In the sharding mode, the application must decide at design time each table's split key, split method (range, modulo, consistent hashing, or a custom routing table), routing rules, number of databases and tables, scaling approach, and so on. Compared with a NewSQL database, this mode is far more invasive and complex for the application, which is a major challenge for most systems.
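As an example of the routing decisions listed above, here is a minimal sketch of middleware-style modulo routing on a split key (the database/table counts and the `customer_id` key are made up for illustration):

```python
# Modulo routing over a fixed layout of physical databases and tables.
# Changing N_DATABASES or N_TABLES_PER_DB later requires data migration,
# which is exactly the scaling burden this mode puts on the application.

N_DATABASES = 4
N_TABLES_PER_DB = 8

def route(customer_id: int):
    """Map a split-key value to a physical (database, table) pair."""
    slot = customer_id % (N_DATABASES * N_TABLES_PER_DB)
    db_index = slot % N_DATABASES
    table_index = slot // N_DATABASES
    return f"db_{db_index}", f"account_{table_index}"
```

Any SQL without the split key cannot use this function and must be broadcast to all 32 physical tables, which is why middleware best practices insist on carrying the split key in queries.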

The sharding mode can also scale out online. The basic idea: first catch the new nodes up via asynchronous replication, then set the affected shards read-only while the routing is switched, and finally re-enable writes. Of course, this requires the middleware and the database to cooperate.

**One problem here is that the uniform, built-in sharding strategy of a NewSQL database (for example, TiDB's range-based splitting) may not be the most efficient, because it does not align with the partitioning dimensions of the domain model; the consequence is that many transactions become distributed transactions.** For example, a bank's core system is organized around the customer dimension: the customer table, that customer's account table, and the journal table are written together in the vast majority of scenarios. But if each table is split by range on its own primary key, such a transaction cannot complete within a single shard, which causes performance problems in a high-frequency OLTP system.
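The domain-aligned alternative implied above can be sketched as follows: route every table in the customer aggregate by the customer ID rather than by its own primary key, so the common write pattern stays single-shard (the table names and shard count are hypothetical):

```python
# Domain-aligned sharding sketch: all rows belonging to one customer
# (customer, account, journal) land on the same shard, so the frequent
# "write all three together" transaction needs no 2PC.

N_SHARDS = 16
CUSTOMER_AGGREGATE = {"customer", "account", "journal"}

def shard_of(customer_id: int) -> int:
    return customer_id % N_SHARDS

def route_row(table: str, customer_id: int) -> int:
    # Route by the aggregate root's key, not the table's own primary key.
    assert table in CUSTOMER_AGGREGATE
    return shard_of(customer_id)
```

This is essentially what the sharding middleware forces the application to design up front, and what a generic range-split NewSQL database cannot infer on its own without hints such as table-group or partition-by-column features.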

Distributed SQL Support

Both modes handle common single-shard SQL well. Because a NewSQL database is positioned as a general-purpose database, it supports more complete SQL, including cross-shard joins, aggregation, and other complex queries. Middleware solutions are mostly designed around application needs; most still support SQL carrying the split key, database/table traversal, single-database joins, aggregation, sorting, and pagination, but their support for cross-database joins and aggregation is inadequate. NewSQL databases generally do not support stored procedures, views, or foreign keys, whereas the middleware mode sits on top of a traditional relational database, so these features are easy to support as long as only a single database is involved. NewSQL databases usually choose compatibility with the MySQL or PostgreSQL protocol, so their SQL support is limited to those two dialects; middleware, for example in driver mode, often needs only simple SQL parsing, route computation, and SQL rewriting, so it can support more kinds of database SQL.

The difference in SQL support comes down mainly to the distributed SQL plan generator. A NewSQL database has the underlying data distribution and statistics, so it can do cost-based optimization (CBO) and generate more efficient plans. The middleware mode lacks this information and usually can only do rule-based optimization (RBO). This is also why middleware generally does not support cross-database joins: even when implemented, the efficiency is often poor, and it is better left to the application.

From this you can see that the middleware + sharding architecture embodies compromise and balance; it is an application-oriented design. A NewSQL database aims higher and takes on everything; it is a general-purpose piece of low-level infrastructure software, so its complexity and technical bar are much higher.

Storage Engine

Traditional relational database storage engines are disk-oriented and mostly based on B+ trees. A B+ tree reduces random reads by lowering the tree height, cutting disk seeks and improving read performance; but heavy random writes cause node splits, which themselves produce random writes and degrade write performance. NewSQL storage engines mostly use LSM trees instead, which turn random disk writes into sequential writes and greatly improve write performance. LSM reads, however, must merge data and therefore perform worse than a B+ tree; in general, LSM suits workloads where writes exceed reads. Of course, this is only a pure data-structure comparison; real database implementations also optimize reads and writes with SSDs, caching, Bloom filters, and so on, so read performance does not drop much in practice. Because of multi-replica and distributed-transaction overhead, a NewSQL database does not beat a single-node relational database on per-SQL response time, but thanks to elastic cluster scaling, overall QPS improves markedly. This is why NewSQL vendors say distributed databases care more about throughput than about single-statement response time.
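The LSM write/read trade-off described above can be shown with a toy sketch (illustration only: real engines add a write-ahead log, compaction, Bloom filters, and block caches):

```python
# Toy LSM: writes go to an in-memory memtable; when it fills, it is
# flushed as a sorted, immutable run (one sequential disk write). Reads
# check the memtable first, then runs from newest to oldest -- which is
# why unoptimized LSM reads cost more than a single B+ tree descent.

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []            # newest run last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: sort once, append as an immutable sorted run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):   # newest version wins
            for k, v in run:
                if k == key:
                    return v
        return None
```

Note that `put` never rewrites old data in place, which is the source of the sequential-write advantage, while `get` may touch every run, which is what Bloom filters and compaction exist to mitigate.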

Maturity and Ecosystem

A distributed database is a new kind of general-purpose infrastructure software. Measuring and evaluating it accurately requires a multi-dimensional test model covering development status, adoption, community ecosystem, monitoring and operations, surrounding tooling, functional coverage, DBA talent, SQL compatibility, performance testing, high-availability testing, online scaling, distributed transactions, isolation levels, online DDL, and more. Although NewSQL databases have now stood a certain test of time, adoption is concentrated in internet companies and in the non-core transaction systems of traditional enterprises; they are still in a stage of rapid iteration, being continuously hardened through use at scale. By contrast, traditional relational databases, after many years of development and thorough evaluation, hold clear advantages in maturity, functionality, performance, surrounding ecosystem, risk control, and accumulated talent, and they are also more compatible with existing systems.

For internet companies, the pressure of data growth and a culture of pursuing new technology tilt them toward trying NewSQL databases; no longer having to think about table splitting, application rework, scaling, or transaction consistency is an attractive proposition however you look at it. For risk-conscious traditional enterprises such as banks, NewSQL databases will likely remain in an exploratory, cautiously piloted stage for some time. The middleware + sharding architecture is simpler and has a lower technical bar. Although it is not as full-featured as a NewSQL database, in most scenarios the core requirement is simply the correct routing of SQL after splitting, which the middleware mode handles with room to spare; it can be called sufficient for most OLTP scenarios.

For reasons of space, other characteristics such as online DDL, data migration, and operations tooling are not compared in this article.

Summary

If, after all of the above, you still do not know which mode to choose, consider the following questions first and think about whether the problems NewSQL databases solve are real pain points for you:

  • Must strongly consistent transactions be solved at the database layer?

  • Is your data growth unpredictable?

  • Has the frequency of scaling operations exceeded your own operations capacity?

  • Do you value throughput more than response time?

  • Must sharding be completely transparent to the application?

  • Do you have a DBA team familiar with NewSQL databases?

If two or three of the answers above are yes, you can consider a NewSQL database. Although it may carry some learning cost up front, it is the direction databases are heading, and the future payoff will be higher, especially in the internet industry: as data volumes leap upward, the pain of sharding grows by the day. Of course, choosing a NewSQL database means being prepared to take on some risk. If you still have not decided, think about the following questions:

  • Can eventual consistency satisfy your actual scenarios?

  • Can the total data volume for the next few years be estimated?

  • Is there a system maintenance window for operations such as scaling and DDL?

  • Are you more sensitive to response time than to throughput?

  • Do you need compatibility with existing relational database systems?

  • Do you have an accumulation of traditional database DBA talent?

  • Can you tolerate the intrusion of sharding into the application?

If most of these answers are yes, then sharding is the better choice. Perfect solutions are rare in software, and NewSQL databases are not a silver bullet for distributed data architecture. By comparison, sharding is a lower-cost, lower-risk solution: it reuses the traditional relational database ecosystem to the greatest extent, middleware can satisfy most sharding needs, and its customizability is stronger. At the current stage, when NewSQL databases are not yet fully mature, sharding can be described as a solution with a lower ceiling but a higher floor, especially for the core systems of traditional industries. If you still intend to treat the database as a black-box product, solid, pragmatic sharding is the safe choice.

Finally

Thank you for reading this far. The article surely has its shortcomings; corrections are welcome. And if you think it is well written, please give it a thumbs up.


Origin: blog.51cto.com/14849432/2539737