POLARDB compared with other relational databases

https://baijiahao.baidu.com/s?id=1610828839695075926&wfr=spider&for=pc

Foreword

When it comes to choosing a database, MySQL has become the favorite of Chinese developers. Compared with the relatively conservative SQL Server, Chinese developers prefer open databases. Price also matters, and the cost of Oracle has put it out of reach for a large number of developers. Because of open source, price and other factors, database selection comes down to either the low-cost, open-source MySQL or the high-end Oracle. - Yunqi Community, "2017 Chinese Developers Report"

 

 

 

According to the Yunqi Community "2017 Chinese Developers Report", MySQL has a very high usage rate worldwide and particularly in China. A large number of products and projects depend on MySQL as their relational database, so further optimization and modification of MySQL is promising, and derivative versions have appeared, including MariaDB, Percona, AliSQL and PhxSQL. MySQL itself, however, is a "lightweight" database and falls somewhat short of commercial databases such as SQL Server and Oracle. This is why all kinds of new databases compatible with the MySQL ecosystem have begun to emerge.

2017 was the year in which many new databases landed. Various NewSQL products ended their dormant period and began commercial delivery, in particular new products built on MySQL or compatible with the MySQL ecosystem and protocol. These include POLARDB, Aurora and X-DB, which further optimize OLTP, as well as HTAP products such as HybridDB, TiDB and BaikalDB, which are optimized to serve both OLTP and OLAP scenarios.

Now to the protagonist of this article, POLARDB. POLARDB was released in September 2017 and entered public beta. The beta ended on April 18, when the product officially entered the commercial stage, and it has continued to release new features and improve the experience. It is worth mentioning that on October 17 Tencent Cloud and PingCAP likewise jointly released an HTAP database based on TiDB, but as of this writing it remains in the testing phase.

 

Introduction

POLARDB is a next-generation cloud database developed in-house by Alibaba Cloud. It is 100% compatible with MySQL, delivers up to 6 times the performance of MySQL, and targets enterprise OLTP (online transaction processing) scenarios with structured data and highly concurrent queries. It combines the stability, reliability, high performance and scalability of commercial databases with the simplicity and openness of open-source databases, at only 1/10 the cost of a commercial database. POLARDB uses an architecture that separates storage from compute and provides automatic storage expansion, elastic scaling of compute specifications, fast failure recovery, and data backup and disaster recovery services.

POLARDB makes certain modifications to MySQL. Ever since its launch, MySQL has been developed as a standalone database, and a standalone single node is bound to run into performance bottlenecks. As a result, MySQL has been reworked in two directions.

One is read/write splitting, because on many common websites read requests far outnumber writes. When shopping on Taobao, for example, a user may look at a lot of products but place only a single order, or place no order at all and just add an item to favorites.
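As a rough illustration of read/write splitting, the sketch below sends read statements to read-only replicas in turn and everything else to the primary. The class name, the endpoint strings and the prefix-based classification are all assumptions made for this example; a real proxy parses SQL properly and also has to deal with transactions and replication lag.

```python
from itertools import cycle

# Hypothetical endpoints, used only for illustration.
PRIMARY = "primary:3306"
REPLICAS = ["replica-1:3306", "replica-2:3306"]

# Statement prefixes treated as reads in this simplified sketch.
READ_PREFIXES = ("select", "show")


class ReadWriteRouter:
    """Send reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = cycle(replicas or [primary])

    def route(self, sql):
        statement = sql.lstrip().lower()
        if statement.startswith(READ_PREFIXES):
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter(PRIMARY, REPLICAS)
for sql in ["SELECT * FROM goods", "SELECT * FROM cart", "INSERT INTO orders VALUES (1)"]:
    print(router.route(sql), "<-", sql)
```

In a read-heavy workload such as browsing products, most statements take the replica path, which is why adding read-only nodes relieves the primary.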

 

Pain points

Before introducing the advantages of POLARDB, let us first look at some pain points in the experience of the industry's top cloud database for MySQL (competing products from other vendors may have even more pain points~):

First, when the stored data volume is large, backup becomes a very complicated matter that costs time and effort.

Second, in primary/standby mode, the system does not switch to the standby database unless the primary fails. The standby only serves as insurance and does not normally share the load, yet its cost still has to be paid.

Third, with read/write splitting, every read-only replica added needs storage space roughly equal to the primary's. If the primary database is 1 TB, then two read-only replicas cost another 2 x 1 TB of storage.

Fourth, again with read/write splitting, adding a read-only replica to a TB-scale database takes one to two days, which is very troublesome.

Fifth, in the traditional model, 10% to 20% of capacity is set aside so that the database never reaches 100%, and an upgrade or expansion is triggered at 80%. The reserved capacity is in effect wasted, but the waste cannot be avoided.

Sixth, storage capacity bottlenecks. The relational databases of cloud vendors basically all share this problem: at around 2 TB the capacity limit is reached and cannot be increased further, while some database scenarios require very large storage.

Seventh, performance bottlenecks. Hitting a database performance bottleneck is upsetting. If it can be solved by upgrading the configuration, fine. If the problem lies in the database software itself, however, replacing the database software means higher costs and risks.

 

Product Features

With those pain points in mind, the following introduction to POLARDB's product features should be all the more welcome.

 

Snapshot backup

POLARDB backs up data in the form of snapshots. A snapshot does not copy the data verbatim. Instead, the load of backing up the data is spread over the actual writes that occur in the time window after the snapshot is created, so that backup and recovery respond quickly.

Snapshots are a popular backup scheme for block-based storage devices. In essence a snapshot uses a Copy-On-Write mechanism: changes are tracked in the block device's metadata, and when a write hits a block, that block is copied on write and the change goes to the newly copied block, so that the data can be restored to its state at the snapshot point in time. Snapshot is a typical post-processing mechanism based on time and write load. Building on snapshots and the redo log, POLARDB lets user data be restored to a point in time, which is more efficient than the conventional approach of combining a full backup with incremental Binlog data.
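The copy-on-write idea can be sketched with a toy in-memory block device. This is a classic variant that saves the old block content just before it is overwritten; the class and method names are invented for the example, and it makes no claim about how POLARDB or its storage layer implements snapshots.

```python
class BlockDevice:
    """Toy block device with copy-on-write snapshots (illustration only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # each snapshot maps block index -> original content

    def create_snapshot(self):
        # Creating a snapshot copies no data; it only starts tracking changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Before a block is overwritten, preserve its old content in every
        # snapshot that has not saved that block yet.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snapshot_id):
        # Blocks changed since the snapshot come from the saved copies,
        # unchanged blocks are read from the live device.
        saved = self.snapshots[snapshot_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]


device = BlockDevice(["a", "b", "c"])
snap = device.create_snapshot()
device.write(1, "B")
print(device.blocks)               # ['a', 'B', 'c']
print(device.read_snapshot(snap))  # ['a', 'b', 'c']
```

Taking the snapshot is instant because nothing is copied up front; the copying cost is paid only by the writes that follow, which matches the behaviour described above.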

In my own tests, POLARDB backups complete at the minute level, taking two or three minutes, which is a great experience compared with the hour-level waits of the cloud database. The backup and recovery experience of POLARDB is otherwise essentially the same as that of the cloud database, and instances can be restored from a backup set or to a point in time.

 

 

 

Multi-active FailOver

By default POLARDB forms an Active-Active setup of one primary instance plus read-only instances. When the primary instance goes down, the multi-active FailOver mechanism immediately selects a read-only instance and "grants" it read and write capability, so that it becomes the new primary instance. In this model every read-only instance is a "standby", but the standbys take part in the work, so nothing is wasted.

Thanks to data sharing (discussed later), adding a read-only node no longer requires a full copy of the data. The nodes share one full copy of the data and the redo log and only need to synchronize metadata, with basic MVCC support to keep reads consistent. As a result, when the primary node fails and a Failover takes place, the recovery time for switching to a read-only node can be shortened to under 30 seconds.
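A minimal sketch of the promotion step, assuming a toy cluster model in which all nodes see the same shared data. The class, the node names and the policy of promoting the first read-only node are assumptions for illustration; the real selection logic is not described in this article.

```python
class Cluster:
    """Toy model: one primary plus read-only nodes over shared storage."""

    def __init__(self, primary, read_only):
        self.primary = primary
        self.read_only = list(read_only)

    def fail_over(self):
        """Promote a read-only node when the primary is lost; return the failed node."""
        if not self.read_only:
            raise RuntimeError("no read-only node available for promotion")
        failed = self.primary
        # No data copy is needed: the promoted node already sees the shared data.
        self.primary = self.read_only.pop(0)
        return failed


cluster = Cluster("node-a", ["node-b", "node-c"])
cluster.fail_over()
print(cluster.primary, cluster.read_only)  # node-b ['node-c']
```

Because promotion only changes a role and moves no data, the switch can be fast, which is the point the paragraph above makes about recovery time.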

 

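Distributed shared storage architecture

POLARDB uses a third-generation distributed shared storage architecture that separates compute nodes (servers that mainly handle SQL parsing and storage engine computation) from storage nodes (servers that mainly handle data block storage and database snapshots), providing scalability and operations capabilities that take effect immediately.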

 

 

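As the figure above shows, POLARDB places database files, the redo log and so on onto a shared storage device (POLARSTORE) rather than giving each database its own local storage. Because the data is shared, adding a read-only instance no longer requires a full copy of the data. The instances share one full copy of the data and the redo log and only need to synchronize metadata. As a result, when the primary node fails and a Failover takes place, the recovery time for switching to a read-only node can be shortened to under 30 seconds (you may have noticed that this sentence already appeared in the FailOver section). The system's high availability is further strengthened, and the data latency between read-only nodes and the primary node can also be reduced to the millisecond level.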
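The storage effect of sharing data can be put into numbers with a small sketch. The function names are invented for this example, and replication that happens inside the storage layer itself is deliberately ignored.

```python
def storage_with_full_copies(data_tb, read_only_nodes):
    """Traditional read/write splitting: each read-only node holds its own full copy."""
    return data_tb * (1 + read_only_nodes)


def storage_with_shared_data(data_tb, read_only_nodes):
    """Shared storage: all nodes attach to the same data, so node count does not matter."""
    return data_tb


# Compare both models for a 1 TB database with 0, 1 and 2 read-only nodes.
for nodes in (0, 1, 2):
    print(nodes, storage_with_full_copies(1, nodes), storage_with_shared_data(1, nodes))
```

With two read-only nodes the traditional model stores 3 TB for 1 TB of data, while the shared model still stores 1 TB.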

 

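Pay-as-you-go storage billing

POLARDB storage is billed on a pay-as-you-go basis: you are charged for what you use rather than paying up front. Two very important points in cloud computing methodology are elasticity and pay-as-you-go. Compared with postpaid ECS clusters and postpaid traffic, postpaid database billing is actually more controllable and predictable and will not produce sky-high bills. Users also do run into a poor experience when expanding database capacity (as mentioned in the pain points).

So users will readily accept POLARDB's pay-as-you-go model. There is no longer any need to waste money on that 10% expansion threshold.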
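The difference between the two billing models can be sketched as follows. The unit price is a made-up placeholder rather than a real tariff, and the 20% reserve is an assumption chosen to match the 80% expansion trigger mentioned in the pain points.

```python
PRICE_PER_GB_MONTH = 1.0  # hypothetical unit price, not a real tariff


def provisioned_gb(used_gb, reserve_percent=20):
    """Capacity to buy so that reserve_percent of it always stays free."""
    return used_gb * 100 / (100 - reserve_percent)


def provisioned_cost(used_gb, reserve_percent=20):
    """Monthly cost when the reserve has to be paid for as well."""
    return provisioned_gb(used_gb, reserve_percent) * PRICE_PER_GB_MONTH


def pay_as_you_go_cost(used_gb):
    """Monthly cost when only the space actually used is billed."""
    return used_gb * PRICE_PER_GB_MONTH


print(provisioned_cost(800))    # 1000.0
print(pay_as_you_go_cost(800))  # 800.0
```

For 800 GB of data, the reserved model pays for 1000 GB while the usage-based model pays for 800 GB, so the gap is exactly the idle reserve.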

 

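Large-capacity storage

POLARDB supports up to 100 TB of storage, 50 times the 2 TB limit of the cloud database. If your database scenario really does need very large storage, there is no longer anything to worry about.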

 

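Ultra-high performance

As a cloud-native database, POLARDB is top-tier in software design, product architecture and infrastructure (saying "the most top-tier" might violate advertising law~). In performance POLARDB far exceeds MySQL, reaching up to 6 times the performance of MySQL in special scenarios.

Simplifying the software design takes performance a step further. The figure below appeared earlier, and it shows that read/write splitting under traditional MySQL is actually very cumbersome and requires writing a large volume of logical logs. POLARDB makes extensive modifications to MySQL, including physical replication over shared storage, lock optimization, log commit optimization, replication performance optimization, read node performance, and more. POLARDB also isolates resources with Docker, which avoids the unnecessary performance loss of a layer of virtualization.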

 

 

 

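Over-specced underlying hardware delivers higher performance. 3D XPoint, NVMe and RDMA network cards are names often heard among geeks and enthusiasts. They all mean ultra-high performance, and they also mean high prices. The POLARSTORE storage mentioned earlier is built on this top-end hardware, but Alibaba Cloud integrates it into POLARSTORE and delivers it as an inclusive cloud service, so that everyone can enjoy cutting-edge technology and products at a low price.

Integrated software and hardware design. Improving software or hardware alone could not achieve 600% of MySQL's performance. POLARDB optimizes brand-new software for the coolest hardware so that the two work as one, which is why I strongly recommend reading the PolarFS paper from VLDB 2018.

(Speculation) A future Multi-Master cluster mechanism. This is purely my own guess, but POLARDB will very likely do it: create multiple read-write primary instances across multiple availability zones. Applications could then read and write data on multiple database instances in the cluster, greatly extending distributed write performance. That would simply be stealing DRDS's lunch! A multi-master cluster would also further improve availability. If one primary instance fails, the other instances in the cluster immediately take over, keeping reads and writes available during an instance failure or even a complete AZ failure, which should make it possible to reduce application downtime to zero.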

 

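100% MySQL compatibility

POLARDB is 100% compatible with the MySQL ecosystem. Why is this worth calling out? Two examples:

First, the first figure in this article shows that Oracle has a sizeable share in China and holds second place. Why? From my experience of helping customers move to the cloud, most Oracle customers are government and enterprise customers whose systems depend on Oracle as historical baggage. Migrating away rashly means a considerable amount of work, and whoever does it will take the blame if something goes wrong. So even with solutions that are highly compatible with Oracle, whether 95% or 99%, there are bound to be unpredictable risks.

Second, Google's Cloud Spanner is not compatible with the existing database ecosystem. Users therefore have to develop specifically for it from scratch and will be strongly dependent on GCP in the future, which makes it hard to leave.

So POLARDB's 100% compatibility with the MySQL ecosystem is definitely a major feature. It lets customers painlessly adopt a high-performance database product to resolve the performance and functional bottlenecks they face today, or use its new features to improve business reliability and stability.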

 

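Blurring OLTP and OLAP

POLARDB's hundred-terabyte storage and the low latency of its high-spec software and hardware blur the boundary between OLTP and OLAP to some extent. In scenarios that demand real-time data, OLAP analysis can be run more readily on the database itself, without first having to load the data into a data warehouse.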

 

Performance Test

SysBench 1.0.15 is used here to test small instance specifications.

 

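Test preparation

Self-managed on ECS: self-managed MariaDB 10.1 (based on MySQL 5.6); underlying server: compute-optimized C5, 2 cores / 4 GB, 150 GB SSD cloud disk

RDS primary instance: cloud database for MySQL 5.6, high-availability edition, 2 cores / 4 GB

RDS read/write splitting: cloud database for MySQL 5.6, high-availability edition, 2 cores / 4 GB, primary instance + 1 read-only instance

POLARDB primary instance: POLARDB 2 cores / 4 GB, primary instance only, no read-only instance

POLARDB read/write splitting: POLARDB 2 cores / 4 GB, primary instance + 1 read-only instance

Setting up self-managed read/write splitting on ECS is too time-consuming and laborious, so it was not created.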

 

Test commands
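A minimal sketch of how SysBench 1.0 OLTP runs are commonly driven, assuming the stock oltp_read_only and oltp_write_only scripts that ship with SysBench. The helper function is invented for this example, and the host, credentials, table counts, thread count and duration are hypothetical placeholders rather than the values used in this test.

```python
import shlex


def sysbench_command(test, stage, host, user, password, database,
                     tables=10, table_size=100000, threads=16, seconds=60):
    """Build the argument list for one SysBench 1.0 OLTP stage.

    test  : for example "oltp_read_only" or "oltp_write_only"
    stage : "prepare", "run" or "cleanup"
    """
    return [
        "sysbench", test,
        "--db-driver=mysql",
        f"--mysql-host={host}",
        f"--mysql-user={user}",
        f"--mysql-password={password}",
        f"--mysql-db={database}",
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--threads={threads}",
        f"--time={seconds}",
        "--percentile=95",
        stage,
    ]


# Placeholder connection details; replace with a real endpoint before running.
command = sysbench_command("oltp_read_only", "run",
                           host="db.example.com", user="sbtest",
                           password="secret", database="sbtest")
print(shlex.join(command))
```

The usual sequence is a prepare stage to create the test tables, one or more run stages, and a cleanup stage. The printed command line can be pasted into a shell or passed to subprocess.run as a list.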

 

 

 

Test results

SysBench read results (higher is better)

 

 

 

SysBench write results (higher is better)

 

 

 

SysBench 95th percentile latency results (lower is better)
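For readers unfamiliar with the metric, the sketch below shows what a 95th percentile latency means, using the simple nearest-rank definition. SysBench itself estimates percentiles from a latency histogram, so this is an approximation of the idea rather than its exact method, and the sample values are made up.

```python
import math


def percentile(samples, percent):
    """Nearest-rank percentile: the smallest sample such that at least
    `percent` percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14]  # made-up samples
print(percentile(latencies_ms, 95))  # 90
print(percentile(latencies_ms, 50))  # 13
```

A single slow request barely moves the median but dominates the 95th percentile, which is why this metric is reported separately from throughput.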

 

 

 

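For both POLARDB and RDS, the higher the configuration and the more instances there are, the better the performance. Everything tested here is entry-level, so the results are not monster-class.

Even so, we can still see that POLARDB dominates. At the same configuration POLARDB delivers nearly double the performance of RDS, and it practically crushes the self-managed ECS setup with my "feeble" parameter tuning.

It is worth mentioning that in neither the read nor the write scenario did I seem to measure a performance difference between read/write splitting and a single primary instance. If there is a problem with my test method, corrections are welcome.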

 

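Side-by-side "cloud review"

In the tech product world there is a term called "cloud review": an editor has never actually touched a product but still manages to produce a straight-faced side-by-side review. This time I will do a cloud review of my own~

The MySQL-compatible products currently on the Alibaba Cloud platform are: cloud database for MySQL (RDS), Distributed Relational Database Service (DRDS), cloud database POLARDB, and HybridDB for MySQL (formerly PetaData). So people will ask: with four MySQL products, how do you choose and how do you use them?

First, let us look at a cloud migration guide chart for the Alibaba Cloud database family with POLARDB added: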

 

 

 

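Broadly, we can see that HybridDB for MySQL (formerly PetaData) is dedicated to HTAP scenarios. If you have strong OLAP needs you should still consider HybridDB, although for the sake of OLAP it gives up some MySQL features. For example, it does not support constraints and some advanced features.

The remaining three databases all serve common MySQL OLTP scenarios and can all add read-only instances, so they are heavily homogeneous.

Cloud database for MySQL remains the simplest MySQL and currently supports the most features, but its performance falls somewhat short, and all upgrade operations take hours, so the experience is relatively poor. In future I think it is better suited to entry-level use with small database specifications.

DRDS and POLARDB both have distributed properties: DRDS is distributed at the instance level, while POLARDB uses shared distributed storage.

DRDS pools the performance of multiple RDS instances to raise performance and avoid the bottleneck of a single database, which suits very large-scale workloads. However, operations such as splitting databases and tables (sharding) demand a lot from operators, who need a solid database background. The database has to be changed, so there is a learning cost for users. DRDS itself is a middleware product (even though it has been placed in the database category) and still relies on RDS instances underneath, so DRDS cannot do without RDS. For large-scale storage, I therefore think DRDS also struggles beyond 2 TB. Its highlight remains its excellent distributed performance.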
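To give a feel for what splitting databases and tables involves, here is a minimal sketch of hash-based shard routing. The shard names and the choice of CRC32 are assumptions for illustration only; this is not DRDS's actual routing algorithm.

```python
import zlib


def shard_for(key, shard_count):
    """Pick a shard by hashing the sharding key (CRC32 is stable across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


# Hypothetical layout: four backend instances, each holding part of the data.
SHARDS = ["rds-0", "rds-1", "rds-2", "rds-3"]

for user_id in (1001, 1002, 1003):
    print(user_id, "->", SHARDS[shard_for(user_id, len(SHARDS))])
```

Every query has to carry the sharding key so that it can be routed to one instance, and queries that span shards become more expensive. This is the kind of schema and application change that creates the learning cost described above.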

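POLARDB is an entirely new type of product with kernel-level modifications to MySQL. Its read capability is greatly improved and its performance is excellent, but writes still depend on the primary instance, whereas DRDS's distribution can raise write capability. If POLARDB completes Multi-Master cluster support, its write capability will improve greatly as well. POLARDB also requires no changes to the database, so users can switch painlessly, and its advanced distributed storage gives it storage capacity at the hundred-terabyte level. So the future competition between POLARDB and DRDS will be well worth watching.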

 

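Outlook

POLARDB's features greatly improve MySQL performance. POLARDB embodies a distributed design philosophy, and its POLARSTORE architecture can in fact be extended to other open-source databases.

In the future, POLARDB for PostgreSQL, POLARDB for MongoDB and POLARDB for PPAS are all well worth looking forward to, and a higher-performance PG and MongoDB are appealing prospects.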


Origin www.cnblogs.com/zhangfengshi/p/11589982.html