Distributed HTAP database PetaData: the new changes it brings to the big data industry

I. Introduction

In an era when big data drives industry development, large enterprise applications often combine several database products to support online transactions, report generation, log storage, offline analysis, and so on, in order to fuel rapid business growth. This combined approach, however, requires fine-grained control of data flow and consistency between the different products, which makes it very difficult to use. Data synchronization and redundancy between the products also incur high costs, which further hold back the development of enterprise applications.

In recent years, Gartner proposed the concept of the HTAP (Hybrid Transaction/Analytical Processing) database: a single database that supports both OLTP (online transaction processing) and OLAP (online analytical processing), covers the needs of most enterprise applications, and solves these problems in one place. Database cloud service providers have followed suit one after another, and enterprise application cases have sprung up.

What architectural innovations has the HTAP database made, and which key problems has it solved? How much application difficulty and cost can it remove for enterprise applications? This article reveals the new changes that HTAP databases bring to the big data industry.

II. OLTP + OLAP vs. HTAP

Online transactions and data analysis, the two core business scenarios of enterprise applications, are typical OLTP and OLAP workloads, respectively. Online transactions place strict requirements on the database's ACID properties and emphasize low latency and high concurrency. Data analysis has modest requirements on concurrency and latency, but emphasizes the database's algorithm support, capacity, and computing power. At different growth stages of an enterprise application, the technologies chosen for these two types of business differ greatly:

  1. Small application stage: to save costs, enterprises run both types of business in the same OLTP database, which works well while the data volume is small;

  2. Medium application stage: as the data grows, resource contention appears. Analytical workloads consume a large share of the database's CPU and I/O, which increases transaction latency, so in the end neither workload is served well. At this stage, enterprises adopt read-write separation and time-sharing: one primary database serves transactions, several read replicas serve analysis, and online and offline workloads run in different time windows;

  3. Large application stage: the data grows further, a single primary can no longer meet transaction demand, and the read replicas cannot keep up with increasingly complex analytical SQL. At this stage, enterprises adopt database and table sharding together with an analytical database: sharding middleware splits the primary transaction database to scale transaction performance horizontally, while the data is synchronized to an OLAP database for analysis, achieving thorough resource isolation;

  4. Giant application stage: the data grows yet again. Every expansion of the OLTP database consumes substantial manpower and resources, synchronizing data to the OLAP database is slow and expensive, and users must pick a different database entry point for each service, which makes management highly complex. At this stage, enterprises can choose an HTAP database to further improve the business architecture, reduce costs, improve ease of use, and improve the operations and maintenance experience.
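To make stages 2 and 3 concrete, the following is a minimal sketch of the routing logic that read-write separation and sharding middleware perform on the application side. The class name, the CRC32-based shard hash, and the node names are hypothetical illustrations; real sharding middleware also handles transactions, failover, and cross-shard queries.

```python
import zlib


class ShardRouter:
    """Hypothetical client-side router: sharding (stage 3) combined with
    read-write separation (stage 2). Each shard has one primary and
    zero or more read replicas."""

    def __init__(self, shards):
        # shards: list of dicts like {"primary": "db0-primary", "replicas": ["db0-r1"]}
        self.shards = shards
        self._rr = 0  # round-robin counter for spreading reads across replicas

    def shard_for(self, key: str) -> int:
        # Stable hash so the same key always lands on the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def route(self, key: str, is_write: bool) -> str:
        shard = self.shards[self.shard_for(key)]
        if is_write or not shard["replicas"]:
            return shard["primary"]  # transactional writes always go to the primary
        # Analytical reads are spread over the replicas round-robin.
        replica = shard["replicas"][self._rr % len(shard["replicas"])]
        self._rr += 1
        return replica


router = ShardRouter(
    [{"primary": f"db{i}-primary", "replicas": [f"db{i}-r1", f"db{i}-r2"]} for i in range(4)]
)
print(router.route("order:1001", is_write=True))
```

The application (or its middleware) has to carry this topology knowledge itself, which is exactly the kind of complexity a single unified entry point is meant to hide.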

Figure 1. Architecture evolution of enterprise-level applications


A careful analysis of these stages shows that an HTAP database cloud service spares enterprises the trouble of technology selection:

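  1. Regardless of business scale, enterprises use the HTAP database in the same way as in the small application stage, with no need to change their habits;

  2. As the business grows, enterprises can add more compute and storage resources to the HTAP database to increase its capacity and keep pace with the business, without paying extra costs at each stage;

  3. Enterprises do not need to take care of database operations and maintenance, which further reduces labor costs.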

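HybridDB for MySQL from Alibaba Cloud is one such HTAP database cloud service. It is compatible with MySQL's protocol, syntax, and ecosystem, so users do not need to change their habits. It uses a fully in-house architecture that separates the access layer, storage, and compute, which lets it meet the needs of enterprise applications at different business scales and grow along with them.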

III. Architectural advantages of the HTAP database

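Alibaba Cloud HybridDB for MySQL is an HTAP database cloud service with a loosely coupled distributed architecture. Its core technical architecture is shown below: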

Figure 2. Core architecture of Alibaba Cloud HybridDB for MySQL

1. Data partitioning

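HybridDB for MySQL uses a data-partitioned architecture in which the partitions share nothing, which enables linear scale-out. The access layer, storage, and compute are separated, so the database's overall hardware resources are used efficiently and the overall cost is lower.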

Figure 3. How data partitioning works in Alibaba Cloud HybridDB for MySQL

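The partitioned architecture makes scaling out simpler: adding or removing nodes only moves part of the data and does not affect business use. A unified access entry point does not change users' habits, a single copy of storage adds no extra cost, and independent compute resources adapt fully to the computing needs of different workloads.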
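One common way to obtain the "only part of the data moves" property is to hash rows into a fixed number of partitions and move whole partitions between nodes. The sketch below illustrates that general idea; the partition count of 16, the modulo key mapping, and the "take from the most loaded node" rebalancing rule are assumptions for illustration, not HybridDB for MySQL's documented algorithm.

```python
from collections import defaultdict

NUM_PARTITIONS = 16  # hypothetical fixed number of partitions


def partition_of(key: int) -> int:
    # A row's partition depends only on its key, never on the node count.
    return key % NUM_PARTITIONS


def assign(nodes):
    # Initial round-robin placement of partitions onto nodes.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment, new_node):
    """Give the new node its fair share of partitions by taking them from
    the most loaded nodes. Returns (new_assignment, moved_partitions)."""
    node_count = len(set(assignment.values())) + 1
    target = NUM_PARTITIONS // node_count
    owned = defaultdict(list)
    for p, node in sorted(assignment.items()):
        owned[node].append(p)
    new_assignment = dict(assignment)
    moved = []
    while len(moved) < target:
        donor = max(owned, key=lambda n: len(owned[n]))
        p = owned[donor].pop()
        new_assignment[p] = new_node
        moved.append(p)
    return new_assignment, moved


before = assign(["n0", "n1", "n2", "n3"])
after, moved = add_node(before, "n4")
print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved")  # 3 of 16 partitions moved
```

Because the key-to-partition mapping is fixed, only the rows in the moved partitions change nodes. By contrast, placing rows with `hash(key) % node_count` would remap most keys whenever the node count changes.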

2. Unified database cloud service

As a database cloud service, HybridDB for MySQL is aligned with RDS for MySQL. The solutions compare as follows:

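| | HybridDB for MySQL | RDS for MySQL | OLTP+OLAP hybrid solution |
|---|---|---|---|
| Access entry | Unified entry | Unified entry | Multiple entries |
| ACID transactions | Global ACID | Global ACID | Per-component ACID |
| SQL compatibility | Globally consistent | Globally consistent | Varies between components |
| Data latency | - | - | Synchronization delay |
| Stability | Unified stability guarantee | Unified stability guarantee | Varies between components |
| Performance scaling | Linear scale-out | No linear scale-out | Linear scale-out |
| Compute feature extensions | Multiple extensions | Not supported | Multiple extensions |
| Storage cost | One copy | One copy | Multiple copies |
| Compute cost | One copy | One copy | Multiple copies |
| Heterogeneous data sync cost | - | - | Relatively high |
| Backup and recovery | Supported | Supported | Partially, per component |
| Monitoring | Supported | Supported | Partially, per component |

Table 1. Comparison of Alibaba Cloud HybridDB for MySQL with other database services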

3. High availability

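HybridDB for MySQL is designed for high availability along the entire path. The access engine and the compute engine are stateless, so adding replicas of them increases availability. The storage engine uses one primary and multiple standbys with semi-synchronous replication. The database also supports real-time backup and restoration from backup sets.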
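The commit path of a one-primary, multi-standby semi-synchronous setup can be sketched as a toy in-memory model. The class names and the `min_acks` parameter are hypothetical: the primary writes locally, ships the entry to its standbys, and reports a semi-synchronous commit once at least `min_acks` standbys acknowledge. The fallback branch mirrors MySQL's semi-synchronous replication, which degrades to asynchronous replication when no acknowledgment arrives in time; network timeouts themselves are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    name: str
    online: bool = True
    log: list = field(default_factory=list)

    def receive(self, entry) -> bool:
        # An online standby persists the entry and acknowledges it.
        if not self.online:
            return False
        self.log.append(entry)
        return True


@dataclass
class Primary:
    standbys: list
    min_acks: int = 1  # semi-sync: wait for at least this many standby ACKs
    log: list = field(default_factory=list)

    def commit(self, entry) -> str:
        self.log.append(entry)  # the primary writes the entry locally first
        acks = sum(1 for s in self.standbys if s.receive(entry))
        # Enough ACKs: the entry now exists on the primary and on >= min_acks standbys.
        # Otherwise degrade to asynchronous replication instead of blocking forever.
        return "semi-sync" if acks >= self.min_acks else "async-fallback"


s1, s2 = Standby("s1"), Standby("s2", online=False)
primary = Primary([s1, s2])
print(primary.commit("tx1"))  # semi-sync
```

Because a semi-synchronous commit guarantees that at least one standby holds the data, a standby can take over after a primary failure without losing those acknowledged transactions.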

Figure 4. High-availability architecture of Alibaba Cloud HybridDB for MySQL

IV. Application scenarios

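HTAP databases are commonly used for mixed workloads and are known for their all-round capabilities; they can replace most architectures that combine separate OLTP and OLAP databases. Typical application scenarios are described below.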

1. Sharding + real-time analysis

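The most typical workloads of enterprise applications are online transactions and data analysis, and an HTAP database brings additional benefits for both: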

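  1. Online transactions usually run on standalone databases plus sharding middleware. The horizontally partitioned architecture of an HTAP database naturally covers the scenarios served by sharding middleware, so enterprise users no longer need to worry about operating the underlying standalone databases;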

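  2. Data analysis usually relies on data synchronization plus a big data processing platform. An HTAP database can analyze the data directly without affecting online business, which is a major advantage in timeliness and cost.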
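Analyzing partitioned data in place usually follows a scatter-gather pattern: each partition computes a partial aggregate and a coordinator merges the results. The sketch below evaluates the equivalent of `SELECT region, SUM(amount), COUNT(*), AVG(amount) FROM orders GROUP BY region` over three in-memory partitions. The `orders` table, its data, and the two-phase split are illustrative assumptions, not a description of HybridDB for MySQL's query executor.

```python
# Hypothetical data: each partition holds (region, amount) order rows.
partitions = [
    [("east", 120.0), ("west", 80.0)],
    [("east", 40.0), ("north", 60.0)],
    [("west", 20.0), ("east", 30.0)],
]


def local_agg(rows):
    # Runs on each partition: partial SUM and COUNT per group.
    partial = {}
    for region, amount in rows:
        s, c = partial.get(region, (0.0, 0))
        partial[region] = (s + amount, c + 1)
    return partial


def merge(partials):
    # Runs on the coordinator: combine partials, then finish AVG = SUM / COUNT.
    total = {}
    for partial in partials:
        for region, (s, c) in partial.items():
            ts, tc = total.get(region, (0.0, 0))
            total[region] = (ts + s, tc + c)
    return {region: (s, c, s / c) for region, (s, c) in total.items()}


result = merge(local_agg(rows) for rows in partitions)
print(result["west"])  # (100.0, 2, 50.0)
```

AVG is carried as a (SUM, COUNT) pair and divided only at the end: averaging the per-partition averages would give the wrong answer whenever partitions hold different numbers of rows for a group.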


Figure 5. Using an HTAP database for sharding + real-time analysis

2. Real-time IoT data processing

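IoT big data applications involve massive volumes of sensor data and very intensive real-time update and query demands, which place high performance requirements on the database. With an HTAP database, they get the read/write performance of a KV database, the capacity of a NoSQL database, the multi-dimensional query capability of an OLTP relational database, and the complex analysis capability of an OLAP database.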
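The access pattern described above, fast point writes and reads keyed by device, combined with time-range analysis, can be sketched with an ordered index per device. The `SensorStore` class and its methods are hypothetical and everything lives in memory; a real HTAP database would keep such data in partitioned, indexed tables rather than Python lists.

```python
import bisect
from collections import defaultdict


class SensorStore:
    """Hypothetical in-memory store keyed by (device_id, timestamp)."""

    def __init__(self):
        # Per device: parallel lists of timestamps (kept sorted) and values.
        self._ts = defaultdict(list)
        self._vals = defaultdict(list)

    def put(self, device: str, ts: int, value: float) -> None:
        # KV-style write; binary search keeps timestamps ordered even for late data.
        ts_list, vals = self._ts[device], self._vals[device]
        i = bisect.bisect_right(ts_list, ts)
        ts_list.insert(i, ts)
        vals.insert(i, value)

    def latest(self, device: str):
        # Point lookup of the most recent reading, or None for an unknown device.
        ts_list = self._ts.get(device)
        if not ts_list:
            return None
        return ts_list[-1], self._vals[device][-1]

    def range_avg(self, device: str, start: int, end: int):
        # Range scan over [start, end) followed by an analytical aggregate.
        ts_list = self._ts.get(device, [])
        lo = bisect.bisect_left(ts_list, start)
        hi = bisect.bisect_left(ts_list, end)
        window = self._vals.get(device, [])[lo:hi]
        return sum(window) / len(window) if window else None


store = SensorStore()
store.put("d1", 100, 20.0)
store.put("d1", 300, 24.0)
store.put("d1", 200, 22.0)  # arrives out of order
print(store.latest("d1"))  # (300, 24.0)
```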

Figure 6. Using an HTAP database for IoT workloads

3. Real-time data warehouse

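A data warehouse usually only allows imports and is otherwise read-only, with no real-time updates. The usage pattern is to import a complete batch of data into the warehouse and then use its compute and storage capacity for calculations along various dimensions. Put simply, the data in a warehouse is usually "second-hand data" generated from the "first-hand data" in relational databases, and data entering the warehouse is aligned on transaction boundaries.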

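For some big data workloads with extremely strict timeliness requirements, Hadoop + MapReduce and even Spark cannot deliver low-latency big data services. In such cases an HTAP database is a good choice: it supports batch import of raw data for real-time aggregation and analysis, and it can also synchronize results from the big data processing platform in real time, acting as a high-performance cache and secondary data warehouse that improves the overall responsiveness of enterprise applications. In addition, an HTAP database can generate real-time reports directly, which further broadens its use in big data workloads.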
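The difference between batch-loaded warehouses and real-time aggregation can be illustrated with an incrementally maintained aggregate: instead of recomputing a report after each batch import, running totals are updated as each row arrives, so a report query is just a lookup. This is a generic technique shown under hypothetical names (`RealtimeReport`, per-day sales totals); the source does not describe whether or how HybridDB for MySQL maintains such aggregates internally.

```python
from collections import defaultdict


class RealtimeReport:
    """Hypothetical incrementally maintained report: total sales per day."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def ingest(self, day: str, amount: float) -> None:
        # Update the aggregate at write time; no batch recomputation is needed.
        self.totals[day] += amount
        self.counts[day] += 1

    def ingest_batch(self, rows) -> None:
        # A batch import of raw data reuses the same incremental path.
        for day, amount in rows:
            self.ingest(day, amount)

    def report(self, day: str):
        # Reading the report is a constant-time lookup of the current totals.
        return self.totals.get(day, 0.0), self.counts.get(day, 0)


report = RealtimeReport()
report.ingest_batch([("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)])
print(report.report("2024-01-01"))  # (15.0, 2)
```

The trade-off is that every write does a little extra work, in exchange for reports that reflect new data immediately rather than after the next batch job.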

Figure 7. Using an HTAP database as a real-time data warehouse

V. Afterword

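As business grows explosively, more and more enterprises need heavyweight database products and better services to keep their technical architecture from becoming a bottleneck, freeing them to focus on their core business.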

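HybridDB for MySQL, an innovative HTAP database product developed entirely in-house by Alibaba Cloud, closely follows the needs of enterprise users. It gives enterprise applications a new option and reflects Alibaba Cloud's technical strength and commitment to in-house development in the database industry. HybridDB for MySQL will bring users a better database service experience.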

Original article:

http://click.aliyun.com/m/27304/   
