ApsaraDB for HBase | Overview of the data storage and analysis platform

1. Introduction

As of 2019, databases have reached a new inflection point, with three clear trends:

  1. More and more databases are becoming CloudNative, leveraging new hardware and the inherent advantages of the cloud. In China, Alibaba Cloud's ApsaraDB HBase and POLARDB are representative examples. This article will touch on this trend, but it is not the focus.

  2. NoSQL is solving problems in the BigData field. According to Forrester's NoSQL report, BigData NoSQL provides storage, compute processing, horizontal scalability, schemalessness, and a flexible data model. It specifically calls out the need to support complex computation, which is generally delivered by integrating Spark or by implementing a dedicated compute engine. DataStax, the company commercializing Cassandra, offers products that integrate Spark directly on top of Cassandra, and the slogan on ScyllaDB's homepage is "The Real-Time Big Data Database". Big data's 5V characteristics are: Volume, large amounts of data to collect, store, and compute; Variety, diverse types and sources, including structured, semi-structured, and unstructured data; Value, low data-value density, i.e. gold hidden in the sand; Velocity, fast data growth and fast processing with strict timeliness requirements; and Veracity, accuracy and reliability, i.e. data quality must be high. A BigData NoSQL database can satisfy the 5V characteristics well, while also supporting real-time ingestion, analysis, and presentation.

  3. More and more companies and products are integrating multiple capabilities. Strapdata combines Cassandra with Elasticsearch; DataStax integrates Spark directly on Cassandra; SQL Server has also integrated Spark as native Spark to extend the database's compute capabilities.

After two years of development on the public cloud (HBase itself has been developed inside Alibaba for almost 9 years), Alibaba Cloud HBase integrates open-source projects such as Apache HBase, Apache Phoenix, Apache Spark, and Apache Solr, plus a series of self-developed features, forming an integrated data-processing platform with one-stop capabilities. The basic architecture is as follows:

image

We stand on the shoulders of the Apache giants. We have self-developed the ApsaraDB Filesystem, HBase cold/hot separation, SearchIndex, SparkOnX, BDS, and other modules; contributed kernel patches for HBase, Phoenix, Spark, and others back to the community; and built and maintain a series of platform capabilities such as model services and the Data Workbench. The self-developed parts are the core competitiveness of our platform; every layer and every component is carefully built to serve customers' data-driven businesses. To lower the barrier to entry, we provide demos on GitHub at aliyun-apsaradb-hbase-demo:

https://github.com/aliyun/aliyun-apsaradb-hbase-demo

Attention and code contributions are welcome. Below, the author introduces each layer in turn, aiming to be simple and accessible; the article contains many links for further reading.


2. Business perspective and data flow

As a storage and computing platform, its value lies in meeting different business needs; see the figure below.
The figure shows the sources and channels through which data reaches the ApsaraDB HBase platform, where the platform's Spark engine mines it for value that is fed back to the business system. It resembles a circulatory system, vividly described inside Alibaba as "turn business into data, then turn data into business".

image

Combining the architecture diagram and the business diagram: the platform integrates storage (both real-time and offline), computing, and retrieval technologies. The entire system is built on top of the unified ApsaraDB Filesystem file layer. Retrieval is wrapped via Phoenix's SearchIndex to improve ease of use, and domain engines are built to meet domain-specific needs. The built-in BDS (data channel) archives data to columnar storage in real time, from which the Spark engine mines value.

For details, see "Top Ten Reasons to Choose ApsaraDB for HBase":

https://yq.aliyun.com/articles/699489


3. Unified file access layer

The ApsaraDB Filesystem (ADB FS for short) builds the file-layer foundation of the ApsaraDB HBase ecosystem on the Hadoop FileSystem API. It gives the HBase ecosystem transparent hybrid-storage capability, greatly simplifying the complexity of connecting the HBase ecosystem to the cloud's multiple storage forms. It supports OSS, Alibaba Cloud HDFS, HDFS built on cloud disks or local disks, and systems built on shared cloud disks. Each distributed file system has different hardware, cost, latency, and throughput (not expanded here), and the layer can keep growing simply by adding another xxxFileSystem implementation. A FS implemented directly on OSS cannot provide atomic metadata management, so our scheme stores metadata on an HDFS NameNode while the actual data lives on OSS; a rename then only moves metadata and is therefore very lightweight.
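As a rough illustration of why rename becomes lightweight under this split, the sketch below models the idea in plain Python, with one dict standing in for the HDFS NameNode and another for OSS (all names are hypothetical, not the real ADB FS API):

```python
# Toy model of the ADB FS split: metadata lives in a NameNode-like map,
# payload bytes live in an OSS-like object store under immutable keys.
class ToyAdbFs:
    def __init__(self):
        self.meta = {}     # path -> OSS object key (NameNode role)
        self.objects = {}  # OSS object key -> bytes (OSS role)

    def create(self, path, data):
        key = f"obj-{len(self.objects)}"  # immutable object key
        self.objects[key] = data
        self.meta[path] = key

    def rename(self, src, dst):
        # Only the metadata entry moves; no bytes are copied on "OSS".
        self.meta[dst] = self.meta.pop(src)

    def read(self, path):
        return self.objects[self.meta[path]]

fs = ToyAdbFs()
fs.create("/hbase/t1/hfile0", b"cells")
fs.rename("/hbase/t1/hfile0", "/hbase/t1/archive/hfile0")
print(fs.read("/hbase/t1/archive/hfile0"))  # b'cells', same object key
```

The same trick is why HBase compaction and snapshot operations, which rename files heavily, stay cheap even though the data itself sits in an object store without atomic rename.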

image


4. HBase KV layer

HBase is the Hadoop community's open-source implementation of Bigtable, providing features such as sparse wide tables, TTL, and dynamic columns. HBase has been developed inside Alibaba for 9 years, and the team now includes several PMC members and Committers; it is fair to say Alibaba's influence on HBase is among the strongest in China, and many community patches were contributed by Alibaba. In 2018, ApsaraDB HBase was the first to commercialize HBase 2.0 and contributed dozens of bug fixes to the community. Many customers use the HBase API directly to meet business needs, while many others use the Phoenix NewSQL layer, which improves ease of use and provides many useful features. At the HBase level, besides fixing community bugs, several larger features have also been built.
Compared with relational databases, HBase also has natural advantages. See the article "Comparing MySQL to understand HBase's capabilities and usage scenarios":

https://yq.aliyun.com/articles/702323

  • Cold/hot separation
    Cold/hot separation can reduce storage costs by about 66% and is widely used in scenarios such as Internet of Vehicles and cold logs. Cold data is stored on OSS, and users can still access it through the HBase API. The basic principle: HLogs are stored on HDFS, while cold HFiles are stored on OSS.

image
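The routing rule behind cold/hot separation can be sketched in a few lines (plain Python; the boundary value is illustrative, and the real product makes this configurable):

```python
# Toy routing rule for cold/hot separation: HLogs always go to HDFS,
# HFiles go to OSS once their data is older than the cold boundary.
COLD_BOUNDARY_DAYS = 30  # illustrative; the real boundary is user-configured

def storage_for(file_kind, age_days):
    if file_kind == "hlog":
        return "hdfs"  # write-ahead logs stay on low-latency HDFS
    if file_kind == "hfile":
        return "oss" if age_days > COLD_BOUNDARY_DAYS else "hdfs"
    raise ValueError(file_kind)

print(storage_for("hlog", 90))   # hdfs
print(storage_for("hfile", 90))  # oss
print(storage_for("hfile", 3))   # hdfs
```

Because both tiers sit behind the unified ADB FS layer described earlier, the HBase API stays unchanged no matter which tier a file lands on.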

  • GC optimization
    GC has always been a hot topic for Java applications. In a large online storage system like HBase, GC pause latency under a large heap (100 GB) has become a major pain point for both kernel and application developers. The platform implemented the new CCSMap in-memory storage structure and, combined with off-heap optimization and the new ZenGC, reduced young-GC time from 120 ms to 15 ms in production, and further to about 5 ms in the lab. See the article "How to cut Java garbage collection time by 90%? Alibaba HBase's GC optimization practice":

    https://yq.aliyun.com/articles/618575


5. Retrieval layer

HBase is built on an LSM tree; it excels at prefix matching and range lookups, and its data model is row-oriented, so large range scans put heavy load on the system. User requirements, however, are varied and constantly changing. For high-TPS, high-concurrency scenarios with relatively fixed and simple queries, HBase works very well. One step up in complexity, when a user has a fixed set of query-condition combinations on the same table, secondary indexes can solve the problem, but they suffer from write amplification, so the number of indexes should stay small (generally no more than 10 is recommended). For still more complex query patterns, such as free-form condition combinations, fuzzy matching, and full-text search, current indexing techniques are insufficient and a new solution is needed. The natural candidates are search engines such as Lucene, Solr, and Elasticsearch, which are designed precisely for complex query scenarios. To handle them, search engines employ many indexing techniques very different from an LSM tree: inverted indexes, tokenization, BKD trees for numeric indexing, roaring bitmaps for combined indexes, DocValues for enhanced aggregation and sorting, and so on. Using search-engine technology to strengthen HBase's query capability is a direction well worth exploring in depth.
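The inverted index is the key structure that makes free-form condition combinations cheap for a search engine where an LSM row store struggles: an arbitrary AND of conditions becomes a set intersection instead of a table scan. A minimal sketch (plain Python, illustrative only):

```python
from collections import defaultdict

# Toy inverted index: (field, value) -> set of row keys.
def build_index(rows):
    index = defaultdict(set)
    for rowkey, doc in rows.items():
        for field, value in doc.items():
            index[(field, value)].add(rowkey)
    return index

rows = {
    "r1": {"city": "hangzhou", "vip": "yes"},
    "r2": {"city": "beijing",  "vip": "yes"},
    "r3": {"city": "hangzhou", "vip": "no"},
}
idx = build_index(rows)

# Free-form condition combination: city=hangzhou AND vip=yes,
# answered by intersecting two posting sets, no scan of the table.
hit = idx[("city", "hangzhou")] & idx[("vip", "yes")]
print(sorted(hit))  # ['r1']
```

Real engines compress these posting sets (e.g. with roaring bitmaps) and add tokenization, BKD trees, and DocValues on top, but the intersection idea is the same.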

Today, to get complex queries, users must buy a separate search cluster and import their data into the new search service. This approach has all sorts of problems: maintenance cost is high, since the user must purchase an online database, an analytical database, and a data transfer service; the learning curve is steep, since at least the three services above must all be mastered; real-time behavior cannot be guaranteed, because the ingestion rates of the online store and the search store do not match; data is stored redundantly, since all index data and result data involved must be imported; and data consistency is hard to guarantee, with out-of-order data being very common, especially for distributed online databases. ApsaraDB HBase introduces Solr and, through a series of product and kernel work, turns it into a unified product experience that solves all of the problems above in one stroke. Users can enable the search service with one click in the console; see the article "ApsaraDB HBase releases a full-text index service to easily handle complex queries":

https://yq.aliyun.com/articles/690018

image

The architecture of the search service is shown above. At the bottom is the unified abstraction of the distributed file system; both HBase data and Solr data are stored there. At the top is the distributed coordination service ZooKeeper, on which HBase, the Indexer, and Solr all build their distributed features. The Indexer implements bulk import of existing HBase data, with a distributed job mechanism tailored to batch loading. The Indexer also implements asynchronous synchronization of real-time data: leveraging HBase's background replication mechanism, the Indexer acts as a "fake HBase" peer, and upon receiving HBase data it converts it into Solr documents and writes them to Solr. Because HBase can ingest faster than Solr, we designed and implemented a backpressure mechanism that keeps the data lag in Solr within a user-configured time bound; the same mechanism also prevents HLogs from piling up due to slow consumption. Real-time synchronization and bulk import can run simultaneously, and order-preserving timestamps guarantee eventual consistency. To improve usability, we also built a SQL wrapper for the search service on top of Phoenix, along with a series of storage and query optimizations; that part is covered in the next section.
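The backpressure rule can be sketched simply: while Solr's indexing lag stays within the user-set bound the replication shipper runs at full speed, and beyond the bound it backs off so Solr can catch up and HLogs do not accumulate. A toy version of the decision (plain Python; the constants and the proportional back-off are illustrative, not the production formula):

```python
# Toy backpressure rule: throttle the HLog shipper in proportion to how
# far Solr's indexing lag exceeds the user-configured bound.
MAX_LAG_SECONDS = 10  # illustrative user setting

def shipper_delay(hbase_write_ts, solr_indexed_ts):
    lag = hbase_write_ts - solr_indexed_ts
    if lag <= MAX_LAG_SECONDS:
        return 0.0  # within bound: no throttling
    # Back off proportionally, capped at a full pause unit.
    return min(1.0, (lag - MAX_LAG_SECONDS) / MAX_LAG_SECONDS)

print(shipper_delay(100, 95))  # 0.0  (lag 5 s, within bound)
print(shipper_delay(100, 85))  # 0.5  (lag 15 s, throttle)
print(shipper_delay(100, 50))  # 1.0  (capped back-off)
```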


6. NewSQL: Phoenix

Phoenix is the SQL layer on top of HBase; it takes the HBase platform straight from NoSQL to NewSQL. On top of HBase it adds schemas, secondary indexes, views, bulk loading (large-scale offline data loading), atomic upsert, salted tables, dynamic columns, skip scan, and more. Our largest cloud customer currently stores around 200 TB, and more than 50% of customers have enabled the Phoenix SQL service. We have fixed dozens of community bugs and contributed quite a few new features; the team includes one Committer and several contributors. In 2018, on the basis of thorough testing, we commercialized Phoenix 5.0 ahead of the community and added QueryServer support for lightweight JDBC access. The community's 5.0.1 release will also be driven by us.
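Of the features listed, salted tables are easy to show concretely: Phoenix prepends a salt byte derived from a hash of the rowkey so that monotonically increasing keys spread across region servers instead of hot-spotting one. A simplified sketch of the idea (plain Python; Phoenix's actual hash function differs, and the bucket count corresponds to the table's SALT_BUCKETS setting):

```python
import hashlib

SALT_BUCKETS = 4  # corresponds to CREATE TABLE ... SALT_BUCKETS = 4

def salted_key(rowkey: bytes) -> bytes:
    # One leading salt byte chosen by hashing the rowkey, so equal keys
    # always land in the same bucket but sequential keys spread out.
    salt = hashlib.md5(rowkey).digest()[0] % SALT_BUCKETS
    return bytes([salt]) + rowkey

keys = [salted_key(f"event-{i:08d}".encode()) for i in range(1000)]
buckets = {k[0] for k in keys}
print(sorted(buckets))  # sequential keys now span multiple buckets
```

The trade-off is that a range scan must now fan out to every bucket, which is why salting is applied per table rather than globally.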

We have made a series of stability and performance improvements to Phoenix itself, mainly: an optimized client-side MetaCache mechanism, improving simple queries over large datasets by an order of magnitude; lookup-join optimization for index-table-to-main-table lookups, a 5-7x speedup; batch-commit optimization in the thin client, a 2-3x speedup; fixing Phoenix time-zone issues, improving usability and reducing the chance of data-consistency problems; disabling risky features such as DESC and full-table scans; implementing Bulkload for large-scale data import; and more. These stability and performance improvements have been very well received by users.

image

Phoenix's current basic architecture is shown in the figure. We extended Phoenix to support dual engines, HBase and Solr, so users can manage and query both HBase and Solr data with SQL, greatly improving the system's usability. The synchronization mechanism between Solr and HBase was covered in the previous section. For complex queries we designed and implemented a new index type, the Search Index. It is used much like Phoenix's Global Index; the main difference is that a Search Index stores its index data in Solr, while a Global Index stores it in a separate HBase table. The lifecycle, data synchronization, and state of a Search Index are managed directly through SQL, data field types are mapped automatically, and complex queries are expressed in SQL, dramatically lowering the barrier to use. A Search Index can also be optimized jointly around the characteristics of HBase and Solr: since the base table can be queried efficiently by RowKey in HBase, Solr only needs to store index data for the fields used as query conditions; the original values of those fields need not be stored in Solr, and non-query fields need not be stored in Solr at all. Compared with buying a standalone search product and synchronizing data into it, a Search Index can greatly reduce storage. In addition, when optimizing execution plans, Phoenix can dynamically choose the best index scheme based on index characteristics.
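The storage saving comes from projecting each row down to just the rowkey plus the condition fields before it is sent to Solr; everything else stays only in HBase. A sketch of that projection (plain Python; the index definition and field names are illustrative):

```python
# Only fields declared as search conditions are indexed in Solr; the
# original values come back from HBase via the rowkey, so non-query
# fields never leave HBase at all.
SEARCH_INDEX_FIELDS = {"city", "age"}  # illustrative index definition

def solr_document(rowkey, hbase_row):
    doc = {"rowkey": rowkey}
    doc.update({f: v for f, v in hbase_row.items() if f in SEARCH_INDEX_FIELDS})
    return doc

row = {"city": "hangzhou", "age": 30, "avatar": "<2MB blob>", "bio": "..."}
print(solr_document("user#1", row))
# {'rowkey': 'user#1', 'city': 'hangzhou', 'age': 30}; the blob stays in HBase
```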

We have also produced a series of articles that many users in China rely on to get started with and learn Phoenix; the series has gained considerable influence in the community. See "Phoenix from beginner to expert":

https://yq.aliyun.com/articles/574090


7. Multi-model layer

Data comes in many types: tables, documents, wide tables, graphs, time series, spatio-temporal data, and so on. On top of ApsaraDB HBase we built the HGraphDB distributed graph layer, the OpenTSDB distributed time-series layer, and the Ganos distributed spatio-temporal layer, serving these three sub-scenarios respectively. Each is a distributed component with PB-scale storage, high-concurrency reads and writes, and unlimited scale-out.

  • HGraphDB
    HGraphDB is a fully self-developed ApsaraDB HBase component. It is built on TinkerPop 3 and supports the full TinkerPop 3 software stack and the Gremlin language. HGraphDB is an OLTP graph database supporting schemas, CRUD on vertices and edges, and graph traversal. Introduction to the HGraphDB graph database:

    https://yq.aliyun.com/articles/684336

  • OpenTSDB
    OpenTSDB is the community's time-series engine built on top of HBase; with HBase as its base, it meets PB-scale time-series storage needs. The team has made many optimizations; to improve stability, time-line compaction optimization is one of the more important ones. See "OpenTSDB time-series engine compaction optimization on ApsaraDB HBase":

    https://yq.aliyun.com/articles/696180

  • Ganos
    Ganos takes its name from the earth goddess Gaea and the god of time Chronos, representing the combination of "space" and "time". Ganos provides enhanced spatial operators, enhanced spatio-temporal indexes, and GeoSQL extensions, and integrates with Spark to support online analysis and management of large-scale remote-sensing spatial data. For details see "Alibaba Cloud spatio-temporal database engine HBase Ganos launches: scenarios, features, and advantages in full":

    https://yq.aliyun.com/articles/680538
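Returning to OpenTSDB above: its core trick on HBase is the rowkey layout, with the metric id first, then a base timestamp rounded down to the hour, then tag ids, so one series' points sort contiguously and a time-range query becomes an HBase range scan. A simplified sketch (plain Python; real OpenTSDB resolves names through UID tables and appends tag ids, omitted here):

```python
import struct

HOUR = 3600

def tsdb_rowkey(metric_id: int, ts: int) -> bytes:
    # Big-endian packing so lexicographic byte order equals time order.
    base = ts - ts % HOUR  # round down to the hour boundary
    return struct.pack(">IQ", metric_id, base)

k1 = tsdb_rowkey(7, 1_000_000)
k2 = tsdb_rowkey(7, 1_000_000 + 2 * HOUR)
print(k1 < k2)  # True: later hours sort later, enabling range scans
```

All points within the same hour share one rowkey and are stored as qualifiers/cells under it, which is also what the time-line compaction optimization mentioned above operates on.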


8. Columnar storage

A unified row-column hybrid HTAP store has long been the grand unification that databases dream of, much as M-theory hopes to unify quantum mechanics with gravity. For now, a single copy of the data can hardly satisfy every need, so the common practice is to keep row-store and column-store data separately: either synchronize the row-store data into an extra columnar copy, or use Raft-style protocol variants to maintain row and column replicas simultaneously.
HBase excels at online query scenarios; the underlying HFile format is essentially row-oriented, so running Spark directly against HBase tables performs only moderately for large range queries (though Spark On HBase has many optimization points of its own). Against this background we built a pipeline that archives HBase's real-time HLog increments into columnar storage, effectively meeting users' needs for analyzing HBase data. Columnar storage compresses better than row storage, so the added storage cost is modest while the analytical capability gained is substantial, a trade-off users accept. Pairing HBase with columnar storage effectively drives the user's business forward: analysis results from the columnar side flow back into HBase to serve the business, letting it iterate rapidly on the HBase platform. Within columnar storage there are also LSM-like delta-plus-full designs, such as Kudu and Delta Lake. ApsaraDB HBase draws on Delta Lake and Parquet technology to provide more efficient integrated analysis.
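The row-to-columnar conversion at the heart of this pipeline is, at its core, a transpose. A sketch in plain Python (Parquet and Delta of course add encoding, compression, and transactional metadata on top):

```python
# Toy row-to-columnar transpose: storage changes from one record per
# entry to one contiguous array per column, which is what makes scans
# over a few columns cheap and compression effective.
def to_columnar(rows):
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

hlog_batch = [
    {"rowkey": "r1", "temp": 21, "city": "hz"},
    {"rowkey": "r2", "temp": 22, "city": "hz"},
    {"rowkey": "r3", "temp": 19, "city": "bj"},
]
cols = to_columnar(hlog_batch)
print(cols["temp"])       # [21, 22, 19]: one scan touches one array
print(sum(cols["temp"]))  # 62: analytics read only the needed column
```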

  • Parquet
    Parquet was inspired by Google's 2010 Dremel paper, which described a storage format supporting nested structures and used columnar layout to improve query performance. Parquet is now the most representative columnar format in the big-data field and is widely used as foundational infrastructure for big-data warehouses.

  • Delta
    Delta was originally a closed-source storage feature from Databricks, the commercial company behind Spark, oriented toward real-time reads and writes; it was recently open-sourced. Its core contribution is solving the data-update problem common in big-data analytics. Concretely, data is written in columnar format to speed up analytical reads, while incremental updates are written row-wise by Delta to support transactions and multi-versioning; the system then continuously merges them in the background.

  • One-click sync

image

Users can choose how to convert data based on their own business needs. For latency-sensitive users, real-time synchronization is available: the BDS service parses HLogs in real time and writes them into Delta, which users can then query directly with Spark. For offline conversion, users configure it in the console to fit their business: they can schedule conversion during off-peak hours and choose whether to merge increments with the full dataset; the backend scheduler triggers the conversion logic automatically.


9. Analysis layer

A good deal of data accumulates inside the ApsaraDB HBase platform, and data entering the platform often needs streaming ETL. Following common industry practice, we introduced Apache Spark, currently the most popular compute engine, to meet the platform's data-processing needs. Spark uses a DAG execution engine, supports SQL as well as programming languages, runs up to 100x faster than traditional MapReduce, and additionally supports streaming, batch, and machine learning across multiple languages including SQL, Python, and Scala. The platform's capabilities include streaming ETL, Spark on HBase (and other databases), and analysis of HBase data after conversion to columnar storage. To let Spark run at low cost, we will soon support serverless execution. Spark acts as the glue between databases; through Spark the platform builds a closed-loop data-processing system to solve core customers' core problems, such as 点触科技 (DianChu Technology)'s game big-data platform.

  • Streaming support
    In most systems, data passing through middleware needs some preprocessing before being written into HBase, which generally requires streaming capability. Spark Streaming provides second-level stream processing, and Structured Streaming supports even lower latency. The platform supports the main message channels: Kafka, Alibaba Cloud LogHub, and DataHub. On the Spark-versus-Flink question many practitioners care about: first, Flink's pipeline-based streaming has lower latency than Spark's mini-batch streaming and is functionally more powerful, but most users never need millisecond latency or the advanced features, and Spark streaming covers most scenarios; second, Spark's ecosystem is more mature and more influential than Flink's.

  • Spark On X
    The analysis layer supports not only HBase and Phoenix but also POLARDB, MySQL, SQL Server, PostgreSQL, Redis, MongoDB, and other systems, for example archiving POLARDB data for analysis. Spark On X supports schema mapping, operator pushdown, partition pruning, column pruning, BulkGet, index-first access, and other optimizations. Operator pushdown reduces the amount of data pulled from the DB as well as the DB's computational load, improving Spark On X performance. HBase typically stores massive data, with single tables reaching hundreds of billions or even trillions of rows; by pushing Spark On HBase's rowkey filter fields down to HBase, query performance can reach millisecond level.
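Operator pushdown is easy to illustrate: applying the filter at the data source means far fewer rows cross the wire to Spark. A minimal sketch with plain Python lists standing in for the DB side and the Spark side (sizes illustrative):

```python
# Toy comparison of pulling everything vs pushing the predicate down.
table = [{"rowkey": f"k{i:04d}", "v": i} for i in range(10_000)]

def scan_no_pushdown(pred):
    fetched = list(table)  # the whole table crosses the wire to Spark
    return [r for r in fetched if pred(r)], len(fetched)

def scan_with_pushdown(pred):
    fetched = [r for r in table if pred(r)]  # the DB filters at the source
    return fetched, len(fetched)

pred = lambda r: r["rowkey"] == "k0042"
rows1, pulled1 = scan_no_pushdown(pred)
rows2, pulled2 = scan_with_pushdown(pred)
print(pulled1, pulled2)  # 10000 vs 1: same answer, tiny transfer
print(rows1 == rows2)    # True
```

When the pushed-down predicate is on the rowkey, HBase can additionally turn it into a point get or a narrow range scan rather than a filtered full scan, which is where the millisecond latency comes from.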


10. Data Workbench

An online DB is usually accessed by a business system, but offline jobs differ from online workloads: they need job management and scheduled offline execution, as well as interactive execution. On the ApsaraDB HBase platform we provide the Data Workbench to meet this need:

https://help.aliyun.com/document_detail/106531.html

The Data Workbench provides resource management, job management, workflows, session management, interactive queries, and job alerting. A job can be a JAR package, a Python script, a SQL script, and so on; a workflow chains multiple jobs together and can run periodically or at fixed times; session management can start an online interactive Spark session for interactive queries; interactive queries support running SQL, Python, and Scala scripts online.

image

image



11. DBaaS

ApsaraDB HBase has built a complete management system supporting global deployment, monitoring and alerting (both CloudMonitor and the native monitoring pages), online scale-out, security whitelists, VPC network isolation, online configuration changes, public network access, one-click minor-version upgrades, staged off-peak MajorCompaction optimization, automatic cluster-health detection with emergency alerting and manual intervention, disk-capacity watermark alerts, and many other operational and automated optimizations. The platform provides 7x24 human support and consultation; you can reach us directly via the DingTalk account 云HBase答疑. On top of this, two major enterprise features have been built: backup and restore, and the BDS service.

  • Backup and restore
    HBase data is also a customer's core asset. To protect customers' data from accidental deletion (often the user's own mistake), we built in a backup and restore service. The service is fully independent of the HBase kernel and runs as separate processes. The basic principle: full data is pulled as HFiles, incremental data as HLogs. It handles backup and restore of hundreds of TB, with real-time backup lag within minutes. Restore supports point-in-time recovery, and clusters of hundreds of TB can generally be restored within 2 days. Neither backup nor restore affects the original cluster's ability to keep serving. There are many details; see "ApsaraDB HBase backup and restore: safeguarding your HBase data":

    https://yq.aliyun.com/articles/682894

  • BDS service
    Data migration is a heavy undertaking, especially when migrating tens of TB of HBase data. We built a dedicated data-migration service for ApsaraDB HBase, named BDS. It covers all kinds of migration and synchronization scenarios, including migrating self-built HBase clusters to Alibaba Cloud HBase, cross-region migration (e.g. from the Qingdao region to the Beijing region), upgrading HBase 1.x to HBase 2.x, and switching network environments from classic network to VPC.
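Returning to backup and restore above: the "full HFiles plus incremental HLogs" scheme gives point-in-time restore almost for free, since restoring to time t means taking the newest full backup at or before t and replaying logged edits up to t. A sketch of the selection logic (plain Python; timestamps illustrative):

```python
# Toy point-in-time restore: pick the latest full backup <= t, then
# replay incremental log edits with timestamps in (backup_ts, t].
def restore_plan(full_backups, log_edits, t):
    base = max((b for b in full_backups if b <= t), default=None)
    if base is None:
        raise ValueError("no full backup at or before target time")
    replay = [e for e in log_edits if base < e <= t]
    return base, replay

full_backups = [100, 200, 300]    # times of full HFile backups
log_edits = [150, 250, 260, 310]  # times of HLog edits
print(restore_plan(full_backups, log_edits, 270))  # (200, [250, 260])
```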


12. Afterword

Storage, retrieval, and analysis are the three core capabilities of BigData, and the core capabilities that BigData NoSQL strives to build; through deep integration they better solve customers' data-driven business problems such as risk control and user profiling. Building on the characteristics of the cloud environment, the Alibaba Cloud HBase team has created many native advantages and currently serves thousands of small and medium enterprises. In addition, to serve China's broad developer community, in May 2018 we founded the China HBase Technology Community, which has held 9 offline meetups, invited dozens of internal and external speakers, registered 2,801 attendees, gathered 11k official-account followers and 21k+ livestream viewers, and influenced tens of thousands of enterprises. We also offer new developers a free one-month trial for development, learning, and exchange:

https://promotion.aliyun.com/ntms/act/hbasefree.html

Going forward, we will continue to track the needs of cloud users closely, polish the product, build core competitiveness, improve ease of use, ensure system stability, and introduce serverless features to further reduce costs.

If not now, when? If not me, who?


image



Origin blog.51cto.com/15060465/2676909