Phoenix Compatibility | An In-Depth Look at the Story Behind Lindorm's PhoenixSQL Compatibility

User benefits

Alibaba Cloud recently released Lindorm, the industry's first cloud-native multi-model database. New users can get three months for 9.9 yuan. Technical discussion DingTalk group: 35977898.

1. Background

  Lindorm is a storage system for semi-structured and structured data in big data scenarios. It has been developed inside Alibaba for nearly ten years with continuous, rapid capability updates and technology upgrades, and it is now one of the core database products supporting the businesses of the Alibaba ecosystem. Its innovations in functionality, performance, and stability have been proven through long-term, large-scale production use across Alibaba Group, Ant Group, Cainiao, Alibaba Digital Media & Entertainment, and other business units, making it the largest and most widely used database product inside the company by data volume.
  With the arrival of the cloud-native and 5G/IoT era, customer data volumes and application requirements keep growing. To serve customers better, the Alibaba Cloud NoSQL database team combined its previous technology from Lindorm and TSDB and released Lindorm, a cloud-native multi-model database. Lindorm integrates four engines: a wide-table engine, a time-series engine, a search engine, and a file engine. It provides low-cost storage and processing, plus adaptive elastic scaling, for data of all types and sizes. It serves scenarios such as the Internet, IoT, connected vehicles, advertising, social networking, monitoring, gaming, and risk control, making enterprise data "affordable to store and easy to see". For the overall architecture of the Lindorm cloud-native multi-model database and the thinking behind it, see "Affordable to Store, Easy to See: A Technical Analysis of the Cloud-Native Multi-Model Database Lindorm".
  Alibaba Cloud offers a managed HBase Standard Edition together with Phoenix. It also launched HBase Enhanced Edition (the predecessor of Lindorm's wide-table engine), which performs far better than the Standard Edition; for details, see "Lindorm/HBase Enhanced Edition Technology Revealed | How Does Alibaba's New-Generation Database Support 700 Million Requests per Second?". However, the Enhanced Edition line had no Phoenix-compatible offering, so many customers who chose HBase Enhanced Edition could not enable the Phoenix SQL service, which was a real drawback. To fill this product gap and give customers a better experience, the Lindorm team decided to make Lindorm compatible with Phoenix. Lindorm's Phoenix-compatible offering has now been officially released; for usage, see "Using the PhoenixSQL Java API to access Lindorm". This article focuses on the story behind Lindorm's compatibility with Phoenix.

2. Introduction to Phoenix

  Phoenix is an HBase plug-in originally developed by James Taylor at Salesforce. Its goal is to "put the SQL back in NoSQL": it improves the HBase user experience while giving HBase OLTP and lightweight OLAP capabilities.
  The figure below shows Phoenix's position in the big data ecosystem:
[Figure: Phoenix's position in the big data ecosystem]
  With Phoenix, users can work with HBase much as they would with MySQL. Because Phoenix exposes a standard JDBC interface, it integrates seamlessly with frameworks such as MyBatis and Spring, which can generate SQL statements automatically and further improve development efficiency.

2.1 Phoenix features

2.1.1 Rich syntax

  Phoenix SQL follows the ANSI SQL-92 standard and offers rich syntax, including GROUP BY, ORDER BY, joins, subqueries, and built-in functions. For details, see the grammar reference on the Phoenix website.

  With Phoenix SQL, complex queries are easy to express, for example a join between an item table and an aggregated subquery over an order table:

SELECT ItemName, O.OrderValue
FROM Items
JOIN
   (SELECT ItemID, sum(Price * Quantity) AS OrderValue
    FROM Orders
    WHERE CustomerID > 'C002'
    GROUP BY ItemID) AS O
ON Items.ItemID = O.ItemID;
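
To make the join above concrete, here is a minimal sketch that runs the same statement through Python's built-in sqlite3 module rather than Phoenix. SQLite is only a stand-in, and the table layouts and sample rows are hypothetical, chosen so the filter and aggregation are easy to follow. An ORDER BY is appended so the output order is deterministic.

```python
import sqlite3

# The join from the Phoenix example, run against an in-memory SQLite
# database as a stand-in. Schemas and rows are hypothetical sample data.
QUERY = """
SELECT ItemName, O.OrderValue
FROM Items
JOIN
   (SELECT ItemID, sum(Price * Quantity) AS OrderValue
    FROM Orders
    WHERE CustomerID > 'C002'
    GROUP BY ItemID) AS O
ON Items.ItemID = O.ItemID
ORDER BY ItemName
"""

def run_join():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Items (ItemID TEXT PRIMARY KEY, ItemName TEXT);
        CREATE TABLE Orders (OrderID TEXT PRIMARY KEY, CustomerID TEXT,
                             ItemID TEXT, Price REAL, Quantity INTEGER);
        INSERT INTO Items VALUES ('I001', 'Camera'), ('I002', 'Tripod');
        INSERT INTO Orders VALUES
            ('O1', 'C001', 'I001', 100.0, 1),  -- excluded: 'C001' <= 'C002'
            ('O2', 'C003', 'I001', 100.0, 2),
            ('O3', 'C004', 'I002', 25.0, 4),
            ('O4', 'C005', 'I002', 25.0, 1);
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for name, value in run_join():
        print(name, value)  # prints: Camera 200.0, then Tripod 125.0
```

Only orders from customers after 'C002' are summed per item, and the result is then joined back to Items to pick up each item's name.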

2.1.2 Convenient operation

  Phoenix also supports convenient tools such as the MySQL-like sqlline command-line client and the SQuirreL graphical client, which simplify day-to-day debugging and operations, so users familiar with SQL databases can pick it up without friction.

2.2 The value of Phoenix

2.2.1 Phoenix is the fastest real-time SQL engine on HBase

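  Why is Phoenix the fastest real-time SQL engine on HBase? Let us first review how big data SQL technology has evolved.
  Big data technology began to flourish with Google's "troika" of GFS, BigTable, and MapReduce. The open-source community then released HDFS as the open-source counterpart of GFS, HBase as the counterpart of BigTable, and Hadoop as the open-source implementation of MapReduce. Big data SQL engines have evolved on top of these foundations:
  1. The first open-source SQL implementation was Apache Hive, which took a SQL-on-Hadoop approach: it translates SQL into MapReduce jobs and writes intermediate results to HDFS. This suits batch processing, but writing large volumes of intermediate results to HDFS makes it slow for real-time workloads.
  2. To avoid the cost of writing intermediate results to HDFS, many products emerged, such as Google Dremel (closed source), whose open-source counterpart is Apache Drill, as well as Pivotal HAWQ (originally closed source, later open-sourced as Apache HAWQ) and Cloudera Impala. The main idea is to replace Hive's MapReduce with in-memory computation, while also providing plugins for other storage engines.
  3. Spark, open-sourced by UC Berkeley's AMPLab, also replaces Hive's MapReduce with in-memory computation, though its implementation differs somewhat. It uses RDDs to split data into small partitions for computation and handles problems such as task fault tolerance. It also emulates real-time processing with micro-batches, unifying stream and batch processing.
  4. Yandex ClickHouse targets analytics with columnar storage. Its high compression ratio and vectorized execution engine greatly reduce storage cost and improve compute performance; it is mainly used for workloads such as user behavior analysis.
  5. eBay Kylin and Apache Druid use pre-aggregation: they compute results in advance, trading space for time to speed up queries. Druid is mainly used for time-series data.
  6. Facebook Presto mainly addresses federated queries over heterogeneous data. It offers a rich set of connectors covering a very large number of database products and is mainly aimed at data lake analytics.
  7. Phoenix is built on HBase and makes full use of HBase coprocessors to implement secondary indexes. Through MPP-style parallel execution it delivers an interactive experience with millisecond-level response times. In addition, its stateless QueryServer design avoids the low concurrency that the coordinator node causes in systems such as Presto.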
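
Item 7 mentions secondary indexes. As we understand Phoenix's global index layout, the index table's row key is the indexed column value followed by the data table's row key, so every entry for one value is contiguous and an equality lookup becomes a short range scan instead of a full table scan. The sketch below models this with a sorted Python list standing in for the HBase index table; the class and column names are hypothetical, and it does not model coprocessors, which in Phoenix keep the index in step with writes on the server side.

```python
import bisect

class IndexedTable:
    """Toy model of a data table plus a global secondary index.

    The index is a sorted list of (indexed_value, data_row_key) tuples,
    mimicking a sorted HBase table whose row key is value + data row key.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}        # data table: row_key -> row dict
        self.index_keys = []  # index table: sorted (value, row_key) tuples

    def upsert(self, row_key, row):
        # Drop the stale index entry first, as any index maintainer must.
        old = self.data.get(row_key)
        if old is not None:
            stale = (old[self.indexed_column], row_key)
            del self.index_keys[bisect.bisect_left(self.index_keys, stale)]
        self.data[row_key] = row
        bisect.insort(self.index_keys, (row[self.indexed_column], row_key))

    def lookup(self, value):
        """Equality lookup: range scan on the index, then point gets."""
        # (value,) sorts before every (value, row_key), so this is the
        # first index entry for the value, if any exists.
        pos = bisect.bisect_left(self.index_keys, (value,))
        hits = []
        while pos < len(self.index_keys) and self.index_keys[pos][0] == value:
            hits.append(self.index_keys[pos][1])
            pos += 1
        return [self.data[k] for k in hits]

    def full_scan(self, value):
        """What the same query costs without an index: read every row."""
        return [row for _, row in sorted(self.data.items())
                if row[self.indexed_column] == value]

if __name__ == "__main__":
    t = IndexedTable("city")
    t.upsert("u1", {"name": "Alice", "city": "Hangzhou"})
    t.upsert("u2", {"name": "Bob", "city": "Beijing"})
    t.upsert("u3", {"name": "Carol", "city": "Hangzhou"})
    t.upsert("u2", {"name": "Bob", "city": "Hangzhou"})  # Bob moves
    print([r["name"] for r in t.lookup("Hangzhou")])  # ['Alice', 'Bob', 'Carol']
```

The lookup touches only the index entries for the requested value, while full_scan reads the whole table; that difference is what an index buys as tables grow.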
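  As the list shows, a SQL engine on HBase can be implemented in several ways, such as Hive on HBase, Impala on HBase, and Spark on HBase. However, Hive on HBase cannot push predicates down, and Impala on HBase cannot use coprocessors to push computation down, so both perform far worse than Phoenix. Below is the performance comparison published on the Phoenix website: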
[Figure: performance comparison charts from the Phoenix website]
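
The pushdown argument above comes down to how much data crosses the network. The sketch below models a table split into regions (plain Python lists with hypothetical data) and compares two strategies for "sum of amounts above a threshold": shipping every row to the client and filtering there, versus evaluating the predicate and a partial sum next to each region in parallel and merging only the partial results. The thread pool merely stands in for region servers; this illustrates predicate and aggregate pushdown in general, not Phoenix's coprocessor code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split into three regions; each row is (row_key, amount).
REGIONS = [
    [("a1", 5), ("a2", 50), ("a3", 70)],
    [("b1", 10), ("b2", 90)],
    [("c1", 60), ("c2", 1), ("c3", 3)],
]

def no_pushdown(regions, threshold):
    """Client-side evaluation: every row is shipped, then filtered."""
    shipped = [row for region in regions for row in region]
    total = sum(amount for _, amount in shipped if amount > threshold)
    return total, len(shipped)  # (result, items transferred)

def region_partial_sum(region, threshold):
    """Runs 'inside' one region: filter and pre-aggregate locally."""
    return sum(amount for _, amount in region if amount > threshold)

def with_pushdown(regions, threshold):
    """Scatter the work to all regions in parallel, then merge partials."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(pool.map(
            lambda region: region_partial_sum(region, threshold), regions))
    # Only one number per region is transferred back to the client.
    return sum(partials), len(partials)

if __name__ == "__main__":
    print(no_pushdown(REGIONS, 40))    # (270, 8)
    print(with_pushdown(REGIONS, 40))  # (270, 3)
```

Both strategies return the same answer, but with pushdown the transfer shrinks from one item per row to one item per region, and the per-region work runs concurrently.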
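  In addition, Spark SQL has to submit jobs to YARN and takes a long time to start, so it suits large computations rather than high-concurrency real-time queries. Spark Streaming processes real-time data streams and suits ETL scenarios, not real-time queries.
  Therefore, Phoenix is the first choice for high-concurrency, real-time SQL queries on HBase.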

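2.2.2 Scenarios suited to Phoenix

With its high performance and low cost, HBase combined with Phoenix is well suited to storing and analyzing massive amounts of data:
[Figure: typical application scenarios for HBase with Phoenix]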

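2.2.3 Phoenix on Alibaba Cloud

Phoenix is widely used on Alibaba Cloud: statistics show that more than half of HBase Standard Edition users on Alibaba Cloud have enabled the Phoenix SQL service.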

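2.2.4 Phoenix use cases inside Alibaba

2.2.4.1 Use case 1: Quick A+ mobile data analytics

[Figure: Quick A+ mobile data analytics use case]

2.2.4.2 Use case 2: Ant Group offline search system

[Figure: Ant Group offline search system use case]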

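3. Why make Lindorm compatible with Phoenix?

With its rich features, excellent performance, and mature ecosystem, Phoenix has a broad user base. Drawing on years of experience with HBase, the Lindorm team can take Phoenix's performance to the next level and serve users better, while also rounding out the Lindorm product line and removing the limitation that SQL could not be enabled on HBase Enhanced Edition.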

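4. How does Lindorm achieve Phoenix compatibility?

  The overall architecture is shown in the figure below. Lindorm uses a stateless QueryServer design, and the PhoenixSQL API communicates with the QueryServer through the Avatica protocol. By implementing the Avatica protocol, Lindorm achieves protocol-level compatibility with the Phoenix interface.
[Figure: overall architecture of Lindorm's Phoenix compatibility]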
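  Avatica is built on Jetty and Protocol Buffers. It implements the standard JDBC interface over HTTP and supports access from multiple languages, including .NET, Go, Java, Python, and JavaScript.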
[Figure: Avatica overview]
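
Because Avatica carries JDBC calls over plain HTTP, any language that can send an HTTP POST can talk to a QueryServer. The sketch below only builds the request bodies for a typical session (open a connection, create a statement, execute SQL, close). It uses Avatica's JSON serialization because it is easier to read than Protocol Buffers. The field names follow our reading of the Avatica JSON reference and should be checked against the Avatica version in use, and whether a given Lindorm endpoint accepts JSON rather than Protobuf is an assumption. The helper names and the table name are hypothetical, and in a real session the statement ID comes from the server's createStatement response.

```python
import json
import uuid

# Hypothetical helpers that build Avatica JSON request bodies.
# Field names are assumed from the Avatica JSON reference; verify them.

def open_connection(connection_id, info=None):
    return {"request": "openConnection", "connectionId": connection_id,
            "info": info or {}}

def create_statement(connection_id):
    return {"request": "createStatement", "connectionId": connection_id}

def prepare_and_execute(connection_id, statement_id, sql, max_rows=100):
    return {"request": "prepareAndExecute", "connectionId": connection_id,
            "statementId": statement_id, "sql": sql,
            "maxRowsTotal": max_rows}

def close_connection(connection_id):
    return {"request": "closeConnection", "connectionId": connection_id}

def session_bodies(sql, statement_id):
    """Return the JSON bodies a client would POST, in order."""
    # The client picks the connection ID; a random UUID is used here.
    cid = str(uuid.uuid4())
    messages = [open_connection(cid), create_statement(cid),
                prepare_and_execute(cid, statement_id, sql),
                close_connection(cid)]
    return [json.dumps(m) for m in messages]

if __name__ == "__main__":
    # The statement ID is a placeholder; a real client reads it from the
    # createStatement response before sending prepareAndExecute.
    for body in session_bodies("SELECT * FROM MY_TABLE LIMIT 10", 1):
        print(body)
```

Sending the bodies is omitted; each would be POSTed to the QueryServer URL. Every request carries the same connection ID, which is how the server ties the calls together.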
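  The lightweight PhoenixSQL API pushes computation down to the QueryServer, reducing resource consumption on the client. The stateless QueryServer also decouples the compute layer from the storage layer, so each layer can scale out or in independently, and together they provide high-concurrency SQL reads and writes.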
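
The compute/storage decoupling described above can be illustrated with a toy model. Query servers hold no table data, so a new connection can be placed on any of them, and adding a server grows compute capacity without moving any data. The classes, the round-robin router, and the shared dict standing in for the storage layer are hypothetical simplifications. One caveat from Avatica itself: open connections and statements live on the server that created them, so this sketch keeps each connection pinned to the server it was first assigned to.

```python
class SharedStorage:
    """Stand-in for the storage layer: the only place table data lives."""
    def __init__(self):
        self.rows = {}

class QueryServer:
    """Compute node with no table data; every read goes to shared storage."""
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def get(self, key):
        return self.storage.rows.get(key)

class Router:
    """Places new connections round-robin and keeps each one pinned."""
    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0
        self.assignment = {}  # connection_id -> QueryServer

    def add_server(self, server):
        # Scaling compute: no data is copied or moved.
        self.servers.append(server)

    def connect(self, connection_id):
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        self.assignment[connection_id] = server
        return server

    def query(self, connection_id, key):
        return self.assignment[connection_id].get(key)

if __name__ == "__main__":
    storage = SharedStorage()
    storage.rows["k1"] = "v1"
    router = Router([QueryServer("qs1", storage), QueryServer("qs2", storage)])
    for cid in ("c1", "c2"):
        print(cid, router.connect(cid).name, router.query(cid, "k1"))
    router.add_server(QueryServer("qs3", storage))  # scale out compute only
    print("c3", router.connect("c3").name, router.query("c3", "k1"))
```

Every server returns the same answer because they all read the same storage. Growing from two servers to three changed only the routing table, never the data.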

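5. What do users gain from Lindorm's Phoenix compatibility?

  As mentioned above, Phoenix compatibility rounds out the Lindorm product line; for users, the main benefit is performance. Upgrading the HBase kernel to the Lindorm kernel and replacing Phoenix's secondary index implementation with Lindorm's native secondary index improve performance significantly.
  The figure below compares the performance of LindormSQL secondary indexes with Phoenix secondary indexes:
[Figure: performance comparison of LindormSQL secondary indexes and Phoenix secondary indexes]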

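6. Scenarios suited to Lindorm

  Lindorm is suited to lightweight analytics and provides a real-time, interactive query experience. Below is a comparison with Spark:

[Figure: comparison between Lindorm and Spark]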

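  Lindorm also provides real-time multi-zone synchronization and disaster recovery. By connecting Spark to the standby cluster for offline analysis, users can serve online and offline workloads from the same copy of data, without the hassle of syncing data to other systems.

[Figure: online and offline analysis on the same data through multi-zone synchronization]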

7. Summary

  Lindorm's compatibility with Phoenix lowers the barrier to entry and improves performance. Existing Phoenix users can migrate smoothly to Lindorm, a better choice for the cloud-native era.
  For new users accustomed to the relational model, MySQL used to be the best choice because it is easy to install and simple to use. By contrast, NoSQL databases are complex to deploy, and the many components they require discourage many developers. In the cloud-native era, NoSQL databases are fully managed in the cloud and ready to use out of the box: you can start with a single click. Lindorm offers the same experience as MySQL, with further advantages in storage cost, scalability, and flexibility. It fits the big data characteristics of Internet and IoT workloads well and is a first choice when selecting a database and storage for new applications.

To try PhoenixSQL, see "Using the PhoenixSQL Java API to access Lindorm". For free consultation, join the Lindorm technical discussion group.

Source: blog.51cto.com/15060465/2675790