HBase Series (X): Phoenix, a SQL Middle Layer for HBase

One, Introduction to Phoenix

Phoenix is an open source SQL middle layer for HBase that lets you manipulate HBase data through standard JDBC. Before Phoenix, the only way to access HBase was to call its Java API, which is far too complex compared with a query that takes a single line of SQL. Phoenix's philosophy is "we put the SQL back in NoSQL": you can use standard SQL to operate on HBase data. This also means you can integrate commonly used persistence frameworks such as Spring Data JPA or MyBatis to work with HBase.

Phoenix also performs very well. Its query engine compiles a SQL query into one or more HBase Scans, executes them in parallel, and produces a standard JDBC result set. By using the HBase API directly, together with coprocessors and custom filters, it delivers millisecond-level performance for small queries and second-level performance for queries over millions of rows. Phoenix also offers features that HBase itself lacks, such as secondary indexes. Because of these advantages, Phoenix has become the best SQL middle layer for HBase.

Two, Phoenix installation

Phoenix can be installed by following the official installation instructions, which read as follows:

  • download and expand our installation tar
  • copy the phoenix server jar that is compatible with your HBase installation into the lib directory of every region server
  • restart the region servers
  • add the phoenix client jar to the classpath of your HBase client
  • download and setup SQuirrel as your SQL client so you can issue adhoc SQL against your HBase cluster

2.1 Download and unzip

The official site provides installation packages for both the Apache HBase and CDH versions; download the one that matches your setup. Official download page: http://phoenix.apache.org/download.html

# Download
wget http://mirror.bit.edu.cn/apache/phoenix/apache-phoenix-4.14.0-cdh5.14.2/bin/apache-phoenix-4.14.0-cdh5.14.2-bin.tar.gz
# Extract
tar -zxvf apache-phoenix-4.14.0-cdh5.14.2-bin.tar.gz

2.2 Copy the JAR package

According to the official documentation, the phoenix server jar must be added to the lib directory under the installation directory of every Region Server.

Because I built a pseudo-distributed HBase cluster here, I only need to copy the jar into the HBase lib directory on the current machine. On a real cluster, use the scp command to distribute it to every Region Server machine.

cp /usr/app/apache-phoenix-4.14.0-cdh5.14.2-bin/phoenix-4.14.0-cdh5.14.2-server.jar /usr/app/hbase-1.2.0-cdh5.15.2/lib

2.3 Restart Region Servers

# Stop HBase
stop-hbase.sh
# Start HBase
start-hbase.sh

2.4 Starting Phoenix

Run the following command from the bin directory of the unpacked Phoenix directory. You need to specify the Zookeeper address:

  • If HBase runs in Standalone mode or as a pseudo-cluster, it uses its built-in Zookeeper service by default, on port 2181;
  • If HBase runs in cluster mode with an external Zookeeper cluster, specify the address according to your own setup.

./sqlline.py hadoop001:2181

2.5 Startup result

After startup, Phoenix enters an interactive SQL command line, where you can use !table or !tables to list all current tables.

Three, Basic Phoenix usage

3.1 Create a table

CREATE TABLE IF NOT EXISTS us_population (
      state CHAR(2) NOT NULL,
      city VARCHAR NOT NULL,
      population BIGINT
      CONSTRAINT my_pk PRIMARY KEY (state, city));

The new table is converted into an HBase table according to specific rules, and its details can be viewed in the HBase Web UI.
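
As a rough illustration of those conversion rules, the sketch below models two of them in plain Java, without touching HBase: unquoted SQL identifiers are upper-cased, so us_population becomes the HBase table US_POPULATION, and the primary key columns are concatenated to form the HBase row key. The class and method names are hypothetical, and the sketch only covers this table's key layout (a fixed-width CHAR(2) followed by a trailing VARCHAR); other key types and the storage of non-key columns are out of scope.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Phoenix: models how the us_population
// definition maps to an HBase table name and row key.
class PhoenixMappingSketch {

    // Unquoted identifiers are upper-cased; double-quoted ones keep their case.
    public static String hbaseTableName(String sqlIdentifier) {
        boolean quoted = sqlIdentifier.length() >= 2
                && sqlIdentifier.startsWith("\"")
                && sqlIdentifier.endsWith("\"");
        return quoted
                ? sqlIdentifier.substring(1, sqlIdentifier.length() - 1)
                : sqlIdentifier.toUpperCase(Locale.ROOT);
    }

    // Row key for PRIMARY KEY (state, city): the two-character state code
    // followed directly by the city name, both as UTF-8 bytes.
    public static byte[] rowKey(String state, String city) {
        if (state.length() != 2) {
            throw new IllegalArgumentException("state must be exactly 2 characters");
        }
        return (state + city).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(hbaseTableName("us_population"));
        System.out.println(new String(rowKey("CA", "San Jose"), StandardCharsets.UTF_8));
    }
}
```

Under these assumptions the row for San Jose would be stored under the key CASan Jose in the table US_POPULATION, which is the kind of entry to look for when inspecting the table from the HBase side.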

3.2 Insert Data

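Phoenix inserts data with UPSERT rather than INSERT. Phoenix has no separate update operation: inserting a row whose primary key already exists is treated as an update, so UPSERT is equivalent to UPDATE + INSERT.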

UPSERT INTO us_population VALUES('NY','New York',8143197);
UPSERT INTO us_population VALUES('CA','Los Angeles',3844829);
UPSERT INTO us_population VALUES('IL','Chicago',2842518);
UPSERT INTO us_population VALUES('TX','Houston',2016582);
UPSERT INTO us_population VALUES('PA','Philadelphia',1463281);
UPSERT INTO us_population VALUES('AZ','Phoenix',1461575);
UPSERT INTO us_population VALUES('TX','San Antonio',1256509);
UPSERT INTO us_population VALUES('CA','San Diego',1255540);
UPSERT INTO us_population VALUES('TX','Dallas',1213825);
UPSERT INTO us_population VALUES('CA','San Jose',912332);

3.3 Modify data

-- Inserting a row with an existing primary key is treated as an update
UPSERT INTO us_population VALUES('NY','New York',999999);

3.4 Delete data

DELETE FROM us_population WHERE city='Dallas';

3.5 Query data

SELECT state as "State",count(city) as "Cities",sum(population) as "Population"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;
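
To make the expected result of sections 3.2 to 3.5 concrete, here is a small plain-Java model of the same steps. It does not talk to Phoenix. It assumes that a map keyed by (state, city) is a fair stand-in for the table, so that putting an existing key overwrites the row, which mirrors the UPSERT behaviour, and it reproduces the GROUP BY in memory. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the us_population walkthrough.
class UpsertModel {

    // Table stand-in: primary key "state|city" -> population.
    private final Map<String, Long> rows = new LinkedHashMap<>();

    // UPSERT: inserts a new key or overwrites an existing one.
    void upsert(String state, String city, long population) {
        rows.put(state + "|" + city, population);
    }

    // DELETE FROM us_population WHERE city = ?
    void deleteByCity(String city) {
        rows.keySet().removeIf(key -> key.substring(key.indexOf('|') + 1).equals(city));
    }

    // SELECT state, count(city), sum(population) GROUP BY state ORDER BY sum DESC
    List<String> populationByState() {
        Map<String, long[]> totals = new TreeMap<>(); // state -> {city count, population sum}
        rows.forEach((key, population) -> {
            String state = key.substring(0, key.indexOf('|'));
            long[] total = totals.computeIfAbsent(state, s -> new long[2]);
            total[0]++;
            total[1] += population;
        });
        List<Map.Entry<String, long[]>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue()[1], a.getValue()[1]));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : entries) {
            result.add(entry.getKey() + " " + entry.getValue()[0] + " " + entry.getValue()[1]);
        }
        return result;
    }

    // Replays sections 3.2 (insert), 3.3 (update) and 3.4 (delete).
    static UpsertModel sample() {
        UpsertModel table = new UpsertModel();
        table.upsert("NY", "New York", 8143197L);
        table.upsert("CA", "Los Angeles", 3844829L);
        table.upsert("IL", "Chicago", 2842518L);
        table.upsert("TX", "Houston", 2016582L);
        table.upsert("PA", "Philadelphia", 1463281L);
        table.upsert("AZ", "Phoenix", 1461575L);
        table.upsert("TX", "San Antonio", 1256509L);
        table.upsert("CA", "San Diego", 1255540L);
        table.upsert("TX", "Dallas", 1213825L);
        table.upsert("CA", "San Jose", 912332L);
        table.upsert("NY", "New York", 999999L); // same key, so this is an update
        table.deleteByCity("Dallas");
        return table;
    }

    public static void main(String[] args) {
        sample().populationByState().forEach(System.out::println);
    }
}
```

With the data from the earlier sections, the model prints one line per state in descending order of total population: CA 3 6012701, TX 2 3273091, IL 1 2842518, PA 1 1463281, AZ 1 1461575, NY 1 999999. The query above should return the same six rows if it is run after the update and delete.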

3.6 Exit command

!quit

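3.7 Further reading

As the operations above show, Phoenix supports most standard SQL syntax. The syntax, data types, functions, sequences, and other features that Phoenix supports cover a lot of ground, so refer to the official documentation, which describes them in detail.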

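Four, Phoenix Java API

Because Phoenix follows the JDBC specification and provides a corresponding database driver, PhoenixDriver, working with it from Java is just like working with any other relational database. A basic usage example follows.

4.1 Add the Phoenix core JAR

For a Maven project, find the matching version in the Maven central repository and add the dependency: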

 <!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
    <dependency>
      <groupId>org.apache.phoenix</groupId>
      <artifactId>phoenix-core</artifactId>
      <version>4.14.0-cdh5.14.2</version>
    </dependency>

For a non-Maven project, find the corresponding JAR in the unpacked Phoenix directory and add it to the project manually.

4.2 A simple Java API example

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;


public class PhoenixJavaApi {

    public static void main(String[] args) throws Exception {

        // Load the database driver
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        /*
         * Specify the database address in the form jdbc:phoenix:<Zookeeper address>.
         * If HBase runs in Standalone or pseudo-cluster mode, it uses its built-in
         * Zookeeper by default, on port 2181.
         */
        String url = "jdbc:phoenix:192.168.200.226:2181";

        // try-with-resources closes the result set, statement, and connection
        // even if the query fails
        try (Connection connection = DriverManager.getConnection(url);
             PreparedStatement statement = connection.prepareStatement("SELECT * FROM us_population");
             ResultSet resultSet = statement.executeQuery()) {

            while (resultSet.next()) {
                // population is a BIGINT column, so read it as a long
                System.out.println(resultSet.getString("city") + " "
                        + resultSet.getLong("population"));
            }
        }
    }
}

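Running the program prints each city followed by its population, one row per line.

In real development we usually work with the database through a third-party framework such as MyBatis, Hibernate, or Spring Data. For the steps to integrate Phoenix with these frameworks, see the next article: Spring/Spring Boot + Mybatis + Phoenix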
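
For plain JDBC, writes go through the same API as the query in section 4.2. The sketch below is a minimal example under a few assumptions: the us_population table from section 3.1 already exists, the Phoenix client JAR is on the classpath, and the Zookeeper host is passed as the first program argument. The class name, the jdbcUrl helper, and the sample values are made up for illustration. Phoenix connections do not auto-commit by default, so the sketch calls commit() explicitly after the UPSERT.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical example class: a parameterized UPSERT over JDBC.
class PhoenixUpsertSketch {

    static final String UPSERT_SQL = "UPSERT INTO us_population VALUES (?, ?, ?)";

    // Builds a URL in the jdbc:phoenix:<Zookeeper address> form.
    static String jdbcUrl(String zookeeperHost, int port) {
        return "jdbc:phoenix:" + zookeeperHost + ":" + port;
    }

    // Inserts the row, or updates it if the (state, city) key already exists.
    static void upsertCity(Connection connection, String state, String city, long population)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPSERT_SQL)) {
            statement.setString(1, state);
            statement.setString(2, city);
            statement.setLong(3, population);
            statement.executeUpdate();
        }
        // Auto-commit is off by default, so the write is only sent on commit.
        connection.commit();
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // Without a Zookeeper host there is nothing to connect to.
            System.out.println("usage: PhoenixUpsertSketch <zookeeper-host> [port]");
            return;
        }
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        try (Connection connection = DriverManager.getConnection(jdbcUrl(args[0], port))) {
            upsertCity(connection, "NY", "New York", 999999L);
        }
    }
}
```

Binding values through PreparedStatement parameters instead of concatenating them into the SQL string keeps quoting correct for city names and avoids SQL injection.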

References

  1. http://phoenix.apache.org/

For more articles in this big data series, see the GitHub open source project Big Data Getting Started Guide (大数据入门指南).

Origin www.cnblogs.com/heibaiying/p/11416178.html