1. Introduction to Phoenix
Phoenix is an open-source SQL layer on top of HBase that lets you manipulate HBase data through standard JDBC. Before Phoenix, the only way to access HBase was through its Java API, which is far more complex than the single line of SQL that can express the same query. Phoenix's stated goal is "we put the SQL back in NoSQL": you can use standard SQL to operate on HBase data, which also means you can integrate HBase with commonly used persistence frameworks such as Spring Data JPA or MyBatis.
Phoenix's performance is also very good. Its query engine compiles a SQL query into one or more HBase scans and executes them in parallel to produce a standard JDBC result set. By using the HBase API directly, together with coprocessors and custom filters, it achieves millisecond-level performance for small queries and second-level performance for queries over millions of rows. Phoenix also supports secondary indexes, which HBase itself does not. Thanks to these advantages, Phoenix has become the de facto standard SQL layer for HBase.
2. Phoenix Installation
Phoenix can be installed by following the official instructions, which read as follows:
- download and expand our installation tar
- copy the phoenix server jar that is compatible with your HBase installation into the lib directory of every region server
- restart the region servers
- add the phoenix client jar to the classpath of your HBase client
- download and setup SQuirrel as your SQL client so you can issue adhoc SQL against your HBase cluster
2.1 Download and unzip
The official version for Apache HBase and CDH version of the installation package is available, can be downloaded on demand. Official Download: http://phoenix.apache.org/download.html
# Download
wget http://mirror.bit.edu.cn/apache/phoenix/apache-phoenix-4.14.0-cdh5.14.2/bin/apache-phoenix-4.14.0-cdh5.14.2-bin.tar.gz
# Unzip
tar -zxvf apache-phoenix-4.14.0-cdh5.14.2-bin.tar.gz
2.2 Copy the JAR Package
According to the official documentation, the phoenix server JAR needs to be added to the lib directory of every Region Server's installation.
Because the HBase cluster here is a pseudo-distributed one, the JAR only needs to be copied into the lib directory of the local HBase installation. On a real cluster, use the scp command to distribute it to all Region Server machines.
cp /usr/app/apache-phoenix-4.14.0-cdh5.14.2-bin/phoenix-4.14.0-cdh5.14.2-server.jar /usr/app/hbase-1.2.0-cdh5.15.2/lib
2.3 Restart Region Servers
# Stop HBase
stop-hbase.sh
# Start HBase
start-hbase.sh
2.4 Starting Phoenix
In the Phoenix unpacked directory bin
execute the following command in the directory, you need to specify the address Zookeeper:
- If HBase Standalone mode or using a pseudo-cluster model building, built using the default service Zookeeper, port 2181;
- If it is HBase cluster model and the use of external Zookeeper cluster is specified according to their own situation.
# ./sqlline.py hadoop001:2181
2.5 Startup Result
After startup, Phoenix enters an interactive SQL command line, where you can use !table or !tables to list information about all current tables.
3. Basic Usage of Phoenix
3.1 Create a table
CREATE TABLE IF NOT EXISTS us_population (
      state CHAR(2) NOT NULL,
      city VARCHAR NOT NULL,
      population BIGINT
      CONSTRAINT my_pk PRIMARY KEY (state, city));
The new table is mapped to a table on HBase according to specific rules; its details can be viewed through the HBase Web UI:
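As a rough illustration of that mapping (a simplified sketch, not Phoenix's exact byte encoding): the composite primary key (state, city) becomes the HBase row key by concatenating the key columns, while non-key columns such as population are stored as ordinary HBase cell values.

```java
public class RowKeySketch {

    // Simplified sketch: the fixed-width CHAR(2) state is followed directly by
    // the city value, so each (state, city) pair maps to one HBase row key.
    static String rowKey(String state, String city) {
        return state + city;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("NY", "New York")); // NYNew York
    }
}
```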
3.2 Insert Data
Phoenix 中插入数据采用的是 UPSERT
而不是 INSERT
,因为 Phoenix 并没有更新操作,插入相同主键的数据就视为更新,所以 UPSERT
就相当于 UPDATE
+INSERT
UPSERT INTO us_population VALUES('NY','New York',8143197);
UPSERT INTO us_population VALUES('CA','Los Angeles',3844829);
UPSERT INTO us_population VALUES('IL','Chicago',2842518);
UPSERT INTO us_population VALUES('TX','Houston',2016582);
UPSERT INTO us_population VALUES('PA','Philadelphia',1463281);
UPSERT INTO us_population VALUES('AZ','Phoenix',1461575);
UPSERT INTO us_population VALUES('TX','San Antonio',1256509);
UPSERT INTO us_population VALUES('CA','San Diego',1255540);
UPSERT INTO us_population VALUES('TX','Dallas',1213825);
UPSERT INTO us_population VALUES('CA','San Jose',912332);
3.3 Update Data
-- Inserting a row with the same primary key is treated as an update
UPSERT INTO us_population VALUES('NY','New York',999999);
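The UPSERT-as-update semantics can be pictured as a map keyed by the primary key: a minimal sketch in plain Java (not Phoenix code), where put inserts when the key is absent and overwrites when it already exists.

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertSemanticsDemo {
    public static void main(String[] args) {
        // Model a Phoenix table as a map keyed by the primary key (state, city):
        // UPSERT behaves like Map.put -- insert when the key is absent,
        // overwrite when it already exists.
        Map<String, Long> table = new HashMap<>();
        table.put("NY|New York", 8143197L);  // first UPSERT: plain insert
        table.put("NY|New York", 999999L);   // same primary key: acts as an update
        System.out.println(table.size() + " " + table.get("NY|New York"));
    }
}
```

After both puts the table still holds a single row for (NY, New York), carrying the latest value.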
3.4 Delete Data
DELETE FROM us_population WHERE city='Dallas';
3.5 Query Data
SELECT state as "State", count(city) as "Cities", sum(population) as "Population"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;
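As a cross-check of what this query computes, the same GROUP BY / ORDER BY can be reproduced in plain Java over the sample rows as they stand after the update in 3.3 and the delete in 3.4; this is an illustrative sketch, not Phoenix code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByDemo {

    static class Row {
        final String state;
        final String city;
        final long population;
        Row(String state, String city, long population) {
            this.state = state;
            this.city = city;
            this.population = population;
        }
    }

    public static void main(String[] args) {
        // Sample rows after the UPSERT in 3.3 and the DELETE in 3.4
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("NY", "New York", 999999));
        rows.add(new Row("CA", "Los Angeles", 3844829));
        rows.add(new Row("IL", "Chicago", 2842518));
        rows.add(new Row("TX", "Houston", 2016582));
        rows.add(new Row("PA", "Philadelphia", 1463281));
        rows.add(new Row("AZ", "Phoenix", 1461575));
        rows.add(new Row("TX", "San Antonio", 1256509));
        rows.add(new Row("CA", "San Diego", 1255540));
        rows.add(new Row("CA", "San Jose", 912332));

        // GROUP BY state: count cities and sum population per state
        Map<String, long[]> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            long[] acc = groups.computeIfAbsent(r.state, k -> new long[2]);
            acc[0]++;               // count(city)
            acc[1] += r.population; // sum(population)
        }

        // ORDER BY sum(population) DESC
        groups.entrySet().stream()
              .sorted((a, b) -> Long.compare(b.getValue()[1], a.getValue()[1]))
              .forEach(e -> System.out.println(
                      e.getKey() + " " + e.getValue()[0] + " " + e.getValue()[1]));
    }
}
```

California comes out on top (3 cities, 6,012,701 total), and New York drops to last because of the earlier update.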
3.6 Quit Command
!quit
3.7 Further Reading
As the operations above show, Phoenix supports most standard SQL syntax. For details on the supported grammar, data types, functions, sequences, and more, refer to the official documentation, which covers them thoroughly:
- Grammar: https://phoenix.apache.org/language/index.html
- Functions: http://phoenix.apache.org/language/functions.html
- Datatypes: http://phoenix.apache.org/language/datatypes.html
- Sequences: http://phoenix.apache.org/sequences.html
- Joins: http://phoenix.apache.org/joins.html
4. Phoenix Java API
Since Phoenix follows the JDBC specification and provides a corresponding database driver, PhoenixDriver, working with it from Java is just like working with any other relational database. A basic usage example follows.
4.1 Add the Phoenix Core JAR
For a Maven project, find the corresponding version in the Maven central repository and add the dependency:
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.14.0-cdh5.14.2</version>
</dependency>
For a non-Maven project, find the corresponding JAR in the Phoenix installation directory and add it to the classpath manually:
4.2 A Simple Java API Example
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixJavaApi {

    public static void main(String[] args) throws Exception {
        // Load the database driver
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        /*
         * Specify the database address in the format jdbc:phoenix:<ZooKeeper address>.
         * If HBase runs in standalone mode or as a pseudo-distributed cluster,
         * it uses the built-in ZooKeeper by default, on port 2181.
         */
        Connection connection = DriverManager.getConnection("jdbc:phoenix:192.168.200.226:2181");

        PreparedStatement statement = connection.prepareStatement("SELECT * FROM us_population");
        ResultSet resultSet = statement.executeQuery();

        while (resultSet.next()) {
            System.out.println(resultSet.getString("city") + " "
                    + resultSet.getInt("population"));
        }

        statement.close();
        connection.close();
    }
}
The result is as follows:
In real-world development, a third-party framework such as MyBatis, Hibernate, or Spring Data is usually used to operate the database. The integration of Phoenix with these frameworks is covered in the next article: Spring/Spring Boot + Mybatis + Phoenix
References
More articles in this big data series can be found in the GitHub open-source project: Big Data Getting Started Guide (大数据入门指南)