HBase integrates Phoenix
Introduction to Phoenix
Phoenix is an open source SQL skin for HBase. You can use the standard JDBC API instead of the HBase client API to create tables, insert data, and query HBase data.
Advantages of using Phoenix
- Adding Phoenix as a middle layer between the client and HBase does not slow queries down
- Phoenix applies many optimizations to user-submitted SQL
Phoenix installation and deployment
1. Download and unzip
Official website: Overview | Apache Phoenix
Download page: Phoenix Downloads | Apache Phoenix
After downloading the tar package, upload it to the server and decompress it with tar -zxvf.
2. Server package configuration
Enter the Phoenix installation directory: cd /opt/module/phoenix
Find the server package:
Copy it to the lib folder under the HBase installation path and synchronize it to all other nodes.
3. Configure environment variables
vim /etc/profile.d/my_env.sh
(Customized environment variable file)
Add the following:
#phoenix
export PHOENIX_HOME=/opt/module/phoenix
export PHOENIX_CLASSPATH=$PHOENIX_HOME
export PATH=$PATH:$PHOENIX_HOME/bin
4. Start Phoenix
First, restart HBase;
Then start Phoenix:
/opt/module/phoenix/bin/sqlline.py hadoop102,hadoop103,hadoop104:2181
When the following sqlline prompt appears, the startup has succeeded:
If the following error occurs:
警告: Failed to load history
java.lang.IllegalArgumentException: Bad history file syntax!
it means Phoenix has been used on this machine before and left behind a corrupt history file. Simply delete the .sqlline folder under /home/<username>.
In my case the path is /home/why/.sqlline
Phoenix Shell Operations
The full SQL grammar is documented on the official website: Grammar | Apache Phoenix
1. Table operations
Show all tables:
!table or !tables
Create table
Specify a single column as RowKey:
CREATE TABLE IF NOT EXISTS student(
id VARCHAR primary key,
name VARCHAR,
age BIGINT,
addr VARCHAR);
In Phoenix, table and column names are automatically converted to uppercase. To preserve lowercase, wrap the name in double quotes, e.g. "us_population".
Specify a union of multiple columns as RowKey:
CREATE TABLE IF NOT EXISTS student1 (
id VARCHAR NOT NULL,
name VARCHAR NOT NULL,
age BIGINT,
addr VARCHAR
CONSTRAINT my_pk PRIMARY KEY (id, name));
Note: Creating a table in Phoenix creates a corresponding table in HBase. To reduce the disk space taken by data, Phoenix encodes column names in HBase by default. For the exact rules, see the official documentation:
Storage Formats | Apache Phoenix
If you do not want to encode the column names, you can add COLUMN_ENCODED_BYTES = 0; at the end of the table creation statement
Insert data
upsert into student values('1001','zhangsan', 10, 'beijing');
Query data
select * from student;
select * from student where id='1001';
Delete data
delete from student where id='1001';
Delete table
drop table student;
Exit command line
!quit
2. Table mapping
By default, tables that already exist in HBase are not visible through Phoenix. To operate on an existing HBase table from Phoenix, you must map it first. There are two mapping methods: view mapping and table mapping.
Create an HBase table
Create the table test in the HBase shell: create 'test','info1','info2'
View mapping
Create the view mapping of test in phoenix:
create view "test"(id varchar primary key,"info1"."name" varchar, "info2"."address" varchar);
Note: Views created by Phoenix are read-only, so they can only be used for queries; data cannot be modified through a view.
Insert two pieces of data into the table:
put 'test','10001','info1:name','why'
put 'test','10001','info2:address','10086'
Query in phoenix:
select * from "test"
Note: test here must be enclosed in double quotes; otherwise Phoenix converts the name to uppercase TEST and the view is not found.
The query results are as follows:
How to delete a view:
drop view "test";
Deleting the view has no effect on the table in HBase; after the view is dropped, the data can still be queried from the HBase shell:
Table mapping
create table "test"(id varchar primary key,"info1"."name" varchar, "info2"."address" varchar) column_encoded_bytes=0;
When using table mapping, column-name encoding cannot be used, so column_encoded_bytes must be set to 0.
Creating a table in Phoenix that maps an existing HBase table allows you to modify and delete the data that already exists in HBase. Moreover, if you drop the table in Phoenix, the mapped table in HBase is deleted as well.
First query the data in the table:
Then delete the table mapping:
drop table "test";
If you query hbase again, you will find that the original table has also been deleted:
3. Number type description
Numbers in HBase are stored in two's-complement form, while Phoenix stores them as two's complement with the sign bit inverted. Therefore, when you create a Phoenix table that maps an existing HBase table containing numeric fields, the values are parsed incorrectly.
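The mismatch can be sketched in plain Java. This is a stdlib-only illustration: the real codecs live in HBase's Bytes class and Phoenix's type system, and the helper names here (hbaseEncode, phoenixEncode, toLong) are made up for the demo.

```java
import java.util.Arrays;

public class NumberEncodingDemo {
    // HBase: 8-byte big-endian two's complement (what Bytes.toBytes(long) produces)
    static byte[] hbaseEncode(long v) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) { b[i] = (byte) v; v >>>= 8; }
        return b;
    }

    // Phoenix BIGINT: the same bytes, but with the sign bit inverted so that
    // unsigned byte-wise comparison matches numeric order
    static byte[] phoenixEncode(long v) {
        byte[] b = hbaseEncode(v);
        b[0] ^= (byte) 0x80; // invert the sign bit
        return b;
    }

    // Decode helper: big-endian bytes back to long
    static long toLong(byte[] b) {
        long v = 0;
        for (byte x : b) v = (v << 8) | (x & 0xFF);
        return v;
    }

    public static void main(String[] args) {
        byte[] hbaseBytes = hbaseEncode(1000L);
        // Phoenix's signed decoder flips the sign bit back before parsing,
        // so feeding it raw HBase bytes yields a wrong value:
        byte[] misread = Arrays.copyOf(hbaseBytes, 8);
        misread[0] ^= (byte) 0x80;           // what the Phoenix-side decode does
        System.out.println(toLong(misread)); // prints -9223372036854774808, not 1000
    }
}
```

This is exactly the parsing error seen in the test below: the sign bit of the stored value gets flipped once too often.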
Test
Create a table in HBase, insert data, and scan:
create 'test_number','info'
put 'test_number','1001','info:number',Bytes.toBytes(1000)
scan 'test_number',{COLUMNS => 'info:number:toLong'}
The result is as follows:
toLong converts the stored bytes into long-typed data.
Without it, the scanned value is displayed as raw bytes:
Create a view mapping in Phoenix:
create view "test_number"(id varchar primary key,"info"."number" bigint);
After querying, it was found that there was a problem with the results:
Solution
1. Use unsigned types:
Phoenix provides unsigned_int, unsigned_long, and other unsigned types, whose encoding and decoding of numbers match HBase's. If negative numbers do not need to be supported, unsigned types are the most appropriate choice when creating tables in Phoenix.
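Why the unsigned encoding can match HBase's byte-for-byte comes down to sort order. The stdlib-only sketch below (helper names be and cmp are made up for the demo) shows that for non-negative values the raw two's-complement bytes already sort numerically, while negative values break the order, which is why Phoenix's signed types flip the sign bit:

```java
public class UnsignedOrderDemo {
    // Big-endian two's complement, as HBase's Bytes.toBytes(long) writes it
    static byte[] be(long v) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) { b[i] = (byte) v; v >>>= 8; }
        return b;
    }

    // Unsigned lexicographic comparison, the way HBase compares stored bytes
    static int cmp(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Non-negative values: byte order matches numeric order,
        // so an unsigned type can reuse HBase's encoding unchanged.
        System.out.println(cmp(be(5), be(1000)) < 0);  // true
        // With negatives, raw bytes sort wrongly: -1 compares above 1.
        System.out.println(cmp(be(-1), be(1)) > 0);    // true
    }
}
```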
Re-create the view map and query:
create view "test_number"(id varchar primary key,"info"."number" unsigned_long);
select * from "test_number";
The result is as follows:
2. Custom function:
If negative numbers must be supported, you can use a Phoenix custom function to invert the highest (sign) bit of the numeric value.
Phoenix JDBC operations
Add dependencies:
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-client-hbase-2.4</artifactId>
<version>5.1.2</version>
</dependency>
Write standard jdbc code:
public static void main(String[] args) throws SQLException {
    // Connection URL: the ZooKeeper quorum of the HBase cluster
    String url = "jdbc:phoenix:hadoop102,hadoop103,hadoop104:2181";
    // Connection properties
    Properties properties = new Properties();
    // Open the connection
    Connection connection = DriverManager.getConnection(url, properties);
    // Prepare the SQL statement
    PreparedStatement preparedStatement = connection.prepareStatement("select * from student");
    // Execute the query
    ResultSet resultSet = preparedStatement.executeQuery();
    // Print the results
    while (resultSet.next()) {
        System.out.println(resultSet.getString(1) + ":" + resultSet.getString(2) + ":" + resultSet.getString(3));
    }
    connection.close();
    // Phoenix holds an internal HBase connection, so the close happens with a delay
    System.out.println("hello");
}
Phoenix secondary index
Add the following configuration to hbase-site.xml on the HRegionServer nodes of HBase:
<!-- Phoenix regionserver configuration -->
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
Global index
Global index is the default index type. Creating a global index creates a new table in HBase; that is, the index data and the table data are stored in separate tables, so global indexes suit read-heavy, write-light business scenarios.
Writes incur significant overhead because the index table must be updated as well, and since index tables are distributed across different data nodes, cross-node data transfer is costly. On reads, Phoenix chooses the index table to reduce query time.
Syntax:
- Create index:
CREATE INDEX my_index ON my_table (my_col);
- Delete index:
DROP INDEX my_index ON my_table;
Example: Add an index to the age column
create index my_index on student(age);
Check whether the secondary index takes effect
Inspect the plan with explain:
explain select age from student where age = 10;
With the secondary index in place, the plan shows a range scan;
But if the query selects fields that are not part of the index, it falls back to a full scan:
explain select id,name,addr from student where age = 10;
Covered index
Create a global index that also carries other columns (essentially still a global index).
Syntax: CREATE INDEX my_index ON my_table (v1) INCLUDE (v2);
Example: Add an index to the age column and include the addr column
create index my_index on student(age) include (addr);
View the execution plan:
Local index
Local Index is suitable for scenarios where write operations are frequent.
Index data and table data are stored in the same table (and in the same region), which avoids the extra overhead of writing index rows to index tables on other servers during writes.
Syntax: CREATE LOCAL INDEX my_index ON my_table(my_column);
Create a local index:
CREATE LOCAL INDEX my_index ON student(age,addr);
View the execution plan:
explain select id,name,addr from student where age = 10;