HBase study notes (3) - HBase integrates Phoenix

Table of contents

Phoenix Shell Operations

Phoenix JDBC operations

Phoenix secondary index

HBase integrates Phoenix

Introduction to Phoenix

Phoenix is an open source SQL skin for HBase. You can use the standard JDBC API instead of the HBase client API to create tables, insert data, and query HBase data.

Advantages of using Phoenix

  1. Putting Phoenix as a middle layer between the client and HBase does not slow queries down
  2. Phoenix applies many optimizations to user-submitted SQL

Phoenix installation and deployment

1. Download and unzip

Official website address: Overview | Apache Phoenix

Download address: Phoenix Downloads | Apache Phoenix

After downloading the tar package, upload it to the server and decompress it with tar -zxvf.

2. Server package configuration

Enter the path where Phoenix is installed: cd /opt/module/phoenix

Find the server package (the phoenix-server jar) in the unpacked directory:

Copy it to the lib folder under the HBase installation path and synchronize it to all other nodes;
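A minimal sketch of this step, assuming the jar is named phoenix-server-hbase-2.4-5.1.2.jar (match your downloaded version), HBase lives in /opt/module/hbase, and an xsync-style distribution script is available (scp to each node works too):

cp /opt/module/phoenix/phoenix-server-hbase-2.4-5.1.2.jar /opt/module/hbase/lib/
xsync /opt/module/hbase/lib/phoenix-server-hbase-2.4-5.1.2.jar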

3. Configure environment variables

vim /etc/profile.d/my_env.sh (a custom environment variable file)

Add the following:

#phoenix
export PHOENIX_HOME=/opt/module/phoenix
export PHOENIX_CLASSPATH=$PHOENIX_HOME
export PATH=$PATH:$PHOENIX_HOME/bin
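
Then reload the profile so the variables take effect in the current shell:

source /etc/profile.d/my_env.sh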

4. Start phoenix

First, restart HBase;

Then start Phoenix:

/opt/module/phoenix/bin/sqlline.py hadoop102,hadoop103,hadoop104:2181

The sqlline prompt appears, indicating that the startup was successful:

If the following error occurs:

WARNING: Failed to load history
java.lang.IllegalArgumentException: Bad history file syntax! 

This means Phoenix has been used before and left a history file behind. Just delete the .sqlline folder under /home/<username>;

My path is:/home/why/.sqlline
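
So the fix is simply:

rm -rf /home/why/.sqlline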

Phoenix Shell Operations

The full syntax is documented on the official website: Grammar | Apache Phoenix

1. Table operations

Show all tables:

!table or !tables

Create table

Specify a single column as RowKey:

CREATE TABLE IF NOT EXISTS student( 
  id VARCHAR primary key, 
  name VARCHAR, 
  age BIGINT, 
  addr VARCHAR);

In Phoenix, identifiers such as table names are automatically converted to uppercase. To keep a name lowercase, wrap it in double quotes, e.g. "us_population".
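
For example, a table created with a quoted lowercase name must also be referenced with quotes afterwards (a small illustration, using a throwaway table name):

CREATE TABLE "us_population"(id VARCHAR PRIMARY KEY);
SELECT * FROM "us_population";
-- SELECT * FROM us_population would look for a table named US_POPULATION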

Specify a union of multiple columns as RowKey:

CREATE TABLE IF NOT EXISTS student1 (
  id VARCHAR NOT NULL,
  name VARCHAR NOT NULL,
  age BIGINT,
  addr VARCHAR
  CONSTRAINT my_pk PRIMARY KEY (id, name));

Note: Creating a table in Phoenix will create a corresponding table in HBase. In order to reduce the disk space occupied by data, Phoenix encodes column names in HBase by default. For specific rules, please refer to the official website link:

Storage Formats | Apache Phoenix

If you do not want to encode the column names, you can add COLUMN_ENCODED_BYTES = 0; at the end of the table creation statement
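
For example, the same student1 table with column name encoding disabled (the option goes after the closing parenthesis):

CREATE TABLE IF NOT EXISTS student1 (
  id VARCHAR NOT NULL,
  name VARCHAR NOT NULL,
  age BIGINT,
  addr VARCHAR
  CONSTRAINT my_pk PRIMARY KEY (id, name))
COLUMN_ENCODED_BYTES = 0;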

Insert data

upsert into student values('1001','zhangsan', 10, 'beijing');

Query data

select * from student;
select * from student where id='1001';

Delete data

delete from student where id='1001';

Delete table

drop table student;

Exit command line

!quit
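
Note that UPSERT VALUES is both insert and update: running it again with the same row key overwrites the existing row, e.g.:

upsert into student values('1001','lisi', 11, 'shanghai');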

2. Table mapping

By default, tables that already exist in HBase are not visible through Phoenix. If you want to operate on an existing HBase table from Phoenix, you must map it in Phoenix first. There are two mapping methods: view mapping and table mapping.

Create hbase table

Create table test in the hbase shell:

create 'test','info1','info2'

view mapping

Create the view mapping of test in phoenix:

create view "test"(id varchar primary key,"info1"."name" varchar, "info2"."address" varchar);

Note: Views created by Phoenix are read-only, so they can only be used for queries; data cannot be modified through a view.

Insert two cells into the table from the hbase shell:

put 'test','10001','info1:name','why'
put 'test','10001','info2:address','10086'

Query in phoenix:

select * from "test"

Note: The test here must be enclosed in double quotes, otherwise it will be recognized as a table instead of a view;

The query results are as follows:


How to delete a view:

drop view "test";

Deleting the view has no effect on the table in HBase; after the view is dropped, the data can still be queried in the hbase shell:
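
For example, back in the hbase shell:

scan 'test'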

table mapping
create table"test"(id varchar primary key,"info1"."name" varchar, "info2"."address" varchar) column_encoded_bytes=0;

When mapping a table, column name encoding cannot be used, so column_encoded_bytes must be set to 0.

Creating a table in Phoenix that maps an existing HBase table lets you modify and delete the data that already exists in HBase. Moreover, if you drop the table in Phoenix, the mapped table in HBase is deleted as well.

First query the data in the table:

Then delete the table mapping:

drop table "test";

If you query hbase again, you will find that the original table has also been deleted:

3. Number type description

HBase stores numbers as plain two's complement, while Phoenix stores numbers as two's complement with the sign bit inverted. Therefore, when a table created in Phoenix maps an existing HBase table that contains numeric fields, those values are parsed incorrectly.

test

Create a table in hbase, insert data and scan:

create 'test_number','info'
# if the shell does not resolve Bytes, use the full name org.apache.hadoop.hbase.util.Bytes.toBytes(1000)
put 'test_number','1001','info:number',Bytes.toBytes(1000)
scan 'test_number',{COLUMNS => 'info:number:toLong'}

The result is as follows:

The :toLong suffix decodes the stored bytes into a long for display.

Otherwise, the scanned value is shown as raw bytes:

Create the view mapping in phoenix:

create view "test_number"(id varchar primary key,"info"."number" bigint);

Querying it shows that the results are wrong:

Solution

1. Use unsigned types:

Phoenix provides unsigned types such as unsigned_int and unsigned_long, which encode and decode numbers the same way HBase does. If negative numbers are not a concern, an unsigned type is the most appropriate choice when building tables in Phoenix.

Drop the old view, then re-create the mapping with the unsigned type and query:

drop view "test_number";
create view "test_number"(id varchar primary key,"info"."number" unsigned_long);
select * from "test_number";

The result is as follows:

2. Custom function:

If negative numbers must be supported, you can use a Phoenix custom function (UDF) that inverts the highest bit, i.e. the sign bit, of the numeric type.
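
As a rough sketch only: a UDF is registered with CREATE FUNCTION, assuming you have implemented and packaged a hypothetical com.example.SignFlipFunction (the class name and jar path are placeholders, and phoenix.functions.allowUserDefinedFunctions must be enabled on the cluster):

CREATE FUNCTION sign_flip(bigint) RETURNS bigint
  AS 'com.example.SignFlipFunction'
  USING JAR 'hdfs://hadoop102:8020/phoenix/udf/sign-flip.jar';

SELECT id, sign_flip("number") FROM "test_number";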

Phoenix JDBC operations

Add dependencies:

<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-client-hbase-2.4</artifactId>
  <version>5.1.2</version>
</dependency>

Write standard jdbc code:

public static void main(String[] args) throws SQLException {
    // connection URL: the ZooKeeper quorum of the HBase cluster
    String url = "jdbc:phoenix:hadoop102,hadoop103,hadoop104:2181";
    // connection properties
    Properties properties = new Properties();
    // open the connection
    Connection connection = DriverManager.getConnection(url, properties);
    // compile the SQL statement
    PreparedStatement preparedStatement = connection.prepareStatement("select * from student");
    // execute the query
    ResultSet resultSet = preparedStatement.executeQuery();
    // print the results
    while (resultSet.next())
    {
        System.out.println(resultSet.getString(1) + ":" + resultSet.getString(2) + ":" + resultSet.getString(3));
    }

    connection.close();
    // Phoenix needs an internal HBase connection, so closing is delayed
    System.out.println("hello");
}
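
One thing the snippet above does not show: the Phoenix JDBC driver does not auto-commit by default, so writes need an explicit commit. A minimal sketch, reusing the same url:

try (Connection conn = DriverManager.getConnection(url)) {
    PreparedStatement ps = conn.prepareStatement("upsert into student values(?,?,?,?)");
    ps.setString(1, "1002");
    ps.setString(2, "lisi");
    ps.setLong(3, 11L);
    ps.setString(4, "shanghai");
    ps.executeUpdate();
    conn.commit(); // without this the upsert never reaches HBase
}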

Phoenix secondary index

Add the following configuration to hbase-site.xml on every HBase HRegionServer node:

<!-- Phoenix RegionServer configuration -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
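
After editing, distribute the file and restart HBase so the new WAL codec takes effect (again assuming an xsync-style script):

xsync /opt/module/hbase/conf/hbase-site.xml
stop-hbase.sh
start-hbase.sh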

global index

Global Index is the default index format. Creating a global index creates a new table in HBase; that is, the index data and the table data are stored in separate tables, so global indexes are suited to read-heavy, write-light business scenarios.

Writes carry significant overhead, because the index table must be updated as well, and the index table is distributed across different data nodes; the cross-node data transfer costs performance. When reading, Phoenix chooses the index table where it can, to reduce query time.

Syntax:

  • Create index: CREATE INDEX my_index ON my_table (my_col);
  • Delete index: DROP INDEX my_index ON my_table;

Example: Add an index to the age column

create index my_index on student(age);

Check whether the secondary index takes effect

View via explain syntax:

explain select age from student where age = 10;

After the secondary index is added, the plan becomes a RANGE SCAN over the index table;

But if the query selects fields that the index does not cover, the plan falls back to a FULL SCAN:

explain select id,name,addr from student where age = 10;
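
If you still want the index used in that case, Phoenix supports an index hint; the non-indexed columns are then looked up from the data table, which has its own cost:

explain select /*+ INDEX(student my_index) */ id,name,addr from student where age = 10;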

covered index

Create a global index that carries additional fields (essentially still a global index).

Syntax: CREATE INDEX my_index ON my_table (v1) INCLUDE (v2);

Example: Add an index to the age column and include the addr column

create index my_index on student(age) include (addr);

View the execution plan:
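
For instance, a query restricted to the indexed and included columns should now show a range scan over the index (selecting name as well would still fall back to a full scan, since name is not covered):

explain select id,addr from student where age = 10;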

local index

Local Index is suitable for scenarios where write operations are frequent.

Index data and data table data are stored in the same table (and in the same Region), which avoids the additional overhead of writing indexes to index tables on different servers during write operations.

Syntax: CREATE LOCAL INDEX my_index ON my_table(my_column);

Create a local index:

CREATE LOCAL INDEX my_index ON student(age,addr);

View the execution plan:

explain select id,name,addr from student where age = 10;

Origin blog.csdn.net/qq_51235856/article/details/134399501