Phoenix in Practice | Phoenix ODPSBulkLoadTool Use Case

1. Create ODPS table

create table hbaseport.odps_test (
key string,
value1 string,
value2 bigint);


2. Configure the MR cluster to access the cloud HBase environment

  1. Open the HDFS port of the cloud HBase cluster

  2. Configure hdfs-site.xml so that the MR cluster can access the HDFS of the HA-enabled cloud HBase cluster

  3. Configure hbase-site.xml so that the MR cluster can access cloud HBase

Create a temporary conf directory on the MR cluster and pass it with the --config option when running the hadoop or yarn command, so that its contents are added to the classpath. The directory contains the following files:

ls conf/
core-site.xml  hbase-site.xml  hdfs-site.xml  
mapred-site.xml  yarn-site.xml
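
Before submitting a job, it can help to confirm that the conf directory really contains the five files listed above. The following is a minimal sketch; check_conf_dir is a hypothetical helper name, not part of Hadoop or Phoenix, and it only checks that the files exist, not that their contents are correct.

```shell
# Sketch: verify that a conf directory holds the files listed above.
# check_conf_dir is a hypothetical helper, not a Hadoop or Phoenix command.
check_conf_dir() {
  local dir="$1"
  local missing=0
  local f
  for f in core-site.xml hbase-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      # Report every missing file instead of stopping at the first one.
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `check_conf_dir conf` before the yarn command returns a non-zero status and names each missing file if the directory is incomplete.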


3. Create a Phoenix test table

DROP TABLE IF EXISTS TABLE1;

CREATE TABLE TABLE1 (
ID VARCHAR NOT NULL PRIMARY KEY,
V1 VARCHAR,
V2 BIGINT)
SALT_BUCKETS = 10,UPDATE_CACHE_FREQUENCY = 120000;

CREATE INDEX V1_IDX on TABLE1(V1) include(v2);
CREATE INDEX V2_IDX on TABLE1(V2) include(v1);

4. Import test data into the ODPS table

Import about 3 million rows of test data into the odps_test table from a CSV file.
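
The CSV file itself is not shown here. As a rough sketch, rows matching the odps_test schema (key, value1, value2) could be generated as follows; gen_csv is a hypothetical helper and the key and value patterns are made up purely for illustration.

```shell
# Sketch: generate test rows in the column order key,value1,value2.
# gen_csv and the value patterns below are assumptions for illustration only.
gen_csv() {
  local rows="$1"
  local out="$2"
  awk -v n="$rows" 'BEGIN {
    for (i = 1; i <= n; i++) {
      # key: zero-padded string, value1: string, value2: bigint
      printf "key%07d,value%d,%d\n", i, i % 1000, i
    }
  }' > "$out"
}
```

Calling `gen_csv 3124856 odps_test.csv` would produce a file with the same number of rows as the counts shown in the verification step. The file could then be uploaded with the MaxCompute client, for example `odpscmd -e "tunnel upload odps_test.csv hbaseport.odps_test;"`, assuming odpscmd is installed and configured for the project.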


5. Execute Bulkload command

Use the client jar provided by Phoenix to run the Bulkload command:

yarn --config conf \
jar ali-phoenix-4.12.0-AliHBase-1.1-0.4-Final/ali-phoenix-4.12.0-AliHBase-1.1-0.4-Final-client.jar \
org.apache.phoenix.mapreduce.ODPSBulkLoadTool \
--table "TABLE1" \
--access_id "xxx" \
--access_key "xxx" \
--odps_url "http://odps-ext.aliyun-inc.com/api" \
--odps_tunnel_url "http://dt-ext.odps.aliyun-inc.com" \
--odps_project "hbaseport" \
--odps_table "odps_test" \
--odps_partition_number 15 \
--zookeeper "zk1,zk2,zk3" \
--output "hdfs://emr-cluster/tmp/tmp_data"


6. Verification

Verify the row counts of the Phoenix table and its two indexes:

0: jdbc:phoenix:localhost> select count(*) from TABLE1;
+-----------+
| COUNT(1)  |
+-----------+
| 3124856   |
+-----------+
1 row selected (4.618 seconds)
0: jdbc:phoenix:localhost> select count(*) from V1_IDX;
+-----------+
| COUNT(1)  |
+-----------+
| 3124856   |
+-----------+
1 row selected (3.149 seconds)
0: jdbc:phoenix:localhost> select count(*) from V2_IDX;
+-----------+
| COUNT(1)  |
+-----------+
| 3124856   |
+-----------+
1 row selected (4.386 seconds)
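
The check here is that the data table and both index tables report the same row count. A minimal sketch of that comparison, assuming the counts have already been read from the query output; counts_match is a hypothetical helper name.

```shell
# Sketch: succeed only if every count equals the expected row count.
# counts_match is a hypothetical helper; the counts are passed in by hand.
counts_match() {
  local expected="$1"
  local c
  shift
  for c in "$@"; do
    if [ "$c" != "$expected" ]; then
      echo "mismatch: got $c, expected $expected" >&2
      return 1
    fi
  done
  return 0
}
```

With the results above, `counts_match 3124856 3124856 3124856 3124856` succeeds, where the first argument is the expected row count and the rest are the counts from TABLE1, V1_IDX and V2_IDX.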



Origin blog.51cto.com/15060465/2676999