Phoenix on HBase Application Basics

Create the table directly with Phoenix (recommended). Note that a table created this way cannot be loaded with bulkload, and subsequent backup and restore are less convenient.

CREATE TABLE NFT.T_COLLECTION_TEST (
    a_key VARCHAR PRIMARY KEY,
    a_col VARCHAR
) SALT_BUCKETS = 20;

SALT_BUCKETS is the pre-partitioning (salting) option, used to avoid region hotspotting; the valid range is 1-256.

Tables that use salting should not be written to through the HBase API; use the Phoenix API instead. SALT_BUCKETS mainly improves concurrent read and write performance.
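As a minimal sketch of going through the Phoenix API instead (using the NFT.T_COLLECTION_TEST table created above; the key and value are illustrative):

UPSERT INTO NFT.T_COLLECTION_TEST (a_key, a_col) VALUES ('key001', 'value001');
SELECT a_key, a_col FROM NFT.T_COLLECTION_TEST WHERE a_key = 'key001';

Phoenix adds the salt byte to the rowkey transparently, so queries do not need to account for it.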

The HBase-first method (not recommended) is suited to data that is written and modified through the HBase API; it is also the way to handle existing (stock) HBase data.

First step: create the table in HBase

create 'table name', {NAME => 'column family', VERSIONS => maximum number of versions (generally 1), [MIN_VERSIONS => minimum number of versions], [TTL => version time-to-live in seconds], BLOOMFILTER => 'ROW' or 'ROWCOL'}

hbase(main):>create 't1',{NAME =>'f1', COMPRESSION => 'SNAPPY', VERSIONS => 1,  BLOOMFILTER => 'ROW'},{SPLITS => ['00e1fb4fb0674d24e5a6579f8fe32060','1d148c39d210c886dc02e8468f89a539']}

// Compression formats such as LZO and Snappy are also supported. SPLITS are the split points for pre-partitioning; the number of partitions is set according to the data volume (rows/columns), and the split points should be chosen based on the rowkey distribution.
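Once the table exists, data can be written and read with the HBase shell or API; a minimal sketch (row key and values are illustrative):

hbase(main):> put 't1', 'row001', 'f1:c1', 'v1'
hbase(main):> get 't1', 'row001'
hbase(main):> scan 't1', {LIMIT => 5}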

Step 2: create a Phoenix view, which can only be queried, not modified

CREATE view "NFT:T_COLLECTION_TEST"(
"ROW_KEY" varchar primary key,
"info"."chain_id" varchar,
"info"."address" varchar,
"info"."created_at" varchar
) COLUMN_ENCODED_BYTES = 0;   -- map the existing HBase data
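A query against the view might then look like this (a sketch; the row key value is illustrative). Because the column names are lowercase, they must stay double-quoted in Phoenix:

SELECT "chain_id", "address", "created_at"
FROM "NFT:T_COLLECTION_TEST"
WHERE "ROW_KEY" = 'some_row_key';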


The other option for step 2 is a Phoenix mapping table (dropping the table also deletes the underlying HBase data, so choose carefully):

CREATE table "NFT:T_COLLECTION_TEST"(
"ROW_KEY" varchar primary key,
"info"."chain_id" varchar,
"info"."address" varchar,
"info"."created_at" varchar
) COLUMN_ENCODED_BYTES = 0;   -- map the existing HBase data

Tip: view the partitions of a table: scan 'hbase:meta', {FILTER => "PrefixFilter('table_name')"}

Indexes (it is recommended to build indexes before writing data):

The trickier topic is the choice between local and global indexes.

A simple rule of thumb: prefer a local index when the data volume is small and updates are frequent; prefer a global index when the data volume is large and the update frequency is low.

The difference between the two, put bluntly: a global index is a separate table, suited to read-heavy, write-light scenarios, because every write also goes into that separate index table; a local index lives in a column family of the data table, suited to write-heavy, read-light scenarios.
1. Index data

A global index stores its index data in a separate table, which keeps the original data intact.
A local index writes its index data into the original table, which is more intrusive: the original table's data volume becomes original data + index data, so the table grows larger.
2. Performance

A global index writes an extra copy of the data, so write pressure is slightly higher, but reads are very fast; the index table also needs partitioning, and for tables mapped from HBase the number of partitions must be specified manually.
A local index writes only one copy of index data, which saves a lot of space, and writes are very fast; reads, however, go through an extra lookup of the data row by rowkey, so they are not as fast as reading the data column family directly.
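A quick sketch of the two kinds of index, reusing the NFT.T_COLLECTION_TEST table from the beginning of this article (index names are illustrative):

CREATE INDEX idx_collection_test_global ON NFT.T_COLLECTION_TEST (a_col);        -- global index: a separate index table
CREATE LOCAL INDEX idx_collection_test_local ON NFT.T_COLLECTION_TEST (a_col);   -- local index: stored with the data table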

Create an index online:

CREATE INDEX IDX_SLUG_RELATION_DISTINCT
ON NFT.T_COLLECTION_SLUG_RELATION_INFO("info"."CHAIN_ID","info"."COLLECTION_ADDRESS","info"."SLUG","info"."EXTERNAL_SLUG")
INCLUDE("info"."CHAIN_ID","info"."COLLECTION_ADDRESS");  -- optional

Specify the number of partitions for the index (for a table created through Phoenix, if the index does not specify its own pre-partition count, it defaults to the same number of partitions as the table):

CREATE INDEX idx_name ON TableName(col1, col2) SALT_BUCKETS = 20;

Create indexes offline and asynchronously:

1) CREATE INDEX IDX_T_COLLECTION_SLUG_RELATION_INFO_TEST ON NFT.T_COLLECTION_SLUG_RELATION_INFO_TEST(col1, col2) ASYNC
2) MR task: hbase org.apache.phoenix.mapreduce.index.IndexTool --schema NFT --data-table T_COLLECTION_SLUG_RELATION_INFO_TEST --index-table IDX_T_COLLECTION_SLUG_RELATION_INFO_TEST --output-path <HDFS path where the files written by the MapReduce task are stored>

The columns used in WHERE conditions should go into the ON clause of the index; the columns returned by the query can be listed in INCLUDE (a covered index).
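To check whether a query actually hits an index, EXPLAIN can be used; a sketch against the index above (the filter values are illustrative):

EXPLAIN SELECT "info"."SLUG"
FROM NFT.T_COLLECTION_SLUG_RELATION_INFO
WHERE "info"."CHAIN_ID" = '1' AND "info"."COLLECTION_ADDRESS" = '0xabc';

If the plan shows a scan over IDX_SLUG_RELATION_DISTINCT rather than the data table, the index is being used.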
 

View index status

SELECT
  TABLE_NAME               -- table name (an index table is also a table)
  ,DATA_TABLE_NAME         -- data table (the table holding our data, not counting index tables, although they also store data)
  ,INDEX_TYPE              -- index type
  ,INDEX_STATE             -- index state; 'a' means active (normal)
  ,INDEX_DISABLE_TIMESTAMP -- time when the index was disabled
FROM system.catalog
WHERE INDEX_TYPE IS NOT NULL
AND DATA_TABLE_NAME IN (
  'T_COLLECTION_SLUG_RELATION_INFO',
  'T_COLLECTION_ITEM_RARITY',
  'T_COLLECTION_ITEM_INFO',
  'NFT.T_NFT_LISTING');

When an index has become invalid (disabled), simply rebuild it:

ALTER INDEX IF EXISTS IDX_SLUG_RELATION_DISTINCT ON NFT.T_COLLECTION_SLUG_RELATION_INFO REBUILD;

Sequences:

CREATE SEQUENCE use_users_sequence CACHE 1000;   -- create a sequence

SELECT sequence_schema, sequence_name, start_with, increment_by, cache_size FROM SYSTEM."SEQUENCE";  -- list all sequences

SELECT NEXT VALUE FOR use_users_sequence;   -- next value

SELECT CURRENT VALUE FOR use_users_sequence; -- current value

DROP SEQUENCE use_users_sequence;   -- drop the sequence

UPSERT INTO use_users(autoid, col1, col2) VALUES(NEXT VALUE FOR use_users_sequence, '11', '22');  -- insert using the sequence
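A minimal end-to-end sketch (the use_users table definition is an assumption made to match the UPSERT example above, and use_users_sequence is assumed to still exist, i.e. not yet dropped):

CREATE TABLE use_users (autoid BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 VARCHAR);
UPSERT INTO use_users (autoid, col1, col2) VALUES (NEXT VALUE FOR use_users_sequence, '11', '22');
SELECT autoid, col1, col2 FROM use_users;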

Snapshots:


1. Stop writes to the table

 hbase shell> disable 'NFT:T_COLLECTION_ITEM_INFO'

 (enable brings the table back online)

2. Take the snapshot
hbase shell> snapshot 'NFT:T_COLLECTION_ITEM_INFO', 'T_COLLECTION_ITEM_INFO_1101_Snapshot'

3. Clone the snapshot to a new table name
hbase shell> clone_snapshot 'T_COLLECTION_ITEM_INFO_1101_Snapshot', 'T_COLLECTION_BAK'

4. Delete the snapshot
hbase shell> delete_snapshot 'T_COLLECTION_ITEM_INFO_1101_Snapshot'

5. Drop the original table

hbase shell> drop 'NFT:T_COLLECTION_ITEM_INFO'
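Two related shell commands that are handy in this workflow: list_snapshots verifies that the snapshot exists, and restore_snapshot rolls an existing (disabled) table back to the snapshot state:

hbase shell> list_snapshots
hbase shell> restore_snapshot 'T_COLLECTION_ITEM_INFO_1101_Snapshot'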

Tips:

-- Count the number of rows

hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'NFT:T_COLLECTION_ITEM_INFO'

-- SELECT COUNT(*) FROM NFT.T_COLLECTION_ITEM_INFO


Origin blog.csdn.net/lansye/article/details/127665742