Querying content in HBase

1. Shell query

 

Querying HBase is quite simple: it provides only two methods, get and scan, and there are no multi-table joins. For complex queries, you need to create corresponding external tables in Hive and use SQL statements, which are automatically turned into MapReduce jobs for execution.
This simplicity has a cost, though: reaching a particular result is sometimes not so easy, and the approach is quite different from querying with SQL.

HBase provides many filters that operate on row keys, columns, and values. Matching can be by substring, binary comparison, prefix, regular expression, and so on, and conditions can be combined with AND, OR, etc. With filtering, it is therefore still possible to express most requirements and get the correct results.
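To make the AND/OR combination concrete, here is a minimal sketch in Ruby (the language of the HBase shell) that assembles a filter string of the kind passed to FILTER=> in the examples of this article. The helper names f, q, all_of and any_of are hypothetical and not part of HBase; the sketch only builds text and does not escape quotes embedded in a value.

```ruby
# Hypothetical helpers (not part of HBase) that assemble filter strings.

# Render one filter call, e.g. f('PrefixFilter', "'0000'") => "PrefixFilter('0000')".
def f(name, *args)
  "#{name}(#{args.join(',')})"
end

# Wrap a literal argument in the single quotes used by the filter strings.
def q(text)
  "'#{text}'"
end

# Join sub-expressions with AND, parenthesising the whole group.
def all_of(*filters)
  "(#{filters.join(' AND ')})"
end

# Join sub-expressions with OR, parenthesising the whole group.
def any_of(*filters)
  "(#{filters.join(' OR ')})"
end

row_prefix  = f('PrefixFilter', q('000000002'))
nick_column = f('QualifierFilter', '=', q('substring:nick'))

puts all_of(row_prefix, nick_column)
# (PrefixFilter('000000002') AND QualifierFilter(=,'substring:nick'))
```

The resulting string can be pasted into a scan command as the FILTER value; building it from small pieces mainly helps keep the parentheses balanced.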

1.1 Filter Types

The filters are described in the Chinese translation of the official HBase documentation (http://abloz.com/hbase/book.html). They fall into 5 types:

  1. Structural filters: filters that contain a set of other filters. Includes: FilterList
  2. Column value filters: filter on the value of each column, equivalent to = and LIKE in an SQL query. Includes:
    SingleColumnValueFilter
    Comparators, including:
     RegexStringComparator  compares values against a regular expression.
     SubstringComparator  checks whether a substring occurs in a value. Case-insensitive.
     BinaryPrefixComparator  binary prefix comparison.
     BinaryComparator  binary comparison.
  3. Key-value metadata filters: filter on the column rather than the value. Includes:
    FamilyFilter  filters on column family. In general, selecting column families in the Scan is better than doing it with a filter.
    QualifierFilter  filters on column name (i.e. qualifier).
    ColumnPrefixFilter  filters on a column name (i.e. qualifier) prefix.
    MultipleColumnPrefixFilter  behaves like ColumnPrefixFilter, but allows multiple prefixes.
    ColumnRangeFilter  allows efficient intra-row scanning.
  4. Row key filters: filter on row keys. For selecting rows it is generally better to use startRow/stopRow on the Scan, but RowFilter can also be used.
  5. Utility filters: for example, FirstKeyOnlyFilter, which is used to count rows.
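The four comparators listed above differ only in how they test a value. The following Ruby sketch is a toy, in-memory model of the equality case, written to illustrate the difference; it does not call HBase. The COMPARATORS table and the matches? helper are hypothetical names, and treating the regular expression as unanchored (found anywhere in the value) is an assumption of the model.

```ruby
# Toy stand-ins for the four comparators; none of this calls HBase.
COMPARATORS = {
  # Case-insensitive containment.
  'substring'    => ->(value, arg) { value.downcase.include?(arg.downcase) },
  # Regular expression found anywhere in the value (assumed unanchored).
  'regexstring'  => ->(value, arg) { !Regexp.new(arg).match(value).nil? },
  # Byte-for-byte equality with the whole value.
  'binary'       => ->(value, arg) { value.b == arg.b },
  # Byte-for-byte equality with the leading bytes only.
  'binaryprefix' => ->(value, arg) { value.b.start_with?(arg.b) }
}.freeze

# Split a "type:argument" spec such as 'substring:id' and apply it to a value.
def matches?(spec, value)
  type, arg = spec.split(':', 2)
  COMPARATORS.fetch(type).call(value, arg)
end

puts matches?('substring:ID', 'loginid')          # true
puts matches?('regexstring:.*99', 'zwy99')        # true
puts matches?('binaryprefix:jjm', 'jjm168131013') # true
puts matches?('binary:zwy', 'zwy99')              # false
```

The "type:argument" strings follow the same shape as the comparator arguments used in the shell examples, such as 'substring:id' and 'regexstring:.*99'.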

1.2 Examples

 

1. FirstKeyOnlyFilter, a convenient filter for counting rows: it returns only the first cell of each row

hbase(main):002:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>'info',FILTER=>"(FirstKeyOnlyFilter())"}
 0000000001                       column=info:loginid, timestamp=1343625459713, value=jjm168131013
 0000000002                       column=info:loginid, timestamp=1343625459713, value=loveswh
...
21 row(s) in 0.5480 seconds
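The reason only info:loginid appears for each row can be pictured with a small Ruby sketch. It models a table as a plain hash and keeps the first cell of every row, which is enough to count rows without carrying the other columns. This is a toy model, not HBase code; TABLE and first_key_only are hypothetical names, and the two sample rows are taken from the scan output in this article.

```ruby
# Toy model only: a table as { row_key => { 'family:qualifier' => value } }.
TABLE = {
  '0000000001' => { 'info:loginid' => 'jjm168131013', 'info:userid' => '168131013' },
  '0000000002' => { 'info:loginid' => 'loveswh',      'info:userid' => '100898152' }
}.freeze

# Keep only the first cell of each row (cells ordered by column name).
def first_key_only(table)
  table.map { |row, cells| [row, [cells.min_by { |column, _| column }].to_h] }.to_h
end

filtered = first_key_only(TABLE)
puts filtered.size                       # 2
puts filtered['0000000001'].keys.inspect # ["info:loginid"]
```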

2. Filter by column name substring

hbase(main):006:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:id'))"}
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1343625459713, value=jjm168131013
0000000001 column=info:userid, timestamp=1343625459713, value=168131013
0000000002 column=info:loginid, timestamp=1343625459713, value=loveswh
0000000002 column=info:userid, timestamp=1343625459713, value=100898152

hbase(main):005:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:loginid'],FILTER=>"(QualifierFilter(=,'substring:id'))"}
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1343625459713, value=jjm168131013
0000000002 column=info:loginid, timestamp=1343625459713, value=loveswh

hbase(main):007:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nid'))"}
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1343625459713, value=jjm168131013
0000000002 column=info:loginid, timestamp=1343625459713, value=loveswh

hbase(main):008:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nick'))"}
ROW COLUMN+CELL
0000000001 column=info:nick, timestamp=1343625459713, value=\xE5\xAE\xB6\xE6\x9C\x89\xE8\x99\x8E\xE5\xAE\x9D
0000000002 column=info:nick, timestamp=1343625459713, value=loveswh08

3. Value filtering

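3.1 Regular expression filtering
SingleColumnValueFilter tests one column but returns the whole matching row. The two trailing booleans are filterIfMissing and latestVersionOnly.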
hbase(main):004:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>'info',FILTER=>"(SingleColumnValueFilter('info','nick',=,'regexstring:.*99',true,true))"}
ROW                               COLUMN+CELL
 0000000009                       column=info:loginid, timestamp=1343625459713, value=zgh1968
 0000000009                       column=info:nick, timestamp=1343625459713, value=zwy99
 0000000009                       column=info:score, timestamp=1343625459713, value=5
 0000000009                       column=info:userid, timestamp=1343625459713, value=100366262
1 row(s) in 0.2520 seconds

3.2 Substrings
To build the filter as an object in the shell, the following classes need to be imported first:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

hbase(main):028:0> scan 'toplist_ware_ios_1001_201231',{COLUMNS =>'info:nick', FILTER=>SingleColumnValueFilter.new(Bytes.toBytes('info'),Bytes.toBytes('nick'),CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new('8888'))}
ROW COLUMN+CELL
0000000002 column=info:nick, timestamp=1343625446556, value=\xE7\x81\x8F????\xE3\x81\x8A??8888
1 row(s) in 0.0330 seconds

3.3 Binary
Substring comparators and the like do not support multi-byte literals (such as Chinese text), so use a binary comparison against the escaped bytes instead:
hbase(main):010:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nick') AND ValueFilter(=,'binary:7789\xE6\xB4\x81') )"}
ROW COLUMN+CELL
0000000016 column=info:nick, timestamp=1343625459713, value=7789\xE6\xB4\x81
1 row(s) in 0.1710 seconds
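The \xNN form in the filter above is simply the UTF-8 bytes of the value written out one by one. As a rough sketch, the hypothetical Ruby helper below (hbase_escape is not part of HBase) produces that escaped text from a string, keeping printable ASCII as-is and writing every other byte as \xNN. It is meant for preparing text to paste into a double-quoted shell string and may not match the shell's own output formatting in every edge case.

```ruby
# Hypothetical helper: keep printable ASCII (except the backslash itself)
# and write every other byte as \xNN.
def hbase_escape(text)
  text.bytes.map { |b|
    b.between?(0x20, 0x7E) && b != 0x5C ? b.chr : format('\\x%02X', b)
  }.join
end

nick = "7789\u6D01" # "7789" followed by U+6D01, the nick of row 0000000016
puts hbase_escape(nick)
# 7789\xE6\xB4\x81

puts "ValueFilter(=,'binary:#{hbase_escape(nick)}')"
# ValueFilter(=,'binary:7789\xE6\xB4\x81')
```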

4. Combining column-name substring and binary value comparison

hbase(main):012:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nick') AND ValueFilter(=,'binary:7789\xE6\xB4\x81') )"}
ROW COLUMN+CELL
0000000016 column=info:nick, timestamp=1343625459713, value=7789\xE6\xB4\x81
1 row(s) in 0.0120 seconds
hbase(main):014:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>"info:",FILTER=>"(PrefixFilter('000000002')) AND (QualifierFilter(=,'substring:nick'))"}
ROW COLUMN+CELL
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
 0000000021 column=info:nick, timestamp=1343625459713, value=\xE5\xB0\x8F\xE7\xBD\x97\xE6\x95\x99\xE7\xBB\x831
2 row(s) in 0.0440 seconds

5. Row query

 

hbase(main):005:0> get 'toplist_ware_ios_1009_201231','0000000009'
COLUMN CELL
 info:loginid timestamp=1343625459713, value=zgh1968
 info:nick timestamp=1343625459713, value=zwy99
 info:score timestamp=1343625459713, value=5
 info:userid timestamp=1343625459713, value=100366262
4 row(s) in 0.1000 seconds
hbase(main):006:0> get 'toplist_ware_ios_1009_201231','0000000009','info:nick'
COLUMN CELL
 info:nick timestamp=1343625459713, value=zwy99
1 row(s) in 0.0100 seconds
hbase(main):009:0> scan 'toplist_ware_ios_1009_201231',FILTER=>"PrefixFilter('000000002')"
ROW COLUMN+CELL
 0000000020 column=info:loginid, timestamp=1343625459713, value=jjm169212318
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
 0000000020 column=info:score, timestamp=1343625459713, value=1
 0000000020 column=info:userid, timestamp=1343625459713, value=169212318
 0000000021 column=info:loginid, timestamp=1343625459713, value=jjm169371841
 0000000021 column=info:nick, timestamp=1343625459713, value=\xE5\xB0\x8F\xE7\xBD\x97\xE6\x95\x99\xE7\xBB\x831
 0000000021 column=info:score, timestamp=1343625459713, value=1
 0000000021 column=info:userid, timestamp=1343625459713, value=169371841
2 row(s) in 0.0180 seconds
hbase(main):010:0> scan 'toplist_ware_ios_1009_201231',FILTER=>"PrefixFilter('000000002')",LIMIT=>1
ROW COLUMN+CELL
 0000000020 column=info:loginid, timestamp=1343625459713, value=jjm169212318
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
 0000000020 column=info:score, timestamp=1343625459713, value=1
 0000000020 column=info:userid, timestamp=1343625459713, value=169212318
1 row(s) in 0.0170 seconds
hbase(main):011:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>"info:nick",FILTER=>"PrefixFilter('000000002')",LIMIT=>1}
ROW COLUMN+CELL
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
1 row(s) in 0.0160 seconds
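As noted in the filter type list, a start/stop row range is generally preferred over a filter for selecting rows. A prefix can be turned into such a range by using the prefix as the start row and the "next" key after the prefix as the exclusive stop row. The Ruby sketch below computes that stop row; stop_row_for_prefix is a hypothetical helper, not an HBase function, and the printed command uses the shell's STARTROW and STOPROW scan options.

```ruby
# Hypothetical helper: smallest row key that sorts after every key with the prefix.
# Works on raw bytes; trailing 0xFF bytes are dropped before incrementing.
def stop_row_for_prefix(prefix)
  bytes = prefix.bytes
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  return '' if bytes.empty? # no upper bound: scan to the end of the table
  bytes[-1] += 1
  bytes.pack('C*')
end

start_row = '000000002'
stop_row  = stop_row_for_prefix(start_row)
puts "scan 'toplist_ware_ios_1009_201231', {STARTROW => '#{start_row}', STOPROW => '#{stop_row}'}"
# scan 'toplist_ware_ios_1009_201231', {STARTROW => '000000002', STOPROW => '000000003'}
```

With the sample data this range should cover the same rows as PrefixFilter('000000002'), while letting the scan skip directly to the first matching key instead of testing every row.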
