HBase conditional query (multi-conditional query)

HBase provides only two ways to query data:

1. Fetch a single record by its RowKey, using the get method (org.apache.hadoop.hbase.client.Get).

2. Fetch a batch of records that match given conditions, using the scan method (org.apache.hadoop.hbase.client.Scan).

 

Conditional queries are implemented with scan. The following points are worth noting when using it:

1. Scan speed can be improved with the setCaching and setBatch methods, trading space for time: setCaching sets how many rows are fetched per round trip to the server, and setBatch limits how many cells are returned per Result;

2. The scan range can be limited with setStartRow and setStopRow (withStartRow and withStopRow in HBase 2.x). The smaller the range, the better the performance.

With a well-designed RowKey, the records of one result set sit next to each other (ideally in the same Region), so traversing the results performs well.

3. Filters can be added to a scan with the setFilter method, which is also the basis for paging and multi-condition queries.
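The effect of a row range, a filter, and scanner caching can be modelled without a cluster. The sketch below is plain Java, not the HBase client API: a TreeMap stands in for a table because both keep rows sorted by key, and the names RangeScanSketch, scan and roundTrips are made up for illustration. The round-trip count is a rough estimate that ignores any server-side size limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Plain-Java model of a range-limited, filtered scan (not the HBase client API).
class RangeScanSketch {

    // Returns the values of rows in [startRow, stopRow) that pass the filter.
    // Start is inclusive and stop is exclusive, like HBase's default scan bounds.
    static List<String> scan(NavigableMap<String, String> table,
                             String startRow, String stopRow,
                             Predicate<String> filter) {
        List<String> hits = new ArrayList<>();
        // Only rows inside the range are visited; the filter runs on those rows only.
        for (Map.Entry<String, String> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            if (filter.test(row.getValue())) {
                hits.add(row.getValue());
            }
        }
        return hits;
    }

    // Approximate number of client/server round trips when each one
    // returns up to `caching` rows (assumption: no other limits apply).
    static long roundTrips(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("row-01", "apple");
        table.put("row-02", "banana");
        table.put("row-03", "avocado");
        table.put("row-04", "cherry");
        table.put("row-05", "apricot");

        // Visits row-02..row-04 only; row-05 is excluded by the stop row.
        System.out.println(scan(table, "row-02", "row-05", v -> v.startsWith("a"))); // [avocado]

        System.out.println(roundTrips(10_000, 1));   // 10000
        System.out.println(roundTrips(10_000, 500)); // 20
    }
}
```

A larger caching value means fewer round trips but more memory held per fetch, which is the space-for-time trade mentioned in point 1.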

 

Here is a concrete example:

We store file information in a table. Each file has 5 attributes: file id (long, globally unique), creation time (long), file name (String), category name (String), and owner (User).

The query conditions a user can enter are: file creation time interval (for example, files created from 20120901 to 20120914), file name ("The Voice of China"), category ("Variety"), and owner ("Zhejiang Satellite TV").

Suppose we currently have the following files:

ID  CreateTime  Name                                           Category            UserID
1   20120902    The Voice of China Issue 1                     Variety             1
2   20120904    The Voice of China Issue 2                     Variety             1
3   20120906    The Voice of China Wild Card Tournament        Variety             1
4   20120908    The Voice of China Issue 3                     Variety             1
5   20120910    The Voice of China Issue 4                     Variety             1
6   20120912    Interview with the Voice of China contestants  Variety Highlights  2
7   20120914    The Voice of China Issue 5                     Variety             1
8   20120916    The Voice of China Recording Highlights        Variety Highlights  2
9   20120918    Exclusive interview with Zhang Wei             Trivia              3
10  20120920    Jiaduobao herbal tea advertisement             Variety             4

 

 

 

Here UserID refers to a separate User table, which is not shown for now. We only need to know what each UserID means:

1 is Zhejiang Satellite TV; 2 is The Voice; 3 is XX Weibo; 4 is the sponsor.

When calling the query interface, pass the conditions above as five arguments: find(20120901, 20121001, "The Voice of China", "Variety", "Zhejiang Satellite TV").

The result should then contain records 1, 2, 3, 4, 5 and 7. Record 6 should not be selected because it does not belong to "Zhejiang Satellite TV".
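To pin down what the query is expected to return, here is a minimal reference sketch that checks every condition against every record, in plain Java with no HBase involved. The class FindSketch, the ownerName lookup (a stand-in for the User table), the inclusive time bounds, and the use of a case-sensitive substring match for the file name are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reference semantics of find(...) as a full scan over in-memory records.
class FindSketch {

    static final class FileInfo {
        final long id;
        final long createTime;
        final String name;
        final String category;
        final long userId;

        FileInfo(long id, long createTime, String name, String category, long userId) {
            this.id = id;
            this.createTime = createTime;
            this.name = name;
            this.category = category;
            this.userId = userId;
        }
    }

    // The ten sample files from the table above.
    static final List<FileInfo> FILES = Arrays.asList(
        new FileInfo(1, 20120902L, "The Voice of China Issue 1", "Variety", 1),
        new FileInfo(2, 20120904L, "The Voice of China Issue 2", "Variety", 1),
        new FileInfo(3, 20120906L, "The Voice of China Wild Card Tournament", "Variety", 1),
        new FileInfo(4, 20120908L, "The Voice of China Issue 3", "Variety", 1),
        new FileInfo(5, 20120910L, "The Voice of China Issue 4", "Variety", 1),
        new FileInfo(6, 20120912L, "Interview with the Voice of China contestants", "Variety Highlights", 2),
        new FileInfo(7, 20120914L, "The Voice of China Issue 5", "Variety", 1),
        new FileInfo(8, 20120916L, "The Voice of China Recording Highlights", "Variety Highlights", 2),
        new FileInfo(9, 20120918L, "Exclusive interview with Zhang Wei", "Trivia", 3),
        new FileInfo(10, 20120920L, "Jiaduobao herbal tea advertisement", "Variety", 4));

    // Hypothetical stand-in for a lookup in the User table.
    static String ownerName(long userId) {
        switch ((int) userId) {
            case 1: return "Zhejiang Satellite TV";
            case 2: return "The Voice";
            case 3: return "XX Weibo";
            case 4: return "Sponsor";
            default: return "";
        }
    }

    // Returns the ids of files matching all conditions (time bounds inclusive).
    static List<Long> find(long from, long to, String name, String category, String owner) {
        List<Long> ids = new ArrayList<>();
        for (FileInfo f : FILES) {
            boolean match = f.createTime >= from && f.createTime <= to
                    && f.name.contains(name)
                    && f.category.equals(category)
                    && ownerName(f.userId).equals(owner);
            if (match) {
                ids.add(f.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(find(20120901L, 20121001L,
                "The Voice of China", "Variety", "Zhejiang Satellite TV")); // [1, 2, 3, 4, 5, 7]
    }
}
```

This version touches all ten records for every query, which is exactly the cost a good RowKey design is meant to avoid.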

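We can design the RowKey with this query in mind: build it from UserID + CreateTime + FileID. Because HBase stores rows sorted by RowKey, all files of one owner are adjacent and ordered by creation time, so the owner and time-interval conditions become a start/stop row range, while file name and category are checked with filters. This satisfies multi-condition queries and keeps them fast.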
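The RowKey idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the HBase client API: a TreeMap ordered by unsigned byte comparison stands in for the table, each key part is encoded as a fixed-width big-endian long (which assumes non-negative values), and the names RowKeySketch, rowKey and scan are invented here. Name and category filters are left out; in a real query they would be applied to the rows inside the range.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Model of a composite RowKey (UserID + CreateTime + FileID) and a range scan over it.
class RowKeySketch {

    // Unsigned lexicographic byte order, the order in which HBase sorts row keys.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Three big-endian longs, 24 bytes in total. Fixed widths keep byte order
    // equal to numeric order (assumption: all values are non-negative).
    static byte[] rowKey(long userId, long createTime, long fileId) {
        return ByteBuffer.allocate(24)
                .putLong(userId).putLong(createTime).putLong(fileId)
                .array();
    }

    // File ids of one owner created in [from, to], read as one contiguous key range.
    static List<Long> scan(NavigableMap<byte[], Long> table, long userId, long from, long to) {
        byte[] startRow = rowKey(userId, from, 0L);   // inclusive
        byte[] stopRow = rowKey(userId, to + 1, 0L);  // exclusive, so `to` itself is included
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    // The ten sample files as {id, createTime, userId}.
    static NavigableMap<byte[], Long> sampleTable() {
        long[][] files = {
            {1, 20120902, 1}, {2, 20120904, 1}, {3, 20120906, 1}, {4, 20120908, 1},
            {5, 20120910, 1}, {6, 20120912, 2}, {7, 20120914, 1}, {8, 20120916, 2},
            {9, 20120918, 3}, {10, 20120920, 4}
        };
        NavigableMap<byte[], Long> table = new TreeMap<>(BYTE_ORDER);
        for (long[] f : files) {
            table.put(rowKey(f[2], f[1], f[0]), f[0]);
        }
        return table;
    }

    public static void main(String[] args) {
        // Owner 1 (Zhejiang Satellite TV), created from 20120901 to 20121001.
        System.out.println(scan(sampleTable(), 1L, 20120901L, 20121001L)); // [1, 2, 3, 4, 5, 7]
    }
}
```

Putting UserID first is what makes the owner condition cheap; a query that does not specify an owner would get no benefit from this key order and would need a different design.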

