How to Query a Salted HBase Table: the Spark Approach

In "How to table salt after HBase reads: coprocessor articles" article in the table after a query using a coprocessor with salt, this article will introduce the second method to achieve the same functionality.


As we know, HBase ships an hbase-mapreduce module that provides the InputFormat, OutputFormat, and related classes for reading and writing HBase tables. The module is described as follows:

This module contains implementations of InputFormat, OutputFormat, Mapper, Reducer, etc which are needed for running MR jobs on tables, WALs, HFiles and other HBase specific constructs. It also contains a bunch of tools: RowCounter, ImportTsv, Import, Export, CompactionTool, ExportSnapshot, WALPlayer, etc.

Although the description above only mentions MR jobs, Spark can use these InputFormat and OutputFormat classes to read and write HBase tables as well, for example:

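Here is a minimal sketch of such a Spark job, which counts the rows of the iteblog table with TableInputFormat (the ZooKeeper quorum value and the object name are placeholders to adapt to your environment):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseRowCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseRowCount"))

    val conf = HBaseConfiguration.create()
    // placeholder: the ZooKeeper quorum of your HBase cluster
    conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3")
    // the table to read, i.e. hbase.mapreduce.inputtable
    conf.set(TableInputFormat.INPUT_TABLE, "iteblog")

    // build an RDD over the table: one (rowkey, Result) pair per HBase row
    val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    println("Total rows: " + hbaseRDD.count())
    sc.stop()
  }
}
```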
The program above uses TableInputFormat to count the total number of rows in the iteblog table. But what if we want to fetch the entire history of a given UID? If you look at the TableInputFormat source code, you will find that it supports quite a few parameters:

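For reference, here are a few of the constants defined in org.apache.hadoop.hbase.mapreduce.TableInputFormat (a partial list):

```scala
TableInputFormat.INPUT_TABLE         // "hbase.mapreduce.inputtable"
TableInputFormat.SCAN_ROW_START      // "hbase.mapreduce.scan.row.start"
TableInputFormat.SCAN_ROW_STOP       // "hbase.mapreduce.scan.row.stop"
TableInputFormat.SCAN_COLUMN_FAMILY  // "hbase.mapreduce.scan.column.family"
TableInputFormat.SCAN_COLUMNS        // "hbase.mapreduce.scan.columns"
TableInputFormat.SCAN_MAXVERSIONS    // "hbase.mapreduce.scan.maxversions"
```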
hbase.mapreduce.inputtable is the table to query, which is what the Spark program above sets via TableInputFormat.INPUT_TABLE. hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop are the start and stop rowkeys of the query, so we can use them to implement range queries. Note, however, that the iteblog table is salted, so we need to prepend the salt prefix to the UID; otherwise the query returns no data. TableInputFormat cannot do this by itself. So how do we handle it? The answer is to override TableInputFormat's getSplits method.

As the name suggests, getSplits computes how many splits there are. In HBase, each region corresponds to one split, represented by the TableSplit implementation class. Constructing a TableSplit requires a startRow and an endRow, which are the values passed in via the hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop parameters above. So if we want to handle a salted table, this is the place to do it.

On the other hand, we can obtain the StartKeys and EndKeys of all the regions of a table via RegionLocator's getStartEndKeys() method. Concatenating each region's StartKey with the user-supplied hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop values gives us exactly what we need. Following this idea, the code can be implemented as follows:

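Here is a minimal sketch of that idea, assuming each region's StartKey is exactly the salt prefix used by the salting scheme from the previous article (the class name SaltedTableInputFormat is illustrative):

```scala
import java.util.{ArrayList => JArrayList, List => JList}

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableSplit}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.{InputSplit, JobContext}

class SaltedTableInputFormat extends TableInputFormat {

  override def getSplits(context: JobContext): JList[InputSplit] = {
    val conf = context.getConfiguration
    val tableName = TableName.valueOf(conf.get(TableInputFormat.INPUT_TABLE))
    // the user-supplied (unsalted) start and stop rowkeys
    val startRow = conf.get(TableInputFormat.SCAN_ROW_START)
    val stopRow  = conf.get(TableInputFormat.SCAN_ROW_STOP)

    val connection = ConnectionFactory.createConnection(conf)
    try {
      val regionLocator = connection.getRegionLocator(tableName)
      // StartKeys and EndKeys of every region of the table
      val keys = regionLocator.getStartEndKeys
      val splits = new JArrayList[InputSplit]()

      for (i <- keys.getFirst.indices) {
        // assumption: each region's StartKey is its salt prefix
        // (the first region's StartKey is empty, so its split is unsalted)
        val salt = keys.getFirst()(i)
        val splitStart = Bytes.add(salt, Bytes.toBytes(startRow))
        val splitStop  = Bytes.add(salt, Bytes.toBytes(stopRow))
        val location = regionLocator.getRegionLocation(salt).getHostname
        // one split per region, with the salted start/stop rowkeys
        splits.add(new TableSplit(tableName, splitStart, splitStop, location))
      }
      splits
    } finally {
      connection.close()
    }
  }
}
```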

Then, to query all the history of the user with UID = 1000 as before, the program can be implemented as follows:

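Here is a sketch of the driver, reusing the SaltedTableInputFormat sketched above. The rowkey range below assumes the unsalted key begins with the UID; adjust it to your actual key layout:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object SaltedTableQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaltedTableQuery"))

    val conf = HBaseConfiguration.create()
    // placeholder: the ZooKeeper quorum of your HBase cluster
    conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3")
    conf.set(TableInputFormat.INPUT_TABLE, "iteblog")
    // unsalted rowkey range covering every record of UID 1000;
    // SaltedTableInputFormat prepends the per-region salt prefix
    conf.set(TableInputFormat.SCAN_ROW_START, "1000")
    conf.set(TableInputFormat.SCAN_ROW_STOP, "1001")

    val rdd = sc.newAPIHadoopRDD(conf, classOf[SaltedTableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // print the (salted) rowkeys of all matching rows
    rdd.map { case (key, _) => Bytes.toString(key.copyBytes()) }
       .collect()
       .foreach(println)

    sc.stop()
  }
}
```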

We compile and package the code above, then run it with a command like the following:

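A typical invocation might look like this; the jar name, package name, and master URL are placeholders:

```bash
spark-submit \
  --master yarn \
  --class com.iteblog.SaltedTableQuery \
  spark-hbase-example.jar
```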

The results obtained were consistent with the HBase Shell output from the previous article.

That's it for how to query a salted HBase table in Spark. Tomorrow I will show how to query a salted HBase table in MapReduce, so stay tuned.


Source: www.cnblogs.com/cxhfuujust/p/11987952.html