Solving the "more than 32 hfiles to one family of one region" problem when using the HBase bulk loading tool

When importing the data, if the number of hfiles written to the importtsv.bulk.output directory exceeds 32, the import has to be split into multiple steps:

Step 1: Move some of the hfiles out of the bulk.output directory into another directory, so that no more than 32 hfiles remain in the output directory.

Step 2: Execute hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles output wordcount to import the hfiles in the output directory into the "wordcount" HBase table.

Step 3: Move the files that were previously parked in the other directory back into the output directory (the bulk.output directory still cannot hold more than 32 hfiles; if more are left over, this step has to be repeated), then run Step 2 again, until all files have been imported into the "wordcount" HBase table. A sketch of the whole procedure is shown below.
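As an illustration only, here is a minimal shell sketch of this manual procedure, assuming the hfiles sit under output/f on HDFS (column family "f") and using a hypothetical holding directory output_overflow:

# Step 1: keep at most 32 hfiles in the output directory and park the rest elsewhere
hdfs dfs -mkdir -p output_overflow/f
for f in $(hdfs dfs -ls output/f | grep '^-' | awk '{print $NF}' | tail -n +33); do
  hdfs dfs -mv "$f" output_overflow/f/
done

# Step 2: bulk load the (at most 32) hfiles that remain in output/
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles output wordcount

# Step 3: move up to 32 of the parked hfiles back into output/f and run Step 2 again;
# repeat until output_overflow/f is empty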

@黄坤 reported the following error log when running the completebulkload tool:

----------------------------------------

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:55)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Trying to load more than 32 hfiles to one family of one region
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:377)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:960)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:967)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
        ... 11 more
----------------------------------------

Tracking down the source code reveals the following line: this.maxFilesPerRegionPerFamily = conf.getInt("hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily", 32);

By adding the parameter -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=64 when loading the data (64 should match or exceed the number of hfiles in the bulk.output output directory), the data can be imported into HBase in a single pass, without the multi-step operation described above.

For example, execute the command:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=64 output wordcount

or

sudo -u hbase hadoop jar $HBASE_HOME/hbase-server-1.0.0-cdh5.4.0.jar completebulkload -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=64 output wordcount
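After either command finishes, a quick sanity check can confirm that the rows actually landed in the table (wordcount is the table from the example above; adjust the name as needed):

# Count the rows that were bulk loaded
echo "count 'wordcount'" | hbase shell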


Test reproduction idea:

According to the description above, we either change the value of hbase.hregion.max.filesize on the HBase RegionServers of the "panda" cluster, or pass the -Dhbase.hregion.max.filesize=20971520 parameter to the importtsv job that generates the hfiles. Specific test verification steps:

Step 1: Change the value of hbase.hregion.max.filesize from 10G to 20MB.

Step 2: Use an input file of about 2GB for the import into HBase, so that importtsv generates more than 32 hfiles. (Calculation: the default hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily value of the completebulkload tool is 32, and 32 hfiles * 20MB = 640MB, so a 2GB input comfortably exceeds the limit.)


To control the hfile size when executing the importtsv command, just add the -Dhbase.hregion.max.filesize=20971520 parameter (20971520 bytes == 20MB). Alternatively, set hbase.hregion.max.filesize on the cluster's RegionServers. The number of hfiles produced by the importtsv command can then be estimated as: input size (the job counter "File Input Format Counters / Bytes Read") divided by the configured hbase.hregion.max.filesize.
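As a rough illustration of that estimate, assuming the input file sits on HDFS at /data/2013-09-25.csv (a placeholder path) and hbase.hregion.max.filesize is set to 20971520 bytes:

# Bytes Read / hbase.hregion.max.filesize ≈ number of hfiles importtsv will output
BYTES=$(hdfs dfs -du -s /data/2013-09-25.csv | awk '{print $1}')
echo $(( BYTES / 20971520 + 1 ))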

hadoop jar $HBASE_HOME/hbase-server-1.0.0-cdh5.4.0.jar importtsv -Dimporttsv.bulk.output=output1 -Dhbase.hregion.max.filesize=20971520 -Dimporttsv.columns=HBASE_ROW_KEY,f:data wordcountexample 2013-09-25.csv
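When the importtsv job completes, the generated hfiles can be counted and then bulk loaded in a single pass; this sketch assumes column family "f" (from -Dimporttsv.columns above), output directory output1 and target table wordcountexample:

# Count the hfiles generated under column family "f"
NUM=$(hdfs dfs -ls output1/f | grep -c '^-')
echo "importtsv produced $NUM hfiles"

# Raise the per-region/per-family limit to cover them all and load in one go
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=$NUM \
  output1 wordcountexample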
