Connecting Hive with HBase

Hello everyone!

Due to actual project needs, we were required to load the data in Hive into HBase. Building on a blog post I read online, I added my own understanding, the related operation steps, and several common errors, and organized them into this post. I hope it is helpful to everyone.

Bulk Load: HBase data import best practice
1. Overview
HBase itself provides several ways to import data; two are commonly used:
1. The TableOutputFormat provided by HBase, which imports data into HBase through a MapReduce job
2. The native HBase client API
Both methods need to communicate frequently with the RegionServer that stores the data, so when a large amount of data is written at once they consume a great deal of resources and are not the most efficient choice. Anyone familiar with HBase internals knows that HBase stores its data on HDFS in the HFile file format. A more efficient and convenient method is "Bulk Loading": generate the HFiles directly, using the HFileOutputFormat class provided by HBase.
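
For comparison, here is a minimal sketch of the second method, writing rows through the native client API (HBase 1.x client API; the table name and cell values are just examples taken from the test data used later). Every Put travels through the RegionServer's normal write path, which is what makes this approach expensive for very large loads:

Configuration conf = HBaseConfiguration.create();
Connection conn = ConnectionFactory.createConnection(conf);
Table table = conn.getTable(TableName.valueOf("hfiletable"));
Put put = new Put(Bytes.toBytes("key1"));
put.addColumn(Bytes.toBytes("fm1"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
table.put(put);   // each Put is sent to the RegionServer that owns this row's region
table.close();
conn.close();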

2. The basic principle of Bulk Load
Bulk Load processing consists of two main steps.
1. Prepare the data files
The first step of Bulk Load is to prepare the data files. A MapReduce job is run that uses HFileOutputFormat to write out HBase's native data file: the StoreFile (HFile).
HFileOutputFormat ensures that each output HFile fits within a single region. It uses the TotalOrderPartitioner class to partition the map output into separate key ranges, with each key range corresponding to one region of the target HBase table (a sketch of the job settings involved follows after step 2).

2. Import into the HBase table
In the second step, the completebulkload tool delivers the result files of the first step to the RegionServers responsible for the corresponding regions and moves each file into that region's storage directory on HDFS. Once this is done, the data is available to clients.
If a region boundary changes while Bulk Load is preparing the import, or between preparing the import and completing it, the completebulkload tool automatically splits the data files along the new boundaries. This splitting is not efficient, however, so users should minimize the delay between preparing the files and importing them into the cluster, especially when other clients are importing data into the same table at the same time.
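
In practice the first step is rarely wired up by hand; HFileOutputFormat2.configureIncrementalLoad does it for you. The sketch below shows, in simplified form, the job settings it effectively applies. It is not the actual HBase source: the real method also configures per-column-family compression, bloom filter and block size settings, and writes the region start keys to a partitions file for the partitioner (regionStartKeys below is an illustrative name):

// simplified sketch of what configureIncrementalLoad sets up
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(KeyValue.class);
job.setOutputFormatClass(HFileOutputFormat2.class);
// the sort reducer is chosen from the map output value class:
//   KeyValue -> KeyValueSortReducer, Put -> PutSortReducer
job.setReducerClass(PutSortReducer.class);
// one reduce task per region; TotalOrderPartitioner routes each key range to the
// reducer whose output HFile will belong to that region
job.setNumReduceTasks(regionStartKeys.size());
job.setPartitionerClass(TotalOrderPartitioner.class);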

Note:
The completebulkload step of Bulk Load simply imports the result files of importtsv or HFileOutputFormat into a table, using a command similar to the following:
hadoop jar hbase-VERSION.jar completebulkload [-c /path/to/hbase/config/hbase-site.xml] /user/todd/myoutput mytable
The command finishes very quickly; it imports the HFile files under /user/todd/myoutput into the mytable table. Note: if the target table does not exist, the tool will create it automatically.
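
The same step can also be driven from Java instead of the command line; this is the API the example program in step 3 below uses. A minimal sketch for HBase 1.x (the output path and table name are placeholders, and the table must already exist when loading this way):

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
loader.doBulkLoad(new Path("/user/todd/myoutput"), table);   // moves the HFiles into the table's regions
table.close();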

3. Notes on the program that generates the HFiles:
1. For the final output, whether it comes from the map or the reduce side, the key and value types must be <ImmutableBytesWritable, KeyValue> or <ImmutableBytesWritable, Put>.
2. When the output value type is KeyValue, the corresponding sorter is KeyValueSortReducer; when it is Put, the sorter is PutSortReducer (see the sketch after this list).
3. In the MR job, job.setOutputFormatClass(HFileOutputFormat.class); HFileOutputFormat is only suitable for writing a single column family into an HFile at a time.
4. HFileOutputFormat.configureIncrementalLoad(job, table) in the MR job configures the job for you. SimpleTotalOrderPartitioner first sorts the keys and then distributes them to the reducers, so that the key ranges handled by the reducers do not overlap. This matters because, once stored in HBase, the keys within a region are kept in strictly sorted order.
5. In the MR job, the generated HFiles are stored on HDFS, with one subfolder per column family under the output path. When the HFiles are loaded into HBase, they are in effect moved into the HBase region directories, so the column-family subfolders under the output path end up empty.
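
For reference, here is a minimal sketch of a mapper that emits KeyValue instead of Put; with this map output value class, HFileOutputFormat2.configureIncrementalLoad would select KeyValueSortReducer automatically. The class and variable names are illustrative, and org.apache.hadoop.hbase.KeyValue must be imported in addition to the imports used in the full example below:

public static class KeyValueBulkLoadMap
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // each input line has the format: rowkey \t family:qualifier \t value
        String[] fields = value.toString().split("\t");
        byte[] row = Bytes.toBytes(fields[0]);
        String[] fq = fields[1].split(":");
        KeyValue kv = new KeyValue(row, Bytes.toBytes(fq[0]), Bytes.toBytes(fq[1]),
                Bytes.toBytes(fields[2]));
        context.write(new ImmutableBytesWritable(row), kv);
    }
}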

4. Hands-on walkthrough

Step 1: Create the test data and upload it to HDFS:

[root@hadoop test]# hadoop fs -cat /test/hbase.txt
key1	fm1:col1	value1
key1	fm1:col2	value2
key1	fm2:col1	value99
key4	fm1:col1	value4

Step 2: Create the required tables on hbase

-- create the table
create 'hfiletable','fm1','fm2'
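
The table can also be created from Java instead of the HBase shell. A rough equivalent using the HBase 1.x admin API (assuming hbase-site.xml is on the classpath and the usual org.apache.hadoop.hbase client imports):

Configuration conf = HBaseConfiguration.create();
Connection conn = ConnectionFactory.createConnection(conf);
Admin admin = conn.getAdmin();
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("hfiletable"));
desc.addFamily(new HColumnDescriptor("fm1"));
desc.addFamily(new HColumnDescriptor("fm2"));
admin.createTable(desc);
admin.close();
conn.close();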

Verify the newly created table in the HBase shell:

hbase(main):006:0> truncate 'hfiletable'
Truncating 'hfiletable' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 4.2430 seconds

hbase(main):007:0> scan 'hfiletable'
ROW                             COLUMN+CELL                                                                              
0 row(s) in 0.3490 seconds

Explanation: truncate is used here to clear out any historical data; if the table has just been created for the first time, this step is not required.

Step 3: Create the BulkLoadJob class in Eclipse; this is the MapReduce program that generates the HFiles and bulk-loads them into HBase.

package day01;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class BulkLoadJob {
    static Logger logger = LoggerFactory.getLogger(BulkLoadJob.class);

    public static class BulkLoadMap extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
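            // each input line has the format: rowkey \t family:qualifier \t value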
            String[] valueStrSplit = value.toString().split("\t");
            String hkey = valueStrSplit[0];
            String family = valueStrSplit[1].split(":")[0];
            String column = valueStrSplit[1].split(":")[1];
            String hvalue = valueStrSplit[2];
            final byte[] rowKey = Bytes.toBytes(hkey);
            final ImmutableBytesWritable HKey = new ImmutableBytesWritable(rowKey);
            Put HPut = new Put(rowKey);
            byte[] cell = Bytes.toBytes(hvalue);
            HPut.add(Bytes.toBytes(family), Bytes.toBytes(column), cell);
            context.write(HKey, HPut);

        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String inputPath = args[0];
        String outputPath = args[1];
        HTable hTable = null;
        try {
            Job job = Job.getInstance(conf, "ExampleRead");
            job.setJarByClass(BulkLoadJob.class);
            job.setMapperClass(BulkLoadJob.BulkLoadMap.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            // disable speculative execution so duplicate task attempts do not write duplicate HFiles
            job.setSpeculativeExecution(false);
            job.setReduceSpeculativeExecution(false);
            // in/out format
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(HFileOutputFormat2.class);

            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, new Path(outputPath));

            hTable = new HTable(conf, args[2]);
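            // configure partitioner, sort reducer and HFile output settings to match the table's current regions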
            HFileOutputFormat2.configureIncrementalLoad(job, hTable);

            if (job.waitForCompletion(true)) {
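                // relax HDFS permissions so the HBase service user can read and move the generated HFiles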
                FsShell shell = new FsShell(conf);
                try {
                    shell.run(new String[]{"-chmod", "-R", "777", args[1]});
                } catch (Exception e) {
                    logger.error("Couldnt change the file permissions ", e);
                    throw new IOException(e);
                }
                // load the generated HFiles into the HBase table
                LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                loader.doBulkLoad(new Path(outputPath), hTable);
            } else {
                logger.error("loading failed.");
                System.exit(1);
            }

        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } finally {
            if (hTable != null) {
                hTable.close();
            }
        }
    }
}

Step 4: Package the class created in Step 3 into the file Hdfs_To_Hbase.jar, upload it to the /root/test directory, and verify it as follows:

[root@hadoop test]# ls -l /root/test/Hdfs_To_Hbase.jar
-rw-r--r-- 1 root root 44481940 Aug 20 06:14 /root/test/Hdfs_To_Hbase.jar

Step 5: On Linux, run the jar that was just uploaded:

hadoop jar /root/test/Hdfs_To_Hbase.jar /test/hbase.txt /Hdfs_to_Hbase hfiletable >123.log 2>&1 &

Parameter description:
 1 /root/test/Hdfs_To_Hbase.jar is the name of the jar package
 2 /test/hbase.txt is the HDFS data source to be imported
 3 /Hdfs_to_Hbase is the temporary HDFS directory where the MapReduce job writes the generated HFiles; this directory must be deleted before each run
 4 hfiletable is the name of the HBase table, which must be created in HBase beforehand

Step 6: Check the execution log in the background. The log is fairly long, so read through it patiently.

[root@hadoop ~]# cat 123.log 
18/08/20 06:15:24 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x130dca52 connecting to ZooKeeper ensemble=localhost:2181
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_45
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.3.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.3.jar:/usr/local/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop
/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3-tests.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-2.7.3.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.3-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.3.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop
/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.
jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hbase/lib/hbase-hadoop-compat-1.1.3.jar:/usr/local/hbase/lib/metrics-core-2.2.0.jar
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:user.name=root
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x130dca520x0, quorum=localhost:2181, baseZNode=/hbase
18/08/20 06:15:24 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/08/20 06:15:24 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
18/08/20 06:15:24 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x165541a87e60009, negotiated timeout = 40000
18/08/20 06:15:25 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table hfiletable
18/08/20 06:15:25 INFO mapreduce.HFileOutputFormat2: Configuring 1 reduce partitions to match current region count
18/08/20 06:15:25 INFO mapreduce.HFileOutputFormat2: Writing partition information to /user/root/hbase-staging/partitions_21f6d6d4-c583-4f96-a31a-ad889828c257
18/08/20 06:15:26 INFO compress.CodecPool: Got brand-new compressor [.deflate]
18/08/20 06:15:26 INFO mapreduce.HFileOutputFormat2: Incremental table hfiletable output configured.
18/08/20 06:15:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.17.108:8032
18/08/20 06:15:27 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/08/20 06:15:28 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1245)
	at java.lang.Thread.join(Thread.java:1319)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
18/08/20 06:15:28 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1245)
	at java.lang.Thread.join(Thread.java:1319)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
18/08/20 06:15:28 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1245)
	at java.lang.Thread.join(Thread.java:1319)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
18/08/20 06:15:30 INFO input.FileInputFormat: Total input paths to process : 1
18/08/20 06:15:30 INFO mapreduce.JobSubmitter: number of splits:1
18/08/20 06:15:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534714206732_0002
18/08/20 06:15:30 INFO impl.YarnClientImpl: Submitted application application_1534714206732_0002
18/08/20 06:15:31 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1534714206732_0002/
18/08/20 06:15:31 INFO mapreduce.Job: Running job: job_1534714206732_0002
18/08/20 06:15:45 INFO mapreduce.Job: Job job_1534714206732_0002 running in uber mode : false
18/08/20 06:15:45 INFO mapreduce.Job:  map 0% reduce 0%
18/08/20 06:15:53 INFO mapreduce.Job:  map 100% reduce 0%
18/08/20 06:16:05 INFO mapreduce.Job:  map 100% reduce 100%
18/08/20 06:16:05 INFO mapreduce.Job: Job job_1534714206732_0002 completed successfully
18/08/20 06:16:05 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=263
		FILE: Number of bytes written=300567
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=183
		HDFS: Number of bytes written=10115
		HDFS: Number of read operations=10
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6480
		Total time spent by all reduces in occupied slots (ms)=7719
		Total time spent by all map tasks (ms)=6480
		Total time spent by all reduce tasks (ms)=7719
		Total vcore-milliseconds taken by all map tasks=6480
		Total vcore-milliseconds taken by all reduce tasks=7719
		Total megabyte-milliseconds taken by all map tasks=6635520
		Total megabyte-milliseconds taken by all reduce tasks=7904256
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=249
		Map output materialized bytes=263
		Input split bytes=98
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=263
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=195
		CPU time spent (ms)=2870
		Physical memory (bytes) snapshot=363540480
		Virtual memory (bytes) snapshot=4150374400
		Total committed heap usage (bytes)=222429184
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=85
	File Output Format Counters 
		Bytes Written=10115
18/08/20 06:16:05 WARN mapreduce.LoadIncrementalHFiles: managed connection cannot be used for bulkload. Creating unmanaged connection.
18/08/20 06:16:05 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x70e889e9 connecting to ZooKeeper ensemble=localhost:2181
18/08/20 06:16:05 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x70e889e90x0, quorum=localhost:2181, baseZNode=/hbase
18/08/20 06:16:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/08/20 06:16:05 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
18/08/20 06:16:05 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x165541a87e6000a, negotiated timeout = 40000
18/08/20 06:16:05 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://hadoop:9000/Hdfs_to_Hbase/_SUCCESS
18/08/20 06:16:06 INFO hfile.CacheConfig: CacheConfig:disabled
18/08/20 06:16:06 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://hadoop:9000/Hdfs_to_Hbase/fm1/f1fd7bb279c04b39a13678d98a1dad19 first=key1 last=key4
18/08/20 06:16:06 INFO hfile.CacheConfig: CacheConfig:disabled
18/08/20 06:16:06 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://hadoop:9000/Hdfs_to_Hbase/fm2/879add100bdc46c5887773e9c7f13703 first=key1 last=key1
18/08/20 06:16:06 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
18/08/20 06:16:06 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x165541a87e6000a
18/08/20 06:16:06 INFO zookeeper.ZooKeeper: Session: 0x165541a87e6000a closed
18/08/20 06:16:06 INFO zookeeper.ClientCnxn: EventThread shut down

Step 7: Log in to the HBase shell and verify that the data has been inserted into the table. To make the before/after comparison clear, the earlier truncate and empty scan are shown again before the final scan.

hbase(main):006:0> truncate 'hfiletable'
Truncating 'hfiletable' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 4.2430 seconds

hbase(main):007:0> scan 'hfiletable'
ROW                             COLUMN+CELL                                                                              
0 row(s) in 0.3490 seconds

hbase(main):008:0> scan 'hfiletable'
ROW                             COLUMN+CELL                                                                              
 key1                           column=fm1:col1, timestamp=1534716962335, value=value1                                   
 key1                           column=fm1:col2, timestamp=1534716962335, value=value2                                   
 key1                           column=fm2:col1, timestamp=1534716962335, value=value99                                  
 key4                           column=fm1:col1, timestamp=1534716962335, value=value4                                   
2 row(s) in 0.0830 seconds

As you can see, the row keys, column families, and column values have been inserted into HBase exactly as laid out in the source data, and the data verification is complete.

5. Common errors caused by missing jars (these are also the errors I encountered during testing)
1 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/CompatibilityFactory
  This means hbase-hadoop-compat-1.1.3.jar is missing.
  Solution: find Hadoop's hadoop-env.sh file and add
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HBASE_HOME}/lib/hbase-hadoop-compat-1.1.3.jar

2 Exception in thread "main" java.lang.NoClassDefFoundError: com/yammer/metrics/core/MetricsRegistry
  This means metrics-core-2.2.0.jar is missing.
  Solution: find Hadoop's hadoop-env.sh file and add
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HBASE_HOME}/lib/metrics-core-2.2.0.jar

Origin: blog.csdn.net/zhaoxiangchong/article/details/81866883