Connecting Hive with HBase

Hello everyone!

Due to real project needs, I had to load data from Hive into HBase. Starting from a blog post I read online, and adding my own understanding, the related operational steps, and several common errors, I put together this post. I hope it is useful to you.

Bulk Load: best practice for importing data into HBase
1. Overview
HBase offers several ways to import data; two are commonly used:
1. TableOutputFormat, provided by HBase: data is imported into HBase through a MapReduce job.
2. The native HBase client API.
Both approaches go through the normal write path and therefore have to communicate frequently with the RegionServers that store the data. Loading a large amount of data at once this way is particularly resource-hungry, so neither method is the most efficient. Anyone familiar with HBase internals knows that HBase stores its data on HDFS as HFiles. A more efficient and convenient option is "Bulk Loading": generating the HFiles directly, using the HFileOutputFormat class provided by HBase.
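
For contrast, here is a minimal sketch of the second approach (writing rows one at a time through the native client API), assuming the HBase 1.x client API and the hfiletable table and fm1 column family used later in this post; the class name PutApiSketch is illustrative. Every Put issued this way travels through the RegionServer write path (WAL plus MemStore), which is what makes it expensive for massive loads.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutApiSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("hfiletable"))) {
            Put put = new Put(Bytes.toBytes("key1"));
            put.addColumn(Bytes.toBytes("fm1"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
            // each put is sent to the RegionServer hosting the row's region (WAL + MemStore)
            table.put(put);
        }
    }
}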

2. The basic principle of Bulk Load
Bulk Load consists of two main steps:
1. Prepare the Bulk Load data files.
The first step runs a MapReduce job that uses HFileOutputFormat to produce the HBase data files: StoreFiles.
HFileOutputFormat makes each output HFile fit within a single region. It uses the TotalOrderPartitioner class to split the map output into key ranges, each of which corresponds to a region of the target HBase table.

2. Import into the HBase table.
In the second step, the completebulkload tool hands the result files from the first step to the RegionServer responsible for each file's region and moves the files into the region's storage directory on HDFS. Once that finishes, the data is available to clients.
If region boundaries have changed while the bulk load was being prepared, or between preparation and completion of the import, the completebulkload tool automatically splits the data files along the new boundaries. This process is not the best practice, so users should minimize the delay between preparing the files and importing them into the cluster, especially when other clients are loading data into the same table with other tools at the same time.

Note:
The completebulkload step simply imports the output of importtsv or HFileOutputFormat into a table, using a command similar to the following:
hadoop jar hbase-VERSION.jar completebulkload [-c /path/to/hbase/config/hbase-site.xml] /user/todd/myoutput mytable
The command finishes very quickly; it imports the HFiles under /user/todd/myoutput into the table mytable. Note: if the target table does not exist, the tool will create it automatically.

3. Notes on the program that generates the HFiles:
1. The final output, whether it comes from map or reduce, must use the key and value types <ImmutableBytesWritable, KeyValue> or <ImmutableBytesWritable, Put>.
2. When the final output value type is KeyValue or Put, the corresponding sorting reducer is KeyValueSortReducer or PutSortReducer, respectively (see the sketch after this list).
3. In the MR example, job.setOutputFormatClass(HFileOutputFormat.class); HFileOutputFormat can only organize a single column family into an HFile at a time.
4. HFileOutputFormat.configureIncrementalLoad(job, table) in the MR example configures the job automatically. SimpleTotalOrderPartitioner first sorts the keys globally and then assigns them to the reducers, guaranteeing that the minimum-to-maximum key ranges handled by the reducers do not overlap, because inside an HBase region the keys are strictly ordered.
5. In the MR example, the generated HFiles end up on HDFS, with one subdirectory of the output path per column family. Loading the HFiles into HBase is equivalent to moving them into the HBase regions; afterwards, the column-family subdirectories under the output path are left empty.
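
To complement points 1, 2 and 4, the following is a minimal sketch of the KeyValue-based variant, assuming the HBase 1.x API; the class and mapper names (KeyValueBulkLoadSketch, KVMapper) are illustrative, and the input is the same tab-separated layout used in the hands-on section below. Because the map output value class is KeyValue, configureIncrementalLoad selects KeyValueSortReducer on its own.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeyValueBulkLoadSketch {

    public static class KVMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // expected input line: rowkey <TAB> family:qualifier <TAB> value
            String[] fields = value.toString().split("\t");
            byte[] row = Bytes.toBytes(fields[0]);
            String[] famQual = fields[1].split(":");
            KeyValue kv = new KeyValue(row, Bytes.toBytes(famQual[0]),
                    Bytes.toBytes(famQual[1]), Bytes.toBytes(fields[2]));
            context.write(new ImmutableBytesWritable(row), kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "KeyValueBulkLoadSketch");
        job.setJarByClass(KeyValueBulkLoadSketch.class);
        job.setMapperClass(KVMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // configureIncrementalLoad sets HFileOutputFormat2, TotalOrderPartitioner,
        // one reduce partition per region, and KeyValueSortReducer (KeyValue output).
        HTable table = new HTable(conf, args[2]);
        HFileOutputFormat2.configureIncrementalLoad(job, table);
        boolean ok = job.waitForCompletion(true);
        table.close();
        System.exit(ok ? 0 : 1);
    }
}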

4. Hands-on practice

Step 1: create the test data and upload it to HDFS (for example with hadoop fs -put hbase.txt /test/hbase.txt):

[root@hadoop test]# hadoop fs -cat /test/hbase.txt
key1	fm1:col1	value1
key1	fm1:col2	value2
key1	fm2:col1	value99
key4	fm1:col1	value4

Step 2: create the required table in HBase

-- create the table
create 'hfiletable','fm1','fm2'

Check the table just created in HBase

hbase(main):006:0> truncate 'hfiletable'
Truncating 'hfiletable' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 4.2430 seconds

hbase(main):007:0> scan 'hfiletable'
ROW                             COLUMN+CELL                                                                              
0 row(s) in 0.3490 seconds

Explanation: the truncate is there to clear out any historical data; if the table has just been created, this step is not necessary.

Step 3: create the BulkLoadJob class in Eclipse; this is the program that generates and loads the HFiles

package day01;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class BulkLoadJob {
    static Logger logger = LoggerFactory.getLogger(BulkLoadJob.class);

    public static class BulkLoadMap extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] valueStrSplit = value.toString().split("\t");
            String hkey = valueStrSplit[0];
            String family = valueStrSplit[1].split(":")[0];
            String column = valueStrSplit[1].split(":")[1];
            String hvalue = valueStrSplit[2];
            final byte[] rowKey = Bytes.toBytes(hkey);
            final ImmutableBytesWritable HKey = new ImmutableBytesWritable(rowKey);
            Put HPut = new Put(rowKey);
            byte[] cell = Bytes.toBytes(hvalue);
            HPut.add(Bytes.toBytes(family), Bytes.toBytes(column), cell);
            context.write(HKey, HPut);

        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String inputPath = args[0];
        String outputPath = args[1];
        HTable hTable = null;
        try {
            Job job = Job.getInstance(conf, "ExampleRead");
            job.setJarByClass(BulkLoadJob.class);
            job.setMapperClass(BulkLoadJob.BulkLoadMap.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            // disable speculative execution so duplicate task attempts
            // do not write duplicate HFiles
            job.setSpeculativeExecution(false);
            job.setReduceSpeculativeExecution(false);
            // in/out format
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(HFileOutputFormat2.class);

            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, new Path(outputPath));

            hTable = new HTable(conf, args[2]);
            HFileOutputFormat2.configureIncrementalLoad(job, hTable);

            if (job.waitForCompletion(true)) {
                FsShell shell = new FsShell(conf);
                try {
                    shell.run(new String[]{"-chmod", "-R", "777", args[1]});
                } catch (Exception e) {
                    logger.error("Couldnt change the file permissions ", e);
                    throw new IOException(e);
                }
                // load the generated HFiles into the HBase table
                LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                loader.doBulkLoad(new Path(outputPath), hTable);
            } else {
                logger.error("loading failed.");
                System.exit(1);
            }

        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } finally {
            if (hTable != null) {
                hTable.close();
            }
        }
    }
}

Step 4: package the class created in Step 3 into Hdfs_To_Hbase.jar, upload it to the /root/test directory, and verify it as follows

[root@hadoop test]# ls -l /root/test/Hdfs_To_Hbase.jar
-rw-r--r-- 1 root root 44481940 Aug 20 06:14 /root/test/Hdfs_To_Hbase.jar

Step 5: on Linux, run the jar you just uploaded

hadoop jar /root/test/Hdfs_To_Hbase.jar /test/hbase.txt /Hdfs_to_Hbase hfiletable >123.log 2>&1 &

Parameter description:
 1. /root/test/Hdfs_To_Hbase.jar is the name of the jar package.
 2. /test/hbase.txt is the HDFS data source to be imported.
 3. /Hdfs_to_Hbase is the temporary HDFS directory where the HFiles generated by each MR run are stored; this directory must be deleted before running (for example with hadoop fs -rm -r /Hdfs_to_Hbase).
 4. hfiletable is the name of the HBase table, which must be created in HBase beforehand.

Step 6: check the job log running in the background; the file is fairly large, so be patient and read it through to the end

[root@hadoop ~]# cat 123.log 
18/08/20 06:15:24 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x130dca52 connecting to ZooKeeper ensemble=localhost:2181
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_45
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jre
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.3.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.3.jar:/usr/local/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop
/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3-tests.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-2.7.3.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.3-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.3.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop
/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.
jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hbase/lib/hbase-hadoop-compat-1.1.3.jar:/usr/local/hbase/lib/metrics-core-2.2.0.jar
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:user.name=root
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
18/08/20 06:15:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x130dca520x0, quorum=localhost:2181, baseZNode=/hbase
18/08/20 06:15:24 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/08/20 06:15:24 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
18/08/20 06:15:24 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x165541a87e60009, negotiated timeout = 40000
18/08/20 06:15:25 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table hfiletable
18/08/20 06:15:25 INFO mapreduce.HFileOutputFormat2: Configuring 1 reduce partitions to match current region count
18/08/20 06:15:25 INFO mapreduce.HFileOutputFormat2: Writing partition information to /user/root/hbase-staging/partitions_21f6d6d4-c583-4f96-a31a-ad889828c257
18/08/20 06:15:26 INFO compress.CodecPool: Got brand-new compressor [.deflate]
18/08/20 06:15:26 INFO mapreduce.HFileOutputFormat2: Incremental table hfiletable output configured.
18/08/20 06:15:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.17.108:8032
18/08/20 06:15:27 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/08/20 06:15:28 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1245)
	at java.lang.Thread.join(Thread.java:1319)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
18/08/20 06:15:28 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1245)
	at java.lang.Thread.join(Thread.java:1319)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
18/08/20 06:15:28 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1245)
	at java.lang.Thread.join(Thread.java:1319)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
18/08/20 06:15:30 INFO input.FileInputFormat: Total input paths to process : 1
18/08/20 06:15:30 INFO mapreduce.JobSubmitter: number of splits:1
18/08/20 06:15:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534714206732_0002
18/08/20 06:15:30 INFO impl.YarnClientImpl: Submitted application application_1534714206732_0002
18/08/20 06:15:31 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1534714206732_0002/
18/08/20 06:15:31 INFO mapreduce.Job: Running job: job_1534714206732_0002
18/08/20 06:15:45 INFO mapreduce.Job: Job job_1534714206732_0002 running in uber mode : false
18/08/20 06:15:45 INFO mapreduce.Job:  map 0% reduce 0%
18/08/20 06:15:53 INFO mapreduce.Job:  map 100% reduce 0%
18/08/20 06:16:05 INFO mapreduce.Job:  map 100% reduce 100%
18/08/20 06:16:05 INFO mapreduce.Job: Job job_1534714206732_0002 completed successfully
18/08/20 06:16:05 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=263
		FILE: Number of bytes written=300567
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=183
		HDFS: Number of bytes written=10115
		HDFS: Number of read operations=10
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6480
		Total time spent by all reduces in occupied slots (ms)=7719
		Total time spent by all map tasks (ms)=6480
		Total time spent by all reduce tasks (ms)=7719
		Total vcore-milliseconds taken by all map tasks=6480
		Total vcore-milliseconds taken by all reduce tasks=7719
		Total megabyte-milliseconds taken by all map tasks=6635520
		Total megabyte-milliseconds taken by all reduce tasks=7904256
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=249
		Map output materialized bytes=263
		Input split bytes=98
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=263
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=195
		CPU time spent (ms)=2870
		Physical memory (bytes) snapshot=363540480
		Virtual memory (bytes) snapshot=4150374400
		Total committed heap usage (bytes)=222429184
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=85
	File Output Format Counters 
		Bytes Written=10115
18/08/20 06:16:05 WARN mapreduce.LoadIncrementalHFiles: managed connection cannot be used for bulkload. Creating unmanaged connection.
18/08/20 06:16:05 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x70e889e9 connecting to ZooKeeper ensemble=localhost:2181
18/08/20 06:16:05 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x70e889e90x0, quorum=localhost:2181, baseZNode=/hbase
18/08/20 06:16:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/08/20 06:16:05 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
18/08/20 06:16:05 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x165541a87e6000a, negotiated timeout = 40000
18/08/20 06:16:05 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://hadoop:9000/Hdfs_to_Hbase/_SUCCESS
18/08/20 06:16:06 INFO hfile.CacheConfig: CacheConfig:disabled
18/08/20 06:16:06 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://hadoop:9000/Hdfs_to_Hbase/fm1/f1fd7bb279c04b39a13678d98a1dad19 first=key1 last=key4
18/08/20 06:16:06 INFO hfile.CacheConfig: CacheConfig:disabled
18/08/20 06:16:06 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://hadoop:9000/Hdfs_to_Hbase/fm2/879add100bdc46c5887773e9c7f13703 first=key1 last=key1
18/08/20 06:16:06 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
18/08/20 06:16:06 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x165541a87e6000a
18/08/20 06:16:06 INFO zookeeper.ZooKeeper: Session: 0x165541a87e6000a closed
18/08/20 06:16:06 INFO zookeeper.ClientCnxn: EventThread shut down

Step 7: log in to the HBase shell and check whether the data was inserted into the corresponding table. To make the correctness of the data easier to see, the output below repeats the earlier truncate and empty scan before showing the scan taken after the load.

hbase(main):006:0> truncate 'hfiletable'
Truncating 'hfiletable' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 4.2430 seconds

hbase(main):007:0> scan 'hfiletable'
ROW                             COLUMN+CELL                                                                              
0 row(s) in 0.3490 seconds

hbase(main):008:0> scan 'hfiletable'
ROW                             COLUMN+CELL                                                                              
 key1                           column=fm1:col1, timestamp=1534716962335, value=value1                                   
 key1                           column=fm1:col2, timestamp=1534716962335, value=value2                                   
 key1                           column=fm2:col1, timestamp=1534716962335, value=value99                                  
 key4                           column=fm1:col1, timestamp=1534716962335, value=value4                                   
2 row(s) in 0.0830 seconds

As you can see, the row keys, column families, and column values have been inserted into HBase according to the format of the data source, and the data quality check is complete.
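
If you prefer to verify from code rather than the shell, here is a minimal sketch, assuming the HBase 1.x client API and the hfiletable table used above; the class name VerifyBulkLoad is illustrative. It scans the table and prints every cell.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VerifyBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("hfiletable"));
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result result : scanner) {
                for (Cell cell : result.rawCells()) {
                    // print row, family:qualifier and value for each loaded cell
                    System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "  "
                            + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                            + Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
                            + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
    }
}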
 

5. Common missing-jar errors; these are also the errors I ran into while testing
1. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/CompatibilityFactory
   Missing jar: hbase-hadoop-compat-1.1.3.jar
   Solution: edit Hadoop's hadoop-env.sh file and add
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HBASE_HOME}/lib/hbase-hadoop-compat-1.1.3.jar

2. Exception in thread "main" java.lang.NoClassDefFoundError: com/yammer/metrics/core/MetricsRegistry
   Missing jar: metrics-core-2.2.0.jar
   Solution: edit Hadoop's hadoop-env.sh file and add
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HBASE_HOME}/lib/metrics-core-2.2.0.jar

 

 

Original post: blog.csdn.net/zhaoxiangchong/article/details/81866883