Importing MySQL data into HBase with Sqoop

Sqoop: http://mirror.bit.edu.cn/apache/sqoop/1.4.1-incubating/sqoop-1.4.1-incubating__hadoop-0.20.tar.gz

MySQL Connector/J: http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.20.tar.gz/from/http://mysql.ntu.edu.tw/

Install Sqoop and HBase first.

Download the JDBC driver: mysql-connector-java-5.1.20.jar

Copy mysql-connector-java-5.1.20.jar into Sqoop's lib directory, along with the HBase and ZooKeeper jars.

 

Run:

./sqoop import --connect jdbc:mysql://10.20.147.3/pm2 --username pm --password ×× --table acookie --hbase-table acookie  --column-family acookie --hbase-create-table

 

12/05/24 14:49:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

12/05/24 14:49:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

12/05/24 14:49:55 INFO tool.CodeGenTool: Beginning code generation

12/05/24 14:49:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `acookie` AS t LIMIT 1

12/05/24 14:49:55 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.203.0/bin/..

Note: /tmp/sqoop-hadoop/compile/d75d3a2bf713dd671b174830abc3da31/acookie.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

12/05/24 14:49:56 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-hadoop/compile/d75d3a2bf713dd671b174830abc3da31/acookie.java to /home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/./acookie.java

12/05/24 14:49:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/d75d3a2bf713dd671b174830abc3da31/acookie.jar

12/05/24 14:49:56 WARN manager.MySQLManager: It looks like you are importing from mysql.

12/05/24 14:49:56 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

12/05/24 14:49:56 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

12/05/24 14:49:56 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

12/05/24 14:50:04 INFO mapreduce.ImportJobBase: Beginning import of acookie

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:host.name=node1

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_18

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/alibaba/java/jre

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-0.20.203.0/bin/../conf:/usr/alibaba/java/lib/tools.jar:/home/hadoop/hadoop-0.20.203.0/bin/..:/home/hadoop/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-0.20.20
3.0/bin/../lib/jsch-0.1.42.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../conf::/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/ant-contrib-1.0b3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/avro-1.5.3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/avro-ipc-1.5.3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/avro-mapred-1.5.3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/commons-io-1.4.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/hbase-0.90.4.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/jackson-core-asl-1.7.3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/jackson-mapper-asl-1.7.3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/jopt-simple-3.2.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/mysql-connector-java-5.1.20-bin.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/paranamer-2.3.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/snappy-java-1.0.3.2.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../lib/zookeeper-3.3.2.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../sqoop-1.4.1-incubati
ng.jar:/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin/../sqoop-test-1.4.1-incubating.jar:

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-131.el5.customxen

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/sqoop-1.4.1-incubating__hadoop-0.20/bin

12/05/24 14:50:10 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection

12/05/24 14:50:10 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181

12/05/24 14:50:10 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session

12/05/24 14:50:10 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x3777b6dcd00029, negotiated timeout = 180000

12/05/24 14:50:10 INFO mapreduce.HBaseImportJob: Creating missing column family acookie

12/05/24 14:50:10 INFO client.HBaseAdmin: Started disable of acookie

12/05/24 14:50:12 INFO client.HBaseAdmin: Disabled acookie

12/05/24 14:50:12 INFO client.HBaseAdmin: Started enable of acookie

12/05/24 14:50:14 INFO client.HBaseAdmin: Enabled table acookie

12/05/24 14:50:15 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`ACOOKIE_ID`), MAX(`ACOOKIE_ID`) FROM `acookie`

12/05/24 14:50:15 WARN db.TextSplitter: Generating splits for a textual index column.

12/05/24 14:50:15 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

12/05/24 14:50:15 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

12/05/24 14:50:16 INFO mapred.JobClient: Running job: job_201205231120_0008

12/05/24 14:50:17 INFO mapred.JobClient:  map 0% reduce 0%

12/05/24 14:50:34 INFO mapred.JobClient:  map 16% reduce 0%

12/05/24 14:50:37 INFO mapred.JobClient:  map 33% reduce 0%

12/05/24 14:50:38 INFO mapred.JobClient:  map 66% reduce 0%

12/05/24 14:50:40 INFO mapred.JobClient:  map 83% reduce 0%

12/05/24 14:50:43 INFO mapred.JobClient:  map 100% reduce 0%

12/05/24 14:50:48 INFO mapred.JobClient: Job complete: job_201205231120_0008

12/05/24 14:50:48 INFO mapred.JobClient: Counters: 13

12/05/24 14:50:48 INFO mapred.JobClient:   Job Counters 

12/05/24 14:50:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=35523

12/05/24 14:50:48 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/05/24 14:50:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/05/24 14:50:48 INFO mapred.JobClient:     Launched map tasks=6

12/05/24 14:50:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

12/05/24 14:50:48 INFO mapred.JobClient:   File Output Format Counters 

12/05/24 14:50:48 INFO mapred.JobClient:     Bytes Written=0

12/05/24 14:50:48 INFO mapred.JobClient:   FileSystemCounters

12/05/24 14:50:48 INFO mapred.JobClient:     HDFS_BYTES_READ=877

12/05/24 14:50:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=226202

12/05/24 14:50:48 INFO mapred.JobClient:   File Input Format Counters 

12/05/24 14:50:48 INFO mapred.JobClient:     Bytes Read=0

12/05/24 14:50:48 INFO mapred.JobClient:   Map-Reduce Framework

12/05/24 14:50:48 INFO mapred.JobClient:     Map input records=72

12/05/24 14:50:48 INFO mapred.JobClient:     Spilled Records=0

12/05/24 14:50:48 INFO mapred.JobClient:     Map output records=72

12/05/24 14:50:48 INFO mapred.JobClient:     SPLIT_RAW_BYTES=877

12/05/24 14:50:48 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 33.7181 seconds (0 bytes/sec)

12/05/24 14:50:48 INFO mapreduce.ImportJobBase: Retrieved 72 records.

In the HBase shell, run list to confirm the table has been created, then scan it, or:

get 'acookie','7JzGBSJKrh0CAccdAHmqwaeu'

COLUMN                           CELL                                                                                        

 acookie:ALIIM_ID                timestamp=1337842232403, value=cnalichnmyetptest19                                          

 acookie:CREATOR                 timestamp=1337842232403, value=sys                                                          

 acookie:GMT_CREATE              timestamp=1337842232403, value=2011-01-26 13:34:15.0                                        

 acookie:GMT_MODIFIED            timestamp=1337842232403, value=2011-01-26 16:11:01.0                                        

 acookie:MODIFIER                timestamp=1337842232403, value=sys                                                          

 acookie:corp_name               timestamp=1337842232403, value=\xE4\xB8\xAD\xE5\x9B\xBDmyetptest19&\xE8\x9C\xBB\xE8\x84\xB2\

                                 xE9\x9C\x81\xE6\x92\xAC\xE6\x9C\x89\xE9\x99\x90\xE5\x85\xAC\xE5\x8F\xB8                     

 acookie:email                   timestamp=1337842232403, [email protected]                                        

 acookie:gender                  timestamp=1337842232403, value=M                                                            

 acookie:phone                   timestamp=1337842232403, value=86_0571_ 45683158                                            

 acookie:user_name               timestamp=1337842232403, value=\xE5\xB7\xA1\xE5\x9C\x8A\xE7\x8A\x81                         

10 row(s) in 0.0410 seconds

Appendix: import command reference from the web:

Sqoop Import Examples:

Sqoop Import: import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS) or its subprojects (Hive, HBase).

Import a MySQL table into HBase:

Case 1: The table has a primary key; import all columns of the MySQL table into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName  --column-family hbase_table_col1 --hbase-create-table

Case 2: The table has a primary key; import only a few columns of the MySQL table into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table 

Note: the column list given in --columns must include the primary key column.

Case 3: The table has no primary key, so choose one column as the HBase row key; import all columns of the MySQL table into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table

Case 4: The table has no primary key, so choose one column as the HBase row key; import only a few columns of the MySQL table into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table  

Note: the column named in --hbase-row-key must appear in the --columns list. Otherwise the command executes successfully but no records are inserted into HBase.

Note: the value of the primary key column, or of the column given in --hbase-row-key, becomes the HBase row key. If the MySQL table has no primary key and the column specified in --hbase-row-key does not contain unique values, some records will be lost.

Example: consider a MySQL table test_table with two columns, name and address. test_table has no primary key or unique key column.

Records of test_table:

name    address
----------------
abc     123
sqw     345
abc     125
sdf     1234
aql     23dw

Run the following command to import test_table data into HBase:

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table test_table --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key name --hbase-create-table

Only 4 records appear in the HBase table instead of 5. In the example above, two rows share the value 'abc' in the name column, and that column's value is used as the HBase row key. The first record with name 'abc' is inserted; when the second one arrives, it overwrites the first, because HBase stores exactly one row per row key.

The same problem occurs when the table has a composite primary key, because only one column of the composite key is used as the HBase row key.
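This last-write-wins behavior can be sketched with a quick shell pipeline over the five sample rows above (a simulation only; awk's associative array stands in for HBase's one-row-per-key storage):

```shell
# Deduplicate on the first column the way an HBase row key would:
# a duplicate key ('abc') silently overwrites the earlier row.
printf 'abc 123\nsqw 345\nabc 125\nsdf 1234\naql 23dw\n' |
  awk '{ if (!($1 in rows)) n++; rows[$1] = $2 }
       END { print n " distinct row keys; abc -> " rows["abc"] }'
```

Running it prints "4 distinct row keys; abc -> 125": the second 'abc' row replaced the first, matching the 4-of-5 result described above.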

Import a MySQL table into Hive

Case 1: Import a MySQL table into Hive when the table has a primary key.

$ bin/sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home

Case 2: Import a MySQL table into Hive when the table has no primary key.

$ bin/sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home --split-by column_name

or 

$ bin/sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1

Import a MySQL table into HDFS

Case 1: Import a MySQL table into HDFS when the table has a primary key.

$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName 

Case 2: Import a MySQL table into HDFS when the table has no primary key.

$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName  -m 1

Sqoop Export Examples:

Sqoop Export: export data from HDFS and its subprojects (Hive, HBase) back into an RDBMS.

Export Hive table back to an RDBMS:

By default, Hive stores data using ^A as the field delimiter and \n as the row delimiter.

$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName  --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'

where '\001' is the octal representation of ^A.
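To see the delimiter concretely, here is a small sketch (plain shell, nothing Sqoop-specific): '\001' is the single byte 0x01, and cut can split a Hive-style row on it:

```shell
# Build a two-field "Hive row" joined by the ^A byte (octal 001),
# then extract each field with cut using that byte as the delimiter.
row=$(printf 'alice\001wonderland')
delim=$(printf '\001')
printf '%s\n' "$row" | cut -d"$delim" -f1   # first field:  alice
printf '%s\n' "$row" | cut -d"$delim" -f2   # second field: wonderland
```

This is the same byte that --input-fields-terminated-by '\001' tells Sqoop to expect when reading the exported Hive warehouse files.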


Reposted from sunqi.iteye.com/blog/1539730