Quick Learning Big Data -- Sqoop (Part 17)

Copyright notice: this is an original post by the author; do not reproduce without permission. https://blog.csdn.net/xfg0218/article/details/82343564

Sqoop Summary

Official site: http://sqoop.apache.org/

Overview

Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.

Import: moves data from relational databases such as MySQL and Oracle into Hadoop storage systems such as HDFS, Hive, and HBase;

Export: moves data from the Hadoop file system out to relational databases such as MySQL.

 

 

How It Works

Sqoop implements each import or export command by translating it into a MapReduce job.

In the generated MapReduce job, the work is mainly done by customizing the InputFormat and OutputFormat.
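As a side note, you can inspect the record class that Sqoop generates for a table (the class the MapReduce job uses to read and write rows) with the codegen tool. A sketch reusing the connection settings from the import examples later in this chapter:

[root@hadoop1 bin]# ./sqoop codegen --connect jdbc:mysql://localhost:3306/mysql --username it --password it --table user

The generated user.java and user.jar end up under a compile directory such as /tmp/sqoop-root/compile/ (the same location shown later in the job --show output).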

 

Installing Sqoop

Installing Sqoop requires that Java and Hadoop are already set up.

1-1) Download and unpack

Download:

Link: http://pan.baidu.com/s/1ge3wD1p  Password: 0dgs  (contact the author if the download does not work).

 

[root@hadoop1 sqoop]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

[root@hadoop1 sqoop]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/  sqoop

[root@hadoop1 sqoop]# cd conf/

1-2) Edit the configuration file

[root@hadoop1 conf]# cp sqoop-env-template.sh  sqoop-env.sh

[root@hadoop1 conf]# vi sqoop-env.sh

export HADOOP_COMMON_HOME=/usr/local/hadoop-2.6.4

export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.6.4

export HIVE_HOME=/usr/local/hive

1-3) Add the MySQL JDBC driver

[root@hadoop1 lib]# cp /usr/local/hive/lib/mysql-connector-java-5.1.39.jar  /usr/local/sqoop/lib/

1-4) Start and verify

[root@hadoop1 conf]# cd ../bin/

[root@hadoop1 bin]# ./sqoop-version

Warning: /usr/local/sqoop/bin/../../hbase does not exist! HBase imports will fail.

Please set $HBASE_HOME to the root of your HBase installation.

Warning: /usr/local/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /usr/local/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

16/09/28 16:05:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Sqoop 1.4.6

git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25

Compiled by root on Mon Apr 27 14:38:36 CST 2015

 

 

The warnings above can be ignored.
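If HBase, HCatalog, or Accumulo are actually installed and you want the warnings to go away, point the corresponding variables at those installations in sqoop-env.sh (the paths below are placeholders; adjust them to your environment):

export HBASE_HOME=/usr/local/hbase
export HCAT_HOME=/usr/local/hive/hcatalog
export ACCUMULO_HOME=/usr/local/accumulo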

 

Importing Data with Sqoop

Sqoop supports importing and exporting data between MySQL and HDFS/Hive; the examples below also use the TBLS table from the Hive metastore database.

 

The following syntax imports data into HDFS:

$ sqoop import (generic-args) (import-args)

 

Source table data

The examples use the data that ships with MySQL, e.g. the user and db tables in the built-in mysql system database.

 

Import into the default HDFS directory

1-1) Import the data

[root@hadoop1 bin]# ./sqoop import   --connect jdbc:mysql://localhost:3306/mysql   --username it  --password it   --table user   --m 1

 

********

Map-Reduce Framework

Map input records=11

Map output records=11

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=72

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=59621376

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1386

16/09/28 16:36:15 INFO mapreduce.ImportJobBase: Transferred 1.3535 KB in 26.3647 seconds (52.5704 bytes/sec)

16/09/28 16:36:15 INFO mapreduce.ImportJobBase: Retrieved 11 records.

 

--connect: the JDBC connection string

--username: the connection user name

--password: the connection password

--table: the table to operate on

--m: the number of map tasks to use
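As the Sqoop warning later in this chapter points out, putting the password on the command line is insecure. A variant of the same import that prompts for the password interactively with -P would look roughly like this (a sketch, otherwise identical to the command above):

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it -P --table user --m 1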

1-2) View the data

[root@hadoop1 bin]# hadoop fs -ls  /user/root/user

Found 2 items

-rw-r--r--   3 root supergroup          0 2016-09-28 16:36 /user/root/user/_SUCCESS

-rw-r--r--   3 root supergroup       1386 2016-09-28 16:36 /user/root/user/part-m-00000

 

 

[root@hadoop1 bin]# hadoop fs -cat  /user/root/user/part-m-00000

localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

hadoop1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

127.0.0.1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

::1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

localhost,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N

hadoop1,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N

%,myuser,*FABE5482D5AADF36D028AC443D117BE1180B9725,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

%,root,*81F5E21E35407D884A6CD4A731AEBFB6AF209E1B,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

%,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

%,itoffice,*FE2B2BCDE5445749A0C82A89B738BD0BD6B1F407,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

localhost,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

 

Import into a specified HDFS directory

1-1) Import the data

The data imported below is separated by commas (Sqoop's default field delimiter).

 

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/mysql  --username it  --password it  --target-dir /sqoop  --table user  --m 1


************

Map-Reduce Framework

Map input records=11

Map output records=11

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=76

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=56848384

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1386

16/09/28 16:55:35 INFO mapreduce.ImportJobBase: Transferred 1.3535 KB in 18.283 seconds (75.8079 bytes/sec)

16/09/28 16:55:35 INFO mapreduce.ImportJobBase: Retrieved 11 records.

 

 

1-2) View the data

[root@hadoop1 testData]# hadoop fs -ls /sqoop

Found 2 items

-rw-r--r--   3 root supergroup          0 2016-09-28 16:55 /sqoop/_SUCCESS

-rw-r--r--   3 root supergroup       1386 2016-09-28 16:55 /sqoop/part-m-00000

 

 

 

 

 

[root@hadoop1 testData]# hadoop fs -cat  /sqoop/part-m-00000

localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

hadoop1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

127.0.0.1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

::1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

localhost,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N

hadoop1,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N

%,myuser,*FABE5482D5AADF36D028AC443D117BE1180B9725,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

%,root,*81F5E21E35407D884A6CD4A731AEBFB6AF209E1B,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

%,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

%,itoffice,*FE2B2BCDE5445749A0C82A89B738BD0BD6B1F407,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

localhost,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N

 

Import a relational table into Hive

View the source data

 

 

1-1) Import the data

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it --password it --table user --hive-import --m 1

 

****************

 

16/09/28 17:43:22 INFO tool.CodeGenTool: Beginning code generation

16/09/28 17:43:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1

16/09/28 17:43:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1

********************

16/09/28 17:47:02 INFO hive.HiveImport: Time taken: 18.294 seconds

16/09/28 17:47:04 INFO hive.HiveImport: Loading data to table default.tbls

16/09/28 17:47:10 INFO hive.HiveImport: Table default.tbls stats: [numFiles=1, totalSize=710]

16/09/28 17:47:10 INFO hive.HiveImport: OK

16/09/28 17:47:10 INFO hive.HiveImport: Time taken: 7.94 seconds

16/09/28 17:47:12 INFO hive.HiveImport: Hive import complete.

16/09/28 17:47:12 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

 

The log lines above (e.g. "Loading data to table default.tbls") show where the data was saved.

1-2) View the data

hive> show databases;

OK

default

Time taken: 0.418 seconds, Fetched: 5 row(s)

hive> use default;

OK

Time taken: 0.363 seconds

hive> show tables;

OK

tbls

Time taken: 0.104 seconds, Fetched: 1 row(s)

hive> select * from tbls;

OK

1 1474767895 2 0 root 0 1 hive MANAGED_TABLE null null

2 1474769518 2 0 root 0 2 t_test MANAGED_TABLE null null

6 1474773899 2 0 root 0 6 t_partition MANAGED_TABLE null null

7 1474773977 2 0 root 0 7 t_part MANAGED_TABLE null null

8 1474804219 2 0 root 0 10 td_ext_a EXTERNAL_TABLE null null

9 1474805273 2 0 root 0 11 td_ext_b EXTERNAL_TABLE null null

10 1474805448 2 0 root 0 12 td_ext_c MANAGED_TABLE null null

11 1474805587 2 0 root 0 13 stu__buck MANAGED_TABLE null null

16 1475100139 6 0 root 0 16 hive MANAGED_TABLE null null

21 1475101361 11 0 root 0 21 hive1 MANAGED_TABLE null null

26 1475101790 16 0 root 0 26 hive2 MANAGED_TABLE null null

27 1475101831 16 0 root 0 27 hive3 MANAGED_TABLE null null

Time taken: 6.308 seconds, Fetched: 12 row(s)

 

 

 

Import data with a filter

1-1) WHERE condition query

select * from TBLS where TBL_NAME = 't_part'

 

 

 

1-1) Import the data

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it  --where "TBL_NAME ='t_part'"  --target-dir /wherequery --table TBLS --m 1

 

******************

16/09/28 21:51:51 INFO mapred.MapTask: Processing split: 1=1 AND 1=1

16/09/28 21:51:52 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1

16/09/28 21:51:52 INFO db.DBRecordReader: Executing query: SELECT `TBL_ID`, `CREATE_TIME`, `DB_ID`, `LAST_ACCESS_TIME`, `OWNER`, `RETENTION`, `SD_ID`, `TBL_NAME`, `TBL_TYPE`, `VIEW_EXPANDED_TEXT`, `VIEW_ORIGINAL_TEXT` FROM `TBLS` AS `TBLS` WHERE ( TBL_NAME ='t_part' ) AND ( 1=1 ) AND ( 1=1 )

 

******************

16/09/28 21:51:56 INFO mapreduce.ImportJobBase: Transferred 57 bytes in 34.7512 seconds (1.6402 bytes/sec)

16/09/28 21:51:56 INFO mapreduce.ImportJobBase: Retrieved 1 records.

 

1-2) View the data

[root@hadoop1 testData]# hadoop fs -cat /wherequery/part-m-00000

7,1474773977,2,0,root,0,7,t_part,MANAGED_TABLE,null,null

 

--where: the filter condition to apply
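--where can also be combined with --columns to pull only specific fields. A sketch (the column list, filter, and target directory below are made up for illustration):

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS --columns "TBL_ID,TBL_NAME,TBL_TYPE" --where "TBL_TYPE = 'MANAGED_TABLE'" --target-dir /where_managed --m 1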

 

 

 

1-1) SELECT query

1-1) Import the data

 

 

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --target-dir /select --query 'select TBL_ID,TBL_NAME,TBL_TYPE from TBLS WHERE  TBL_ID>2 and $CONDITIONS' --split-by id --fields-terminated-by '\t' --m 1

 

**************

16/09/28 22:24:33 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1

16/09/28 22:24:33 INFO db.DBRecordReader: Executing query: select TBL_ID,TBL_NAME,TBL_TYPE from TBLS WHERE  TBL_ID>2 and ( 1=1 ) AND ( 1=1 )

16/09/28 22:24:33 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false

 

**************

16/09/28 22:24:37 INFO mapreduce.ImportJobBase: Transferred 269 bytes in 25.5587 seconds (10.5248 bytes/sec)

16/09/28 22:24:37 INFO mapreduce.ImportJobBase: Retrieved 11 records.

 

--query: read data using the given query statement

--split-by: the column used to split the work across map tasks

--fields-terminated-by: the field delimiter to use for the output
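One caveat with --query: if the statement is wrapped in double quotes instead of single quotes, $CONDITIONS must be escaped as \$CONDITIONS so the shell does not expand it. A sketch equivalent to the command above (the /select2 target directory is made up, and --split-by here uses the table's actual key column):

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --target-dir /select2 --query "select TBL_ID,TBL_NAME,TBL_TYPE from TBLS WHERE TBL_ID>2 and \$CONDITIONS" --split-by TBL_ID --fields-terminated-by '\t' --m 1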

 

1-2) View the data

[root@hadoop1 testData]# hadoop fs -cat /select/part-m-00000

6 t_partition MANAGED_TABLE

7 t_part MANAGED_TABLE

8 td_ext_a EXTERNAL_TABLE

9 td_ext_b EXTERNAL_TABLE

10 td_ext_c MANAGED_TABLE

11 stu__buck MANAGED_TABLE

16 hive MANAGED_TABLE

21 hive1 MANAGED_TABLE

26 hive2 MANAGED_TABLE

27 hive3 MANAGED_TABLE

31 tbls MANAGED_TABLE

 

Incremental import

Incremental import means importing a table in batches, so a very large table does not have to be re-imported in full each time. It is defined by adding the following options to the command:

--incremental <mode>     the incremental mode (append or lastmodified)

--check-column <column name>   the column used to decide which rows are new

--last-value <last check column value>  import only rows whose check column value is greater than this

 

1-1) Import the data

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS  --m 1 --incremental append --check-column TBL_ID --last-value 10

**************

16/09/28 22:38:54 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`TBL_ID`) FROM `TBLS`

16/09/28 22:38:54 INFO tool.ImportTool: Incremental import based on column `TBL_ID`

16/09/28 22:38:54 INFO tool.ImportTool: Lower bound value: 10

16/09/28 22:38:54 INFO tool.ImportTool: Upper bound value: 31

16/09/28 22:38:54 WARN manager.MySQLManager: It looks like you are importing from mysql.

*************

16/09/28 22:39:15 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1

16/09/28 22:39:15 INFO db.DBRecordReader: Executing query: SELECT `TBL_ID`, `CREATE_TIME`, `DB_ID`, `LAST_ACCESS_TIME`, `OWNER`, `RETENTION`, `SD_ID`, `TBL_NAME`, `TBL_TYPE`, `VIEW_EXPANDED_TEXT`, `VIEW_ORIGINAL_TEXT` FROM `TBLS` AS `TBLS` WHERE ( `TBL_ID` > 10 AND `TBL_ID` <= 31 ) AND ( 1=1 ) AND ( 1=1 )

**************

16/09/28 22:39:19 INFO tool.ImportTool:  --incremental append

16/09/28 22:39:19 INFO tool.ImportTool:   --check-column TBL_ID

16/09/28 22:39:19 INFO tool.ImportTool:   --last-value 31

16/09/28 22:39:19 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

1-2) View the data

[root@hadoop1 testData]# hadoop fs -cat /user/root/TBLS/part-m-00000

11,1474805587,2,0,root,0,13,stu__buck,MANAGED_TABLE,null,null

16,1475100139,6,0,root,0,16,hive,MANAGED_TABLE,null,null

21,1475101361,11,0,root,0,21,hive1,MANAGED_TABLE,null,null

26,1475101790,16,0,root,0,26,hive2,MANAGED_TABLE,null,null

27,1475101831,16,0,root,0,27,hive3,MANAGED_TABLE,null,null

31,1475110018,1,0,root,0,31,tbls,MANAGED_TABLE,null,null

 

1-3) Incremental import by timestamp

--incremental lastmodified --check-column created --last-value '2012-02-01 11:00:00'

This imports only the rows whose created column is greater than '2012-02-01 11:00:00'.
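A full command in lastmodified mode might look like the sketch below. It assumes a table with a TIMESTAMP/DATETIME column named created (the table name, column, and target directory here are hypothetical; the TBLS metastore table used elsewhere in this chapter has no such column):

[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table orders --m 1 --incremental lastmodified --check-column created --last-value '2012-02-01 11:00:00' --target-dir /orders_incr --append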

Exporting Data with Sqoop

Create a sqoop database in MySQL and add the following table structure to it:

SET FOREIGN_KEY_CHECKS=0;

 

-- ----------------------------

-- Table structure for TBLS

-- ----------------------------

DROP TABLE IF EXISTS `TBLS`;

CREATE TABLE `TBLS` (

  `TBL_ID` bigint(20) NOT NULL,

  `CREATE_TIME` int(11) NOT NULL,

  `DB_ID` bigint(20) DEFAULT NULL,

  `LAST_ACCESS_TIME` int(11) NOT NULL,

  `OWNER` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,

  `RETENTION` int(11) NOT NULL,

  `SD_ID` bigint(20) DEFAULT NULL,

  `TBL_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,

  `TBL_TYPE` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,

  `VIEW_EXPANDED_TEXT` mediumtext,

  `VIEW_ORIGINAL_TEXT` mediumtext,

  PRIMARY KEY (`TBL_ID`),

  UNIQUE KEY `UNIQUETABLE` (`TBL_NAME`,`DB_ID`),

  KEY `TBLS_N50` (`DB_ID`),

  KEY `TBLS_N49` (`SD_ID`),

  CONSTRAINT `TBLS_FK1` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`),

  CONSTRAINT `TBLS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

 

1-1) Export the data

[root@hadoop1 bin]# ./sqoop export --connect jdbc:mysql://localhost:3306/sqoop --username it --password it --table TBLS  --export-dir /user/root/TBLS/part-m-00000

************

*****************

16/09/28 23:07:19 INFO mapreduce.ExportJobBase: Transferred 2.3008 KB in 21.7299 seconds (108.422 bytes/sec)

16/09/28 23:07:19 INFO mapreduce.ExportJobBase: Exported 6 records.

 

1-2) View the data

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema  |

| azkaban            |

| hive               |

| mysql              |

| performance_schema |

| sqoop              |

+--------------------+

6 rows in set (0.08 sec)

 

mysql> use sqoop

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

 

Database changed

mysql> show tables;

+-----------------+

| Tables_in_sqoop |

+-----------------+

| TBLS      |

+-----------------+

1 row in set (0.00 sec)

mysql> select * from TBLS;

+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+

| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME  | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |

+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+

|     11 |  1474805587 |     2 |                0 | root  |         0 |    13 | stu__buck | MANAGED_TABLE | NULL               | NULL               |

|     16 |  1475100139 |     6 |                0 | root  |         0 |    16 | hive      | MANAGED_TABLE | NULL               | NULL               |

|     21 |  1475101361 |    11 |                0 | root  |         0 |    21 | hive1     | MANAGED_TABLE | NULL               | NULL               |

|     26 |  1475101790 |    16 |                0 | root  |         0 |    26 | hive2     | MANAGED_TABLE | NULL               | NULL               |

|     27 |  1475101831 |    16 |                0 | root  |         0 |    27 | hive3     | MANAGED_TABLE | NULL               | NULL               |

|     31 |  1475110018 |     1 |                0 | root  |         0 |    31 | tbls      | MANAGED_TABLE | NULL               | NULL               |

+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+

6 rows in set (0.00 sec)

 

Common Sqoop Commands for MySQL

1-1) List the databases in MySQL

[root@hadoop1 ~]# sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456

information_schema

azkaban

hive

mysql

performance_schema

Xiaoxu

 

list-databases: lists all databases on the server

1-2) List the tables in a database

[root@hadoop1 ~]# sqoop list-tables --connect jdbc:mysql://localhost:3306/xiaoxu --username root --password 123456

 

1-3) Copy a MySQL table's structure into a Hive table

sqoop create-hive-table --connect jdbc:mysql://hadoop1:3306/xiaoxu --table sp_address_code --username root --password 123456 --hive-table  sp_address_code

 

1-4) Import data from MySQL into Hive

sqoop import --connect jdbc:mysql://localhost:3306/xiaoxu --username root --password 123456 --table sp_address_code --hive-import  --m 1

 

1-5) Export Hive table data into MySQL

[root@hadoop1 start-single]# sqoop export --connect jdbc:mysql://localhost:3306/sp_address --username root --password 123456 --table sp_address --export-dir /user/hive/warehouse/xiaoxu.db/sp_address  -m 20
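If the Hive table was created with Hive's default delimiters, its data files separate fields with the ^A (\001) character rather than commas, and the export usually needs --input-fields-terminated-by to parse them. A sketch of the same export with that option (assuming sp_address uses the default Hive delimiter):

[root@hadoop1 start-single]# sqoop export --connect jdbc:mysql://localhost:3306/sp_address --username root --password 123456 --table sp_address --export-dir /user/hive/warehouse/xiaoxu.db/sp_address --input-fields-terminated-by '\001' -m 20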

Sqoop Jobs

A Sqoop job saves a predefined import or export task so it can be executed later as a named, repeatable workflow.

 

The syntax for Sqoop jobs is:

$ sqoop job (generic-args) (job-args)

   [-- [subtool-name] (subtool-args)]

 

$ sqoop-job (generic-args) (job-args)

   [-- [subtool-name] (subtool-args)]

 

1-1) Create a job

[root@hadoop1 bin]# ./sqoop job --create person_job -- import --connect jdbc:mysql://localhost:3306/sqoop --username it --password it --table TEST -m 1

*************

16/09/29 01:06:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

16/09/29 01:06:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

 

Note the required space between -- and import in the command above.

1-2) Verify the job

A) Verify with job --list

[root@hadoop1 bin]# ./sqoop job --list

***************

16/09/29 01:07:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Available jobs:

  person_job

 

B) Verify with job --show jobName

[root@hadoop1 bin]# ./sqoop job --show person_job

16/09/29 01:10:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Enter password:

Job: person_job

Tool: import

Options:

----------------------------

verbose = false

db.connect.string = jdbc:mysql://localhost:3306/sqoop

codegen.output.delimiters.escape = 0

codegen.output.delimiters.enclose.required = false

codegen.input.delimiters.field = 0

hbase.create.table = false

db.require.password = true

hdfs.append.dir = false

db.table = TEST

codegen.input.delimiters.escape = 0

import.fetch.size = null

accumulo.create.table = false

codegen.input.delimiters.enclose.required = false

db.username = it

reset.onemapper = false

codegen.output.delimiters.record = 10

import.max.inline.lob.size = 16777216

hbase.bulk.load.enabled = false

hcatalog.create.table = false

db.clear.staging.table = false

codegen.input.delimiters.record = 0

enable.compression = false

hive.overwrite.table = false

hive.import = false

codegen.input.delimiters.enclose = 0

accumulo.batch.size = 10240000

hive.drop.delims = false

codegen.output.delimiters.enclose = 0

hdfs.delete-target.dir = false

codegen.output.dir = .

codegen.auto.compile.dir = true

relaxed.isolation = false

mapreduce.num.mappers = 1

accumulo.max.latency = 5000

import.direct.split.size = 0

codegen.output.delimiters.field = 44

export.new.update = UpdateOnly

incremental.mode = None

hdfs.file.format = TextFile

codegen.compile.dir = /tmp/sqoop-root/compile/d96fb0fc73fdb5ad13067fdcd914d3a1

direct.import = false

hive.fail.table.exists = false

db.batch = false

 

When prompted for a password above, enter the MySQL login password.

1-3) Execute the job

[root@hadoop1 bin]# ./sqoop job --exec person_job

16/09/29 01:19:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Enter password:

16/09/29 01:19:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

**************

16/09/29 01:20:07 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1

16/09/29 01:20:07 INFO db.DBRecordReader: Executing query: SELECT `TBL_ID`, `CREATE_TIME`, `DB_ID`, `LAST_ACCESS_TIME`, `OWNER`, `RETENTION`, `SD_ID`, `TBL_NAME`, `TBL_TYPE`, `VIEW_EXPANDED_TEXT`, `VIEW_ORIGINAL_TEXT` FROM `TEST` AS `TEST` WHERE ( 1=1 ) AND ( 1=1 )

*************

16/09/29 01:20:11 INFO output.FileOutputCommitter: Saved output of task 'attempt_local698116558_0001_m_000000_0' to hdfs://hadoop1:9000/user/root/TEST/_temporary/0/task_local698116558_0001_m_000000

16/09/29 01:20:11 INFO mapred.LocalJobRunner: map

*************

16/09/29 01:20:12 INFO mapreduce.ImportJobBase: Transferred 353 bytes in 36.2109 seconds (9.7484 bytes/sec)

16/09/29 01:20:12 INFO mapreduce.ImportJobBase: Retrieved 6 records.

 

When prompted for a password above, enter the MySQL login password.
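If you do not want to type the password every time a saved job runs, Sqoop's metastore can be allowed to record it, which trades security for convenience. A sketch of the property to add to conf/sqoop-site.xml (the file location is assumed to be under the Sqoop install directory):

<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>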

1-4) View the data

[root@hadoop1 ~]# hadoop fs -cat /user/root/TEST/part-m-00000

11,1474805587,2,0,root,0,13,stu__buck,MANAGED_TABLE,null,null

16,1475100139,6,0,root,0,16,hive,MANAGED_TABLE,null,null

21,1475101361,11,0,root,0,21,hive1,MANAGED_TABLE,null,null

26,1475101790,16,0,root,0,26,hive2,MANAGED_TABLE,null,null

27,1475101831,16,0,root,0,27,hive3,MANAGED_TABLE,null,null

31,1475110018,1,0,root,0,31,tbls,MANAGED_TABLE,null,null
