I. How Sqoop Works

1.1 What Is Sqoop?

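Sqoop (SQL-to-Hadoop) is an open-source tool for transferring data between Hadoop (including Hive) and traditional relational databases (MySQL, PostgreSQL, ...). It can import data from a relational database such as MySQL, Oracle or Postgres into HDFS, and it can export data from HDFS back into a relational database.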

1.2 Why Do We Need Sqoop?

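Valuable data is usually kept in relational database systems, stored as rows and columns so that users can read and query it easily. With massive data volumes, however, we need to extract that data and process it with MapReduce to get results that better fit our needs. Under the hood, Sqoop's imports and exports are MapReduce programs, so they benefit from MapReduce's parallelism and fault tolerance. To exchange data with database systems outside HDFS, a MapReduce program has to reach that data through external APIs, and this is the job Sqoop does for us.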

1.3 Relationship Diagram

[Figure: Sqoop relationship diagram]

1.4 Architecture Diagram

[Figure: Sqoop architecture diagram]

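On the MapReduce side, Sqoop mainly customizes the InputFormat and OutputFormat.
When Sqoop receives a command from the client (a shell command or a Java API call), its Task Translator converts the command into the corresponding MapReduce job. That job then moves data between the relational database and Hadoop to complete the copy.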

II. Available Sqoop Commands

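Command	Description
codegen	Generate code to interact with database records
create-hive-table	Import a table definition into Hive
eval	Evaluate a SQL statement and display the results
export	Export an HDFS directory to a database table
help	List available commands
import	Import a table from a database to HDFS
import-all-tables	Import tables from a database to HDFS
import-mainframe	Import datasets from a mainframe server to HDFS
job	Work with saved jobs
list-databases	List available databases on a server
list-tables	List available tables in a database
merge	Merge results of incremental imports
metastore	Run a standalone Sqoop metastore
version	Display version information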

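Each command accepts its own set of arguments. Below are some of the arguments commonly used with Sqoop, for reference; to dig deeper, read the source code of the corresponding classes. This article focuses on common import and export commands.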

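Common arguments: database connection

Argument	Description
--connect	JDBC URL of the relational database
--connection-manager	Connection manager class to use
--driver	JDBC driver class
--help	Print usage instructions
--username	Database username
--password	Database password
--verbose	Print more detailed information to the console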

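Common arguments: import

Argument	Description
--enclosed-by	Wrap every field value in the given character
--escaped-by	Set the escape character (for example, to escape quote characters inside field values)
--fields-terminated-by	Field separator; defaults to a comma
--lines-terminated-by	Record separator; defaults to \n
--mysql-delimiters	Use MySQL's default delimiter set: fields separated by commas, lines by \n, escape character \, field values optionally enclosed in single quotes
--optionally-enclosed-by	Wrap a field value in the given character only when it needs it (for example, when it contains the field or line separator); unlike --enclosed-by, not every field is wrapped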
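To make the formatting options above concrete, here is a minimal POSIX shell sketch of how one record might be rendered with a field separator and an enclosing character. This is roughly what --fields-terminated-by and --enclosed-by control. The function format_record is hypothetical, and unlike --escaped-by it does not escape anything.

```shell
# format_record SEP ENCLOSE FIELD...  (hypothetical helper)
# Joins the fields with SEP and wraps each one in ENCLOSE, similar in
# spirit to --fields-terminated-by plus --enclosed-by. No escaping is done.
format_record() {
  sep=$1
  enc=$2
  shift 2
  out=""
  first=1
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      first=0
    else
      out="$out$sep"
    fi
    out="$out$enc$f$enc"
  done
  printf '%s\n' "$out"
}

format_record '|' '"' 1 孙尚香 15   # "1"|"孙尚香"|"15"
format_record ',' '' 1 孙尚香 15    # 1,孙尚香,15
```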

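Common arguments: export

Argument	Description
--input-enclosed-by	Character that encloses field values in the input files
--input-escaped-by	Escape character used in the input files
--input-fields-terminated-by	Field separator of the input files
--input-lines-terminated-by	Line separator of the input files
--input-optionally-enclosed-by	Character that optionally encloses field values in the input files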

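Common arguments: Hive

Argument	Description
--hive-delims-replacement	Replace \n, \r and \01 in string fields with a user-defined string
--hive-drop-import-delims	Drop \n, \r and \01 from string fields when importing into Hive
--map-column-hive <map>	Override the Hive data type of the given columns in the generated Hive table
--hive-partition-key	Name of the Hive partition column; its type defaults to string
--hive-partition-value	Partition value to use for the imported data
--hive-home	Hive installation directory; overrides the default configured location ($HIVE_HOME)
--hive-import	Import the data from the relational database into a Hive table
--hive-overwrite	Overwrite existing data in the Hive table
--create-hive-table	If set, the job fails when the target Hive table already exists; off by default
--hive-table	Name of the Hive table to create; defaults to the source (MySQL) table name
--table	Name of the table in the relational database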
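Hive text tables use \n as the row terminator and \001 as the default field separator. As a result, a string value that contains these characters would break the row layout. The following POSIX shell sketch imitates the effect of the two delimiter options on a single field value. The function names are hypothetical. Also, replace_hive_delims only handles a one-character replacement, whereas --hive-delims-replacement accepts any string.

```shell
# Hypothetical helpers mirroring the two Hive delimiter options on one value.

# Like --hive-drop-import-delims: delete \n, \r and \001.
drop_hive_delims() {
  printf '%s' "$1" | tr -d '\n\r\001'
}

# Like --hive-delims-replacement, restricted to a single replacement character.
replace_hive_delims() {
  printf '%s' "$1" | tr '\n\r\001' "$2$2$2"
}

value=$(printf 'line1\nline2\001tail')
drop_hive_delims "$value"; echo         # line1line2tail
replace_hive_delims "$value" ' '; echo  # line1 line2 tail
```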

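Other arguments

Argument	Meaning
-m N	Launch N map tasks
--num-mappers N	Launch N map tasks (long form of -m)
--query	SQL query whose result is imported
--incremental mode	Incremental import mode: append or lastmodified
--check-column	Column used to decide which rows are new in an incremental import
--split-by	Column used to split the table into work units; cannot be used with --autoreset-to-one-mapper
--last-value	Value marking where the previous incremental import stopped
--target-dir	HDFS target directory
--delete-target-dir	Delete the HDFS target directory first if it already exists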

III. Common Sqoop Operations

First, create a table in MySQL to work with:

create table student(
    sid int primary key,
    sname varchar(16) not null,
    gender enum('女','男') not null default '男',
    age int not null
);

insert into student(sid,sname,gender,age) values
(1,'孙尚香','女',15),
(2,'貂蝉','女',16),
(3,'刘备','男',17),
(4,'孙二娘','女',16),
(5,'张飞','男',15),
(6,'关羽','男',18);

3.1 RDBMS => HDFS (imports, the main focus)

3.1.1 Full-table import

Command:

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table student \
--target-dir /sqooptest/table_all \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by ','

MapReduce processing:
[Figure: MapReduce job progress]
When the job finishes, the result files appear under /sqooptest/table_all.

View the data in HDFS:

hdfs dfs -cat /sqooptest/table_all/part-m-00000

The data looks like this:

1,孙尚香,女,15
2,貂蝉,女,16
3,刘备,男,17
4,孙二娘,女,16
5,张飞,男,15
6,关羽,男,18

3.1.2 Free-form query import

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--target-dir /sqooptest/select_test \
--num-mappers 1 \
--query 'select sname,gender from student where $CONDITIONS'

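The WHERE clause must contain the $CONDITIONS token. Sqoop replaces it with each map task's own split condition, so that every mapper reads a different slice of the query result. If the --query string is wrapped in double quotes, write it as \$CONDITIONS with a backslash so that the shell does not treat it as one of its own variables.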
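The quoting rule can be checked without Sqoop at all. This POSIX shell sketch shows what the shell hands to Sqoop in each case. The final sed line only imitates Sqoop's substitution, and the condition it inserts is made up for illustration.

```shell
# Make sure the demo is not affected by a variable set in the environment.
unset CONDITIONS

# Single quotes: the shell passes $CONDITIONS through untouched.
q1='select sname,gender from student where $CONDITIONS'
# Double quotes: the backslash stops the shell from expanding it.
q2="select sname,gender from student where \$CONDITIONS"
# Double quotes without a backslash: the shell expands the (unset) variable,
# so Sqoop would receive a query that ends in "where ".
q3="select sname,gender from student where $CONDITIONS"

printf '%s\n' "$q1" "$q2" "$q3"

# Rough idea of the next step inside Sqoop: the token is swapped for a
# per-mapper condition (this particular condition is only an illustration).
printf '%s\n' "$q1" | sed 's/[$]CONDITIONS/( sid >= 1 ) AND ( sid < 4 )/'
```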

hdfs dfs -cat /sqooptest/select_test/part-m-00000

The data looks like this:

孙尚香,女
貂蝉,女
刘备,男
孙二娘,女
张飞,男
关羽,男

3.1.3 Importing selected columns

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table student \
--columns sid,sname,age \
--target-dir /sqooptest/column_test \
--num-mappers 1 \
--fields-terminated-by "|"

Note: when --columns lists several columns, separate them with commas and do not add spaces.
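The reason is shell word splitting. The shell breaks the command line at unquoted spaces before Sqoop sees it, so a space turns the remaining column names into separate, unrecognized arguments. A quick POSIX shell check, using a hypothetical count_args helper:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

count_args --columns sid,sname,age     # 2: the option and one value
count_args --columns sid, sname, age   # 4: "sname," and "age" become stray arguments
```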


hdfs dfs -cat /sqooptest/column_test/part-m-00000

The data looks like this:

1|孙尚香|15
2|貂蝉|16
3|刘备|17
4|孙二娘|16
5|张飞|15
6|关羽|18

3.1.4 Filtering with a WHERE clause

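Source table data (at this point, rows 7 to 9 from section 3.1.5 have already been inserted, as the output below shows):
[Figure: contents of the student table]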

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table student \
--where "sid>=6" \
--target-dir /sqooptest/wheretest \
-m 2

This returns the rows that satisfy "sid>=6":

[root@single ~]# hdfs dfs -cat /sqooptest/wheretest/*
6,关羽,男,18
7,云中君,男,19
8,百里玄策,男,20
9,裴擒虎,男,17

3.1.5 ① Incremental import: append

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--query "select sid,sname,gender from student where \$CONDITIONS" \
--target-dir /sqooptest/add1 \
--split-by sid \
-m 2 \
--incremental append \
--check-column sid \
--last-value 0

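--split-by together with -m runs the import as N parallel map tasks, each reading its own range of the split-by column. Sqoop imports are map-only, so no reduce tasks are involved.

The last two options,
--check-column sid
--last-value 0
together act much like WHERE sid > 0.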
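The append window can be simulated on a CSV file. This awk-based shell sketch uses a hypothetical append_increment function. It selects rows whose check column is greater than --last-value and at most the maximum value found when the run starts, which is the `sid > 0 AND sid <= 6` condition visible in the log below. It then reports that maximum as the next --last-value. It models only the selection logic, not Sqoop's implementation.

```shell
# append_increment CSV COL LAST (hypothetical helper)
# Prints the rows an append-mode run would pick up: check column > LAST and
# <= the maximum seen at the start of the run, then the next --last-value.
append_increment() {
  awk -F',' -v c="$2" -v last="$3" '
    {
      row[NR] = $0
      v[NR] = $c + 0
      if (NR == 1 || v[NR] > max) max = v[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        if (v[i] > last + 0 && v[i] <= max) print row[i]
      print "next --last-value " max
    }' "$1"
}

src=$(mktemp)
printf '%s\n' 1,孙尚香,女 2,貂蝉,女 3,刘备,男 4,孙二娘,女 5,张飞,男 6,关羽,男 > "$src"
append_increment "$src" 1 0   # rows 1-6, then "next --last-value 6"
printf '%s\n' 7,云中君,男 8,百里玄策,男 9,裴擒虎,男 >> "$src"
append_increment "$src" 1 6   # rows 7-9, then "next --last-value 9"
rm -f "$src"
```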

Key parts of the MapReduce log:

-- sid bounds: 0 to 6
20/11/20 05:17:42 INFO tool.ImportTool: Incremental import based on column `sid`
20/11/20 05:17:42 INFO tool.ImportTool: Lower bound value: 0
20/11/20 05:17:42 INFO tool.ImportTool: Upper bound value: 6
-- resulting condition: where `sid` > 0 AND `sid` <= 6
20/11/20 05:17:48 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(sid), MAX(sid) FROM (select sid,sname,gender from student where `sid` > 0 AND `sid` <= 6 AND  (1 = 1) ) AS t1
-- two map tasks were requested
20/11/20 05:17:48 INFO mapreduce.JobSubmitter: number of splits:2
-- Sqoop reports the new last-value for the next run: sid = 6
20/11/20 05:18:06 INFO tool.ImportTool:  --incremental append
20/11/20 05:18:06 INFO tool.ImportTool:   --check-column sid
20/11/20 05:18:06 INFO tool.ImportTool:   --last-value 6

Because there are two map tasks, the output is split into two files.
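The two files come from two splits computed from MIN(sid) and MAX(sid), the BoundingValsQuery in the log above. The shell sketch below divides such a range into per-mapper conditions. It is a simplified approximation of Sqoop's integer splitter, whose exact boundaries and edge-case handling may differ, and split_ranges is a hypothetical name.

```shell
# split_ranges COL MIN MAX N  (hypothetical helper)
# Divides [MIN, MAX] into N contiguous ranges, one per map task.
# Every range is half-open except the last, which includes MAX.
split_ranges() {
  col=$1 min=$2 max=$3 n=$4
  width=$((max - min))
  i=0
  while [ "$i" -lt "$n" ]; do
    # Ceiling division keeps the ranges close to equal in size.
    lo=$((min + (i * width + n - 1) / n))
    hi=$((min + ((i + 1) * width + n - 1) / n))
    if [ $((i + 1)) -eq "$n" ]; then
      echo "$col >= $lo AND $col <= $max"
    else
      echo "$col >= $lo AND $col < $hi"
    fi
    i=$((i + 1))
  done
}

split_ranges sid 1 6 2
# sid >= 1 AND sid < 4
# sid >= 4 AND sid <= 6
```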

hdfs dfs -cat /sqooptest/add1/part-m-*

The data looks like this:

1,孙尚香,女
2,貂蝉,女
3,刘备,男
4,孙二娘,女
5,张飞,男
6,关羽,男

Now insert a few more rows into MySQL, then run another incremental import:

insert into student(sid,sname,gender,age) values(7,'云中君','男',19),(8,'百里玄策','男',20),(9,'裴擒虎','男',17);

Run the incremental import again:

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--query "select sid,sname,gender,age from student where \$CONDITIONS" \
--target-dir /sqooptest/add1 \
-m 1 \
--incremental append \
--check-column sid \
--last-value 6

This time one more file appears in the target directory:
[Figure: files in /sqooptest/add1]

hdfs dfs -cat /sqooptest/add1/part-m-*

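The data looks like this. The second run's query also selects age, so rows 7 to 9 carry one more column than rows 1 to 6; keep the column list identical across incremental runs to avoid this: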

1,孙尚香,女
2,貂蝉,女
3,刘备,男
4,孙二娘,女
5,张飞,男
6,关羽,男
7,云中君,男,19
8,百里玄策,男,20
9,裴擒虎,男,17

3.1.5 ② Incremental import: lastmodified

First, create a new table in MySQL:

create table orderinfo(
	oid int primary key,
	oName varchar(10) not null,
	oPrice double not null,
	oTime timestamp not null
);

insert into orderinfo(oid,oName,oPrice,oTime) values(1,'爱疯12',6500.0,'2020-11-11 00:00:00'),(2,'华为xpro',12000.0,'2020-10-1 12:52:33'),(3,'行李箱',888.8,'2019-5-22 21:56:17'),(4,'羽绒服',1100.0,'2018-3-7 14:22:31');


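With --incremental lastmodified, which picks up both updated and newly inserted rows, the --check-column must be a timestamp column.

When importing in lastmodified mode into a target directory that already exists, you must also specify whether the incremental data should be appended (--append) or merged (--merge-key).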
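With --merge-key, a newly imported row replaces the existing row that has the same key, while rows with new keys are added. Below is a minimal awk-based sketch of that rule. merge_by_key is a hypothetical helper. Sqoop itself performs the merge as a MapReduce job, and its output order is not guaranteed to match this sketch.

```shell
# merge_by_key OLD_CSV NEW_CSV KEY_COL (hypothetical helper)
# Later files win: a row in NEW_CSV replaces the OLD_CSV row with the same key.
merge_by_key() {
  awk -F',' -v k="$3" '
    {
      if (!($k in row)) order[++n] = $k   # remember first-seen order of keys
      row[$k] = $0                        # newer row overwrites the older one
    }
    END { for (i = 1; i <= n; i++) print row[order[i]] }' "$1" "$2"
}

old=$(mktemp); new=$(mktemp)
printf '1,old\n2,old\n' > "$old"
printf '2,new\n3,new\n' > "$new"
merge_by_key "$old" "$new" 1   # 1,old  2,new  3,new (one per line)
rm -f "$old" "$new"
```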

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table orderinfo \
--target-dir /sqooptest/lastmod \
-m 1


hdfs dfs -cat /sqooptest/lastmod/part-m-00000

The data looks like this:

1,爱疯12,6500.0,2020-11-11 00:00:00.0
2,华为xpro,12000.0,2020-10-01 12:52:33.0
3,行李箱,888.8,2019-05-22 21:56:17.0
4,羽绒服,1100.0,2018-03-07 14:22:31.0

Insert a few new rows into the orderinfo table in MySQL, then run an incremental import:

insert into orderinfo(oid,oName,oPrice,oTime) values(5,'帕拉梅拉',1333333.3,'2020-4-7 12:23:34'),(6,'保温杯',86.5,'2017-3-5 22:52:16'),(7,'枸杞',46.3,'2019-10-5 11:11:11'),(8,'电动牙刷',350.0,'2019-9-9 12:21:41');


sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table orderinfo \
--target-dir /sqooptest/lastmod \
-m 1 \
--incremental lastmodified \
--check-column oTime \
--merge-key oid \
--last-value "2019-10-1 12:12:12"

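After the run, the files have been merged: part-m-00000 has been replaced by part-r-00000, because merging by --merge-key runs an extra MapReduce job with a reduce phase.
[Figure: files in /sqooptest/lastmod]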

hdfs dfs -cat /sqooptest/lastmod/part-r-00000

The data looks like this:

1,爱疯12,6500.0,2020-11-11 00:00:00.0
2,华为xpro,12000.0,2020-10-01 12:52:33.0
3,行李箱,888.8,2019-05-22 21:56:17.0
4,羽绒服,1100.0,2018-03-07 14:22:31.0
5,帕拉梅拉,1333333.3,2020-04-07 12:23:34.0
7,枸杞,46.3,2019-10-05 11:11:11.0

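Only two new records (5 and 7) were added. Rows 6 and 8 have oTime values earlier than --last-value, so they fall outside the incremental range.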
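This result can be reproduced with a small simulation. The shell/awk sketch below keeps rows whose timestamp is at or after --last-value and before the import start time, which is passed in explicitly as NOW. Those `>=` / `<` bounds are our assumption about Sqoop's lastmodified window, and the helper names are hypothetical. Values are zero-padded first, because an unpadded string such as "2019-10-1 12:12:12" does not compare correctly against "2019-10-05 11:11:11".

```shell
# norm_ts TS (hypothetical helper): "2019-10-1 12:12:12" -> "2019-10-01 12:12:12"
norm_ts() {
  printf '%s\n' "$1" | awk -F'[- :]' '{ printf "%04d-%02d-%02d %02d:%02d:%02d\n", $1, $2, $3, $4, $5, $6 }'
}

# lastmod_select CSV COL LAST_VALUE NOW (hypothetical helper)
# Assumed window: LAST_VALUE <= ts < NOW, compared as zero-padded strings.
lastmod_select() {
  lo=$(norm_ts "$3")
  hi=$(norm_ts "$4")
  awk -F',' -v c="$2" -v lo="$lo" -v hi="$hi" '
    {
      split($c, t, /[- :.]/)
      ts = sprintf("%04d-%02d-%02d %02d:%02d:%02d", t[1], t[2], t[3], t[4], t[5], t[6])
      if (ts >= lo && ts < hi) print
    }' "$1"
}

src=$(mktemp)
cat > "$src" <<'EOF'
5,帕拉梅拉,1333333.3,2020-4-7 12:23:34
6,保温杯,86.5,2017-3-5 22:52:16
7,枸杞,46.3,2019-10-5 11:11:11
8,电动牙刷,350.0,2019-9-9 12:21:41
EOF
lastmod_select "$src" 4 "2019-10-1 12:12:12" "2020-11-21 00:00:00"   # rows 5 and 7
rm -f "$src"
```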

3.2 RDBMS => HBase

First, create the table in HBase:

hbase(main):007:0> create 'sqooptest:sqstudent','stuinfo'

Import the data with Sqoop:

sqoop import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table student \
--hbase-table sqooptest:sqstudent \
--column-family stuinfo \
--hbase-create-table \
--hbase-row-key sid

--column-family stuinfo
Sets the column family to stuinfo
--hbase-create-table
Creates the HBase table automatically if it does not exist
--hbase-row-key sid
Uses sid as the row key
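Here is a sketch of how these three options shape the HBase rows. Each non-key column becomes a `family:column` cell under the row key, and the row-key column itself is not stored as a cell, which matches the scan output below. The helper to_hbase_cells is hypothetical and only prints the mapping. HBase keeps the qualifiers within a row sorted, which is why the scan lists age, gender, sname in that order.

```shell
# to_hbase_cells CSV FAMILY COLUMN_NAMES ROW_KEY_NAME (hypothetical helper)
# Prints one "rowkey <TAB> column=family:qualifier, value=..." line per cell.
to_hbase_cells() {
  awk -F',' -v cf="$2" -v names="$3" -v rk="$4" '
    BEGIN {
      n = split(names, col, ",")
      for (i = 1; i <= n; i++) if (col[i] == rk) key = i
    }
    {
      for (i = 1; i <= n; i++)
        if (i != key) printf "%s\tcolumn=%s:%s, value=%s\n", $key, cf, col[i], $i
    }' "$1"
}

src=$(mktemp)
printf '1,孙尚香,女,15\n' > "$src"
to_hbase_cells "$src" stuinfo sid,sname,gender,age sid
rm -f "$src"
```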

View the data in the HBase table:

hbase(main):008:0> scan 'sqooptest:sqstudent'
ROW                         COLUMN+CELL                                                                    
 1                          column=stuinfo:age, timestamp=1605958889301, value=15                          
 1                          column=stuinfo:gender, timestamp=1605958889301, value=\xE5\xA5\xB3             
 1                          column=stuinfo:sname, timestamp=1605958889301, value=\xE5\xAD\x99\xE5\xB0\x9A\x
                            E9\xA6\x99                                                                     
 2                          column=stuinfo:age, timestamp=1605958889301, value=16                          
 2                          column=stuinfo:gender, timestamp=1605958889301, value=\xE5\xA5\xB3             
 2                          column=stuinfo:sname, timestamp=1605958889301, value=\xE8\xB2\x82\xE8\x9D\x89  
...
...
...                                                     
 9                          column=stuinfo:age, timestamp=1605958892765, value=17                          
 9                          column=stuinfo:gender, timestamp=1605958892765, value=\xE7\x94\xB7             
 9                          column=stuinfo:sname, timestamp=1605958892765, value=\xE8\xA3\xB4\xE6\x93\x92\x
                            E8\x99\x8E                                                                     
9 row(s) in 0.1830 seconds

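HBase values have no data type; everything is stored as byte arrays. Whether Chinese characters are rendered is purely a client-side display issue, which is not addressed here. That is why the gender and sname values appear as escaped bytes.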
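The escaped values can be decoded back into text to confirm that they are just UTF-8 bytes. This POSIX shell sketch (hbase_unescape is a hypothetical helper) turns each \xHH sequence into an octal escape that printf understands. It assumes a UTF-8 terminal and input made only of \xHH sequences and plain ASCII.

```shell
# hbase_unescape STRING (hypothetical helper)
# Converts \xHH escapes, as printed by the HBase shell, back into raw bytes.
hbase_unescape() {
  fmt=$(printf '%s' "$1" | awk '
    BEGIN { hex = "0123456789abcdef" }
    {
      s = $0
      gsub(/%/, "%%", s)   # keep literal % signs safe for printf below
      out = ""
      while ((i = index(s, "\\x")) > 0) {
        out = out substr(s, 1, i - 1)
        h = tolower(substr(s, i + 2, 2))
        v = (index(hex, substr(h, 1, 1)) - 1) * 16 + index(hex, substr(h, 2, 1)) - 1
        out = out sprintf("\\%03o", v)   # e.g. \xE5 -> \345
        s = substr(s, i + 4)
      }
      printf "%s", out s
    }')
  printf "$fmt"
}

hbase_unescape '\xE5\xA5\xB3'; echo               # 女
hbase_unescape '\xE8\xB2\x82\xE8\x9D\x89'; echo   # 貂蝉
```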

3.3 RDBMS => Hive

3.3.1 Importing into a regular table

Import the orders table from the retail_db database in MySQL into Hive:

sqoop import \
--connect jdbc:mysql://single:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--username root \
--password kb10 \
--table orders \
--hive-import \
--hive-database sqooptest \
--create-hive-table \
--hive-table orders \
--hive-overwrite \
-m 3

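The log shows the MapReduce progress along with other details. A successful import ends with messages like the following. This excerpt was captured from a run that loaded the student data into default.sqstudent, the Hive table exported again in section 3.4, so the table name differs from the command above: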

20/11/21 20:08:07 INFO hive.HiveImport: OK
20/11/21 20:08:07 INFO hive.HiveImport: Time taken: 5.899 seconds
20/11/21 20:08:07 INFO hive.HiveImport: Loading data to table default.sqstudent
20/11/21 20:08:08 INFO hive.HiveImport: Table default.sqstudent stats: [numFiles=1, totalSize=162]
20/11/21 20:08:08 INFO hive.HiveImport: OK
20/11/21 20:08:08 INFO hive.HiveImport: Time taken: 0.722 seconds
20/11/21 20:08:08 INFO hive.HiveImport: Hive import complete.

The data files are created in Hive's warehouse directory on HDFS:
[Figure: data files in the Hive warehouse directory]

Query the orders table in Hive:
[Figure: contents of the Hive orders table]

3.3.2 Importing into a partitioned table

sqoop import \
--connect jdbc:mysql://single:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--username root \
--password kb10 \
--query "select order_id,order_status from orders where 
order_date>='2014-07-02' and order_date<'2014-07-03' and \$CONDITIONS" \
--hive-import \
--hive-database sqooptest \
--hive-table order_partition \
--hive-partition-key 'order_date' \
--hive-partition-value '2014-07-02' \
-m 1

Execution result:
[Figure: output of the partitioned import]

Query result:

3.4 Hive/HDFS => RDBMS

First, create the target table in MySQL:

create table hiveTomysql(
	sid int primary key,
	sname varchar(5) not null,
	gender varchar(1) default '男',
	age int not null
);

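Now export the data of the Hive table sqstudent, created earlier, back into MySQL. The value of --input-fields-terminated-by must match the field separator the Hive data files were written with. Hive's default is \001, and the command below assumes the table was written with commas.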

sqoop export \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table hiveTomysql \
--num-mappers 1 \
--export-dir /opt/software/hadoop/hive110/warehouse/sqstudent/part-m-00000 \
--input-fields-terminated-by ","
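By default, sqoop export turns each input record into an INSERT statement against the target table and issues these statements over JDBC. The awk-based sketch below shows that record-to-statement mapping for a delimited file. to_inserts is a hypothetical helper, and every value is quoted as a string for simplicity. It is not how Sqoop builds its statements internally.

```shell
# to_inserts TABLE SEP FILE (hypothetical helper)
# Prints one INSERT statement per input line, quoting every value.
to_inserts() {
  awk -F"$2" -v t="$1" -v q="'" '
    {
      vals = ""
      for (i = 1; i <= NF; i++) {
        v = $i
        gsub(q, q q, v)   # double any single quotes, SQL-style
        vals = vals (i > 1 ? "," : "") q v q
      }
      printf "INSERT INTO %s VALUES (%s);\n", t, vals
    }' "$3"
}

src=$(mktemp)
printf '1,孙尚香,女,15\n2,貂蝉,女,16\n' > "$src"
to_inserts hiveTomysql ',' "$src"
rm -f "$src"
```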

Check the resulting data in MySQL:

[Figure: contents of the hiveTomysql table]

3.5 Sqoop Job

Job arguments:

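Argument	Description
--create JOB_NAME	Define a new saved job
--delete JOB_NAME	Delete a saved job
--exec JOB_NAME	Run a saved job
--help	Show help for the job tool
--list	List saved jobs
--meta-connect <jdbc-uri>	JDBC URI used to connect to the metastore
--show JOB_NAME	Show the parameters of a saved job
--verbose	Print more details while running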

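Create a job
Note the space between -- and import. A standalone -- introduces the tool command line that the job will save, and because import itself takes no leading --, this space is easy to overlook.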

sqoop job --create myjob \
-- import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password kb10 \
--table student \
--target-dir /sqooptest/myjob \
-m 1 \
--lines-terminated-by '\n' \
--null-string '\\N' \
--null-non-string '\\N'
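The role of the standalone -- can be illustrated with a toy parser. In the hypothetical split_job_args below, everything before the first standalone -- belongs to sqoop job, and everything after it is the saved tool command. Writing --import without the space leaves the tool command empty. This only mimics the convention and is not Sqoop's parser.

```shell
# split_job_args ARG... (hypothetical helper)
split_job_args() {
  job_args="" tool_args="" seen=0
  for a in "$@"; do
    if [ "$seen" -eq 0 ] && [ "$a" = "--" ]; then
      seen=1      # first standalone --: switch to the tool command line
      continue
    fi
    if [ "$seen" -eq 0 ]; then
      job_args="$job_args $a"
    else
      tool_args="$tool_args $a"
    fi
  done
  printf 'job:%s\ntool:%s\n' "$job_args" "$tool_args"
}

split_job_args --create myjob -- import --table student
# job: --create myjob
# tool: import --table student
split_job_args --create myjob --import --table student
# job: --create myjob --import --table student
# tool:
```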

List jobs:

sqoop job --list

Result:

Warning: /opt/software/hadoop/sqoop146/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/hadoop/sqoop146/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/hadoop/sqoop146/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/11/20 06:34:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Available jobs:
  myjob

Show a job:

sqoop job --show myjob

Result (no password has been saved for this job, so Sqoop prompts for one):

Warning: /opt/software/hadoop/sqoop146/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/hadoop/sqoop146/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/hadoop/sqoop146/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/11/20 06:36:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password: 
Job: myjob
Tool: import
Options:
...

Delete a job:

sqoop job --delete myjob

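Saving the password
A saved job does not store the password in plain text, so every time the job is used, Sqoop asks for it interactively (as with -P).
Automated schedulers such as Oozie have no chance to type a password, so the password has to be stored somewhere.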

1. Using a --password-file

You should save the password in a file on the users home directory with 400 permissions and specify the path to that file using the --password-file argument, and is the preferred method of entering credentials. Sqoop will then read the password from the file and pass it to the MapReduce cluster using secure means with out exposing the password in the job configuration. The file containing the password can either be on the Local FS or HDFS.

Store the file in the user's home directory (here /home/single) with permissions 400. Both the local file system and HDFS are supported.

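# Create the file; -n keeps a trailing newline out of the password
echo -n "secret" > password.file
chmod 400 password.file
# Local file
--password-file file:///home/username/password.file \
# File on HDFS (three slashes: use the default file system)
--password-file hdfs:///user/username/password.file \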
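A slightly more careful way to create the file is sketched below in POSIX shell. It uses printf '%s' instead of echo -n, because echo -n is not portable across shells, and it creates the file under a restrictive umask before applying chmod 400. The function name is hypothetical. The checks at the end confirm the permissions and that no trailing newline slipped into the password.

```shell
# make_password_file PATH SECRET (hypothetical helper)
# Runs in a subshell so the umask change does not leak into the caller.
make_password_file() (
  umask 077
  printf '%s' "$2" > "$1"   # no trailing newline
  chmod 400 "$1"
)

dir=$(mktemp -d)
make_password_file "$dir/password.file" secret
ls -l "$dir/password.file"      # permissions column starts with -r--------
wc -c < "$dir/password.file"    # 6, the length of "secret"
rm -rf "$dir"
```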

Now create a job that points to the password file:

sqoop job --create pwdjob \
-- import \
--connect jdbc:mysql://single:3306/sqoop_test \
--username root \
--password-file file:///home/single/password.file \
--table student \
--target-dir /sqooptest/pwdjob \
-m 1 \
--lines-terminated-by '\n' \
--null-string '\\N' \
--null-non-string '\\N'

List the newly created job:

[root@single single]# sqoop job --list
Warning: /opt/software/hadoop/sqoop146/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/hadoop/sqoop146/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/hadoop/sqoop146/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/11/20 16:38:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Available jobs:
  pwdjob

Run the job:

sqoop job --exec pwdjob

The run proceeds as follows; no password prompt appears this time:

Warning: /opt/software/hadoop/sqoop146/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/hadoop/sqoop146/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/hadoop/sqoop146/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/11/20 16:41:52 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/software/hadoop/hadoop260/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/hadoop/hbase120/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/20 16:41:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/20 16:41:52 INFO tool.CodeGenTool: Beginning code generation
20/11/20 16:41:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
20/11/20 16:41:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
20/11/20 16:41:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/software/hadoop/hadoop260/share/hadoop/mapreduce
Note: /tmp/sqoop-root/compile/bd96b7f9af6a63765e92e6cc0a3c49dd/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/11/20 16:41:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/bd96b7f9af6a63765e92e6cc0a3c49dd/student.jar
20/11/20 16:41:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/11/20 16:41:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/11/20 16:41:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/11/20 16:41:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/11/20 16:41:54 INFO mapreduce.ImportJobBase: Beginning import of student
20/11/20 16:41:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/20 16:41:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/20 16:41:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/11/20 16:42:01 INFO db.DBInputFormat: Using read commited transaction isolation
20/11/20 16:42:01 INFO mapreduce.JobSubmitter: number of splits:1
20/11/20 16:42:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1605859678795_0002
20/11/20 16:42:02 INFO impl.YarnClientImpl: Submitted application application_1605859678795_0002
20/11/20 16:42:02 INFO mapreduce.Job: The url to track the job: http://single:8088/proxy/application_1605859678795_0002/
20/11/20 16:42:02 INFO mapreduce.Job: Running job: job_1605859678795_0002
20/11/20 16:42:07 INFO mapreduce.Job: Job job_1605859678795_0002 running in uber mode : false
20/11/20 16:42:07 INFO mapreduce.Job:  map 0% reduce 0%
20/11/20 16:42:13 INFO mapreduce.Job:  map 100% reduce 0%
20/11/20 16:42:13 INFO mapreduce.Job: Job job_1605859678795_0002 completed successfully
20/11/20 16:42:13 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=163464
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=162
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3481
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3481
		Total vcore-milliseconds taken by all map tasks=3481
		Total megabyte-milliseconds taken by all map tasks=3564544
	Map-Reduce Framework
		Map input records=9
		Map output records=9
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=44
		CPU time spent (ms)=720
		Physical memory (bytes) snapshot=171573248
		Virtual memory (bytes) snapshot=2781327360
		Total committed heap usage (bytes)=135266304
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=162
20/11/20 16:42:13 INFO mapreduce.ImportJobBase: Transferred 162 bytes in 19.0418 seconds (8.5076 bytes/sec)
20/11/20 16:42:13 INFO mapreduce.ImportJobBase: Retrieved 9 records.

Check the result:

hdfs dfs -cat /sqooptest/pwdjob/part-m-00000

The data looks like this:

1,孙尚香,女,15
2,貂蝉,女,16
3,刘备,男,17
4,孙二娘,女,16
5,张飞,男,15
6,关羽,男,18
7,云中君,男,19
8,百里玄策,男,20
9,裴擒虎,男,17

2. Saving the password in the Sqoop metastore

You can enable passwords in the metastore by setting sqoop.metastore.client.record.password to true in the configuration.

The property goes in conf/sqoop-site.xml. On the first run after setting it, you still have to enter the password once.

<property>
		<name>sqoop.metastore.client.record.password</name>
		<value>true</value>
		<description>If true, allow saved passwords in the metastore.</description>
</property>

Note: the relationship diagram and the architecture diagram are not original; they come from CSDN.
