Importing from MySQL into Hive with Sqoop

1: List all databases

sqoop list-databases \
--connect jdbc:mysql://rm-xxxx.mysql.xxx.xxx.com:3306 \
--username xxxxx \
--password xxxxx

2: List all tables in a database

sqoop list-tables \
--connect jdbc:mysql://xxxxx.xxx.xxx.xxxx.xxx:3306/05_invoxxx \
--username xxxx \
--password xxxxxx

3: Import from MySQL into Hive

sqoop import \
--connect jdbc:mysql://xxxxx.xxx.xxx.xxxx.xxx:3306/05_invoxxx \
--username xxxx \
--password xxxxxx \
--table enterprise_info \
--hive-import \
--create-hive-table \
--fields-terminated-by "\t" \
-m 5

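Hive parameters

--hive-import: required; tells Sqoop to import into Hive
--hive-database default: Hive database name
--hive-table people: Hive table name
--fields-terminated-by: field delimiter used for the Hive table
--hive-overwrite: overwrite the existing data in the Hive table
--create-hive-table: creates the Hive table for you, but the job fails if the table already exists. Not recommended, because the column types Sqoop generates may differ from the ones you intended.
--hive-partition-key "dt": name of the partition column
--hive-partition-value "2018-08-08": value of the partition column for this import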
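
To show how the Hive parameters fit together, here is a minimal sketch that loads one partition of a Hive table. The host, database, credentials and the function name import_people_partition are hypothetical placeholders, and the source table is assumed to have a primary key so that several mappers can be used. By default the script only prints the command it would run.

```shell
# Dry run by default: prints the assembled command instead of executing it.
# Set SQOOP=sqoop in the environment to run the import for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders, not taken from the original post.
JDBC_URL="jdbc:mysql://mysql.example.com:3306/mydb"
DB_USER="etl_user"
DB_PASS="changeme"

# Import the MySQL table "people" into the Hive table default.people,
# overwriting the partition dt=<first argument>.
import_people_partition() {
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --table people \
    --hive-import \
    --hive-database default \
    --hive-table people \
    --hive-overwrite \
    --hive-partition-key dt \
    --hive-partition-value "$1" \
    --fields-terminated-by "\t" \
    -m 5
}

import_people_partition "2018-08-08"
```

--create-hive-table is left out on purpose, so the job does not fail when the Hive table already exists.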

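A table that has no primary key can be imported in two ways: specify a split column with --split-by, or use a single mapper with -m 1.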
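
For a table without a primary key, the two usual options in Sqoop are naming a split column with --split-by or running a single mapper with -m 1. The sketch below shows both under assumed names: the connection details and the split column enterprise_id are hypothetical, and the script only prints the commands unless SQOOP=sqoop is set.

```shell
# Dry run by default: prints the assembled command instead of executing it.
# Set SQOOP=sqoop in the environment to run the import for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders, not taken from the original post.
JDBC_URL="jdbc:mysql://mysql.example.com:3306/mydb"
DB_USER="etl_user"
DB_PASS="changeme"

# Option 1: name the column Sqoop should use to split rows across mappers.
import_with_split_by() {
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --table enterprise_info \
    --hive-import \
    --split-by "$1" \
    -m 5
}

# Option 2: use a single mapper, so no split column is needed.
import_single_mapper() {
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --table enterprise_info \
    --hive-import \
    -m 1
}

import_with_split_by enterprise_id   # enterprise_id is an assumed column name
import_single_mapper
```

The first option keeps the import parallel but works best when the split column is evenly distributed; the second is simpler and slower.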

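Filter conditions

--where "age>18": row filter condition
--columns "name,age": columns to import
--query 'select * from people where age>18 and $CONDITIONS':

imports the result set of the SQL query
cannot be used together with --table
requires --target-dir to be specified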
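
A free-form query import could look like the sketch below. The connection details, the target directory and the id column are assumptions for illustration. Because the example uses more than one mapper, it also passes --split-by, and the split column is included in the select list. The script prints the command unless SQOOP=sqoop is set.

```shell
# Dry run by default: prints the assembled command instead of executing it.
# Set SQOOP=sqoop in the environment to run the import for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders, not taken from the original post.
JDBC_URL="jdbc:mysql://mysql.example.com:3306/mydb"
DB_USER="etl_user"
DB_PASS="changeme"

import_adults() {
  # Single quotes keep $CONDITIONS literal, so that Sqoop (not the shell)
  # substitutes it with each mapper's split condition.
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --query 'select id, name, age from people where age > 18 and $CONDITIONS' \
    --split-by id \
    --target-dir /tmp/sqoop/people_adults \
    -m 4
}

import_adults
```

If the query is written in double quotes instead, the placeholder has to be escaped as \$CONDITIONS.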
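
Handling NULL values from the database
--null-non-string '0': when a non-string column is NULL, write 0 instead
--null-string 'string': when a string column is NULL, write the text "string" instead

Improving transfer speed
--direct: speeds up the transfer from the database to Hadoop
Supported databases and versions:
* MySQL 5.0 and later
* Oracle 10.2.0 and later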

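Incremental import
For an incremental import you first need to know which column to monitor, and from which value of that column the increment starts.

* --check-column id: the column to check
* The check column must not be a character type; CHAR, VARCHAR and similar types are not allowed. The primary key id is the usual choice.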
* --check-column takes a single column; it cannot specify multiple columns

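--last-value 10: the value from which the increment starts
--incremental: the incremental mode

append: import rows whose check column (for example id) is greater than --last-value
lastmodified: import all rows modified after the time given in --last-value, for example "2016-12-15 15:47:30"

--append: append mode; new rows are added to the existing target directory
--merge-key id: merge mode; new rows are merged into the existing data by the key column id
An incremental import cannot be used together with --delete-target-dir, and the incremental mode must always be specified.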
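
The two incremental modes can be sketched as follows. The connection details, the HDFS target directory and the timestamp column updated_at are hypothetical. Both functions write to an HDFS directory rather than to Hive, and the script only prints the commands unless SQOOP=sqoop is set.

```shell
# Dry run by default: prints the assembled command instead of executing it.
# Set SQOOP=sqoop in the environment to run the import for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders, not taken from the original post.
JDBC_URL="jdbc:mysql://mysql.example.com:3306/mydb"
DB_USER="etl_user"
DB_PASS="changeme"

# append mode: import rows whose id is greater than the given last value.
incremental_append() {
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --table people \
    --target-dir /tmp/sqoop/people \
    --incremental append \
    --check-column id \
    --last-value "$1"
}

# lastmodified mode: import rows changed after the given timestamp and
# merge them into the existing data using id as the key.
# updated_at is an assumed timestamp column.
incremental_lastmodified() {
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password "$DB_PASS" \
    --table people \
    --target-dir /tmp/sqoop/people \
    --incremental lastmodified \
    --check-column updated_at \
    --last-value "$1" \
    --merge-key id
}

incremental_append 10
incremental_lastmodified "2016-12-15 15:47:30"
```

At the end of a real run Sqoop logs the value to pass as --last-value next time, so the caller has to store it between runs.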
