Sqoop: import

Pulling from MySQL:

Method 1:

sqoop import --connect jdbc:mysql://172.***.***34/linj --username *** --password *** --table T_GW_*** --hive-import --hive-database os --hive-table os_shanghai_***1 --hive-overwrite --hive-drop-import-delims -z --compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir -m 5 --as-parquetfile

sqoop import --connect jdbc:mysql://***.***.***.***/databasename \
--username *** \
--password *** \
--table t_table1 \
--hive-import \
--hive-database hivedbname \
--hive-table hive_t_table1 \
--hive-overwrite \
--hive-drop-import-delims \
--delete-target-dir \
-m 1 \
--as-parquetfile
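Long commands like the two above are easy to break: a missing space before a trailing `\` glues the next option onto the previous value. A minimal bash sketch keeps each argument as a separate array element instead. The host `db-host`, the variables `DB_USER` and `DB_PASSWORD`, and the function name are hypothetical placeholders, not values from the original setup.

```bash
#!/usr/bin/env bash
# Build the Method 1 argument list as a bash array, one element per argument.
# db-host, DB_USER and DB_PASSWORD are hypothetical placeholders.
build_hive_import_args() {
  local table="$1" hive_db="$2" hive_table="$3" mappers="${4:-1}"
  HIVE_IMPORT_ARGS=(
    import
    --connect "jdbc:mysql://db-host:3306/databasename"
    --username "${DB_USER:-etl_user}"
    --password "${DB_PASSWORD:-changeme}"
    --table "$table"
    --hive-import
    --hive-database "$hive_db"
    --hive-table "$hive_table"
    --hive-overwrite
    --hive-drop-import-delims
    --delete-target-dir
    -m "$mappers"
    --as-parquetfile
  )
}

build_hive_import_args t_table1 hivedbname hive_t_table1
# Dry run: print the command with shell quoting instead of executing it.
printf '%q ' sqoop "${HIVE_IMPORT_ARGS[@]}"
echo
```

To execute it for real, call `sqoop "${HIVE_IMPORT_ARGS[@]}"`; the quoted expansion passes every element through unchanged, so no line continuations are needed.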

Method 2: importing with a SQL query

Step 1:

Here is the command:

sqoop import \
--connect jdbc:mysql://172.***.***.170:3309/p_b \
--username p_b \
--password ***** \
--query "select * from sparksql_hive_test where \$CONDITIONS" \
--target-dir tmp-bi-mysql \
--delete-target-dir \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--direct \
--fields-terminated-by '\t'

--target-dir can also be an absolute HDFS path such as /a/b; a relative path like tmp-bi-mysql is resolved under the current user's HDFS home directory.
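The backslash in `\$CONDITIONS` matters: the Sqoop user guide notes that when the query is wrapped in double quotes, `$CONDITIONS` must be escaped so the shell does not expand it before Sqoop sees it. A small bash check, using the table name from the command above, shows the difference:

```bash
#!/usr/bin/env bash
unset CONDITIONS   # normally not set in the shell

# Unescaped inside double quotes: the shell expands $CONDITIONS to "".
unescaped="select * from sparksql_hive_test where $CONDITIONS"
# Escaped: the literal token survives for Sqoop to replace per mapper.
escaped="select * from sparksql_hive_test where \$CONDITIONS"
# Single quotes keep it literal without any escaping.
single_quoted='select * from sparksql_hive_test where $CONDITIONS'

printf '%s\n' "$unescaped" "$escaped" "$single_quoted"
```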

Adding a WHERE condition:

sqoop import \
--connect jdbc:mysql://***.***.***.***:3306/databasename \
--username *** \
--password *** \
--query "select * from sparksql_hive_test where remark1 regexp '[0-9][^a]\..*' and \$CONDITIONS" \
--target-dir tmp-bi-mysql \
--delete-target-dir \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--direct \
--fields-terminated-by '\t'
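One subtlety in this filter: the shell leaves `\.` alone inside double quotes, but MySQL (with the default sql_mode) drops a backslash in a string literal when the next character has no special meaning, so the regex engine receives `[0-9][^a]..*`, where `.` matches any character. If a literal dot is intended, MySQL needs `\\.` in the literal, which inside the double-quoted --query is written `\\\\.`. The sketch below compares both patterns on hypothetical remark1 values, assuming that for patterns this simple MySQL REGEXP and `grep -E` agree:

```bash
#!/usr/bin/env bash
# Hypothetical remark1 values, used only to show which rows each pattern keeps.
sample_remarks() {
  printf '%s\n' '12.5' '1b.txt' '1a.txt' 'abc' '1bc'
}

# Print the sample values that match an extended regular expression.
matching_remarks() {
  sample_remarks | grep -E "$1"
}

echo '--- pattern as MySQL receives it (backslash dropped):'
matching_remarks '[0-9][^a]..*'      # 12.5, 1b.txt, 1bc
echo '--- pattern with a literal dot:'
matching_remarks '[0-9][^a]\..*'     # 12.5, 1b.txt
```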

Then check the output:

hadoop fs -ls tmp-bi-mysql/
hadoop fs -text tmp-bi-mysql/part-*

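If the import worked, these commands list the output files and print the rows in the format you specified. (The files were written with --compress, so -text, which decompresses them, is used rather than -cat.)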
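Before loading, it can help to confirm that every line has as many tab-separated fields as the Hive table will have columns. A hedged awk sketch; `check_field_count` and the sample row are hypothetical:

```bash
#!/usr/bin/env bash
# Report lines whose tab-separated field count differs from the expected
# column count; exit status 1 if any line is off, 0 otherwise.
check_field_count() {
  awk -F'\t' -v n="$1" '
    NF != n { bad++; printf "line %d has %d fields\n", NR, NF }
    END { exit (bad > 0) }'
}

# Hypothetical sample in the layout of a 5-column table.
printf '1\t13800000000\tgroup_a\tremark_x\tremark_y\n' | check_field_count 5 \
  && echo "all lines have 5 fields"
```

On the cluster the input would come from `hadoop fs -text tmp-bi-mysql/part-* | check_field_count 5`.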

Step 2: create the Hive tables

drop table if exists tmp.hive_mobile_test1;
CREATE TABLE tmp.hive_mobile_test1(
  `id` bigint, 
  `mobile` string, 
  `group_name` string
  )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;



drop table if exists tmp.hive_mobile_test2;
CREATE TABLE tmp.hive_mobile_test2(
  `id` bigint,
  `mobile` string,
  `group_name` string,
  `remark1` string,
  `remark2` string
  )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;

Step 3:

In the Hive client, run:

load data inpath 'tmp-bi-mysql' into table tmp.hive_mobile_test1;
load data inpath 'tmp-bi-mysql' into table tmp.hive_mobile_test2;
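Note: LOAD DATA INPATH moves the files out of tmp-bi-mysql instead of copying them, so run the Step 1 import again before the second load (one import per load).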
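Because LOAD DATA INPATH moves the staged files, the two tables need two import/load rounds. A dry-run sketch of that sequence: `run` only prints each command, the connection values are placeholders, and `-P` (prompt for the password) stands in for the masked password.

```bash
#!/usr/bin/env bash
# Dry run: run() prints each command instead of executing it.
# To execute for real, change its body to:  "$@"
run() { printf '+ %s\n' "$*"; }

# Step 1 import into the staging directory (placeholder connection values;
# add the remaining options from Step 1 as needed).
import_to_staging() {
  run sqoop import \
    --connect "jdbc:mysql://db-host:3309/p_b" \
    --username etl_user -P \
    --query "select * from sparksql_hive_test where \$CONDITIONS" \
    --target-dir tmp-bi-mysql --delete-target-dir --num-mappers 1 \
    --fields-terminated-by '\t'
}

# One import per load: LOAD DATA INPATH moves the staged files into the
# table, leaving tmp-bi-mysql empty for the next round.
two_round_load() {
  local table
  for table in tmp.hive_mobile_test1 tmp.hive_mobile_test2; do
    import_to_staging
    run hive -e "load data inpath 'tmp-bi-mysql' into table ${table}"
  done
}

two_round_load
```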

Result:


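To keep only some of the fields, declare only those columns when creating the table. Hive maps the tab-separated fields to columns by position, so this works for leading fields (id, mobile, group_name above); any fields beyond the table's last column are ignored.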
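The positional mapping can be pictured with `cut`: a 3-column table over 5-field lines sees roughly what `cut -f1-3` prints. This is an illustration only, on a hypothetical row, not how Hive reads files:

```bash
#!/usr/bin/env bash
# Illustration only: Hive's delimited-text tables map fields to columns by
# position, and a table with fewer columns ignores the trailing fields.
line=$(printf '1\t13800000000\tgroup_a\tremark_x\tremark_y')   # hypothetical row
three_cols=$(printf '%s\n' "$line" | cut -f1-3)
printf '%s\n' "$three_cols"
```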

Reference: https://www.cnblogs.com/xuyou551/p/7998846.html

Incremental import (append):

Reference:

https://www.cnblogs.com/Alcesttt/p/11432547.html

Incremental import on the id column:

sqoop import \
--connect jdbc:mysql://***.***.***.***:3309/p_b \
--username *** \
--password *** \
--table sparksql_hive_test \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-database tmp \
--hive-table sparksql_hive_test00 \
--incremental append \
--check-column id \
--last-value '5' -m 1 \
--null-string '\\N' \
--null-non-string '\\N'

Result:
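For repeated runs, --last-value has to advance each time. At the end of an incremental import Sqoop logs the arguments to use for the next run, including the new --last-value. One hedged way to carry that value forward is a small state file; the file name, the default value and the helper names below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical state file holding the last imported id between runs.
STATE_FILE="${STATE_FILE:-./sparksql_hive_test.last_value}"

# Print the saved value, or the given default if nothing is saved yet.
read_last_value() {
  if [ -s "$STATE_FILE" ]; then
    cat "$STATE_FILE"
  else
    echo "${1:-0}"
  fi
}

# Save the value Sqoop reports for the next run.
save_last_value() {
  printf '%s\n' "$1" > "$STATE_FILE"
}

last_value=$(read_last_value 5)
echo "--incremental append --check-column id --last-value ${last_value}"
```

Sqoop's saved jobs (`sqoop job --create <name> -- import ...`, then `sqoop job --exec <name>`) are the built-in alternative: they keep the last value in the Sqoop metastore between runs.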

Incremental import on a timestamp column:

sqoop import \
--connect jdbc:mysql://***.***.***.***:3309/p_b \
--username *** \
--password *** \
--table t_cal_base_busess_daa \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-database tmp \
--hive-table t_cal_base_busess_daa \
--incremental append \
--check-column statistic_time \
--last-value '2019-07-20 00:00:00' -m 1 \
--null-string '\\N' \
--null-non-string '\\N'
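Instead of hard-coding the timestamp, the value can be computed at run time. A sketch assuming GNU date (the `-d` option is GNU-specific); the value contains a space, so it must be passed in double quotes:

```bash
#!/usr/bin/env bash
# Start of yesterday as "YYYY-MM-DD 00:00:00" (requires GNU date).
last_value=$(date -d "yesterday" '+%Y-%m-%d 00:00:00')
echo "$last_value"
# Pass it quoted so the space does not split it into two arguments:
#   --last-value "$last_value"
```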

Passing the value in a shell variable:

#!/bin/bash

last_day=`date -d "-1days" +%Y-%m-%d`
echo '====================last_day======================'
echo $last_day

effective_time='2019-07-20'
# effective_time='2019-07-20 00:00:00' does not work with the unquoted ${effective_time} below: the space splits it into two arguments
echo '====================effective_time======================'
echo $effective_time


sqoop import \
--connect jdbc:mysql://***.***.***.***:3309/p_b \
--username *** \
--password *** \
--table t_cal_base_busess_daa \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-database tmp \
--hive-table t_cal_base_busess_daa \
--incremental append \
--check-column statistic_time \
--last-value ${effective_time} -m 1 \
--null-string '\\N' \
--null-non-string '\\N'
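The reason the full timestamp fails in this script is word splitting: `${effective_time}` is unquoted, so the shell splits '2019-07-20 00:00:00' at the space and Sqoop receives two arguments. This bash check counts the arguments each form produces:

```bash
#!/usr/bin/env bash
# Print how many arguments the function receives.
count_args() { echo "$#"; }

effective_time='2019-07-20 00:00:00'
unquoted=$(count_args --last-value ${effective_time})   # split at the space: 3
quoted=$(count_args --last-value "${effective_time}")   # kept whole: 2
echo "unquoted: ${unquoted} arguments, quoted: ${quoted} arguments"
```

Writing `--last-value "${effective_time}"` (double quotes) therefore also works for the full timestamp.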


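Note: in the --last-value ${effective_time} line, the variable must not be written as '$effective_time': single quotes stop the shell from expanding it, so Sqoop would receive the literal text $effective_time.
Inside a query string, however, single quotes around the variable are fine, because the whole string is enclosed in double quotes, for example: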
#!/bin/bash
effective_time=`date -d "-1days" +%Y-%m-%d`
sqoop eval --connect "jdbc:mysql://***.***.***.***:3309/p_b?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC&useSSL=true" \
--username *** \
--password *** \
--e "delete from *** where left(statistic_time,10) ='$effective_time'"
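The quoting rule behind this note, checked in bash (`some_table` is a hypothetical stand-in for the masked table name):

```bash
#!/usr/bin/env bash
effective_time='2019-07-20'

# Plain single quotes: no expansion, the text stays literal.
literal='$effective_time'

# Single quotes nested inside double quotes: the outer double quotes decide,
# so the value is substituted and the inner quotes reach MySQL as a string
# literal.
stmt="delete from some_table where left(statistic_time,10) ='$effective_time'"

printf '%s\n' "$literal" "$stmt"
```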

The timestamp-based imports above only import rows of the MySQL table whose statistic_time is later than 2019-07-20.

Running the same import with --incremental lastmodified instead of append fails with:

Error message: --incremental lastmodified option for hive imports is not supported. Please remove the parameter --incremental lastmodified

Cause: Sqoop does not support lastmodified incremental imports from MySQL into Hive, although the same mode works for imports from MySQL into HDFS.

So append mode is used instead.
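To catch this combination before a job is submitted, a wrapper script could inspect its argument list. A hypothetical pre-flight check (not part of Sqoop) that only recognises the space-separated form `--incremental lastmodified`:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail fast if an argument list combines
# --hive-import with --incremental lastmodified, which Sqoop rejects.
check_incremental_mode() {
  local arg prev="" hive=0 mode=""
  for arg in "$@"; do
    if [ "$arg" = "--hive-import" ]; then hive=1; fi
    if [ "$prev" = "--incremental" ]; then mode="$arg"; fi
    prev="$arg"
  done
  if [ "$hive" -eq 1 ] && [ "$mode" = "lastmodified" ]; then
    echo "error: --incremental lastmodified is not supported with --hive-import; use append" >&2
    return 1
  fi
  return 0
}

check_incremental_mode --hive-import --incremental append --check-column id && echo "append: ok"
```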

Pulling from SQL Server:

sqoop import --connect "jdbc:sqlserver://172.***.***.39;databaseName=***;username=***;password=***" --table AcAlts --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --hive-import --hive-table os.*** --hive-overwrite --hive-drop-import-delims --delete-target-dir -m 1

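Incremental import

Sqoop supports two incremental import modes. The first is append, which tracks a steadily increasing column, for example:

--incremental append --check-column id --last-value 0

The second is lastmodified, which tracks a timestamp, for example:

--incremental lastmodified --check-column time --last-value '2013-01-01 11:00:00'

This imports only rows whose time is later than '2013-01-01 11:00:00'. As noted above, lastmodified cannot be combined with --hive-import.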
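The "later than" rule can be illustrated locally: zero-padded "YYYY-MM-DD HH:MM:SS" strings sort in time order, so a plain string comparison selects the newer rows. This awk sketch on hypothetical rows illustrates the rule only; it is not how Sqoop evaluates the filter.

```bash
#!/usr/bin/env bash
# Keep rows whose time (field 2) is later than last_value, using string
# comparison on zero-padded timestamps.
last_value='2013-01-01 11:00:00'

newer_rows() {
  awk -F'\t' -v lv="$last_value" '$2 > lv'
}

# Hypothetical rows: id <TAB> time
printf '1\t2012-12-31 23:59:59\n2\t2013-01-01 10:59:59\n3\t2013-01-01 11:00:01\n4\t2013-02-01 08:00:00\n' \
  | newer_rows   # keeps ids 3 and 4
```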

I tested the first mode: on top of the data imported earlier, I imported the rows whose WorkNo is greater than 201309071:

sqoop import  --connect 'jdbc:sqlserver://192.***.***.**5:1433;username=**;password=***;database=***'  --table ST_Statistics \
  --where "BigReason='OfficeSoftwareFault'"   --split-by ResponseTime --target-dir /user/cenyuhai/sams \
  --incremental append  --check-column WorkNo  --last-value 201309071
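This command sets no -m, so Sqoop runs its default of 4 map tasks, and --split-by decides how rows are shared among them: Sqoop looks up the minimum and maximum of the split column and gives each mapper a slice of that range. A rough sketch of even integer slices; the integer range stands in for ResponseTime, and Sqoop's own splitters differ in detail.

```bash
#!/usr/bin/env bash
# Divide the inclusive integer range [min, max] into roughly equal slices,
# one per mapper, and print each slice as "low-high".
split_ranges() {
  local min="$1" max="$2" mappers="$3"
  local span=$(( max - min + 1 ))
  local i=0 lo hi
  while [ "$i" -lt "$mappers" ]; do
    lo=$(( min + span * i / mappers ))
    hi=$(( min + span * (i + 1) / mappers - 1 ))
    echo "$lo-$hi"
    i=$(( i + 1 ))
  done
}

split_ranges 1 100 4   # 1-25, 26-50, 51-75, 76-100
```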

SQL Server --> HBase:

sqoop import  --connect 'jdbc:sqlserver://192.***.***.**5:1433;username=**;password=**;database=**' --table ST_Statistics --where "BigReason='OfficeSoftwareFault'" --split-by ResponseTime  --hbase-table ST_Statistics --hbase-create-table   --hbase-row-key WorkNo  --column-family cf

花和尚也有春天
