[Sqoop] Synchronize a MySQL table to a Hive partition table

 This article covers importing a MySQL table into a Hive partition table stored in TEXTFILE format.

Importing a MySQL table into a Hive partition table stored in ORC format is covered in a companion article.

 1. Import from MySQL into the HDFS storage location of a Hive table partition (mysql -->> hdfs)

sqoop import \
--connect jdbc:mysql://IP:3306/DATABASE \
--username USERNAME --password PWD \
--fields-terminated-by ',' \
--m 1 \
--query "select * from TABLE where COLUMN='VALUE' and \$CONDITIONS" \
--target-dir /user/hive/warehouse/DATABASE.db/TABLE/PARTITION_NAME=PARTITION_VALUE/ \
--delete-target-dir

Replace the capitalized placeholders (IP, DATABASE, USERNAME, PWD, TABLE, COLUMN, VALUE, PARTITION_NAME, PARTITION_VALUE) with your own values.

Do not modify the "and \$CONDITIONS" at the end of the WHERE clause; Sqoop requires it in every --query.
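Note that writing files into the partition directory does not by itself register the partition in the Hive metastore. A minimal follow-up sketch, using the same placeholders as the command above, builds the ALTER TABLE statement that makes the new partition visible:

```shell
# Same placeholders as in the sqoop command above; replace with your values.
DB=DATABASE
TBL=TABLE
PK=PARTITION_NAME
PV=PARTITION_VALUE

LOCATION="/user/hive/warehouse/${DB}.db/${TBL}/${PK}=${PV}/"
DDL="ALTER TABLE ${DB}.${TBL} ADD IF NOT EXISTS PARTITION (${PK}='${PV}') LOCATION '${LOCATION}'"

# Review the statement, then run it with: hive -e "$DDL"
echo "$DDL"
```

Once the partition is added, a SELECT filtered on PARTITION_NAME='PARTITION_VALUE' will read the files Sqoop wrote.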

 2. MySQL -->> Hive partition table (the Hive table is created automatically if it does not exist)

sqoop import \
--connect jdbc:mysql://IP:3306/DB \
--username root --password PWD \
--query "select * from tab_task where task_createTime='2020-12-30' and \$CONDITIONS" \
--fields-terminated-by ',' \
--delete-target-dir \
--hive-import \
--m 1 \
--hive-partition-key dt \
--hive-partition-value 2020-12-30 \
--hive-database DB \
--hive-table tab_task \
--target-dir /user/hive/warehouse/DB.db/tab_task/dt=2020-12-30/ \
--direct

 If the partition field task_createTime is not in yyyy-MM-dd format, you can apply MySQL's date_format function in the query, e.g. date_format(task_createTime,'%Y-%m-%d').
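For example, assuming task_createTime is a DATETIME column, the query from section 2 would filter on the formatted date. This sketch shows only the modified --query value; every other option stays as in the command above:

```shell
# Hypothetical variant of the section-2 query for a DATETIME task_createTime:
# MySQL's date_format with '%Y-%m-%d' yields the yyyy-MM-dd form the partition uses.
QUERY="select * from tab_task where date_format(task_createTime,'%Y-%m-%d')='2020-12-30' and \$CONDITIONS"

# Pass it to sqoop unchanged:  --query "$QUERY"
echo "$QUERY"
```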

  1. To synchronize into a partition table, the data must be selected with --query.
  2. The query must end with and \$CONDITIONS.
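In practice this kind of sync is usually run daily. A sketch of a wrapper script, assuming GNU date and the same connection placeholders as above, computes yesterday's date and substitutes it into both the query and the partition value; the echo makes it a dry run, so remove it to actually import:

```shell
#!/bin/bash
set -euo pipefail

# Yesterday in yyyy-MM-dd, matching both task_createTime and the dt partition.
DT=$(date -d "1 day ago" +%Y-%m-%d)
QUERY="select * from tab_task where task_createTime='${DT}' and \$CONDITIONS"

# Dry run: prints the full command. Remove 'echo' to execute the import.
echo sqoop import \
  --connect jdbc:mysql://IP:3306/DB \
  --username root --password PWD \
  --query "$QUERY" \
  --fields-terminated-by ',' \
  --delete-target-dir \
  --hive-import \
  --m 1 \
  --hive-partition-key dt \
  --hive-partition-value "$DT" \
  --hive-database DB \
  --hive-table tab_task \
  --target-dir "/user/hive/warehouse/DB.db/tab_task/dt=${DT}/"
```

Scheduled from cron, this keeps one Hive partition per day without editing the command by hand.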


Origin blog.csdn.net/qq_44065303/article/details/112916572