Sqoop Job (incremental synchronization scheme for 80+ tables using Sqoop Job)

Table of Contents

One, Overview of requirements and solution

Two, Sqoop Job 

1. Sqoop job usage example

 2. Sqoop incremental import


One, Overview of requirements and solution

  • Requirements: first do a full import of the 83 MySQL tables into the Hive data warehouse, then incrementally import newly added data into the ODS layer of the Hive warehouse.
  • Solution: create a temporary table in MySQL, insert each table's information into it, export the temporary table's contents to the local file system, then create 83 Sqoop jobs from that list, and finally execute the Sqoop jobs to achieve incremental import of the 83 tables. The prerequisite is that all 83 tables have a time field that Sqoop can monitor as the check column. A minimal sketch of the job-creation loop follows this list.
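A minimal sketch of that loop, assuming the table names exported from the MySQL temporary table sit one per line in a local file (table_list.txt is a hypothetical name) and that every table carries the same timestamp check column:

#!/bin/bash
# Sketch only: table_list.txt, CHECK_COLUMN and the initial last-value are assumptions;
# the connection settings and password file are copied from the examples in this post.
CONN="jdbc:mysql://192.168.2.226:3306/yx"
CHECK_COLUMN="jhpt_update_time"

while read tableName; do
  # one Sqoop job per table, named after the table
  sqoop job --create "job_${tableName}" -- import \
    --connect "${CONN}" \
    --username root \
    --password-file /input/sqoop/pwd/sqoopPWD.pwd \
    --table "${tableName}" \
    --hive-import \
    --hive-database data_exchange \
    --hive-table "${tableName}" \
    --incremental append \
    --check-column "${CHECK_COLUMN}" \
    --last-value '1990-02-02 12:21:21' \
    --fields-terminated-by '\001' \
    -m 1
done < table_list.txt

# each job is then executed (manually or from a scheduler) with:
#   sqoop job --exec job_<tableName>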

Two, Sqoop Job 

1. Sqoop job usage example

1. Query job list

[root@hdp301 ~]# sqoop job --list

2. Delete job

[root@hdp301 ~]# sqoop job --delete kangll

3. Create a test job

sqoop job --create kangll -- import --connect jdbc:mysql://192.168.2.226:3306/yx \
--table stat_url_count \
--username root \
--password winner@001 -m 1 \
--hive-import \
--hive-table dwd_url_count \
--external-table-dir /yax/dwd/sfyp_test.db/dwd_url_count \
--incremental append \
--check-column last_timestamp \
--last-value '2015-11-30 16:59:43.1' \
--fields-terminated-by "\001"

4. Execute job

[root@hdp301 ~]# sqoop job --exec kangll

Printed log 

5. View running results

Why create a Sqoop job at all? Because the job saves the latest value of the monitored timestamp column in /root/.sqoop/metastore.db.script. If you do not use a Sqoop job to persist it, you get the default behavior shown below: each run only works from the initially supplied --last-value up to the maximum value it finds, and nothing is remembered for the next run.
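To check what the job has stored (including the last value of the check column), sqoop job --show can be used; the property name in the comment below is how Sqoop 1.4.x prints it and may differ slightly between versions:

[root@hdp301 ~]# sqoop job --show kangll
# among the printed options, look for the incremental settings, e.g.
#   incremental.last.value = 2015-11-30 16:59:43.1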

 

6. Insert two new rows in MySQL
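A possible way to do this from the shell; the columns url and cnt are hypothetical, only last_timestamp (the check column) is known from the job above:

mysql -h192.168.2.226 -uroot -p'winner@001' yx -e "
INSERT INTO stat_url_count (url, cnt, last_timestamp) VALUES
  ('http://example.com/a', 10, NOW()),
  ('http://example.com/b', 20, NOW());"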

7. How the incremental query appears in the log

8. Query the data in the hive table
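For example, from the shell (assuming the Hive table was created in the sfyp_test database, as the --external-table-dir path above suggests, and that the check column kept the same name after import):

hive -e "SELECT * FROM sfyp_test.dwd_url_count ORDER BY last_timestamp DESC LIMIT 10;"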

2. Sqoop incremental import

Note: Sqoop does not support incremental import in lastmodified mode when importing from MySQL into Hive; the script below therefore uses append mode on a timestamp check column

nohup sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect ${driver} \
--username ${dbUsername} \
--password ${dbPasswd} \
--table "${tableName}" \
--hive-import \
--hive-database data_exchange \
--hive-table "${tableName}" \
--hive-partition-key  "dt" \
--hive-partition-value "${date}" \
--fields-terminated-by '\001' \
--external-table-dir "/winhadoop/ods/data_exchange.db/${tableName}" \
--check-column jhpt_update_time \
--incremental append \
--last-value '1990-02-02 12:21:21' \
--target-dir "/sqoop/data/append/" \
-m 1 \
--null-string '\\N' \
--null-non-string '\\N' \
--mapreduce-job-name data_exchange_${tableName} >> ${sfyp_log}/ods_data_exchange.${date}.log
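The ${...} variables in this command are expected to be defined by the surrounding wrapper script; a minimal, assumed preamble could look like the following (values are placeholders drawn from the examples in this post, and sfyp_log is a hypothetical log directory):

# placeholder values - adjust to the real environment before running the import
driver="jdbc:mysql://192.168.2.226:3306/yx"
dbUsername="root"
dbPasswd="winner@001"
tableName="stat_url_count"
date=$(date +%Y%m%d)
sfyp_log="/var/log/sqoop"   # hypothetical log directory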

 

One more point: the Sqoop job password-file parameter

--password-file /input/sqoop/pwd/sqoopPWD.pwd

Create a file to save the password and upload it to HDFS

echo -n "hadoop" > sqoopPWD.pwd
hdfs dfs  -mkdir -p /input/sqoop/pwd/sqoopPWD.pwd
​echo -n "hadoop" > sqoopPWD.pwd
hdfs dfs  -mkdir -p /input/sqoop/pwd/sqoopPWD.pwd
hdfs dfs -put sqoopPWD.pwd /input/sqoop/pwd
hdfs dfs -chmod 400 /input/sqoop/pwd/sqoopPWD.pwd
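With the password file in place, the kangll job from the usage example above could be created without a plain-text password on the command line; a sketch mirroring the earlier create command:

sqoop job --create kangll -- import --connect jdbc:mysql://192.168.2.226:3306/yx \
--table stat_url_count \
--username root \
--password-file /input/sqoop/pwd/sqoopPWD.pwd -m 1 \
--hive-import \
--hive-table dwd_url_count \
--external-table-dir /yax/dwd/sfyp_test.db/dwd_url_count \
--incremental append \
--check-column last_timestamp \
--last-value '2015-11-30 16:59:43.1' \
--fields-terminated-by "\001"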

 

 

Origin blog.csdn.net/qq_35995514/article/details/108471044