Table of Contents
1. Overview of needs and plans
2. Sqoop Job

1. Overview of needs and plans
- Requirements: First, do a full import of all 83 MySQL tables into the ODS layer of the Hive data warehouse, then incrementally import new data into the same layer.
- Solution: Create a temporary table in MySQL, insert the information of each table into it, export the temporary table's contents to a local file, then create 83 Sqoop jobs and execute them to incrementally import the 83 tables. The prerequisite is that all 83 tables contain a time field that Sqoop can monitor as the check column.
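The generation step described above can be sketched in shell: read the exported table list and emit one `sqoop job --create` command per table. Every name here (the table list file, the `job_`/`ods_` prefixes, the check column) is an illustrative assumption, not taken from the real environment.

```shell
#!/bin/bash
# Sketch of generating one Sqoop job definition per table.
TABLE_LIST=tables.txt   # stand-in for the list exported from the MySQL temporary table
OUT=create_jobs.sh      # generated script: one "sqoop job --create" per table

# Sample table list so the sketch is self-contained
printf 'stat_url_count\nstat_user_visit\n' > "$TABLE_LIST"

: > "$OUT"
while read -r tbl; do
  [ -z "$tbl" ] && continue
  cat >> "$OUT" <<EOF
sqoop job --create job_${tbl} -- import \\
  --connect jdbc:mysql://192.168.2.226:3306/yx \\
  --table ${tbl} \\
  --username root --password-file /input/sqoop/pwd/sqoopPWD.pwd \\
  --hive-import --hive-table ods_${tbl} \\
  --incremental append --check-column last_timestamp \\
  --last-value '1990-01-01 00:00:00' \\
  --fields-terminated-by "\\001" -m 1
EOF
done < "$TABLE_LIST"
echo "generated $(grep -c 'create job_' "$OUT") job definitions in $OUT"
```

Running the generated `create_jobs.sh` once registers all jobs; afterwards only `sqoop job --exec` is needed per run.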
2. Sqoop Job
1. Sqoop job usage example
1. Query job list
[root@hdp301 ~]# sqoop job --list
2. Delete job
[root@hdp301 ~]# sqoop job --delete kangll
3. Create a test job
sqoop job --create kangll -- import --connect jdbc:mysql://192.168.2.226:3306/yx \
--table stat_url_count \
--username root \
--password winner@001 -m 1 \
--hive-import \
--hive-table dwd_url_count \
--external-table-dir /yax/dwd/sfyp_test.db/dwd_url_count \
--incremental append \
--check-column last_timestamp \
--last-value '2015-11-30 16:59:43.1' \
--fields-terminated-by "\001"
4. Execute the job
[root@hdp301 ~]# sqoop job --exec kangll
The job prints the MapReduce import log to the console.
5. View running results
Why create a Sqoop job? Because a job saves the latest value of the monitored timestamp field in /root/.sqoop/metastore.db.script. If you do not use a Sqoop job to persist it, a plain import only works from the initially given --last-value and the maximum value found at run time, so you would have to track and pass the last value yourself on every run.
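The saved checkpoint can be inspected directly. The sample below mimics the HSQLDB script format Sqoop uses (one INSERT per job property) so the snippet is self-contained; on a real node you would grep /root/.sqoop/metastore.db.script itself.

```shell
# Stand-in for /root/.sqoop/metastore.db.script; the INSERT line below
# mimics how Sqoop's HSQLDB metastore records job properties.
METASTORE=metastore.db.script
cat > "$METASTORE" <<'EOF'
INSERT INTO SQOOP_SESSIONS VALUES('kangll','incremental.last.value','2015-11-30 16:59:43.1','SqoopOptions')
EOF

# The value stored under incremental.last.value is the lower bound the
# next "sqoop job --exec kangll" will use
grep 'incremental.last.value' "$METASTORE"
```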
6. Insert two new rows in MySQL
7. The query displayed in the log
8. Query the data in the Hive table
2. Sqoop incremental import
Note: Sqoop does not support the lastmodified incremental mode when importing from MySQL into Hive; only append mode works with --hive-import.
nohup sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect ${driver} \
  --username ${dbUsername} \
  --password ${dbPasswd} \
  --table "${tableName}" \
  --hive-import \
  --hive-database data_exchange \
  --hive-table "${tableName}" \
  --hive-partition-key "dt" \
  --hive-partition-value "${date}" \
  --fields-terminated-by '\001' \
  --external-table-dir "/winhadoop/ods/data_exchange.db/${tableName}" \
  --check-column jhpt_update_time \
  --incremental append \
  --last-value '1990-02-02 12:21:21' \
  --target-dir "/sqoop/data/append/" \
  --m 1 \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --mapreduce-job-name data_exchange_${tableName} >> ${sfyp_log}/ods_data_exchange.${date}.log
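The command above assumes several shell variables defined by the surrounding driver script. A minimal sketch of plausible definitions follows; the values are illustrative assumptions, not the real configuration.

```shell
# Illustrative stand-ins for the driver script's variables; real values
# would come from the deployment's configuration.
driver="jdbc:mysql://192.168.2.226:3306/data_exchange"   # passed to --connect
dbUsername="root"
dbPasswd="******"                      # better kept in a protected password file
tableName="stat_url_count"             # one of the 83 tables
date=$(date +%Y-%m-%d)                 # Hive partition value for dt
sfyp_log="/var/log/sqoop"              # directory for the nohup log redirect

echo "importing ${tableName} into partition dt=${date}, log dir ${sfyp_log}"
```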
One more point: a Sqoop job can read the password from a file instead of taking it on the command line:
--password-file /input/sqoop/pwd/sqoopPWD.pwd
Create a file to save the password and upload it to HDFS
echo -n "hadoop" > sqoopPWD.pwd
hdfs dfs -mkdir -p /input/sqoop/pwd
hdfs dfs -put sqoopPWD.pwd /input/sqoop/pwd
hdfs dfs -chmod 400 /input/sqoop/pwd/sqoopPWD.pwd
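With the password file in place, the test job from earlier could be recreated without a plaintext --password. This sketch only assembles and prints the command for review; nothing is executed against Sqoop here.

```shell
# Build the job-creation command using --password-file instead of --password.
# Connection details match the earlier example; the command is only printed
# for inspection, not run.
cmd="sqoop job --create kangll -- import \
--connect jdbc:mysql://192.168.2.226:3306/yx \
--table stat_url_count \
--username root \
--password-file /input/sqoop/pwd/sqoopPWD.pwd \
--hive-import --hive-table dwd_url_count \
--incremental append --check-column last_timestamp \
--last-value '2015-11-30 16:59:43.1' \
--fields-terminated-by \"\\001\" -m 1"

echo "$cmd"
```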