Sqoop incremental data import
Experiment 1: Import 3 new rows incrementally.
Import initial data:
[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin --table bigdata
20/04/13 17:36:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`class_id`), MAX(`class_id`) FROM `bigdata`
20/04/13 17:36:49 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 1 to: 7
20/04/13 17:36:49 INFO mapreduce.JobSubmitter: number of splits:4
20/04/13 17:37:11 INFO mapreduce.ImportJobBase: Transferred 908.5771 KB in 24.5775 seconds (36.9678 KB/sec)
20/04/13 17:37:11 INFO mapreduce.ImportJobBase: Retrieved 7 records.
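The BoundingValsQuery line above shows how Sqoop plans the parallel import: it reads the minimum and maximum of the split column (class_id) and divides that range among the map tasks (4 by default). Each mapper then issues a bounded query of roughly the following shape; the actual boundary values are computed by IntegerSplitter, so the ones below are illustrative only:

SELECT * FROM bigdata WHERE class_id >= 1 AND class_id < 3   -- mapper 1 (illustrative bounds)
SELECT * FROM bigdata WHERE class_id >= 3 AND class_id < 5   -- mapper 2
...
SELECT * FROM bigdata WHERE class_id >= 7 AND class_id <= 7  -- last mapper includes the upper bound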
Result: all 7 initial records were imported into /user/root/bigdata.
Next, insert 3 new rows into the MySQL table:
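A sketch of the inserts; the values match the HDFS output shown later, but since the logs never spell out the table's column list, the positional VALUES form below is assumed to match the table definition:

mysql> INSERT INTO bigdata VALUES
    -> (8, 9, 'hive', '2020-04-14 06:39:44', 'Mars'),
    -> (9, 10, 'Hive example:log analysis', '2020-04-14 06:39:44', 'Mars'),
    -> (10, 10, 'hbase', '2020-04-14 06:39:44', 'Mars');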
Import the increment:
[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append --last-value 7
Import log:
20/04/13 17:49:29 INFO mapreduce.ImportJobBase: Transferred 672.2295 KB in 22.4879 seconds (29.8929 KB/sec)
20/04/13 17:49:29 INFO mapreduce.ImportJobBase: Retrieved 3 records.
20/04/13 17:49:29 INFO util.AppendUtils: Appending to directory bigdata
20/04/13 17:49:29 INFO util.AppendUtils: Using found partition 4
20/04/13 17:49:29 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
20/04/13 17:49:29 INFO tool.ImportTool: --incremental append
20/04/13 17:49:29 INFO tool.ImportTool: --check-column class_id
20/04/13 17:49:29 INFO tool.ImportTool: --last-value 10
20/04/13 17:49:29 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
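As the last log line suggests, the whole incremental import can be saved as a reusable job; Sqoop then stores the last-value in its metastore and updates it automatically after each run. A sketch (the job name bigdata_incr is our own; on --exec, Sqoop may prompt for the password again unless the metastore is configured to store it):

[root@bigdata tempfolder]# sqoop job --create bigdata_incr -- import \
--connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append --last-value 10
[root@bigdata tempfolder]# sqoop job --exec bigdata_incr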
View the data: although the default number of map tasks is 4, the increment contains only 3 rows, so only 3 new files are generated (one row each).
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00004
8,9,hive,2020-04-14 06:39:44.0,Mars
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00005
9,10,Hive example:log analysis,2020-04-14 06:39:44.0,Mars
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00006
10,10,hbase,2020-04-14 06:39:44.0,Mars
****** Additional note ******
If you instead run with -m 1 (a single map task), as in the video, only one additional file is generated, and it contains all three new rows (see the command sketch below).
****** End of additional note ******
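For reference, the single-map variant from the note above is simply the same command with -m 1 appended:

[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append --last-value 7 -m 1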
Experiment 2: Add one new row, and this time omit --last-value to see what happens.
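The new row can be added as before (same positional-VALUES assumption as in Experiment 1; the values match the last line of the HDFS output below):

mysql> INSERT INTO bigdata VALUES (11, 11, 'kylin', '2020-04-14 07:07:03', 'Mars');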
[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append -m 1
20/04/13 18:08:25 INFO mapreduce.ImportJobBase: Transferred 218.0459 KB in 22.4301 seconds (9.7211 KB/sec)
20/04/13 18:08:25 INFO mapreduce.ImportJobBase: Retrieved 11 records.
20/04/13 18:08:25 INFO util.AppendUtils: Appending to directory bigdata
20/04/13 18:08:25 INFO util.AppendUtils: Using found partition 7
20/04/13 18:08:25 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
20/04/13 18:08:25 INFO tool.ImportTool: --incremental append
20/04/13 18:08:25 INFO tool.ImportTool: --check-column class_id
20/04/13 18:08:25 INFO tool.ImportTool: --last-value 11
20/04/13 18:08:25 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
As a result, Sqoop imported all 11 rows again.
Therefore --last-value must always be supplied; otherwise every existing row is re-imported and the data in HDFS becomes duplicated.
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00007
1,8,bigdata intro.,2020-04-13 11:26:45.0,Mars
2,8,hadoop intro.,2020-04-13 11:26:45.0,Mars
3,8,hadoop components,2020-04-13 11:26:45.0,Mars
4,8,hadoop arch.,2020-04-13 11:26:45.0,Mars
5,9,hdfs,2020-04-13 11:26:45.0,Mars
6,9,yarn,2020-04-13 11:26:45.0,Mars
7,9,sqoop,2020-04-13 11:26:45.0,Mars
8,9,hive,2020-04-14 06:39:44.0,Mars
9,10,Hive example:log analysis,2020-04-14 06:39:44.0,Mars
10,10,hbase,2020-04-14 06:39:44.0,Mars
11,11,kylin,2020-04-14 07:07:03.0,Mars
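A quick way to confirm the duplication (a sketch using standard HDFS shell commands; the directory now holds 7 + 3 + 11 = 21 rows in total, with rows 1-10 present twice):

[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-* | wc -l
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-* | sort | uniq -d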
Experiment 3:
Add another row of data,