Sqoop incremental data import
Experiment 1: Import 3 new rows incrementally.
Import initial data:
[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin --table bigdata
20/04/13 17:36:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`class_id`), MAX(`class_id`) FROM `bigdata`
20/04/13 17:36:49 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 1 to: 7
20/04/13 17:36:49 INFO mapreduce.JobSubmitter: number of splits:4
20/04/13 17:37:11 INFO mapreduce.ImportJobBase: Transferred 908.5771 KB in 24.5775 seconds (36.9678 KB/sec)
20/04/13 17:37:11 INFO mapreduce.ImportJobBase: Retrieved 7 records.
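The BoundingValsQuery line above shows how Sqoop plans the parallel import: it reads the minimum and maximum of the split column (class_id) and divides that range among the map tasks (4 by default). Each mapper then issues a bounded query of roughly the following shape; the actual boundary values are computed by IntegerSplitter, so the ones below are illustrative only:

SELECT * FROM bigdata WHERE class_id >= 1 AND class_id < 3   -- mapper 1 (illustrative bounds)
SELECT * FROM bigdata WHERE class_id >= 3 AND class_id < 5   -- mapper 2
...
SELECT * FROM bigdata WHERE class_id >= 7 AND class_id <= 7  -- last mapper includes the upper bound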
Result: all 7 initial records were imported into /user/root/bigdata.
Next, insert 3 new rows into the MySQL table:
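A sketch of the inserts; the values match the HDFS output shown later, but since the logs never spell out the table's column list, the positional VALUES form below is assumed to match the table definition:

mysql> INSERT INTO bigdata VALUES
    -> (8, 9, 'hive', '2020-04-14 06:39:44', 'Mars'),
    -> (9, 10, 'Hive example:log analysis', '2020-04-14 06:39:44', 'Mars'),
    -> (10, 10, 'hbase', '2020-04-14 06:39:44', 'Mars');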
Import the increment:
[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append --last-value 7
Import log:
20/04/13 17:49:29 INFO mapreduce.ImportJobBase: Transferred 672.2295 KB in 22.4879 seconds (29.8929 KB/sec)
20/04/13 17:49:29 INFO mapreduce.ImportJobBase: Retrieved 3 records.
20/04/13 17:49:29 INFO util.AppendUtils: Appending to directory bigdata
20/04/13 17:49:29 INFO util.AppendUtils: Using found partition 4
20/04/13 17:49:29 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
20/04/13 17:49:29 INFO tool.ImportTool: --incremental append
20/04/13 17:49:29 INFO tool.ImportTool: --check-column class_id
20/04/13 17:49:29 INFO tool.ImportTool: --last-value 10
20/04/13 17:49:29 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
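As the last log line suggests, the whole incremental import can be saved as a reusable job; Sqoop then stores the last-value in its metastore and updates it automatically after each run. A sketch (the job name bigdata_incr is our own; on --exec, Sqoop may prompt for the password again unless the metastore is configured to store it):

[root@bigdata tempfolder]# sqoop job --create bigdata_incr -- import \
--connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append --last-value 10
[root@bigdata tempfolder]# sqoop job --exec bigdata_incr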
View the data: although the default number of map tasks is 4, the increment contains only 3 rows, so only 3 new files are generated (one row each).
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00004
8,9,hive,2020-04-14 06:39:44.0,Mars
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00005
9,10,Hive example:log analysis,2020-04-14 06:39:44.0,Mars
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00006
10,10,hbase,2020-04-14 06:39:44.0,Mars
****** Additional note ******
If you instead run with -m 1 (a single map task), as in the video, only one additional file is generated, and it contains all three new rows (see the command sketch below).
****** End of additional note ******
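For reference, the single-map variant from the note above is simply the same command with -m 1 appended:

[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append --last-value 7 -m 1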
Experiment 2: Add one new row, and this time omit --last-value to see what happens.
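The new row can be added as before (same positional-VALUES assumption as in Experiment 1; the values match the last line of the HDFS output below):

mysql> INSERT INTO bigdata VALUES (11, 11, 'kylin', '2020-04-14 07:07:03', 'Mars');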
[root@bigdata tempfolder]# sqoop import --connect jdbc:mysql://localhost:3306/sqooptest --username root --password admin \
--table bigdata --check-column class_id --incremental append -m 1
20/04/13 18:08:25 INFO mapreduce.ImportJobBase: Transferred 218.0459 KB in 22.4301 seconds (9.7211 KB/sec)
20/04/13 18:08:25 INFO mapreduce.ImportJobBase: Retrieved 11 records.
20/04/13 18:08:25 INFO util.AppendUtils: Appending to directory bigdata
20/04/13 18:08:25 INFO util.AppendUtils: Using found partition 7
20/04/13 18:08:25 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
20/04/13 18:08:25 INFO tool.ImportTool: --incremental append
20/04/13 18:08:25 INFO tool.ImportTool: --check-column class_id
20/04/13 18:08:25 INFO tool.ImportTool: --last-value 11
20/04/13 18:08:25 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
As a result, Sqoop imported all 11 rows again.
Therefore --last-value must always be supplied; otherwise every existing row is re-imported and the data in HDFS becomes duplicated.
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-00007
1,8,bigdata intro.,2020-04-13 11:26:45.0,Mars
2,8,hadoop intro.,2020-04-13 11:26:45.0,Mars
3,8,hadoop components,2020-04-13 11:26:45.0,Mars
4,8,hadoop arch.,2020-04-13 11:26:45.0,Mars
5,9,hdfs,2020-04-13 11:26:45.0,Mars
6,9,yarn,2020-04-13 11:26:45.0,Mars
7,9,sqoop,2020-04-13 11:26:45.0,Mars
8,9,hive,2020-04-14 06:39:44.0,Mars
9,10,Hive example:log analysis,2020-04-14 06:39:44.0,Mars
10,10,hbase,2020-04-14 06:39:44.0,Mars
11,11,kylin,2020-04-14 07:07:03.0,Mars
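A quick way to confirm the duplication (a sketch using standard HDFS shell commands; the directory now holds 7 + 3 + 11 = 21 rows in total, with rows 1-10 present twice):

[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-* | wc -l
[root@bigdata tempfolder]# hdfs dfs -cat /user/root/bigdata/part-m-* | sort | uniq -d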
Experiment 3:
Add another row of data,