Sqoop incremental extraction

1. Create an external table in Hive to hold the test data:

CREATE EXTERNAL TABLE smes_source.test_etl (
    id int,
    name varchar(8),
    score int
)
row format delimited fields terminated by '\001'
lines terminated by '\n'
stored as textfile
location "/data/cdh/hive/hiveExternal/TEST_ETL";

2. Extract the existing data from MySQL into Hive:

sqoop import \
  --connect jdbc:mysql://10.96.3.8:3306/lOT_DMPS \
  --username galera --password 123456 \
  --table test_etl \
  --target-dir '/data/cdh/hive/hiveExternal/TEST_ETL' \
  --check-column id --incremental append --last-value 1 \
  --null-string '\\N' --null-non-string '\\N' \
  --fields-terminated-by '\001' --lines-terminated-by '\n' \
  -m 1

3. Create a Sqoop job:

When synchronizing data between a relational database and Hadoop/Hive with the --incremental option (for example, in append mode), we need to keep track of the --last-value. Without a job, this value has to be parsed from the log of the previous run and the script parameters reset accordingly, so that data synchronized from the relational database to Hadoop/Hive is not duplicated. We also have to manage these scripts ourselves, and may need to look up or modify a parameter value before each execution.

Sqoop provides a simpler way: create a Sqoop job and manage the synchronization task through it. For the incremental synchronization problem described above, a Sqoop job saves the --last-value recorded by the last synchronization, so there is no need to parse it from the log. Each time the job runs, it reads the value automatically from the data saved with the job.
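For comparison, the manual approach can be sketched roughly as follows. This is only an illustration: the function name extract_last_value and the file name import.log are hypothetical, and it assumes the Sqoop console output contains a line of the form "--last-value <value>" at the end of an incremental import.

```shell
# Hypothetical helper: read Sqoop console output on stdin and print the
# value that follows the last "--last-value" found in it.
# Assumption: the log contains a line such as
#   INFO tool.ImportTool:   --last-value 5
extract_last_value() {
    sed -n 's/.*--last-value[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*/\1/p' | tail -n 1
}

# Example usage (not executed here):
#   sqoop import ... 2>&1 | tee import.log
#   next_value=$(extract_last_value < import.log)
#   # pass "$next_value" as --last-value on the next run
```

Keeping this bookkeeping in scripts is exactly the overhead that a saved Sqoop job avoids.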

The statement to create the Sqoop job is as follows:

sqoop job --create etl_sync_job -- import \
  --connect jdbc:mysql://10.96.3.8:3306/lOT_DMPS \
  --username galera --password 123456 \
  --table test_etl \
  --target-dir '/data/cdh/hive/hiveExternal/TEST_ETL' \
  --check-column id --incremental append --last-value 1 \
  --null-string '\\N' --null-non-string '\\N' \
  --fields-terminated-by '\001' --lines-terminated-by '\n' \
  -m 1

Note: In "-- import", there must be a space between "--" and "import"; otherwise an error is reported.
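Once the job exists, each synchronization is just an execution of the saved job. The wrapper below is a minimal sketch, not part of the original steps: the function name run_saved_job is hypothetical, and it relies only on the standard "sqoop job" subcommands (--exec, --list, --show, --delete).

```shell
# Hypothetical wrapper: run a saved Sqoop job and report the outcome.
# The job name defaults to the etl_sync_job created above.
run_saved_job() {
    local job_name="${1:-etl_sync_job}"
    # --exec runs the saved job; after a successful incremental import,
    # Sqoop stores the new last value with the job.
    if sqoop job --exec "$job_name"; then
        echo "sync ok: $job_name"
    else
        echo "sync failed: $job_name" >&2
        return 1
    fi
}

# Other job subcommands, to be run manually:
#   sqoop job --list                  # names of all saved jobs
#   sqoop job --show etl_sync_job     # saved parameters, including the last value
#   sqoop job --delete etl_sync_job   # remove the saved job
```

Depending on the configuration, Sqoop may not store the password with the job and may prompt for it on each execution; the property sqoop.metastore.client.record.password controls this behavior.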
