One: Introduction to Sqoop
- Sqoop is a tool designed to efficiently transfer large volumes of data between relational databases and Hadoop. It is most commonly used to synchronize data from a relational database into HDFS, Hive, or HBase.
- Importing and exporting with Sqoop essentially runs a MapReduce job, making full use of MapReduce's parallelism and fault tolerance (a minimal example follows).
- Sqoop supports incremental imports, appending only the records added since the most recent import.
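As a minimal sketch of what such a run looks like (the host, database, and table names below are placeholders, not from the original article), a single-table import launches one MapReduce job whose map tasks each copy a slice of the table:

```bash
# Hypothetical example: import one MySQL table into HDFS.
# Sqoop splits the table on its primary key and runs 4 parallel
# map tasks (the default); each task writes one part file.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --table orders \
  --target-dir /user/hive/warehouse/orders \
  -m 4
```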
Two: Sqoop usage
List the database names on a MySQL server

```bash
sqoop list-databases \
  --connect 'jdbc:mysql://10.1.96.xx:3306' \
  --username test \
  --password test
```
List the names of all tables in a given database on a SQL Server instance

```bash
sqoop list-tables \
  --connect 'jdbc:sqlserver://192.168.12.xx:1433;database=pems' \
  --username sa \
  --password 'v3pems@2020'
```
Import all tables of an Oracle database into Hive

```bash
sqoop import-all-tables \
  --connect jdbc:oracle:thin:@10.89.142.207:1521:orcl \
  --username scott --password tiger \
  --hive-database eda \
  --hive-import --create-hive-table -m 1
```
Use Sqoop to execute a SQL statement

```bash
sqoop eval \
  --connect 'jdbc:sqlserver://192.168.12.65:1433;database=PEMS_DATA' \
  --username sa --password 'V3pems@2021' \
  --query 'select count(*) from rep_energy_tar'
```
Three: Common Sqoop operation commands
Parameter descriptions
1. Data import: sqoop import
The full list of parameters can be viewed with the sqoop import --help command.
Common parameters
- --connect specify the JDBC connection string
- --connection-manager specify the connection manager class to use
- --connection-param-file specify a connection parameter properties file
- --driver manually specify the JDBC driver class to use
- --hadoop-home override $HADOOP_MAPRED_HOME (deprecated)
- --hadoop-mapred-home override $HADOOP_MAPRED_HOME
- --help print usage instructions
- --metadata-transaction-isolation-level define the transaction isolation level for metadata queries
- --oracle-escaping-disabled disable the escaping mechanism of the Oracle/OraOop connection manager
- -P read the password from the console
- --password set the authentication password
- --password-alias credential provider password alias
- --password-file path of a file containing the authentication password (see the example after this list)
- --relaxed-isolation use read-uncommitted isolation for imports
- --skip-dist-cache skip copying jars to the distributed cache
- --temporary-rootdir define the temporary root directory for imports
- --throw-on-error rethrow RuntimeException when an error occurs during the job
- --username set the username for authentication
- --verbose print more information while working
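For example, instead of passing a plain-text --password on the command line, a job can read credentials from a restricted HDFS file. The paths below are illustrative:

```bash
# Assumed setup (not from the original article): the password is
# stored in an HDFS file readable only by the submitting user, e.g.
#   echo -n 'test' | hdfs dfs -put - /user/etl/mysql.pwd
#   hdfs dfs -chmod 400 /user/etl/mysql.pwd
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test \
  --password-file /user/etl/mysql.pwd \
  --table orders \
  --verbose
```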
Import control parameters
- --append import data in append mode
- --as-avrodatafile store imported data as Avro data files
- --as-parquetfile store imported data as Parquet files
- --as-sequencefile store imported data as SequenceFiles
- --as-textfile store imported data as plain text (the default)
- --autoreset-to-one-mapper fall back to one mapper if no split key is available
- --boundary-query set the boundary query used to retrieve the minimum and maximum values of the split column
- --columns <col,col,col…> specify the columns to import
- --compression-codec the compression codec to use for the import
- --delete-target-dir delete the target directory if it exists, then import (without this flag, an existing target path causes an error)
- --direct use the direct import fast path
- --direct-split-size split the input stream every 'n' bytes when importing in direct mode
- -e, --query import the results of the given SQL statement (a combined example follows this list)
- --fetch-size set the number of rows 'n' fetched from the database at a time
- --inline-lob-limit set the maximum size of an inline LOB
- -m, --num-mappers use 'n' map tasks to import in parallel (the default parallelism is 4)
- --mapreduce-job-name set the name of the generated MapReduce job
- --merge-key the key column used to merge results (used to merge duplicate rows during incremental imports)
- --split-by the table column used to split work units among mappers
- --split-limit the upper limit of each split for date/time/timestamp and integer split columns; for date or timestamp columns it is measured in seconds, and it must be greater than 0
- --table the name of the table to import
- --target-dir the HDFS path where the imported table is stored
- --validate validate the copy using the configured validator
- --validation-failurehandler fully qualified class name of the ValidationFailureHandler
- --validation-threshold fully qualified class name of the ValidationThreshold
- --validator fully qualified class name of the Validator
- --warehouse-dir parent HDFS directory under which tables are imported
- --where apply a WHERE clause filter during the import
- -z, --compress enable compression
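Putting several of these control parameters together, here is an illustrative free-form query import (the table, columns, and paths are made-up). With --query, the statement must contain the literal $CONDITIONS token, which Sqoop replaces with a per-mapper range predicate on the --split-by column:

```bash
# Sketch: 8 mappers split on order_id, output written as compressed
# Parquet; the target directory is deleted and recreated on re-runs.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --query 'SELECT order_id, amount, created_at FROM orders WHERE status = 1 AND $CONDITIONS' \
  --split-by order_id \
  --target-dir /data/sales/orders_paid \
  --delete-target-dir \
  --as-parquetfile -z \
  -m 8
```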
Incremental import parameters
- --check-column the source column to check for incremental changes
- --incremental define an incremental import of type 'append' or 'lastmodified' (see the sketch after this list)
- --last-value the last imported value of the check column
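A sketch of append mode (the check column and starting value are assumptions): each run imports only rows whose id exceeds the previous --last-value, and Sqoop prints the new last value to pass on the next run (a saved sqoop job would track it automatically):

```bash
# Hypothetical incremental append: pick up only rows with id > 42000
# added since the previous import.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --table orders \
  --target-dir /data/sales/orders \
  --incremental append \
  --check-column id \
  --last-value 42000
```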
Output line formatting parameters
- --enclosed-by set a required field-enclosing character
- --escaped-by set the escape character
- --fields-terminated-by set the field separator
- --lines-terminated-by set the end-of-line character
- --mysql-delimiters use MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
- --optionally-enclosed-by set a field-enclosing character applied only when needed (see the example after this list)
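For example (the delimiters here are arbitrary choices), the following writes tab-separated text with rows ending in newlines and quoting applied only where a field needs it:

```bash
# Illustrative delimiter settings for the imported text files.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --table orders \
  --target-dir /data/sales/orders_tsv \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --escaped-by '\\' \
  --optionally-enclosed-by '"'
```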
Input parsing parameters
- --input-enclosed-by set a required field encloser for input
- --input-escaped-by set the input escape character
- --input-fields-terminated-by set the input field separator
- --input-lines-terminated-by set the input end-of-line character
- --input-optionally-enclosed-by set an optional field encloser for input (see the export sketch after this list)
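These input parameters mirror the output formatting parameters and are mainly used with sqoop export, which parses HDFS files back into database rows. A sketch (the directory and table names are placeholders):

```bash
# Hypothetical export: parse tab-separated HDFS files produced by the
# previous example and write the rows into a relational table.
sqoop export \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --table orders_backup \
  --export-dir /data/sales/orders_tsv \
  --input-fields-terminated-by '\t' \
  --input-lines-terminated-by '\n'
```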
Hive parameters
- --create-hive-table automatically create the Hive table during the import; the import fails if the target Hive table already exists
- --hive-database set the database name to use when importing into Hive
- --hive-delims-replacement replace Hive record delimiters (\01) and line separators (\n, \r) in imported string fields with a user-defined string
- --hive-drop-import-delims drop Hive record delimiters (\01) and line separators (\n, \r) from imported string fields
- --hive-home override the $HIVE_HOME configuration parameter
- --hive-import import the table into Hive (uses Hive's default delimiters if none are set)
- --hive-overwrite overwrite existing data in the Hive table
- --hive-partition-key set the partition key to use when importing into Hive
- --hive-partition-value set the partition value to use when importing into Hive (a combined example follows this list)
- --hive-table set the table name to use when importing into Hive
- --map-column-hive override the mapping of specified columns to Hive types
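Combining the Hive parameters, a hedged example of a single-table import into a daily Hive partition (the database, table, and partition values are assumptions):

```bash
# Illustrative Hive import: load into eda.orders under partition
# dt='2020-07-01', dropping delimiter characters from string fields
# so they cannot corrupt Hive rows.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --table orders \
  --hive-import \
  --hive-database eda \
  --hive-table orders \
  --hive-partition-key dt \
  --hive-partition-value '2020-07-01' \
  --hive-drop-import-delims \
  -m 1
```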
HBase parameters
- --column-family set the target column family for the import
- --hbase-bulkload enable HBase bulk loading
- --hbase-create-table create the HBase table if it is missing
- --hbase-row-key specify which input column to use as the row key
- --hbase-table the name of the HBase table to import into (see the sketch after this list)
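And an illustrative HBase load (the table, column family, and row key are placeholders): each source row becomes one HBase row keyed on order_id, with every column stored in the given family:

```bash
# Hypothetical HBase import: create the table if missing and key
# rows on the order_id column.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username test --password test \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key order_id \
  --hbase-create-table \
  -m 1
```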
References:
https://zhuanlan.zhihu.com/p/163266351