Sqoop learning summary

One: Introduction to sqoop

  1. Sqoop is a tool designed to efficiently transfer bulk data between relational databases (such as MySQL, Oracle, and SQL Server) and Hadoop storage such as HDFS, Hive, and HBase.
  2. Sqoop imports and exports are executed as MapReduce jobs, so they take full advantage of MR's parallelism and fault tolerance (a sketch follows this list).
  3. Sqoop supports incremental imports, appending only the records that are new since the last import of the data source.
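
For example, a basic table import such as the sketch below (the host, database, table, and split column are placeholders, not taken from the original post) is compiled into a MapReduce job that here runs with four parallel map tasks:

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test --password test \
--table orders \
--target-dir /data/testdb/orders \
--split-by order_id \
-m 4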

Two: sqoop use

    List the database names on a MySQL server

sqoop list-databases \
--connect 'jdbc:mysql://10.1.96.xx:3306' \
--username test \
--password test

  Get the names of all tables in a given database on a SQL Server instance

sqoop list-tables \
--connect 'jdbc:sqlserver://192.168.12.xx:1433;database=pems;username=sa;password=v3pems@2020'

 Import all tables of an Oracle database into Hive

sqoop import-all-tables \
--connect jdbc:oracle:thin:@10.89.142.207:1521:orcl \
--username scott --password tiger \
--hive-database eda \
--hive-import --create-hive-table -m 1

Use sqoop eval to execute a SQL statement against the source database

sqoop eval \
--connect 'jdbc:sqlserver://192.168.12.65:1433;database=PEMS_DATA;username=sa;password=V3pems@2021' \
--query 'select count(*) from rep_energy_tar'

Three: sqoop common operation commands

Parameter descriptions

1. Data import: sqoop import

The full parameter list can be viewed with the sqoop import --help command.

Common parameters

  • --connect Specify the JDBC connection string
  • --connection-manager Specify the connection manager class name
  • --connection-param-file Specify a connection parameters file
  • --driver Manually specify the JDBC driver class to use
  • --hadoop-home Override $HADOOP_MAPRED_HOME_ARG
  • --hadoop-mapred-home Override $HADOOP_MAPRED_HOME_ARG
  • --help Print usage instructions
  • --metadata-transaction-isolation-level Define the transaction isolation level for metadata queries
  • --oracle-escaping-disabled Disable the escaping mechanism of the Oracle/OraOop connection manager
  • -P Read the password from the console
  • --password Set the authentication password
  • --password-alias Credential provider password alias
  • --password-file Set the path of the authentication password file
  • --relaxed-isolation Use read-uncommitted isolation for imports
  • --skip-dist-cache Skip copying jars to the distributed cache
  • --temporary-rootdir Define the temporary root directory for imports
  • --throw-on-error Rethrow a RuntimeException when an error occurs during the job
  • --username Set the authentication username
  • --verbose Print more information while working
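
As an illustration of the connection options above, the following sketch keeps the password off the command line by reading it from a file (the host, database, table, and file path are placeholders, not from the original post):

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test \
--password-file /user/hadoop/.mysql_pwd \
--table demo_table \
--target-dir /tmp/demo_table \
--verbose \
-m 1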

Import control parameters

  • --append Import data in append mode
  • --as-avrodatafile Store imported data as Avro data files
  • --as-parquetfile Store imported data as Parquet files
  • --as-sequencefile Store imported data as SequenceFiles
  • --as-textfile Store imported data as plain text (default)
  • --autoreset-to-one-mapper If no split key is available, reset the number of mappers to one
  • --boundary-query Set the boundary query used to retrieve the maximum and minimum values of the primary key
  • --columns <col,col,col…> Specify the columns to import
  • --compression-codec The compression codec to use for the import
  • --delete-target-dir Delete the target directory before importing (without this option, an existing target path causes an error)
  • --direct Use the direct import fast path
  • --direct-split-size In direct mode, split the input stream every 'n' bytes
  • -e, --query Import the results of the given SQL statement
  • --fetch-size Set the number 'n' of rows fetched from the database when more rows are needed
  • --inline-lob-limit Set the maximum size for an inline LOB
  • -m, --num-mappers Use 'n' map tasks to import in parallel (the default parallelism is 4)
  • --mapreduce-job-name Set the name of the generated MapReduce job
  • --merge-key Key column used to merge results (used to merge duplicate records during incremental import)
  • --split-by Column of the table used to split work units
  • --split-limit Upper limit for each split of date/time/timestamp and integer columns; for date or timestamp fields it is calculated in seconds, and the limit should be greater than 0
  • --table The name of the table to import
  • --target-dir Target HDFS path where the imported table is stored
  • --validate Validate the copy using the configured validator
  • --validation-failurehandler Fully qualified class name of the ValidationFailureHandler
  • --validation-threshold Fully qualified class name of the ValidationThreshold
  • --validator Fully qualified class name of the Validator
  • --warehouse-dir Parent HDFS path under which imported tables are stored
  • --where WHERE clause used to filter rows during the import
  • -z, --compress Enable compression
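
To show how several of these control options combine, here is a sketch of a free-form query import (the connection string, query, and paths are illustrative only). When --query is used, the statement must contain the literal $CONDITIONS token, and --split-by (or -m 1) must be given:

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test -P \
--query 'select id, name, amount from orders where $CONDITIONS' \
--split-by id \
--target-dir /data/testdb/orders \
--delete-target-dir \
-m 4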

Incremental import parameters

  • --check-column Source column to check for incremental changes
  • --incremental Define an incremental import of type 'append' or 'lastmodified'
  • --last-value The last imported value in the incremental check column
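
A lastmodified-style incremental import could be sketched as follows (the table, column, and timestamp are placeholders); --merge-key lets Sqoop collapse rows that were updated since the last run:

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test --password test \
--table orders \
--target-dir /data/testdb/orders \
--incremental lastmodified \
--check-column update_time \
--last-value '2021-03-01 00:00:00' \
--merge-key id \
-m 1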

Output line formatting parameters

  • --enclosed-by Set a required field-enclosing character
  • --escaped-by Set the escape character
  • --fields-terminated-by Set the field separator character
  • --lines-terminated-by Set the end-of-line character
  • --mysql-delimiters Use MySQL's default delimiter set: fields , ; lines \n ; escaped-by \ ; optionally-enclosed-by '
  • --optionally-enclosed-by Set a field-enclosing character that is only applied when needed
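
For example, to write imported records as tab-separated text (an illustrative command; host, database, and table names are placeholders):

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test --password test \
--table demo_table \
--target-dir /data/testdb/demo_table \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
-m 1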

Input parsing parameters

  • --input-enclosed-by Set a required field-enclosing character for parsing input
  • --input-escaped-by Set the input escape character
  • --input-fields-terminated-by Set the input field separator
  • --input-lines-terminated-by Set the input end-of-line character
  • --input-optionally-enclosed-by Set an optional field-enclosing character for parsing input
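
These options are mainly used when exporting files from HDFS back into a database, so that Sqoop can parse the records correctly. A sketch of such an export (all names and paths are placeholders) might look like this:

sqoop export \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test --password test \
--table demo_table \
--export-dir /data/testdb/demo_table \
--input-fields-terminated-by '\t' \
--input-lines-terminated-by '\n' \
-m 1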

Hive parameters

  • --create-hive-table Automatically create the Hive table during import; the job fails if the target Hive table already exists
  • --hive-database Set the database name to use when importing into Hive
  • --hive-delims-replacement Replace the Hive record delimiter (\0x01) and line delimiters (\n, \r) in imported string fields with a user-defined string
  • --hive-drop-import-delims Drop the Hive record delimiter (\0x01) and line delimiters (\n, \r) from imported string fields
  • --hive-home Override the $HIVE_HOME configuration parameter
  • --hive-import Import the table into Hive (uses Hive's default delimiters if none are set)
  • --hive-overwrite Overwrite existing data in the Hive table
  • --hive-partition-key Set the partition key to use when importing into Hive
  • --hive-partition-value Set the partition value to use when importing into Hive
  • --hive-table Set the table name to use when importing into Hive
  • --map-column-hive Override the mapping of specified columns to Hive types
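
Putting a few of these options together, a partitioned Hive import might be sketched as follows (the database, table, and partition names are illustrative only, not from the original post):

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test --password test \
--table orders \
--hive-import \
--hive-database eda \
--hive-table orders \
--hive-overwrite \
--hive-partition-key dt \
--hive-partition-value '2021-03-01' \
-m 1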

 

HBase parameters

  • --column-family Set the target column family for the import
  • --hbase-bulkload Enable HBase bulk loading
  • --hbase-create-table If specified, create missing HBase tables
  • --hbase-row-key Specify which input column to use as the row key
  • --hbase-table The name of the HBase table to import into
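
As a sketch of an HBase import (the table, column family, and row key column are placeholders, not from the original post):

sqoop import \
--connect jdbc:mysql://10.1.96.xx:3306/testdb \
--username test --password test \
--table orders \
--hbase-table orders \
--column-family info \
--hbase-row-key order_id \
--hbase-create-table \
-m 1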

 

 


 
