Sqoop参数

Import参数；

Table 1. Common arguments

Argument	Description
`--connect <jdbc-uri>`	Specify JDBC connect string
`--connection-manager <class-name>`	Specify connection manager class to use
`--driver <class-name>`	Manually specify JDBC driver class to use
`--hadoop-mapred-home <dir>`	Override $HADOOP_MAPRED_HOME
`--help`	Print usage instructions
`--password-file`	Set path for a file containing the authentication password
`-P`	Read password from console
`--password <password>`	Set authentication password
`--username <username>`	Set authentication username
`--verbose`	Print more information while working
`--connection-param-file <filename>`	Optional properties file that provides connection parameters
`--relaxed-isolation`	Set connection transaction isolation to read uncommitted for the mappers.

Table 2. Validation arguments

Argument	Description
`--validate`	Enable validation of data copied, supports single table copy only.
`--validator <class-name>`	Specify validator class to use.
`--validation-threshold <class-name>`	Specify validation threshold class to use.
`--validation-failurehandler <class-name>`	Specify validation failure handler class to use.

Table 3. Import control arguments:

Argument	Description
`--append`	Append data to an existing dataset in HDFS
`--as-avrodatafile`	Imports data to Avro Data Files
`--as-sequencefile`	Imports data to SequenceFiles
`--as-textfile`	Imports data as plain text (default)
`--as-parquetfile`	Imports data to Parquet Files
`--boundary-query <statement>`	Boundary query to use for creating splits
`--columns <col,col,col…>`	Columns to import from table
`--delete-target-dir`	Delete the import target directory if it exists
`--direct`	Use direct connector if exists for the database
`--fetch-size <n>`	Number of entries to read from database at once.
`--inline-lob-limit <n>`	Set the maximum size for an inline LOB
`-m,--num-mappers <n>`	Use n map tasks to import in parallel
`-e,--query <statement>`	Import the results of `statement`.
`--split-by <column-name>`	Column of the table used to split work units. Cannot be used with `--autoreset-to-one-mapper` option.
`--autoreset-to-one-mapper`	Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with `--split-by <col>` option.
`--table <table-name>`	Table to read
`--target-dir <dir>`	HDFS destination dir
`--warehouse-dir <dir>`	HDFS parent for table destination
`--where <where clause>`	WHERE clause to use during import
`-z,--compress`	Enable compression
`--compression-codec <c>`	Use Hadoop codec (default gzip)
`--null-string <null-string>`	The string to be written for a null value for string columns
`--null-non-string <null-string>`	The string to be written for a null value for non-string columns

The --null-string and --null-non-string arguments are optional. If not specified, then the string "null" will be used.

Table 4. Parameters for overriding mapping

Argument	Description
`--map-column-java <mapping>`	Override mapping from SQL to Java type for configured columns.
`--map-column-hive <mapping>`	Override mapping from SQL to Hive type for configured columns.

Table 5. Incremental import arguments:

Argument	Description
`--check-column (col)`	Specifies the column to be examined when determining which rows to import. (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/ LONGVARCHAR/LONGNVARCHAR)
`--incremental (mode)`	Specifies how Sqoop determines which rows are new. Legal values for `mode` include `append` and `lastmodified`.
`--last-value (value)`	Specifies the maximum value of the check column from the previous import.

Table 6. Output line formatting arguments:

Argument	Description
`--enclosed-by <char>`	Sets a required field enclosing character
`--escaped-by <char>`	Sets the escape character
`--fields-terminated-by <char>`	Sets the field separator character
`--lines-terminated-by <char>`	Sets the end-of-line character
`--mysql-delimiters`	Uses MySQL’s default delimiter set: fields: `,` lines: `\n` escaped-by: `\` optionally-enclosed-by: `'`
`--optionally-enclosed-by <char>`	Sets a field enclosing character

Table 7. Input parsing arguments:

Argument	Description
`--input-enclosed-by <char>`	Sets a required field encloser
`--input-escaped-by <char>`	Sets the input escape character
`--input-fields-terminated-by <char>`	Sets the input field separator
`--input-lines-terminated-by <char>`	Sets the input end-of-line character
`--input-optionally-enclosed-by <char>`	Sets a field enclosing character

Table 8. Hive arguments:

扫描二维码关注公众号，回复： 4912634 查看本文章

Argument	Description
`--hive-home <dir>`	Override `$HIVE_HOME`
`--hive-import`	Import tables into Hive (Uses Hive’s default delimiters if none are set.)
`--hive-overwrite`	Overwrite existing data in the Hive table.
`--create-hive-table`	If set, then the job will fail if the target hive
	table exits. By default this property is false.
`--hive-table <table-name>`	Sets the table name to use when importing to Hive.
`--hive-drop-import-delims`	Drops \n, \r, and \01 from string fields when importing to Hive.
`--hive-delims-replacement`	Replace \n, \r, and \01 from string fields with user defined string when importing to Hive.
`--hive-partition-key`	Name of a hive field to partition are sharded on
`--hive-partition-value <v>`	String-value that serves as partition key for this imported into hive in this job.
`--map-column-hive <map>`	Override default mapping from SQL type to Hive type for configured columns.

Table 9. HBase arguments:

Argument	Description
`--column-family <family>`	Sets the target column family for the import
`--hbase-create-table`	If specified, create missing HBase tables
`--hbase-row-key <col>`	Specifies which input column to use as the row key
	In case, if input table contains composite
	key, then <col> must be in the form of a
	comma-separated list of composite key
	attributes
`--hbase-table <table-name>`	Specifies an HBase table to use as the target instead of HDFS
`--hbase-bulkload`	Enables bulk loading

Table 10. Accumulo arguments:

Argument	Description
`--accumulo-table <table-nam>`	Specifies an Accumulo table to use as the target instead of HDFS
`--accumulo-column-family <family>`	Sets the target column family for the import
`--accumulo-create-table`	If specified, create missing Accumulo tables
`--accumulo-row-key <col>`	Specifies which input column to use as the row key
`--accumulo-visibility <vis>`	(Optional) Specifies a visibility token to apply to all rows inserted into Accumulo. Default is the empty string.
`--accumulo-batch-size <size>`	(Optional) Sets the size in bytes of Accumulo’s write buffer. Default is 4MB.
`--accumulo-max-latency <ms>`	(Optional) Sets the max latency in milliseconds for the Accumulo batch writer. Default is 0.
`--accumulo-zookeepers <host:port>`	Comma-separated list of Zookeeper servers used by the Accumulo instance
`--accumulo-instance <table-name>`	Name of the target Accumulo instance
`--accumulo-user <username>`	Name of the Accumulo user to import as
`--accumulo-password <password>`	Password for the Accumulo user

Table 11. Code generation arguments:

Argument	Description
`--bindir <dir>`	Output directory for compiled objects
`--class-name <name>`	Sets the generated class name. This overrides `--package-name`. When combined with `--jar-file`, sets the input class.
`--jar-file <file>`	Disable code generation; use specified jar
`--outdir <dir>`	Output directory for generated code
`--package-name <name>`	Put auto-generated classes in this package
`--map-column-java <m>`	Override default mapping from SQL type to Java type for configured columns.

Table 13. Common arguments

Argument	Description
`--connect <jdbc-uri>`	Specify JDBC connect string
`--connection-manager <class-name>`	Specify connection manager class to use
`--driver <class-name>`	Manually specify JDBC driver class to use
`--hadoop-mapred-home <dir>`	Override $HADOOP_MAPRED_HOME
`--help`	Print usage instructions
`--password-file`	Set path for a file containing the authentication password
`-P`	Read password from console
`--password <password>`	Set authentication password
`--username <username>`	Set authentication username
`--verbose`	Print more information while working
`--connection-param-file <filename>`	Optional properties file that provides connection parameters
`--relaxed-isolation`	Set connection transaction isolation to read uncommitted for the mappers.

Table 14. Import control arguments:

Argument	Description
`--as-avrodatafile`	Imports data to Avro Data Files
`--as-sequencefile`	Imports data to SequenceFiles
`--as-textfile`	Imports data as plain text (default)
`--as-parquetfile`	Imports data to Parquet Files
`--direct`	Use direct import fast path
`--inline-lob-limit <n>`	Set the maximum size for an inline LOB
`-m,--num-mappers <n>`	Use n map tasks to import in parallel
`--warehouse-dir <dir>`	HDFS parent for table destination
`-z,--compress`	Enable compression
`--compression-codec <c>`	Use Hadoop codec (default gzip)
`--exclude-tables <tables>`	Comma separated list of tables to exclude from import process
`--autoreset-to-one-mapper`	Import should use one mapper if a table with no primary key is encountered

Table 15. Output line formatting arguments:

Argument	Description
`--enclosed-by <char>`	Sets a required field enclosing character
`--escaped-by <char>`	Sets the escape character
`--fields-terminated-by <char>`	Sets the field separator character
`--lines-terminated-by <char>`	Sets the end-of-line character
`--mysql-delimiters`	Uses MySQL’s default delimiter set: fields: `,` lines: `\n` escaped-by: `\` optionally-enclosed-by: `'`
`--optionally-enclosed-by <char>`	Sets a field enclosing character

Table 16. Input parsing arguments:

Argument	Description
`--input-enclosed-by <char>`	Sets a required field encloser
`--input-escaped-by <char>`	Sets the input escape character
`--input-fields-terminated-by <char>`	Sets the input field separator
`--input-lines-terminated-by <char>`	Sets the input end-of-line character
`--input-optionally-enclosed-by <char>`	Sets a field enclosing character

Table 17. Hive arguments:

Argument	Description
`--hive-home <dir>`	Override `$HIVE_HOME`
`--hive-import`	Import tables into Hive (Uses Hive’s default delimiters if none are set.)
`--hive-overwrite`	Overwrite existing data in the Hive table.
`--create-hive-table`	If set, then the job will fail if the target hive
	table exits. By default this property is false.
`--hive-table <table-name>`	Sets the table name to use when importing to Hive.
`--hive-drop-import-delims`	Drops \n, \r, and \01 from string fields when importing to Hive.
`--hive-delims-replacement`	Replace \n, \r, and \01 from string fields with user defined string when importing to Hive.
`--hive-partition-key`	Name of a hive field to partition are sharded on
`--hive-partition-value <v>`	String-value that serves as partition key for this imported into hive in this job.
`--map-column-hive <map>`	Override default mapping from SQL type to Hive type for configured columns.

Table 18. Code generation arguments:

Argument	Description
`--bindir <dir>`	Output directory for compiled objects
`--jar-file <file>`	Disable code generation; use specified jar
`--outdir <dir>`	Output directory for generated code
`--package-name <name>`	Put auto-generated classes in this package

Table 19. Common arguments

Argument	Description
`--connect <hostname>`	Specify mainframe host to connect
`--connection-manager <class-name>`	Specify connection manager class to use
`--hadoop-mapred-home <dir>`	Override $HADOOP_MAPRED_HOME
`--help`	Print usage instructions
`--password-file`	Set path for a file containing the authentication password
`-P`	Read password from console
`--password <password>`	Set authentication password
`--username <username>`	Set authentication username
`--verbose`	Print more information while working
`--connection-param-file <filename>`	Optional properties file that provides connection parameters

Table 20. Import control arguments:

Argument	Description
`--as-avrodatafile`	Imports data to Avro Data Files
`--as-sequencefile`	Imports data to SequenceFiles
`--as-textfile`	Imports data as plain text (default)
`--as-parquetfile`	Imports data to Parquet Files
`--delete-target-dir`	Delete the import target directory if it exists
`-m,--num-mappers <n>`	Use n map tasks to import in parallel
`--target-dir <dir>`	HDFS destination dir
`--warehouse-dir <dir>`	HDFS parent for table destination
`-z,--compress`	Enable compression
`--compression-codec <c>`	Use Hadoop codec (default gzip)

Table 21. Output line formatting arguments:

Argument	Description
`--enclosed-by <char>`	Sets a required field enclosing character
`--escaped-by <char>`	Sets the escape character
`--fields-terminated-by <char>`	Sets the field separator character
`--lines-terminated-by <char>`	Sets the end-of-line character
`--mysql-delimiters`	Uses MySQL’s default delimiter set: fields: `,` lines: `\n` escaped-by: `\` optionally-enclosed-by: `'`
`--optionally-enclosed-by <char>`	Sets a field enclosing character

Table 22. Input parsing arguments:

Argument	Description
`--input-enclosed-by <char>`	Sets a required field encloser
`--input-escaped-by <char>`	Sets the input escape character
`--input-fields-terminated-by <char>`	Sets the input field separator
`--input-lines-terminated-by <char>`	Sets the input end-of-line character
`--input-optionally-enclosed-by <char>`	Sets a field enclosing character

Table 23. Hive arguments:

Argument	Description
`--hive-home <dir>`	Override `$HIVE_HOME`
`--hive-import`	Import tables into Hive (Uses Hive’s default delimiters if none are set.)
`--hive-overwrite`	Overwrite existing data in the Hive table.
`--create-hive-table`	If set, then the job will fail if the target hive
	table exits. By default this property is false.
`--hive-table <table-name>`	Sets the table name to use when importing to Hive.
`--hive-drop-import-delims`	Drops \n, \r, and \01 from string fields when importing to Hive.
`--hive-delims-replacement`	Replace \n, \r, and \01 from string fields with user defined string when importing to Hive.
`--hive-partition-key`	Name of a hive field to partition are sharded on
`--hive-partition-value <v>`	String-value that serves as partition key for this imported into hive in this job.
`--map-column-hive <map>`	Override default mapping from SQL type to Hive type for configured columns.

Table 24. HBase arguments:

Argument	Description
`--column-family <family>`	Sets the target column family for the import
`--hbase-create-table`	If specified, create missing HBase tables
`--hbase-row-key <col>`	Specifies which input column to use as the row key
	In case, if input table contains composite
	key, then <col> must be in the form of a
	comma-separated list of composite key
	attributes
`--hbase-table <table-name>`	Specifies an HBase table to use as the target instead of HDFS
`--hbase-bulkload`	Enables bulk loading

Table 25. Accumulo arguments:

Argument	Description
`--accumulo-table <table-nam>`	Specifies an Accumulo table to use as the target instead of HDFS
`--accumulo-column-family <family>`	Sets the target column family for the import
`--accumulo-create-table`	If specified, create missing Accumulo tables
`--accumulo-row-key <col>`	Specifies which input column to use as the row key
`--accumulo-visibility <vis>`	(Optional) Specifies a visibility token to apply to all rows inserted into Accumulo. Default is the empty string.
`--accumulo-batch-size <size>`	(Optional) Sets the size in bytes of Accumulo’s write buffer. Default is 4MB.
`--accumulo-max-latency <ms>`	(Optional) Sets the max latency in milliseconds for the Accumulo batch writer. Default is 0.
`--accumulo-zookeepers <host:port>`	Comma-separated list of Zookeeper servers used by the Accumulo instance
`--accumulo-instance <table-name>`	Name of the target Accumulo instance
`--accumulo-user <username>`	Name of the Accumulo user to import as
`--accumulo-password <password>`	Password for the Accumulo user

Table 26. Code generation arguments:

Argument	Description
`--bindir <dir>`	Output directory for compiled objects
`--class-name <name>`	Sets the generated class name. This overrides `--package-name`. When combined with `--jar-file`, sets the input class.
`--jar-file <file>`	Disable code generation; use specified jar
`--outdir <dir>`	Output directory for generated code
`--package-name <name>`	Put auto-generated classes in this package
`--map-column-java <m>`	Override default mapping from SQL type to Java type for configured columns.

A basic import of all sequential files in a partitioned dataset named EMPLOYEES in the mainframe host z390:

$ sqoop import-mainframe --connect z390 --dataset EMPLOYEES \
    --username SomeUser -P
Enter password: (hidden)

Controlling the import parallelism (using 8 parallel tasks):

$ sqoop import-mainframe --connect z390 --dataset EMPLOYEES \
    --username SomeUser --password-file mypassword -m 8

Importing the data to Hive:

$ sqoop import-mainframe --connect z390 --dataset EMPLOYEES \
    --hive-import

A basic import of a table named EMPLOYEES in the corp database:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES

A basic import requiring a login:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --username SomeUser -P
Enter password: (hidden)

Selecting specific columns from the EMPLOYEES table:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --columns "employee_id,first_name,last_name,job_title"

Controlling the import parallelism (using 8 parallel tasks):

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    -m 8

Storing data in SequenceFiles, and setting the generated class name to com.foocorp.Employee:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --class-name com.foocorp.Employee --as-sequencefile

Specifying the delimiters to use in a text-mode import:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --fields-terminated-by '\t' --lines-terminated-by '\n' \
    --optionally-enclosed-by '\"'

Importing the data to Hive:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --hive-import

Importing only new employees:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --where "start_date > '2010-01-01'"

Changing the splitting column from the default:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --split-by dept_id

Performing an incremental import of new data, after having already imported the first 100,000 rows of a table:

$ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
    --where "id > 100000" --target-dir /incremental_dataset --append

An import of a table named EMPLOYEES in the corp database that uses validation to validate the import using the table row count and number of rows copied into HDFS: More Details

$ sqoop import --connect jdbc:mysql://db.foo.com/corp \
    --table EMPLOYEES --validate

Import all tables from the corp database:

$ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp

猜你喜欢