2.2 Specifying a destination path with Sqoop

Study notes translated from the English edition of the Apache Sqoop Cookbook.

Main Sqoop commands

19/05/31 05:49:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.4.2-2
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

problem

The previous example worked well, so you decide to integrate Sqoop into your Hadoop data workflows.
To do that, you need to specify the directory in which the imported data is stored.

solution

Sqoop provides two parameters for specifying a custom output path: --target-dir and --warehouse-dir.

Use the --target-dir parameter to specify the HDFS directory into which the data is imported. For example, the following command imports the cities table into the directory /etl/input/cities.

sqoop import \
    --connect jdbc:mysql://localhost:3306/sqoop \
    --username sqoop \
    --password sqoop \
    --table cities \
    --target-dir /etl/input/cities

To specify a parent directory for all of your Sqoop jobs instead, use the --warehouse-dir parameter:

sqoop import \
    --connect jdbc:mysql://localhost:3306/sqoop \
    --username sqoop \
    --password sqoop \
    --table cities \
    --warehouse-dir /mydir/input
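
Because the parent directory stays the same for every table, the command above is easy to generate in a loop. The following is only a sketch: print_imports is a hypothetical helper, and any table name other than cities is a placeholder. It prints the commands instead of running them, so you can review them first.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Sqoop): print one import command per
# table, all sharing the same parent directory via --warehouse-dir.
# The connection settings are copied from the example above.
print_imports() {
    local warehouse_dir="$1"
    shift
    local table
    for table in "$@"; do
        echo "sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username sqoop --password sqoop --table $table --warehouse-dir $warehouse_dir"
    done
}
```

For example, print_imports /mydir/input cities countries prints two commands, one per table, that differ only in the value of --table.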

discussion

By default, Sqoop creates a directory on HDFS with the same name as the imported table and imports all of the data into it. For example, when the user jarcec imports the table cities, the data is stored in the directory /user/jarcec/cities. You can replace this location with any directory on HDFS by using the --target-dir parameter, provided that the directory does not already exist. Sqoop refuses to write into an existing directory, which prevents data from being overwritten.

[Figure: c2_2_table_exist.png]
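
Since Sqoop refuses to write into an existing directory, a workflow script may want to check the path before starting the import. The sketch below assumes the hdfs command-line client is available on the machine that runs Sqoop; target_dir_is_free is a hypothetical helper name, not something Sqoop provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Sqoop): succeeds only when the given
# HDFS path does not exist yet, so an import into it can proceed.
target_dir_is_free() {
    local dir="$1"
    # `hdfs dfs -test -e PATH` exits with status 0 when PATH exists.
    if hdfs dfs -test -e "$dir"; then
        echo "Directory $dir already exists; choose another --target-dir." >&2
        return 1
    fi
    return 0
}

# Example use (commented out so that defining the helper has no side effects):
# target_dir_is_free /etl/input/cities && sqoop import \
#     --connect jdbc:mysql://localhost:3306/sqoop \
#     --username sqoop \
#     --password sqoop \
#     --table cities \
#     --target-dir /etl/input/cities
```

Note that the check also reports the directory as free when the hdfs command itself fails, for example because the cluster cannot be reached, so it is a convenience rather than a guarantee.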

If you want to run Sqoop jobs for several tables, you have to change the --target-dir parameter for every job, because it specifies the final storage path directly. As an alternative, --warehouse-dir lets you specify only a parent directory. Rather than writing the data straight into the warehouse directory, Sqoop creates a folder inside it with the same name as the table and imports the data into that folder. This is similar to the default case, in which Sqoop imports the data into your home directory on HDFS, except that --warehouse-dir allows you to use a path other than the home directory. Note that this parameter does not need to change for each table, unless you import tables that have the same name.
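
The three cases described above (default, --target-dir, --warehouse-dir) can be summarized in a small function. This is a sketch of the rules as stated in this section, not Sqoop's own code; resolve_output_dir and its argument order are made up for illustration, and treating the combination of both parameters as an error is an assumption of the sketch.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Sqoop): print the HDFS directory in which
# an imported table ends up, following the rules described in this section.
# Arguments: TABLE TARGET_DIR WAREHOUSE_DIR [USER]; pass "" for an unused parameter.
resolve_output_dir() {
    local table="$1" target_dir="$2" warehouse_dir="$3" user="${4:-$USER}"
    if [ -n "$target_dir" ] && [ -n "$warehouse_dir" ]; then
        # Assumption of this sketch: giving both parameters is a mistake.
        echo "use either --target-dir or --warehouse-dir, not both" >&2
        return 1
    elif [ -n "$target_dir" ]; then
        # --target-dir names the final directory directly.
        printf '%s\n' "$target_dir"
    elif [ -n "$warehouse_dir" ]; then
        # --warehouse-dir names the parent; the table name is appended.
        printf '%s\n' "${warehouse_dir%/}/$table"
    else
        # Default: a directory named after the table in the user's home directory.
        printf '%s\n' "/user/$user/$table"
    fi
}
```

For example, resolve_output_dir cities "" /mydir/input prints /mydir/input/cities, while resolve_output_dir cities "" "" jarcec prints /user/jarcec/cities.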

For more information https://blue-shadow.top/

Reproduced from: https://www.jianshu.com/p/7dcbb74c7776

Origin blog.csdn.net/weixin_33696106/article/details/91186107