Apache Sqoop Cookbook: translation and study notes
Sqoop main commands
19/05/31 05:49:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.4.2-2
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
Problem
The previous example worked well, so you have decided to integrate Sqoop into your Hadoop data workflows. To do this, you need to specify a custom directory in HDFS for the imported data.
Solution
Sqoop offers two parameters for specifying a custom output directory: --target-dir and --warehouse-dir. Use --target-dir to specify the exact HDFS directory for the imported data. The following command imports the cities table into the directory /etl/input/cities:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--target-dir /etl/input/cities
To specify a common parent directory for all Sqoop jobs instead, use the --warehouse-dir parameter:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--warehouse-dir /mydir/input
Discussion
By default, Sqoop creates a directory with the same name as the imported table inside your home directory on HDFS and imports all the data there. For example, when the user jarcec imports the table cities, the data is stored in /user/jarcec/cities. You can change this to any directory on HDFS with the --target-dir parameter; the only requirement is that the directory must not already exist, because Sqoop refuses to write into an existing directory in order to prevent accidentally overwriting data.
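The directory rule can be sketched in plain shell; the user name and paths are the ones from the examples above, and the script only computes and prints the paths rather than touching HDFS.

```shell
# Where imported data lands for the table 'cities', depending on the
# directory parameter (user name and paths taken from the examples above).
HDFS_USER=jarcec
TABLE=cities

# Default: a table-named directory under the user's HDFS home directory.
default_dir="/user/$HDFS_USER/$TABLE"
echo "default:      $default_dir"

# --target-dir: exactly the directory given. It must not exist beforehand;
# Sqoop aborts rather than overwrite an existing directory.
target_dir="/etl/input/$TABLE"
echo "--target-dir: $target_dir"
```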
If you want to run Sqoop jobs for several tables, you would have to change the --target-dir parameter with every invocation. As an alternative, --warehouse-dir lets you specify only the parent directory: Sqoop creates a subdirectory named after the imported table inside that parent and imports the data there. This resembles the default behavior of importing into your HDFS home directory, with the notable exception that --warehouse-dir can point to any path. Note that this parameter does not need to change between table imports, unless you import tables with the same name.
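The difference in how the two parameters scale across tables can be sketched as follows. The table names beyond cities are assumptions for illustration, and the printed commands are abbreviated (connection and credential options are omitted); the script only prints the invocations rather than running them.

```shell
# Contrast the two parameters across several tables by printing (not running)
# one abbreviated sqoop invocation per table.
TABLES="cities countries visits"

# With --target-dir, the final path must change for every table ...
for t in $TABLES; do
  echo "sqoop import --table $t --target-dir /etl/input/$t"
done

# ... while a single --warehouse-dir serves all tables, because Sqoop
# appends the table name itself (data lands in /mydir/input/<table>).
for t in $TABLES; do
  echo "sqoop import --table $t --warehouse-dir /mydir/input"
done
```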
Reproduced from: https://www.jianshu.com/p/7dcbb74c7776