1. Command description

No. | Command | Class | Description |
---|---|---|---|
1 | import | ImportTool | Import data (from a table or query) from a relational database into HDFS |
2 | export | ExportTool | Export data from HDFS into a relational database |
3 | codegen | CodeGenTool | Generate Java classes for a database table and package them into a jar |
4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
5 | eval | EvalSqlTool | Run a SQL statement and view the results |
6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
7 | job | JobTool | Define and manage saved jobs |
8 | list-databases | ListDatabasesTool | List all database names |
9 | list-tables | ListTablesTool | List all tables in a database |
10 | merge | MergeTool | Merge the results of an incremental import with an existing dataset |
11 | metastore | MetastoreTool | Run the shared metastore that stores saved-job definitions |
12 | help | HelpTool | View help |
13 | version | VersionTool | View the version |
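As a quick illustration of the most common tools above, a minimal sketch (the server address, credentials, and query are placeholders):

```bash
# Print the full list of commands summarized in the table above
sqoop help

# List the databases on a MySQL server (-P prompts for the password)
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306 \
  --username root -P

# Check a SQL statement's result without running a full import
sqoop eval \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username root -P \
  --query "SELECT COUNT(*) FROM orders"
```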
2. Import Common arguments: general parameters, mainly for the relational database connection

No. | Parameter | Description | Example |
---|---|---|---|
1 | connect | JDBC URL of the relational database | jdbc:mysql://localhost/sqoop_datas |
2 | connection-manager | Connection manager class; generally not needed | |
3 | driver | JDBC driver class | |
4 | hadoop-home | Hadoop installation directory | /home/hadoop |
5 | help | View help information | |
6 | password | Password for the relational database | |
7 | username | Username for the relational database | |
8 | verbose | Print more information (effectively lowers the log level) | Takes no value |
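Putting these general parameters together, a minimal import sketch (the URL, credentials, and paths are placeholders; --table and --target-dir are import arguments covered below):

```bash
# Basic import using only the general connection parameters
# plus a source table and an HDFS target (all names are placeholders)
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --verbose \
  --table orders \
  --target-dir /user/hadoop/orders
```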
The complete list of common arguments for import:
Parameter | Description |
---|---|
--connect < jdbc-uri > | JDBC connection string |
--connection-manager < class-name > | Connection manager class |
--driver < class-name > | Manually specify the JDBC driver class |
--hadoop-mapred-home < dir > | Override $HADOOP_MAPRED_HOME |
--help | Print usage help |
--password-file | File containing the authentication password |
-P | Pause during the import and prompt for the password |
--password < password > | Write the password directly on the command line |
--username < username > | Specify the username |
--verbose | Print more information while the Sqoop job runs |
--connection-param-file < filename > | Optional properties file supplying extra connection parameters |
--relaxed-isolation | Relax the transaction isolation of each mapper's connection |
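Of the password options, --password-file is the safest choice for scripted jobs, since --password leaks into the shell history and process list. A minimal sketch, assuming placeholder paths and credentials; note that Sqoop reads the entire file as the password, so it must not contain a trailing newline:

```bash
# Create the password file without a trailing newline and lock it down
# (file path and password are placeholders)
echo -n 'sqoop_pass' > mysql.pwd
hdfs dfs -put mysql.pwd /user/hadoop/mysql.pwd
hdfs dfs -chmod 400 /user/hadoop/mysql.pwd

# Reference the file instead of writing the password on the command line
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user \
  --password-file /user/hadoop/mysql.pwd \
  --table orders \
  --target-dir /user/hadoop/orders
```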
3. Import control arguments

Parameter | Description |
---|---|
--append | Append imported data to an existing dataset in HDFS |
--as-avrodatafile | Import data as Avro data files |
--as-sequencefile | Import data as SequenceFiles |
--as-textfile | Import data as plain text files |
--boundary-query < statement > | Boundary query used in place of min(split-by) and max(split-by) to compute split ranges |
--columns < col,col,col… > | Columns to import |
--delete-target-dir | Delete the import target directory if it already exists |
--direct | Use direct import mode |
--fetch-size < n > | Number of records to read from the database in one batch |
--inline-lob-limit < n > | Maximum size of an inline large object (LOB) |
-m, --num-mappers < n > | Number of map tasks to import in parallel; default is 4 |
-e, --query < statement > | Import the data returned by the given query |
--split-by < column-name > | Column used to split work units; usually used together with -m |
--table < table-name > | Name of the table in the database |
--target-dir < dir > | Target HDFS directory |
--warehouse-dir < dir > | Parent HDFS directory under which the table directory is created |
--where < where clause > | WHERE condition of the SQL statement |
-z, --compress | Enable compression |
--compression-codec < c > | Hadoop compression codec to use; gzip by default |
--null-string < null-string > | String written for NULL values in string columns; "null" by default |
--null-non-string < null-string > | String written for NULL values in non-string columns; "null" by default |
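A detail worth calling out for -e/--query: when a free-form query is imported with more than one mapper, Sqoop requires the literal token $CONDITIONS in the WHERE clause (it substitutes each mapper's split predicate there) together with a --split-by column. A sketch with placeholder table and column names:

```bash
# Free-form query import; keep the query in single quotes so the shell
# does not expand $CONDITIONS before Sqoop sees it
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user -P \
  --query 'SELECT id, name, updated_at FROM orders WHERE status = 1 AND $CONDITIONS' \
  --split-by id \
  --num-mappers 4 \
  --target-dir /user/hadoop/orders_active \
  --fields-terminated-by '\t' \
  --delete-target-dir
```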
4. Incremental import

Sqoop supports two incremental import modes into Hive. The first is append, driven by a monotonically increasing column, for example:

```bash
--incremental append --check-column id --last-value 0
```

The second is driven by a timestamp, for example:

```bash
--incremental lastmodified --check-column time --last-value '2013-01-01 11:0:00'
```

which imports only the rows whose time is later than '2013-01-01 11:0:00'.
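Since --last-value must advance on every run, incremental imports are usually wrapped in a saved job: Sqoop then records the new last-value in its metastore automatically after each execution. A minimal sketch, assuming placeholder connection details, table name, and password-file path:

```bash
# Create a saved job (note the bare "--" separating job options from import options);
# the connection URL, credentials, and table below are placeholders
sqoop job --create orders_incr -- import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user \
  --password-file /user/hadoop/mysql.pwd \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 0

# Each execution imports only new rows and stores the updated last-value in the metastore
sqoop job --exec orders_incr
```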
5. Concrete code example

```bash
# Incremental append import of the Customer table from SQL Server into HDFS
/opt/softwares/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/sqoop import \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --connect "jdbc:sqlserver://10.10.0.3\\sql2008;database=LuxeDc" \
  --username bgdbo --password bgdbo123 \
  --table Customer --target-dir /user/Customer \
  --columns "CustomerID,CusCode,TrueName,LogDate" \
  --fields-terminated-by "\t" \
  --check-column "LogDate" \
  --incremental append \
  --last-value "2018-4-24 00:00:00"
```