Importing Data from a SQL Server View into Hive with Sqoop

Environment: HDP-2.5.3 · Hive 1.2.1 · Sqoop 1.4.6 · SQL Server 2012

1. Download sqljdbc4.jar and place it under $SQOOP_HOME/lib
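
For example (a minimal sketch; the exact paths depend on your installation, and $SQOOP_HOME is assumed to point at the Sqoop client directory):

cp sqljdbc4.jar $SQOOP_HOME/lib/
ls $SQOOP_HOME/lib/ | grep sqljdbc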

2. Test the SQL Server connection

2.1 List available databases on a server

[root@hqc-test-hdp1 ~]# sqoop list-databases --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/29 16:13:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/29 16:13:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/29 16:13:24 INFO manager.SqlManager: Using default fetchSize of 1000
master
AI

2.2 List available tables in a database

Only tables under the dbo schema are listed, and views are not shown (a workaround for listing views is sketched after the query comment below).

[root@hqc-test-hdp1 ~]# sqoop list-tables --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/30 08:52:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 08:52:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 08:52:21 INFO manager.SqlManager: Using default fetchSize of 1000
tt
table1
Dictionary

# SELECT TOP 1000  * FROM [dbo].[Dictionary]
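
Since list-tables does not show views, one way to see which views exist is to query the catalog yourself with sqoop eval (a sketch, assuming the login can read INFORMATION_SCHEMA):

sqoop eval --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx --query "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS"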

2.3 Execute a query

[root@hqc-test-hdp1 ~]# sqoop eval --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx --query "SELECT TOP 5 * from [xx.xx]"
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/29 16:22:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/29 16:22:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/29 16:22:22 INFO manager.SqlManager: Using default fetchSize of 1000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Name                 | Code                 | Expr1    | Sname                | Cname                | Aname                | Longitude            | Latitude             | Position             | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(rows omitted)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3. Full import into HDFS

Because a view is not a table, --table cannot be used; only --query works. (A quick check of the resulting HDFS files is sketched after the job output below.)

[root@hqc-test-hdp1 ~]# su hdfs
[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: 权限不够
19/10/30 09:27:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 09:27:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 09:27:22 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 09:27:22 INFO tool.CodeGenTool: Beginning code generation
19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
19/10/30 09:27:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
错误: 读取/usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar时出错; cannot read zip file
错误: 读取/usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar时出错; cannot read zip file
注: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.java使用或覆盖了已过时的 API。
注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
19/10/30 09:27:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.jar
19/10/30 09:27:26 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 09:27:27 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 09:27:27 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 09:27:28 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 09:27:30 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 09:27:30 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 09:27:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0008
19/10/30 09:27:30 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0008
19/10/30 09:27:30 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0008/
19/10/30 09:27:30 INFO mapreduce.Job: Running job: job_1564035532438_0008
19/10/30 09:27:46 INFO mapreduce.Job: Job job_1564035532438_0008 running in uber mode : false
19/10/30 09:27:46 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 09:27:55 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 09:27:55 INFO mapreduce.Job: Job job_1564035532438_0008 completed successfully
19/10/30 09:27:55 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158794
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=717075
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6086
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=6086
		Total vcore-milliseconds taken by all map tasks=6086
		Total megabyte-milliseconds taken by all map tasks=31160320
	Map-Reduce Framework
		Map input records=5676
		Map output records=5676
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=133
		CPU time spent (ms)=6280
		Physical memory (bytes) snapshot=369975296
		Virtual memory (bytes) snapshot=6399401984
		Total committed heap usage (bytes)=329777152
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=717075
19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 28.168 seconds (24.8605 KB/sec)
19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
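
A quick way to confirm the files landed in HDFS (a sketch; part-m-00000 is the usual output file name for a single-mapper import and is an assumption here):

hdfs dfs -ls /apps/hive/warehouse/hqc.db/xx
hdfs dfs -cat /apps/hive/warehouse/hqc.db/xx/part-m-00000 | head -n 5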

4. Full import into Hive

You do not need to create the Hive table manually in advance! Just specify the table name with --hive-table and Sqoop creates the table for you. This behavior can differ between Sqoop versions, so keep the environment versions listed at the top of this article in mind.
Because a view is not a table, --table cannot be used; only --query works. With import --query, --split-by may be omitted (a single mapper is used here), but --target-dir is required; any temporary path will do. From what I have observed, Sqoop first extracts the data to that HDFS path, then loads it into Hive (i.e. into the table's own HDFS directory), and finally deletes the temporary directory and its contents. A quick row-count check is sketched after the Hive load output below.

[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: 权限不够
19/10/30 10:14:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 10:14:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 10:14:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/10/30 10:14:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/10/30 10:14:25 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 10:14:25 INFO tool.CodeGenTool: Beginning code generation
19/10/30 10:14:25 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:14:26 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:14:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
错误: 读取/usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar时出错; cannot read zip file
错误: 读取/usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar时出错; cannot read zip file
注: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.java使用或覆盖了已过时的 API。
注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
19/10/30 10:14:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.jar
19/10/30 10:14:29 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 10:14:30 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 10:14:30 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 10:14:31 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 10:14:33 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 10:14:33 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 10:14:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0010
19/10/30 10:14:34 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0010
19/10/30 10:14:34 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0010/
19/10/30 10:14:34 INFO mapreduce.Job: Running job: job_1564035532438_0010
19/10/30 10:14:44 INFO mapreduce.Job: Job job_1564035532438_0010 running in uber mode : false
19/10/30 10:14:44 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 10:14:59 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 10:14:59 INFO mapreduce.Job: Job job_1564035532438_0010 completed successfully
19/10/30 10:14:59 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158786
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=717075
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12530
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=12530
		Total vcore-milliseconds taken by all map tasks=12530
		Total megabyte-milliseconds taken by all map tasks=64153600
	Map-Reduce Framework
		Map input records=5676
		Map output records=5676
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=99
		CPU time spent (ms)=5950
		Physical memory (bytes) snapshot=362852352
		Virtual memory (bytes) snapshot=6400389120
		Total committed heap usage (bytes)=336068608
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=717075
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 29.3386 seconds (23.8685 KB/sec)
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
19/10/30 10:14:59 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:15:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:15:00 WARN hive.TableDefWriter: Column Longitude had to be cast to a less precise type in Hive
19/10/30 10:15:00 WARN hive.TableDefWriter: Column Latitude had to be cast to a less precise type in Hive
19/10/30 10:15:00 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
OK
Time taken: 2.898 seconds
Loading data to table hqc.xx
Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=717075, rawDataSize=0]
OK
Time taken: 0.627 seconds
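
A quick way to verify the load (a sketch, assuming the hive CLI is on the path; numRows=0 in the stats above does not reflect the actual row count, so counting in Hive is a more reliable check):

hive -e "SELECT COUNT(*) FROM hqc.xx;"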

5. Overwriting the existing data

Add --hive-overwrite:

[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-overwrite --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: 权限不够
19/10/30 16:12:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 16:12:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 16:12:59 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/10/30 16:12:59 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/10/30 16:13:00 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 16:13:00 INFO tool.CodeGenTool: Beginning code generation
19/10/30 16:13:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:01 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:01 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
错误: 读取/usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar时出错; cannot read zip file
错误: 读取/usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar时出错; cannot read zip file
注: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.java使用或覆盖了已过时的 API。
注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
19/10/30 16:13:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.jar
19/10/30 16:13:03 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 16:13:05 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 16:13:05 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 16:13:06 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 16:13:08 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 16:13:08 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 16:13:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0023
19/10/30 16:13:09 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0023
19/10/30 16:13:09 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0023/
19/10/30 16:13:09 INFO mapreduce.Job: Running job: job_1564035532438_0023
19/10/30 16:13:19 INFO mapreduce.Job: Job job_1564035532438_0023 running in uber mode : false
19/10/30 16:13:19 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 16:13:29 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 16:13:29 INFO mapreduce.Job: Job job_1564035532438_0023 completed successfully
19/10/30 16:13:29 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158455
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=5217236
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7426
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=7426
		Total vcore-milliseconds taken by all map tasks=7426
		Total megabyte-milliseconds taken by all map tasks=38021120
	Map-Reduce Framework
		Map input records=16993
		Map output records=16993
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=108
		CPU time spent (ms)=9300
		Physical memory (bytes) snapshot=398262272
		Virtual memory (bytes) snapshot=6417539072
		Total committed heap usage (bytes)=415760384
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=5217236
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Transferred 4.9755 MB in 24.7647 seconds (205.7342 KB/sec)
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Retrieved 16993 records.
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:30 WARN hive.TableDefWriter: Column AlarmTimeStamp had to be cast to a less precise type in Hive
19/10/30 16:13:30 WARN hive.TableDefWriter: Column HandlerTime had to be cast to a less precise type in Hive
19/10/30 16:13:30 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
OK
Time taken: 2.809 seconds
Loading data to table hqc.xx
Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=5217236, rawDataSize=0]
OK
Time taken: 1.321 seconds

Notes

1. Difference between create-hive-table and hive-import
Question:
Can anyone tell the difference between create-hive-table & hive-import method? Both will create a hive table, but still what is the significance of each?
Answer:
The difference is that create-hive-table will create table in Hive based on the source table in database but will NOT transfer any data. Command "import --hive-import" will both create table in Hive and import data from the source table.
Create the table only:
sqoop create-hive-table --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --table test --hive-table myhive2 --hive-partition-key partition_time --map-column-hive ID=String,name=String,addr=String
Full import (with SQL Server the $CONDITIONS condition is attached with WHERE; with Oracle use AND):
sqoop import --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --query "select * from test i where \$CONDITIONS" --target-dir /user/hive/warehouse/myhive2/partition_time=20171023 --hive-import -m 5 --hive-table myhive2 --split-by ID --hive-partition-key partition_time --hive-partition-value 20171023

2. sqoop help
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/30 08:42:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
  
3. sqoop export loads data from HDFS/Hive into a relational database and supports three modes: full export, update with insert, and update only (--update-mode updateonly/allowinsert).

Full export
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <sourceHiveTable>;
Sqoop command: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <targetTable> --fields-terminated-by ',' --columns F1,F2,F3 --export-dir /user/root/export/test

--update-mode allowinsert (update existing rows and insert new ones)
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <sourceHiveTable> where <condition>;
Sqoop command: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <targetTable> --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode allowinsert --export-dir /user/root/export/test

--update-mode updateonly (update existing rows only)
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <sourceHiveTable> where <condition>;
Sqoop command: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <targetTable> --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode updateonly --export-dir /user/root/export/test

usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect
                                                string
   --connection-manager <class-name>            Specify connection manager
                                                class name
   --connection-param-file <properties-file>    Specify connection
                                                parameters file
   --driver <class-name>                        Manually specify JDBC
                                                driver class to use
   --hadoop-home <hdir>                         Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password <password>                        Set authentication
                                                password
   --password-alias <password-alias>            Credential provider
                                                password alias
   --password-file <password-file>              Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --temporary-rootdir <rootdir>                Defines the temporary root
                                                directory for the import
   --username <username>                        Set authentication
                                                username
   --verbose                                    Print more information
                                                while working

Export control arguments:
   --batch                                                    Indicates
                                                              underlying
                                                              statements
                                                              to be
                                                              executed in
                                                              batch mode
   --call <arg>                                               Populate the
                                                              table using
                                                              this stored
                                                              procedure
                                                              (one call
                                                              per row)
   --clear-staging-table                                      Indicates
                                                              that any
                                                              data in
                                                              staging
                                                              table can be
                                                              deleted
   --columns <col,col,col...>                                 Columns to
                                                              export to
                                                              table
   --direct                                                   Use direct
                                                              export fast
                                                              path
   --export-dir <dir>                                         HDFS source
                                                              path for the
                                                              export
-m,--num-mappers <n>                                          Use 'n' map
                                                              tasks to
                                                              export in
                                                              parallel
   --mapreduce-job-name <name>                                Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --staging-table <table-name>                               Intermediate
                                                              staging
                                                              table
   --table <table-name>                                       Table to
                                                              populate
   --update-key <key>                                         Update
                                                              records by
                                                              specified
                                                              key column
   --update-mode <mode>                                       Specifies
                                                              how updates
                                                              are
                                                              performed
                                                              when new
                                                              rows are
                                                              found with
                                                              non-matching
                                                              keys in
                                                              database
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler <validation-failurehandler>    Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold <validation-threshold>              Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator <validator>                                    Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape
                                            character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by <char>    Sets a field enclosing
                                            character

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing
                                      character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Code generation arguments:
   --bindir <dir>                        Output directory for compiled
                                         objects
   --class-name <name>                   Sets the generated class name.
                                         This overrides --package-name.
                                         When combined with --jar-file,
                                         sets the input class.
   --input-null-non-string <null-str>    Input null non-string
                                         representation
   --input-null-string <null-str>        Input null string representation
   --jar-file <file>                     Disable code generation; use
                                         specified jar
   --map-column-java <arg>               Override mapping for specific
                                         columns to java types
   --null-non-string <null-str>          Null non-string representation
   --null-string <null-str>              Null string representation
   --outdir <dir>                        Output directory for generated
                                         code
   --package-name <name>                 Put auto-generated classes in
                                         this package

HCatalog arguments:
   --hcatalog-database <arg>                        HCatalog database name
   --hcatalog-home <hdir>                           Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>        Sets the partition
                                                    keys to use when
                                                    importing to hive
   --hcatalog-partition-values <partition-value>    Sets the partition
                                                    values to use when
                                                    importing to hive
   --hcatalog-table <arg>                           HCatalog table name
   --hive-home <dir>                                Override $HIVE_HOME
   --hive-partition-key <partition-key>             Sets the partition key
                                                    to use when importing
                                                    to hive
   --hive-partition-value <partition-value>         Sets the partition
                                                    value to use when
                                                    importing to hive
   --map-column-hive <arg>                          Override mapping for
                                                    specific column to
                                                    hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect, --export-dir, and --table

4. sqoop import help
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect
                                                string
   --connection-manager <class-name>            Specify connection manager
                                                class name
   --connection-param-file <properties-file>    Specify connection
                                                parameters file
   --driver <class-name>                        Manually specify JDBC
                                                driver class to use
   --hadoop-home <hdir>                         Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password <password>                        Set authentication
                                                password
   --password-alias <password-alias>            Credential provider
                                                password alias
   --password-file <password-file>              Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --temporary-rootdir <rootdir>                Defines the temporary root
                                                directory for the import
   --username <username>                        Set authentication
                                                username
   --verbose                                    Print more information
                                                while working

Import control arguments:
   --append                                                   Imports data
                                                              in append
                                                              mode
   --as-avrodatafile                                          Imports data
                                                              to Avro data
                                                              files
   --as-parquetfile                                           Imports data
                                                              to Parquet
                                                              files
   --as-sequencefile                                          Imports data
                                                              to
                                                              SequenceFile
                                                              s
   --as-textfile                                              Imports data
                                                              as plain
                                                              text
                                                              (default)
   --autoreset-to-one-mapper                                  Reset the
                                                              number of
                                                              mappers to
                                                              one mapper
                                                              if no split
                                                              key
                                                              available
   --boundary-query <statement>                               Set boundary
                                                              query for
                                                              retrieving
                                                              max and min
                                                              value of the
                                                              primary key
   --columns <col,col,col...>                                 Columns to
                                                              import from
                                                              table
   --compression-codec <codec>                                Compression
                                                              codec to use
                                                              for import
   --delete-target-dir                                        Imports data
                                                              in delete
                                                              mode
   --direct                                                   Use direct
                                                              import fast
                                                              path
   --direct-split-size <n>                                    Split the
                                                              input stream
                                                              every 'n'
                                                              bytes when
                                                              importing in
                                                              direct mode
-e,--query <statement>                                        Import
                                                              results of
                                                              SQL
                                                              'statement'
   --fetch-size <n>                                           Set number
                                                              'n' of rows
                                                              to fetch
                                                              from the
                                                              database
                                                              when more
                                                              rows are
                                                              needed
   --inline-lob-limit <n>                                     Set the
                                                              maximum size
                                                              for an
                                                              inline LOB
-m,--num-mappers <n>                                          Use 'n' map
                                                              tasks to
                                                              import in
                                                              parallel
   --mapreduce-job-name <name>                                Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --merge-key <column>                                       Key column
                                                              to use to
                                                              join results
   --split-by <column-name>                                   Column of
                                                              the table
                                                              used to
                                                              split work
                                                              units
   --split-limit <size>                                       Upper Limit
                                                              of rows per
                                                              split for
                                                              split
                                                              columns of
                                                              Date/Time/Ti
                                                              mestamp and
                                                              integer
                                                              types. For
                                                              date or
                                                              timestamp
                                                              fields it is
                                                              calculated
                                                              in seconds.
                                                              split-limit
                                                              should be
                                                              greater than
                                                              0
   --table <table-name>                                       Table to
                                                              read
   --target-dir <dir>                                         HDFS plain
                                                              table
                                                              destination
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler <validation-failurehandler>    Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold <validation-threshold>              Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator <validator>                                    Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator
   --warehouse-dir <dir>                                      HDFS parent
                                                              for table
                                                              destination
   --where <where clause>                                     WHERE clause
                                                              to use
                                                              during
                                                              import
-z,--compress                                                 Enable
                                                              compression

Incremental import arguments:
   --check-column <column>        Source column to check for incremental
                                  change
   --incremental <import-type>    Define an incremental import of type
                                  'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental
                                  check column

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing
                                      character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape
                                            character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by <char>    Sets a field enclosing
                                            character

Hive arguments:
   --create-hive-table                         Fail if the target hive
                                               table exists
   --hive-compute-stats                        Overwrite existing data in
                                               the Hive table
   --hive-database <database-name>             Sets the database name to
                                               use when importing to hive
   --hive-delims-replacement <arg>             Replace Hive record \0x01
                                               and row delimiters (\n\r)
                                               from imported string fields
                                               with user-defined string
   --hive-drop-import-delims                   Drop Hive record \0x01 and
                                               row delimiters (\n\r) from
                                               imported string fields
   --hive-home <dir>                           Override $HIVE_HOME
   --hive-import                               Import tables into Hive
                                               (Uses Hive's default
                                               delimiters if none are
                                               set.)
   --hive-overwrite                            Overwrite existing data in
                                               the Hive table
   --hive-partition-key <partition-key>        Sets the partition key to
                                               use when importing to hive
   --hive-partition-value <partition-value>    Sets the partition value to
                                               use when importing to hive
   --hive-table <table-name>                   Sets the table name to use
                                               when importing to hive
   --map-column-hive <arg>                     Override mapping for
                                               specific column to hive
                                               types.

HBase arguments:
   --column-family <family>    Sets the target column family for the
                               import
   --hbase-bulkload            Enables HBase bulk loading
   --hbase-create-table        If specified, create missing HBase tables
   --hbase-row-key <col>       Specifies which input column to use as the
                               row key
   --hbase-table <table>       Import to <table> in HBase

HCatalog arguments:
   --hcatalog-database <arg>                        HCatalog database name
   --hcatalog-home <hdir>                           Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>        Sets the partition
                                                    keys to use when
                                                    importing to hive
   --hcatalog-partition-values <partition-value>    Sets the partition
                                                    values to use when
                                                    importing to hive
   --hcatalog-table <arg>                           HCatalog table name
   --hive-home <dir>                                Override $HIVE_HOME
   --hive-partition-key <partition-key>             Sets the partition key
                                                    to use when importing
                                                    to hive
   --hive-partition-value <partition-value>         Sets the partition
                                                    value to use when
                                                    importing to hive
   --map-column-hive <arg>                          Override mapping for
                                                    specific column to
                                                    hive types.

HCatalog import specific options:
   --create-hcatalog-table             Create HCatalog before import
   --drop-and-create-hcatalog-table    Drop and Create HCatalog before
                                       import
   --hcatalog-storage-stanza <arg>     HCatalog storage stanza for table
                                       creation

Accumulo arguments:
   --accumulo-batch-size <size>          Batch size in bytes
   --accumulo-column-family <family>     Sets the target column family for
                                         the import
   --accumulo-create-table               If specified, create missing
                                         Accumulo tables
   --accumulo-instance <instance>        Accumulo instance name.
   --accumulo-max-latency <latency>      Max write latency in milliseconds
   --accumulo-password <password>        Accumulo password.
   --accumulo-row-key <col>              Specifies which input column to
                                         use as the row key
   --accumulo-table <table>              Import to <table> in Accumulo
   --accumulo-user <user>                Accumulo user name.
   --accumulo-visibility <vis>           Visibility token to be applied to
                                         all rows imported
   --accumulo-zookeepers <zookeepers>    Comma-separated list of
                                         zookeepers (host:port)

Code generation arguments:
   --bindir <dir>                        Output directory for compiled
                                         objects
   --class-name <name>                   Sets the generated class name.
                                         This overrides --package-name.
                                         When combined with --jar-file,
                                         sets the input class.
   --input-null-non-string <null-str>    Input null non-string
                                         representation
   --input-null-string <null-str>        Input null string representation
   --jar-file <file>                     Disable code generation; use
                                         specified jar
   --map-column-java <arg>               Override mapping for specific
                                         columns to java types
   --null-non-string <null-str>          Null non-string representation
   --null-string <null-str>              Null string representation
   --outdir <dir>                        Output directory for generated
                                         code
   --package-name <name>                 Put auto-generated classes in
                                         this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied
after a '--' on the command line.

Reposted from blog.csdn.net/Dr_Guo/article/details/102823338