Sqoop2 (1.99.7) Installation, Configuration, and Testing

Download

sqoop1: 1.4.7
sqoop2: 1.99.7
For the differences between the two versions, see "Introduction to Sqoop and the Differences Between Sqoop1 and Sqoop2".
We need to download sqoop-1.99.7-bin-hadoop200.tar.gz.


Make sure Hadoop is already installed; if not, see "Installing a Hadoop Cluster on CentOS".


Installation and Configuration

1. Extract

tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /root/
mv /root/sqoop-1.99.7-bin-hadoop200 /root/sqoop

2. Configure environment variables

2.1 Edit /etc/profile

export SQOOP_HOME=/root/sqoop
export LOGDIR=$SQOOP_HOME/logs
export PATH=$PATH:$JAVA_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HBASE_HOME/sbin:$HIVE_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=/root/sqoop/server/lib

Run source /etc/profile to make the changes take effect.

2.2 Edit /root/sqoop/conf/sqoop.properties

# JDBC repository provider configuration
#org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler

org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
#org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
# I am using a MySQL database
org.apache.sqoop.repository.jdbc.url=jdbc:mysql://master:3306/sqoop

#org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver

org.apache.sqoop.repository.jdbc.user=root
# remember to change this to your own database password
org.apache.sqoop.repository.jdbc.password=yourpassword

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/root/hadoop-2.10.0/etc/hadoop

org.apache.sqoop.jetty.port=12000
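Note that, unlike the commented-out Derby URL, the MySQL URL above has no create=true option, so the repository database named in it must already exist before Sqoop is initialized. A minimal sketch, assuming the MySQL root account and the database name sqoop from the URL:

```
mysql> CREATE DATABASE IF NOT EXISTS sqoop;
```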

2.3 Modify Hadoop's yarn-site.xml

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

2.4 Modify core-site.xml and restart Hadoop

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
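The restart can be done with the standard Hadoop scripts; a sketch, assuming the sbin scripts are on the PATH as configured in 2.1:

```
# restart HDFS and YARN so the proxyuser settings take effect
[root@master ~]# stop-yarn.sh && stop-dfs.sh
[root@master ~]# start-dfs.sh && start-yarn.sh
```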

2.5 Install the JDBC driver by placing the driver jar in the server/lib/ directory.

I am using MySQL, so first confirm the MySQL version:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.6.46    |
+-----------+
1 row in set (0.01 sec)

Then download the matching MySQL driver from https://repo1.maven.org/maven2/mysql/mysql-connector-java/ and place it in the server/lib/ directory. Since I already have Hive installed, I can simply copy mysql-connector-java-5.1.46.jar from the Hive lib directory:

cp /root/hive-3.1.2/lib/mysql-connector-java-5.1.46.jar /root/sqoop/server/lib

3. Initialization and Verification

3.1 Initialize Sqoop

[root@master bin]# ./sqoop2-tool upgrade
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2020-01-28 21:56:19,969 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

3.2 Verify Sqoop

[root@master bin]# ./sqoop2-tool verify
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2020-01-28 21:56:35,082 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 21:56:35,089 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

3.2.1 Troubleshooting

ERROR[main] org.apache.sqoop.repository.common.CommonRepositoryHandler - Can't execute query: CREATE DATABASE IF NOT EXISTS"SQOOP"

If you hit the error above, run the following in MySQL:

# note: this must be re-applied every time MySQL restarts
set global sql_mode ='ANSI_QUOTES';

Alternatively, add the following to the MySQL configuration file my-default.cnf:

# permanent configuration, persists across restarts
sql_mode=ANSI_QUOTES
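The reason this works: with ANSI_QUOTES enabled, MySQL treats double quotes as identifier quoting, so Sqoop's double-quoted CREATE DATABASE "SQOOP" statement parses. To confirm the setting is active, check it from the MySQL client:

```
mysql> SELECT @@GLOBAL.sql_mode;
```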

4. Testing

4.1 Start Sqoop

4.1.1 Start the historyserver process
# run the following command in Hadoop's sbin directory
# /root/hadoop-2.10.0/etc/hadoop is my Hadoop configuration directory
./mr-jobhistory-daemon.sh start historyserver --config /root/hadoop-2.10.0/etc/hadoop
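To confirm the JobHistory server is actually up, check for its process with jps:

```
# a JobHistoryServer entry should appear in the jps listing
[root@master sbin]# jps | grep JobHistoryServer
```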
4.1.2 Start the Sqoop server
[root@master bin]# ./sqoop2-server start
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Starting the Sqoop2 server...
2020-01-28 07:44:00,338 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 07:44:00,412 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Sqoop2 server started.
4.1.3 Start the Sqoop client
[root@master bin]# ./sqoop-shell
bash: ./sqoop-shell: No such file or directory
[root@master bin]# ./sqoop2-shell
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> 
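Before going further, it can help to turn on the shell's verbose option, so that later errors (such as the job failure in 4.2.9) come with full stack traces instead of terse messages; this is a standard Sqoop2 shell option:

```
sqoop:000> set option --name verbose --value true
```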

4.2 Using Sqoop

Use Sqoop to transfer data from MySQL to HDFS.

4.2.1 Connect to the Sqoop server
sqoop:000> set server --host master --port 12000 --webapp sqoop
Server is set successfully
4.2.2 Check the connectors registered with the Sqoop server:
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
4.2.3 Create the link for importing from MySQL
sqoop:000> create link -c generic-jdbc-connector 
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: link1_generic_jdbc

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/sqoop   # make sure the sqoop database exists
Username: root
Password: *********    # must match the MySQL password
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose:  (enter a single space here)
New link was successfully created with validation status OK and name link1_generic_jdbc
4.2.4 Check the link
sqoop:000> show link
+--------------------+------------------------+---------+
|        Name        |     Connector Name     | Enabled |
+--------------------+------------------------+---------+
| link1_generic_jdbc | generic-jdbc-connector | true    |
+--------------------+------------------------+---------+
4.2.5 Create the link for exporting to HDFS
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: link2_hdfs

HDFS cluster

URI: hdfs://master:9000
Conf directory: /root/hadoop-2.10.0/etc/hadoop
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name link2_hdfs
4.2.6 View the details of the created links
sqoop:000> show link --all
2 link(s) to show: 
link with name link1_generic_jdbc (Enabled: true, Created by root at 1/28/20 8:13 PM, Updated by root at 1/28/20 8:13 PM)
Using Connector generic-jdbc-connector with name {1}
  Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://master/sqoop
    Username: root
    Password: 
    Fetch Size: 
    Connection Properties: 
  SQL Dialect
    Identifier enclose:  
link with name link2_hdfs (Enabled: true, Created by root at 1/28/20 8:14 PM, Updated by root at 1/28/20 8:14 PM)
Using Connector hdfs-connector with name {1}
  HDFS cluster
    URI: hdfs://master:9000
    Conf directory: /root/hadoop-2.10.0/etc/hadoop
    Additional configs:: 
4.2.7 Create the job that moves data from MySQL to HDFS:
sqoop:000> create job -f "link1_generic_jdbc" -t "link2_hdfs"
Creating job for links with from name link1_generic_jdbc and to name link2_hdfs
Please fill following values to create new job object
Name: jdbcToHdfs

Database source
# make sure the sqoop database exists
Schema name: sqoop
# the user table holds the data to be imported into HDFS
Table name: user
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec: 
Output directory: /jdbctohdfs   # make sure this directory does not yet exist on HDFS, or is empty
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name jdbcToHdfs
4.2.8 Check the job status
sqoop:000> show job --all
1 job(s) to show: 
Job with name jdbcToHdfs (Enabled: true, Created by root at 1/28/20 8:17 PM, Updated by root at 1/28/20 8:17 PM)
  Throttling resources
    Extractors: 
    Loaders: 
  Classpath configuration
    Extra mapper jars: 
From link: link1_generic_jdbc
  Database source
    Schema name: sqoop
    Table name: user
    SQL statement: 
    Column names: 
    Partition column: 
    Partition column nullable: 
    Boundary query: 
  Incremental read
    Check column: 
    Last value: 
To link: link2_hdfs
  Target configuration
    Override null value: 
    Null value: 
    File format: TEXT_FILE
    Compression codec: NONE
    Custom codec: 
    Output directory: /jdbctohdfs
    Append mode: 

4.2.9 Start the job
sqoop:000> start job -n jdbcToHdfs
2020-01-28 21:44:57 PST: FAILED 

The run failed, so check sqoop.log for the cause.
In theory the connector jars should be looked up under sqoop/server/lib, but the log shows them being searched for under sqoop/bin. I have not figured out which configuration controls this, so I simply symlinked the jars from sqoop/server/lib into sqoop/bin:

[root@master bin]# ln -s /root/sqoop/server/lib/*  /root/sqoop/bin

Checking /root/sqoop/bin confirms the jar symlinks are now in place.
Start the job again; this time it succeeds:

sqoop:000> start job -n jdbcToHdfs
Submission details
Job Name: jdbcToHdfs
Server URL: http://master:12000/sqoop/
Created by: root
Creation date: 2020-01-28 21:58:31 PST
Lastly updated by: root
External ID: job_local292485477_0006
	http://localhost:8080/
Source Connector schema: Schema{name= sqoop . website ,columns=[
	FixedPoint{name=websiteid,nullable=true,type=FIXED_POINT,byteSize=8,signed=true},
	Text{name=companyName,nullable=true,type=TEXT,charSize=null},
	Text{name=companyAddr,nullable=true,type=TEXT,charSize=null},
	Text{name=companyInfo,nullable=true,type=TEXT,charSize=null},
	Text{name=jobInfo,nullable=true,type=TEXT,charSize=null},
	Text{name=jobAddr,nullable=true,type=TEXT,charSize=null},
	Text{name=jobName,nullable=true,type=TEXT,charSize=null},
	Text{name=salary,nullable=true,type=TEXT,charSize=null}]}
2020-01-28 21:58:34 PST: SUCCEEDED

The HDFS web UI confirms the data was uploaded successfully.
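Besides the web UI, the upload can also be verified from the command line, assuming the output directory chosen in 4.2.7:

```
# list the generated files, then peek at the first few rows
[root@master ~]# hdfs dfs -ls /jdbctohdfs
[root@master ~]# hdfs dfs -cat /jdbctohdfs/* | head
```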



Reprinted from blog.csdn.net/qq_32392597/article/details/104106701