Sqoop2 Installation and Usage

  1. Extract and install
tar xvfz sqoop-1.99.6-bin-hadoop200.tar.gz
mv sqoop-1.99.6-bin-hadoop200 sqoop/
chmod -R 775 sqoop/
chown -R zhu sqoop/


  2. Modify the environment variables
zhu@Master:~$ sudo vim  /etc/profile
#sqoop
export SQOOP_HOME=$HOME/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
zhu@Master:~$ source /etc/profile
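A quick sanity check that the new variables are visible in the current shell:
zhu@Master:~$ echo $SQOOP_HOME
zhu@Master:~$ echo $CATALINA_BASE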
  3. Modify Sqoop's configuration files
zhu@Master:/$ sudo vim /home/zhu/sqoop/server/conf/sqoop.properties
# change this to point at my Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/zhu/hadoop/etc/hadoop
# catalina.properties
zhu@Master:/$ sudo vim /home/zhu/sqoop/server/catalina.properties
common.loader=/home/zhu/hadoop/share/hadoop/common/*.jar,/home/zhu/hadoop/share/hadoop/common/lib/*.jar,/home/zhu/hadoop/share/hadoop/hdfs/*.jar,/home/zhu/hadoop/share/hadoop/hdfs/lib/*.jar,/home/zhu/hadoop/share/hadoop/mapreduce/*.jar,/home/zhu/hadoop/share/hadoop/mapreduce/lib/*.jar,/home/zhu/hadoop/share/hadoop/tools/*.jar,/home/zhu/hadoop/share/hadoop/tools/lib/*.jar,/home/zhu/hadoop/share/hadoop/yarn/*.jar,/home/zhu/hadoop/share/hadoop/yarn/lib/*.jar,/home/zhu/hadoop/share/hadoop/httpfs/tomcat/lib/*.jar

Download the MySQL JDBC driver, mysql-connector-java-5.1.27.jar,
and copy the jar into $SQOOP_HOME/server/lib:

sudo cp mysql-connector-java-5.1.27.jar $SQOOP_HOME/server/lib

Sometimes Sqoop fails to start because JAVA_HOME cannot be found. To be safe, set JAVA_HOME directly in the startup script
by adding the following to /home/zhu/sqoop/bin/sqoop.sh:

  export JAVA_HOME=/usr/local/java
  export HADOOP_COMMON_HOME=/home/zhu/hadoop/share/hadoop/common
  export HADOOP_HDFS_HOME=/home/zhu/hadoop/share/hadoop/hdfs
  export HADOOP_MAPRED_HOME=/home/zhu/hadoop/share/hadoop/mapreduce
  export HADOOP_YARN_HOME=/home/zhu/hadoop/share/hadoop/yarn
  4. Start Sqoop
    sqoop.sh server start
  5. Verify successful startup
./sqoop.sh server start    start the server
./sqoop.sh server stop     stop the server
./sqoop.sh client          enter the client shell
set server --host hadoopMaster --port 12000 --webapp sqoop    point the client at the server; note that hadoopMaster is the hostname of the machine running the Sqoop2 server (see the connectivity check sketched after this list)
show connector --all    list connector types
create link --cid 1    create a link, where cid is the connector type id
show link    list links
update link -l 1    update the link with id 1
delete link -l 1    delete the link with id 1
create job -f 1 -t 2    create a job from link 1 to link 2
show job    list jobs
update job -jid 1    update a job
delete job -jid 1    delete a job
status job -jid 1    check a job's status
stop job -jid 1      stop a job
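
A minimal connectivity check, as a sketch (this assumes the server was started on hadoopMaster with the default port 12000): after pointing the client at the server, show version --all should report both the client and the server version.

sqoop.sh client
set server --host hadoopMaster --port 12000 --webapp sqoop
show version --all
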
  6. Using Sqoop2
sqoop.sh client

Sqoop 2 is distributed as two separate packages: a client package (sqoop2-client) and a server package (sqoop2-server). Install the server package on one node in the cluster; because the Sqoop2 server acts as a MapReduce client, that node must already have Hadoop installed and configured.

Install the client package on every node that will act as a client. The Sqoop2 client always connects to the Sqoop2 server to perform any operation, so Hadoop does not need to be installed on the client nodes.

The following installs via yum; here the client and the server are installed on the same node.

# yum install sqoop2-server sqoop2-client





7. Sqoop2 usage

7.1 Command reference

Start the server:

sqoop2-server start

Start the client in interactive mode with the following command:

# sqoop2

sqoop:000> help

For information about Sqoop, visit: http://sqoop.apache.org/

 

Available commands:

  exit    (\x  ) Exit the shell

  history (\H  ) Display, manage and recall edit-line history

  help    (\h  ) Display this help message

  set     (\st ) Configure various client options and settings

  show    (\sh ) Display various objects and configuration options

  create  (\cr ) Create new object in Sqoop repository

  delete  (\d  ) Delete existing object in Sqoop repository

  update  (\up ) Update objects in Sqoop repository

  clone   (\cl ) Create new object based on existing one

  start   (\sta) Start job

  stop    (\stp) Stop job

  status  (\stu) Display status of a job

  enable  (\en ) Enable object in Sqoop repository

  disable (\di ) Disable object in Sqoop repository

 

For help on a specific command type: help command

 

7.2 Importing data from MySQL into HDFS

Errors may occur along the way; use the following command to make the full error messages visible:

sqoop:000> set option --name verbose --value true

Connect to the Sqoop server:

sqoop:000> set server --host hadoopMaster

Server is set successfully

7.2.1 Create Link objects (connections to MySQL, HDFS, etc.)

Check the connectors already registered with the Sqoop server:

sqoop:000> show connector

+----+------------------------+-----------------+------------------------------------------------------+----------------------+

| Id |          Name          |     Version     |                        Class                         | Supported Directions |

+----+------------------------+-----------------+------------------------------------------------------+----------------------+

| 1  | kite-connector         | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |

| 2  | kafka-connector        | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |

| 3  | hdfs-connector         | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |

| 4  | generic-jdbc-connector | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |

+----+------------------------+-----------------+------------------------------------------------------+----------------------+

 

Here we use the id of the Generic JDBC Connector to create a new link object for that connector; the link created below is the JDBC connection:

sqoop:000> create link -c 4        # note: 4 is the connector id, i.e. we create a generic-jdbc-connector link

Creating link for connector with id 4

Please fill following values to create new link object

Name: mysql-link                   # note: the Name must be unique

 

Link configuration

 

JDBC Driver Class: com.mysql.jdbc.Driver

JDBC Connection String: jdbc:mysql://20.12.20.14:3306/hive     # note: jdbc:mysql://host(ip):port/database

Username: root

Password: ******

JDBC Connection Properties:

There are currently 0 values in the map:

entry# protocol=tcp

There are currently 1 values in the map:

protocol = tcp

New link was successfully created with validation status OK and persistent id 1

sqoop:000> show link

+----+------------+--------------+------------------------+---------+

| Id |    Name    | Connector Id |     Connector Name     | Enabled |

+----+------------+--------------+------------------------+---------+

| 1  | mysql-link | 4            | generic-jdbc-connector | true    |

+----+------------+--------------+------------------------+---------+

 

Above, we successfully created a JDBC link object with id 1.

Next, create a link for the hdfs-connector:

sqoop:000> create link -c 3

Creating link for connector with id 3

Please fill following values to create new link object

Name: hdfs-link

 

Link configuration

 

HDFS URI: hdfs://hadoopMaster:8020

New link was successfully created with validation status OK and persistent id 2

An HDFS link object with id 2 has been successfully created.

7.2.2 Create a Job object

A connector's From side is used to read data and its To side to write data. The show connector --all command used above shows that the Generic JDBC Connector supports both From and To, i.e. we can both read data from the database and write data into it. To create a job we need to specify its From and To parts, and each part is referenced by a link id.

Finally, we use these two link ids to wire up the job's From and To parts. Put plainly: where do we read the data from (From), and where do we import it to (To).

# Read data from MySQL and import it into HDFS

sqoop:000> create job -f 1 -t 2

Creating job for links with from id 1 and to id 2

Please fill following values to create new job object

Name: mysql-hdfs                               # the Name must be unique

 

From database configuration

 

Schema name: hive                              # required: database name

Table name: TBLS                               # required: table name

Table SQL statement:                           # optional

Table column names:                            # optional

Partition column name:                         # optional

Null value allowed for the partition column:   # optional

Boundary query:                                # optional

 

ToJob configuration

 

Override null value:                           # optional

Null value:

Output format:

  0 : TEXT_FILE

  1 : SEQUENCE_FILE

Choose: 0                                      # required

Compression format:

  0 : NONE

  1 : DEFAULT

  2 : DEFLATE

  3 : GZIP

  4 : BZIP2

  5 : LZO

  6 : LZ4

  7 : SNAPPY

  8 : CUSTOM

Choose: 0                                      # required

Custom compression format:

Output directory: /tmp/tbls                    # required

 

Throttling resources

 

Extractors: 2                                  # optional: the number of map tasks in the MapReduce job

Loaders: 1                                     # optional: the number of reduce tasks in the MapReduce job

New job was successfully created with validation status OK  and persistent id 1

 

This creates a new job; its id is 1.

7.2.3 Start the job

The job can be run with the following command:

# the 1 after -j is the id of the job created above; use show job to list the jobs that already exist

sqoop:000> start job -j 1 -s

 

7.2.4 Check whether the data was imported

# hadoop fs -cat /tmp/tbls/33ccf4ab-8531-45cf-bf45-b046afdb9ff6.txt

3,1440743238,1,0,'hue',0,3,'test','MANAGED_TABLE',NULL,NULL,NULL

4,1440743248,1,0,'hue',0,4,'test01','MANAGED_TABLE',NULL,NULL,NULL

5,1440743252,1,0,'hue',0,5,'test02','MANAGED_TABLE',NULL,NULL,NULL
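
The output file name is generated per run and will differ on each system; list the job's output directory to find it:

# hadoop fs -ls /tmp/tbls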

 

7.3 Exporting data from HDFS to MySQL

The HDFS data:

1,aaaaaaa,bbbbbb

2,ccccccc,dddddd

3,eeeeeee,ffffff

4,df,df
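
As a sketch, assuming the four records above are saved in a local file named test.txt (the HDFS path matches the Input directory used by the job below), the file can be staged on HDFS with:

# upload the sample data to HDFS
hdfs dfs -put test.txt /tmp/test.txt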

Similarly, using the two Link objects already created above, we can export the data on HDFS into a MySQL table; we only need to create another job:

sqoop:000> create job -f 2 -t 1

Creating job for links with from id 2 and to id 1

Please fill following values to create new job object

Name: hdfs-mysql

 

From Job configuration

 

Input directory: /tmp/test.txt

Override null value:

Null value:

 

To database configuration

 

Schema name: hive

Table name: test1

Table SQL statement:

Table column names:

Stage table name:

Should clear stage table:

 

Throttling resources

 

Extractors: 2

Loaders: 1

New job was successfully created with validation status OK  and persistent id 2

 

This creates a new job; its id is 2.

 

Start the job:

sqoop:000> start job -j 2 -s

 

Check the result:

mysql> select * from test1;

+------+---------+--------+

| id   | va01    | va02   |

+------+---------+--------+

|    1 | aaaaaaa | bbbbbb |

|    2 | ccccccc | dddddd |

|    3 | eeeeeee | ffffff |

|    4 | df      | df     |

+------+---------+--------+

4 rows in set (0.00 sec)

 


Common errors when running start job after installing Sqoop2 on CDH

When executing start job you may see:

Hadoop throws java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException

Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:563)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:561)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:549)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
        at com.pzoom.mapred.WordCount.main(WordCount.java:40)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Copy jackson-core-asl-1.8.8.jar and jackson-mapper-asl-1.8.8.jar into the /var/lib/sqoop2 directory on the server where Sqoop2 is installed, then restart Sqoop2.
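
A sketch of that fix; on a CDH parcel install the two Jackson jars can usually be found under /opt/cloudera/parcels/CDH/jars (an assumption, so adjust the source paths to wherever the jars actually live on your system):

# copy the missing Jackson jars into Sqoop2's lib directory
sudo cp /opt/cloudera/parcels/CDH/jars/jackson-core-asl-1.8.8.jar /var/lib/sqoop2/
sudo cp /opt/cloudera/parcels/CDH/jars/jackson-mapper-asl-1.8.8.jar /var/lib/sqoop2/
# restart the Sqoop2 server so it picks them up
sqoop2-server stop
sqoop2-server start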

Adjusting HDFS permissions. I installed HDFS with CDH; run the following from any directory:

hdfs dfs -ls /    # check permissions

drwxrwxrwt   - hdfs supergroup          0 2018-04-09 21:38 /tmp

drwxr-xr-x   - hdfs supergroup          0 2018-05-29 20:47 /user

su hdfs 

hdfs dfs -chmod -R 777 /

Change the owner of an HDFS directory:

hdfs dfs  -chown -R  impala  /user/impala

 



Reposted from blog.csdn.net/zsj777/article/details/80495397