Sqoop




3. Sqoop data migration

3.1 Overview

Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers".

Import: bring data from MySQL, Oracle, and other relational databases into Hadoop's HDFS, Hive, HBase, and other storage systems;

Export: move data from Hadoop's file system out to a relational database.

 

 

3.2 Working mechanism

Sqoop works by translating import and export commands into MapReduce programs.

In the generated MapReduce job, it is mainly the InputFormat and OutputFormat that are customized.
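For example, in the sketch below (host, database, and credentials are illustrative, matching the examples later in this chapter), the import runs as a map-only MapReduce job: --split-by names the column used to partition the table into ranges, and --m sets how many map tasks read those ranges in parallel.

sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp \
--split-by id \
--m 4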

 

 

 

3.3 Sqoop in practice and principles

3.3.1 Sqoop installation

Installing Sqoop requires a working Java and Hadoop environment.

1. Download and unzip

Download address (version 1.4.6): http://ftp.wayne.edu/apache/sqoop/1.4.6/

 

 

2. Modify the configuration file

$ cd $SQOOP_HOME/conf

$ mv sqoop-env-template.sh sqoop-env.sh

Open sqoop-env.sh and edit the following lines:

export HADOOP_COMMON_HOME=/home/hadoop/apps/hadoop-2.6.1/

export HADOOP_MAPRED_HOME=/home/hadoop/apps/hadoop-2.6.1/

export HIVE_HOME=/home/hadoop/apps/hive-1.2.1

 

 

3. Add the MySQL JDBC driver package

cp  ~/app/hive/lib/mysql-connector-java-5.1.28.jar   $SQOOP_HOME/lib/

4. Verify the installation

$ cd $SQOOP_HOME/bin

$ sqoop-version

Expected output:

15/12/17 14:52:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Sqoop 1.4.6 git commit id 5b34accaca7de251fc91161733f906af2eddbe83

Compiled by abe on Fri Aug 1 11:19:26 PDT 2015

At this point, the entire Sqoop installation is complete.
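Optionally, you can also confirm that Sqoop can reach the database before running any imports. A minimal sketch, assuming MySQL is reachable on hdp-node-01 with user root (host and credentials are illustrative):

bin/sqoop list-databases \
--connect jdbc:mysql://hdp-node-01:3306/ \
--username root \
--password root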

 

 

 

 

3.4 Data import of Sqoop

The " Import Tool " imports a single table from RDBMS to HDFS . Each row in the table is treated as a record in HDFS . All records are stored as text data in text files ( or binary data such as Avro, sequence files , etc.

3.4.1 Syntax

The following syntax is used to import data into HDFS .

$ sqoop import (generic-args) (import-args)
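The generic arguments are Hadoop's own options and must come before the import arguments. A small sketch (the job name is illustrative): -D passes a Hadoop property, while everything from --connect onward is an import argument.

sqoop import \
-D mapreduce.job.name=emp_import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp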

 

3.4.2 Example

Table data

In MySQL there is a database userdb containing three tables: emp, emp_add, and emp_conn.

Table emp:

id     name      deg           salary  dept
1201   gopal     manager       50,000  TP
1202   manisha   Proof reader  50,000  TP
1203   khalil    php dev       30,000  AC
1204   prasanth  php dev       30,000  AC
1205   kranthi   admin         20,000  TP

Table emp_add:

id     hno   street    city
1201   288A  vgiri     jublee
1202   108I  aoc       sec-bad
1203   144Z  pgutta    hyd
1204   78B   old city  sec-bad
1205   720X  hitec     sec-bad

Table emp_conn:

id     phno     email
1201   2356742  [email protected]
1202   1661663  [email protected]
1203   8887776  [email protected]
1204   9988774  [email protected]
1205   1231231  [email protected]

Import table data into HDFS

The following command imports the emp table from the MySQL database server into HDFS.

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp \
--m 1

 

If executed successfully, you will get the following output.

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5

14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar

-----------------------------------------------------

INFO mapreduce.Job: map 0% reduce 0%

14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%

14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully

-----------------------------------------------------

-----------------------------------------------------

14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)

14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

 

 

To verify the data imported into HDFS, use the following command to view it:

$ $HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/emp/part-m-00000

 

The emp table's data and fields are separated by commas (,):

1201, gopal,    manager, 50000, TP

1202, manisha,  preader, 50000, TP

1203, kalil,    php dev, 30000, AC

1204, prasanth, php dev, 30000, AC

1205, kranthi,  admin,   20000, TP

 

Import a relational table into Hive

bin/sqoop import --connect jdbc:mysql://hdp-node-01:3306/test --username root --password root --table emp --hive-import --m 1
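To control the destination table, the command can be extended with --hive-table (and --hive-overwrite to replace existing data); a sketch in which the table name emp_hive is illustrative:

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp \
--hive-import \
--hive-table emp_hive \
--hive-overwrite \
--m 1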

Import into a specified HDFS directory

When importing table data into HDFS with the Sqoop import tool, we can specify a target directory.

The following is the syntax for the Sqoop import command's target-directory option.

--target-dir <new or exist directory in HDFS>

 

The following command imports the emp_add table data into the '/queryresult' directory.

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--target-dir /queryresult \
--table emp_add --m 1

 

 

The following command verifies the imported emp_add data in the /queryresult directory.

 $HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*

 

 

It separates the emp_add table's data and fields with commas (,):

1201, 288A, vgiri,   jublee

1202, 108I, aoc,     sec-bad

1203, 144Z, pgutta,  hyd

1204, 78B,  oldcity, sec-bad

1205, 720X, hitec,   sec-bad

 

 

 

 

 

Import a subset of table data

With the Sqoop import tool's "where" clause, we can import a subset of a table. Sqoop executes the corresponding SQL query on the database server and stores the result in the target directory in HDFS.

The syntax of the where clause is as follows.

--where <condition>

 

The following command imports a subset of the emp_add table data: the IDs and addresses of the employees whose city of residence is Secunderabad (sec-bad).

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--where "city ='sec-bad'" \
--target-dir /wherequery \
--table emp_add --m 1

 

On-demand (free-form query) import

Instead of a whole table, --query imports the result of an arbitrary SQL statement. The literal $CONDITIONS placeholder must appear in the WHERE clause so that Sqoop can substitute its split conditions; inside single quotes it is written as-is, while inside double quotes it must be escaped (see the sketch after the command).

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--target-dir /wherequery2 \
--query 'select id,name,deg from emp WHERE id>1207 and $CONDITIONS' \
--split-by id \
--fields-terminated-by '\t' \
--m 1
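As noted above, if the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS so the shell does not expand it; a sketch of the same import in that form:

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--target-dir /wherequery2 \
--query "select id,name,deg from emp WHERE id>1207 and \$CONDITIONS" \
--split-by id \
--fields-terminated-by '\t' \
--m 1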

 

 

 

The following command verifies the data imported from the emp_add table into the /wherequery directory.

$HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*

 

 

It separates the emp_add table's data and fields with commas (,):

1202, 108I, aoc, sec-bad

1204, 78B, oldcity, sec-bad

1205, 720X, hitec, sec-bad

 

 

 

Incremental import

Incremental import is a technique for importing only the newly added rows of a table.

It requires the 'incremental', 'check-column', and 'last-value' options to perform the incremental import.

The following syntax is used for the incremental options of the Sqoop import command.

--incremental <mode>

--check-column <column name>

--last-value <last check column value>

 

 

Assume the newly added data in the emp table is as follows:

1206, satish p, grp des, 20000, GR

The following command performs the incremental import on the emp table.

bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp --m 1 \
--incremental append \
--check-column id \
--last-value 1205

 

 

The following command verifies the data imported from the emp table into the HDFS emp/ directory.

$ $HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/emp/part-m-*

It separates the emp table's data and fields with commas (,):

1201, gopal,    manager, 50000, TP

1202, manisha,  preader, 50000, TP

1203, kalil,    php dev, 30000, AC

1204, prasanth, php dev, 30000, AC

1205, kranthi,  admin,   20000, TP

1206, satish p, grp des, 20000, GR

 

The following command shows the modified or newly added rows of the emp table:

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*1

It displays the newly added rows of the emp table, with fields separated by commas (,):

1206, satish p, grp des, 20000, GR
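Since each incremental run needs the last value from the previous run, incremental imports are often defined as a saved Sqoop job, which records the new last value automatically after each execution. A minimal sketch, where the job name emp_inc is illustrative:

bin/sqoop job --create emp_inc -- import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp --m 1 \
--incremental append \
--check-column id \
--last-value 1205

bin/sqoop job --exec emp_inc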

 

 

 

3.5 Sqoop data export

Export data from HDFS to an RDBMS database.

Before exporting, the target table must already exist in the target database.

The default mode uses INSERT statements to insert the data from the files into the table;

update mode instead generates UPDATE statements to update the existing table data, as sketched below.
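A sketch of update mode against the employee table created later in this section: --update-key names the column used to match existing rows, and --update-mode allowinsert additionally inserts rows that match nothing.

bin/sqoop export \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table employee \
--export-dir /user/hadoop/emp/ \
--update-key id \
--update-mode allowinsert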

Syntax

The following is the syntax of the export command.

$ sqoop export (generic-args) (export-args)

 

 

Example

The data is in the file emp_data in the HDFS directory 'emp/'. The emp_data file is as follows:

1201, gopal,     manager, 50000, TP

1202, manisha,   preader, 50000, TP

1203, kalil,     php dev, 30000, AC

1204, prasanth,  php dev, 30000, AC

1205, kranthi,   admin,   20000, TP

1206, satish p,  grp des, 20000, GR

 

1. First, manually create the target table in MySQL.

$ mysql

mysql> USE db;

mysql> CREATE TABLE employee (

   id INT NOT NULL PRIMARY KEY,

   name VARCHAR(20),

   deg VARCHAR(20),

   salary INT,

   dept VARCHAR(10));

 

2. Then run the export command.

bin/sqoop export \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table employee \
--export-dir /user/hadoop/emp/

 

3. Verify the table from the MySQL command line.

mysql> select * from employee;

If the data was stored successfully, you will find it in the employee table, as follows.

+------+----------+-------------+--------+------+
| Id   | Name     | Designation | Salary | Dept |
+------+----------+-------------+--------+------+
| 1201 | gopal    | manager     | 50000  | TP   |
| 1202 | manisha  | preader     | 50000  | TP   |
| 1203 | kalil    | php dev     | 30000  | AC   |
| 1204 | prasanth | php dev     | 30000  | AC   |
| 1205 | kranthi  | admin       | 20000  | TP   |
| 1206 | satish p | grp des     | 20000  | GR   |
+------+----------+-------------+--------+------+

For the following export command, --connect, --username, --password, --table, and --export-dir are all required. --export-dir is the HDFS path of the exported table, and the Hive table's column separator is passed to Sqoop with --fields-terminated-by, so the columns are split on '\t' when exporting to MySQL.

sqoop export \
--connect jdbc:mysql://localhost:3306/test \
--table order_info \
--export-dir /user/hive/warehouse/test.db/order_info \
--username root --password root \
-m 1 \
--fields-terminated-by '\t'
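Note: Sqoop also provides --input-fields-terminated-by specifically for parsing the input files of an export; depending on your Sqoop version, it may be the more appropriate option here.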
