Data handling component: manage data import and export based on Sqoop

Source code of this article: GitHub || GitEE

One, Sqoop overview

Sqoop is an open-source big data component, mainly used to transfer data between Hadoop (Hive, HBase, etc.) and traditional relational databases (MySQL, PostgreSQL, Oracle, etc.).


Usually the basic functions of data handling components: import and export.

Since Sqoop belongs to the big data side of the stack, moving data from a relational database into the Hadoop storage system is called an import, and the reverse direction is called an export.

Sqoop is a command-line tool that converts import or export commands into MapReduce programs; the MapReduce jobs mainly customize the InputFormat and OutputFormat.

Two, environment deployment

Testing the Sqoop component requires at least a basic environment: the Hadoop stack, a relational database, and the JDK.

Given that Sqoop is a tool component, a single node installation is sufficient.

1. Upload the installation package

Installation package and version: sqoop-1.4.6

[root@hop01 opt]# tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[root@hop01 opt]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop1.4.6

2. Modify the configuration file

File location: sqoop1.4.6/conf

[root@hop01 conf]# pwd
/opt/sqoop1.4.6/conf
[root@hop01 conf]# mv sqoop-env-template.sh sqoop-env.sh

Configuration content: the home directories of the common Hadoop-stack components and of the coordination component ZooKeeper.

[root@hop01 conf]# vim sqoop-env.sh
# configuration content
export HADOOP_COMMON_HOME=/opt/hadoop2.7
export HADOOP_MAPRED_HOME=/opt/hadoop2.7
export HIVE_HOME=/opt/hive1.2
export HBASE_HOME=/opt/hbase-1.3.1
export ZOOKEEPER_HOME=/opt/zookeeper3.4
export ZOOCFGDIR=/opt/zookeeper3.4

3. Configure environment variables

[root@hop01 opt]# vim /etc/profile

export SQOOP_HOME=/opt/sqoop1.4.6
export PATH=$PATH:$SQOOP_HOME/bin

[root@hop01 opt]# source /etc/profile

4. Add the MySQL driver

[root@hop01 opt]# cp mysql-connector-java-5.1.27-bin.jar sqoop1.4.6/lib/

5. Environment check


Key points: import and export

Use the help command to view the available commands, and check the version number with version. Sqoop is a command-line tool, so these commands will be used throughout the rest of this article.
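For example, a quick sanity check (both are standard Sqoop commands):

[root@hop01 ~]# sqoop help
[root@hop01 ~]# sqoop version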

6. Relevant environment

At this point, the related services on the Sqoop deployment node (Hadoop, Hive, HBase, ZooKeeper) are basically running in cluster mode.


7. Test the MySQL connection

sqoop list-databases --connect jdbc:mysql://hop01:3306/ --username root --password 123456

This command lists the databases on the MySQL server, and the result is printed correctly.


Three, data import case

1. MySQL data script

CREATE TABLE `tb_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `user_name` varchar(100) DEFAULT NULL COMMENT 'user name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (1, 'spring');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (2, 'c++');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (3, 'java');
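The script above assumes a database named sq_import already exists (the name is taken from the JDBC URL used in the import commands below); a minimal sketch to create and select it beforehand:

CREATE DATABASE `sq_import` DEFAULT CHARSET utf8;
USE `sq_import`;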


2. Sqoop import script

Specify a database table and import the whole table into the Hadoop system. Note that the Hadoop services must be started first; -m 1 means the job runs with a single map task:

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--target-dir /hopdir/user/tbuser0 \
-m 1

3. Hadoop query


[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbuser0/part-m-00000
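With Sqoop's default settings the file holds one record per line with comma-separated fields, so its content should look roughly like this (based on the three rows inserted above):

1,spring
2,c++
3,java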

4. Specify columns and conditions

The query SQL statement must include $CONDITIONS in its WHERE clause; Sqoop replaces this placeholder with split conditions when the work is distributed across map tasks:

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--target-dir /hopdir/user/tbname0 \
--num-mappers 1 \
--query 'select user_name from tb_user where 1=1 and $CONDITIONS;'

View the import results:

[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbname0/part-m-00000
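Since only the user_name column was selected, the file should contain roughly:

spring
c++
java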

5. Import into the Hive component

If no Hive database is specified, the data is imported into the default database, and a table with the same name is created automatically:

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--hive-import \
-m 1

During execution, pay attention to Sqoop's execution log; the import happens in two steps:

Step 1: Import the MySQL data into a temporary path on HDFS;

Step 2: Migrate the data from the temporary directory into the Hive table.

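Once the job finishes, the result can be checked from the Hive side; a minimal sketch, assuming the table landed in Hive's default database as described above:

[root@hop01 ~]# hive -e "select * from tb_user;"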

6. Import into the HBase component

The current HBase cluster version is 1.3, and the target table has to be created manually before the data import can run normally.
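A minimal hbase shell sketch for creating the table, assuming the shell is available on this node (the table name and column family match the import command below):

hbase shell
create 'tb_user','info'

With the table in place, run the import: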

sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--columns "id,user_name" \
--column-family "info" \
--hbase-table tb_user \
--hbase-row-key id \
--split-by id

View table data in HBase:

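A minimal check from the hbase shell, for example:

hbase shell
scan 'tb_user'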

Four, data export case

Create a new MySQL database and table, then export the data in HDFS to MySQL; the data generated by the first import script can be reused here. A minimal DDL sketch for the target side is shown below, followed by the export command:

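A minimal sketch of the target database and table, assuming the same structure as the source table tb_user (names taken from the export command below):

CREATE DATABASE `sq_export` DEFAULT CHARSET utf8;
CREATE TABLE `sq_export`.`tb_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `user_name` varchar(100) DEFAULT NULL COMMENT 'user name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';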

sqoop export \
--connect jdbc:mysql://hop01:3306/sq_export \
--username root \
--password 123456 \
--table tb_user \
--num-mappers 1 \
--export-dir /hopdir/user/tbuser0/part-m-00000 \
--input-fields-terminated-by ","

Check the data in MySQL again: all records are exported. Here the comma is the separator between data fields, matching the format of the HDFS query result shown earlier.
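A quick verification query on the MySQL side, as a sketch:

SELECT * FROM `sq_export`.`tb_user`;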

Five, source code address

GitHub address
https://github.com/cicadasmile/big-data-parent
GitEE address
https://gitee.com/cicadasmile/big-data-parent
