Source code of this article: GitHub || GitEE
One, Sqoop overview
Sqoop is an open-source big data component used mainly to transfer data between Hadoop (Hive, HBase, etc.) and traditional relational databases (MySQL, PostgreSQL, Oracle, etc.).
Like most data-transfer components, its basic functions are import and export.
Since Sqoop belongs to the big data technology stack, moving data from a relational database into the Hadoop storage system is called an import, and the reverse direction is an export.
Sqoop is a command-line tool that converts import and export commands into MapReduce programs; the conversion mainly works by customizing the InputFormat and OutputFormat.
Two, environment deployment
Testing the Sqoop component requires at least a basic environment: the Hadoop stack, a relational database, and the JDK.
Since Sqoop is a tool component, a single-node installation is sufficient.
1. Upload the installation package
Installation package and version: sqoop-1.4.6
[root@hop01 opt]# tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[root@hop01 opt]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop1.4.6
2. Modify the configuration file
File location: sqoop1.4.6/conf
[root@hop01 conf]# pwd
/opt/sqoop1.4.6/conf
[root@hop01 conf]# mv sqoop-env-template.sh sqoop-env.sh
Configuration content: it involves the common Hadoop-family components and the coordination component ZooKeeper.
[root@hop01 conf]# vim sqoop-env.sh
# Configuration content
export HADOOP_COMMON_HOME=/opt/hadoop2.7
export HADOOP_MAPRED_HOME=/opt/hadoop2.7
export HIVE_HOME=/opt/hive1.2
export HBASE_HOME=/opt/hbase-1.3.1
export ZOOKEEPER_HOME=/opt/zookeeper3.4
export ZOOCFGDIR=/opt/zookeeper3.4
3. Configure environment variables
[root@hop01 opt]# vim /etc/profile
export SQOOP_HOME=/opt/sqoop1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
[root@hop01 opt]# source /etc/profile
4. Introduce MySQL driver
[root@hop01 opt]# cp mysql-connector-java-5.1.27-bin.jar sqoop1.4.6/lib/
5. Environment check
Key commands: import and export.
View the available commands via help and the version number via version. Sqoop is a command-line tool, so the commands shown here are used throughout the rest of this article.
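The check above can be run directly from the command line. Assuming the PATH configuration from the previous step took effect, something like:

```shell
# list the available Sqoop commands (import, export, list-databases, ...)
sqoop help

# print the installed Sqoop version
sqoop version
```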
6. Related environment
At this point, check the related services running on the Sqoop deployment node; the node essentially runs the full cluster stack:
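A quick way to inspect this is jps. Assuming a typical single-node test cluster with the components configured above, the relevant JVM processes should all be listed:

```shell
# list running JVM processes; if the Hadoop, HBase and ZooKeeper services
# are up, processes such as NameNode, DataNode, HMaster and QuorumPeerMain
# should appear in the output
jps
```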
7. Test the MySQL connection
sqoop list-databases --connect jdbc:mysql://hop01:3306/ --username root --password 123456
This command lists the databases on the MySQL instance; if the connection is correct, the database names are printed.
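A successful run prints one database name per line. The exact names depend on the server, but assuming the sq_import database used later in this article already exists, the output would resemble:

```shell
sqoop list-databases --connect jdbc:mysql://hop01:3306/ --username root --password 123456
# expected output (names vary by server):
# information_schema
# mysql
# performance_schema
# sq_import
```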
Three, data import case
1. MySQL data script
CREATE TABLE `tb_user` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
`user_name` varchar(100) DEFAULT NULL COMMENT 'user name',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (1, 'spring');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (2, 'c++');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (3, 'java');
2. Sqoop import script
Specify a database and table, and import that table into the Hadoop system. Note that the Hadoop services must be started first:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--target-dir /hopdir/user/tbuser0 \
-m 1
3. Hadoop query
[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbuser0/part-m-00000
4. Specify columns and conditions
$CONDITIONS must be included in the WHERE clause of the query SQL statement:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--target-dir /hopdir/user/tbname0 \
--num-mappers 1 \
--query 'select user_name from tb_user where 1=1 and $CONDITIONS'
View the export results:
[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbname0/part-m-00000
5. Import Hive components
If no Hive database is specified, the data is imported into the default library, and a table with the same name is created automatically:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--hive-import \
-m 1
Pay attention to the Sqoop execution log during this process:
Step 1: import the MySQL data into the default HDFS path;
Step 2: move the data from that temporary directory into the Hive table.
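Once both steps finish, the result can be checked from Hive. A minimal check, assuming the table landed in the default library as described above:

```shell
# query the auto-created table in Hive's default database;
# should return the three rows imported from MySQL
hive -e "select * from default.tb_user;"
```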
6. Import HBase components
The HBase cluster version here is 1.3; with this version, the target table must be created in advance before the data import can run normally:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--columns "id,user_name" \
--column-family "info" \
--hbase-table tb_user \
--hbase-row-key id \
--split-by id
View table data in HBase:
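Both the pre-created table and the data check can be done from the HBase shell. A minimal sketch, assuming the table and column-family names from the import command above:

```shell
# create the target table with column family "info" before the import,
# then scan it after the import to verify the rows
hbase shell <<'EOF'
create 'tb_user', 'info'
scan 'tb_user'
EOF
```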
Four, data export case
Create a new MySQL database and table, then export the data in HDFS to MySQL. Here we can reuse the data generated by the first import script:
sqoop export \
--connect jdbc:mysql://hop01:3306/sq_export \
--username root \
--password 123456 \
--table tb_user \
--num-mappers 1 \
--export-dir /hopdir/user/tbuser0/part-m-00000 \
--input-fields-terminated-by ","
Check the data in MySQL again; the records are fully exported. Here `,` is the separator between data fields, which matches the HDFS query result produced by the first import script.
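The exported records can be verified back in MySQL. For example, assuming the mysql client is available on the same node:

```shell
# query the exported rows; should return the three records
# originally inserted into sq_import.tb_user
mysql -h hop01 -uroot -p123456 -e "select * from sq_export.tb_user;"
```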
Five, source code address
GitHub address
https://github.com/cicadasmile/big-data-parent
GitEE address
https://gitee.com/cicadasmile/big-data-parent