Big Data Technologies: Sqoop
Chapter 1 Introduction to Sqoop
Sqoop is an open-source tool used mainly to transfer data between traditional relational databases (MySQL, PostgreSQL, etc.) and Hadoop (Hive). It can import data from a relational database (for example MySQL, Oracle, or PostgreSQL) into HDFS, and it can also export data from HDFS back into a relational database.
The Sqoop project began in 2009, originally as a third-party module for Hadoop. Later, to let users deploy it quickly and to let developers iterate faster, Sqoop became an independent Apache project.
The latest Sqoop2 version is 1.99.7. Note that Sqoop1 and Sqoop2 are not compatible with each other; Sqoop2's feature set is incomplete, and it is not intended for production deployment.
Chapter 2 Sqoop Principle
Sqoop translates each import or export command into a MapReduce program.
In the generated MapReduce job, the customization is mainly in the InputFormat and OutputFormat.
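This translation step can be observed in isolation: Sqoop's codegen tool runs only the code-generation phase, emitting the Java ORM class that the job's InputFormat/OutputFormat use to serialize and deserialize table rows. A sketch, assuming the same MySQL connection details used later in this document:

```shell
# Generate (but do not run a job for) the ORM class of the emp table;
# connection details are the ones assumed throughout this guide.
./sqoop codegen \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table emp
```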
Chapter 3 Sqoop Installation (Setup)
Installing Sqoop assumes that Java and Hadoop environments are already in place.
3.1 Download and unpack the Sqoop tarball
mkdir /usr/local/sqoop
cd /usr/local/sqoop
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
rm -rf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
3.2 Modify the configuration file (in conf/)
- Rename the template configuration file
cd /usr/local/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0/conf/
cp sqoop-env-template.sh sqoop-env.sh
- Edit the configuration file sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.9.2
export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.9.2
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.10
export ZOOCFGDIR=/usr/local/zookeeper/zookeeper-3.4.10
export HBASE_HOME=/usr/local/hbase
3.3 Upload the JDBC driver jar
Upload a MySQL JDBC driver jar (e.g. mysql-connector-java-5.1.39.jar) into Sqoop's lib directory:
cd ../lib/
3.4 Verify the installation (under the bin directory)
cd ../bin/
./sqoop-version
./sqoop-list-databases --connect jdbc:mysql://localhost:3306 --username root --password 123456
./sqoop-list-tables --connect jdbc:mysql://localhost:3306 --username root --password 123456
Chapter 4 Simple Sqoop Use Cases
4.1 Importing data
In Sqoop, "import" means transferring data from a non-big-data cluster (an RDBMS) into a big-data cluster (HDFS, Hive, HBase); this is done with the import keyword.
4.1.1 RDBMS to HDFS
- Make sure the MySQL service is running
- Create a table in MySQL and insert some data
- Import the data
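The import examples below assume an emp table already exists in the test database. A minimal sketch of what that table might look like; the column layout (id, name, salary) is inferred from the queries used later, and the credentials match the commands in this document:

```shell
# Hypothetical emp schema; columns are inferred from the later examples
# (id and name are selected, salary is filtered on, id 1203 exists).
mysql -uroot -p123456 <<'SQL'
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE emp (
  id     INT PRIMARY KEY,
  name   VARCHAR(100),
  salary INT
);
INSERT INTO emp VALUES
  (1201, 'zhangsan',  8000),
  (1202, 'lisi',      9000),
  (1203, 'wangwu',   10000);
SQL
```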
(1) Import everything (import the MySQL emp table into HDFS)
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table emp \
--m 1
Query the result:
hadoop fs -cat /user/root/emp/part-m-00000
(2) Import everything into a specified HDFS directory
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp \
--delete-target-dir \
--fields-terminated-by '\001' \
--table emp \
--m 1
Query the result:
hadoop fs -cat /emp/part-m-00000
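The '\001' separator used above is the non-printing ASCII SOH byte (also Hive's default field delimiter), so it is invisible in plain cat output. A small local sketch, using a made-up record, of how to see it:

```shell
# Write one made-up record with \001 field separators, then use
# cat -v to render the invisible SOH byte as ^A.
printf '1201\001zhangsan\0018000\n' > /tmp/emp-sample.txt
cat -v /tmp/emp-sample.txt
```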
(3) Import the result of a query
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp3 \
--fields-terminated-by '\001' \
--delete-target-dir \
--m 1 \
--query 'select id,name from emp where salary <= 30000 and $CONDITIONS'
Tip: Sqoop requires the literal token $CONDITIONS in the where clause of a --query import; if the query is wrapped in double quotes instead of single quotes, escape it as \$CONDITIONS so the shell does not expand it.
Query the result:
hadoop fs -cat /emp3/part-m-00000
(4) Import specified columns
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp1 \
--fields-terminated-by '\001' \
--delete-target-dir \
--table emp \
--m 1 \
--columns id,name
Query the result:
hadoop fs -cat /emp1/part-m-00000
Tip: if --columns lists multiple columns, separate them with commas and do not add spaces after the commas.
(5) Filter the imported data with the where keyword
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp2 \
--fields-terminated-by '\001' \
--delete-target-dir \
--table emp \
--m 1 \
--where "id = 1203"
Query the result:
hadoop fs -cat /emp2/part-m-00000
4.2 Exporting data
In Sqoop, "export" means transferring data from a big-data cluster (HDFS, Hive, HBase) to a non-big-data cluster (an RDBMS); this is done with the export keyword.
4.2.1 HIVE / HDFS to RDBMS
./sqoop export \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table dept \
--export-dir /user/hive/warehouse/dept/dept.txt \
--m 1 \
--input-fields-terminated-by "\t"
Tip: if the target table does not exist in MySQL, it will not be created automatically; create it before running the export.
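The export side reads plain delimited text from under --export-dir, so the file layout must match --input-fields-terminated-by. A local sketch of the tab-separated format assumed here (the two dept columns are made up, not the real schema):

```shell
# Two made-up dept rows, tab-separated the way the export job reads them.
printf '10\tACCOUNTING\n20\tRESEARCH\n' > /tmp/dept.txt
cat /tmp/dept.txt
```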