Big data technologies: Sqoop


Chapter 1 Sqoop introduction


Sqoop is an open-source tool used mainly to transfer data between traditional relational databases (MySQL, PostgreSQL, ...) and Hadoop (Hive). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into Hadoop's HDFS, and it can also export data from HDFS back into a relational database.
The Sqoop project began in 2009, originally as a third-party module for Hadoop. Later, to let users deploy it quickly and to let developers iterate faster, Sqoop became an independent Apache project.
The latest Sqoop2 version is 1.99.7. Note that Sqoop1 and Sqoop2 are not compatible with each other, and Sqoop2's feature set is incomplete; it is not intended for production deployment.


Chapter 2 Sqoop principles


Sqoop translates an import or export command into a MapReduce program.
In the translated MapReduce program, the customization lies mainly in the InputFormat and the OutputFormat.
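
One way to see part of this machinery directly is Sqoop's codegen tool, which emits the Java record class that the generated MapReduce job uses to serialize rows. A minimal sketch, assuming the same connection settings used later in this article:

./sqoop codegen \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table emp
# the generated emp.java (and a compiled jar) land under /tmp/sqoop-<user>/compile/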


Chapter 3 Sqoop installation and deployment


Installing Sqoop presupposes that Java and Hadoop environments are already in place.
3.1 Download and unpack the Sqoop tarball

mkdir /usr/local/sqoop
cd /usr/local/sqoop
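# If the tarball is not present yet, it can be fetched first; the URL below is
# an assumption based on the standard Apache archive layout:
wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz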
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
rm -rf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

3.2 Modify the configuration file (in the conf directory)

  1. Rename the configuration template
cd /usr/local/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0/conf/
cp sqoop-env-template.sh sqoop-env.sh
  2. Modify the configuration file
    sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.9.2
export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.9.2
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.10
export ZOOCFGDIR=/usr/local/zookeeper/zookeeper-3.4.10
export HBASE_HOME=/usr/local/hbase
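
A quick way to catch typos in these paths is to check that each configured home directory actually exists; a minimal sketch, assuming the same layout as above:

for d in /usr/local/hadoop/hadoop-2.9.2 /usr/local/hive \
         /usr/local/zookeeper/zookeeper-3.4.10 /usr/local/hbase; do
  [ -d "$d" ] || echo "missing: $d"   # warn about any directory that does not exist
done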

3.3 Upload the JDBC driver JAR
Upload a MySQL JDBC driver JAR into the lib directory of the Sqoop installation, e.g.:
mysql-connector-java-5.1.39.jar

cd ../lib/
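
Assuming the driver JAR was already downloaded somewhere on the local filesystem, copying it into place is one line (the source path is a placeholder; adjust it to wherever the JAR actually sits):

cp /path/to/mysql-connector-java-5.1.39.jar .   # run from the lib directory entered above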

3.4 Verify the installation (under the bin directory)

cd ../bin/
./sqoop-version
./sqoop-list-databases --connect jdbc:mysql://localhost:3306 --username root --password 123456
./sqoop-list-tables --connect jdbc:mysql://localhost:3306 --username root --password 123456
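
Passing --password on the command line exposes the password in the shell history and process list; Sqoop also accepts -P, which prompts for the password on the console. An equivalent, slightly safer variant:

./sqoop-list-databases --connect jdbc:mysql://localhost:3306 --username root -P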

Chapter 4 Simple Sqoop use cases


4.1 Importing data
In Sqoop, "import" means transferring data from a non-big-data cluster (an RDBMS) into a big-data cluster (HDFS, Hive, HBase); this uses the import keyword.
4.1.1 RDBMS to HDFS

  1. Make sure the MySQL service is running
  2. Create a table in MySQL and insert some test data (a minimal sketch is shown below)
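
The original table definition and rows were shown only as a screenshot, so the sketch below is a hypothetical emp schema, chosen to be consistent with the later examples (columns id, name, salary; a row with id 1203; salaries on both sides of 30000):

mysql -uroot -p123456 -e "
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE IF NOT EXISTS emp(
  id     INT PRIMARY KEY,
  name   VARCHAR(50),
  salary DOUBLE
);
INSERT INTO emp VALUES
  (1201, 'alice', 50000),   -- sample rows; names and values are made up
  (1202, 'bob',   40000),
  (1203, 'carol', 25000);"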
3) Import the data
(1) Import everything (import the MySQL emp table into HDFS):
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table emp \
--m 1

View the result (by default Sqoop writes the table to /user/<username>/<table> on HDFS):

hadoop fs -cat /user/root/emp/part-m-00000


(2) Import everything into a specified directory (import the MySQL emp table into a specified HDFS directory):

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp \
--delete-target-dir \
--fields-terminated-by '\001' \
--table emp \
--m 1 

View the result:

hadoop fs -cat /emp/part-m-00000
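
Because '\001' (Ctrl-A, Hive's default field delimiter) is a non-printing character, the fields appear glued together; piping through cat -A renders each delimiter visibly as ^A:

hadoop fs -cat /emp/part-m-00000 | cat -A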

(3) Query import:

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp3 \
--fields-terminated-by '\001' \
--delete-target-dir \
--m 1 \
--query 'select id,name from emp where salary <= 30000 and $CONDITIONS'

Tip: the statement passed to --query must contain the $CONDITIONS token; if more than one mapper is used, --split-by must also be specified.
View the result:

hadoop fs -cat /emp3/part-m-00000

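A single mapper needs no split column, but a parallel query import must name one. A hedged sketch, assuming id is emp's primary key and using a hypothetical target directory /emp3_parallel:

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp3_parallel \
--delete-target-dir \
--split-by id \
-m 2 \
--query 'select id,name from emp where salary <= 30000 and $CONDITIONS'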

(4) Import specified columns:

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp1 \
--fields-terminated-by '\001' \
--delete-target-dir \
--table emp \
--m 1 \
--columns id,name

View the result:

hadoop fs -cat /emp1/part-m-00000

Tip: when --columns lists multiple columns, separate them with commas and do not add spaces after the commas.

(5) Filter the imported rows with the where argument:

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp2 \
--fields-terminated-by '\001' \
--delete-target-dir \
--table emp \
--m 1 \
--where "id = 1203"

View the result:

hadoop fs -cat /emp2/part-m-00000

4.2 Exporting data
In Sqoop, "export" means transferring data from a big-data cluster (HDFS, Hive, HBase) to a non-big-data cluster (an RDBMS); this uses the export keyword.
4.2.1 HIVE / HDFS to RDBMS
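
Sqoop export requires the target table to exist beforehand (see the tip after the command). A minimal sketch that creates a hypothetical two-column dept table, matching a tab-delimited dept.txt with a department number and a name:

mysql -uroot -p123456 -e "
USE test;
CREATE TABLE IF NOT EXISTS dept(
  deptno INT PRIMARY KEY,
  dname  VARCHAR(50)
);"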

./sqoop export \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table dept \
--export-dir /user/hive/warehouse/dept/dept.txt \
--m 1 \
--input-fields-terminated-by "\t"

Tip: if the target table does not exist in MySQL, Sqoop does not create it automatically; create it beforehand (as shown above).
View the result:
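
One quick way to confirm that the export landed (a minimal check):

mysql -uroot -p123456 -e "SELECT * FROM test.dept;"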


Origin blog.csdn.net/weixin_45553177/article/details/104277302