Background: Sqoop is mainly used to transfer data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and it can also export HDFS data back into a relational database. This article takes MySQL as the example to introduce import and export.
For details, refer to the official document: http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_purpose_4
1. First install the corresponding version of sqoop
2. Copy the MySQL JDBC driver jar to /home/sqoop/lib
3. Introduction
Import data into HDFS
1) Method one
sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--table table123 \
--target-dir /user/foo/joinresults \ # HDFS target path
--num-mappers 1 \ # number of map tasks
--as-parquetfile \ # optional: store the output as Parquet
--columns id,name # optional: import only these columns
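After the job finishes, each map task writes one output file under the target dir; with --num-mappers 1 there is a single part-m-00000, and sqoop's default text delimiter is a comma. A local simulation of what `hdfs dfs -cat /user/foo/joinresults/part-m-00000` would show (the rows here are made up):

```shell
# Simulated sqoop text output: one file per map task, comma-delimited by default.
# (Real output lives on HDFS: hdfs dfs -ls /user/foo/joinresults)
mkdir -p joinresults
printf '1,alice\n2,bob\n' > joinresults/part-m-00000
ls joinresults
cat joinresults/part-m-00000
```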
2) Method two: free-form SQL
sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--query 'select * from table123 WHERE $CONDITIONS' \ # the WHERE $CONDITIONS clause is required
--target-dir /user/foo/joinresults \ #hdfs path
--delete-target-dir \ #If the target directory already exists, delete
--compress \ #specify compression
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \ # use Snappy compression
--fields-terminated-by '|' # field separator
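With --fields-terminated-by '|', the files written to HDFS are pipe-delimited, and downstream consumers must split on that character. A local sketch (the file name follows sqoop's part-m-00000 convention; the data is made up):

```shell
# Simulated pipe-delimited import file.
cat > part-m-00000 <<'EOF'
1|alice
2|bob
EOF
# Extract the second field the way a downstream job might:
cut -d'|' -f2 part-m-00000   # prints: alice, bob
```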
3) Incremental import: besides filtering new rows with a WHERE condition in the SQL, sqoop provides dedicated command options:
sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--query 'select * from table123 WHERE $CONDITIONS' \ # the WHERE $CONDITIONS clause is required
--target-dir /user/foo/joinresults \ # HDFS path
#--delete-target-dir \ # delete the target dir if it exists (do not use together with --incremental append below)
--check-column id \ # which column to check for new rows, e.g. id
--incremental append \ # append mode; lastmodified is also available
--last-value 7636 # import rows whose id is greater than 7636
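One way to drive repeated incremental runs is to keep the last imported id in a state file and feed it to --last-value each time. The sketch below only echoes the sqoop command instead of executing it (a live cluster is needed for the real run); note that sqoop can also persist this state itself via saved jobs (`sqoop job --create`).

```shell
# State file holding the highest id imported so far (starts at 0 on first run).
STATE_FILE=last_value.txt
[ -f "$STATE_FILE" ] || echo 0 > "$STATE_FILE"
LAST=$(cat "$STATE_FILE")

# Echoed, not executed: the incremental import picks up rows with id > $LAST.
echo sqoop import \
  --connect jdbc:mysql://localhost/db \
  --username root --password 123456 \
  --table table123 \
  --target-dir /user/foo/joinresults \
  --check-column id \
  --incremental append \
  --last-value "$LAST"

# After a successful run, record the new maximum id for the next invocation:
echo 7636 > "$STATE_FILE"
```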
4) Fast data import (direct mode)
sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--query 'select * from table123 WHERE $CONDITIONS' \ # the WHERE $CONDITIONS clause is required
--target-dir /user/foo/joinresults \ # HDFS target path
--delete-target-dir \ # delete the target directory if it already exists
--direct # direct mode: streams data through MySQL's mysqldump tool instead of JDBC, which is much faster
5) Data export
sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--table table123 \
--export-dir /user/foo/joinresults # HDFS path of the data to export
6) Import into Hive and export from Hive
sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--query 'select * from table123 WHERE $CONDITIONS' \ # the WHERE $CONDITIONS clause is required
--fields-terminated-by '|' \
--hive-import \
--hive-database default \
--hive-table table123
Export Hive to MySQL:
sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--password 123456 \
--table table123 \
--export-dir /user/foo/joinresults \ # HDFS path where the Hive table's data is stored
--input-fields-terminated-by '|' # field separator used in the Hive data files
Note:
Before version 1.4.6, importing from a database into Hive fails if the Hive table's storage format is Parquet.
Script execution method
bin/sqoop --options-file /opt/script/sqoop_test.txt --table tablename (parameters can also be passed this way)
vi sqoop_test.txt
import
--connect
jdbc:mysql://localhost/db
--username
root
--password
123456
--table
table123
--target-dir
/user/foo/joinresults
--num-mappers
1
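An options file like the one above can also be generated from a script: one option (or the tool name) per line, with blank lines and lines starting with # treated as comments. A sketch that builds a shortened version of the file and echoes the invocation rather than running sqoop (paths are hypothetical):

```shell
# Options files take the tool name first, then one flag or value per line;
# '#' lines are comments.
cat > /tmp/sqoop_test.txt <<'EOF'
# connection settings
import
--connect
jdbc:mysql://localhost/db
--username
root
EOF
# Remaining options can still be passed on the command line:
echo bin/sqoop --options-file /tmp/sqoop_test.txt --table table123
```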
---------------------
Author: hh_666
Source: CSDN
Original: https://blog.csdn.net/qq_34485930/article/details/80868017
Copyright statement: this is the blogger's original article; please include a link to the original post when reposting.