Hadoop Series, Part 19 -- Sqoop Knowledge Summary

1. Overview

Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.

Importing data: bringing data from relational databases such as MySQL and Oracle into Hadoop storage systems such as HDFS, Hive, and HBase. In other words, Sqoop moves data back and forth between Hadoop storage systems and relational databases;

Exporting data: exporting data from the Hadoop file system into a relational database such as MySQL.

2. Working Mechanism

Sqoop translates the import or export command into a MapReduce program.

In the translated MapReduce program, the customization lies mainly in the InputFormat and OutputFormat.
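You can see the other half of this machinery with Sqoop's codegen tool, which generates the Java record class that the MapReduce job uses to parse and serialize table rows, without running an import. A minimal sketch, reusing the userdb connection details from the examples below:

# Generate (but do not run) the Java class Sqoop's MapReduce job
# would use to parse and serialize rows of the emp table
bin/sqoop codegen \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--table emp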

3. Using Sqoop

Importing data

An import copies a single table from an RDBMS to HDFS. Each row of the table becomes one record in HDFS. All records are stored as text files, or as binary files such as Avro or SequenceFile.
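The on-disk format is chosen with an --as-* flag on the import command. A minimal sketch (the /sqooptest_seq directory name is just an illustration):

# Store the imported records as binary SequenceFiles instead of plain text
bin/sqoop import \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--table emp \
--target-dir /sqooptest_seq \
--as-sequencefile \
--m 1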

Importing a MySQL table into HDFS

Code

# Import a MySQL table into HDFS
bin/sqoop import \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--target-dir /sqooptest \
--fields-terminated-by ',' \
--table emp \
--split-by id \
--m 2

Where:
--connect                  the JDBC URL of the source MySQL database
--target-dir               the destination directory in HDFS
--fields-terminated-by     the field delimiter for the generated files
--table                    the name of the table to import
--split-by                 the column used to split rows across map tasks (needed when more than one map task is used)
--m                        the number of map tasks

Note:
If you set --m 1, the import runs with a single map task;
if you do not, Sqoop starts four map tasks by default, and you must specify a column (--split-by) by which the rows are divided among the map tasks.
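Once the job completes, each map task produces one output file in the target directory. A quick way to inspect the result (assuming the /sqooptest directory from the example above):

# List the per-map-task output files (part-m-00000, part-m-00001, ...)
hdfs dfs -ls /sqooptest
# Print the imported, comma-delimited records
hdfs dfs -cat /sqooptest/part-m-*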

Importing a MySQL table into Hive

Code

# --hive-import loads the data into a Hive table instead of a plain HDFS directory
bin/sqoop import \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--hive-import \
--fields-terminated-by ',' \
--table emp \
--split-by id \
--m 2
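With --hive-import, Sqoop stages the data on HDFS and then loads it into a Hive table named after the source table. A quick sanity check from the command line (a sketch, assuming the table landed in Hive's default database):

# Confirm the table exists and peek at a few imported rows
hive -e "SHOW TABLES LIKE 'emp'; SELECT * FROM emp LIMIT 5;"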

Importing a subset of table data

The Sqoop import tool can import a subset of a table using a "where" clause. Sqoop runs the corresponding SQL query on the database server and stores the result in the target directory on HDFS.

The where clause syntax is as follows:

--where <condition>

The following command imports a subset of the emp_add table: the employee IDs and addresses of the employees whose city of residence is 'sec-bad':

# --where declares the condition that selects the subset
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--where "city ='sec-bad'" \
--target-dir /wherequery \
--table emp_add \
--m 1
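When the subset needs specific columns or joins that --where cannot express, the import can also use a free-form query. A minimal sketch (the /wherequery2 directory name is illustrative; the literal $CONDITIONS token is required so Sqoop can inject its split predicates):

# Import only the id and city columns of the matching rows
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--query 'SELECT id, city FROM emp_add WHERE city = "sec-bad" AND $CONDITIONS' \
--target-dir /wherequery2 \
--m 1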

Importing a MySQL table's incremental (newly added) data into HDFS

Incremental import is a technique that imports only the newly added rows of a table.
Sqoop supports two modes for incrementally importing MySQL data into Hive:

  • append, which tracks a monotonically increasing column, for example:
--incremental append --check-column num_id --last-value 0
  • lastmodified, which is based on a timestamp column, for example:
--incremental lastmodified --check-column created --last-value '2018-02-01 11:00:00'

The second example imports only the rows whose created value is later than '2018-02-01 11:00:00'.

append mode

To perform an incremental import, you must add the --incremental, --check-column, and --last-value options.

The syntax of the Sqoop incremental import options is as follows:

--incremental <mode>                        the import mode: append or lastmodified
--check-column <column name>                the column inspected to find new rows (e.g., an incrementing id)
--last-value <last check column value>      the largest value already imported; only rows beyond it are fetched

Code

# Incremental import: fetch only the rows whose id is greater than 1205
bin/sqoop import \
--connect jdbc:mysql://hdp-04:3306/userdb \
--target-dir /sqooptest \
--username root \
--password root \
--table emp \
--m 1 \
--incremental append \
--check-column id \
--last-value 1205
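Tracking --last-value by hand is error-prone. Sqoop's saved-job facility records the last imported value in its metastore after every run; a sketch (the job name emp_append_job is illustrative):

# Create a saved job; note the space between "--" and "import"
bin/sqoop job --create emp_append_job \
-- import \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--table emp \
--target-dir /sqooptest \
--incremental append \
--check-column id \
--last-value 1205 \
--m 1

# Each execution imports only new rows and updates the stored last-value
bin/sqoop job --exec emp_append_job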

4. Exporting Data with Sqoop

4.1 Exporting data from HDFS files to an RDBMS

Before exporting, the target table must already exist in the target database.
The default operation inserts the file data into the table with INSERT statements.
In update mode, Sqoop instead generates UPDATE statements to update existing rows.
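Since the target table must exist beforehand, create it first; a sketch using the mysql client (the column definitions are illustrative and must match the exported data):

# Create the target table before running the export (columns are illustrative)
mysql -h hdp-04 -u root -p userdb \
-e "CREATE TABLE emp (id INT, name VARCHAR(50), deg VARCHAR(50), salary INT, dept VARCHAR(10))"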

Exporting HDFS data files to MySQL

bin/sqoop export \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--input-fields-terminated-by ',' \
--table emp \
--export-dir /sqooptest/

Where:
--connect                      the JDBC URL of the target MySQL database
--input-fields-terminated-by   the field delimiter used in the HDFS files
--table                        the name of the target table
--export-dir                   the HDFS directory holding the data to export
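To use the update mode mentioned above instead of plain INSERTs, add --update-key; a minimal sketch (assuming id is the primary key of the target table):

# Generate UPDATE statements keyed on id; with allowinsert,
# rows that match no existing id are inserted as new records
bin/sqoop export \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--input-fields-terminated-by ',' \
--table emp \
--export-dir /sqooptest/ \
--update-key id \
--update-mode allowinsert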

Exporting Hive table data (HDFS files) to MySQL

bin/sqoop export \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--input-fields-terminated-by ',' \
--table t_from_hive \
--export-dir /user/hive/warehouse/t_a/
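One caveat: the ',' delimiter above only works if the Hive table was created with that field terminator. A Hive table created with default settings separates fields with the Ctrl-A (\001) character, in which case the export would look like this sketch:

# Export a Hive table that uses Hive's default ^A (\001) field delimiter
bin/sqoop export \
--connect jdbc:mysql://hdp-04:3306/userdb \
--username root \
--password root \
--input-fields-terminated-by '\001' \
--table t_from_hive \
--export-dir /user/hive/warehouse/t_a/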