Big Data Road week07 - day06 (Sqoop: a tool for transferring data between relational databases (Oracle, MySQL, PostgreSQL, etc.) and Hadoop)

To make the upcoming Hive material easier to follow, the first tool we learn in this part of the course is Sqoop. You will soon find that Sqoop is the easiest framework we cover in the whole big data stack.

Sqoop is a tool for transferring data between Hadoop and relational databases: it can import data from a relational database (e.g. MySQL, Oracle, PostgreSQL) into Hadoop's HDFS, and it can also export data from HDFS into a relational database.
It also provides connectors for some NoSQL databases.
Like other ETL tools, Sqoop uses a metadata model to determine data types and to ensure type-safe handling when data is moved from the data source into Hadoop.
Sqoop is designed for bulk transfer of large data volumes: it can split a data set into blocks and create a Hadoop task to process each block.
Despite these advantages, there are a few things to keep in mind when using Sqoop.
First, be careful with the default parallelism. Parallel import assumes by default that the data is distributed uniformly across the range of the partition key. This works well when the source system generates primary keys with a sequence generator: on a 10-node cluster, the workload is spread evenly across the 10 servers. However, if the split is based on an alphanumeric key and, for example, there are 20 times as many keys beginning with "A" as keys beginning with "M", the workload will be skewed from one server to another.
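As a rough sketch of how to control this (the connection string, table, and column names below are made up for illustration), you can set the number of parallel mappers with -m and pick a more evenly distributed numeric column with --split-by:

  # Hypothetical example: import with 4 mappers, splitting on a numeric column
  # instead of a skewed alphanumeric key (-P prompts for the password).
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table orders \
    --split-by order_id \
    --target-dir /user/hadoop/orders \
    -m 4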
If performance is what you care about most, look into direct loading. Direct loading bypasses the usual JDBC import and instead uses the loading tools that the database itself provides, such as MySQL's mysqldump.
However, there are database-specific limitations. For example, you cannot use the MySQL or PostgreSQL direct connectors to import BLOB and CLOB types. No direct driver supports importing from a view. The Oracle direct driver requires privileges to read metadata such as dba_objects and v_$parameter. Consult the documentation on direct-driver limitations for your database.
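A minimal sketch of a direct-mode import, using the same made-up connection details as above; the only change from an ordinary import is the --direct flag:

  # Hypothetical example: let MySQL's own dump tool do the transfer instead of JDBC.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table orders \
    --target-dir /user/hadoop/orders_direct \
    --direct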
Incremental imports are the efficiency topic that comes up most often, because Sqoop is designed specifically for large data sets. Sqoop supports incremental updates: it can append only the records added since the most recent import, or pick up records changed after a specified last-modified timestamp.
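Sketches of the two incremental modes; the check columns and last values are invented for illustration:

  # Hypothetical example: append mode, fetching only rows with a higher id
  # than the last value recorded by the previous run.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table orders \
    --target-dir /user/hadoop/orders \
    --incremental append \
    --check-column order_id \
    --last-value 100000

  # Hypothetical example: lastmodified mode, fetching rows changed after a timestamp
  # and merging them into the existing data by key.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table orders \
    --target-dir /user/hadoop/orders \
    --incremental lastmodified \
    --check-column last_update_ts \
    --last-value "2019-12-01 00:00:00" \
    --merge-key order_id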
Since Sqoop can move data into and out of relational databases, it is no surprise that it has dedicated support for Hive, the famous SQL-like data warehouse of the Hadoop ecosystem. The command "create-hive-table" can be used to import a table definition into Hive.
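For example (database and table names are assumptions), the table definition alone can be copied with create-hive-table, or the data can be imported and the Hive table created in one step with --hive-import:

  # Hypothetical example: create only the Hive table definition from the RDBMS schema.
  sqoop create-hive-table \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table orders \
    --hive-table orders

  # Hypothetical example: import the data and create the Hive table in one step.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table orders \
    --hive-import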
 

Versions (the two versions are completely incompatible; we use sqoop1):

  sqoop1:1.4.x

  sqoop2:1.99.x

 

Similar products

  DataX: Alibaba's top-level data exchange tool

 

Note that "import" and "export" here are relative to Hadoop!

 

 

 

 

Importing data into Hadoop's HDFS:
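A minimal import sketch, assuming a MySQL source; the JDBC URL, credentials, table name, and target directory are placeholders:

  # Hypothetical example: pull a MySQL table into an HDFS directory,
  # tab-delimited, replacing any previous output.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table customers \
    --target-dir /user/hadoop/customers \
    --fields-terminated-by '\t' \
    --delete-target-dir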

 

 

Exporting data from HDFS to a relational database:
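A minimal export sketch under the same assumptions; note that the target table must already exist in the relational database:

  # Hypothetical example: push an HDFS directory of comma-delimited files
  # back into an existing MySQL table.
  sqoop export \
    --connect jdbc:mysql://dbhost:3306/salesdb \
    --username sqoop_user -P \
    --table customer_summary \
    --export-dir /user/hadoop/output/customer_summary \
    --input-fields-terminated-by ',' \
    -m 2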

 

 


Origin www.cnblogs.com/wyh-study/p/12078226.html