2.7 Speeding up Sqoop data transfers

Translated study notes from the Apache Sqoop Cookbook (English edition).
More information: https://blue-shadow.top/

Problem

Sqoop is a powerful tool that can transfer large amounts of data, but how can you make Sqoop transfer it faster?

Solution

For some databases, Sqoop can take advantage of a direct connection to the database by using the --direct parameter:

sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--table cities \
--direct

Discussion

Rather than transferring data through the JDBC interface, direct mode delegates the transfer to native utilities shipped by the database vendor. For MySQL, mysqldump and mysqlimport are used to read data from and write data to the database. For PostgreSQL, Sqoop takes advantage of the pg_dump utility to import data. Using these native utilities can significantly improve performance, as they are optimized to provide the best possible transfer speed while putting less of a burden on the database server. There are several limitations to this faster import, however. First, not all databases have native utilities available, and direct mode is not offered for every database; currently, Sqoop supports direct mode only for MySQL and PostgreSQL.
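
For comparison, a direct-mode import against PostgreSQL looks almost identical to the MySQL example above; the host and database names below are placeholders:

sqoop import \
--connect jdbc:postgresql://postgresql.example.com/sqoop \
--username sqoop \
--table cities \
--direct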

Because all data transfer operations are performed inside the generated MapReduce job, and because direct mode delegates the actual transfer to the native utilities, you must make sure those utilities are available on all nodes where Hadoop TaskTrackers are running. For example, when using MySQL, every server that runs a TaskTracker needs to have both mysqldump and mysqlimport installed.
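
A quick way to confirm this is a small shell check run on each node; reporting the hostname in the message is just one possible convention:

# Verify that the native MySQL utilities are on this node's PATH
command -v mysqldump mysqlimport \
|| echo "native MySQL utilities missing on $(hostname)"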

Another limitation of direct mode is that not all parameters are supported. Because the native utilities usually produce text output, binary formats such as SequenceFile or Avro will not work. In addition, parameters that customize escape characters, type mappings, column and row delimiters, or the NULL substitution string may not be supported.
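
If you need one of these unsupported options, the usual workaround is to drop --direct and let the import run through the regular JDBC path. For example, a sketch of an Avro import that omits direct mode, reusing the connection details from the example above:

sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--table cities \
--as-avrodatafile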

