Sqoop - a tool for importing and exporting data between Hadoop and relational databases

Reposted from: https://blog.csdn.net/qx12306/article/details/67014096

Sqoop is an open source tool mainly used to transfer data between Hadoop-related storage (HDFS, Hive, HBase) and traditional relational databases (MySQL, Oracle, etc.). Sqoop originally existed as a third-party module for Hadoop and later became an independent Apache project. In addition to relational databases, Sqoop also provides connectors for some NoSQL databases.

One, Sqoop basics

  The Sqoop project started in 2009 to carry out data import and export between Hadoop-related storage and traditional relational databases. Sqoop launches multiple MapReduce tasks to perform the import and export work in parallel, which improves efficiency.

Two, Sqoop installation

  This example installs sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz.

1, Upload the installation file to the /usr/local/ directory, extract it, and rename the resulting directory to sqoop.
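   A minimal sketch of this step, assuming the tarball has already been uploaded to /usr/local/ (the extracted directory name comes from the tarball):

   cd /usr/local
   tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
   mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop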

2, Configure the environment variables: run vi /etc/profile, add export SQOOP_HOME=/usr/local/sqoop, and append $SQOOP_HOME/bin to the exported PATH, then run source /etc/profile so that the changes take effect immediately.
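   For reference, the two additions to /etc/profile would look roughly like this (the path follows the install directory assumed in step 1):

   export SQOOP_HOME=/usr/local/sqoop
   export PATH=$PATH:$SQOOP_HOME/bin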

3, Copy the JDBC driver for the database you want to connect to into Sqoop's lib directory. This example connects to a MySQL database, so the driver is mysql-connector-java-5.1.34.jar. Note that with Hadoop 2 the MySQL driver apparently needs to be version 5.1.30 or later, otherwise errors are likely.
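   For example, assuming the driver jar was also uploaded to /usr/local/:

   cp /usr/local/mysql-connector-java-5.1.34.jar /usr/local/sqoop/lib/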

Three, Using Sqoop commands

   Refer to the official documentation: http://sqoop.apache.org/docs/1.4.5/index.html .

1, Importing MySQL data into Hadoop

   sqoop import --connect jdbc:mysql://192.168.137.1:3306/test --username root --password 123456 --table mytabs --fields-terminated-by '\t' -m 1 [--hive-import] --append --check-column 'id'  --incremental append --last-value 5 --where 'id>2 and id<5'

   Common parameters:

   ▶ --fields-terminated-by '\t' specifies the field separator used between the columns of a record when it is imported into Hadoop; the default separator is a comma. A tab (\t) is generally used here so that the data is not split incorrectly when it is later exported from HDFS back to the relational database;

   ▶ -m 1, short for --num-mappers, specifies the number of map tasks (by default several are started automatically); the MapReduce job Sqoop generates contains no reduce phase;

   ▶ --append indicates that data is imported into Hadoop in append mode; without it, repeated imports into the same target are not allowed;

   ▶ --check-column 'primary key column name' --incremental append --last-value 5 performs an incremental import: only records whose check column value is greater than --last-value are imported, otherwise the import is not performed;

   ▶ --hive-import indicates that the data should be imported into Hive;

   ▶ --where '', filters the rows to be imported;

   ▶ -e or --query 'select * from table where id > 5 and $CONDITIONS', imports data using a custom SQL statement. Notes on using a custom SQL statement: ① --table cannot be specified together with a custom SQL statement; ② the WHERE clause of the custom SQL statement must contain the string "$CONDITIONS", a variable Sqoop uses to divide the work among the map tasks; ③ if several map tasks are specified with -m while using a custom SQL statement (which may query multiple tables), the parameter --split-by "table.column" must be used to tell Sqoop which column to split the data on, for example --split-by users.id (see the example after this list);

   ▶ --target-dir explicitly specifies the HDFS location the data is imported into; the default path is /user/{current user}/{table name}/ with the table's data files underneath. If an existing target in HDFS needs to be deleted before the import, --delete-target-dir can be used;
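   A hedged example of the --query usage described above; the connection string and table follow the earlier import example, while the column, mapper count, and target directory are illustrative assumptions only:

   sqoop import --connect jdbc:mysql://192.168.137.1:3306/test --username root --password 123456 --query 'select * from mytabs where id > 5 and $CONDITIONS' --split-by mytabs.id -m 2 --target-dir /user/sqoop/mytabs --fields-terminated-by '\t'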

2, Exporting Hadoop data to MySQL

   sqoop export --connect jdbc:mysql://192.168.137.1:3306/test --username root --password 123456 --table ids --fields-terminated-by '\t' --export-dir '/ids'

   Here --export-dir '/ids' specifies the HDFS path of the files to be exported. Before running the export, you must make sure the table ids already exists in MySQL.
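   For instance, assuming the files under /ids contain a single integer column, the target table could be created beforehand like this (the schema is only an assumption for illustration):

   mysql -u root -p123456 -e "create table test.ids (id int);"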

3, Saving Sqoop commands as jobs for easy reuse

   sqoop job --create myjob -- import --connect jdbc:mysql://192.168.137.1:3306/test --username root --password 123456 --table mytabs --fields-terminated-by '\t'

   Here myjob is the name of the job. By default the password is not stored with the job, so you will be asked to enter it each time the job is invoked. To store the password directly so that the job can later be run without entering a password, uncomment the sqoop.metastore.client.record.password property in conf/sqoop-site.xml.
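   Once uncommented, the property in conf/sqoop-site.xml looks roughly like this (it ships commented out in the sqoop-site.xml template):

   <property>
     <name>sqoop.metastore.client.record.password</name>
     <value>true</value>
   </property>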

   Other job-related commands: ① sqoop job --list, lists the saved jobs; ② sqoop job --delete myjob, deletes the job.
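   To actually run a saved job, use the exec form of the command (myjob is the job name created above):

   sqoop job --exec myjob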
