Sqoop use the data migration tool

Author of the article: foochane 

Original link: https://foochane.cn/article/2019063001.html

Use sqoop Sqoop data migration tool briefly sqoop data to HDFS / HIVE sqoop data to MySQL

1 sqoop brief

sqoop is apache's a "transfer data between Hadoop and relational database server" tool. For importing and exporting data.

sqoop

  • Importing data: MySQL, Oracle import data into the Hadoop HDFS, HIVE, HBASE other data storage system;
  • Export data: export data from Hadoop file system into a relational database such as mysql

sqoop working mechanism is to import or export commands translated into mapreduce program is implemented, in translated in mapreduce mainly on inputformat and outputformat customization.

2 sqoop installation

Before installing sqoop first installed java environment and hadoop environment.

sqoop just a tool, can be mounted on that node, as long as hadoop java environment and the environment, and can be connected to the corresponding database can.

2.1 Download and unzip

Download: http: //mirror.bit.edu.cn/apache/sqoop/
Download: sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

To extract the installation directory

2.2 modify the configuration file

The sqoop-env-template.shcopy rename the sqoop-env.shfile, sqoop-env.shadd the following files:

export HADOOP_COMMON_HOME=/usr/local/bigdata/hadoop-2.7.1
export HADOOP_MAPRED_HOME=/usr/local/bigdata/hadoop-2.7.1
export HIVE_HOME=/usr/local/bigdata/hive-2.3.5

2.3 installation of mysql jdbc start

The mysql-connector-java-5.1.45.jar into the lib directory of sqoop.

$ sudo apt-get install libmysql-java #之前已经装过了
$ ln -s /usr/share/java/mysql-connector-java-5.1.45.jar /usr/local/bigdata/sqoop-1.4.7/lib

You can also copy mysql-connector-java-5.1.45.jar own hand.

2.4 verify sqoop

View sqoop version

$ bin/sqoop-version
Warning: /usr/local/bigdata/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/bigdata/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/06/30 03:03:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 20

There will be several warnings, but hold that thought.

Sqoop to verify connectivity between the service database mysql:

$ bin/sqoop-list-databases --connect jdbc:mysql://Master:3306 --username hiveuser --password 123456
$ bin/sqoop-list-tables --connect jdbc:mysql://Master:3306/metastore --username hiveuser --password 123456

3 sqoop Data Import

3.1 MySql guide data to the HDFS

First in mysql, insert the test data to build the table;

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `emp`
-- ----------------------------
DROP TABLE IF EXISTS `emp`;
CREATE TABLE `emp` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL,
  `deg` varchar(100) DEFAULT NULL,
  `salary` int(11) DEFAULT NULL,
  `dept` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- ----------------------------
-- Records of emp
-- ----------------------------
INSERT INTO `emp` VALUES ('1201', 'gopal', 'manager', '50000', 'TP');
INSERT INTO `emp` VALUES ('1202', 'manisha', 'Proof reader', '50000', 'TP');
INSERT INTO `emp` VALUES ('1203', 'khalil', 'php dev', '30000', 'AC');
INSERT INTO `emp` VALUES ('1204', 'prasanth', 'php dev', '30000', 'AC');
INSERT INTO `emp` VALUES ('1205', 'kranthi', 'admin', '20000', 'TP');

-- ----------------------------
-- Table structure for `emp_add`
-- ----------------------------
DROP TABLE IF EXISTS `emp_add`;
CREATE TABLE `emp_add` (
  `id` int(11) DEFAULT NULL,
  `hno` varchar(100) DEFAULT NULL,
  `street` varchar(100) DEFAULT NULL,
  `city` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- ----------------------------
-- Records of emp_add
-- ----------------------------
INSERT INTO `emp_add` VALUES ('1201', '288A', 'vgiri', 'jublee');
INSERT INTO `emp_add` VALUES ('1202', '108I', 'aoc', 'sec-bad');
INSERT INTO `emp_add` VALUES ('1203', '144Z', 'pgutta', 'hyd');
INSERT INTO `emp_add` VALUES ('1204', '78B', 'old city', 'sec-bad');
INSERT INTO `emp_add` VALUES ('1205', '720X', 'hitec', 'sec-bad');

-- ----------------------------
-- Table structure for `emp_conn`
-- ----------------------------
DROP TABLE IF EXISTS `emp_conn`;
CREATE TABLE `emp_conn` (
  `id` int(100) DEFAULT NULL,
  `phno` varchar(100) DEFAULT NULL,
  `email` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- ----------------------------
-- Records of emp_conn
-- ----------------------------
INSERT INTO `emp_conn` VALUES ('1201', '2356742', '[email protected]');
INSERT INTO `emp_conn` VALUES ('1202', '1661663', '[email protected]');
INSERT INTO `emp_conn` VALUES ('1203', '8887776', '[email protected]');
INSERT INTO `emp_conn` VALUES ('1204', '9988774', '[email protected]');
INSERT INTO `emp_conn` VALUES ('1205', '1231231', '[email protected]');

Import:

bin/sqoop import   \
--connect jdbc:mysql://Master:3306/test   \
--username root  \
--password root   \
--target-dir /sqooptest \
--fields-terminated-by ',' \
--table emp   \
--m 2 \
--split-by id;
  • --connect: Specify the database
  • --username: Specifies the user name
  • --password: Specifies the password
  • --table: Table to be imported
  • --target-dir: directory specified hdfs
  • --fields-terminated-by: the specified file delimiters
  • --m: Specifies the number of MapTask, if more than one, must specify split-by parameter 2 is designated as such, it would be the final document produced two
  • --split-by: fragment designated field

If the data table is not large do not specify settings for --mparameters of

Hdfs start running before attention and yarn imported before, and submit the yarn, instead of running locally.

Example:

$ bin/sqoop import   --connect jdbc:mysql://Master:3306/test   --username hadoop  --password 123456   --target-dir /sqooptest --fields-terminated-by ',' --table emp  --m 1  --split-by id;
Warning: /usr/local/bigdata/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/bigdata/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/06/30 05:00:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/06/30 05:00:43 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/06/30 05:00:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/06/30 05:00:44 INFO tool.CodeGenTool: Beginning code generation
Sun Jun 30 05:00:45 UTC 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/06/30 05:00:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/06/30 05:00:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/06/30 05:00:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/bigdata/hadoop-2.7.1
Note: /tmp/sqoop-hadoop/compile/cd17c36add75dfe67edd3facf7538def/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/06/30 05:00:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cd17c36add75dfe67edd3facf7538def/emp.jar
19/06/30 05:00:56 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/06/30 05:00:56 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/06/30 05:00:56 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/06/30 05:00:56 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/06/30 05:00:56 INFO mapreduce.ImportJobBase: Beginning import of emp
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hbase-2.0.5/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/06/30 05:00:58 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/06/30 05:01:06 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/06/30 05:01:07 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.233.200:8032
Sun Jun 30 05:01:55 UTC 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/06/30 05:01:56 INFO db.DBInputFormat: Using read commited transaction isolation
19/06/30 05:01:56 INFO mapreduce.JobSubmitter: number of splits:1
19/06/30 05:01:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1561868076549_0002
19/06/30 05:02:05 INFO impl.YarnClientImpl: Submitted application application_1561868076549_0002
19/06/30 05:02:06 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1561868076549_0002/
19/06/30 05:02:06 INFO mapreduce.Job: Running job: job_1561868076549_0002
19/06/30 05:02:47 INFO mapreduce.Job: Job job_1561868076549_0002 running in uber mode : false
19/06/30 05:02:48 INFO mapreduce.Job:  map 0% reduce 0%
19/06/30 05:03:35 INFO mapreduce.Job:  map 100% reduce 0%
19/06/30 05:03:36 INFO mapreduce.Job: Job job_1561868076549_0002 completed successfully
19/06/30 05:03:37 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=135030
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=151
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=42476
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=42476
                Total vcore-seconds taken by all map tasks=42476
                Total megabyte-seconds taken by all map tasks=43495424
        Map-Reduce Framework
                Map input records=5
                Map output records=5
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=250
                CPU time spent (ms)=2700
                Physical memory (bytes) snapshot=108883968
                Virtual memory (bytes) snapshot=1934733312
                Total committed heap usage (bytes)=18415616
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=151
19/06/30 05:03:37 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 150.6495 seconds (1.0023 bytes/sec)
19/06/30 05:03:37 INFO mapreduce.ImportJobBase: Retrieved 5 records.

Check whether the import was successful:

$ hdfs dfs -cat /sqooptest/part-m-*
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP

3.2 MySql guide to data from Hive

command:

bin/sqoop import \
--connect jdbc:mysql://Master:3306/test \
--username hadoop  \
--password 123456  \
--table emp  \
--hive-import  \
--split-by id  \
--m 1;

Imported into the hive, you need to add --hive-importparameters, without specifying --target-dirother parameters like into the hdfs.

Import table data subset 3.3

Sometimes we do not need to import all the data in the data table, sqoop also support some data into the data table.

It is possible to use Sqoop a wherestatement. whereA subset of the clause. It performs in its own database server the appropriate SQL query, and stores the result in the destination directory of HDFS.

where clause syntax is as follows:

--where <condition>

The following command to import a subset emp_add table data. City to: sec-bad

bin/sqoop import \
--connect jdbc:mysql://Master:3306/test \
--username hadoop \
--password 123456 \
--where "city ='sec-bad'" \
--target-dir /wherequery \
--table emp_add \
 --m 1

Alternatively, you can use the select statement:

bin/sqoop import \
--connect jdbc:mysql://Master:3306/test \
--username hadoop \
--password 123456 \
--target-dir /wherequery2 \
--query 'select id,name,deg from emp WHERE id>1207 and $CONDITIONS' \
--split-by id \
--fields-terminated-by '\t' \
--m 2

3.4 Incremental import

Incremental import only import technology rows in the table newly added.

sqoop supports two delta MySql introduced into the hive modes, one append, i.e., by incrementing a specified column. The other kind is based on the time stamp.

3.4.1 append

Specify the following parameters:

--incremental append  
--check-column num_id 
--last-value 0 

--check-column indicates the specified incrementing field, - specified time to the position of the last-value

Such as:

bin/sqoop import \
--connect jdbc:mysql://Master:3306/test \
--username hadoop \
--password 123456 \
--table emp --m 1 \
--incremental append \
--check-column id \
--last-value 1208

3.4.2 According to the time stamp

Command to add the following parameters:

--incremental lastmodified 
--check-column created 
--last-value '2012-02-01 11:0:00' 

It is the only import createdthan 2012-02-01 11:0:00larger data.

4 Sqoop Data Export

The data is derived from the HDFS file to the RDBMS database, export before the target table must exist in the target database. The default action is inserted into the table from the data file using the INSERT statement. Update mode is generated UPDATE statement to update the table data
syntax

$ sqoop export (generic-args) (export-args) 

The import process

1, you first need to manually create the target table mysql

mysql> USE db;
mysql> CREATE TABLE employee ( 
   id INT NOT NULL PRIMARY KEY, 
   name VARCHAR(20), 
   deg VARCHAR(20),
   salary INT,
   dept VARCHAR(10));

2, the export command

bin/sqoop export \
--connect jdbc:mysql://Master:3306/test \
--username root \
--password root \
--table employee \
--export-dir /user/hadoop/emp/

3. Verify that the table mysql command line.

mysql>select * from employee;

If a given data storage success, then you can find the data in the employee table below.

+------+--------------+-------------+-------------------+--------+
| Id   | Name         | Designation | Salary            | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | gopal        | manager     | 50000             | TP     |
| 1202 | manisha      | preader     | 50000             | TP     |
| 1203 | kalil        | php dev     | 30000               | AC     |
| 1204 | prasanth     | php dev     | 30000             | AC     |
| 1205 | kranthi      | admin       | 20000             | TP     |
| 1206 | satish p     | grp des     | 20000             | GR     |
+------+--------------+-------------+-------------------+--------+

Guess you like

Origin www.cnblogs.com/foochane/p/11110583.html