Installation and simple use of sqoop

One: Introduction to Sqoop
Sqoop is an open-source tool used mainly to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (for example MySQL, Oracle, Postgres, etc.) into Hadoop's HDFS, and it can also export HDFS data back into a relational database.

Here you need to understand two concepts: importing data and exporting data.
Import (import): transfer data from a non-big-data system (MySQL, SQL Server, ...) into the big data cluster (HDFS, Hive, HBase).
Export (export): transfer data from the big data cluster (HDFS, Hive, HBase) into a non-big-data system (an RDBMS).

Two: Sqoop installation
1. Unzip the tar package under /opt/software to /opt/module.
Here I am using version 1.4.6. Try not to download the 1.99 series: the official website clearly states that 1.99.7 is not compatible with 1.4.7, its features are incomplete, and it is not suitable for production deployment.

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/

2. Enter the /opt/module directory and rename the sqoop-1.4.6.bin__hadoop-2.0.4-alpha directory to sqoop.

mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/   sqoop

3. Rename sqoop-env-template.sh under /opt/module/sqoop/conf to sqoop-env.sh, then add the following lines to the end of the file, save, and exit (adjust the Hadoop, ZooKeeper, etc. paths to your actual installation directories).

mv sqoop-env-template.sh sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export ZOOCFGDIR=/opt/module/zookeeper-3.4.10
export HBASE_HOME=/opt/module/hbase

4. Copy the mysql-connector-java-5.1.27-bin.jar driver into Sqoop's lib directory. If you don't have this jar, download one from the Internet; basically any 5.x version will work.

[root@hadoop102 sqoop]# cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./lib/

The above is the whole installation process of Sqoop. Now let's verify that Sqoop is configured correctly by running bin/sqoop help:

[root@hadoop102 sqoop]# bin/sqoop help
Warning: /opt/module/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/07/30 19:14:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
[root@hadoop102 sqoop]#

The warnings above can be ignored. This output shows that Sqoop has been installed and configured successfully.

Next, test whether Sqoop can connect to the database. Use the MySQL password you set for your own installation; my host name is hadoop102.

bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password 123456

If the command prints the list of databases on the server, Sqoop can connect to the database successfully.

Three: Simple use of Sqoop
Before we start, create a database in MySQL, create a table in that database, and insert some sample rows.
mysql -uroot -p123456
mysql> create database company;
mysql> create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));
mysql> insert into company.staff (name, sex) values('Thomas','Male');
mysql> insert into company.staff(name, sex) values('Catalina','FeMale');
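
As an optional quick check, the sample rows can be listed before running any Sqoop command; assuming a fresh auto_increment counter, this should show Thomas/Male with id 1 and Catalina/FeMale with id 2:

mysql> select * from company.staff;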

1. Import MySQL data into HDFS (Hadoop needs to be started here)

1) Import the whole table (the trailing \ is a line-continuation character, because readability would be poor if everything were written on one line; same below)

Parameter interpretation:
--username: the MySQL user name for the connection
--password: the MySQL password
--table: the table whose data will be imported
--target-dir: the HDFS location the data is imported into; if omitted, the default save path is /user/{current user}/{table name}/
--delete-target-dir: delete the target directory on HDFS first if it already exists.

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"

After execution, we can find that a file named part-m-00000 has been generated in the /user/company directory. Downloading the file to the local machine shows that the fields are separated by '\t'.
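
Instead of downloading, the result can also be checked directly from the command line (a sketch; assumes the HDFS client is on the PATH):

hdfs dfs -ls /user/company
hdfs dfs -cat /user/company/part-m-00000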

2) Query import:
query: you can import the result of a SELECT statement, and a WHERE clause can be added to it.
$CONDITIONS is a placeholder that Sqoop replaces with its own split conditions; this keyword must be added to the WHERE clause. If the query is wrapped in double quotes, write it as \$CONDITIONS so the shell does not expand it.

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select name,sex from staff where id <=1 and $CONDITIONS;'

Download the generated file and you can see that the output is as expected.

3) Import specified columns

Note: if --columns lists multiple columns, separate them with commas and do not add spaces after the commas.

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns id,sex \
--table staff

4) Filter the data to import with the --where keyword

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table staff \
--where "id=1"

2. Import MySQL data into Hive
* The Hive table name is given directly on the command line; there is no need to create the table in Hive manually.
* The process has two steps: the first step imports the data into HDFS, and the second step moves the data from HDFS into the Hive warehouse. The default temporary directory used in the first step is /user/root/{table name}.

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive
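
To confirm the result, a quick query from the Hive CLI works (a sketch; assumes the table landed in the default database and the hive command is on the PATH):

hive -e "select * from staff_hive;"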

3. Import MySQL data into HBase
--hbase-create-table: if specified, create the missing HBase table. (It is worth noting that, due to version incompatibility between Sqoop and HBase, this option may fail to create the table by itself, so here we create the table in HBase manually first; see the command below.)
--split-by: the column used to split the table into units of work.
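
Create the target table in the HBase shell before running the import; the table name and column family must match the --hbase-table and --column-family options used below:

# run inside the HBase shell
create 'hbase_company','info'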

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--columns "id,name,sex" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_company" \
--num-mappers 1 \
--split-by id

Here, I scan the table in hbase and find that the data has been imported.
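
For reference, the check mentioned above can be done from the HBase shell:

# run inside the HBase shell
scan 'hbase_company'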

4. Export data (export). Note that exporting from HBase to MySQL is not supported; exporting from Hive or HDFS to MySQL is.
Note: the field separator must be specified here (--input-fields-terminated-by).

bin/sqoop export \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff1 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"

Note: if the target table does not exist in MySQL, it is not created automatically.
Here I want to export the data into the staff1 table in MySQL, but since staff1 has not been created yet, the command fails with an error. Creating the table first, as sketched below, fixes this.
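
A staff1 table with the same structure as staff can serve as the export target; a minimal sketch copied from the earlier staff definition:

mysql> create table company.staff1(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));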

After creating the table and executing the statement again, I can see that the data has been exported from Hive to MySQL.
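
As an optional check, the exported rows can be queried back in MySQL:

mysql> select * from company.staff1;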

Origin blog.csdn.net/weixin_44080445/article/details/107696096