Steps for importing data from MySQL into Hive with Sqoop, and problems you may run into

1. Sqoop commands for connecting to MySQL

1. List all databases on the MySQL server

sqoop list-databases --connect jdbc:mysql://hadoop101:3306?useSSL=false --username root --password 123456

2. Connect to MySQL and list the tables in a database

sqoop list-tables --connect jdbc:mysql://hadoop101:3306/test?useSSL=false --username root --password 123456

3. Copy the structure of the MySQL table first (in the student database) into Hive, naming the Hive table first

sqoop create-hive-table --connect jdbc:mysql://hadoop101:3306/student?useSSL=false --username root --password 123456 --table first --hive-table first

4. Import data from a MySQL table into Hive

1. Append data

sqoop import --connect jdbc:mysql://hadoop101:3306/student?useSSL=false --username root --password 123456 --table first --hive-import --hive-table first

2. Overwrite data

sqoop import --connect jdbc:mysql://hadoop101:3306/student?useSSL=false --username root --password 123456 --table first --hive-import --hive-overwrite --hive-table first

5. Export data from a Hive table into MySQL

sqoop export --connect jdbc:mysql://hadoop101:3306/student?useSSL=false --username root --password 123456 --table first --export-dir /user/hive/warehouse/test.db/mysql_t1
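One detail worth spelling out for exports: Hive text tables store fields separated by the '\001' (Ctrl-A) character by default, so sqoop export usually needs to be told how to parse the files it reads from HDFS. A minimal sketch reusing the host and paths above, assuming the Hive table was created with the default delimiter:

```shell
# Hive text tables default to the '\001' (Ctrl-A) field delimiter;
# tell sqoop export how to split each line it reads from the warehouse dir.
sqoop export \
  --connect "jdbc:mysql://hadoop101:3306/student?useSSL=false" \
  --username root --password 123456 \
  --table first \
  --export-dir /user/hive/warehouse/test.db/mysql_t1 \
  --input-fields-terminated-by '\001'
```

If the Hive table was created with an explicit delimiter (for example `FIELDS TERMINATED BY '\t'`), pass that character instead.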

Reference: https://wxy0327.blog.csdn.net/article/details/50921702?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.control

2. A problem that can come up when moving data with Sqoop:

21/01/13 04:31:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
21/01/13 04:31:05 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/01/13 04:31:05 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
21/01/13 04:31:05 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
21/01/13 04:31:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/01/13 04:31:06 INFO tool.CodeGenTool: Beginning code generation
21/01/13 04:31:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `first` AS t LIMIT 1
21/01/13 04:31:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `first` AS t LIMIT 1
21/01/13 04:31:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/module/hadoop-2.7.2
Note: /tmp/sqoop-root/compile/728c44b28d5a73198d56dcd206fb2184/first.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
21/01/13 04:31:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/728c44b28d5a73198d56dcd206fb2184/first.jar
21/01/13 04:31:10 WARN manager.MySQLManager: It looks like you are importing from mysql.
21/01/13 04:31:10 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
21/01/13 04:31:10 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
21/01/13 04:31:10 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
21/01/13 04:31:10 ERROR tool.ImportTool: Error during import: No primary key could be found for table first. Please specify one with --split-by or perform a sequential import with '-m 1'.

Analysis: the import fails because the MySQL table being imported has no primary key, so one fix is simply to add a primary key to that table (the error message itself also suggests --split-by or a sequential import with -m 1).

1. Solution

alter table student.first add primary key (stu_no);

Explanation: the column inside primary key(stu_no) is the field that becomes the primary key; choose whichever column fits your own schema.
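If you cannot (or would rather not) alter the source table, the error message above already names two alternatives: force a single sequential mapper with -m 1, or name a split column with --split-by. A sketch of both, assuming the stu_no column from the ALTER statement exists and is roughly evenly distributed:

```shell
# Option A: sequential import with one mapper; no split column needed,
# but the transfer runs without parallelism.
sqoop import \
  --connect "jdbc:mysql://hadoop101:3306/student?useSSL=false" \
  --username root --password 123456 \
  --table first \
  --hive-import --hive-table first \
  -m 1

# Option B: keep parallel mappers by splitting on an existing column.
sqoop import \
  --connect "jdbc:mysql://hadoop101:3306/student?useSSL=false" \
  --username root --password 123456 \
  --table first \
  --hive-import --hive-table first \
  --split-by stu_no
```

Option B works best when the split column is numeric and evenly distributed; a skewed column leaves some mappers doing most of the work.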

3. How Sqoop executes

The run output shows that Sqoop ultimately turns each job into a MapReduce job, which is why Sqoop scripts run relatively slowly.
I suspect there will be a way to replace the MapReduce underneath Sqoop with another execution engine, much as Hive can swap MapReduce for Spark.


Origin blog.csdn.net/weixin_48929324/article/details/112759730