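Importing MySQL into Hive 3.1.0 with Kettle 8.2.0

Getting MySQL data into Hive 3.1.0 took me two days and nearly drove me to tears.

The main problems: the way the latest Hive works has changed somewhat, and writing into Hive directly is very inefficient. The only practical route is therefore to export from MySQL to HDFS first and then load the file into Hive.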

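First click New > Transformation to create a .ktr file.

1 Create a Table input step and a Hadoop file output step.

2 Configure the Hadoop file output step.

Click Test to check whether the configuration is correct.

If the test fails, the usual causes are:

1: core-site.xml, mapred-site.xml, yarn-site.xml, hive-site.xml or hdfs-site.xml is missing. Get the matching files from the big data platform, then replace the ones in pdi-ce-8.2.0.0-342\data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations\hdp30.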
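The first failure cause can be checked before clicking Test. The sketch below is only an illustration: the function name `missing_site_files` is made up here, and it only verifies that the five files exist in the shim directory, not that their contents match the cluster.

```python
from pathlib import Path

# The five client configuration files named in the text above.
REQUIRED_SITE_FILES = (
    "core-site.xml",
    "mapred-site.xml",
    "yarn-site.xml",
    "hive-site.xml",
    "hdfs-site.xml",
)


def missing_site_files(shim_dir):
    """Return the required *-site.xml names that are absent from shim_dir."""
    shim = Path(shim_dir)
    return [name for name in REQUIRED_SITE_FILES if not (shim / name).is_file()]
```

Called with the hdp30 directory above, it should return an empty list once all five files have been copied over.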

2: The hosts file on the Windows machine has no mappings for the cluster hostnames. The file is at:

C:\Windows\System32\drivers\etc\hosts
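To see which cluster hostnames are still unmapped, the hosts file can be parsed with a few lines of code. This is a rough sketch: `unmapped_hosts` is a hypothetical helper, and the hostnames you pass in are whatever your *-site.xml files reference (master and slave1 appear in the logs later in this post).

```python
def unmapped_hosts(hosts_text, required):
    """Return the names in `required` that no hosts-file line maps to an address."""
    mapped = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split "address name [alias ...]".
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2:
            mapped.update(name.lower() for name in entry[1:])
    return [host for host in required if host.lower() not in mapped]
```

Read the hosts file as text, pass it in together with the hostnames, and add a line for each name that comes back.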

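3 Configure the content of Hadoop file output.

For the fields, first click Get Fields, then Minimal width. Important: fields of type int need their format set to #, and date/time fields need the standard date-time format selected.

When everything is configured, click Run. Then reopen Hadoop file output, click Browse, go up one level, and check whether the txt file was generated.

Next, in a terminal, switch to the hdfs or hive user: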

su hdfs

 

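Once the file shows up in HDFS, the table has been exported and stored in txt format.

4 Create a new job.

For the Transformation entry, select the .ktr file created above.

5 Load the data with a SQL entry

The configuration is as follows:

Create the target table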

CREATE TABLE t_housing_management (uuid string,user_uuid string,open_city_uuid string,building_uuid string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE;
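Since this table expects four ';'-separated columns, it can help to sanity-check the exported file before loading it. The sketch below assumes the file has been copied to the local machine, has no header row, and that no value contains a ';' itself; `bad_lines` is a hypothetical helper name.

```python
def bad_lines(lines, expected_fields=4, delimiter=";"):
    """Return (line_number, field_count) for rows whose field count is wrong."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Strip the line ending only, so trailing empty fields still count.
        count = len(line.rstrip("\r\n").split(delimiter))
        if count != expected_fields:
            problems.append((number, count))
    return problems
```

Passing an open file object works, because iterating over it yields one line at a time. An empty result means every row has the column count the table definition expects.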

 

Major pitfalls to watch out for:

A) The table was created with the wrong file format:

0: jdbc:hive2://master:2181,slave1:2181,slave> load data inpath '/tmp/data/t_housing_management.txt' overwrite into table t_housing_management;
Error: Error while compiling statement: FAILED: SemanticException Unable to load data to destination table. Error: The file that you are trying to load does not match the file format of the destination table. (state=42000,code=40000)
0: jdbc:hive2://master:2181,slave1:2181,slave> drop table t_housing_management;
INFO  : Compiling command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035): drop table t_housing_management
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035); Time taken: 0.073 seconds
INFO  : Executing command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035): drop table t_housing_management
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035); Time taken: 0.234 seconds
INFO  : OK
No rows affected (0.355 seconds)

The table must be created with ';' as the field terminator and stored as a text file.

You can check the table definition from the command line:

show create table t_housing_management;

B) Insufficient permissions for the load

jdbc:hive2://master:2181,slave1:2181,slave> LOAD DATA INPATH '/tmp/data/t_housing_management.txt' OVERWRITE INTO TABLE t_housing_management;
INFO  : Compiling command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c): LOAD DATA INPATH '/tmp/data/t_housing_management.txt' OVERWRITE INTO TABLE t_housing_management
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c); Time taken: 0.084 seconds
INFO  : Executing command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c): LOAD DATA INPATH '/tmp/data/t_housing_management.txt' OVERWRITE INTO TABLE t_housing_management
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table userdb.t_housing_management from hdfs://master:8020/tmp/data/t_housing_management.txt
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: Access denied: Unable to move source hdfs://master:8020/tmp/data/t_housing_management.txt to destination hdfs://master:8020/warehouse/tablespace/managed/hive/userdb.db/t_housing_management/base_0000001: Permission denied: user=hive, access=WRITE, inode="/tmp/data":admin:hdfs:drwxr-xr-x
INFO  : Completed executing command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c); Time taken: 0.501 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: Access denied: Unable to move source hdfs://master:8020/tmp/data/t_housing_management.txt to destination hdfs://master:8020/warehouse/tablespace/managed/hive/userdb.db/t_housing_management/base_0000001: Permission denied: user=hive, access=WRITE, inode="/tmp/data":admin:hdfs:drwxr-xr-x (state=08S01,code=1)

su hdfs

hdfs dfs -chmod -R 777 /tmp/data/
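The error message carries everything needed to see why the load failed: the inode is `/tmp/data`, owned by admin, group hdfs, mode `drwxr-xr-x`, and the user is hive. LOAD DATA INPATH moves the file, which needs WRITE on the source directory. The following is a simplified sketch of the owner/group/other check, not HDFS's actual implementation: it ignores superusers and ACLs, `can_write` is a made-up name, and it assumes hive is not a member of the hdfs group.

```python
def can_write(mode, owner, group, user, user_groups=()):
    """Simplified owner/group/other check on a mode string such as 'drwxr-xr-x'."""
    perms = mode[1:]  # drop the leading file-type character, e.g. "d"
    if user == owner:
        triplet = perms[0:3]
    elif group in user_groups:
        triplet = perms[3:6]
    else:
        triplet = perms[6:9]
    return triplet[1] == "w"
```

With the values from the log, hive falls into the "other" class, whose bits are `r-x`, so the write is refused. After the chmod to 777 the same check passes; a narrower change that only gives hive write access to the directory would also satisfy it.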

Reposted from blog.csdn.net/weixin_42575806/article/details/115730826