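Importing MySQL data into Hive 3.1.0 took me two days and nearly drove me to tears.
There were two main problems: Hive behaves somewhat differently in the latest version, and writing into Hive directly is very slow. The only practical route was to export MySQL to HDFS first and then load the file into Hive.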
First, click New transformation to create a ktr file.
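1 Create a Table input step and a Hadoop file output step, as shown below:
2 Configure the Hadoop file output step, as shown in the figure below
Click Test to check whether the configuration is correct.
If the test fails, the usual causes are: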
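1: core-site.xml, mapred-site.xml, yarn-site.xml, hive-site.xml or hdfs-site.xml is missing. Get the corresponding files from the big data platform, then go to pdi-ce-8.2.0.0-342\data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations\hdp30 and replace the files there
2: The hosts file on the Windows machine has no mapping for the cluster hostnames
C:\Windows\System32\drivers\etc\hosts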
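Both causes can be checked from a shell before going back to the Test button. The following is only a minimal sketch, assuming a POSIX-style shell is available on the Windows machine (for example Git Bash). The function names are hypothetical, and the hostnames master and slave1 in the usage comment are taken from the JDBC prompt shown later in this post.

```shell
#!/usr/bin/env bash
# Sketch (assumption: run from Git Bash or a similar shell on the PDI machine).

REQUIRED_SITE_FILES="core-site.xml mapred-site.xml yarn-site.xml hive-site.xml hdfs-site.xml"

# Print the name of every required *-site.xml file that is absent from directory $1.
missing_site_files() {
  local conf_dir="$1" name
  for name in $REQUIRED_SITE_FILES; do
    [ -f "$conf_dir/$name" ] || echo "$name"
  done
}

# Print every hostname (arguments 2..n) that has no entry in hosts file $1.
missing_host_entries() {
  local hosts_file="$1" host
  shift
  for host in "$@"; do
    awk -v h="$host" '
      { sub(/\r$/, "") }                 # tolerate Windows CRLF line endings
      /^[[:space:]]*#/ { next }          # skip comment lines
      {
        for (i = 2; i <= NF; i++) {      # field 1 is the IP address
          if ($i ~ /^#/) break           # stop at a trailing comment
          if ($i == h) found = 1
        }
      }
      END { exit found ? 0 : 1 }
    ' "$hosts_file" || echo "$host"
  done
}

# Usage (paths are illustrative; adjust to your installation):
#   missing_site_files "pdi-ce-8.2.0.0-342/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp30"
#   missing_host_entries /c/Windows/System32/drivers/etc/hosts master slave1
```

If both functions print nothing, the files and the mappings are in place and the cause lies elsewhere.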
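3 Configure the content of the Hadoop file output step, as follows
To set up the fields, first click Get Fields, then click Minimal width. Important: some fields are of type int and need their format set to #; for date/time fields, choose a standard date format.
Once everything is configured, click Run, then open Hadoop file output again, click Browse, go up one level, and check whether the txt file was generated, as shown in the figure below:
Next, in a terminal, switch to the hdfs or hive user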
su hdfs
This shows that the database table has been exported and stored as a txt file.
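The terminal check itself can be sketched as follows. This is an assumption about what that check looks like, not a transcript: the function name check_export is hypothetical, and the path is the one used by the LOAD DATA statement later in this post.

```shell
#!/usr/bin/env bash
# Sketch: confirm the exported file exists in HDFS and peek at its first rows.
# The default path comes from the LOAD DATA statement used later in this post.
EXPORT_PATH="${EXPORT_PATH:-/tmp/data/t_housing_management.txt}"

check_export() {
  local path="$1"
  # `hdfs dfs -test -e` exits 0 when the path exists.
  if ! hdfs dfs -test -e "$path"; then
    echo "missing: $path"
    return 1
  fi
  # Show size, owner and permissions of the exported file.
  hdfs dfs -ls "$path"
  # Print the first few rows to eyeball the ';' field separator.
  hdfs dfs -cat "$path" | head -n 5
}

# Usage (run as the hdfs or hive user on a cluster node):
#   check_export "$EXPORT_PATH"
```

The owner and permission columns printed by the listing are worth noting, since they matter when Hive later moves the file.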
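4 Create a new job with the following entries:
For the Transformation entry, select the ktr file created above.
5 Load the data with an SQL entry
Configure it as follows:
Create the target table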
CREATE TABLE t_housing_management (uuid string,user_uuid string,open_city_uuid string,building_uuid string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' STORED AS TEXTFILE;
Major pitfalls to watch out for:
A) The table was created with the wrong file format:
0: jdbc:hive2://master:2181,slave1:2181,slave> load data inpath '/tmp/data/t_housing_management.txt' overwrite into table t_housing_management;
Error: Error while compiling statement: FAILED: SemanticException Unable to load data to destination table. Error: The file that you are trying to load does not match the file format of the destination table. (state=42000,code=40000)
0: jdbc:hive2://master:2181,slave1:2181,slave> drop table t_housing_management;
INFO : Compiling command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035): drop table t_housing_management
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035); Time taken: 0.073 seconds
INFO : Executing command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035): drop table t_housing_management
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20210415135135_97e35ec2-33ed-44d8-aea2-198cc2675035); Time taken: 0.234 seconds
INFO : OK
No rows affected (0.355 seconds)
The table must be created with ';' as the field delimiter and stored as a text file.
You can check the table definition from the terminal:
show create table t_housing_management;
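To turn that check into a yes/no answer, the DDL text can be piped through a small filter. The sketch below is an assumption-laden illustration: ddl_storage_format is a hypothetical helper, the exact layout of SHOW CREATE TABLE output varies between Hive versions, and HIVE_JDBC_URL in the usage comment stands for your own connection string, which is not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch: read SHOW CREATE TABLE output on stdin and report the storage format.
# It only looks for well-known input format class names or STORED AS clauses.
ddl_storage_format() {
  local ddl
  ddl=$(cat)
  case "$ddl" in
    *TextInputFormat*|*"STORED AS TEXTFILE"*) echo "text" ;;
    *OrcInputFormat*|*"STORED AS ORC"*) echo "orc" ;;
    *) echo "unknown" ;;
  esac
}

# Usage (HIVE_JDBC_URL is a placeholder for your own JDBC URL):
#   beeline -u "$HIVE_JDBC_URL" --silent=true --outputformat=tsv2 \
#     -e 'show create table t_housing_management;' | ddl_storage_format
```

A result other than text suggests the table was created without the TEXTFILE clause, in which case loading a txt file may fail with the format mismatch shown above.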
B) Insufficient permissions for the load
jdbc:hive2://master:2181,slave1:2181,slave> LOAD DATA INPATH '/tmp/data/t_housing_management.txt' OVERWRITE INTO TABLE t_housing_management;
INFO : Compiling command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c): LOAD DATA INPATH '/tmp/data/t_housing_management.txt' OVERWRITE INTO TABLE t_housing_management
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c); Time taken: 0.084 seconds
INFO : Executing command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c): LOAD DATA INPATH '/tmp/data/t_housing_management.txt' OVERWRITE INTO TABLE t_housing_management
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table userdb.t_housing_management from hdfs://master:8020/tmp/data/t_housing_management.txt
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: Access denied: Unable to move source hdfs://master:8020/tmp/data/t_housing_management.txt to destination hdfs://master:8020/warehouse/tablespace/managed/hive/userdb.db/t_housing_management/base_0000001: Permission denied: user=hive, access=WRITE, inode="/tmp/data":admin:hdfs:drwxr-xr-x
INFO : Completed executing command(queryId=hive_20210415135149_641f1f69-60ab-424e-be99-48ebe254a88c); Time taken: 0.501 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: Access denied: Unable to move source hdfs://master:8020/tmp/data/t_housing_management.txt to destination hdfs://master:8020/warehouse/tablespace/managed/hive/userdb.db/t_housing_management/base_0000001: Permission denied: user=hive, access=WRITE, inode="/tmp/data":admin:hdfs:drwxr-xr-x (state=08S01,code=1)
The hive user has no write permission on /tmp/data, which it needs in order to move the file into the warehouse directory. Open up the directory as the hdfs user:
su hdfs
hdfs dfs -chmod -R 777 /tmp/data/
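chmod -R 777 opens the directory to every user. A narrower alternative, sketched here as an assumption rather than what was used above, is to hand ownership of the staging directory to the hive user and then rerun the load. The owner hive and group hdfs are read off the error message; the function names and the JDBC URL argument are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: give the hive user ownership of the staging directory, then load.
STAGING_DIR="${STAGING_DIR:-/tmp/data}"

# Run as the hdfs superuser. Makes hive the owner so it can move files out.
grant_hive_access() {
  local dir="$1"
  hdfs dfs -chown -R hive:hdfs "$dir"
}

# Run the same LOAD DATA statement as above through beeline.
# $1 = JDBC URL (placeholder), $2 = HDFS file, $3 = target table.
load_into_hive() {
  local jdbc_url="$1" file="$2" table="$3"
  beeline -u "$jdbc_url" -n hive \
    -e "LOAD DATA INPATH '$file' OVERWRITE INTO TABLE $table;"
}

# Usage:
#   grant_hive_access "$STAGING_DIR"
#   load_into_hive "$HIVE_JDBC_URL" /tmp/data/t_housing_management.txt t_housing_management
```

Note that LOAD DATA INPATH moves the file rather than copying it, so the txt file no longer sits in /tmp/data after a successful load.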