When running SELECT queries against Hive, we can process rows with custom scripts written in Python, PHP, Perl, and so on, by using Hive's TRANSFORM and USING clauses.
A common pitfall when doing so is the following error: "An error occurred when trying to close the Operator running your custom script."
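The Perl script split_kv.pl used in the session below is not reproduced in this post. As a rough illustration only (the field separators are inferred from the sample data), an equivalent transform script in Python would read rows from stdin and write one tab-separated key/value pair per output line:

```python
#!/usr/bin/env python
# Hypothetical Python equivalent of split_kv.pl. Hive's TRANSFORM streams
# each input row to the script on stdin and reads tab-separated output
# columns back from stdout.
import sys

def split_kv(line):
    """Split one "k1=v1,k2=v2" row into a list of (key, value) tuples."""
    return [tuple(pair.split('=', 1))
            for pair in line.strip().split(',') if pair]

if __name__ == '__main__':
    for row in sys.stdin:
        if row.strip():
            for key, value in split_kv(row):
                print(key + '\t' + value)
```

The same stdin-to-stdout contract applies whichever language the script is written in, which is why `using 'perl split_kv.pl'` and a Python script are interchangeable here.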
hive> create table kv_data(line string);
OK
Time taken: 0.935 seconds
hive> load data local inpath '${env:HOME}/qhy/kv_data.txt' into table kv_data;
Loading data to table default.kv_data
Table default.kv_data stats: [numFiles=1, totalSize=49]
OK
Time taken: 1.174 seconds
hive> select * from kv_data;
OK
k1=v1,k2=v2
k4=v4,k5=v5,k6=k6
k7=v7,k7=v7,k3=v7
Time taken: 0.155 seconds, Fetched: 4 row(s)
hive> select transform(line)
> using 'perl split_kv.pl'
> as (key,value)
> from kv_data;
Query ID = hadoop_20181214115722_6bd27f59-f419-49f1-a1b0-1c33aefaf4f1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1544754045260_0009, Tracking URL = http://master:8088/proxy/application_1544754045260_0009/
Kill Command = /home/hadoop/opt/software/hadoop-2.8.3/bin/hadoop job -kill job_1544754045260_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-12-14 11:57:46,033 Stage-1 map = 0%, reduce = 0%
2018-12-14 11:58:16,628 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1544754045260_0009 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1544754045260_0009_m_000000 (and more) from job job_1544754045260_0009
Task with the most failures(4):
-----
Task ID:
task_1544754045260_0009_m_000000
URL:
http://master:8088/taskdetails.jsp?jobid=job_1544754045260_0009&tipid=task_1544754045260_0009_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:210)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:560)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:192)
... 8 more
FAILED: Execution Error, return code 20003 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. An error occurred when trying to close the Operator running your custom script.
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
The cause is that the map script was never added to the distributed cache, so the job fails with metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script. Note that the path involved is a local path, not a distributed HDFS path.
The fix is simply to run add file in Hive before the query:
hive> add file ${env:HOME}/qhy/split_kv.pl;
Added resources: [/home/hadoop/qhy/split_kv.pl]
hive> select transform(line)
> using 'perl split_kv.pl'
> as (key,value)
> from kv_data;
Query ID = hadoop_20181214142017_79c197f6-3b07-4b0b-9cad-f9f8be078cb9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1544754045260_0012, Tracking URL = http://master:8088/proxy/application_1544754045260_0012/
Kill Command = /home/hadoop/opt/software/hadoop-2.8.3/bin/hadoop job -kill job_1544754045260_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-12-14 14:20:36,236 Stage-1 map = 0%, reduce = 0%
2018-12-14 14:20:47,251 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
MapReduce Total cumulative CPU time: 1 seconds 220 msec
Ended Job = job_1544754045260_0012
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.22 sec HDFS Read: 3527 HDFS Write: 48 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 220 msec
OK
k1 v1
k2 v2
k4 v4
k5 v5
k6 k6
k7 v7
k7 v7
k3 v7
Time taken: 30.884 seconds, Fetched: 8 row(s)
hive>