First, put the JAR in Hive's lib directory so that it is on the classpath.
The UDF class extends UDF (org.apache.hadoop.hive.ql.exec.UDF).
It can have multiple evaluate methods. They are overloads, and Hive picks one according to the types of the arguments passed in.
Exporting the JAR from IDEA is very troublesome, so I switched to Eclipse without hesitation.
In Hive:
hive> add JAR xxx.jar;
hive> create temporary function functionName as '<fully qualified name of the UDF class inside the JAR>';
hive> select num, functionName(num) from p;
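Processing JSON-format files
// Hypothetical class name; MovierateBean is a POJO whose fields match the JSON keys
import org.apache.hadoop.hive.ql.exec.UDF;
import com.fasterxml.jackson.databind.ObjectMapper; // Jackson 2.x assumed; older setups use org.codehaus.jackson.map.ObjectMapper

public class MovieJsonUDF extends UDF {
    // One shared mapper instead of creating a new one for every row
    private final ObjectMapper om = new ObjectMapper();

    // Parse one JSON line into a MovierateBean and return its string form
    public String evaluate(String jsonline) {
        try {
            MovierateBean bean = om.readValue(jsonline, MovierateBean.class);
            return bean.toString();
        } catch (Exception e) {
            // Unparseable line: return it unchanged
            return jsonline;
        }
    }
}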
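Hive UDFs themselves must be written in Java. For experimenting locally, the parse-or-fall-back pattern of the Java code above can be mirrored in a few lines of Python. This is a minimal sketch under stated assumptions. `MovieRate` is a hypothetical stand-in for `MovierateBean`, with fields taken from the JSON keys used later in this post. The tab-separated `__str__` format is an assumption, because the real bean's `toString()` is not shown.

```python
import json
from dataclasses import dataclass


@dataclass
class MovieRate:
    # Hypothetical stand-in for MovierateBean; fields follow the JSON keys
    movie: str
    rate: str
    timeStamp: str
    uid: str

    def __str__(self) -> str:
        # Assumed output format: tab-separated fields
        return "\t".join([self.movie, self.rate, self.timeStamp, self.uid])


def parse_rating(jsonline: str) -> str:
    """Parse one JSON line; on any failure return the line unchanged,
    like the catch block of the Java code."""
    try:
        data = json.loads(jsonline)
        return str(MovieRate(data["movie"], data["rate"],
                             data["timeStamp"], data["uid"]))
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, or non-object input
        return jsonline


print(parse_rating('{"movie":"2797","rate":"4","timeStamp":"978302039","uid":"1"}'))
print(parse_rating("not json"))
```

Returning the raw line on failure keeps bad records visible in the query output instead of failing the whole job. The trade-off is that downstream columns must tolerate unparsed text.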
Hive UDFs and UDAFs have to be written in Java. Hive also offers a simpler way to get custom UDF- and UDAF-like behavior: TRANSFORM. With TRANSFORM, Hive streams rows through an external script, so the custom logic can be written in many languages (Python, shell and so on).
Hive also provides the keywords MAP and REDUCE, but they are essentially just aliases for TRANSFORM. Using MAP does not force the script to run in the map phase, and using REDUCE does not force it to run in the reduce phase. See the official Hive documentation for details.
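According to the Hive documentation, TRANSFORM by default follows a simple text contract:

- Input columns are converted to strings and joined with TAB.
- NULL is written as the literal `\N`.
- The script's stdout is read back as TAB-separated strings, where a cell containing only `\N` becomes NULL.

The sketch below models that round trip in-process. The helper names (`to_line`, `from_line`, `upper_first_column`, `simulate_transform`) are hypothetical, and in real Hive the script runs as a separate process.

```python
import io


def to_line(row):
    # Default TRANSFORM input: every column as a string, TAB-delimited,
    # with NULL written as the literal \N
    return "\t".join("\\N" if v is None else str(v) for v in row)


def from_line(line):
    # Script output is read back as TAB-separated strings; a cell that is
    # exactly \N becomes NULL (None here)
    return tuple(None if cell == "\\N" else cell for cell in line.split("\t"))


def upper_first_column(stdin, stdout):
    # Hypothetical stand-in for the user script: upper-cases column 1
    for line in stdin:
        cols = line.rstrip("\n").split("\t")
        cols[0] = cols[0].upper()
        stdout.write("\t".join(cols) + "\n")


def simulate_transform(rows, script):
    # In Hive the script is a separate process; here stdin/stdout are
    # in-memory buffers so the round trip runs without a cluster
    stdin = io.StringIO("".join(to_line(r) + "\n" for r in rows))
    stdout = io.StringIO()
    script(stdin, stdout)
    return [from_line(l) for l in stdout.getvalue().splitlines()]


print(simulate_transform([("a", 1, None), ("b", 2, "x")], upper_first_column))
# [('A', '1', None), ('B', '2', 'x')]
```

Note that the integer `1` comes back as the string `'1'`. Everything crossing the script boundary is text unless the AS clause declares column types.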
The following Python script can replace the UDF above.
The server-side script /opt/movie_trans.py contains the following:
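import sys
import datetime
import json

# Hive feeds one JSON record per line on stdin
for line in sys.stdin:
    # line='{"movie":"2797","rate":"4","timeStamp":"978302039","uid":"1"}'
    line = line.strip()
    if not line:
        continue  # skip blank lines instead of failing in json.loads
    hjson = json.loads(line)
    movie = hjson['movie']
    rate = hjson['rate']
    timeStamp = hjson['timeStamp']
    uid = hjson['uid']
    # Convert the Unix timestamp (seconds) to a local datetime
    timeStamp = datetime.datetime.fromtimestamp(float(timeStamp))
    # Emit TAB-separated fields; Hive maps them onto the AS (...) columns
    print('\t'.join([movie, rate, str(timeStamp), uid]))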
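Before shipping the script with ADD FILE, it can help to check the per-line logic locally. This is a minimal sketch that assumes the loop body of movie_trans.py is factored into a hypothetical `transform_line` function. The timestamp column depends on the machine's local time zone, because `datetime.fromtimestamp` uses local time.

```python
import datetime
import json


def transform_line(line):
    """Hypothetical helper: the per-line logic of movie_trans.py as a function."""
    hjson = json.loads(line.strip())
    # Unix seconds -> local datetime, as in the script
    ts = datetime.datetime.fromtimestamp(float(hjson["timeStamp"]))
    return "\t".join([hjson["movie"], hjson["rate"], str(ts), hjson["uid"]])


sample = '{"movie":"2797","rate":"4","timeStamp":"978302039","uid":"1"}'
fields = transform_line(sample).split("\t")
print(fields)  # third field varies with the local time zone
```

Keeping the parsing in a function like this makes it easy to unit-test, while the `for line in sys.stdin` loop stays a thin wrapper.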
Then run the following statements in Hive:
ADD FILE /opt/movie_trans.py;
SELECT TRANSFORM (rate)
USING 'python movie_trans.py'
AS (movie, rate, timeStamp, uid)
FROM rating;