Hive supports three kinds of custom functions: UDF, UDAF, and UDTF.
1. UDF is a one-row-in, one-row-out function
To write one, extend the UDF class and implement an evaluate method.
Code example:
package test;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDF;

// Input two arrays; output the element-wise concatenation of the two arrays.
// The two input arrays must have the same length.
// For example: (['a','b','c'], ['1','2','3']) --> ['a-1','b-2','c-3']
public class ConnStr2 extends UDF {
    public ArrayList<String> evaluate(ArrayList<String> f1, ArrayList<String> f2) {
        ArrayList<String> re = new ArrayList<>();
        for (int i = 0; i < f1.size(); i++) {
            re.add(f1.get(i) + "-" + f2.get(i));
        }
        return re;
    }
}
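Before wiring the jar into Hive, the splice logic can be sanity-checked in plain Java. This is a minimal sketch with the logic inlined (the real class depends on the Hive jars); the class and method names here are illustrative, not part of any Hive API:

```java
import java.util.ArrayList;
import java.util.Arrays;

public class ConnStrCheck {
    // Same splice logic as ConnStr2.evaluate, without the Hive dependency
    static ArrayList<String> splice(ArrayList<String> f1, ArrayList<String> f2) {
        ArrayList<String> re = new ArrayList<>();
        for (int i = 0; i < f1.size(); i++) {
            re.add(f1.get(i) + "-" + f2.get(i));
        }
        return re;
    }

    public static void main(String[] args) {
        ArrayList<String> out = splice(
                new ArrayList<>(Arrays.asList("a", "b", "c")),
                new ArrayList<>(Arrays.asList("1", "2", "3")));
        System.out.println(out); // prints [a-1, b-2, c-3]
    }
}
```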
Upload the jar package to the server
Add the jar package to Hive's classpath:
hive> add JAR /home/hadoop/hivejar/udf.jar;
View the added jars: hive> list jar;
Create a temporary function associated with the developed class
hive> create temporary function connstr as 'test.ConnStr2';
At this point, you can use custom functions in hql
select connstr(name), age from student;

2. UDAF is an aggregate function:
To write one, extend AbstractGenericUDAFResolver, and have an inner class extend GenericUDAFEvaluator.
Detailed principle reference: https://blog.csdn.net/kent7306/article/details/50110067
Following that principle, here is a UDAF that finds the maximum character length in a column:
package test;

import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

// Find the maximum character length of a string column
public class Max_udaf extends AbstractGenericUDAFResolver {

    @Override
    public GenericUDAFEvaluator getEvaluator(TypeInfo[] info) throws SemanticException {
        if (info.length != 1) {
            throw new UDFArgumentTypeException(info.length - 1, "Exactly one argument is expected.");
        }
        ObjectInspector oi = TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(info[0]);
        if (oi.getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentTypeException(0,
                    "Argument must be PRIMITIVE, but " + oi.getCategory().name() + " was passed.");
        }
        PrimitiveObjectInspector inputOI = (PrimitiveObjectInspector) oi;
        if (inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentTypeException(0,
                    "Argument must be STRING, but " + inputOI.getPrimitiveCategory().name() + " was passed.");
        }
        return new My_max_udaf();
    }

    public static class My_max_udaf extends GenericUDAFEvaluator {
        PrimitiveObjectInspector inputOI;    // raw string input (PARTIAL1 / COMPLETE)
        PrimitiveObjectInspector integerOI;  // partial Integer input (PARTIAL2 / FINAL)
        ObjectInspector outputOI;

        @Override
        public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
            assert (parameters.length == 1);
            super.init(m, parameters);
            if (m == Mode.PARTIAL1 || m == Mode.COMPLETE) {
                // Map side: the input is the raw string column
                inputOI = (PrimitiveObjectInspector) parameters[0];
            } else {
                // Remaining stages: the input is the partial Integer result
                integerOI = (PrimitiveObjectInspector) parameters[0];
            }
            // Every stage outputs an Integer
            outputOI = ObjectInspectorFactory.getReflectionObjectInspector(Integer.class,
                    ObjectInspectorOptions.JAVA);
            return outputOI;
        }

        /** Stores the current maximum character length */
        static class LetterMaxLen implements AggregationBuffer {
            int maxv = 0;
            void getmax(int num) {
                if (num > maxv) {
                    maxv = num;
                }
            }
        }

        @Override
        public AggregationBuffer getNewAggregationBuffer() throws HiveException {
            return new LetterMaxLen();
        }

        @Override
        public void reset(AggregationBuffer agg) throws HiveException {
            ((LetterMaxLen) agg).maxv = 0;
        }

        @Override
        public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
            assert (parameters.length == 1);
            if (parameters[0] != null) {
                LetterMaxLen mymax = (LetterMaxLen) agg;
                Object p1 = inputOI.getPrimitiveJavaObject(parameters[0]);
                mymax.getmax(String.valueOf(p1).length());
            }
        }

        @Override
        public Object terminatePartial(AggregationBuffer agg) throws HiveException {
            return ((LetterMaxLen) agg).maxv;
        }

        @Override
        public void merge(AggregationBuffer agg, Object partial) throws HiveException {
            if (partial != null) {
                LetterMaxLen mymax = (LetterMaxLen) agg;
                Integer partialMaxV = (Integer) integerOI.getPrimitiveJavaObject(partial);
                mymax.getmax(partialMaxV);
            }
        }

        @Override
        public Object terminate(AggregationBuffer agg) throws HiveException {
            return ((LetterMaxLen) agg).maxv;
        }
    }
}
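The stage flow of this UDAF (iterate on the map side, terminatePartial, then merge and terminate on the reduce side) can be simulated in plain Java without a Hive runtime. This is a minimal sketch of the data flow only; the class and method names are illustrative, not part of the Hive API:

```java
import java.util.Arrays;
import java.util.List;

public class MaxLenSimulation {
    // iterate + terminatePartial: one map task computes a partial max length
    static int partialMax(List<String> rows) {
        int maxv = 0;
        for (String s : rows) {
            if (s != null && s.length() > maxv) {
                maxv = s.length();
            }
        }
        return maxv;
    }

    // merge + terminate: the reducer combines the partial results
    static int mergeMax(int... partials) {
        int maxv = 0;
        for (int p : partials) {
            if (p > maxv) {
                maxv = p;
            }
        }
        return maxv;
    }

    public static void main(String[] args) {
        // Two "map tasks", each seeing part of the course column
        int p1 = partialMax(Arrays.asList("math", "english", "computer"));
        int p2 = partialMax(Arrays.asList("math", "computer"));
        // The reducer merges the partials: "computer" has 8 characters
        System.out.println(mergeMax(p1, p2)); // prints 8
    }
}
```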
Package and upload the jar as before, create a temporary function mymax (e.g. create temporary function mymax as 'test.Max_udaf';), and test it.
Test Data:
+---------------+---------------+--------------+
|  exam1.name   | exam1.course  | exam1.score  |
+---------------+---------------+--------------+
| huangbo       | math          | 81           |
| huangbo       | english       | 87           |
| huangbo       | computer      | 57           |
| xuzheng       | math          | 89           |
| xuzheng       | english       | 92           |
| xuzheng       | computer      | 83           |
| wangbaoqiang  | math          | 78           |
| wangbaoqiang  | english       | 88           |
| wangbaoqiang  | computer      | 90           |
| dengchao      | math          | 88           |
| dengchao      | computer      | 58           |
+---------------+---------------+--------------+
HiveQL statement:
select name, mymax(course) as len from exam1 group by name;
Result:
+---------------+------+
|     name      | len  |
+---------------+------+
| dengchao      | 8    |
| huangbo       | 8    |
| wangbaoqiang  | 8    |
| xuzheng       | 8    |
+---------------+------+
3. UDTF, a table-generating function, turns one row of input into multiple rows of output
You need to extend GenericUDTF and implement the following three methods:
// Specify the input and output parameters: an ObjectInspector for the input,
// and a StructObjectInspector describing the output columns
abstract StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;

// Process one input record; may forward any number of output records
abstract void process(Object[] record) throws HiveException;

// Called when there are no more records to process; clean up resources or
// produce additional output
abstract void close() throws HiveException;
Case requirements:
+--------------+--------------------+------------------------+-----------------+
| exam2_22.id  | exam2_22.username  |    exam2_22.course     | exam2_22.score  |
+--------------+--------------------+------------------------+-----------------+
| 1            | huangbo            | math,computer,english  | 34,58,58        |
| 2            | xuzheng            | math,computer,english  | 45,87,45        |
| 3            | wangbaoqiang       | math,computer,english  | 76,34,89        |
+--------------+--------------------+------------------------+-----------------+
Split each row's courses and scores into separate rows, producing the following result:
+-----+---------------+------------+-----------+
| id  |   username    |   course   |   score   |
+-----+---------------+------------+-----------+
| 1   | huangbo       | math       | 34        |
| 1   | huangbo       | computer   | 58        |
| 1   | huangbo       | english    | 58        |
| 2   | xuzheng       | math       | 45        |
| 2   | xuzheng       | computer   | 87        |
| 2   | xuzheng       | english    | 45        |
| 3   | wangbaoqiang  | math       | 76        |
| 3   | wangbaoqiang  | computer   | 34        |
| 3   | wangbaoqiang  | english    | 89        |
+-----+---------------+------------+-----------+
Code:
package test;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class My_Udtf extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        if (argOIs.length != 1) {
            throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
        }
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("ExplodeMap takes a string as its parameter");
        }
        // Declare the two output columns and their types
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("course");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("score");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        // Split "course1,course2,...-score1,score2,..." into the two lists
        String input = args[0].toString();
        String[] split = input.split("-");
        String[] s1 = split[0].split(",");
        String[] s2 = split[1].split(",");
        // Forward one output row per course/score pair
        for (int i = 0; i < s1.length; i++) {
            String[] res = {s1[i], s2[i]};
            forward(res);
        }
    }

    @Override
    public void close() throws HiveException {
        // nothing to clean up
    }
}
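The split logic in process() can likewise be exercised outside Hive. This is a minimal sketch with the logic inlined (forward() needs a Hive runtime, so rows are collected in a list instead); the explode helper name is illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplodeCheck {
    // Same split logic as My_Udtf.process, collecting rows instead of forwarding them
    static List<String[]> explode(String input) {
        String[] split = input.split("-");
        String[] s1 = split[0].split(",");
        String[] s2 = split[1].split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < s1.length; i++) {
            rows.add(new String[]{s1[i], s2[i]});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Same shape as concat_ws('-', course, score) produces for huangbo
        for (String[] row : explode("math,computer,english-34,58,58")) {
            System.out.println(Arrays.toString(row));
        }
        // prints [math, 34] then [computer, 58] then [english, 58]
    }
}
```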
Package and upload the jar, then create a temporary function myudtf (e.g. create temporary function myudtf as 'test.My_Udtf';).
Execute the HiveQL:
select id,username ,ss.* from exam2_22 lateral view myudtf(concat_ws('-',course,score)) ss as course,score;
Result:
+-----+---------------+------------+-----------+
| id  |   username    | ss.course  | ss.score  |
+-----+---------------+------------+-----------+
| 1   | huangbo       | math       | 34        |
| 1   | huangbo       | computer   | 58        |
| 1   | huangbo       | english    | 58        |
| 2   | xuzheng       | math       | 45        |
| 2   | xuzheng       | computer   | 87        |
| 2   | xuzheng       | english    | 45        |
| 3   | wangbaoqiang  | math       | 76        |
| 3   | wangbaoqiang  | computer   | 34        |
| 3   | wangbaoqiang  | english    | 89        |
+-----+---------------+------------+-----------+