Hive custom functions

Hive supports three types of custom functions: UDF, UDAF, and UDTF.

1. UDF: a single-row function (one row in, one row out)

To write one, extend the UDF class and implement an evaluate method.

Code example:

package test;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDF;

public class ConnStr2 extends UDF {

    // Input two arrays of equal length; output the element-wise splicing of
    // the two, e.g. (['a','b','c'], ['1','2','3']) --> ['a-1','b-2','c-3']
    public ArrayList<String> evaluate(ArrayList<String> f1, ArrayList<String> f2) {
        if (f1 == null || f2 == null || f1.size() != f2.size()) {
            return null; // the two arrays are required to be the same length
        }
        ArrayList<String> re = new ArrayList<>();
        for (int i = 0; i < f1.size(); i++) {
            re.add(f1.get(i) + "-" + f2.get(i));
        }
        return re;
    }
}
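Before packaging, the evaluate method can be sanity-checked as plain Java, since it takes ordinary Java types. This is a minimal hypothetical test of mine (the class name ConnStr2LocalTest and the sample values are not from the original):

package test;

import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical local check of ConnStr2.evaluate(); runs without Hive
public class ConnStr2LocalTest {
    public static void main(String[] args) {
        ConnStr2 udf = new ConnStr2();
        ArrayList<String> f1 = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ArrayList<String> f2 = new ArrayList<>(Arrays.asList("1", "2", "3"));
        System.out.println(udf.evaluate(f1, f2)); // prints [a-1, b-2, c-3]
    }
}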

Upload the jar package to the server.

Add the jar package to Hive's classpath:

hive> add JAR /home/hadoop/hivejar/udf.jar;

View the added jars: hive> list jars;

Create a temporary function associated with the developed class:

hive> create temporary function connstr as 'test.ConnStr2';

At this point you can use the custom function in HQL; f1 and f2 here stand for two array<string> columns matching the evaluate signature:

select connstr(f1, f2) from student;

2. UDAF: an aggregate function (many rows in, one row out)

You need to extend AbstractGenericUDAFResolver, with an inner class that extends GenericUDAFEvaluator. Hive drives the evaluator in four modes: PARTIAL1 (map side: iterate, then terminatePartial), PARTIAL2 (combiner: merge, then terminatePartial), FINAL (reduce side: merge, then terminate), and COMPLETE (no map/reduce split: iterate, then terminate).

Detailed principle reference: https://blog.csdn.net/kent7306/article/details/50110067

Following that outline, the code below finds the maximum string length in a column:

package test;

import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

// Find the maximum string length in a column
public class Max_udaf extends AbstractGenericUDAFResolver {

    @Override
    public GenericUDAFEvaluator getEvaluator(TypeInfo[] info) throws SemanticException {
        if (info.length != 1) {
            throw new UDFArgumentTypeException(info.length - 1,
                    "Exactly one argument is expected.");
        }
        ObjectInspector oi = TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(info[0]);

        if (oi.getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentTypeException(0,
                    "Argument must be PRIMITIVE, but "
                    + oi.getCategory().name() + " was passed.");
        }

        PrimitiveObjectInspector inputOI = (PrimitiveObjectInspector) oi;

        if (inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentTypeException(0,
                    "Argument must be String, but "
                    + inputOI.getPrimitiveCategory().name() + " was passed.");
        }

        return new My_max_udaf();
    }

    public static class My_max_udaf extends GenericUDAFEvaluator {

        PrimitiveObjectInspector inputOI;   // raw string rows (PARTIAL1 / COMPLETE)
        PrimitiveObjectInspector integerOI; // partial Integer maxima (PARTIAL2 / FINAL)
        ObjectInspector outputOI;

        @Override
        public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
            assert (parameters.length == 1);
            super.init(m, parameters);

            if (m == Mode.PARTIAL1 || m == Mode.COMPLETE) {
                // Map side or single stage: the input is the raw string column
                inputOI = (PrimitiveObjectInspector) parameters[0];
            } else {
                // PARTIAL2 / FINAL: the input is an Integer partial result
                integerOI = (PrimitiveObjectInspector) parameters[0];
            }

            // Every stage outputs an Integer
            outputOI = ObjectInspectorFactory.getReflectionObjectInspector(Integer.class,
                    ObjectInspectorOptions.JAVA);
            return outputOI;
        }

        /**
         * Stores the current maximum string length.
         */
        static class LetterMaxLen implements AggregationBuffer {
            int maxv = 0;
            void updateMax(int num) {
                if (num > maxv) {
                    maxv = num;
                }
            }
        }

        @Override
        public AggregationBuffer getNewAggregationBuffer() throws HiveException {
            return new LetterMaxLen();
        }

        @Override
        public void reset(AggregationBuffer agg) throws HiveException {
            ((LetterMaxLen) agg).maxv = 0;
        }

        @Override
        public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
            assert (parameters.length == 1);
            if (parameters[0] != null) {
                LetterMaxLen mymax = (LetterMaxLen) agg;
                Object p1 = inputOI.getPrimitiveJavaObject(parameters[0]);
                mymax.updateMax(String.valueOf(p1).length());
            }
        }

        @Override
        public Object terminatePartial(AggregationBuffer agg) throws HiveException {
            return ((LetterMaxLen) agg).maxv;
        }

        @Override
        public void merge(AggregationBuffer agg, Object partial) throws HiveException {
            if (partial != null) {
                LetterMaxLen mymax = (LetterMaxLen) agg;
                Integer partialMaxV = (Integer) integerOI.getPrimitiveJavaObject(partial);
                mymax.updateMax(partialMaxV);
            }
        }

        @Override
        public Object terminate(AggregationBuffer agg) throws HiveException {
            return ((LetterMaxLen) agg).maxv;
        }
    }
}
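The comparison logic can be checked before deploying by driving the aggregation buffer directly. This is only a sketch of mine; it does not go through Hive's evaluator lifecycle, and the class name MaxUdafLocalTest is hypothetical:

package test;

// Hypothetical check of the aggregation-buffer logic only; Hive itself
// calls iterate/merge/terminate through the GenericUDAFEvaluator lifecycle
public class MaxUdafLocalTest {
    public static void main(String[] args) {
        Max_udaf.My_max_udaf.LetterMaxLen buf = new Max_udaf.My_max_udaf.LetterMaxLen();
        buf.updateMax("math".length());     // 4
        buf.updateMax("computer".length()); // 8
        buf.updateMax("english".length());  // 7
        System.out.println(buf.maxv);       // prints 8, the longest course name
    }
}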

Package and upload it in the same way, create a temporary function mymax, and test it.

Test Data:

+---------------+---------------+--------------+
|  exam1.name   | exam1.course  | exam1.score  |
+---------------+---------------+--------------+
| huangbo       | math          | 81           |
| huangbo       | english       | 87           |
| huangbo       | computer      | 57           |
| xuzheng       | math          | 89           |
| xuzheng       | english       | 92           |
| xuzheng       | computer      | 83           |
| wangbaoqiang  | math          | 78           |
| wangbaoqiang  | english       | 88           |
| wangbaoqiang  | computer      | 90           |
| dengchao      | math          | 88           |
| dengchao      | computer      | 58           |
+---------------+---------------+--------------+

HiveQL statement (grouped by name, matching the result below):

select name, mymax(course) as len from exam1 group by name;

Query result:

+---------------+------+
|     name      | len  |
+---------------+------+
| dengchao      | 8    |
| huangbo       | 8    |
| wangbaoqiang  | 8    |
| xuzheng       | 8    |
+---------------+------+
3. UDTF: a table-generating function that turns one input row into multiple output rows (each with one or more columns)

You need to extend GenericUDTF and implement the following three methods:

// Specify the input and output: the input is described by ObjectInspectors
// and the output by a StructObjectInspector
abstract StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;

// Process one input record and emit zero or more result records via forward()
abstract void process(Object[] record) throws HiveException;

// Called when there are no more records to process, to clean up resources
// or emit additional output
abstract void close() throws HiveException;

Case requirements:

+--------------+--------------------+------------------------+-----------------+  
| exam2_22.id  | exam2_22.username  |    exam2_22.course     | exam2_22.score  |  
+--------------+--------------------+------------------------+-----------------+  
| 1            | huangbo            | math,computer,english  | 34,58,58        |  
| 2            | xuzheng            | math,computer,english  | 45,87,45        |  
| 3            | wangbaoqiang       | math,computer,english  | 76,34,89        |  
+--------------+--------------------+------------------------+-----------------+
Split each row's courses and scores into separate rows, producing:

+-----+---------------+------------+--------+
| id  |   username    |   course   | score  |
+-----+---------------+------------+--------+
| 1   | huangbo       | math       | 34     |
| 1   | huangbo       | computer   | 58     |
| 1   | huangbo       | english    | 58     |
| 2   | xuzheng       | math       | 45     |
| 2   | xuzheng       | computer   | 87     |
| 2   | xuzheng       | english    | 45     |
| 3   | wangbaoqiang  | math       | 76     |
| 3   | wangbaoqiang  | computer   | 34     |
| 3   | wangbaoqiang  | english    | 89     |
+-----+---------------+------------+--------+

Code:

package test;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class My_Udtf extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        if (argOIs.length != 1) {
            throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
        }
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("ExplodeMap takes string as a parameter");
        }

        // Declare two output columns, both strings
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("course");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("score");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        // The input looks like "math,computer,english-34,58,58": split on '-'
        // to separate courses from scores, then on ',' within each half
        String input = args[0].toString();
        String[] split = input.split("-");

        String[] s1 = split[0].split(",");
        String[] s2 = split[1].split(",");
        // Emit one output row per course/score pair
        for (int i = 0; i < s1.length; i++) {
            String[] res = {s1[i], s2[i]};
            forward(res);
        }
    }

    @Override
    public void close() throws HiveException {
        // Nothing to clean up
    }
}
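Because process() emits rows through forward(), a rough local check can install a collector and feed one record by hand. This is a sketch of mine, assuming Hive's public setCollector(Collector) hook; the class name MyUdtfLocalTest is hypothetical:

package test;

import java.util.Arrays;

import org.apache.hadoop.hive.ql.metadata.HiveException;

// Hypothetical local check: feed one concatenated record through process()
// and print each row the UDTF forwards
public class MyUdtfLocalTest {
    public static void main(String[] args) throws HiveException {
        My_Udtf udtf = new My_Udtf();
        udtf.setCollector(row -> System.out.println(Arrays.toString((Object[]) row)));
        // Same shape that concat_ws('-', course, score) produces
        udtf.process(new Object[]{"math,computer,english-34,58,58"});
        udtf.close();
        // Expected: [math, 34]  [computer, 58]  [english, 58]
    }
}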

Package and upload the jar as before, then create a temporary function myudtf.

Execute the HiveQL (lateral view joins each input row with the rows the UDTF emits for it):

select id, username, ss.* from exam2_22 lateral view myudtf(concat_ws('-', course, score)) ss as course, score;

Result:

+-----+---------------+------------+-----------+
| id  |   username    | ss.course  | ss.score  |
+-----+---------------+------------+-----------+
| 1   | huangbo       | math       | 34        |
| 1   | huangbo       | computer   | 58        |
| 1   | huangbo       | english    | 58        |
| 2   | xuzheng       | math       | 45        |
| 2   | xuzheng       | computer   | 87        |
| 2   | xuzheng       | english    | 45        |
| 3   | wangbaoqiang  | math       | 76        |
| 3   | wangbaoqiang  | computer   | 34        |
| 3   | wangbaoqiang  | english    | 89        |
+-----+---------------+------------+-----------+


