1. UDF: user-defined (ordinary) function, which operates on the values of a single row and returns one value per row.
Extend the UDF class and add an evaluate() method:
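import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Custom UDF that returns the smaller of two values.
 * A NULL argument is treated as 0.0.
 */
public class Min extends UDF {
    public Double evaluate(Double a, Double b) {
        // Replace NULL inputs with 0.0 before comparing
        if (a == null)
            a = 0.0;
        if (b == null)
            b = 0.0;
        if (a >= b) {
            return b;
        } else {
            return a;
        }
    }
}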
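Because NULL arguments are replaced with 0.0, the function can return 0.0 even when the only real input is positive. A minimal sketch for checking this outside Hive follows; MinLogic is a hypothetical standalone copy of the evaluate() body with no Hive dependency, not part of the original code.

```java
// Hypothetical standalone copy of the evaluate() logic (no Hive dependency),
// so the NULL handling can be checked in a plain JVM.
class MinLogic {
    static Double evaluate(Double a, Double b) {
        if (a == null) a = 0.0; // NULL treated as 0.0, as in the UDF above
        if (b == null) b = 0.0;
        return (a >= b) ? b : a;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(3.0, 1.5));   // prints 1.5
        System.out.println(evaluate(null, 2.0));  // prints 0.0, because NULL became 0.0
        System.out.println(evaluate(null, -1.0)); // prints -1.0
    }
}
```

If NULL should instead propagate, as it does for most built-in SQL functions, the two null checks would need to return null rather than substitute 0.0.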
2. UDAF: user-defined aggregation function, which operates on multiple rows of data and returns a single value, like the commonly used SUM() and AVG() aggregate functions in SQL.
Example of aggregate function usage:
SELECT store_name, SUM(sales)
FROM Store_Information
GROUP BY store_name
HAVING SUM(sales) > 1500
ORDER BY SUM(sales);
The keyword HAVING must always come after GROUP BY and before ORDER BY.
There are two ways to implement a UDAF, simple and generic:
- a. The simple UDAF relies on Java reflection, which costs performance; it also cannot use some features and has been deprecated;
- b. The generic UDAF involves two classes: AbstractGenericUDAFResolver and GenericUDAFEvaluator;
- Extend the AbstractGenericUDAFResolver class and override the getEvaluator() method;
- Extend the GenericUDAFEvaluator class and return an instance of it from getEvaluator();
- In the GenericUDAFEvaluator subclass, override the init(), iterate(), terminatePartial(), merge(), and terminate() methods (getNewAggregationBuffer() and reset() are abstract as well and must also be implemented);
Refer to: Introduction to Hive UDAF development and detailed explanation of operation process
Hive UDAF development detailed explanation
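The order in which the evaluator methods are called is easier to follow in code. The sketch below is a Hive-free model of a SUM-style aggregate: the method names follow GenericUDAFEvaluator, but the class SumEvaluatorSketch, its Buffer type, newBuffer(), and the simplified signatures are all assumptions made for illustration. The init() step is left out because in Hive it deals with the mode and ObjectInspectors rather than with data.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Hive-free model of the evaluator lifecycle. Method names follow
// GenericUDAFEvaluator, but the signatures are simplified for illustration.
class SumEvaluatorSketch {
    // Stands in for Hive's aggregation buffer.
    static class Buffer { double sum; }

    // Plays the role of getNewAggregationBuffer().
    Buffer newBuffer() { return new Buffer(); }

    // Map side: fold one input row into the buffer; NULL inputs are skipped.
    void iterate(Buffer buf, Double value) {
        if (value != null) buf.sum += value;
    }

    // Map side: emit the partial result that is shipped to the reduce side.
    double terminatePartial(Buffer buf) { return buf.sum; }

    // Reduce side: fold one partial result into the buffer.
    void merge(Buffer buf, double partial) { buf.sum += partial; }

    // Reduce side: produce the final value.
    double terminate(Buffer buf) { return buf.sum; }

    // Runs the full flow over rows split across several simulated map tasks.
    static double aggregate(List<List<Double>> splits) {
        SumEvaluatorSketch eval = new SumEvaluatorSketch();
        Buffer finalBuf = eval.newBuffer();
        for (List<Double> split : splits) {
            Buffer partialBuf = eval.newBuffer();
            for (Double v : split) eval.iterate(partialBuf, v);
            eval.merge(finalBuf, eval.terminatePartial(partialBuf));
        }
        return eval.terminate(finalBuf);
    }

    public static void main(String[] args) {
        List<List<Double>> splits = Arrays.asList(
            Arrays.<Double>asList(100.0, 250.0),
            Arrays.<Double>asList(null, 700.0));
        System.out.println(aggregate(splits)); // prints 1050.0
    }
}
```

Keeping iterate() and merge() separate is what lets partial sums be computed independently on each split and combined afterwards.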
3. UDTF: user-defined table-generating function, used when one input row must produce multiple output rows.
Extend the GenericUDTF class and override three methods: initialize() (which returns the output row information: the number of columns and their types), process(), and close().
Please refer to: UDTF writing and using in Hive (reposted).
Example of Hive 0.13 UDTF usage.
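The one-row-in, many-rows-out behavior can be sketched without Hive. In the model below, the method names mirror GenericUDTF, but the class ExplodePairsSketch, the "key:value;key:value" input format, the Consumer-based collector (standing in for forward()), and the simplified signatures are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, Hive-free model of a table-generating function. Method names
// mirror GenericUDTF, but the signatures are simplified for illustration.
class ExplodePairsSketch {
    private final Consumer<String[]> collector; // stands in for forward()

    ExplodePairsSketch(Consumer<String[]> collector) {
        this.collector = collector;
    }

    // Describes the output rows: here, two string columns.
    String[] initialize() { return new String[] {"key", "value"}; }

    // Called once per input row; emits one output row per "key:value" pair.
    void process(String input) {
        if (input == null) return;
        for (String pair : input.split(";")) {
            String[] kv = pair.split(":");
            if (kv.length == 2) collector.accept(kv); // skip malformed pairs
        }
    }

    // Nothing to release in this sketch.
    void close() { }

    // Drives the three methods in order and gathers the emitted rows.
    static List<String[]> run(String input) {
        List<String[]> rows = new ArrayList<>();
        ExplodePairsSketch udtf = new ExplodePairsSketch(rows::add);
        udtf.initialize();
        udtf.process(input);
        udtf.close();
        return rows;
    }

    public static void main(String[] args) {
        // One input row yields two output rows; "bad" is skipped.
        for (String[] row : run("a:1;b:2;bad")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```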
4. Other
Drop a temporary function:
drop temporary function toUpper;