UDF
-
User-Defined Function: a custom function with one row in and one value out.
-
Background
- The built-in functions cannot solve every real business problem, so developers need to write their own functions to meet their business requirements.
- There are a great many scenarios, and different businesses need many customized implementations, so UDFs are genuinely needed.
-
Significance
- Solves the problem of extending the function set and greatly enriches how business needs can be customized.
-
Input / Output requirements - the problem to be solved
- in : out = 1 : 1. The function takes exactly one record from the input data and returns one result.
- Most common custom functions are of this kind, in the same way as cos, sin, substring, and indexof.
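
To make the 1 : 1 contract concrete, here is a minimal plain-Java sketch with no Hive dependency. The class name `OneToOneSketch` and the helper `applyPerRow` are hypothetical, introduced only for illustration; they model how a scalar function such as substring is called once per input record, so the number of results always equals the number of input rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class OneToOneSketch {

    // Hypothetical helper: calls fn once per record, the way a scalar function is applied per row
    static <T, R> List<R> applyPerRow(List<T> rows, Function<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T row : rows) {
            out.add(fn.apply(row)); // one record in, one result out
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("hive", "udf", "hadoop");
        List<String> firstTwo = applyPerRow(rows, s -> s.substring(0, 2));
        System.out.println(firstTwo); // prints [hi, ud, ha]
    }
}
```

Three rows go in and three values come out; a function that needs to see several rows at once does not fit this shape.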
-
Implementation steps (creating a custom UDF class in Java)
- Write a custom Java class
- Extend the UDF class
- Override the evaluate method
- Package the project that contains the class into an all-in-one jar and upload it to the machine where Hive runs
- Run Hive's add jar command to load the jar into the classpath
- In Hive, create a temporary function; its name is what later calls use to reach the actual UDF class
- Call the UDF in Hive SQL just as you would call a built-in function
-
Code
- Functional requirement: when the input string is longer than two characters, represent the extra characters with "...".
- For example, "12" returns "12", and "123" returns "12...".
- The custom class extends UDF and overrides the evaluate method, as the code shows.
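import org.apache.hadoop.hive.ql.exec.UDF;

/*
 * Purpose: when the input string is longer than 2 characters, represent the extra characters with "...".
 * Input/output: "12" returns "12"; "123" returns "12...".
 */
public class ValueMaskUDF extends UDF {

    public String evaluate(String input, int maxSaveStringLength, String replaceSign) {
        // A SQL NULL arrives as a Java null; return it unchanged instead of throwing
        if (input == null || input.length() <= maxSaveStringLength) {
            return input;
        }
        // Keep the first maxSaveStringLength characters and append the replacement sign
        return input.substring(0, maxSaveStringLength) + replaceSign;
    }

    // Quick local check without Hive
    public static void main(String[] args) {
        System.out.println(new ValueMaskUDF().evaluate("河北省", 2, "..."));
    }
}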
UDAF
-
A custom UDAF, self_count, that reproduces the built-in count aggregate function
-
Input / Output requirements - the problem to be solved
- in : out = n : 1. The function accepts N input records and returns one result.
- Most common custom aggregate functions are of this kind, in the same way as count, sum, avg, and max.
-
Implementation steps
- Write a custom Java class
- Extend the UDAF class
- Inside it, define a static class that implements the UDAFEvaluator interface
- Implement five methods in total: init, iterate, terminatePartial, merge, and terminate (see the figure)
- Run Hive's add jar command to load the jar into the classpath
- In Hive, create a temporary function; its name is what later calls use to reach the actual UDAF class
- Call the UDAF in Hive SQL just as you would call a built-in function
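
The order in which the five methods run is easier to see in a small simulation. The following is only a sketch under stated assumptions: it is plain Java with no Hive dependency, `SelfCountSketch` and the driver `selfCount` are hypothetical names, and the nested `Evaluator` merely mirrors the method names listed above for a count-style aggregate. In a real UDAF the evaluator would implement UDAFEvaluator inside a class that extends UDAF, and Hive, not a hand-written driver, would call the methods.

```java
import java.util.Arrays;
import java.util.List;

public class SelfCountSketch {

    // Mirrors the five evaluator methods without depending on Hive
    static class Evaluator {
        private long count;

        Evaluator() {
            init();
        }

        // Reset the intermediate state
        void init() {
            count = 0L;
        }

        // Map side: called once per input record; nulls are skipped, like count(col)
        boolean iterate(Object value) {
            if (value != null) {
                count++;
            }
            return true;
        }

        // Map side: hand the partial count over to the reduce side
        Long terminatePartial() {
            return count;
        }

        // Reduce side: fold in one partial result
        boolean merge(Long partial) {
            if (partial != null) {
                count += partial;
            }
            return true;
        }

        // Reduce side: final result for the group
        Long terminate() {
            return count;
        }
    }

    // Hypothetical driver: one evaluator per input split, then one evaluator that merges the partials
    static long selfCount(List<List<Object>> splits) {
        Evaluator reducer = new Evaluator();
        for (List<Object> split : splits) {
            Evaluator mapper = new Evaluator();
            for (Object value : split) {
                mapper.iterate(value);
            }
            reducer.merge(mapper.terminatePartial());
        }
        return reducer.terminate();
    }

    public static void main(String[] args) {
        List<List<Object>> splits = Arrays.asList(
                Arrays.<Object>asList("a", "b", null),
                Arrays.<Object>asList("c"));
        System.out.println(selfCount(splits)); // prints 3
    }
}
```

Each split produces a partial count through iterate and terminatePartial, and a single evaluator combines the partials through merge and terminate, which is the n : 1 shape described earlier.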
-
Business Test
Input:
Output:
- UDAF code development
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.log4j.Logger;

/**
 * Merges multiple records into a single record.
 */
// The main class extends UDAF
public class StudentScoreAggUDAF extends UDAF {
    // Logger initialization
    public static Logger logger = Logger.getLogger(StudentScoreAggUDAF.class);

    // Static class implementing UDAFEvaluator
    public static class Evaluator implements UDAFEvaluator {
        // Member variable holding the course -> score pairs of the current group
        private Map<String, String> courseScoreMap;

        // Constructor; runs on both the map and reduce sides and initializes the required variables
        public Evaluator() {
            init();
        }

        // Initializes the intermediate state passed between the methods
        public void init() {
            courseScoreMap = new HashMap<String, String>();
        }

        // Map phase. Returns a boolean: true lets the program continue, false makes it exit
        public boolean iterate(String course, String score) {
            if (course == null || score == null) {
                return true;
            }
            courseScoreMap.put(course, score);
            return true;
        }

        /**
         * Similar to a combiner: performs partial aggregation on the map side and passes
         * the result to the mapOutput parameter of merge. If aggregation is needed, process
         * the result of iterate here; otherwise return it directly.
         */
        public Map<String, String> terminatePartial() {
            return courseScoreMap;
        }

        // Reduce phase: processes, one at a time, the terminatePartial result for each key from the map side
        public boolean merge(Map<String, String> mapOutput) {
            // Guard against an empty partial result
            if (mapOutput != null) {
                this.courseScoreMap.putAll(mapOutput);
            }
            return true;
        }

        // Final business processing on the result once merge has finished
        public String terminate() {
            return courseScoreMap.toString();
        }
    }
}
Test SQL statement
select id,username,score_agg(course,score) from student_score group by id,username;
- Custom UDAF implementing max: https://www.cnblogs.com/itxuexiwang/p/6263110.html
UDTF
- User-Defined Table-Generating Functions
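- Solves the problem of turning one input row into multiple output rows, which comes up in quite a few scenarios
- In practice a UDTF is seldom used for this; it is usually replaced by lateral view explode combined with a UDF, which is simpler and more direct than writing a UDTF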
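
The one-row-in, many-rows-out shape can be illustrated without Hive. This is a minimal plain-Java sketch, not a real UDTF: the class `OneToManySketch` and the helper `explodeField` are hypothetical names, and the comma-separated course list is made-up sample data. It only models the row expansion that a table-generating function, or explode under a lateral view, performs on a single input value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class OneToManySketch {

    // Hypothetical helper: turns one delimited field into several output rows
    static List<String> explodeField(String field, String delimiter) {
        List<String> rows = new ArrayList<>();
        if (field == null || field.isEmpty()) {
            return rows; // no input value, no output rows
        }
        // Quote the delimiter so it is treated literally rather than as a regex
        for (String part : field.split(Pattern.quote(delimiter))) {
            rows.add(part);
        }
        return rows;
    }

    public static void main(String[] args) {
        // One input row holding three courses becomes three output rows
        for (String row : explodeField("math,english,physics", ",")) {
            System.out.println(row);
        }
    }
}
```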