A brief look at UDF / UDAF / UDTF: the problems they solve and their application scenarios

UDF

  • User-Defined Function: a custom function with one input and one output;

  • Background

    • When Hive's built-in functions cannot solve a real business problem, developers have to write their own functions to implement the required business logic.
    • Business scenarios vary widely and each brings many personalized requirements, so UDFs are genuinely needed.
  • Significance

    • They make the function set extensible, greatly enriching the business needs that can be customized.
    • Input/output requirements: the problem to be solved
      • in:out = 1:1, i.e. the function takes one input record and returns one processed result.
      • Most common functions follow this pattern, e.g. cos, sin, substring, indexof.
  • Implementation steps (creating a custom Java UDF class)

    • Write a custom Java class
    • Extend the UDF class
    • Implement the evaluate method
    • Package the project containing the class into an all-in-one jar and upload it to the machine where Hive runs
    • Run ADD JAR in Hive to load the jar onto the classpath
    • Create a temporary function in Hive, so that the UDF can afterwards be called by that function name
    • Call the UDF in Hive SQL the same way as a built-in function
  • Code

    • Requirement: when the input string is longer than two characters, replace the extra characters with "...".
    • For example, "12" returns "12" and "123" returns "12...".
    • The code below shows the custom class extending UDF and implementing evaluate.
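```java
import org.apache.hadoop.hive.ql.exec.UDF;

/*
 * Purpose: when the input string is longer than maxSaveStringLength characters,
 * the extra characters are replaced by replaceSign (e.g. "...").
 * Input/output: "12" returns "12"; "123" returns "12...".
 */
public class ValueMaskUDF extends UDF {

    // Hive calls evaluate once per row (1 input -> 1 output).
    public String evaluate(String input, int maxSaveStringLength, String replaceSign) {
        // Pass NULLs and short strings through unchanged.
        if (input == null || input.length() <= maxSaveStringLength) {
            return input;
        }
        return input.substring(0, maxSaveStringLength) + replaceSign;
    }

    // Quick local check without Hive
    public static void main(String[] args) {
        System.out.println(new ValueMaskUDF().evaluate("河北省", 2, "..."));
    }
}
```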
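To make the 1:1 contract concrete, here is a minimal plain-Java sketch that needs no Hive jars. It copies the masking logic into a hypothetical static `mask` helper and applies it to every value of a column, roughly what Hive does when running something like `select value_mask(name, 2, '...') from t` (the function name `value_mask` and table `t` are hypothetical). It is an illustration under those assumptions, not Hive's actual execution path.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Hive dependency) of the 1:1 UDF contract:
// the function is invoked once per input row and yields exactly one value per row.
final class ValueMaskDemo {

    // Same masking logic as ValueMaskUDF.evaluate, including the null guard.
    static String mask(String input, int maxSaveStringLength, String replaceSign) {
        if (input == null || input.length() <= maxSaveStringLength) {
            return input;
        }
        return input.substring(0, maxSaveStringLength) + replaceSign;
    }

    // Stands in for Hive scanning a column and calling the UDF row by row.
    static List<String> maskColumn(List<String> column, int maxLen, String sign) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(mask(value, maxLen, sign)); // one output per input row
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(List.of("12", "123", "abcdef"));
        rows.add(null); // NULL rows pass through as NULL
        System.out.println(maskColumn(rows, 2, "...")); // [12, 12..., ab..., null]
    }
}
```

The output list always has the same length as the input list, which is exactly what distinguishes a UDF from a UDAF (n:1) or a UDTF (1:n).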

UDAF

  • Example: a custom UDAF, self_count, that reimplements the built-in count aggregate function

  • Input/output requirements: the problem to be solved

    • in:out = n:1, i.e. the function accepts N input records and returns one processed result.
    • Common aggregate functions follow this pattern, e.g. count, sum, avg, max.
  • Implementation steps

    • Write a custom Java class
    • Extend the UDAF class
    • Define a static inner class that implements the UDAFEvaluator interface
    • Implement its five methods: init, iterate, terminatePartial, merge and terminate (see the figure below)
    • Run ADD JAR in Hive to load the jar onto the classpath
    • Create a temporary function in Hive, so that the UDAF can afterwards be called by that function name
    • Call the UDAF in Hive SQL the same way as a built-in function


       
      [Figure: the five Hive UDAF methods]
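The five-method lifecycle can be sketched in plain Java using the self_count example from above. This is a hedged simulation, not Hive code: the class `SelfCountEvaluator` is hypothetical, it only mirrors the method names of Hive's UDAFEvaluator (it does not implement that interface, so it runs without Hive on the classpath), and splitting the rows into exactly two "map tasks" is an arbitrary assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the five-method UDAF lifecycle for self_count
// (a reimplementation of count). Mirrors UDAFEvaluator's method names only.
final class SelfCountEvaluator {
    private long count; // intermediate state for one group

    SelfCountEvaluator() { init(); }

    // Reset the intermediate state.
    void init() { count = 0; }

    // Map side: called once per input row; like count(col), NULLs are skipped.
    boolean iterate(Object value) {
        if (value != null) {
            count++;
        }
        return true;
    }

    // Map side: hand the partial result over to the reduce side.
    long terminatePartial() { return count; }

    // Reduce side: fold in one partial result produced by a map task.
    boolean merge(Long partial) {
        if (partial != null) {
            count += partial;
        }
        return true;
    }

    // Reduce side: produce the final value for the group.
    long terminate() { return count; }

    // Simulates two map tasks producing partial counts, then one reducer merging them.
    static long simulate(List<Object> rows) {
        int mid = rows.size() / 2;
        List<List<Object>> splits = List.of(rows.subList(0, mid), rows.subList(mid, rows.size()));

        List<Long> partials = new ArrayList<>();
        for (List<Object> split : splits) {
            SelfCountEvaluator mapper = new SelfCountEvaluator();
            for (Object row : split) {
                mapper.iterate(row);
            }
            partials.add(mapper.terminatePartial());
        }

        SelfCountEvaluator reducer = new SelfCountEvaluator();
        for (Long partial : partials) {
            reducer.merge(partial);
        }
        return reducer.terminate();
    }

    public static void main(String[] args) {
        List<Object> rows = Arrays.asList("a", null, "b", "c", null);
        System.out.println(simulate(rows)); // 3
    }
}
```

The key design point is that terminatePartial and merge must agree on the partial type (here a long), because whatever a map task emits is exactly what the reducer merges.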
  • Business test

Input:

[Figure: sample input data]

Output:

[Figure: sample output data]
  • UDAF code
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.log4j.Logger;

/**
 * Merges multiple rows into one: collects each group's (course, score) pairs into a single map.
 */
// The main class extends UDAF
public class StudentScoreAggUDAF extends UDAF {

    // Logger
    public static Logger logger = Logger.getLogger(StudentScoreAggUDAF.class);

    // Static inner class implementing UDAFEvaluator
    public static class Evaluator implements UDAFEvaluator {

        // Intermediate state: course -> score for the current group
        private Map<String, String> courseScoreMap;

        // Constructor: runs in both map and reduce tasks and initializes the state
        public Evaluator() {
            init();
        }

        // Initialize the intermediate state passed between the methods
        public void init() {
            courseScoreMap = new HashMap<String, String>();
        }

        // Map phase: called once per input row; rows with a NULL course or score are skipped
        public boolean iterate(String course, String score) {
            if (course == null || score == null) {
                return true;
            }
            courseScoreMap.put(course, score);
            return true;
        }

        /**
         * Similar to a combiner: partial aggregation on the map side, whose result is passed
         * to merge() as mapOutput. If no further aggregation is needed, simply return the
         * state built by iterate().
         */
        public Map<String, String> terminatePartial() {
            return courseScoreMap;
        }

        // Reduce phase: folds in, one by one, the terminatePartial() results from the map side
        public boolean merge(Map<String, String> mapOutput) {
            if (mapOutput != null) {
                courseScoreMap.putAll(mapOutput);
            }
            return true;
        }

        // Final business processing of the merged result
        public String terminate() {
            return courseScoreMap.toString();
        }
    }
}
```

Test SQL statement

```sql
select id, username, score_agg(course, score)
from student_score
group by id, username;
```
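What this query computes can be approximated in plain Java: rows are grouped by (id, username) and each group's rows are folded into one course-to-score map, the n:1 shape of a UDAF. This is a hedged sketch, not how Hive executes the query; the class `ScoreAggDemo` and the sample rows are hypothetical (the article's original input table is only available as an image), and a TreeMap is used purely so the printed order is deterministic, whereas the UDAF above uses a HashMap whose key order is not guaranteed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of:
//   select id, username, score_agg(course, score) from student_score group by id, username
final class ScoreAggDemo {

    // One row of the hypothetical student_score table.
    static final class Row {
        final String id, username, course, score;
        Row(String id, String username, String course, String score) {
            this.id = id; this.username = username; this.course = course; this.score = score;
        }
    }

    // Group key "id,username" -> aggregated course/score map, in first-seen group order.
    static Map<String, Map<String, String>> aggregate(List<Row> rows) {
        Map<String, Map<String, String>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            Map<String, String> agg = groups.computeIfAbsent(r.id + "," + r.username, k -> new TreeMap<>());
            // Same rule as iterate(): skip rows with a NULL course or score.
            if (r.course != null && r.score != null) {
                agg.put(r.course, r.score);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows.
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("1", "zhangsan", "math", "90"));
        rows.add(new Row("1", "zhangsan", "english", "85"));
        rows.add(new Row("2", "lisi", "math", "70"));
        aggregate(rows).forEach((key, agg) -> System.out.println(key + "\t" + agg));
    }
}
```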

UDTF

  • User-Defined Table-Generating Functions
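  • Solves the one-row-in, many-rows-out problem, which comes up in quite a few scenarios
  • In practice, UDTFs are rarely written for this; the problem is usually solved with LATERAL VIEW explode plus a UDF or similar, which is simpler and more direct than writing a UDTF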
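The one-to-many shape can be sketched in plain Java. The class `ExplodeDemo` and its `explode` helper are hypothetical and only approximate the effect of a query such as `select id, tag from t lateral view explode(split(tags, ',')) v as tag` (table `t` and columns `tags`/`tag` are also hypothetical). A real Hive UDTF would instead extend GenericUDTF and emit each output row through forward(); this sketch avoids Hive dependencies so it can run on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Hive dependency) of one row in, many rows out.
final class ExplodeDemo {

    // Turns one input row (id, "a,b,c") into one output row {id, tag} per element.
    static List<String[]> explode(String id, String tags) {
        List<String[]> out = new ArrayList<>();
        if (tags == null) {
            return out; // NULL input: emit no rows in this sketch
        }
        for (String tag : tags.split(",", -1)) { // -1 keeps trailing empty fields
            out.add(new String[] {id, tag});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] row : explode("1", "a,b,c")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Each input row fans out into as many output rows as it has elements, which is why the article notes that LATERAL VIEW explode usually covers this need without a custom UDTF.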


Origin www.cnblogs.com/sx66/p/12039552.html