Hive Custom Functions: UDF, UDAF, and UDTF

Hive lets users write custom functions to implement logic that the built-in functions cannot handle. A function registered with CREATE TEMPORARY FUNCTION is only valid for the current session, so the registration is typically bundled with the queries that use it and executed together, for example from a shell script that invokes the hive command.
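As a minimal sketch of that pattern (the file, jar path, function, and class names here are all hypothetical), the whole lifecycle of register, use, and drop can live in one HiveQL script that a shell script runs with hive -f:

    -- udf_job.sql (hypothetical); a shell script would run: hive -f udf_job.sql
    add jar /path/to/your_udf.jar;
    CREATE TEMPORARY FUNCTION my_func AS 'com.example.MyUDF';
    SELECT my_func(some_col) FROM some_table;   -- use the function
    DROP TEMPORARY FUNCTION my_func;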

  • UDF

Takes one row of input and produces one row of output. 
Problem 
Compare whether two comma-separated ID strings are the same. 
Usage 
If ignoreNullFlag is 1, two blank strings are treated as equal; if it is not 1, they are treated as not equal (see the spot checks after the commands below).

 
    add jar /home/mart_wzyf/zhuhongmei/plist_udf_udaf.jar;
    CREATE TEMPORARY FUNCTION compareStringBySplit AS 'com.jd.plist.udf.TestUDF';
    SELECT compareStringBySplit("22,11,33", "11,33,22", 1) FROM scores;
    DROP TEMPORARY FUNCTION compareStringBySplit;
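For intuition, here are a few hedged spot checks (expected results in the comments; assumes a Hive version that allows SELECT without a FROM clause):

    SELECT compareStringBySplit('22,11,33', '11,33,22', 1);  -- 1: same ID set, order ignored
    SELECT compareStringBySplit('', '', 1);                  -- 1: both blank, ignoreNullFlag = 1
    SELECT compareStringBySplit('', '', 0);                  -- 0: both blank, ignoreNullFlag != 1
    SELECT compareStringBySplit('11', '11,22', 1);           -- 0: ID sets differ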

In the Java code, the class must extend UDF and implement at least one evaluate method (Hive finds evaluate methods by reflection, so the method may be overloaded).

 
    package com.jd.plist.udf;

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.commons.lang.StringUtils;
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class TestUDF extends UDF {

        private static final int MATCH = 1;
        private static final int NOT_MATCH = 0;

        /**
         * Takes three arguments.
         *
         * @param aids           first comma-separated ID list
         * @param bids           second comma-separated ID list
         * @param ignoreNullFlag if 1, two blank inputs count as a match
         * @return MATCH (1) or NOT_MATCH (0)
         */
        public int evaluate(String aids, String bids, int ignoreNullFlag) {
            boolean aBlank = StringUtils.isBlank(aids);
            boolean bBlank = StringUtils.isBlank(bids);
            if (aBlank && bBlank) {
                // Both inputs are blank: equal only when the flag is set.
                return ignoreNullFlag == 1 ? MATCH : NOT_MATCH;
            }
            if (aBlank || bBlank) {
                // Exactly one input is blank.
                return NOT_MATCH;
            }
            // Order-insensitive comparison: the two lists match only when
            // they contain exactly the same set of IDs.
            Set<String> aidSet = new HashSet<String>(Arrays.asList(aids.split(",")));
            Set<String> bidSet = new HashSet<String>(Arrays.asList(bids.split(",")));
            return aidSet.equals(bidSet) ? MATCH : NOT_MATCH;
        }
    }


  • UDAF 
    Takes multiple rows of input and produces one row of output; typically used with GROUP BY. 
    Problem 
    Hand-roll an aggregate that joins the child SKU IDs under the same main SKU ID with commas. 
    Usage (a worked sketch follows the commands below)
 
    add jar /home/mart_wzyf/zhuhongmei/plist_udf_udaf-0.0.1.jar;
    CREATE TEMPORARY FUNCTION concat_sku_id AS 'com.jd.plist.udaf.TestUDAF';
    select concat_sku_id(item_sku_id, ',') from app.app_cate3_sku_info where dt = sysdate(-1) and item_third_cate_cd = 870 group by main_sku_id;
    DROP TEMPORARY FUNCTION concat_sku_id;
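To make the behavior concrete, here is a hedged sketch against a hypothetical table t(main_sku_id, item_sku_id):

    -- t contains the rows: (100, 'a'), (100, 'b'), (100, 'c'), (200, 'x')
    SELECT concat_sku_id(item_sku_id, ',') FROM t GROUP BY main_sku_id;
    -- expected output (concatenation order within a group is not guaranteed):
    -- a,b,c
    -- x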

Java code 
The evaluator must implement the functions init, iterate, terminatePartial, merge, and terminate. 
init initializes the state; iterate processes each input row; terminatePartial returns the intermediate result built by iterate; merge combines those partial results; terminate returns the final value.

 
    package com.jd.plist.udaf;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

    public class TestUDAF extends UDAF {

        public static class TestUDAFEvaluator implements UDAFEvaluator {

            public static class PartialResult {
                String skuids;
                String delimiter;
            }

            private PartialResult partial;

            public void init() {
                partial = null;
            }

            // Called once per input row; null SKU IDs are skipped.
            public boolean iterate(String item_sku_id, String deli) {
                if (item_sku_id == null) {
                    return true;
                }
                if (partial == null) {
                    partial = new PartialResult();
                    partial.skuids = "";
                    partial.delimiter = (deli == null || deli.equals("")) ? "," : deli;
                }
                if (partial.skuids.length() > 0) {
                    partial.skuids = partial.skuids.concat(partial.delimiter);
                }
                partial.skuids = partial.skuids.concat(item_sku_id);
                return true;
            }

            // Returns the intermediate state accumulated so far.
            public PartialResult terminatePartial() {
                return partial;
            }

            // Combines a partial result produced by another task.
            public boolean merge(PartialResult other) {
                if (other == null) {
                    return true;
                }
                if (partial == null) {
                    partial = new PartialResult();
                    partial.skuids = other.skuids;
                    partial.delimiter = other.delimiter;
                } else {
                    if (partial.skuids.length() > 0) {
                        partial.skuids = partial.skuids.concat(partial.delimiter);
                    }
                    partial.skuids = partial.skuids.concat(other.skuids);
                }
                return true;
            }

            // Returns the final value; null when no non-null input was seen.
            public String terminate() {
                return partial == null ? null : partial.skuids;
            }
        }
    }

  • UDTF 
    A UDTF turns one input row into multiple output rows. 
    Purpose 
    Split a string such as key1:20;key2:30;key3:40 into rows on the semicolons and into columns on the colons. 
    Usage (the expected output is sketched after the commands below)
 
    add jar /home/mart_wzyf/zhuhongmei/plist_udf_udaf-0.0.3.jar;
    CREATE TEMPORARY FUNCTION explode_map AS 'com.jd.plist.udtf.TestUDTF';
    select explode_map(mapstrs) as (col1,col2) from app.app_test_zhuzhu_maps;
    DROP TEMPORARY FUNCTION explode_map;
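As a sketch of the expected result, assuming app.app_test_zhuzhu_maps holds a single row whose mapstrs is 'key1:20;key2:30;key3:40':

    select explode_map(mapstrs) as (col1, col2) from app.app_test_zhuzhu_maps;
    -- expected output:
    -- key1  20
    -- key2  30
    -- key3  40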

Java code 
initialize validates the arguments and declares the output schema; process handles each input row and builds the results; forward emits each output row.

 
    package com.jd.plist.udtf;

    import java.util.ArrayList;

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    public class TestUDTF extends GenericUDTF {

        @Override
        public void close() throws HiveException {
            // Nothing to clean up.
        }

        @Override
        public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
            }
            if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                throw new UDFArgumentException("ExplodeMap takes string as a parameter");
            }

            // Declare the output schema: two string columns, col1 and col2.
            ArrayList<String> fieldNames = new ArrayList<String>();
            ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
            fieldNames.add("col1");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            fieldNames.add("col2");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }

        @Override
        public void process(Object[] args) throws HiveException {
            // Split rows on ';' and columns on ':', emitting one output row
            // per well-formed key:value entry.
            String input = args[0].toString();
            String[] entries = input.split(";");
            for (String entry : entries) {
                String[] result = entry.split(":");
                if (result.length == 2) {
                    forward(result);
                }
                // Malformed entries are silently skipped.
            }
        }
    }

Note on using UDTFs 
A UDTF can be used in two ways: directly in the SELECT clause, or together with LATERAL VIEW. 
1: Directly in SELECT

    select explode_map(properties) as (col1,col2) from src;

Other columns cannot be selected alongside it; the following is invalid:

    select a, explode_map(properties) as (col1,col2) from src

Nested calls are not allowed:

    select explode_map(explode_map(properties)) from src

It cannot be used together with GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY:

    select explode_map(properties) as (col1,col2) from src group by col1, col2

2: Together with LATERAL VIEW

    select src.id, mytable.col1, mytable.col2 from src lateral view explode_map(properties) mytable as col1, col2;

This form is more convenient for everyday use. Execution is roughly equivalent to running the extraction as two separate passes and then unioning the results into one table.
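As a hedged illustration, assume src holds one row with id = 1 and properties = 'key1:20;key2:30':

    select src.id, mytable.col1, mytable.col2
    from src lateral view explode_map(properties) mytable as col1, col2;
    -- expected output:
    -- 1  key1  20
    -- 1  key2  30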

Reprinted from blog.csdn.net/u011500419/article/details/88562192