Problems Using TreeMap in a Hive UDAF

To meet a business requirement, I wrote a custom Hive aggregate function (UDAF) that implements a custom accumulator.

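Because the values need to be sorted, I used a TreeMap. According to the template and the description in "Hadoop: The Definitive Guide", the merge() method takes one object as input, and the type of that object must match the return type of terminatePartial().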

My code snippet is as follows:

        // Accumulator state: keys are kept sorted by the TreeMap.
        private TreeMap<Long,ArrayList<Long>> operations=new TreeMap<Long,ArrayList<Long>>();

        @Override
        public void init() {
            System.out.println("----------------------init:");
            operations=new TreeMap<Long,ArrayList<Long>>();
        }

        // iterate() method omitted

        // Returns the partial aggregation result as a TreeMap.
        public TreeMap<Long,ArrayList<Long>> terminatePartial()
        {
            System.out.println("----------------------terminatePartial_operations:"+operations);
            return operations; // TreeMap
        }

        // Called to merge two partial aggregation results, comparable to reduce.
        // Merges multiple TreeMaps.
        public boolean merge(TreeMap<Long,ArrayList<Long>> other)
        {
            System.out.println("----------------------merge_other1:"+other);
            if(other==null){
                return true;
            }
            System.out.println("----------------------merge_other2:"+other);
            operations.putAll(other);
            return true;
        }

Note two places here: terminatePartial() returns a TreeMap, and merge() takes a TreeMap as its parameter.

When debugging, I got the following error:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean com.iflyzunhong.udf.SkTransExp$SkTransExpArrayUDAFEvaluator.merge(java.util.TreeMap)  on object com.iflyzunhong.udf.SkTransExp$SkTransExpArrayUDAFEvaluator@61e3cf4d of class com.iflyzunhong.udf.SkTransExp$SkTransExpArrayUDAFEvaluator with arguments {{1=[57, 1546444800000], 2=[27, 4070966400000]}:java.util.HashMap} of size 1

Note the argument type at the end of the message: java.util.HashMap.

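The error message shows that the argument actually passed to merge() is a HashMap, not a TreeMap. Both classes implement Map, so I expected them to be interchangeable, but a HashMap is not a TreeMap and cannot be passed to a parameter declared as TreeMap, so the call fails inside the UDAF. I therefore changed the parameter type of merge() to HashMap, and the test passed.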

        // Called to merge two partial aggregation results, comparable to reduce.
        // The partial result arrives as a HashMap and is merged into the TreeMap.
        public boolean merge(HashMap<Long,ArrayList<Long>> other)
        {
            System.out.println("----------------------merge_other1:"+other);
            if(other==null){
                return true;
            }
            System.out.println("----------------------merge_other2:"+other);
            operations.putAll(other);
            return true;
        }
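
The failure can be reproduced outside Hive. The sketch below assumes that Hive calls merge() through Java reflection, which the "Unable to execute method" wording suggests but which is not verified here. Evaluator is a hypothetical stand-in for the real evaluator class, and tryMerge is a helper invented for this illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ReflectiveMergeDemo {

    // Hypothetical stand-in for the UDAF evaluator; not the real Hive class.
    public static class Evaluator {
        private final TreeMap<Long, ArrayList<Long>> operations = new TreeMap<>();

        // Same signature as the failing merge() in the post.
        public boolean merge(TreeMap<Long, ArrayList<Long>> other) {
            if (other != null) {
                operations.putAll(other);
            }
            return true;
        }
    }

    // Invokes merge() reflectively and reports whether the argument was accepted.
    public static boolean tryMerge(Map<Long, ArrayList<Long>> partial) throws Exception {
        Method merge = Evaluator.class.getMethod("merge", TreeMap.class);
        try {
            merge.invoke(new Evaluator(), partial);
            return true;
        } catch (IllegalArgumentException e) {
            // Raised when the runtime type of the argument does not match
            // the declared parameter type (here: HashMap vs. TreeMap).
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("TreeMap accepted: "
                + tryMerge(new TreeMap<Long, ArrayList<Long>>()));
        System.out.println("HashMap accepted: "
                + tryMerge(new HashMap<Long, ArrayList<Long>>()));
    }
}
```

A TreeMap argument is accepted, while a HashMap argument is rejected with an argument type mismatch, which matches the behaviour seen in the Hive error.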

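In summary, judging from the error, Hive does not seem to preserve the TreeMap type when it passes the partial result from the map side to the reduce side: the value arrives as a HashMap. The parameter type of merge() therefore has to be HashMap rather than TreeMap. Because operations itself is still a TreeMap, putAll() puts the incoming entries back in key order.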
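
A minimal sketch of why ordering survives the change: the HashMap parameter carries no order, but merging it into a TreeMap field sorts the keys again. SortedMergeSketch and sortedKeys() are names made up for this illustration, and the class runs standalone, without Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.TreeMap;

public class SortedMergeSketch {

    // The accumulator stays a TreeMap, so keys are always sorted.
    private final TreeMap<Long, ArrayList<Long>> operations = new TreeMap<>();

    // Takes a HashMap, as in the working version of merge() from the post.
    public boolean merge(HashMap<Long, ArrayList<Long>> other) {
        if (other == null) {
            return true;
        }
        operations.putAll(other);
        return true;
    }

    // Hypothetical helper: exposes the keys in the TreeMap's iteration order.
    public List<Long> sortedKeys() {
        return new ArrayList<>(operations.keySet());
    }

    public static void main(String[] args) {
        SortedMergeSketch evaluator = new SortedMergeSketch();

        // Partial result in the shape seen in the error message.
        HashMap<Long, ArrayList<Long>> partial = new HashMap<>();
        partial.put(2L, new ArrayList<>(Arrays.asList(27L, 4070966400000L)));
        partial.put(1L, new ArrayList<>(Arrays.asList(57L, 1546444800000L)));

        evaluator.merge(partial);
        System.out.println(evaluator.sortedKeys()); // prints [1, 2]
    }
}
```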
