Hive Learning Series, Part 4 (UDF)

If the function's input arguments are simple data types, extend UDF directly and implement one or more evaluate methods.
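Hive chooses which evaluate overload to call by reflection, based on the argument types in the query. The real resolver also applies implicit type conversions. The JDK-only toy below shows only the exact-match part of that idea. DescribeUdf and EvaluateLookup are hypothetical names, not Hive classes.

```java
import java.lang.reflect.Method;

// Hypothetical simple UDF with two evaluate overloads, using plain Java types (no Hive/Hadoop)
class DescribeUdf {
    public String evaluate(String s) {
        return s == null ? null : "string:" + s;
    }

    public String evaluate(Integer i) {
        return i == null ? null : "int:" + i;
    }
}

// Toy lookup: picks the evaluate method whose parameter types exactly match (no type conversion)
class EvaluateLookup {
    static Object call(Object udf, Class<?>[] paramTypes, Object... args) {
        try {
            Method m = udf.getClass().getMethod("evaluate", paramTypes);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("evaluate lookup or call failed", e);
        }
    }

    public static void main(String[] args) {
        DescribeUdf udf = new DescribeUdf();
        System.out.println(call(udf, new Class<?>[] {String.class}, "abc"));  // string:abc
        System.out.println(call(udf, new Class<?>[] {Integer.class}, 42));    // int:42
    }
}
```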

The steps are as follows:

1. Implement a UDF that converts uppercase characters to lowercase.

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

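public class Lower extends UDF {
    // Hive locates this method by reflection and calls it once per row
    public Text evaluate(final Text s) {
        // Propagate SQL NULL: null in, null out
        if (s == null) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}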
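Because Text comes from Hadoop, one way to unit-test the conversion logic without a cluster is to keep it in a plain-Java helper. The sketch below (LowerLogic is a hypothetical name) mirrors the evaluate body above but passes Locale.ROOT. String.toLowerCase() with no argument uses the JVM's default locale, so under a Turkish locale "TITLE" becomes "t\u0131tle" (dotless i) rather than "title".

```java
import java.util.Locale;

// Hypothetical plain-Java helper with the same null-in/null-out contract as Lower.evaluate
class LowerLogic {
    static String lower(String s) {
        // Locale.ROOT makes the result independent of the JVM default locale
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lower("AbcDEfg"));   // abcdefg

        // A Turkish locale maps 'I' to dotless 'ı', so this is not "title"
        String turkish = "TITLE".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("title"));   // false
        System.out.println(lower("TITLE"));             // title
    }
}
```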

2. Package it as a jar.

Create a Maven project and build it with Maven.
The jar built here is hiveudf-1.0.0.jar.

3. Upload the jar to a path on HDFS.

[root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar  /user/hive/udf
18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -ls /user/hive/udf 
18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
[root@master /opt]#

4. Create the function in the Hive CLI.

add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
create temporary function lower as 'com.example.hive.udf.Lower';

hive> delete jar  hiveudf-1.0.0.jar;
hive> list jars
    > ;
hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar
    > ;
Added [/tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar] to class path
Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
hive> list jars;
/tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar
hive> create temporary function lower as 'com.example.hive.udf.Lower';
OK
Time taken: 0.594 seconds
hive> 

5. The registered function can now be used.

hive> select lower('AbcDEfg')
    > ;
OK
abcdefg
Time taken: 1.718 seconds, Fetched: 1 row(s)
hive> 


When the input arguments are complex data types such as Array, extend GenericUDF instead.
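A GenericUDF works in two phases. initialize is called once with one ObjectInspector per argument, where the UDF checks the argument types and returns the ObjectInspector of its result. evaluate is then called per row with DeferredObject arguments, whose values are only computed when get() is called, which allows lazy evaluation and short-circuiting. The JDK-only sketch below mimics that shape with hypothetical stand-in types (Deferred, MiniGenericUdf, MiniIfNull). It is not the Hive API, and it uses Class objects in place of ObjectInspectors for simplicity.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hive's DeferredObject: the value is computed only on get()
interface Deferred {
    Object get();
}

// Hypothetical stand-in for GenericUDF's two-phase contract
abstract class MiniGenericUdf {
    // Called once: validate argument types and report the result type
    abstract Class<?> initialize(Class<?>[] argTypes);

    // Called once per row
    abstract Object evaluate(Deferred[] args);
}

// nvl-like function: returns the first argument unless it is null, otherwise the second
class MiniIfNull extends MiniGenericUdf {
    @Override
    Class<?> initialize(Class<?>[] argTypes) {
        if (argTypes.length != 2 || argTypes[0] != argTypes[1]) {
            throw new IllegalArgumentException("expects 2 arguments of the same type");
        }
        return argTypes[0];
    }

    @Override
    Object evaluate(Deferred[] args) {
        Object first = args[0].get();
        // Short-circuit: the second argument is evaluated only when it is needed
        return first != null ? first : args[1].get();
    }
}

class MiniUdfDemo {
    public static void main(String[] args) {
        AtomicInteger fallbackCalls = new AtomicInteger();
        Deferred fallback = () -> {
            fallbackCalls.incrementAndGet();
            return "default";
        };

        MiniGenericUdf udf = new MiniIfNull();
        udf.initialize(new Class<?>[] {String.class, String.class});

        System.out.println(udf.evaluate(new Deferred[] {() -> "x", fallback}));   // x
        System.out.println(udf.evaluate(new Deferred[] {() -> null, fallback}));  // default
        System.out.println(fallbackCalls.get());                                   // 1
    }
}
```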

1. As before, start by writing a class, this time extending GenericUDF.

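This custom function converts a point, given by its latitude and longitude, into a string (its geohash). Although it takes two doubles rather than a complex type, it follows the same GenericUDF pattern.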

package com.zbra.udf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * GenericUDF example: geo_hash(lat, lng) returns the geohash string of a point.
 * GeoHash is a helper class that is not shown in this post.
 */
public class GeoUdf extends GenericUDF {

    private DoubleObjectInspector doubleObjectInspector01;
    private DoubleObjectInspector doubleObjectInspector02;

    // Called once: validate the arguments and declare the return type
    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length != 2) {
            throw new UDFArgumentLengthException("geo_hash takes exactly 2 arguments: double, double");
        }
        // Check that both arguments are doubles
        ObjectInspector a = objectInspectors[0];
        ObjectInspector b = objectInspectors[1];
        if (!(a instanceof DoubleObjectInspector) || !(b instanceof DoubleObjectInspector)) {
            throw new UDFArgumentException("first argument must be a double, second argument must be a double");
        }

        this.doubleObjectInspector01 = (DoubleObjectInspector) a;
        this.doubleObjectInspector02 = (DoubleObjectInspector) b;

        // The result is returned as a plain Java String
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    // Called once per row
    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        Object latObj = deferredObjects[0].get();
        Object lngObj = deferredObjects[1].get();

        // Check for SQL NULL first: DoubleObjectInspector.get() returns a primitive double
        if (latObj == null || lngObj == null) {
            return "";
        }

        double lat = this.doubleObjectInspector01.get(latObj);
        double lng = this.doubleObjectInspector02.get(lngObj);

        return new GeoHash(lat, lng).getGeoHashBase32();
    }

    @Override
    public String getDisplayString(String[] strings) {
        if (strings.length == 2) {
            return "geo_hash(" + strings[0] + ", " + strings[1] + ")";
        } else {
            return "geo_hash: wrong number of arguments";
        }
    }
}
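GeoUdf depends on a GeoHash class (new GeoHash(lat, lng).getGeoHashBase32()) that the post does not show; it may live in the same package or come from a library, but the source is not stated. For reference, a standard geohash encoder can be sketched in plain Java as below. GeoHashSketch and its precision parameter are hypothetical names. A precision of 8 characters is an assumption, chosen because it reproduces the wdpkqbtc result shown in the usage step for (12, 123).

```java
// Hypothetical stand-in for the GeoHash helper used by GeoUdf (standard geohash encoding)
class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lng, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lngRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(precision);
        boolean evenBit = true;   // geohash bits alternate, starting with longitude
        int bit = 0;
        int ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lngRange : latRange;
            double value = evenBit ? lng : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) {
                ch |= 1;          // point is in the upper half
                range[0] = mid;
            } else {
                range[1] = mid;   // point is in the lower half
            }
            evenBit = !evenBit;
            if (++bit == 5) {     // every 5 bits become one base32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(12.0, 123.0, 8));   // wdpkqbtc
    }
}
```

Because each extra character only refines the previous cell, a shorter precision yields a prefix of the longer hash (here, wdpkq for 5 characters).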

2. Package it as a jar.

In this article the jar is again named hiveudf-1.0.0.jar.

3. Upload it to the HDFS path in the same way.

[root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar  /user/hive/udf
18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -ls /user/hive/udf 
18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
[root@master /opt]#

4. Create the custom function.

hive> list jars;
/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
hive> delete jar /tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
    > ;
Deleted [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] from class path
hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
Added [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] to class path
Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
hive> create temporary function geohash as 'com.zbra.udf.GeoUdf';
OK
Time taken: 0.145 seconds

5. Usage:

hive> select geohash(12.0d, 123.0d);
OK
wdpkqbtc
Time taken: 0.8 seconds, Fetched: 1 row(s)
hive> select geohash(cast('12' as Double), cast('123' as Double));
OK
wdpkqbtc
Time taken: 0.733 seconds, Fetched: 1 row(s)
hive> 
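To sanity-check a result such as wdpkqbtc, a geohash can be decoded back into the latitude/longitude cell it names; the input point should fall inside that cell. The plain-Java sketch below is the inverse of standard geohash encoding. GeoHashDecode is a hypothetical name and is not part of the original UDF.

```java
// Hypothetical decoder: maps a geohash string back to its bounding box
class GeoHashDecode {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Returns {minLat, maxLat, minLng, maxLng}
    static double[] bounds(String hash) {
        double[] lat = {-90.0, 90.0};
        double[] lng = {-180.0, 180.0};
        boolean evenBit = true;   // the first bit refines longitude
        for (char c : hash.toCharArray()) {
            int idx = BASE32.indexOf(c);
            if (idx < 0) {
                throw new IllegalArgumentException("invalid geohash character: " + c);
            }
            // Each character carries 5 bits, most significant first
            for (int shift = 4; shift >= 0; shift--) {
                double[] range = evenBit ? lng : lat;
                double mid = (range[0] + range[1]) / 2;
                if (((idx >> shift) & 1) == 1) {
                    range[0] = mid;
                } else {
                    range[1] = mid;
                }
                evenBit = !evenBit;
            }
        }
        return new double[] {lat[0], lat[1], lng[0], lng[1]};
    }

    public static void main(String[] args) {
        double[] b = bounds("wdpkqbtc");
        boolean inside = b[0] <= 12.0 && 12.0 < b[1] && b[2] <= 123.0 && 123.0 < b[3];
        System.out.println(inside);   // true
    }
}
```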


Reposted from blog.csdn.net/weixin_42177380/article/details/90713018