Writing and Using Hive User-Defined Functions (UDFs)

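Prerequisites:

1. Hadoop 2.7.3 installed (on Linux)

2. Hive 2.3.3 installed; see: Hive installation and configuration

3. Maven installed (on Windows); see: Maven installation

4. Eclipse installed (on Windows)

UDF is short for user-defined function. Hive ships with many built-in functions, but they do not always cover what a real project needs. In those cases you can write your own function.

How do you write and use a Hive UDF?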

1. Create a Maven project named myhive

2. Edit pom.xml and add the following before the closing </dependencies> tag:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.3.3</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>

3. Create the UDF class

MyConcatString.java

4. Write the code:

package com.myhive;

import org.apache.hadoop.hive.ql.exec.UDF;

public class MyConcatString extends UDF{

	// The class must define a method named evaluate; Hive looks the method up by that name
	public String evaluate(String a,String b){
		return a+"*******"+b;
	}
}
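
One thing the class above does not handle is null input: Java string concatenation turns a null reference into the text "null". The following is a minimal sketch of a null-safe variant, written as plain Java so it can be tried without the Hive jars. The class name NullSafeConcat and the choice to return null are assumptions made for illustration, not part of the original project, and it assumes Hive hands a SQL NULL to a String parameter as a Java null.

```java
// Plain-Java sketch of a null-safe variant of evaluate.
// NullSafeConcat is a hypothetical name; it does not extend Hive's UDF class.
public class NullSafeConcat {

    // Same separator as MyConcatString in the article.
    static final String SEPARATOR = "*******";

    // Returns null when either input is null, instead of
    // concatenating the literal text "null" into the result.
    public static String evaluate(String a, String b) {
        if (a == null || b == null) {
            return null;
        }
        return a + SEPARATOR + b;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("hello", "word")); // prints hello*******word
        System.out.println(evaluate("hello", null));   // prints null
    }
}
```

To use this logic in the real UDF, the same null check would go at the top of evaluate in MyConcatString.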

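5. Package the project:

Copy the project path: right-click the project --> Properties --> Resource --> find Location and copy it, for example E:\EclipsePro\Hive\myhive

In cmd, change to the project directory.

Switch to drive E:

e:

Change to the project directory:

cd E:\EclipsePro\Hive\myhive

Run the package command:

mvn clean package

When the build succeeds, Maven prints BUILD SUCCESS and writes the jar to the project's target directory.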

6. Upload the generated jar file to a directory on Linux

WinSCP is used for the upload here.

Run ls on Linux to confirm that the upload succeeded:

$ ls

7. Start the Hive command line

$ hive

8. Add the UDF jar to Hive's classpath:

hive> add jar /home/hadoop/jarfile/myhive-0.0.1-SNAPSHOT.jar;

9. Create a temporary function from the UDF class (a temporary function exists only for the current session):

hive> create temporary function myconcat as 'com.myhive.MyConcatString';
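
The statement above gives Hive only a class name, so the class has to be loaded by name and its evaluate method found at run time. The sketch below illustrates that general idea with plain Java reflection. It is a simplified illustration, not Hive's actual resolver code; EvaluateLookupDemo, callEvaluate, and the nested stand-in class are hypothetical names introduced here.

```java
import java.lang.reflect.Method;

// Simplified illustration of "load a class by name, then call evaluate".
// This is NOT Hive's implementation; all names here are hypothetical.
public class EvaluateLookupDemo {

    // Stand-in for the article's UDF; it does not extend Hive's UDF class.
    public static class MyConcatString {
        public String evaluate(String a, String b) {
            return a + "*******" + b;
        }
    }

    // Loads the class by name, creates an instance with its no-argument
    // constructor, and invokes evaluate(String, String) on it.
    public static String callEvaluate(String className, String a, String b) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            Method method = clazz.getMethod("evaluate", String.class, String.class);
            return (String) method.invoke(instance, a, b);
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing evaluate method, or a failed call.
            throw new IllegalStateException("Could not call evaluate on " + className, e);
        }
    }

    public static void main(String[] args) {
        String name = MyConcatString.class.getName();
        System.out.println(callEvaluate(name, "hello", "hive")); // prints hello*******hive
    }
}
```

If the method were named anything other than evaluate, the lookup in this sketch would fail, which mirrors why the UDF class must use that exact method name.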

10. Prepare test data:

Create a Hive managed table named t1:

hive> create table t1(ename string, mgr string) row format delimited fields terminated by ',';

Open a new terminal and create a file named test.txt on Linux:

$ nano test.txt

Enter the following content:

hello,word
hello,hadoop
hello,hive

Save and exit.

Back in the Hive command line, load the contents of test.txt into table t1:

hive> load data local inpath '/home/hadoop/jarfile/test.txt' into table t1;

Note: /home/hadoop/jarfile/test.txt is the absolute path of test.txt; adjust it to match your environment.

View the contents of t1:

hive> select * from t1;

Output:

OK
hello   word
hello   hadoop
hello   hive
Time taken: 4.614 seconds, Fetched: 3 row(s)

11. Use the Hive UDF:

hive> select myconcat(ename,mgr) from t1;

Output:

OK
hello*******word
hello*******hadoop
hello*******hive
Time taken: 1.661 seconds, Fetched: 3 row(s)
hive>

The output shows each pair of strings joined by *******, which confirms that the Hive UDF was written and used successfully.
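
The same rows can be reproduced without a cluster, which is handy for checking the concatenation logic before packaging. The sketch below is plain Java under stated assumptions: LocalConcatCheck and apply are hypothetical names, the comma split only approximates the table's "fields terminated by ','" setting, and a missing second field is treated as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local approximation of steps 10 and 11; it does not touch Hive at all.
// LocalConcatCheck and apply are hypothetical names used for illustration.
public class LocalConcatCheck {

    // Same logic as evaluate in the article's MyConcatString.
    static String evaluate(String a, String b) {
        return a + "*******" + b;
    }

    // Splits each line on commas, takes the first two fields as
    // (ename, mgr), and applies evaluate to them.
    public static List<String> apply(List<String> lines) {
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            String ename = fields[0];
            // Assumption: a line with no second field yields a null mgr.
            String mgr = fields.length > 1 ? fields[1] : null;
            results.add(evaluate(ename, mgr));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello,word", "hello,hadoop", "hello,hive");
        for (String row : apply(lines)) {
            System.out.println(row);
        }
    }
}
```

Run with the three lines from test.txt, this prints the same three rows that the Hive query returned.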

Done! Enjoy it!