Writing Impala/Hive UDFs


1. Overview

    Writing an Impala UDF and writing a Hive UDF are essentially the same task; broadly, there are two ways to add a UDF:

    (1) Write a Hive UDF, then log in to impala-shell and run invalidate metadata;

    (2) Create the UDF directly in Impala, specifying the location of the UDF's jar and the return type.

2. Writing a Hive UDF (created as a permanent function, yet it is gone once the session ends; in effect still temporary)

Minimum Hive version: 0.13.0 (the CREATE FUNCTION ... USING JAR syntax for permanent functions was added in 0.13.0)

  2.1 Write the Java class behind the UDF; this post uses MD5 hashing as the example.

package com.nanine.md5.utils;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hive UDF that returns the Base64-encoded MD5 digest of a string.
 */
public class Md5Utils extends UDF {

    /** Computes the MD5 digest of str and encodes it as Base64. */
    public String getMd5Str(String str) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // java.util.Base64 (Java 8+) replaces the non-public sun.misc.BASE64Encoder,
        // which is unavailable on newer JDKs; output is identical for a 16-byte digest
        return Base64.getEncoder().encodeToString(md5.digest(str.getBytes(StandardCharsets.UTF_8)));
    }

    /**
     * Hive calls evaluate() once per input row.
     *
     * @param s the input string
     * @return the Base64-encoded MD5 digest of s, or null if s is null
     */
    public Text evaluate(Text s) throws NoSuchAlgorithmException {
        if (s == null) { return null; }
        return new Text(getMd5Str(s.toString()));
    }
}
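
Optionally, a quick local sanity check before packaging (a sketch; Md5UtilsTest is an illustrative helper class, and the hadoop-common and hive-exec jars must be on the classpath):

package com.nanine.md5.utils;

import org.apache.hadoop.io.Text;

public class Md5UtilsTest {
    public static void main(String[] args) throws Exception {
        Md5Utils udf = new Md5Utils();
        // MD5("hello") is 5d41402abc4b2a76b9719d911017c592 in hex,
        // i.e. XUFAKrxLKna5cZ2REBfFkg== in Base64
        System.out.println(udf.evaluate(new Text("hello")));
        System.out.println(udf.evaluate(null)); // should print null
    }
}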

2.2 Package the class into a jar: md5Utils.jar
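
For example, compiled and packaged from the directory containing the com/ source tree (a sketch; /path/to/hive-exec.jar is a placeholder for wherever your Hive installation keeps hive-exec):

    javac -cp "$(hadoop classpath):/path/to/hive-exec.jar" com/nanine/md5/utils/Md5Utils.java
    jar cf md5Utils.jar com/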

2.3  Put the jar under an HDFS path (or in some local directory): hdfs dfs -put md5Utils.jar /
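
Verify the upload (optional):

    hdfs dfs -ls /md5Utils.jar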

2.4  Add the jar to the classpath:

        add jar hdfs:///md5Utils.jar;
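
Hive's list jars; command shows the resources registered in the current session, which is a quick way to confirm the previous step:

    list jars;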

2.5  Create the Hive md5 function:

  CREATE FUNCTION default.mymd5 AS 'com.nanine.md5.utils.Md5Utils' USING JAR 'hdfs:///md5Utils.jar';
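
Optionally confirm the function is registered before testing:

    show functions;                       -- mymd5 should now appear in the list
    describe function default.mymd5;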

2.6 Run reload function; (lets other running Hive sessions pick up the newly created function);

2.7 Test in Hive:

hive>  select mymd5(msisdn) from rong_getCustomers_data  limit 10;
converting to local hdfs:///md5Utils.jar
Added [/tmp/8315888d-2385-450a-89cf-698b692b38e9_resources/md5Utils.jar] to class path
Added resources: [hdfs:///md5Utils.jar]
OK
dSbzJoHdEK57WxGJEdrG4w==
WhUTClF0lDLiJTp6iKxXqg==
aFxT/6wJBnk0nBKRNJ7zMA==
G4oo6rN/GFCsOd4hzEy5fQ==
iYgsUMJyTvJ42EiJRDWPYg==
I+kU50AcLvo5aXrN0+X+LQ==
vJSo7sXzXpULVX7Gj0RySA==
0AMIyc3+1bH/dm88vuCzFw==
F9VqTHOFr596cGEq4vJu/w==
6vSFVDrim8UM4Mxh4617Yw==
Time taken: 1.426 seconds, Fetched: 10 row(s)

2.8 Test in Impala:

    invalidate metadata;

[evercloud113:21000] > select mymd5(msisdn) from rong_getCustomers_data  limit 10;
Query: select mymd5(msisdn) from rong_getCustomers_data  limit 10
+--------------------------+
| default.mymd5(msisdn)    |
+--------------------------+
| dSbzJoHdEK57WxGJEdrG4w== |
| WhUTClF0lDLiJTp6iKxXqg== |
| aFxT/6wJBnk0nBKRNJ7zMA== |
| G4oo6rN/GFCsOd4hzEy5fQ== |
| iYgsUMJyTvJ42EiJRDWPYg== |
| I+kU50AcLvo5aXrN0+X+LQ== |
| vJSo7sXzXpULVX7Gj0RySA== |
| 0AMIyc3+1bH/dm88vuCzFw== |
| F9VqTHOFr596cGEq4vJu/w== |
| 6vSFVDrim8UM4Mxh4617Yw== |
+--------------------------+
Fetched 10 row(s) in 4.95s

Reference: the Hive official documentation.

3. Writing an Impala UDF

Minimum version: 1.2 (Impala support for UDFs is available in Impala 1.2 and higher)

3.1  Perform steps 2.1-2.3;

3.2  Create the md5 function:

     create function md5(string) returns string location '/md5Utils.jar' symbol='com.nanine.md5.utils.Md5Utils';
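
Optionally confirm in impala-shell that the function was created:

    show functions;    -- the new function should be listed with a STRING -> STRING signature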

3.3 Query:

[evercloud113:21000] > select md5(msisdn) from rong_getCustomers_data  limit 10;  
Query: select md5(msisdn) from rong_getCustomers_data  limit 10
+----------------------------------+
| default.md5(msisdn)              |
+----------------------------------+
| 7526f32681dd10ae7b5b118911dac6e3 |
| 5a15130a51749432e2253a7a88ac57aa |
| 685c53ffac090679349c1291349ef330 |
| 1b8a28eab37f1850ac39de21cc4cb97d |
| 89882c50c2724ef278d8488944358f62 |
| 23e914e7401c2efa39697acdd3e5fe2d |
| bc94a8eec5f35e950b557ec68f447248 |
| d00308c9cdfed5b1ff766f3cbee0b317 |
| 17d56a4c7385af9f7a70612ae2f26eff |
| eaf485543ae29bc50ce0cc61e3ad7b63 |
+----------------------------------+
Fetched 10 row(s) in 0.16s

The function keeps working if you kill the impala client and then start impala again;

However, after the Impala server restarts, the jar location must be specified again; it is therefore best to have a script re-create and execute the function once in advance, as sketched below.
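
A minimal warm-up sketch along those lines (assuming impala-shell is on the PATH; the host and function definition are taken from the examples above):

    #!/bin/bash
    # Re-register the UDF and run it once so it is available after an Impala restart
    impala-shell -i evercloud113:21000 -q "create function if not exists md5(string) returns string location '/md5Utils.jar' symbol='com.nanine.md5.utils.Md5Utils'"
    impala-shell -i evercloud113:21000 -q "select md5('warmup')"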

Reference: the Impala official documentation.




