Hive: A UDF That Computes the Zodiac Sign from a Date

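Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/l1028386804/article/details/88530022

When reposting, please cite the source: https://blog.csdn.net/l1028386804/article/details/88530022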

Below is a sample data set. Save it in the user's home directory (here /root) in a file named littlebigdata.txt:

edward capriolo,[email protected],1981-02-12,209.191.139.200,M,10
bob,[email protected],2004-10-10,10.10.10.1,M,50
sara connor,[email protected],1974-05-04,64.64.5.1,F,2

Create a table named littlebigdata and load the sample data into it:

create table if not exists littlebigdata(
name string,
email string,
bday string,
ip string,
gender string,
anum int)
row format delimited fields terminated by ',';

load data local inpath '/root/littlebigdata.txt' into table littlebigdata;
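
To make the effect of "row format delimited fields terminated by ','" concrete, here is a minimal Java sketch of how one line of the file maps onto the six declared columns. It is only an approximation for illustration, not Hive's actual SerDe code: the class DelimitedRowSketch and its methods are hypothetical, the e-mail address is a placeholder, and the handling of missing fields and of non-numeric anum values (both become null) reflects my understanding of Hive's default behaviour rather than anything stated above.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; Hive does this work inside its SerDe.
final class DelimitedRowSketch {
    // Column order taken from the create table statement above.
    static final String[] COLUMNS = {"name", "email", "bday", "ip", "gender", "anum"};

    // Splits one line on ',' and pads or truncates the result to six columns.
    static String[] split(String line) {
        String[] fields = line.split(",", -1);         // -1 keeps trailing empty fields
        return Arrays.copyOf(fields, COLUMNS.length);  // missing columns become null
    }

    // anum is declared as int; assumption: an unparsable value reads back as null.
    static Integer anum(String[] row) {
        if (row[5] == null) {
            return null;
        }
        try {
            return Integer.valueOf(row[5]);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Placeholder e-mail address, not taken from the sample file.
        String[] row = split("bob,bob@example.com,2004-10-10,10.10.10.1,M,50");
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.println(COLUMNS[i] + " = " + row[i]);
        }
        System.out.println("anum as int = " + anum(row));
    }
}
```

Because the table definition declares no quoting or escape character, a comma inside a value (for example in a name) would shift every later column, so the sample data deliberately avoids commas within fields.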

Next, write the Java class. Given a date, it returns the corresponding zodiac sign as a string:

package com.lyz.hadoop.hive.udf;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
 
import java.util.Date;
 
/**
 * Takes a date string in yyyy-MM-dd format and returns the corresponding zodiac sign.
 *
 * @author liuyazhuang
 */
@Description(name = "zodiac_cn"
        , value = "_FUNC_(date) - from the input date string or separate month and day arguments, returns the sign of the Zodiac."
        , extended = "Example:\n > select _FUNC_(date_string) from src;\n > select _FUNC_(month, day) from src;")
public class UDFZodiacSignCn extends UDF {
    // The input date format is fixed: yyyy-MM-dd
    public final static DateTimeFormatter DEFAULT_DATE_FORMATTER = DateTimeFormat.forPattern("yyyy-MM-dd");
   
    private Text result = new Text();
 
    public UDFZodiacSignCn() {
    }
 
    public Text evaluate(Text birthday) {
        DateTime dateTime = null;
        try {
            dateTime = DateTime.parse(birthday.toString(), DEFAULT_DATE_FORMATTER);
        } catch (Exception e) {
            return null;
        }
 
        return evaluate(dateTime.toDate());
    }
 
    public Text evaluate(Date birthday) {
        DateTime dateTime = new DateTime(birthday);
        return evaluate(new IntWritable(dateTime.getMonthOfYear()), new IntWritable(dateTime.getDayOfMonth()));
    }
 
    public Text evaluate(IntWritable month, IntWritable day) {
        result.set(getZodiac(month.get(), day.get()));
        return result;
    }
 
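    private String getZodiac(int month, int day) {
        // Capricorn, Aquarius, Pisces, Aries, Taurus, Gemini, Cancer, Leo,
        // Virgo, Libra, Scorpio, Sagittarius
        String[] zodiacArray = {"魔羯座", "水瓶座", "双鱼座", "白羊座", "金牛座", "双子座", "巨蟹座", "狮子座", 
                                 "处女座", "天秤座", "天蝎座", "射手座"};
        // Split days: the last day of each month (January to December) that still belongs to the earlier sign
        int[] splitDay = {19, 18, 20, 20, 20, 21, 22, 22, 22, 22, 21, 21};
        int index = month;
        // On or before the split day, step back one sign; otherwise keep the index,
        // wrapping from late December back to the first entry
        if (day <= splitDay[month - 1]) {
            index = index - 1;
        } else if (month == 12) {
            index = 0;
        }
        // Return the zodiac sign at that index
        return zodiacArray[index];
    }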
    
    public static void main(String[] args) {
        UDFZodiacSignCn udfZodiacSignCn = new UDFZodiacSignCn();
        System.out.println("1990-11-02:     " + udfZodiacSignCn.evaluate(new Text("1990-11-02")));
        // A malformed date returns null
        System.out.println(udfZodiacSignCn.evaluate(new Text("19901102")));
        System.out.println("2000-11-02:     " + udfZodiacSignCn.evaluate(new Text("2000-11-02")));
        System.out.println("2000-01-02:     " + udfZodiacSignCn.evaluate(new Text("2000-01-02")));
    }
}
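
The split-day lookup is easy to get wrong at the month boundaries, so here is a small stand-alone sketch that can be run without Hive on the classpath. It is not the UDF itself. It assumes java.time in place of Joda-Time, returns English sign names, and returns null for out-of-range month or day values (the UDF above does not validate them). The class name ZodiacSketch and its methods are made up for this illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical stand-alone class; not part of the UDF above.
final class ZodiacSketch {
    // "uuuu" rather than "yyyy": strict resolution needs a year field without an era.
    private static final DateTimeFormatter ISO_DAY =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Same order as zodiacArray in the UDF, in English.
    private static final String[] SIGNS = {
        "Capricorn", "Aquarius", "Pisces", "Aries", "Taurus", "Gemini",
        "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius"};

    // Last day of each month (January to December) that still belongs to the earlier sign.
    private static final int[] SPLIT_DAY = {19, 18, 20, 20, 20, 21, 22, 22, 22, 22, 21, 21};

    static String zodiac(int month, int day) {
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            return null; // assumption: invalid input yields null instead of an exception
        }
        // On or before the split day: the sign that began in the previous month.
        // After it: the next sign, wrapping from December back to Capricorn.
        int index = day <= SPLIT_DAY[month - 1] ? month - 1 : month % 12;
        return SIGNS[index];
    }

    static String zodiac(String isoDate) {
        if (isoDate == null) {
            return null;
        }
        try {
            LocalDate date = LocalDate.parse(isoDate, ISO_DAY);
            return zodiac(date.getMonthValue(), date.getDayOfMonth());
        } catch (DateTimeParseException e) {
            return null; // malformed or impossible dates map to null, as in the UDF
        }
    }

    public static void main(String[] args) {
        System.out.println("1981-02-12: " + zodiac("1981-02-12"));
        System.out.println("19901102:   " + zodiac("19901102"));
        System.out.println("12/21:      " + zodiac(12, 21));
        System.out.println("12/22:      " + zodiac(12, 22));
    }
}
```

The expression month % 12 plays the role of the "else if (month == 12)" branch in getZodiac: for January through November it leaves the index unchanged, and for December it wraps to 0.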

Note: I created this as a Maven project, with the following dependencies:

<dependencies>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-common</artifactId>
		<version>2.9.2</version>
	</dependency>
	
	<dependency>
		<groupId>org.apache.hive</groupId>
		<artifactId>hive-exec</artifactId>
		<version>3.1.1</version>
	</dependency>
</dependencies>

After writing the class, export it from Eclipse as udf.jar.

Upload the jar to the /usr/local/src directory on the server.
Then run the following in the Hive command line:

hive> add jar /usr/local/src/udf.jar;
hive> create temporary function zodiac as 'com.lyz.hadoop.hive.udf.UDFZodiacSignCn';

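Note the temporary keyword in the create function statement. A function declared this way is valid only in the current session, so you have to add the jar and create the function again in every session. If you use the same jar and function frequently, you can instead put these statements in the $HOME/.hiverc file.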

hive> describe function zodiac;
OK
zodiac(date) - from the input date string or separate month and day arguments, returns the sign of the Zodiac.
Time taken: 0.017 seconds, Fetched: 1 row(s)

hive> describe function extended zodiac;
OK
zodiac(date) - from the input date string or separate month and day arguments, returns the sign of the Zodiac.
Example:
 > select zodiac(date_string) from src;
 > select zodiac(month, day) from src;
Function class:com.lyz.hadoop.hive.udf.UDFZodiacSignCn
Function type:TEMPORARY
Time taken: 0.017 seconds, Fetched: 6 row(s)

hive> select name, bday, zodiac(bday) from littlebigdata;
OK
edward capriolo 1981-02-12      水瓶座
bob     2004-10-10      天秤座
sara connor     1974-05-04      金牛座
Time taken: 0.137 seconds, Fetched: 3 row(s)

When you are done with the custom UDF, you can drop the function with the following command:

hive> drop temporary function if exists zodiac;
