Hive UDF functions

1. UDF definition

  • UDF stands for User-Defined Function, i.e. a function you define yourself in Hive. The functions that ship with Hive cannot fully cover every business need, so sometimes we have to write our own.

UDF classification

  1. UDF: one in, one out; maps one row to one row. These are row-level operations, e.g. the upper and substr functions.
  2. UDAF (user-defined aggregate function): many in, one out; maps many rows to one row, i.e. an aggregation, e.g. sum and min.
  3. UDTF (user-defined table-generating function): one in, many out; maps one row to multiple rows, e.g. lateral view with explode.

Of these three categories, in this post we only write a UDF (the one-in-one-out kind).

2. Writing a UDF function

(1) pom configuration file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.wsk.bigdata</groupId>
  <artifactId>g6-hadoop</artifactId>
  <version>1.0</version>
  <name>g6-hadoop</name>

  <properties>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
    <hive.version>1.1.0-cdh5.7.0</hive.version>
  </properties>

  <!-- Add the CDH repository -->
  <repositories>
    <repository>
      <id>nexus-aliyun</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </repository>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- Hadoop dependency -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <!-- Hive dependency -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

(2) Write the UDF function
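
Below is a minimal sketch of what such a class can look like. It is based on the class path org.apache.hadoop.hive.ql.udf.HelloUDF and the Hello:17 output used later in this post, and it assumes the simple (old-style) UDF API, org.apache.hadoop.hive.ql.exec.UDF, that ships with hive-exec; the exact message string is an illustrative assumption.

package org.apache.hadoop.hive.ql.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

// A one-in-one-out UDF: extend UDF and provide an evaluate() method,
// which Hive resolves by reflection and calls once per row.
@Description(name = "HelloUDF", value = "_FUNC_(str) - returns Hello:str")
public class HelloUDF extends UDF {

    public String evaluate(String input) {
        // Assumed output format, matching the Hello:17 result shown further down
        return "Hello:" + input;
    }
}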

 

(3) jar packaging
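
With the pom above and the standard Maven project layout, the jar can be built by running mvn clean package in the project root; given the artifactId and version declared in the pom, the build is expected to produce target/g6-hadoop-1.0.jar, which is the file uploaded in the next step.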

 

 

 

(4) Upload the jar package

 [hadoop@hadoop001 lib]$ rz

[hadoop@hadoop001 lib]$ ll  g6-hadoop-1.0.jar

-rw-r--r--. 1 hadoop hadoop 11447 Apr 19 2019 g6-hadoop-1.0.jar

 

Note: if the jar is placed under the $HIVE_HOME/lib/ directory, you do not need to run the add jar step below.

Add the jar package to Hive:

Syntax: add jar <directory where the jar lives>/<jar file name>;

hive> add jar /home/hadoop/data/hive/g6-hadoop-1.0.jar;

(5) Create the UDF function in Hive

Create a temporary function ----- it is only visible in the current session (the current terminal window)

Syntax:

CREATE TEMPORARY FUNCTION function_name AS 'class_name';

function_name: the name of the function

class_name: the full class path, i.e. package name + class name (the package declared on the first line of your UDF source file, followed by a dot and the class name)

 Example:

hive> CREATE TEMPORARY FUNCTION HelloUDF AS 'org.apache.hadoop.hive.ql.udf.HelloUDF';
OK
Time taken: 0.485 seconds

hive> show functions;    [HelloUDF now appears in the list]

 

Test:

hive> SELECT HelloUDF('17');
OK
Hello:17

# Check the metadata in MySQL: a temporary function leaves no related metadata behind
mysql> select * from funcs;
Empty set (0.11 sec)

 

Delete the temporary function:

  • Syntax: DROP TEMPORARY FUNCTION [IF EXISTS] function_name;

Test:

hive> DROP TEMPORARY FUNCTION IF EXISTS HelloUDF;
OK
Time taken: 0.003 seconds

hive> SELECT HelloUDF('17');
FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'HelloUDF'

## Actually it does not matter if you never drop it; open a new window and it is gone

Create a permanent function

Syntax:

CREATE FUNCTION function_name AS 'class_name' USING JAR 'path';

function_name: the function name

class_name: the class path, i.e. package name + class name

path: the path of the jar on HDFS

Upload the jar to the designated HDFS directory

[hadoop@hadoop hive-1.1.0-cdh5.7.0]$ hadoop fs -mkdir /lib
[hadoop@hadoop hive-1.1.0-cdh5.7.0]$ hadoop fs -put /home/hadoop/data/hive/hive_UDF.jar /lib/

[hadoop@hadoop001 ~]$ hadoop fs -mkdir /lib
[hadoop@hadoop001 ~]$ hadoop fs -ls /lib
[hadoop@hadoop001 ~]$ hadoop fs -put ~/lib/g6-hadoop-1.0.jar  /lib/    # upload the local jar to the /lib/ directory on HDFS
[hadoop@hadoop001 ~]$ hadoop fs -ls /lib

Create the permanent UDF function

CREATE FUNCTION HelloUDF AS 'org.apache.hadoop.hive.ql.udf.HelloUDF'
USING JAR 'hdfs://hadoop001:9000/lib/g6-hadoop-1.0.jar';

# Test
hive> select HelloUDF("17");
OK
hello:17

Check the metadata in MySQL: the function's information has now been registered in the metastore.

mysql> select * from funcs;
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME                             | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
|       1 | org.apache.hadoop.hive.ql.udf.HelloUDF |  1555263915 |     6 | HelloUDF  |         1 | NULL       | USER       |
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
  • Once created, a permanent function can be used from any window, and it is still available after a restart.

Reference: the official Hive documentation, LanguageManual UDF


Origin www.cnblogs.com/guoyu1/p/12505506.html