1. UDF definition
- UDF (User-Defined Function): a function written by the user. Hive ships with built-in functions, but they cannot fully cover every business need; when they fall short, we need to write a custom function.
UDF classification
- UDF: one-to-one; one row in, one row out. A row-level operation, e.g. the upper and substr functions.
- UDAF (User-Defined Aggregate Function): many-to-one; many rows in, one row out. An aggregate operation, e.g. sum / min.
- UDTF (User-Defined Table-Generating Function): one-to-many; one row in, many rows out, e.g. explode used with lateral view.
Of these three categories, this article only covers writing a UDF (by extending the UDF class).
2. Writing a UDF function
(1) pom configuration file
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.wsk.bigdata</groupId>
  <artifactId>g6-hadoop</artifactId>
  <version>1.0</version>
  <name>g6-hadoop</name>

  <properties>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
    <hive.version>1.1.0-cdh5.7.0</hive.version>
  </properties>

  <!-- Add the CDH repository -->
  <repositories>
    <repository>
      <id>nexus-aliyun</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </repository>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- Hadoop dependency -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <!-- Hive dependency -->
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```
(2) Write the UDF function
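The article does not show the source of the HelloUDF class that is registered and called later. A minimal sketch of what it might look like follows; the class body is an assumption based only on the sample output (input "17" produces "Hello:17"). In a real Hive UDF the class would live in a package such as org.apache.hadoop.hive.ql.udf, extend org.apache.hadoop.hive.ql.exec.UDF from the hive-exec dependency in the pom above, and typically use org.apache.hadoop.io.Text; those Hive types are left as comments here so the sketch compiles standalone.

```java
// Hypothetical sketch of HelloUDF (not the article's actual source).
// In a real Hive UDF this class would declare:
//   package org.apache.hadoop.hive.ql.udf;
// and extend org.apache.hadoop.hive.ql.exec.UDF (hive-exec dependency),
// usually taking/returning org.apache.hadoop.io.Text instead of String.
public class HelloUDF /* extends UDF */ {

    // Hive finds a public method named evaluate() by reflection and
    // calls it once per input row (the one-to-one UDF mapping).
    public String evaluate(String input) {
        return "Hello:" + input;
    }

    public static void main(String[] args) {
        // Local smoke test, mirroring: hive> SELECT HelloUDF('17');
        System.out.println(new HelloUDF().evaluate("17")); // prints Hello:17
    }
}
```

Because Hive resolves evaluate() by reflection, you can overload it (e.g. one variant per argument type) and Hive will pick the matching signature at query time.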
(3) Package the jar (e.g. run mvn clean package in the project root)
(4) Upload the jar package
```
[hadoop@hadoop001 lib]$ rz
[hadoop@hadoop001 lib]$ ll g6-hadoop-1.0.jar
-rw-r--r--. 1 hadoop hadoop 11447 Apr 19 2019 g6-hadoop-1.0.jar
```
Note: if the jar is uploaded under the $HIVE_HOME/lib/ directory, you do not need to run the add jar step below.
Add the jar to Hive
Syntax: add jar <directory containing the jar>/<jar name>;
hive> add jar /home/hadoop/data/hive/g6-hadoop-1.0.jar;
(5) Create the UDF function in Hive
Creating a temporary function ----- valid only in the current Hive session
Syntax: CREATE TEMPORARY FUNCTION function_name AS class_name; where function_name is the name of the function and class_name is the class path of your UDF class, i.e. package name + class name (the package declared on the first line of your UDF source, then a dot, then the class name).
Example:
```
hive> CREATE TEMPORARY FUNCTION HelloUDF AS 'org.apache.hadoop.hive.ql.udf.HelloUDF';
OK
Time taken: 0.485 seconds
hive> show functions;
```
HelloUDF now appears in the function list.
```
hive> SELECT HelloUDF('17');
OK
Hello:17
```
Check the metadata in MySQL: a temporary function registers no metadata.
```
mysql> select * from funcs;
Empty set (0.11 sec)
```
Delete the temporary function:
- Syntax: DROP TEMPORARY FUNCTION [IF EXISTS] function_name;
Test:
```
hive> DROP TEMPORARY FUNCTION IF EXISTS HelloUDF;
OK
Time taken: 0.003 seconds
hive> SELECT HelloUDF('17');
FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'HelloUDF'
```
## If the function seems not to have been deleted, it does not matter; just open a new session.
Syntax: CREATE FUNCTION function_name AS class_name USING JAR path; where function_name is the function name, class_name is the class path (package name + class name), and path is the HDFS path of the jar.
Upload the jar to the specified HDFS directory:
```
[hadoop@hadoop001 ~]$ hadoop fs -mkdir /lib
[hadoop@hadoop001 ~]$ hadoop fs -ls /lib
[hadoop@hadoop001 ~]$ hadoop fs -put ~/lib/g6-hadoop-1.0.jar /lib/
[hadoop@hadoop001 ~]$ hadoop fs -ls /lib
```
This uploads the local jar to the /lib/ directory on HDFS.
Create a permanent UDF function
```
hive> CREATE FUNCTION HelloUDF AS 'org.apache.hadoop.hive.ql.udf.HelloUDF' USING JAR 'hdfs://hadoop001:9000/lib/g6-hadoop-1.0.jar';
```
Test:
```
hive> select HelloUDF("17");
OK
hello:17
```
Check the metadata in MySQL: the function's information has now been registered in the metastore.
```
mysql> select * from funcs;
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME                             | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
|       1 | org.apache.hadoop.hive.ql.udf.HelloUDF |  1555263915 |     6 | HelloUDF  |         1 | NULL       | USER       |
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
```
- A permanent function created in one session can be used in any other, and it remains available after a restart.
Reference: the official Hive wiki page, LanguageManual UDF