Hive custom functions (UDF) and the use of HiveServer2

First, write the custom function in IDEA: extend Hive's UDF class (org.apache.hadoop.hive.ql.exec.UDF), implement an evaluate method, and package the program as a jar.

In the hive shell, add the jar package---add jar /opt/hive/xxx.jar

Register function: create temporary function xxx as "com.bigdata.hive.udf.xxxUDF";

Run show functions; to list all functions and confirm the new one is registered.

Newer Hive versions can create a permanent custom function with a single statement:

create function xxx as "com.bigdata.hive.udf.xxxUDF" using jar 'hdfs:///warehouse/user/hive/xxx.jar';

The jar package needs to be placed on HDFS first, e.g. with hdfs dfs -put /opt/hive/xxx.jar /warehouse/user/hive/.

Using beeline:

With HiveServer2 running, enter the bin directory and start beeline---./beeline

Connect to the database:

!connect jdbc:hive2://name2:10000 hive

For the hostname, fill in the address of the HiveServer2 host (name2 in this example); 10000 is HiveServer2's default port.

The username defaults to hive and the password is empty.

Compression format (store the table as ORC, compressed with Snappy):

stored as orc tblproperties("orc.compress"="SNAPPY")
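For example, a complete table definition using this clause (the table and column names here are hypothetical):

create table user_orc (id int, name string) stored as orc tblproperties("orc.compress"="SNAPPY");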

 

The use of hive's transform:

When working directly in the hive shell, you do not have to write a UDF: you can run a script through the transform operator to apply a function to your data.

1. Write a Python script:

import sys

for line in sys.stdin:
    fields = line.strip().split('\001')  # Hive sends fields joined by the '\001' delimiter
    for word in fields:
        print(word.lower())  # convert each field to lowercase
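A quick way to sanity-check the script outside Hive is to run its core logic on a sample line (the sample data below is hypothetical):

```python
# Simulate what Hive's transform does: one row arrives as a single
# line whose fields are joined by the '\001' delimiter.
line = "Alice\001BOB\001Carol\n"        # hypothetical sample row
fields = line.strip().split('\001')     # split into columns
result = [word.lower() for word in fields]
print(result)  # ['alice', 'bob', 'carol']
```

You can also test the full script from the command line by piping a file into it: cat sample.txt | python lower.py.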

2. After entering the hive shell, add the script file:

add file /root/hive/lower.py

3. Call the transform operator to run the computation:

select transform(ykd018) using 'python lower.py' 

Here ykd018 is a column in the source table; complete the statement with a from clause naming your table.
