Summary of writing a Hive UDF in Python

This mainly relies on the transform statement that Hive provides.

1. Write a Python script. The script reads as follows (reference: https://dwgeek.com/hive-udf-using-python-use-python-script-into-hive-example.html/):

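import sys

# Hive sends one row per line on stdin, with columns separated by tabs.
for line in sys.stdin:
    line = line.strip('\n\r')
    fname, lname = line.split('\t')
    # Capitalize the first letter of each word in both names.
    firstname = fname.title()
    lastname = lname.title()
    # Write the row back as tab-separated columns; print() works on Python 2 and 3.
    print('\t'.join([firstname, lastname]))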
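One thing to watch: the script above unpacks exactly two fields, so a row with a different number of tab-separated columns raises a ValueError and fails the whole query. Below is a minimal sketch of a more defensive variant. The function name init_cap_rows and the choice to silently skip malformed rows are assumptions made for illustration, not part of the referenced script; the sample input stands in for what Hive would send on stdin.

```python
import io


def init_cap_rows(lines):
    """Yield title-cased 'first<TAB>last' rows, skipping rows without exactly two fields."""
    for line in lines:
        fields = line.rstrip('\n\r').split('\t')
        if len(fields) != 2:
            # Assumption: malformed rows are dropped instead of failing the job.
            continue
        yield '\t'.join(field.title() for field in fields)


# Simulated stdin: one row per line, columns separated by tabs.
sample = io.StringIO("john\tsmith\nbad_row\nmary ann\tjones\n")
result = list(init_cap_rows(sample))
for row in result:
    print(row)
```

In a real script, sys.stdin would be passed in place of sample. Whether to skip, log, or pass through bad rows depends on the data; skipping is just the simplest choice here.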

 

2. Add the Python script in the Hive CLI. The script can be placed in a local directory on the server or on HDFS; use the form of the add file statement that matches where the .py script is located.

- When the script is in a local directory on the server:
add file initCap.py;

- When the script is on HDFS:
add file hdfs:///tmp/initCap.py;

 

Because our Hive is configured with Sentry access control, our tests showed that only the Hive CLI can execute add file. Beeline and other clients report an insufficient-permission error; the specific reason is unknown.

 

3. Use the Python script just added in Hive SQL. An example query:

select transform('abc\tdef') using 'python initCap.py' as (col_name,khjd);
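With Hive's default row format, the transform statement writes each input row to the script's stdin as tab-separated strings and splits each line of the script's stdout on tabs to fill the columns named in the as clause. Here the single input value 'abc\tdef' already contains a tab, so the script sees two fields. The sketch below approximates that round trip locally so the script can be checked before running it in Hive. It assumes Python 3.7+ on the local machine; run_like_transform is a hypothetical helper, not a Hive API, and it does not reproduce Hive details such as NULL handling.

```python
import os
import subprocess
import sys
import tempfile

# Copy of the initCap.py logic, written so that it runs on Python 2 and 3.
SCRIPT = r'''
import sys
for line in sys.stdin:
    fname, lname = line.strip('\n\r').split('\t')
    print('\t'.join([fname.title(), lname.title()]))
'''


def run_like_transform(rows):
    """Pipe tab-separated rows through the script and split its output into columns."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, 'initCap.py')
        with open(path, 'w') as handle:
            handle.write(SCRIPT)
        # One line per row, columns joined by tabs, as the script expects on stdin.
        stdin_text = ''.join('\t'.join(row) + '\n' for row in rows)
        completed = subprocess.run(
            [sys.executable, path],
            input=stdin_text, capture_output=True, text=True, check=True,
        )
    # Split each output line on tabs to recover the output columns.
    return [line.split('\t') for line in completed.stdout.splitlines()]


output = run_like_transform([('abc', 'def')])
print(output)  # [['Abc', 'Def']]
```

For the query above this corresponds to one result row with col_name = Abc and khjd = Def.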

 

 

That is all.

Origin www.cnblogs.com/vanwoos/p/12667515.html