大数据开发之Hive篇15-Hive之hpl/sql编程

备注:
Hive 版本 2.1.1

一. HPL/SQL概述

Hive 2.0版本开始,已经有了Hive存储过程的解决方案(HPL/SQL –Procedural SQL on hadoop)。该解决方案不仅支持Hive,还支持在SparkSQL,其他NoSQL,甚至是RDBMS中使用类似于Oracle PL/SQL的功能,这将极大的方便数据开发者的工作,Hive中很多之前比较难实现的功能,现在可以很方便的实现,比如自定义变量、基于一个结果集的游标、循环等等。

测试hplsql命令:

[root@hp1 sql]# find / -name *hplsql*
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/lib/hive-hplsql-2.1.1-cdh6.3.1.jar
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/lib/hive-hplsql.jar
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/ext/hplsql.sh
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/hplsql
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/hplsql.cmd
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-hplsql-2.1.1-cdh6.3.1.jar
[root@hp1 sql]# 
[root@hp1 sql]# /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/hplsql -e "CURRENT_DATE + 1"
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2020-12-18

为什么选择HPL/SQL

Hadoop在数据仓库中的作用是巨大的。但是要实现全面的ETL、报告、分析和数据挖掘过程,您不仅需要分布式处理引擎(如MapReduce、Spark或Tez),还需要一种表达全面业务规则的方法。

  1. 业务逻辑驱动和高级错误处理
    HPL/SQL允许您使用变量、表达式、控制流语句和迭代来实现业务逻辑。HPL/SQL支持使用异常和条件处理程序进行错误处理。您可以开发管理和控制分布式进程的程序,但同时不会成为系统的瓶颈。

  2. 使hadoop上的sql更加动态
    HPL/SQL的关键特性之一是它允许您使SQL更加动态。您可以使用高级表达式、各种内置函数、条件来根据用户配置、以前查询的结果、来自文件或非hadoop数据源的数据等动态生成SQL。

3.利用现有的过程SQL技能
传统的数据库管理系统提供广泛用于实现高级数据操作场景和工作流的过程性SQL语言。这种方法对于数据库开发人员和数据分析人员来说既简单又熟悉。
与Python、Java或Linux shell脚本相比,HPL/SQL使Hadoop能够为BI分析师和开发人员提供更广泛的受众。

  1. ETL框架
    HPL/SQL提供函数和语句,使典型的ETL开发效率更高。

  2. 可读性和可维护性
    与Bash脚本、Java、Python或Scala程序相比,HPL/SQL对于BI/SQL开发人员来说简洁、可读和可维护。

  3. 集成和多语言持久性
    Hadoop扩展了使用RDBMS产品构建的传统数据仓库。这意味着你必须集成多个系统,包括Hadoop、RDBMS、NoSQL和其他系统。
    HPL/SQL允许您在一个脚本中处理多个系统,因此您可以针对不同类型的工作负载采用最好的方法,并轻松地集成它们。

  4. 兼容性和迁移
    HPL/SQL试图尽可能地支持所有广泛使用的过程语言的语法。您不需要从头开始学习一种新的过程化语言。这有助于开发新代码以及将现有代码库迁移到Hadoop。

  5. Hadoop快速启动
    HPL/SQL提供了开始使用Hadoop的最快方式。之后,您可以使用Spark、Tez、Storm、Flink和其他框架重新设计和实现高级数据处理工作流,但现在您可以使用您当前的技能和现有的代码在Hadoop上运行业务逻辑。

二.HPL/SQL测试案例

2.1 测试执行简单sql

[root@hp1 ~]# more b.sql
select count(*) from test.ods_fact_sale;

执行b.sql

[root@hp1 ~]# hplsql -f b.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (248 ms)
Starting query
Query executed successfully (444.87 sec)
767830000

2.2 测试函数和循环

测试一个函数和for 循环的例子

代码:

CREATE FUNCTION hello(text STRING)
 RETURNS STRING
BEGIN
 RETURN 'Hello, ' || text || '!';
END;

FOR item IN (
  SELECT empno,ename FROM test.emp limit 10
)
LOOP
PRINT item.empno || '|' || item.ename || '|' || hello(item.empno);
END LOOP;

测试记录:

[root@hp1 ~]# hplsql -f a.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (691 ms)
Starting query
Query executed successfully (639 ms)
7369|smith|Hello, 7369!
7499|allen|Hello, 7499!
7876|adams|Hello, 7876!
7900|james|Hello, 7900!
7902|ford|Hello, 7902!
7934|miller|Hello, 7934!
7521|ward|Hello, 7521!
7566|jones|Hello, 7566!
7654|martin|Hello, 7654!
7698|blake|Hello, 7698!
[root@hp1 ~]# 

2.3 测试include套用存储过程

[root@hp1 ~]# more set_message.sql 
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
print 'name='||name
SET result = 'Hello, ' || name || '!';
END;
[root@hp1 ~]# more include.sql 
INCLUDE set_message.sql
DECLARE str STRING;
CALL set_message('world', str);
PRINT str;
[root@hp1 ~]# 

测试记录:

[root@hp1 ~]# hplsql -f include.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
name=world
Hello, world!
[root@hp1 ~]#

2.4 测试package

代码:

create or replace package users as
session_count int := 0;
function get_count() return int; 
procedure add(name varchar(100));
end;

create or replace package body users as
function get_count() return int
is
begin
return session_count;
end; 
procedure add(name varchar(100))
is 
begin
-- ...
session_count = session_count + 1;end;
end;
users.add('John');
users.add('Sarah');
users.add('Paul');
print 'Number of users: ' || users.get_count();

测试记录:

create or replace package users as
session_count int := 0;
function get_count() return int; 
procedure add(name varchar(100));
end;

create or replace package body users as
function get_count() return int
is
begin
return session_count;
end; 
procedure add(name varchar(100))
is 
begin
-- ...
session_count = session_count + 1;end;
end;
users.add('John');
users.add('Sarah');
users.add('Paul');
print 'Number of users: ' || users.get_count();

2.4 测试游标

代码:

CREATE PROCEDURE spOpenIssues 
DYNAMIC RESULT SETS 1
BEGIN
DECLARE cur CURSOR WITH RETURN FOR
SELECT empno,ename FROM test.emp;
OPEN cur;
END;

DECLARE id INT;
DECLARE name VARCHAR(30);

CALL spOpenIssues;
ALLOCATE c1 CURSOR FOR PROCEDURE spOpenIssues;

FETCH c1 INTO id, name;
WHILE (SQLCODE = 0)
DO
PRINT id || ' - ' || name;
FETCH c1 INTO id, name;
END WHILE;
CLOSE c1;

测试记录:

[root@hp1 ~]# hplsql -f cursor_test.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (255 ms)
Starting query
Query executed successfully (102 ms)
7369 - smith
7499 - allen
7876 - adams
7900 - james
7902 - ford
7934 - miller
7521 - ward
7566 - jones
7654 - martin
7698 - blake
7782 - clark
7788 - scott
7839 - king
7844 - turner

2.5 测试游标2

用我自己plsql习惯的方式来写游标

代码:

vim c.sql 
cursor emp_cur is
 select empno,ename from test.emp;

 for rc in emp_cur loop
   dbms_output.put_line(rc.empno||'-'||rc.ename);
 end loop;

测试记录:

[root@hp1 ~]# hplsql -f c.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (250 ms)
Starting query
Query executed successfully (111 ms)
7369    smith
7499    allen
7876    adams
7900    james
7902    ford
7934    miller
7521    ward
7566    jones
7654    martin
7698    blake
7782    clark
7788    scott
7839    king
7844    turner
-
[root@hp1 ~]# 

最后多了一个’-’ 好奇怪,而且前面的输出都没有连接符

替换成如下就没问题了

 for rc in (select empno,ename from test.emp) loop
   dbms_output.put_line(rc.empno||'-'||rc.ename);
 end loop;

测试记录:

[root@hp1 ~]# hplsql -f c.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (235 ms)
Starting query
Query executed successfully (97 ms)
7369-smith
7499-allen
7876-adams
7900-james
7902-ford
7934-miller
7521-ward
7566-jones
7654-martin
7698-blake
7782-clark
7788-scott
7839-king
7844-turner
[root@hp1 ~]# 

参考

1.http://www.hplsql.org/home

猜你喜欢

转载自blog.csdn.net/u010520724/article/details/112344713