Comparison of Hive's built-in services and hiveserver/hiveserver2

One: Several built-in services of Hive

             Execute bin/hive --service help as follows:      

 

[master@master1 hive]$ bin/hive --service help
ls: cannot access /opt/spark/lib/spark-assembly-*.jar: No such file or directory
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
  --auxpath : Auxillary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help:  ./hive --debug --help

           In the output above, the Service List item shows the services that Hive supports: beeline, cli, help, hiveburninclient, hiveserver2, hiveserver, hwi, jar, lineage, metastore, metatool, orcfiledump, rcfilecat, schemaTool, and version. The most useful ones are introduced below.

          1. cli: short for Command Line Interface, the Hive command line. It is the most commonly used service and the default, so it can be used directly from the command line.

          2. hiveserver: runs Hive as a server that exposes a Thrift service, so that clients written in many different languages can communicate with it. Clients can only connect once the HiveServer service has been started. Set the HIVE_PORT environment variable to choose the port the server listens on; the default port is 10000. HiveServer can be started as follows:

          bin/hive --service hiveserver -p 10002

          The -p parameter is also used to specify the listening port.

          3. hwi: short for Hive Web Interface, a web-based alternative to the Hive CLI.

          4. jar: the Hive equivalent of hadoop jar, a convenient way to run Java applications with both the Hadoop and Hive classes on the classpath.

          5. metastore: by default, the metastore runs in the same process as the Hive service. With this service, the metastore can run as a separate process. The listening port can be set through the METASTORE_PORT environment variable.

 

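Two: Three ways to start Hive

      1, Hive command line mode

        Go to the Hive installation directory and run the bin/hive executable, or run hive --service cli

        Used for command-line queries on Linux; the query statements are largely similar to MySQL queries

       2, Starting the Hive web interface

        bin/hive --service hwi &  (& runs it in the background)

        Used to access Hive from a browser; it does not seem very useful. The browser address is: 127.0.0.1:9999/hwi

       3, Starting the Hive remote service (port 10000)

        bin/hive --service hiveserver2 &  (& runs it in the background)

        Programs written in Java, Python, and so on that access Hive through JDBC or similar drivers rely on this startup mode. It is the one programmers need most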

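Three: HiveServer/HiveServer2

       1: Brief introduction

        Both HiveServer and HiveServer2 let remote clients written in various programming languages, such as Java and Python, submit requests to Hive and retrieve the results, so a client can work with data in Hive without starting the CLI. HiveServer is no longer supported from Hive 0.15 onward, but it is still worth covering here.

       HiveServer and HiveServer2 are both based on Thrift, but HiveServer is sometimes called the Thrift server, whereas HiveServer2 is not. Since HiveServer already existed, why was HiveServer2 needed? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by modifying the HiveServer code. HiveServer was therefore rewritten in Hive 0.11.0 as HiveServer2, which solves the problem. HiveServer2 supports multi-client concurrency and authentication, and provides better support for open API clients such as JDBC and ODBC.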

       2: Differences between the two

       JDBC differences between HiveServer1 and HiveServer2:
       HiveServer version    Connection URL                Driver Class

       HiveServer2           jdbc:hive2://<host>:<port>    org.apache.hive.jdbc.HiveDriver
       HiveServer1           jdbc:hive://<host>:<port>     org.apache.hadoop.hive.jdbc.HiveDriver
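       The table above can be expressed as a small lookup. The following is only a sketch in Python: jdbc_settings and JdbcSettings are hypothetical names introduced for illustration, and the default port of 10000 is taken from this post rather than from any client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JdbcSettings:
    """Connection URL and driver class for one HiveServer generation."""
    url: str
    driver_class: str


# URL scheme and driver class per server generation, copied from the table above.
_JDBC_BY_SERVER = {
    "hiveserver2": ("jdbc:hive2", "org.apache.hive.jdbc.HiveDriver"),
    "hiveserver1": ("jdbc:hive", "org.apache.hadoop.hive.jdbc.HiveDriver"),
}


def jdbc_settings(server: str, host: str, port: int = 10000) -> JdbcSettings:
    """Build the JDBC URL and pick the driver class for the given server type.

    `server` is matched case-insensitively against "hiveserver1" / "hiveserver2".
    """
    try:
        scheme, driver = _JDBC_BY_SERVER[server.lower()]
    except KeyError:
        raise ValueError(f"unknown server type: {server!r}") from None
    return JdbcSettings(url=f"{scheme}://{host}:{port}", driver_class=driver)


if __name__ == "__main__":
    print(jdbc_settings("hiveserver2", "192.168.48.130"))
```

       Keeping the scheme and the driver class in one table makes it harder to pair a jdbc:hive2 URL with the old HiveServer1 driver by mistake.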

       3: Learning HiveServer and HiveServer2

       HiveServer:

       Run hive --service hiveserver --help on the command line to see the help for hiveserver:

 

[hadoop@hadoop ~]$ hive --service hiveserver --help
Starting Hive Thrift Server
usage: hiveserver
 -h,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --maxWorkerThreads <arg>      maximum number of worker threads,
                                  default:2147483647
    --minWorkerThreads <arg>      minimum number of worker threads,
                                  default:100
 -p <port>                        Hive Server port number, default:10000
 -v,--verbose                     Verbose mode

 

       Start the hiveserver service. As the output shows, hiveserver runs on port 10000 by default, with a minimum of 100 worker threads and a maximum of 2147483647.

 

[hadoop@hadoop ~]$ hive --service hiveserver -v
Starting Hive Thrift Server
14/08/01 11:07:09 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Starting hive server on port 10000 with 100 min worker threads and 2147483647 max worker threads

 

       The hiveserver service shown above is not available in Hive 1.2.1. The official site says:

       HiveServer is scheduled to be removed from Hive releases starting Hive 0.15. See HIVE-6977. Please switch over to HiveServer2.

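       HiveServer2:

       HiveServer2 can be configured through the hive-site.xml configuration file. The relevant parameters are:

 

hive.server2.thrift.min.worker.threads – minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads – maximum number of worker threads, default 500.
hive.server2.thrift.port – TCP port to listen on, default 10000.
hive.server2.thrift.bind.host – host that the TCP interface binds to, default localhost.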

 

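       You can also set the environment variables HIVE_SERVER2_THRIFT_BIND_HOST and HIVE_SERVER2_THRIFT_PORT to override the host and port configured in hive-site.xml. Starting with Hive 0.13.0, HiveServer2 supports sending messages over HTTP, which is especially useful when a proxy sits between the client and the server. The HTTP transport parameters are:

 

hive.server2.transport.mode – default binary (TCP); the other accepted value is http.
hive.server2.thrift.http.port – HTTP port to listen on, default 10001.
hive.server2.thrift.http.path – endpoint name of the service, default cliservice.
hive.server2.thrift.http.min.worker.threads – minimum number of worker threads in the server pool, default 5.
hive.server2.thrift.http.max.worker.threads – maximum number of worker threads in the server pool, default 500.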
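       To make the interaction between these settings concrete, here is a rough Python sketch that reads a hive-site.xml string and works out which host and port a client should use. load_hive_site and resolve_endpoint are hypothetical helper names, not part of Hive. The defaults are the ones listed in this post. Applying the host override in both transport modes and the port override only in binary mode is an assumption of this sketch.

```python
import os
import xml.etree.ElementTree as ET

# Defaults as listed in this post.
DEFAULTS = {
    "hive.server2.transport.mode": "binary",
    "hive.server2.thrift.port": "10000",
    "hive.server2.thrift.bind.host": "localhost",
    "hive.server2.thrift.http.port": "10001",
    "hive.server2.thrift.http.path": "cliservice",
}


def load_hive_site(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve_endpoint(site_props, env=None):
    """Combine defaults, hive-site.xml values, and environment overrides."""
    env = os.environ if env is None else env
    conf = {**DEFAULTS, **site_props}  # hive-site.xml wins over defaults
    # Assumption: the bind-host environment variable applies in both modes.
    host = env.get("HIVE_SERVER2_THRIFT_BIND_HOST",
                   conf["hive.server2.thrift.bind.host"])
    mode = conf["hive.server2.transport.mode"].lower()
    if mode == "http":
        return {
            "mode": mode,
            "host": host,
            "port": int(conf["hive.server2.thrift.http.port"]),
            "path": conf["hive.server2.thrift.http.path"],
        }
    # Assumption: the port environment variable is applied in binary mode only.
    port = int(env.get("HIVE_SERVER2_THRIFT_PORT",
                       conf["hive.server2.thrift.port"]))
    return {"mode": mode, "host": host, "port": port, "path": None}
```

       Calling resolve_endpoint(load_hive_site(text)) with the contents of a real hive-site.xml returns a small dict describing the endpoint. Passing env={} ignores the environment, which is convenient when checking the file alone.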

 

        There are two ways to start HiveServer2. One is hive --service hiveserver2, introduced above. The other, more concise, is simply hiveserver2. Use hive --service hiveserver2 -H or hive --service hiveserver2 --help to see the help:

Starting HiveServer2
Unrecognized option: -h
usage: hiveserver2
-H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property

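       By default, HiveServer2 runs each query as the user who submitted it (hive.server2.enable.doAs is true). If hive.server2.enable.doAs is set to false, queries run as the user that runs the hiveserver2 process. To prevent memory leaks in unsecure mode, the file system cache can be disabled by setting the following parameters to true:

fs.hdfs.impl.disable.cache – disables the HDFS file system cache, default false.
fs.file.impl.disable.cache – disables the local file system cache, default false.

      4: Configuring and using hiveserver2 (Hive 2.0 as the example)

        sudo vim hive-site.xml

       1): Configure the listening port and bind host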

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>192.168.48.130</value>
</property>

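       2): Set up impersonation

      With this setting, HiveServer2 executes statements as the user who submitted them. If it is set to false, statements are executed as the admin user that started the HiveServer2 daemon.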

 

<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
3): HiveServer2 node configuration
HiveServer2 no longer needs the hive.metastore.local setting. If hive.metastore.uris is empty, the metastore is local; otherwise it is remote. For a remote metastore, just configure hive.metastore.uris:
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxx.xxx.xxx.xxx:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
4): ZooKeeper configuration
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>master1:2181,slave1:2181,slave2:2181</value>
</property> 
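Note: if hive.zookeeper.quorum is not configured, Hive QL requests cannot run concurrently and data anomalies can result

       5): HiveServer2 Web UI configuration

       The Web UI is supported only from Hive 2.0 onward; earlier versions do not have it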

 

<property>
    <name>hive.server2.webui.host</name>
    <value>192.168.48.130</value>
    <description>The host address the HiveServer2 WebUI will listen on</description>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
    <description>The port the HiveServer2 WebUI will listen on. This can be set to 0 or a negative integer to disable the web UI</description>
  </property>
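
       The five configuration steps above all add <property> elements to hive-site.xml. As a rough illustration, the fragment can be generated from a plain dictionary. build_hive_site is a hypothetical helper, not a Hive tool. The values are the ones used in this post, including the xxx.xxx.xxx.xxx placeholder, and the sketch assumes Python 3.9 or later for ET.indent.

```python
import xml.etree.ElementTree as ET

# Settings walked through in steps 1) to 5); values are the ones from this post.
EXAMPLE_SETTINGS = {
    "hive.server2.thrift.port": 10000,
    "hive.server2.thrift.bind.host": "192.168.48.130",
    "hive.server2.enable.doAs": "true",
    "hive.metastore.uris": "thrift://xxx.xxx.xxx.xxx:9083",  # placeholder address
    "hive.support.concurrency": "true",
    "hive.zookeeper.quorum": "master1:2181,slave1:2181,slave2:2181",
    "hive.server2.webui.host": "192.168.48.130",
    "hive.server2.webui.port": 10002,
}


def build_hive_site(settings):
    """Render a {name: value} mapping as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_hive_site(EXAMPLE_SETTINGS))
```

       The output is meant to be reviewed and merged into an existing hive-site.xml by hand, since a real file usually contains many other properties.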

 

       Starting the services:

       1): Start the metastore

       bin/hive --service metastore &

       The default port is 9083

       2): Start hiveserver2

       bin/hive --service hiveserver2 &

       3): Test

       Web UI: http://192.168.48.130:10002/
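
       Besides opening the Web UI in a browser, a quick way to confirm that the services came up is to check whether their ports accept TCP connections. The sketch below is one possible check in Python. port_open and check_services are hypothetical helpers, and the port numbers are the ones used in this post (9083 for the metastore, 10000 for HiveServer2, 10002 for the Web UI).

```python
import socket

# Ports as configured in this post.
SERVICES = {"metastore": 9083, "hiveserver2": 10000, "webui": 10002}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host, services=SERVICES):
    """Map each service name to whether its port is reachable on host."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services("192.168.48.130").items():
        print(f"{name}: {'up' if ok else 'down'}")
```

       An open port only shows that something is listening; it does not prove that HiveServer2 can actually run a query, so a Beeline or JDBC connection remains the real test.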

   

          Using the Beeline console with hiveserver2

          Start Beeline: bin/beeline

          Connect: !connect jdbc:hive2://192.168.48.130:10000 hive hive   

          This produces an error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: master is not allowed to impersonate hive (state=,code=0)

          Solution: http://www.aboutyun.com/blog-331-2956.html

          PS: I did not resolve this at the time, because Beeline is rarely needed. I set it aside for now and will come back to it later if necessary
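
          For reference, "is not allowed to impersonate" errors generally come from Hadoop's proxy-user check: the user running HiveServer2 (master in the message above) must be allowed to act on behalf of other users through hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups in core-site.xml. Whether this matches the fix on the linked page is an assumption. The Python sketch below only derives the property names from the error text; proxyuser_properties is a hypothetical helper, and the "*" wildcard is a permissive placeholder that should be narrowed on a real cluster.

```python
import re

# Matches the user names in Hadoop's impersonation error message.
_IMPERSONATION = re.compile(r"User: (\S+) is not allowed to impersonate (\S+)")


def proxyuser_properties(error_message, hosts="*", groups="*"):
    """Return the core-site.xml proxy-user properties for the rejected user.

    The first user in the message is the one running HiveServer2; it is the
    one that needs proxy rights. "*" allows every host and group.
    """
    match = _IMPERSONATION.search(error_message)
    if match is None:
        raise ValueError("not an impersonation error")
    proxy_user = match.group(1)
    return {
        f"hadoop.proxyuser.{proxy_user}.hosts": hosts,
        f"hadoop.proxyuser.{proxy_user}.groups": groups,
    }


if __name__ == "__main__":
    msg = "User: master is not allowed to impersonate hive (state=,code=0)"
    for name, value in proxyuser_properties(msg).items():
        print(f"{name} = {value}")
```

          After these properties are added to core-site.xml, the change has to be picked up by Hadoop (for example by restarting HDFS and YARN) before reconnecting.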

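======Updated 2016.09.14======================================================

I recently needed to write a Hive client in Python, so I reread this post and tried to solve the Beeline problem

HiveServer2 provides a new command-line tool, Beeline, which is a JDBC client based on the SQLLine CLI. Beeline has two working modes: local embedded mode and remote mode. In embedded mode it runs an embedded Hive, similar to the Hive CLI. In remote mode it connects to a separate HiveServer2 process over the Thrift protocol. Here is a Beeline example: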

 

[root@master1 hive]# bin/beeline 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/spark/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/spark/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://192.168.132.27:10000
Connecting to jdbc:hive2://192.168.132.27:10000
Enter username for jdbc:hive2://192.168.132.27:10000: hive        (enter the username here)
Enter password for jdbc:hive2://192.168.132.27:10000: ****        (enter the password here)
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.132.27:10000> show databases;              (list databases)
OK
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (0.274 seconds)
0: jdbc:hive2://192.168.132.27:10000> use default;                  (select the database)
OK 
No rows affected (0.069 seconds)
0: jdbc:hive2://192.168.132.27:10000> show tables;                  (list tables)
OK
+-----------+--+
| tab_name  |
+-----------+--+
+-----------+--+
No rows selected (0.093 seconds)
0: jdbc:hive2://192.168.132.27:10000> create table test(name string); (create a table)
OK
No rows affected (0.961 seconds)
0: jdbc:hive2://192.168.132.27:10000> show tables;                    (list tables)
OK
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (0.129 seconds)
0: jdbc:hive2://192.168.132.27:10000> desc test;                       (describe the table)
OK
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| name      | string     |          |
+-----------+------------+----------+--+
1 row selected (0.258 seconds)

 

 

OK!!!
