Big Data: Connecting to Hive Remotely from Python

Steps:

Start the metastore

Start hiveserver2

Use beeline to test the connection and confirm that the address is reachable (you can skip this once you know the address is correct)

Connect to Hive from Python

(For the full configuration files and the complete sequence of commands, see the end of the article.)

I. Start hiveserver2

1. Set the transport mode to http and the port to 10001 (the default for http mode)

<property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
</property>
<property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
</property>
<property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
</property>
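
To double-check what a hive-site.xml actually contains, the properties can be read back with Python's standard library. This is only a minimal sketch: `SAMPLE_HIVE_SITE` and `read_properties` are names made up for this example, and the sample wraps the three properties above in a `<configuration>` root the way the real file does.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the three properties from this section, wrapped in a
# <configuration> root as they would appear in hive-site.xml.
SAMPLE_HIVE_SITE = """
<configuration>
  <property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    props = read_properties(SAMPLE_HIVE_SITE)
    for key in ("hive.server2.transport.mode",
                "hive.server2.thrift.http.port",
                "hive.server2.thrift.http.path"):
        print(key, "=", props.get(key, "<not set>"))
```

For a real file, pass the text of your own hive-site.xml to `read_properties` instead of the sample.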

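2. Start the services

Start the metastore:

nohup hive --service metastore &

Start hiveserver2 (in http mode the listening port comes from hive.server2.thrift.http.port; the hive.server2.thrift.port override below only applies in binary mode):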

nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.root.logger=INFO,console &

3. View the port where hiveserver2 is located

With an unmodified hive-site.xml, the default transport mode is binary (TCP) and HiveServer2 runs on port 10000. To change the port, change the value of the hive.server2.thrift.port property in hive-site.xml. Run the hive command to enter the Hive command line, then use set followed by a property name to view any setting from hive-site.xml, for example:

set hive.server2.thrift.port;

(If the transport mode is set to http, as configured at the start of this article, the default port is 10001. I settled on http mode after some trial and error: on my machine the TCP connection would not work.)

netstat -anp | grep 1000
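
The netstat check runs on the server itself. From the client machine, a rough equivalent is to try a TCP connection to the port. The sketch below uses only the standard library; `port_is_open` is a helper name invented for this example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example call with the address used in this article (adjust to your setup):
#   port_is_open("192.168.121.130", 10001)
```

A True result only shows that something is listening on the port, not that it is HiveServer2, so the beeline test later is still worth doing.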

4. Use the jps command to check whether HiveServer2 is running (it shows up as a RunJar process)
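
If you want to script this check, the jps output can be filtered for RunJar entries. The following is a small sketch: `runjar_pids` is a hypothetical helper, and the sample output (PIDs and the other process names) is made up for illustration.

```python
def runjar_pids(jps_output):
    """Return the PIDs of every process that jps lists as RunJar."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "RunJar":
            pids.append(int(parts[0]))
    return pids


# Hypothetical jps output; the PIDs and the other process names are made up.
SAMPLE = """\
2481 NameNode
3012 RunJar
3150 RunJar
3377 Jps
"""

print(runjar_pids(SAMPLE))  # [3012, 3150]
```

On the server, the real output could be obtained with `subprocess.run(["jps"], capture_output=True, text=True).stdout`. Since the metastore is also started through hive --service, expect it to appear as a RunJar process too.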

5. View in a browser

HiveServer2 also starts a Jetty HTTP server on port 10002 (the default) that provides the Web UI. When a Hive connection fails, you can look at the Hive logs there.

If no bind address is set in the configuration file, you can use the address of the machine where Hive runs (the virtual machine's address) plus port 10002. For example, my address is http://192.168.121.130:10002/ (note: many tutorials simply use localhost because their Hive is deployed on the local machine).
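
The same check can be done without a browser. Below is a minimal sketch using only the standard library; `web_ui_url` and `web_ui_reachable` are names invented here, and the opener deliberately ignores any proxy settings on the assumption that the virtual machine is reached directly.

```python
import urllib.error
import urllib.request

# Assumption: the VM is reached directly, so proxy settings are ignored.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def web_ui_url(host, port=10002):
    """Build the HiveServer2 Web UI address from a host and port."""
    return f"http://{host}:{port}/"


def web_ui_reachable(url, timeout=5.0):
    """Return True if an HTTP GET to url gets any HTTP response at all."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        return False


print(web_ui_url("192.168.121.130"))  # http://192.168.121.130:10002/
```

Calling `web_ui_reachable(web_ui_url("192.168.121.130"))` from the client machine then tells you whether the Web UI answers at all.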

If you change the address configuration by adding the following to hive-site.xml, the access address becomes http://node01:10002/ (the IP + port form still works as well).

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>node01</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>

II. Test the Hive connection with beeline

1. Start beeline

$HIVE_HOME/bin/beeline

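2. Connect to Hive

(1) Connect using the address

For the connection address, just change 10002 in the browser address to 10001. Note that you must append a Hive database name, for example default; here I use the study database created earlier. The long string of parameters after it is also required, and (at least in my setup) the same settings have to be supplied when connecting from Python.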

!connect jdbc:hive2://192.168.121.130:10001/study;transportMode=http;httpPath=cliservice
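
Since the URL is easy to mistype, here is a small sketch that assembles it from its parts. `hive_jdbc_url` is a hypothetical helper; its defaults simply mirror the values used in this article.

```python
def hive_jdbc_url(host, port=10001, database="default",
                  transport_mode="http", http_path="cliservice"):
    """Build a jdbc:hive2 URL with the HTTP transport settings used in this article."""
    return (f"jdbc:hive2://{host}:{port}/{database}"
            f";transportMode={transport_mode};httpPath={http_path}")


print(hive_jdbc_url("192.168.121.130", database="study"))
# jdbc:hive2://192.168.121.130:10001/study;transportMode=http;httpPath=cliservice
```

The httpPath value must match hive.server2.thrift.http.path in hive-site.xml, and the port must match hive.server2.thrift.http.port.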

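You will see an error like user not allowed to XXX. This is a permissions issue; if you are interested, search for the error message for the details. The fix is to add the following to Hadoop's configuration file /etc/hadoop/core-site.xml on every virtual machine, then restart all three machines for it to take effect.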

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
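
Because the change has to be made on every node, it is easy to miss one. The sketch below checks the text of a core-site.xml for the two entries; `missing_proxyuser_props` is a name invented for this example, and the sample file is hypothetical.

```python
import xml.etree.ElementTree as ET


def missing_proxyuser_props(core_site_xml, user="root"):
    """Return the hadoop.proxyuser.<user>.* property names absent from core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    present = {p.findtext("name", default="").strip() for p in root.iter("property")}
    wanted = [f"hadoop.proxyuser.{user}.hosts", f"hadoop.proxyuser.{user}.groups"]
    return [name for name in wanted if name not in present]


# Hypothetical core-site.xml that only has the hosts entry so far.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
</configuration>"""

print(missing_proxyuser_props(SAMPLE))  # ['hadoop.proxyuser.root.groups']
```

The `user` argument is the user named in the property (root in this article); an empty list means both entries are present in that file.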

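(2) Connect with the default settings

If you do not know the Hive address, connect with the following command and enter the username and password when prompted (the username and password are set in Hive's hive-site.xml; see the Hive deployment article). Because the metastore location has not been set in the configuration file, a warning is shown.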

!connect jdbc:hive2://

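Add the metastore configuration

Add the following to Hive's hive-site.xml (an empty value means the metastore is local; otherwise it is remote). Here it is set to the virtual machine's address and the default port 9083. Remember to restart after making the change.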

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.121.130:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
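
To see which host and port a hive.metastore.uris value points at, the value can be split with the standard library. This is a sketch only: `parse_metastore_uris` is a hypothetical helper, and falling back to 9083 when no port is given is an assumption of the example.

```python
from urllib.parse import urlparse


def parse_metastore_uris(value):
    """Split a hive.metastore.uris value into (host, port) pairs."""
    endpoints = []
    for uri in value.split(","):
        uri = uri.strip()
        if not uri:
            continue  # an empty value means a local metastore
        parsed = urlparse(uri)
        if parsed.scheme != "thrift" or parsed.hostname is None:
            raise ValueError(f"not a thrift:// URI: {uri!r}")
        # Assumption: fall back to the default port 9083 if none is given.
        endpoints.append((parsed.hostname, parsed.port or 9083))
    return endpoints


print(parse_metastore_uris("thrift://192.168.121.130:9083"))
# [('192.168.121.130', 9083)]
```

An empty result corresponds to the "value is empty, metastore is local" case described above.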

III. Connect to Hive from Python

1. Install the packages

# Install pure-sasl
pip install pure-sasl
# Install thrift_sasl
pip install thrift_sasl==0.2.1 --no-deps
# Install thrift
pip install thrift
# Finally, install impyla
pip install impyla
pip install thriftpy
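
After installing, a quick way to confirm that everything is importable is to look the modules up without importing them. A minimal sketch follows; `REQUIRED_MODULES` and `missing_modules` are names made up here. Note that some import names differ from the pip package names (pure-sasl is imported as puresasl, impyla as impala).

```python
import importlib.util

# Import names for the packages installed above:
# pure-sasl -> puresasl, impyla -> impala.
REQUIRED_MODULES = ["puresasl", "thrift_sasl", "thrift", "impala", "thriftpy"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

This only checks that the modules can be found, not that their versions are compatible with each other.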

2. Python code

Print all tables in the study database:

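from impala.dbapi import connect

# HiveServer2 runs in http mode here, so use the HTTP transport on port 10001
# with the same http_path that is configured in hive-site.xml.
# Replace 'username' and 'password' with your own Hive credentials.
conn = connect(host='192.168.121.130', port=10001, auth_mechanism='PLAIN',
               user='username', password='password', database='study',
               use_http_transport=True, http_path='cliservice')
cursor = conn.cursor()
cursor.execute('show tables')
# Each row is a tuple holding one table name.
for row in cursor:
    print(row)
cursor.close()
conn.close()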
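
If you would rather have the table names as a plain list than print tuples, a small helper around the cursor is enough. This is a sketch: `table_names` is a hypothetical helper, and `FakeCursor` with its table names exists only so the example can be tried without a running Hive.

```python
def table_names(cursor):
    """Run SHOW TABLES on a DB-API cursor and return the names as a flat list."""
    cursor.execute("SHOW TABLES")
    # Each row is expected to be a one-element tuple such as ('my_table',).
    return [row[0] for row in cursor.fetchall()]


class FakeCursor:
    """Hypothetical stand-in for a real cursor so the helper can be tried offline."""

    def __init__(self, rows):
        self._rows = rows

    def execute(self, sql):
        self.last_sql = sql

    def fetchall(self):
        return list(self._rows)


print(table_names(FakeCursor([("students",), ("scores",)])))
# ['students', 'scores']
```

With a real connection, pass `conn.cursor()` from the script above instead of the fake cursor.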

IV. Quick configuration and connection

1. Configuration files

(1) hive-site.xml (edit on node01)

<configuration>
<property>
    <name>hive.metastore.warehouse.dir</name>        
    <value>/user/hive_local/warehouse</value>    
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp_local/hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>    
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>    
    <value>MySQL@2022</value>
</property>
<property>
    <name>hive.cli.print.header</name>            
    <value>true</value>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.mode.local.auto</name>
    <value>true</value>
</property>
<property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
</property>
<property>
    <name>hive.server2.use.SSL</name>
    <value>false</value>
</property>
<property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
</property>
<property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
</property>
<property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
</property>
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.121.130:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>   
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>node01</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
</configuration>

(2) core-site.xml (edit on node01, node02, and node03)

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop-2.7.4/tmp</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>node01:2181,node02:2181,node03:2181</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
</configuration>

2. Connection test

Start the hiveserver2 service, then test in beeline whether you can connect.

# Run on node01, node02, and node03 in turn
zkServer.sh start
zkServer.sh status
hadoop-daemon.sh start journalnode

# Run on node01
start-dfs.sh
start-yarn.sh
nohup hive --service metastore &
nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.root.logger=INFO,console &
netstat -anp | grep 1000
$HIVE_HOME/bin/beeline
!connect jdbc:hive2://192.168.121.130:10001/study;transportMode=http;httpPath=cliservice
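
HiveServer2 can take a while after the nohup command before it starts listening, so connecting immediately may fail. When scripting the sequence above, one option is to poll the port first. The sketch below uses only the standard library; `wait_for_port` and its default timings are assumptions of this example.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call with the address used in this article (adjust to your setup):
#   wait_for_port("192.168.121.130", 10001)
```

A True return means the port accepted a connection; run the beeline or Python connection only after that.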

3. Once the test succeeds, run the Python code from section III (connecting to Hive from Python).

Origin blog.csdn.net/qq_51641196/article/details/128405980