Summary: Using Hive from Python 3

Start the HiveServer2 service

HiveServer2 is an optional built-in Hive service. It lets remote clients written in various programming languages submit requests to Hive and retrieve the results.

Thrift service configuration

This guide assumes Hive is already installed. If it is not, see the author's earlier Hive installation article. Before starting HiveServer2, review the following settings:

Configuration item                      Default     Description
hive.server2.transport.mode             binary      HiveServer2 transport mode: binary or http
hive.server2.thrift.port                10000       Thrift port when the transport mode is binary
hive.server2.thrift.http.port           10001       Thrift port when the transport mode is http
hive.server2.thrift.bind.host           localhost   Host the Thrift service binds to
hive.server2.thrift.min.worker.threads  5           Minimum number of Thrift worker threads
hive.server2.thrift.max.worker.threads  500         Maximum number of Thrift worker threads
hive.server2.authentication             NONE        Client authentication type: NONE, LDAP, KERBEROS, CUSTOM, PAM, or NOSASL
hive.server2.thrift.client.user         anonymous   Thrift client user name
hive.server2.thrift.client.password     anonymous   Thrift client password
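These properties are usually overridden in $HIVE_HOME/conf/hive-site.xml. The sketch below makes some assumptions. It expects the standard Hadoop-style <configuration><property><name>…</name><value>…</value></property> layout. It reads that file's text and falls back to the defaults from the table for anything not set. It then works out which Thrift port a client should use for the configured transport mode. `hs2_settings` and `client_port` are hypothetical helper names, and any mode other than http is treated as binary.

```python
import xml.etree.ElementTree as ET

# Defaults copied from the configuration table above
DEFAULTS = {
    "hive.server2.transport.mode": "binary",
    "hive.server2.thrift.port": "10000",
    "hive.server2.thrift.http.port": "10001",
    "hive.server2.thrift.bind.host": "localhost",
}


def hs2_settings(xml_text):
    """Overlay the <property> entries found in hive-site.xml text onto DEFAULTS."""
    settings = dict(DEFAULTS)
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value is not None:
            settings[name.strip()] = value.strip()
    return settings


def client_port(settings):
    """Return the Thrift port for the transport mode (any non-http mode uses the binary port)."""
    mode = settings["hive.server2.transport.mode"].strip().lower()
    key = "hive.server2.thrift.http.port" if mode == "http" else "hive.server2.thrift.port"
    return int(settings[key])
```

For example, `client_port(hs2_settings(pathlib.Path("hive-site.xml").read_text()))` returns 10000 for an unmodified binary-mode setup. The file path here is illustrative.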

Start the HiveServer2 service 

Method 1: $HIVE_HOME/bin/hiveserver2 

[root@Hadoop3-master bin]# hiveserver2
2023-08-16 23:14:00: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 0ba8eb07-5f63-43a1-aa4d-61954f6e244f

Method 2: $HIVE_HOME/bin/hive --service hiveserver2

Check whether HiveServer2 started successfully

netstat -nl | grep 10000
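netstat only works on the HiveServer2 machine itself. From the client that will run the Python code, you can instead attempt a plain TCP connection to confirm the Thrift port is reachable. The following is a minimal standard-library sketch. `is_port_open` is a hypothetical helper name, and 192.168.43.11 is the author's example host.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, timeout, or DNS failure
        return False
```

For example, call `is_port_open("192.168.43.11", 10000)`. A True result only shows that something is listening on the port, not that it is HiveServer2.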

Access the HiveServer2 web UI

Once HiveServer2 is running, its web UI listens on port 10002 by default, for example http://192.168.43.11:10002/ (replace the IP address with your HiveServer2 host).


Python connects to Hive

Required third-party packages

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

Note: PyHive's Hive support depends on three packages: sasl, thrift, and thrift-sasl. That is why they are installed before PyHive.
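After running the pip commands, you can confirm that all four packages landed in the interpreter you will actually use. This is a sketch using the standard library's importlib.metadata (Python 3.8+). `installed_versions` is a hypothetical helper name, and the names are the distribution names from the pip commands above.

```python
from importlib import metadata

# Distribution names as passed to pip above
REQUIRED = ("sasl", "thrift", "thrift-sasl", "PyHive")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version string, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

`[n for n, v in installed_versions().items() if v is None]` then lists whatever still needs installing.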

Problem encountered when installing the sasl package

The error message is:

 saslwrapper.cpp
      C:\Users\zzg\AppData\Local\Temp\pip-install-1vw7hyr4\sasl_05859569d9c14648abbe3a8901ed3627\sasl\saslwrapper.h(22): fatal error C1083: Cannot open include file: 'sasl/sasl.h': No such file or directory

While compiling saslwrapper.cpp, the compiler cannot find the header file sasl/sasl.h included by saslwrapper.h.

Solutions found via Google and Baidu

Download a prebuilt sasl .whl file from Christoph Gohlke's unofficial Windows binaries page at the University of California, Irvine (https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl). Status: the site has since been shut down.

Download a sasl .whl file from the Tsinghua University PyPI mirror (https://pypi.tuna.tsinghua.edu.cn/simple/sasl/). Status: there is no wheel for Python 3.10 on 64-bit Windows.

Note: for sasl, the Tsinghua mirror mainly provides:

  • wheels for Python 3.5 through 3.9, for Linux only;
  • source distributions of sasl versions 0.1.1 through 0.3.1.

Compiling the sasl-0.3.1 source to build a sasl .whl file

Download the sasl source archive from the Tsinghua mirror and extract it.

Change into the extracted source directory and run: python setup.py bdist_wheel

The build fails with the same error as pip install sasl: the header sasl/sasl.h cannot be found.

Following others who installed sasl successfully: use a prebuilt wheel

Environment:

Python version: 3.10

cp310: the Python tag, meaning CPython 3.10

win_amd64: the platform tag, meaning 64-bit Windows

The matching wheel is therefore sasl-0.3.1-cp310-cp310-win_amd64.whl (the second cp310 is the ABI tag).
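The tag matching described above can be checked mechanically. Wheel file names follow the pattern {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl. The sketch below is simplified and assumes CPython. It compares only exact python and platform tags and ignores compressed tag sets (such as py2.py3) and manylinux tags, for which the `packaging.tags` module is the complete tool. All helper names are hypothetical.

```python
import sys
import sysconfig


def parse_wheel_filename(filename):
    """Split a wheel file name into (name, version, python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-4].split("-")
    if len(parts) == 6:  # optional build tag present; drop it
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return tuple(parts)


def current_tags():
    """Return (python_tag, platform_tag) for this CPython, e.g. ('cp310', 'win_amd64')."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig reports e.g. 'win-amd64'; wheel tags use underscores instead
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def matches_interpreter(filename, python_tag, platform_tag):
    """True if the wheel's python and platform tags equal the given ones exactly."""
    _, _, py, _, plat = parse_wheel_filename(filename)
    return py == python_tag and plat == platform_tag
```

On 64-bit Windows with Python 3.10, `matches_interpreter("sasl-0.3.1-cp310-cp310-win_amd64.whl", *current_tags())` should be True.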

Install it with:

pip install sasl-0.3.1-cp310-cp310-win_amd64.whl

Install thrift

pip install thrift

Install thrift_sasl

pip install thrift_sasl

Install PyHive

pip install pyhive

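Python code to connect to Hive

import pandas as pd
from pyhive import hive

# Read data: run a query on Hive and return the result as a pandas DataFrame
def select_pyhive(sql):
    # Create a Hive connection (HiveServer2 Thrift port 10000, binary mode)
    conn = hive.Connection(host='192.168.43.11', port=10000, username='default', database='user')
    try:
        # pandas accepts a DB-API connection here, though it may emit a UserWarning recommending SQLAlchemy
        df = pd.read_sql(sql, conn)
        return df
    finally:
        conn.close()

sql = "show databases"
df = select_pyhive(sql)
print(df)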
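If pandas is not needed, the same query can go through the DB-API 2.0 cursor that PyHive's connection provides. This is a sketch, not the author's code. `run_query` is a hypothetical helper that takes a zero-argument connection factory. That design lets the same function work with `hive.Connection` or, for an offline test, with the standard library's sqlite3.

```python
from contextlib import closing


def run_query(sql, connect):
    """Run one SQL statement and return (column_names, rows).

    `connect` is any zero-argument callable returning a DB-API 2.0 connection.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            # cursor.description is None for statements that return no result set
            if cur.description is None:
                return [], []
            columns = [col[0] for col in cur.description]
            return columns, cur.fetchall()
```

For Hive, import with `from pyhive import hive` and call `run_query("show databases", lambda: hive.Connection(host='192.168.43.11', port=10000, database='user'))`. This returns the column names and a list of row tuples. Both the connection and the cursor are closed even if the query fails.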


Source: blog.csdn.net/zhouzhiwengang/article/details/132330997