【Python】Connecting to Hive from Python on Windows 10

The goal is to connect to Hive from Python on Windows 10, read the data stored in Hive, and train a model on it. Once the model is trained, the model results are written back to Hive for persistence. The point is to avoid storing data in the local modeling environment!

 

Environment:

Operating system: Windows 10
Python: 3.6.5
Hive: 1.2.1

 

Third-party dependencies required by Python

Package       Version   Install command
bitarray      0.8.1     pip install bitarray==0.8.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
impyla        0.16.0    pip install impyla==0.16.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pure_sasl     0.5.1     pip install pure_sasl==0.5.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
thrift        0.9.3     pip install thrift==0.9.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
thrift_sasl   0.2.1     pip install thrift_sasl==0.2.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
thriftpy      0.3.9     pip install thriftpy==0.3.9 -i https://pypi.tuna.tsinghua.edu.cn/simple
thriftpy2     0.4.7     pip install thriftpy2==0.4.7 -i https://pypi.tuna.tsinghua.edu.cn/simple

P.S. The package conflicts found so far in practice are as follows:

1. impyla requires thrift <= 0.9.3, while pyhive 0.6.1 is not compatible with thrift 0.9.3, so impyla and pyhive 0.6.1 conflict over the thrift version.

2. thrift_sasl must be version 0.2.1 or lower; otherwise an error is raised. The error is described later in this post.

3. When thrift_sasl is used, the third-party package sasl must be uninstalled. sasl is the package that pyhive relies on.
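The three rules above can be checked against the output of `pip freeze` before trying to connect. The following is a minimal sketch, not part of the original setup: the helper names `parse_freeze`, `version_tuple` and `find_conflicts` are hypothetical, and the version comparison is deliberately crude (it only looks at the numeric parts of a version string).

```python
import re


def parse_freeze(text):
    """Parse `pip freeze` output into {normalized_name: version}."""
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            # Treat thrift-sasl and thrift_sasl as the same package name.
            installed[name.lower().replace("-", "_")] = version
    return installed


def version_tuple(version):
    """Crude comparison key built from the numeric parts: '0.9.3' -> (0, 9, 3)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def find_conflicts(installed):
    """Return a list of messages for the conflicts described in this post."""
    problems = []
    thrift = installed.get("thrift")
    if thrift and version_tuple(thrift) > (0, 9, 3):
        problems.append("thrift {} is newer than 0.9.3".format(thrift))
    thrift_sasl = installed.get("thrift_sasl")
    if thrift_sasl and version_tuple(thrift_sasl) > (0, 2, 1):
        problems.append("thrift_sasl {} is newer than 0.2.1".format(thrift_sasl))
    if "sasl" in installed:
        problems.append("sasl is installed; uninstall it when using thrift_sasl")
    return problems
```

To use it, paste the output of `pip freeze` into a string (or read it from a file) and call `find_conflicts(parse_freeze(text))`; an empty list means none of the three conflicts was detected.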

 

Code to connect to Hive

from impala.dbapi import connect
from impala.util import as_pandas

conn = connect(host='*.*.*.*', port=10000, auth_mechanism='PLAIN', database='src')
cursor = conn.cursor()
cursor.execute('select * from table_name limit 10')

print(cursor.description)  # print the column metadata (field names)
# print(as_pandas(cursor))  # print the results

# convert the result set to a DataFrame
df = as_pandas(cursor)
print(df)

cursor.close()
conn.close()
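The code above only covers reading. For the write-back step mentioned at the start of this post, one possible approach is a single multi-row INSERT sent through the same kind of cursor. This is a sketch under several assumptions rather than the method used in the original setup: the helper name `write_rows_to_hive` and the table name `src.model_result` are hypothetical, the target table is assumed to exist already with matching columns, and impyla's `cursor.execute` is assumed to substitute `%s` placeholders from a list of parameters.

```python
import re


def write_rows_to_hive(cursor, table, rows):
    """Insert rows into an existing Hive table with one INSERT ... VALUES statement.

    cursor: an open DB-API cursor (for example conn.cursor() from impyla).
    table:  target table name, optionally qualified as database.table.
    rows:   iterable of equally sized sequences, one per row.
    Returns the SQL text that was sent, or None if there was nothing to write.
    """
    rows = [list(row) for row in rows]
    if not rows:
        return None
    # The table name is interpolated into the SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)?", table):
        raise ValueError("unexpected table name: {!r}".format(table))
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of values")
    row_marker = "(" + ", ".join(["%s"] * width) + ")"
    sql = "INSERT INTO TABLE {} VALUES {}".format(
        table, ", ".join([row_marker] * len(rows)))
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    cursor.execute(sql, params)
    return sql
```

With a DataFrame of results, the call could look like `write_rows_to_hive(cursor, 'src.model_result', result_df.values.tolist())`. Sending one statement for all rows is a deliberate choice here, since each INSERT in Hive typically starts its own job; for large result sets a different loading route would probably be more suitable.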

The results are as follows:

 

I ran into the following pitfalls along the way and am recording them here for future reference.

 

Problem one:

Solution:

# Original statement
conn = connect(host='*.*.*.*', port=10000, database='src')

# Add auth_mechanism='NONE' to the statement above
conn = connect(host='*.*.*.*', port=10000, auth_mechanism='NONE', database='src')

 

Problem two:

Solution:

# Statement that triggers the error
conn = connect(host='*.*.*.*', port=10000, auth_mechanism='NONE', database='src')

# Change auth_mechanism='NONE' in the statement above to auth_mechanism='PLAIN'
conn = connect(host='*.*.*.*', port=10000, auth_mechanism='PLAIN', database='src')

I had set auth_mechanism='NONE' because the authentication setting in the hive-site.xml file is NONE.

Even so, the code has to use auth_mechanism='PLAIN'.
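The mismatch between the server-side value and the client-side value can be written down as a small lookup. This is only a sketch: it assumes the hive-site.xml setting in question is `hive.server2.authentication` and that impyla's `auth_mechanism` accepts 'NOSASL', 'PLAIN', 'GSSAPI' and 'LDAP'. Only the NONE to PLAIN row comes from this post; the other rows and the helper name `impyla_auth_mechanism` are assumptions.

```python
# Hypothetical lookup from the HiveServer2 authentication setting
# to the auth_mechanism value passed to impala.dbapi.connect.
_HIVE_TO_IMPYLA = {
    "NONE": "PLAIN",       # the case described in this post
    "NOSASL": "NOSASL",    # assumption
    "LDAP": "LDAP",        # assumption
    "KERBEROS": "GSSAPI",  # assumption
}


def impyla_auth_mechanism(hive_auth):
    """Return the auth_mechanism to try for a given server-side setting."""
    key = hive_auth.strip().upper()
    if key not in _HIVE_TO_IMPYLA:
        raise ValueError("no known mapping for {!r}".format(hive_auth))
    return _HIVE_TO_IMPYLA[key]
```

For the environment in this post, `impyla_auth_mechanism('NONE')` gives 'PLAIN', which is the value that made the connection work.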

 

Problem three:

 

Solution:

# Uninstall the third-party package sasl.
pip uninstall sasl

 

Problem four:

Solution:

Modify line 94 of __init__.py in the thrift_sasl package.

After the modification, the code is as follows:

 

Problem five:

Solution:

# Uninstall thrift_sasl 0.3.0 and install 0.2.1
pip install thrift_sasl==0.2.1

 

With that, all of the problems are solved. Hope this helps!


Origin blog.csdn.net/xiezhen_zheng/article/details/102798405