Big data learning: connecting to a Hive database from Python with PyHive

1. Combining HBase and Hive

(1) Create an HBase table and insert data

# 'test' is the table name, 'name' is the column family
# an HBase column family can hold multiple columns (qualifiers)
create 'test','name'

# insert data
put 'test','1','name:t1','1'
put 'test','1','name:t2','2'

# scan the whole table
scan 'test'

# read one cell: get 'table', 'row-key', 'column family:qualifier'
get 'test','1','name:t1'


# drop the table (it must be disabled first)
disable 'test'
drop 'test'

# show table details
desc 'test'
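
The same operations can also be issued from Python. Below is a minimal sketch using the happybase library (not part of the original post), which assumes the HBase Thrift server is running on its default port 9090 on the same host:

import happybase

# connect to the HBase Thrift server (assumed host/port; start it with
# 'hbase thrift start' if it is not already running)
connection = happybase.Connection('192.168.99.250', port=9090)
table = connection.table('test')

# equivalent of: put 'test','1','name:t1','1'  and  put 'test','1','name:t2','2'
table.put(b'1', {b'name:t1': b'1', b'name:t2': b'2'})

# equivalent of: scan 'test'
for key, data in table.scan():
    print(key, data)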

(2) Create an external table in Hive mapped to the HBase table

CREATE EXTERNAL TABLE test(key string, t1 int, t2 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:t1,name:t2")
TBLPROPERTIES ("hbase.table.name" = "test", "hbase.mapred.output.outputtable" = "test");

Test that the two platforms are connected: rows put into HBase should show up when querying the Hive table, and the data stays synchronized in both directions. A quick check from Python is sketched below.
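
A minimal sketch of that check, assuming the HiveServer2 setup from section 2 below is already running:

from pyhive import hive

conn = hive.Connection(host='192.168.99.250', port=10000, username='hive')
cursor = conn.cursor()
cursor.execute('select * from test')

# the two puts above map to a single Hive row: ('1', 1, 2)
print(cursor.fetchall())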

2. Connecting to Hive and reading data with pandas

(1) Configure the hive-site.xml file

<property>
        <name>hive.server2.thrift.bind.host</name>
        <value>192.168.99.250</value>
</property>
<property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
</property>

(2) Start the Hive services

hive --service metastore &
hiveserver2 &
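
Before connecting from Python, it is worth confirming that HiveServer2 is actually listening on the Thrift port configured above. A quick hypothetical check:

import socket

# raises an error if HiveServer2 is not reachable on the configured Thrift port
socket.create_connection(('192.168.99.250', 10000), timeout=5).close()
print('HiveServer2 is reachable')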

(3) Read data

from pyhive import hive
import pandas as pd
conn = hive.Connection(host = '192.168.99.250', port = 10000, username = 'hive')
# host: HiveServer2 IP, port: Thrift port, username: user name,
# database: name of the database to use (optional, defaults to 'default')


cursor = conn.cursor()
cursor.execute('show databases')


# print the results
for result in cursor.fetchall():
    print(result)

Or read the query result directly into a pandas DataFrame:
sql = 'select * from default.employees'

df = pd.read_sql(sql, conn)
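
Note that Hive usually returns fully qualified column names, so the DataFrame columns arrive as 'employees.xxx'. A common cleanup, sketched here (the exact naming depends on the hive.resultset.use.unique.column.names setting):

# strip the 'employees.' prefix Hive adds to result column names
df.columns = [c.split('.')[-1] for c in df.columns]
print(df.head())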
     


Origin blog.csdn.net/qq_28409193/article/details/113744823