Operating Hive from Java via JDBC

http://www.cnblogs.com/netbloomy/p/6688670.html

0. Overview

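The CLI and hive -e only allow running queries, updates and similar operations written in HiveQL. Hive also provides client access, however: through HiveServer or HiveServer2, a client can operate on the data in Hive without starting the CLI. Both let remote clients written in various languages, such as Java or Python, submit requests to Hive and retrieve the results.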

How do HiveServer and HiveServer2 differ?

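Both HiveServer and HiveServer2 are built on Thrift. Why is HiveServer2 needed when HiveServer already exists? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by modifying the HiveServer code. HiveServer was therefore rewritten in Hive 0.11.0 as HiveServer2, which solves the problem. HiveServer2 supports multi-client concurrency and authentication, and offers better support for open API clients such as JDBC and ODBC.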

1. Starting the services

1) Key settings in hive-site.xml

<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- HDFS directory where Hive databases and tables are stored -->
  <value>/usr/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.server2.thrift.port</name>
  <!-- Port for remote HiveServer2 connections; the default is 10000 -->
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>

<property>
  <name>hive.server2.thrift.bind.host</name>
  <!-- IP address of the cluster node where Hive runs -->
  <value>**.**.**.**</value>
  <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>

<property>
  <name>hive.server2.long.polling.timeout</name>
  <!-- The default is 5000L; changed to 5000 here, otherwise the program reports an error -->
  <value>5000</value>
  <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- Hive's metastore database; a local MySQL instance is used here -->
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- Driver class used to connect to the metastore database -->
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <!-- User name for the metastore database; supply your own value element -->
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- Password for the metastore database; supply your own value element -->
  <description>password to use against metastore database</description>
</property>
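A client program often needs the same host and port that were set above. The following is a minimal sketch of reading them back from the configuration, assuming the properties sit inside the usual <configuration> root element of a Hadoop-style config file. HiveSiteReader and its parse method are hypothetical names made up for this illustration; they are not part of Hive.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hive): collects name/value pairs
// from the <property> elements of a hive-site.xml style document.
final class HiveSiteReader {

    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = textOf(property, "name");
                String value = textOf(property, "value");
                // Properties without a <value> element are skipped.
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String textOf(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hive.server2.thrift.port</name><value>10000</value></property>"
                + "</configuration>";
        System.out.println(parse(xml).get("hive.server2.thrift.port"));
    }
}
```

In a real setup the XML string would come from the hive-site.xml file itself; the inline string here only keeps the sketch self-contained.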

2) Start the metastore service

Start the metastore service first by entering on the command line: hive --service metastore &

3) Start HiveServer2

#hive --service hiveserver2 >/dev/null &

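The command above starts the hiveserver2 service.

Hive provides a JDBC driver, so Java code can connect to Hive and run SQL-like queries much as it would against a relational database. First, the Hive service, that is HiveServer, must be running. To start hiveserver instead, change the command above to: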

#hive --service hiveserver >/dev/null &
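Because the start commands above discard their output and run in the background, it is not obvious when the service is ready to accept connections. A rough sketch of a readiness check, assuming the Thrift port 10000 configured earlier, is to try opening a plain TCP connection. PortCheck and isListening are hypothetical names for this illustration; a successful connect only shows that something is listening on the port, not that it is Hive.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP connection to host:port succeeds.
final class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: treat all as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are assumptions taken from the configuration above.
        System.out.println("listening: " + isListening("localhost", 10000, 2000));
    }
}
```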

2. Required JARs

$HADOOP_HOME/share/hadoop/common/hadoop-common-2.8.0.jar

$HIVE_HOME/lib/hive-exec-2.1.1.jar

$HIVE_HOME/lib/hive-jdbc-2.1.1.jar

$HIVE_HOME/lib/hive-metastore-2.1.1.jar

$HIVE_HOME/lib/hive-service-2.1.1.jar

$HIVE_HOME/lib/libfb303-0.9.3.jar

$HIVE_HOME/lib/commons-logging-1.2.jar

$HIVE_HOME/lib/slf4j-api-1.6.1.jar
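A missing JAR typically only shows up when the driver class is loaded. As a small sketch, the check below tries to load a class by name and reports whether that worked. ClasspathCheck and isOnClasspath are hypothetical names for this illustration; the driver class name is the HiveServer2 one used later in this article.

```java
// Hypothetical helper: reports whether a class can be loaded from the classpath.
final class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers the case where the class is found
            // but one of the classes it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HiveServer2 driver present: "
                + isOnClasspath("org.apache.hive.jdbc.HiveDriver"));
    }
}
```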

3. Java connection program

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HiveClientUtils {

    private static final String driverName = "org.apache.hive.jdbc.HiveDriver";

    // Hive host: use the address configured in hive-site.xml
    private static final String url = "jdbc:hive2://localhost:10000/default";

    // Open a connection
    public static Connection getConnection() throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // This user must have permission to operate on HDFS; otherwise the
        // program fails with a "permission deny" exception
        return DriverManager.getConnection(url, "vagrant", "vagrant");
    }

    // Print every row of the given table
    public static void getAll(String tablename) {
        String sql = "select * from " + tablename;
        System.out.println(sql);
        // try-with-resources closes the result set, statement and connection
        try (Connection conn = getConnection();
             PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= columns; i++) {
                    System.out.print(rs.getString(i));
                    System.out.print("\t\t");
                }
                System.out.println();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String tablename = "test1";
        getAll(tablename);
    }
}
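The getAll method builds its query by concatenating the table name into the SQL string. JDBC placeholders bind values rather than identifiers, so a table name cannot simply be passed as a ? parameter. One cautious option, sketched here under the assumption that table names are plain identifiers optionally prefixed with a database name, is to validate the name before using it. TableNames and requireValid are hypothetical names for this illustration, and the pattern is deliberately stricter than what Hive itself accepts.

```java
import java.util.regex.Pattern;

// Hypothetical helper: accepts only names like "test1" or "default.test1".
final class TableNames {

    private static final Pattern SIMPLE_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    // Returns the name unchanged if it looks safe, otherwise throws.
    static String requireValid(String tablename) {
        if (tablename == null || !SIMPLE_NAME.matcher(tablename).matches()) {
            throw new IllegalArgumentException("Unexpected table name: " + tablename);
        }
        return tablename;
    }

    public static void main(String[] args) {
        System.out.println("select * from " + requireValid("test1"));
    }
}
```

Calling requireValid on the table name at the top of getAll would reject input such as "test1; drop table x" before any SQL is sent.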

The code above targets hiveserver2. For hiveserver, two things need to change:

org.apache.hive.jdbc.HiveDriver becomes: org.apache.hadoop.hive.jdbc.HiveDriver

jdbc:hive2://localhost:10000/default becomes: jdbc:hive://localhost:10000/default

Here 'localhost' is the host address, 10000 is the port number, and default is the default database.
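The two differences, driver class and URL prefix, can be kept together so that switching servers changes a single constant. The sketch below is one way to arrange that. HiveEndpoint is a hypothetical name for this illustration, and the class names are written in their lower-case package form.

```java
// Hypothetical enum: pairs each server type with its driver class and URL prefix.
enum HiveEndpoint {
    HIVESERVER("org.apache.hadoop.hive.jdbc.HiveDriver", "jdbc:hive://"),
    HIVESERVER2("org.apache.hive.jdbc.HiveDriver", "jdbc:hive2://");

    final String driverClass;
    final String urlPrefix;

    HiveEndpoint(String driverClass, String urlPrefix) {
        this.driverClass = driverClass;
        this.urlPrefix = urlPrefix;
    }

    // Builds prefix + host:port/database, e.g. jdbc:hive2://localhost:10000/default
    String url(String host, int port, String database) {
        return urlPrefix + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        System.out.println(HIVESERVER2.driverClass);
        System.out.println(HIVESERVER2.url("localhost", 10000, "default"));
    }
}
```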
