Installing Hive on Linux

With Hive installed on Linux, integrated with Hadoop, and its metadata stored in MySQL, today we look at how to operate Hive through JDBC from Eclipse.

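As we all know, Hive is a SQL-like framework that is operated through HiveQL (HQL) syntax. Internally, Hive translates queries into a series of MapReduce jobs that carry out the actual data aggregation. We can issue commands directly in the Hive shell, but that is fairly limiting. If we can drive Hive from program code instead, it becomes far more flexible.

Hive supports JDBC, so we can operate Hive from Java code and run data aggregations just as we would with MySQL.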
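To make the connection step concrete, here is a minimal sketch of how a HiveServer2 JDBC URL is put together. The class name `HiveJdbcUrl` and its `build` method are hypothetical helpers for illustration, not part of Hive. The sketch assumes HiveServer2 listens on its default port 10000 and covers only the plain host/port/database form of the URL.

```java
/** Hypothetical helper (not part of Hive): builds a HiveServer2 JDBC URL. */
final class HiveJdbcUrl {

    /** HiveServer2 listens on port 10000 unless configured otherwise. */
    static final int DEFAULT_PORT = 10000;

    private HiveJdbcUrl() {}

    /** Returns a URL of the form jdbc:hive2://host:port/database. */
    static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // Fall back to Hive's built-in "default" database when none is given.
        String db = (database == null || database.isEmpty()) ? "default" : database;
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        // 192.168.46.32 is the example server address used in this article.
        System.out.println(build("192.168.46.32", DEFAULT_PORT, "default"));
    }
}
```

The resulting string is what gets passed to `DriverManager.getConnection` together with a user name and password.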

Let's walk through the steps in detail.

Software environment:

No.   Description                             OS
1     Hadoop 2.2.0 installed on CentOS 6.5    Linux
2     Hive 0.13 installed on CentOS 6.5       Linux
3     Eclipse 4.2                             Windows 7

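No.   Step                                                               Description
1     Install and start Hadoop 2.2.0                                     Hive depends on the Hadoop environment
2     Install Hive                                                       Operate MapReduce in a SQL-like way
3     Start hiveserver2                                                  Server-side program for operating Hive remotely
4     Create a Java project on Windows and import the JARs Hive needs    Required for remote operation
5     Write and test the code in Eclipse                                 Test whether the connection to Hive succeeds
6     Check the hiveserver2 side                                         Verify the connection and view the job log output
7     View the MR job on Hadoop's port 8088                              View MR job execution and scheduling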

Some common Hive statements:

Load data into a table:
LOAD DATA LOCAL INPATH '/home/search/abc1.txt' OVERWRITE INTO TABLE info;

show tables;      -- list all tables in the current database
desc tableName;   -- show the column structure of a table
show databases;   -- list all existing databases

Create a table:
create table mytt (name string, count int) row format delimited fields terminated by '#' stored as textfile;
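The `fields terminated by '#'` clause means that each line of the data file holds one row, with the columns separated by `#`. The sketch below shows how such a line maps onto the `name` and `count` columns of `mytt`. The class `MyttRow` and the sample lines are hypothetical, and the code only mimics the text layout; it is not how Hive itself reads the file, and Hive is more forgiving of malformed lines than this strict parser.

```java
/** Hypothetical illustration of the text layout behind the mytt table. */
final class MyttRow {
    final String name;
    final int count;

    MyttRow(String name, int count) {
        this.name = name;
        this.count = count;
    }

    /** Splits one line on '#', matching "fields terminated by '#'". */
    static MyttRow parse(String line) {
        String[] fields = line.split("#", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected 2 fields: " + line);
        }
        return new MyttRow(fields[0], Integer.parseInt(fields[1].trim()));
    }

    public static void main(String[] args) {
        // Made-up sample lines; the article does not show the real file contents.
        String[] lines = {"china#50", "usa#40"};
        for (String line : lines) {
            MyttRow row = parse(line);
            System.out.println(row.name + " -> " + row.count);
        }
    }
}
```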

[Screenshot: JAR files]

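Hive depends on Hadoop, so it is best to add the Hadoop JARs to the client project as well. The calling code is below; before running it, make sure hiveserver2 is already running on the server.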

package com.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Operating Hive over JDBC from Windows 7.
 * @author qindongliang
 *
 * Big data tech discussion group: 376932160
 */
public class HiveJDBClient {

	/** Class name of the Hive JDBC driver */
	private static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

	public static void main(String[] args) throws Exception {
		// Load the Hive driver
		Class.forName(DRIVER);
		String tableName = "mytt"; // table name
		// Open a hive2 JDBC connection; note that the default database is "default".
		// try-with-resources closes the result set, statement, and connection even if the query fails.
		try (Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.46.32/default", "search", "dongliang");
				Statement st = conn.createStatement();
				// Average of a column; Hive runs this as a MapReduce job
				ResultSet rs = st.executeQuery("select avg(count) from " + tableName)) {
			// Alternative: select everything, which runs directly without MapReduce
			// ResultSet rs = st.executeQuery("select * from " + tableName);
			while (rs.next()) {
				System.out.println(rs.getString(1) + "   ");
			}
		}
		System.out.println("成功!"); // prints "Success!"
	}
}
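The client above builds its query by concatenating a table name into the SQL string. That is fine for a hard-coded name, but JDBC `?` parameters cannot stand in for table names, so a name that comes from outside the program is worth validating first. The following is a minimal sketch of one way to do that. `HiveIdentifiers` and its methods are hypothetical, and the pattern is deliberately conservative (letters, digits, underscore) rather than Hive's full identifier rules.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: validates a table name before it is spliced into SQL. */
final class HiveIdentifiers {

    // Conservative assumption: a letter or underscore, then letters, digits, underscores.
    private static final Pattern SIMPLE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private HiveIdentifiers() {}

    /** Returns the name unchanged if it is a simple identifier, otherwise throws. */
    static String requireSimpleName(String name) {
        if (name == null || !SIMPLE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("unsafe table name: " + name);
        }
        return name;
    }

    /** Builds the same average query as the client, with a checked table name. */
    static String avgCountQuery(String tableName) {
        return "select avg(count) from " + requireSimpleName(tableName);
    }

    public static void main(String[] args) {
        System.out.println(avgCountQuery("mytt"));
    }
}
```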

The output is as follows:

48.6   
成功!

Log output on the hiveserver2 side:

[search@h1 bin]$ ./hiveserver2 
Starting HiveServer2
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/08/05 04:00:02 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
OK
OK
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1407179651448_0001, Tracking URL = http://h1:8088/proxy/application_1407179651448_0001/
Kill Command = /home/search/hadoop/bin/hadoop job  -kill job_1407179651448_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-05 04:03:49,951 Stage-1 map = 0%,  reduce = 0%
2014-08-05 04:04:19,118 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.74 sec
2014-08-05 04:04:30,860 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.7 sec
MapReduce Total cumulative CPU time: 3 seconds 700 msec
Ended Job = job_1407179651448_0001
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.7 sec   HDFS Read: 253 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 700 msec
OK

[Screenshot: the Hadoop web UI on port 8088]

The following SQL statement is not converted into a MapReduce job: select * from mytt limit 3;
The output is as follows:

中国   
美国   
中国   
成功!

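At this point, calling Hive over JDBC works, and we can create tables, create databases, run queries, and so on from the client. One caveat: when you run a data-load statement against a Hive table from the Windows client, the data file must already be on the Linux side and the load path must be a Linux path. You cannot load a file that sits on the Windows machine directly into a Hive table on Linux.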
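As a small guard against the path pitfall just described, a client can check that the path at least looks like an absolute Linux path before sending a `LOAD DATA` statement. This is only a sketch: `LoadDataStatement` is a hypothetical helper, and the check is a heuristic that cannot confirm the file actually exists on the server.

```java
/** Hypothetical helper: builds a LOAD DATA statement for a server-side file. */
final class LoadDataStatement {

    private LoadDataStatement() {}

    /** True for absolute POSIX-style paths such as /home/search/abc1.txt. */
    static boolean looksLikeLinuxPath(String path) {
        return path != null
                && path.startsWith("/")        // absolute path on Linux
                && path.indexOf('\\') < 0      // backslashes suggest a Windows path
                && path.indexOf('\'') < 0;     // a quote would break the SQL string literal
    }

    /** Builds the statement text; the table name is assumed to be trusted. */
    static String build(String serverPath, String table) {
        if (!looksLikeLinuxPath(serverPath)) {
            throw new IllegalArgumentException("not an absolute Linux path: " + serverPath);
        }
        return "LOAD DATA LOCAL INPATH '" + serverPath + "' OVERWRITE INTO TABLE " + table;
    }

    public static void main(String[] args) {
        // Path and table name are the examples used earlier in this article.
        System.out.println(build("/home/search/abc1.txt", "info"));
    }
}
```

The returned text can be passed to `Statement.execute`. Note that it carries no trailing semicolon, which is how statements are normally sent over JDBC, unlike in the Hive shell.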
