Operating Hive from Java through JDBC

http://www.cnblogs.com/netbloomy/p/6688670.html

0. Overview

Using the CLI or hive -e only lets you run HiveQL queries, updates, and other operations interactively. Hive also provides a client/server mode: through HiveServer or HiveServer2, a client can operate on data in Hive without starting the CLI. Both allow remote clients to submit requests to Hive in a variety of programming languages, such as Java and Python, and to retrieve the results.

What are the similarities and differences between HiveServer and HiveServer2?

Both HiveServer and HiveServer2 are based on Thrift. Why is HiveServer2 needed when HiveServer already exists? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by modifying the HiveServer code. The server was therefore rewritten as HiveServer2 in Hive 0.11.0, which solves the problem. HiveServer2 supports multi-client concurrency and authentication, and provides better support for open API clients such as JDBC and ODBC.

1. Start the service

1) Key configuration in hive-site.xml

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/usr/hive/warehouse</value> <!-- HDFS location where Hive databases and tables are stored -->
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value> <!-- port for remote connections to HiveServer2; the default is 10000 -->
  <description>Port number of HiveServer2 Thrift interface.
  Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>

<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>**.**.**.**</value> <!-- IP address of the host where Hive runs -->
  <description>Bind host on which to run the HiveServer2 Thrift interface.  Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>

<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000</value> <!-- the default is 5000L; changed to 5000 here, otherwise the program reports an error -->
  <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value> <!-- Hive metastore database; a local MySQL instance is used here -->
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name> <!-- metastore connection driver class -->
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name> <!-- metastore database username -->
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name> <!-- metastore database password -->
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
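The thrift host and port configured above are exactly what the JDBC client uses later on. As a quick illustration (the host placeholder stands for whatever value you put in hive.server2.thrift.bind.host), the connection URL is assembled like this:

// JDBC URL format for HiveServer2 clients:
// jdbc:hive2://<hive.server2.thrift.bind.host>:<hive.server2.thrift.port>/<database>
String host = "**.**.**.**";  // value of hive.server2.thrift.bind.host
int port = 10000;             // value of hive.server2.thrift.port
String url = "jdbc:hive2://" + host + ":" + port + "/default";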

2) Start the metastore

First start the metastore. On the command line, type: hive --service metastore &

3) Start the service

#hive --service hiveserver2 >/dev/null & 

The above command starts the hiveserver2 service.

Hive provides a JDBC driver that lets us connect to Hive from Java code and run SQL-style queries much as we would against a relational database. Before doing so, the Hive service (HiveServer2) must be running. If you want to start the older hiveserver instead, change the command above to

#hive --service hiveserver >/dev/null &  
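Before running the client it can help to confirm that both services are actually listening. A minimal sketch, assuming the metastore listens on its default port 9083 and HiveServer2 on the port 10000 configured above:

import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) {
        // metastore default port and the HiveServer2 thrift port from hive-site.xml
        int[] ports = {9083, 10000};
        for (int port : ports) {
            try (Socket s = new Socket("localhost", port)) {
                System.out.println("port " + port + " is reachable");
            } catch (Exception e) {
                System.out.println("port " + port + " is not reachable: " + e.getMessage());
            }
        }
    }
}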

2. Add the required JAR packages to the classpath

$HADOOP_HOME/share/hadoop/common/hadoop-common-2.8.0.jar

$HIVE_HOME/lib/hive-exec-2.1.1.jar

$HIVE_HOME/lib/hive-jdbc-2.1.1.jar

$HIVE_HOME/lib/hive-metastore-2.1.1.jar

$HIVE_HOME/lib/hive-service-2.1.1.jar

$HIVE_HOME/lib/libfb303-0.9.3.jar

$HIVE_HOME/lib/commons-logging-1.2.jar

$HIVE_HOME/lib/slf4j-api-1.6.1.jar
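If the project is built with Maven, an alternative to copying the jars by hand is to declare the client dependencies. A sketch assuming the same versions as the jars listed above (hive-jdbc should pull in most of the remaining Hive client jars transitively):

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>2.1.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.8.0</version>
</dependency>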

3. Java connection program

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HiveClientUtils {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    // Fill in the Hive host here: the IP configured earlier in hive-site.xml
    private static String Url = "jdbc:hive2://localhost:10000/default";

    private static Connection conn;
    private static PreparedStatement ps;
    private static ResultSet rs;

    // Create the connection
    public static Connection getConnection() {
        try {
            Class.forName(driverName);
            // The user name here must be a user with permission to operate on HDFS,
            // otherwise the program throws a "permission denied" exception
            conn = DriverManager.getConnection(Url, "vagrant", "vagrant");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return conn;
    }

    public static PreparedStatement prepare(Connection conn, String sql) {
        PreparedStatement ps = null;
        try {
            ps = conn.prepareStatement(sql);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return ps;
    }

    public static void getAll(String tablename) {
        conn = getConnection();
        String sql = "select * from " + tablename;
        System.out.println(sql);
        try {
            ps = prepare(conn, sql);
            rs = ps.executeQuery();
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= columns; i++) {
                    System.out.print(rs.getString(i));
                    System.out.print("\t\t");
                }
                System.out.println();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String tablename = "test1";
        getAll(tablename);
    }
}

The code above targets hiveserver2. If you are using the older hiveserver instead, two things need to change:

Change org.apache.hive.jdbc.HiveDriver to org.apache.hadoop.hive.jdbc.HiveDriver

Change jdbc:hive2://localhost:10000/default to jdbc:hive://localhost:10000/default

Here 'localhost' is the host address, 10000 is the port, and 'default' is the default database.
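The utility class above never closes its Connection, PreparedStatement, or ResultSet. A minimal sketch of the same query written with try-with-resources, assuming the same URL, user, and test1 table as above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";
        // try-with-resources closes the connection, statement, and result set automatically
        try (Connection conn = DriverManager.getConnection(url, "vagrant", "vagrant");
             PreparedStatement ps = conn.prepareStatement("select * from test1");
             ResultSet rs = ps.executeQuery()) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= columns; i++) {
                    System.out.print(rs.getString(i) + "\t\t");
                }
                System.out.println();
            }
        }
    }
}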
