Hive Series (2): Installing and Deploying Hive on Linux

1. Installing Hive

1.1 Download and Extract

Download the required version of Hive; here I use hive-1.1.0-cdh5.15.2. Download address: http://archive.cloudera.com/cdh5/cdh/5/

# extract after downloading
 tar -zxvf hive-1.1.0-cdh5.15.2.tar.gz

1.2 Configure Environment Variables

# vim /etc/profile

Add environment variables:

export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2
export PATH=$HIVE_HOME/bin:$PATH

Make the environment variable configuration take effect immediately:

# source /etc/profile
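To confirm the configuration took effect, you can check the variables from the shell. The sketch below re-exports this guide's example values so it is self-contained; adjust the path to your actual install location.

```shell
# Re-apply the example settings from /etc/profile (install path is this guide's example)
export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2
export PATH=$HIVE_HOME/bin:$PATH

# Confirm the variables resolved as expected
echo "$HIVE_HOME"
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "Hive bin directory is on PATH" ;;
  *)                    echo "Hive bin directory is missing from PATH" ;;
esac
```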

1.3 Modify the Configuration

1. hive-env.sh

Go to the conf/ directory under the installation directory and copy the Hive environment configuration template hive-env.sh.template:

cp hive-env.sh.template hive-env.sh

Edit hive-env.sh and specify the Hadoop installation path:

HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2

2. hive-site.xml

Create a new hive-site.xml file with the following content, which mainly configures the address, driver, user name, and password of the MySQL database that stores the metadata:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/hadoop_hive?createDatabaseIfNotExist=true</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>

</configuration>

1.4 Copy the Database Driver

Copy the MySQL driver package to the lib directory under the Hive installation directory. The MySQL driver can be downloaded from https://dev.mysql.com/downloads/connector/j/ ; I have also uploaded a copy to the resources directory of this repository, which you can use if needed.
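As a sketch, the copy step looks like the following. The jar filename is an example version, not a requirement; substitute the one you actually downloaded.

```shell
# Copy the MySQL Connector/J jar into Hive's lib directory.
# The jar filename below is an example version; adjust it to your download.
JAR=mysql-connector-java-5.1.47.jar
HIVE_LIB="${HIVE_HOME:-/usr/app/hive-1.1.0-cdh5.15.2}/lib"

if [ -f "$JAR" ]; then
  cp "$JAR" "$HIVE_LIB/"
  echo "copied $JAR to $HIVE_LIB"
else
  echo "driver jar not found: $JAR (download it first)"
fi
```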

1.5 Initialize the Metastore Database

  • With Hive 1.x, no initialization is needed: Hive initializes the metastore automatically on first startup. It does not create all of the metadata tables up front, only the necessary subset; the remaining tables are created automatically as they are used later.

  • With Hive 2.x, you must initialize the metastore database manually. Initialization command:

    # The schematool command is in the bin directory of the installation; since the environment variables were configured above, it can be run from any location
    schematool -dbType mysql -initSchema

The package used here is the CDH build hive-1.1.0-cdh5.15.2.tar.gz, which corresponds to Hive 1.1.0, so this step can be skipped.
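If you do run the manual initialization (Hive 2.x), you can afterwards query the schema version that schematool recorded in MySQL. This is a sketch that assumes schematool is on the PATH and the metastore is reachable; otherwise it prints a fallback notice instead of failing.

```shell
# Ask schematool for the metastore schema version recorded in MySQL (Hive 2.x+).
# Falls back to a notice when schematool or the metastore is unavailable.
OUT=$(schematool -dbType mysql -info 2>/dev/null \
  || echo "schematool not available or metastore unreachable")
echo "$OUT"
```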

1.6 Start Hive

Since Hive's bin directory has already been added to the environment variables, it can be started directly with the following command. After entering the interactive command line, run show databases; if no exception is thrown, the installation succeeded.

# hive

In MySQL you can now see the database Hive created, along with the tables that store its metadata.
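As a sketch, those tables can be listed from the MySQL client, using the hadoop_hive database and root/root credentials configured in hive-site.xml above; the fallback message fires when the client or database is unavailable.

```shell
# List the metadata tables Hive created in the hadoop_hive database.
# On a working install, expect names such as DBS, TBLS, and COLUMNS_V2.
OUT=$(mysql -uroot -proot -e 'SHOW TABLES FROM hadoop_hive;' 2>/dev/null \
  || echo "mysql client or hadoop_hive database not available")
echo "$OUT"
```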

2. HiveServer2 / Beeline

Hive ships with two services, HiveServer and HiveServer2, both of which allow clients to connect using a variety of programming languages. However, HiveServer cannot handle concurrent requests from multiple clients, which is why HiveServer2 was created. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results in a variety of programming languages, and it supports concurrent access from multiple clients as well as authentication. HS2 is a single process composed of multiple services, including the Thrift-based Hive service (TCP or HTTP) and a Jetty web server for the Web UI.

HiveServer2 comes with its own CLI tool, Beeline. Beeline is a JDBC client based on SQLLine. Since HiveServer2 is now the focus of Hive development and maintenance, the official recommendation is to use Beeline instead of the Hive CLI. The following sections mainly cover the configuration of Beeline.

2.1 Modify the Hadoop Configuration

Modify the core-site.xml configuration file of the Hadoop cluster and add the following configuration, which designates the root user as a proxy for all users on all hosts:

<property>
 <name>hadoop.proxyuser.root.hosts</name>
 <value>*</value>
</property>
<property>
 <name>hadoop.proxyuser.root.groups</name>
 <value>*</value>
</property>

This step is needed because Hadoop 2.0 introduced an impersonation security mechanism: upper-layer systems such as Hive are not allowed to pass the actual user directly to the Hadoop layer. Instead, the actual user must be passed to a superuser (the proxy user configured above), which performs the operations on Hadoop on the actual user's behalf. This prevents arbitrary clients from operating on Hadoop freely. If you skip this step, an AuthorizationException may be thrown when connecting.

For more on Hadoop's proxy user mechanism, see: Superusers Acting On Behalf Of Other Users.

2.2 Start HiveServer2

Since the environment variables were configured above, HiveServer2 can be started directly:

# nohup hiveserver2 &
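HiveServer2 listens on Thrift port 10000 by default (hive.server2.thrift.port). A quick sketch to check whether it is up; the use of ss is an assumption about the local toolset, with a fallback hint otherwise.

```shell
# Check whether HiveServer2 is listening on its default Thrift port.
PORT=10000
if command -v ss >/dev/null 2>&1; then
  if ss -tln 2>/dev/null | grep ":$PORT " >/dev/null; then
    echo "HiveServer2 is listening on port $PORT"
  else
    echo "port $PORT not open yet (check nohup.out for startup errors)"
  fi
else
  echo "ss not available; try: netstat -tln"
fi
```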

2.3 Use Beeline

Use the following command to enter the Beeline interactive command line; if Connected appears, the connection succeeded.

# beeline -u jdbc:hive2://hadoop001:10000 -n root
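Beeline can also run a statement non-interactively with -e, which is handy for scripting. The host hadoop001 and user root below are this guide's example values.

```shell
# Run a single statement through Beeline non-interactively.
QUERY='show databases;'
if command -v beeline >/dev/null 2>&1; then
  beeline -u jdbc:hive2://hadoop001:10000 -n root -e "$QUERY"
else
  echo "beeline not found on PATH"
fi
```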

For more articles in the big data series, see the GitHub open-source project 大数据入门指南 (Big Data Getting Started Guide).


Origin: www.cnblogs.com/heibaiying/p/11386760.html