Hive Remote Mode Installation

Hive installation and configuration

1. Introduction to Hive

Uses of Hive

  • Conveniently manages the metadata of files and data, providing a unified way to manage metadata

  • Provides simpler access to large-scale data sets, allowing data analysis in SQL

Metadata concept

  • HDFS stores its metadata in the NameNode, while Hive stores its metadata in a database, in a table-like format that is convenient for the later SQL conversion: it can be queried and accessed directly with SQL statements.

  • The metadata describes the databases, tables, and other objects created with Hive

  • Metadata is stored in a relational database, such as Derby or MySQL

Metastore function

  • The client connects to the metastore service, and the metastore in turn connects to the database to access the metadata

  • With a metastore service, multiple clients can connect at the same time, and these clients do not need to know the database user name and password; they only need to be able to reach the metastore service.

Hive metadata can be stored locally (embedded Derby, local MySQL) or remotely (remote access to MySQL):

  • Embedded mode: the Hive service, the metastore service, and Derby all run in the same process

    Metadata is stored in the embedded Derby database, so no separate metastore service is needed.

    Disadvantage: every Hive service you start embeds its own metastore, so only one client can use the metadata at a time

  • Local mode: an external database stores the metadata. The Hive service and the metastore service still run in the same process, while the database runs as a separate process and can be connected to locally or remotely

  • Remote mode: Hive and the metastore run in separate processes, possibly on different nodes, and the database can be accessed locally or remotely

    Each client connects to the metastore service through the settings in its configuration file

When working with Hive, data analysis is expressed in SQL statements. Going from a SQL statement to an executed task involves four parts: interpreter, compiler, optimizer, and executor (a way to inspect the result is sketched after this list).

  • Interpreter: calls the syntax interpreter and semantic analyzer to convert the SQL statement into the corresponding executable Java or business code

  • Compiler: compiles that Java code into bytecode files or a jar package

  • Optimizer: while parsing and converting the SQL statement into Java code, the optimizer is invoked to apply optimization strategies and achieve the best query performance

  • Executor: once the business code has been generated, it is submitted to the MapReduce cluster for execution
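
A convenient way to see what this pipeline produces, without running the job, is HiveQL's standard EXPLAIN statement, which prints the planned execution stages instead of executing them. A minimal sketch, assuming a hypothetical table t already exists:

hive -e "EXPLAIN SELECT COUNT(*) FROM t;"   # prints the stage/operator plan instead of running the query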

2. Hive installation

A remote metastore server is the mode best suited to a real production environment, so this article installs Hive in that mode.

In this cluster, master hosts the Hive server, slave1 hosts MySQL, and slave2 and slave3 host the Hive clients.

Unzip Hive and move it to the Hadoop folder

tar -xvf apache-hive-2.3.7-bin.tar.gz -C /usr/hadoop
mv /usr/hadoop/apache-hive-2.3.7-bin/ /usr/hadoop/hive-2.3.7

Add Hive to system environment variables

vi /etc/profile

Add the following code at the end

export HIVE_HOME=/usr/hadoop/hive-2.3.7
export PATH=$HIVE_HOME/bin:$PATH

Make the configuration take effect

source /etc/profile
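
To confirm the variables took effect (the hive script needs the existing Hadoop installation to run):

echo $HIVE_HOME   # should print /usr/hadoop/hive-2.3.7
hive --version    # should report Hive 2.3.7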

Create the Hive directories in HDFS, and grant write permission to their owning group

hadoop fs -mkdir /tmp   # holds Hive temporary data
hadoop fs -mkdir /hive
hadoop fs -mkdir /hive/warehouse
hadoop fs -chmod g+w /tmp   # grant group write permission
hadoop fs -chmod g+w /hive/warehouse
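
A quick way to verify the directories and their permissions (-d lists the directories themselves rather than their contents):

hadoop fs -ls -d /tmp /hive/warehouse   # group write shows up as drwxrwxr-x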

Enter the /usr/hadoop/hive-2.3.7/conf directory

cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh

Modify the hive-site.xml file: search for the ${system:java.io.tmpdir} and ${system:user.name} placeholders and change the values as shown below.

${system:user.name} can be changed or left as is.

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/hadoop/hive-2.3.7/tmp</value>
    <description>Local scratch space for Hive jobs</description>
</property>

<!-- Hive query log location -->
<property>
    <name>hive.querylog.location</name>
    <value>/usr/hadoop/hive-2.3.7/logs/root_query_logs</value>
    <description>Location of Hive run time structured log file</description>
</property>
 
<property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/usr/hadoop/hive-2.3.7/logs/root_operation_logs</value>
    <description>Top-level directory where operation logs are stored, if logging is enabled</description>
</property>

<!-- No detailed explanation found in the official docs yet -->
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/hadoop/hive-2.3.7/tmp/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
</property>
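
The logs and tmp directories referenced in these values are not created when the tarball is unpacked, so it seems safest to create them up front:

mkdir -p /usr/hadoop/hive-2.3.7/logs /usr/hadoop/hive-2.3.7/tmp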

On the server side, hive-site.xml needs only the following additional properties:

<property>
	<name>javax.jdo.option.ConnectionURL</name>
	<value>jdbc:mysql://slave1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>wang</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>wang123@</value>
</property>

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://master:8020/hive/warehouse</value>
    <description>Warehouse storage location</description>
</property>
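
The JDBC settings above assume that MySQL on slave1 accepts connections for the configured account. If that account does not exist yet, a sketch of creating it on slave1 (the user name and password come from the properties above; the '%' host wildcard is an assumption and should be restricted in production):

mysql -u root -p -e "CREATE USER 'wang'@'%' IDENTIFIED BY 'wang123@';
GRANT ALL PRIVILEGES ON hive.* TO 'wang'@'%';
FLUSH PRIVILEGES;"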

Modify the hive-log4j2.properties file

property.hive.log.dir = /usr/hadoop/hive-2.3.7/logs/${sys:user.name}

Modify the hive-env.sh file

HADOOP_HOME=/usr/hadoop/hadoop-2.10.1

Download the MySQL JDBC driver jar from the Tsinghua University mirror

https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/

Copy it into the /usr/hadoop/hive-2.3.7/lib directory

cp mysql-connector-java-5.1.49.jar  /usr/hadoop/hive-2.3.7/lib

Send Hive and the environment configuration to slave2 and slave3. Note that source /etc/profile only affects the machine where it is run, so execute it on each slave after copying:

scp -r hive-2.3.7 slave2:/usr/hadoop
scp /etc/profile slave2:/etc/profile

scp -r hive-2.3.7 slave3:/usr/hadoop
scp /etc/profile slave3:/etc/profile

# then, on slave2 and on slave3:
source /etc/profile

Modify hive-site.xml on the clients. The clients need only the following two properties; the configuration above is for the server.

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://master:8020/hive/warehouse</value>
</property>

<property>
	<name>hive.metastore.uris</name>
	<value>thrift://master:9083</value>
	<description>Clients access the metadata database through the metastore server over the Thrift protocol</description>
</property>

On the server, initialize the metastore schema (this only needs to be done once)

schematool -dbType mysql -initSchema

Output like the following indicates that initialization succeeded. If it reports a failure, the schema may already have been initialized; drop the hive database in MySQL and run the command again.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hadoop/hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://slave1:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 yoseng
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
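
To double-check the result afterwards, schematool can also report the current schema state against the same database:

schematool -dbType mysql -info   # prints the metastore URL and schema version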

Start the metastore service on the server; the trailing & runs it in the background so you can keep typing in the shell. Note that the plain hive CLI used below is the legacy way to connect and is considered outdated — prefer Beeline, described in section 3.

hive --service metastore &
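
To confirm the metastore came up, it listens on Thrift port 9083 by default (netstat is assumed to be installed; ss -nltp works the same way):

netstat -nltp | grep 9083   # should show a LISTEN entry for the metastore JVM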

Start the hive client on each client node

hive

When you operate with HQL, the corresponding meta information appears in the hive database in MySQL on slave1; a quick check is sketched below.
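
As a quick end-to-end test, create a table from a client and then look at what Hive recorded in MySQL. The table name demo is hypothetical; TBLS is a standard table of the Hive metastore schema. Run the mysql command wherever a mysql client can reach slave1:

hive -e "CREATE TABLE demo (id INT, name STRING);"
mysql -h slave1 -u wang -p -e "SELECT TBL_NAME, TBL_TYPE FROM hive.TBLS;"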

3. Beeline

Beeline is the client tool that will eventually replace HiveCLI (the Hive command-line interface). It is a new command-line client: a JDBC client based on SQLLine CLI.

Beeline supports embedded mode and remote mode. In embedded mode it runs an embedded Hive (similar to the Hive CLI); in remote mode it connects to a standalone HiveServer2 process over Thrift.

On the server side, start HiveServer2, the server that Beeline connects to

hiveserver2 &

On the client side, start beeline

# log in as anonymous
beeline -u jdbc:hive2://master:10000/default
# log in as root
beeline -u jdbc:hive2://master:10000/default -n root

A successful Beeline login looks like this:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hadoop/hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://master:10000/default
Connected to: Apache Hive (version 2.3.7)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.7 by Apache Hive
0: jdbc:hive2://master:10000/default> 

Exit beeline

!exit
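
Beeline can also run statements non-interactively with its standard -e option, which is convenient for scripting:

beeline -u jdbc:hive2://master:10000/default -n root -e "SHOW DATABASES;"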

Beeline Tips:

If there is an error similar to the following

Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root (state=08S01,code=0)

You need to add proxy-user permissions for the root user to Hadoop's core-site.xml:

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
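
For the change to take effect, restart HDFS and YARN, or refresh the proxy-user configuration on the running cluster with the standard Hadoop admin commands:

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration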
