What is Hive?
Hive was open sourced by Facebook to solve the problem of analyzing massive volumes of structured log data, and later became the Apache Hive open source project.
Hive is a data warehouse built on top of the Hadoop file system: HDFS provides storage and MapReduce provides computation. It offers a large set of data warehouse management functions, such as data ETL tools (see above for details), data storage management, and large-scale data query and analysis capabilities.
Hive also defines an SQL-like language, HiveQL, which lets users perform SQL-style operations: it can map a structured data file onto a table, provide simple query functions, and translate SQL statements into MapReduce jobs.
Hive can be understood as a tool: it has no master-slave structure, and it does not need to be installed on every machine in the cluster; one or a few nodes are enough.
By default, Hive stores its metadata in the embedded Derby database; in practice this is usually replaced with a relational database such as MySQL.
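As a small illustration of HiveQL (the table name, columns, and file layout here are made up for the example), the following maps a tab-delimited log file onto a table and then queries it; Hive compiles the SELECT into a MapReduce job:

```sql
-- Hypothetical table over tab-delimited text log files
CREATE TABLE access_log (
  ip  STRING,
  url STRING,
  ts  BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Count hits per URL; runs internally as a MapReduce job
SELECT url, count(*) AS hits
FROM access_log
GROUP BY url;
```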
The figure below shows where Hive sits in the Hadoop ecosystem.
Now look at the Hive architecture.
Metastore: stores the metadata.
HDFS and MapReduce: the raw data actually lives on HDFS, and when SQL statements run, MapReduce jobs are executed internally.
Client: tasks can be submitted to the Driver via JDBC or the CLI. The SQL Parser turns a statement into an abstract syntax tree, which is then compiled into a Physical Plan and handed to the Execution engine (which, again, runs as MapReduce).
Note that the metastore is not the database itself but the metadata: it stores the detailed information about the data.
The Driver is broken down in detail below; this can be read together with the figure above.
The overall flow is as follows: the client submits a task; Hive first reads the metastore to find the metadata (storage location, size, and so on); the query is then handed to the Driver for analysis, and finally runs as a MapReduce job.
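You can watch this SQL-to-plan translation happen with HiveQL's EXPLAIN statement (the table name here is hypothetical):

```sql
-- Prints the query plan, including the map/reduce stages Hive will run
EXPLAIN SELECT url, count(*) FROM access_log GROUP BY url;
```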
Summary: Hive must be installed in a Hadoop environment.
1 hive2.3.6 data warehouse installation
1.1 Install MySQL software on the Master node
sudo apt update
1.1.1 Install MySQL
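The install command itself is not shown above; on Ubuntu it is typically the following (package name assumed for a standard Ubuntu/Debian setup):

```shell
# Install the MySQL server package (Ubuntu/Debian)
sudo apt install -y mysql-server
```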
Note: apt updates over the network, and the update sometimes fails.
Reason: the DNS settings of the machine's Wi-Fi and wired connections differ.
Solution: check the DNS settings; if you are on the wired connection, fix the wired DNS; if you are on Wi-Fi, fix the Wi-Fi DNS.
1.1.2 Set MySQL parameters
sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
Change bind-address = 127.0.0.1 to bind-address = 0.0.0.0, and append at the end of the file:
default-storage-engine = innodb
innodb_file_per_table = on
collation-server = utf8_general_ci
character-set-server = utf8
1.1.3 Set automatic startup when booting
sudo systemctl enable mysql.service
1.1.4 Start MySQL service
sudo systemctl start mysql.service
1.1.5 Initialize MySQL
sudo mysql_secure_installation
1.1.6 Authorize MySQL user hive
sudo mysql -uroot -p123456
show databases;
use mysql;
create user 'hive'@'%' identified by 'Yun@123456';
grant all privileges on *.* to 'hive'@'%';
create user 'hive'@'localhost' identified by 'Yun@123456';
grant all privileges on *.* to 'hive'@'localhost';
alter user 'hive'@'%' require none;
1.1.7 Restart MySQL service
sudo systemctl restart mysql.service
2 Install the Hive software on the Master node
2.1 Log in to the Master node as user angel and install the Hive software
Upload Hive to the node machine via winscp
sudo tar -xzvf /home/angel/apache-hive-2.3.6-bin.tar.gz -C /app/
sudo chown -R angel:angel /app/apache-hive-2.3.6-bin
3 Master node setting Hive parameters
3.1 Rename the Hive configuration file
cd /app/apache-hive-2.3.6-bin/conf
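The rename commands are not shown above; in the Hive 2.3.x conf directory the configuration files ship as templates, so the usual step is to copy them (file names assumed from a standard Hive 2.3.6 distribution):

```shell
# Create working config files from the shipped templates
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
```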
3.2 Modify hive-env.sh file
vim.tiny hive-env.sh
add:
export HADOOP_HOME=/app/hadoop-2.8.5/
export HIVE_CONF_DIR=/app/apache-hive-2.3.6-bin/conf/
export HIVE_AUX_JARS_PATH=/app/apache-hive-2.3.6-bin/lib/
3.3 Create a new hive-site.xml file
Add content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
<description>JDBC connect string for a JDBC metastore.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Yun@123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/app/apache-hive-2.3.6-bin/logs</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
</configuration>
3.4 Upload the "mysql-connector-java-5.1.48.jar" connection package to the /app/apache-hive-2.3.6-bin/lib directory
3.5 Modify environment variables
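The variables to add are not listed here; following the same pattern given in section 5.3 below, the lines appended to ~/.profile would be:

```shell
# Append to ~/.profile: Hive home and PATH (paths assume the install layout above)
export HIVE_HOME=/app/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin
```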
3.6 Make the environment variables take effect
source ~/.profile
4 Master node starts Hive service
4.1 Create database hive and import hive-schema
cd /app/apache-hive-2.3.6-bin/scripts/metastore/upgrade/mysql
mysql -hmaster -uhive -pYun@123456
create database hive;
use hive;
source hive-schema-2.3.0.mysql.sql;
Press Ctrl+D to exit.
4.2 Start hive metastore service
hive --service metastore &
Press Enter, then run hive to enter the interactive command-line interface.
4.3 Start hiveserver2 service
hiveserver2 &
After the service starts, use jps to check the background processes.
5 Install the Hive software on the client host
5.1 Log in to the client host as the user angel to install the Hive software. The client host can be any slave node machine or desktop host
sudo scp -r angel@master:/app/apache-hive-2.3.6-bin /app/apache-hive-2.3.6-bin
Enter the local sudo password when prompted, then the password of user angel on master.
sudo chown -R angel:angel /app/apache-hive-2.3.6-bin
5.2 Modify hive-site.xml file
cd /app/apache-hive-2.3.6-bin/conf/
vim.tiny hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/app/apache-hive-2.3.6-bin/logs</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
</configuration>
5.3 Modify environment variables
Add content:
export HIVE_HOME=/app/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin
Activate the environment variables:
source ~/.profile
6 Test Hive
6.1 Client host test Hive
6.2 View database
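The test commands themselves are not shown in 6.1 and 6.2; a minimal smoke test from the client host, assuming $HIVE_HOME/bin is on the PATH and the metastore service is running, would be:

```shell
# Start the Hive CLI non-interactively and list databases;
# at minimum the "default" database should appear
hive -e "show databases;"
```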
6.3 Open the browser and enter "http://172.25.0.10:10002/hiveserver2.jsp" to view the hiveserver2 service
At this point, the hive data warehouse is successfully installed!