Big data in practice: installing the Hive data warehouse on Linux Ubuntu 20.04.1

What is Hive?
Hive was open-sourced by Facebook to handle statistics over massive volumes of structured log data, and later became the Apache Hive open-source project.
Hive is a data warehouse architecture built on top of the Hadoop file system: HDFS provides the storage and MapReduce provides the computation. It offers a rich set of data warehouse management functions, such as ETL (extract, transform, load) tools, data storage management, and large-scale data query and analysis.
Hive also defines a SQL-like language, HiveQL, that lets users issue SQL-style operations. It can map a structured data file onto a table, provide simple query functions, and translate the SQL into MapReduce jobs.
Hive is best understood as a tool: it has no master-slave structure, so you do not need to install it on every machine in the cluster; one or a few nodes are enough.
The default metastore database is Derby; in practice it is usually replaced with a relational database such as MySQL.
First, consider where Hive sits in the Hadoop ecosystem.

Then look at the Hive architecture.
Metastore: holds the metadata.
HDFS and MapReduce: the raw data itself lives on HDFS, and when SQL statements run, the work is actually done by MapReduce internally.
Client: tasks are submitted to the Driver via JDBC or the CLI. The SQL Parser turns the query into an abstract syntax tree, which is compiled into a physical plan and handed to the execution engine (again, ultimately run as MapReduce).
Note that the metastore does not hold the table data itself; it holds the metadata, the information that describes the data.
The Driver can be broken down in more detail as follows.
This is best read together with the architecture described above.
The overall process is as follows.
When a client submits a task, Hive first reads the metastore to look up the metadata (storage location, size, and so on), then hands the query to the Driver for parsing and compilation, and finally runs the resulting job on MapReduce.
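As a concrete illustration of this flow, a one-line HiveQL query submitted from the shell is parsed, resolved against the metastore, and executed as a MapReduce job. The table logs and its dt column here are hypothetical:

```shell
# Hypothetical example: count records per day in a table named logs.
# Hive looks up the table's HDFS location in the metastore, compiles the
# query, and runs it as one or more MapReduce jobs.
hive -e "SELECT dt, COUNT(*) FROM logs GROUP BY dt;"
```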

Summary: Hive must be installed in a Hadoop environment

1 Hive 2.3.6 data warehouse installation

1.1 Install MySQL software on the Master node

sudo apt update

1.1.1 Install MySQL

Note: apt updates over the network, and sometimes the update fails.
Reason: the DNS settings for the Wi-Fi connection and for the wired connection are different.
Solution: check the DNS settings; if you connect over the wired interface, fix the wired interface's DNS, and if you connect over Wi-Fi, fix the Wi-Fi DNS.
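On Ubuntu 20.04 (which uses systemd-resolved), a quick way to inspect and override an interface's DNS looks like this; the interface name wlan0 and the resolver 8.8.8.8 are only examples:

```shell
# Show the DNS servers currently in use, per interface.
resolvectl status
# Temporarily point a specific interface at a public resolver.
sudo resolvectl dns wlan0 8.8.8.8
```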

1.1.2 Set MySQL parameters

sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
Change bind-address = 127.0.0.1 to bind-address = 0.0.0.0, and at the end of the file add:
default-storage-engine = innodb
innodb_file_per_table = on
collation-server = utf8_general_ci
character-set-server = utf8

1.1.3 Set automatic startup when booting

sudo systemctl enable mysql.service

1.1.4 Start MySQL service

sudo systemctl start mysql.service

1.1.5 Initialize MySQL

sudo mysql_secure_installation

1.1.6 Authorize MySQL user hive

sudo mysql -uroot -p123456

show databases;
use mysql;
create user 'hive'@'%' identified by '[email protected]';

grant all privileges on *.* to 'hive'@'%';

create user 'hive'@'localhost' identified by '[email protected]';

grant all privileges on *.* to 'hive'@'localhost';

alter user 'hive'@'%' require none;


1.1.7 Restart MySQL service

sudo systemctl restart mysql.service

2 Install the Hive software on the Master node

2.1 Log in to the Master node as user angel and install the Hive software

Upload the Hive installation package to the node machine via WinSCP.
sudo tar xzvf /home/angel/apache-hive-2.3.6-bin.tar.gz -C /app
sudo chown -R angel:angel /app/apache-hive-2.3.6-bin

3 Set Hive parameters on the Master node

3.1 Rename the Hive configuration file

cd /app/apache-hive-2.3.6-bin/conf
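The rename was originally shown in screenshots; as a sketch, assuming the stock template file shipped in the Hive 2.3.6 conf directory:

```shell
cd /app/apache-hive-2.3.6-bin/conf
# Copy the shipped template to the file name Hive actually reads.
cp hive-env.sh.template hive-env.sh
```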

3.2 Modify hive-env.sh file

vim.tiny hive-env.sh
add:

export HADOOP_HOME=/app/hadoop-2.8.5/
export HIVE_CONF_DIR=/app/apache-hive-2.3.6-bin/conf/
export HIVE_AUX_JARS_PATH=/app/apache-hive-2.3.6-bin/lib/

3.3 Create a new hive-site.xml file

Add content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/hive/warehouse</value>
		<description>location of default database for the warehouse</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
		<description>JDBC connect string for a JDBC metastore.</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
		<description>Driver class name for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>hive</value>
		<description>Username to use against metastore database</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>[email protected]123456</value>
		<description>password to use against metastore database</description>
	</property>
	<property>
		<name>hive.querylog.location</name>
		<value>/app/apache-hive-2.3.6-bin/logs</value>
		<description>Location of Hive run time structured log file</description>
	</property>
	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://master:9083</value>
		<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
	</property>
	<property>
		<name>hive.server2.webui.host</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hive.server2.webui.port</name>
		<value>10002</value>
	</property>
	<property>
		<name>hive.metastore.schema.verification</name>
		<value>false</value>
	</property>
</configuration>

3.4 Upload the "mysql-connector-java-5.1.48.jar" connection package to the /app/apache-hive-2.3.6-bin/lib directory


3.5 Modify environment variables


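The edits were originally shown in screenshots; assuming the Master adds the same variables the client host adds in section 5.3, the lines appended to ~/.profile would be:

```shell
export HIVE_HOME=/app/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin
```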

3.6 Environment variables take effect

source .profile

4 Start the Hive services on the Master node

4.1 Create database hive and import hive-schema

cd /app/apache-hive-2.3.6-bin/scripts/metastore/upgrade/mysql
mysql -hmaster -uhive [email protected]
source hive-schema-2.3.0.mysql.sql;
Press Ctrl+D to exit.
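As a quick sanity check that the schema import succeeded, you can list the metastore tables (the -p flag prompts for the hive user's password):

```shell
# Expect metastore tables such as DBS, TBLS and SDS in the output.
mysql -hmaster -uhive -p -e 'USE hive; SHOW TABLES;'
```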

4.2 Start hive metastore service

hive --service metastore &
Press Enter, then type hive to enter the interactive shell.

4.3 Start hiveserver2 service

hiveserver2 &
After starting the service, run jps to check the background processes.
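Both services run as RunJar processes. As a sketch, assuming the ports configured above (9083 for the metastore, 10002 for the hiveserver2 web UI), you can also confirm they are listening:

```shell
jps                                # metastore and hiveserver2 each appear as RunJar
ss -ltn | grep -E ':(9083|10002)'  # thrift metastore port and the HS2 web UI port
```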

5 Install the Hive software on the client host

5.1 Log in to the client host as user angel and install the Hive software; the client host can be any slave node machine or a desktop host

sudo scp -r [email protected]:/app/apache-hive-2.3.6-bin /app/apache-hive-2.3.6-bin
Enter the local angel user's password (for sudo), then the password of the remote user on master.
sudo chown -R angel:angel /app/apache-hive-2.3.6-bin

5.2 Modify hive-site.xml file

cd /app/apache-hive-2.3.6-bin/conf/

vim.tiny hive-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/hive/warehouse</value>
		<description>location of default database for the warehouse</description>
	</property> 
	<property>
		<name>hive.querylog.location</name>
		<value>/app/apache-hive-2.3.6-bin/logs</value>
		<description>Location of Hive run time structured log file</description>
	</property> 
	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://master:9083</value>
		<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
	</property> 
	<property>
		<name>hive.metastore.schema.verification</name>
		<value>false</value>
	</property>
</configuration>

5.3 Modify environment variables


Add content:

export HIVE_HOME=/app/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin

Activate the environment variables:
source .profile

6 Test Hive

6.1 Test Hive from the client host


6.2 View database


6.3 Open the browser and enter "http://172.25.0.10:10002/hiveserver2.jsp" to view the hiveserver2 service

At this point, the hive data warehouse is successfully installed!

Origin blog.csdn.net/qq_45059457/article/details/110365831