Note: this article assumes the Hadoop standalone environment has already been deployed and MySQL has been installed. Click the links below to view those steps. (My earlier notes are admittedly a bit messy...)
Hadoop stand-alone environment deployment
mysql install (version 5.7)
Contents:
1. Preparation
2. Introduction to hive
3. Hive installation
4. Getting started with hive
5. Using MySQL to store hive metadata
Preparation
1. Start the Hadoop-related processes: HDFS, YARN, and the history server.
2. Start the MySQL service
service mysqld start
Introduction to hive
Hive is a data warehouse tool built on Hadoop. It maps structured data files onto database tables and provides simple SQL query functionality, converting SQL statements into MapReduce jobs for execution. Its advantage is a low learning curve: simple MapReduce-style statistics can be produced quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.
Official website: hive.apache.org. Part of the configuration below follows the official website's examples.
hive installation
1. Upload, unzip and rename
Upload the Hive tarball to the server with an upload tool.
After extracting it, rename the extracted directory to hive1.2 with the mv command.
2. Create the warehouse directories in HDFS and grant write permission. From the hadoop-2.7.3 installation directory, execute the following commands:
bin/hdfs dfs -mkdir /tmp
bin/hdfs dfs -mkdir -p /user/hive/warehouse
bin/hdfs dfs -chmod g+w /tmp
bin/hdfs dfs -chmod g+w /user/hive/warehouse
3. Configuration file
In the conf folder of the Hive installation directory, copy hive-env.sh.template and rename the copy to hive-env.sh.
Edit hive-env.sh to configure the Hadoop installation path and the Hive configuration directory.
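A minimal sketch of what hive-env.sh ends up containing; the two paths below follow this article's directory layout and should be adjusted to your own installation:

```shell
# hive-env.sh -- minimal sketch; both paths are assumptions
# based on this article's layout (adjust to your machine).
HADOOP_HOME=/opt/modules/hadoop-2.7.3
export HIVE_CONF_DIR=/opt/modules/hive1.2/conf
```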
4. Configure global environment variables
vim /etc/profile
Add the following configuration at the end:
# HIVE HOME
export HIVE_HOME=/opt/modules/hive1.2
export PATH=${PATH}:${HIVE_HOME}/bin:${HIVE_HOME}/conf
Make the configuration take effect:
source /etc/profile
5. Start Hive
bin/hive
First pitfall... a series of errors indicated that the metadata database metastore_db could not be created.
It turned out that the files in the Hive installation directory were all owned by root.
Change the owner and group to the huadian user:
chown -R huadian:huadian /opt/modules/hive1.2
If the hive> prompt appears, the startup succeeded.
Getting started with hive
1. Create a table
Create the database:
create database db_hive;
Then create the table:
create table db_hive.tb_word(
    id INT,
    word STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
2. Import data
First create a test file in the /opt/datas directory and edit it.
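For example, a tab-separated file matching tb_word's schema (id, then word) could be created like this; the sample contents are made up for illustration:

```shell
# Create a small tab-separated sample file matching tb_word's
# schema (id \t word). The words here are made-up sample data.
mkdir -p /opt/datas
printf '1\thadoop\n2\thive\n3\thadoop\n' > /opt/datas/word.data
cat /opt/datas/word.data
```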
Finally import the data
LOAD DATA LOCAL INPATH '/opt/datas/word.data' INTO TABLE db_hive.tb_word;
3. Implement the business logic (write SQL)
Use hive to count word occurrences
select word, count(word) from db_hive.tb_word GROUP BY word;
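To see the most frequent words first, the same query can be extended with an alias and an ORDER BY (a small sketch building on the statement above):

```sql
-- Count each word and list the most frequent first.
SELECT word, COUNT(word) AS cnt
FROM db_hive.tb_word
GROUP BY word
ORDER BY cnt DESC;
```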
Storing metadata with MySQL
By default, Hive uses the embedded Derby database for its metadata, which supports only one session at a time. In practice the metastore is usually kept in MySQL, which allows multiple sessions to use Hive simultaneously.
1. Modify the Hive configuration: copy the template in the conf folder and rename the copy to hive-site.xml
Specific configuration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://bigdata-hpsk01.huadian.com/metaStore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
</configuration>
2. Copy the MySQL JDBC driver jar into the lib folder of the Hive installation directory.
3. Re-enter Hive.
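To confirm the switch worked, check in the MySQL client that the metastore database was created; the database name metaStore comes from the ConnectionURL configured above:

```sql
-- Run in the mysql client after starting Hive once.
SHOW DATABASES;   -- metaStore should now appear
USE metaStore;
SHOW TABLES;      -- Hive's metadata tables, e.g. DBS and TBLS
```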