Hadoop+Hive environment deployment

Note: This article assumes the Hadoop single-node environment is already deployed and MySQL is installed. See the links below. (I feel my earlier notes are a bit messy...)

Hadoop stand-alone environment deployment

MySQL installation (version 5.7)

Contents:

1. Preparations

2. Introduction to hive

3. Hive installation

4. Getting started with hive

5. Using MySQL to store hive metadata

 

Preparations

1. Start the Hadoop-related processes: HDFS, YARN, and the history server.

2. Start the MySQL service:

service mysqld start

  

Introduction to hive

Hive is a data-warehouse tool built on Hadoop. It maps structured data files to database tables and provides simple SQL query capabilities, translating SQL statements into MapReduce jobs for execution. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.

Official website: hive.apache.org — part of the configuration below follows the official examples.

 

hive installation

1. Upload, unpack, and rename

Upload the hive tarball to the server with a file-transfer tool.

After unpacking, rename the directory to hive1.2 with the mv command.
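As a sketch of that step: the tarball name (apache-hive-1.2.1-bin.tar.gz) and the /opt/modules target directory are assumptions, so substitute your own; a dummy tarball is fabricated in a temp directory here so the commands can be seen end to end.

```shell
# Unpack-and-rename sketch. The tarball name and target directory are
# assumptions; a fabricated tarball in a temp dir stands in for the upload.
MODULES=$(mktemp -d)                      # stands in for /opt/modules
cd "$MODULES"
mkdir -p apache-hive-1.2.1-bin/conf       # fake the extracted layout
tar -czf apache-hive-1.2.1-bin.tar.gz apache-hive-1.2.1-bin
rm -r apache-hive-1.2.1-bin

tar -zxf apache-hive-1.2.1-bin.tar.gz     # 1) unpack the tarball
mv apache-hive-1.2.1-bin hive1.2          # 2) rename the directory to hive1.2
ls -d "$MODULES/hive1.2"
```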

2. Create the warehouse directories in HDFS and grant group write permission. From the hadoop-2.7.3 installation directory, run:

bin/hdfs dfs -mkdir -p /tmp
bin/hdfs dfs -mkdir -p /user/hive/warehouse
bin/hdfs dfs -chmod g+w /tmp
bin/hdfs dfs -chmod g+w /user/hive/warehouse

3. Configuration file

In the conf folder of the hive installation directory, copy hive-env.sh.template and rename the copy hive-env.sh.

Edit hive-env.sh and set the Hadoop installation path (HADOOP_HOME) and the hive configuration directory (HIVE_CONF_DIR).
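A sketch of that edit, assuming the /opt/modules paths used elsewhere in this article; a temp directory stands in for hive1.2/conf so the snippet is self-contained.

```shell
# hive-env.sh sketch. The HADOOP_HOME and HIVE_CONF_DIR values assume this
# article's /opt/modules layout; a temp dir stands in for hive1.2/conf.
CONF=$(mktemp -d)
: > "$CONF/hive-env.sh.template"          # stand-in for the shipped template

cp "$CONF/hive-env.sh.template" "$CONF/hive-env.sh"
cat >> "$CONF/hive-env.sh" <<'EOF'
# Hadoop installation path
HADOOP_HOME=/opt/modules/hadoop-2.7.3
# Hive configuration directory
export HIVE_CONF_DIR=/opt/modules/hive1.2/conf
EOF
cat "$CONF/hive-env.sh"
```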

4. Configure global environment variables

vim /etc/profile

Add the following configuration at the end:

# HIVE HOME
export HIVE_HOME=/opt/modules/hive1.2
export PATH=${PATH}:${HIVE_HOME}/bin:${HIVE_HOME}/conf

Make the configuration take effect:

source /etc/profile

5. Start hive: bin/hive

Hit a pitfall here... a series of errors said the metadata database metastore_db could not be created.

It turned out that the files in the hive installation directory were all owned by root.

Change the owner and group to the huadian user:

chown -R huadian:huadian /opt/modules/hive1.2

After the change, bin/hive starts successfully.

 

Getting started with hive

1. Create a table

First create a database:

CREATE DATABASE db_hive;

Then create the table:

create table db_hive.tb_word(
id  INT,
word  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY  '\t'
STORED AS TEXTFILE;

2. Import data

First create a test file in the datas directory and edit it.

Then load the data:

LOAD DATA LOCAL INPATH '/opt/datas/word.data' INTO TABLE db_hive.tb_word;
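For reference, the test file could be created like this. The sample rows are made up; fields are tab-separated to match FIELDS TERMINATED BY '\t' in the table DDL, and a temp directory stands in for /opt/datas.

```shell
# Sample word.data for the LOAD step; the rows are made up for illustration.
# Fields are tab-separated to match the table's '\t' field delimiter.
DATAS=$(mktemp -d)                        # stands in for /opt/datas
printf '1\thadoop\n2\thive\n3\thadoop\n4\thive\n5\thadoop\n' > "$DATAS/word.data"
cat "$DATAS/word.data"
```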

 

3. Implement the business logic (write SQL)

Use hive to count how many times each word occurs:

SELECT
word, COUNT(word)
FROM
db_hive.tb_word
GROUP BY
word;

 

Storing metadata with Mysql

By default hive stores its metadata in an embedded Derby database, which supports only one session at a time. In practice the metastore is usually kept in MySQL, which allows multiple sessions to use hive simultaneously.

1. Modify the hive configuration: copy the template file and rename the copy hive-site.xml.

Configuration:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://bigdata-hpsk01.huadian.com/metaStore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>

2. Copy the MySQL JDBC driver jar into the lib folder of the hive installation directory.
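That copy is a single command. The jar name (mysql-connector-java-5.1.27-bin.jar) and both paths below are assumptions; temp directories and an empty stand-in jar keep the sketch self-contained.

```shell
# Copy the MySQL JDBC driver into hive's lib directory. Jar name and paths
# are assumptions; temp dirs stand in for the real locations.
JARDIR=$(mktemp -d)                       # where the driver jar was uploaded
LIB=$(mktemp -d)                          # stands in for /opt/modules/hive1.2/lib
: > "$JARDIR/mysql-connector-java-5.1.27-bin.jar"   # stand-in for the real jar

cp "$JARDIR/mysql-connector-java-5.1.27-bin.jar" "$LIB/"
ls "$LIB"
```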

3. Re-enter hive

 
