News Analysis System: Real-Time Integrated Data Analysis with Hive and HBase

(I) Hive Overview

 

(II) Hive's Position in the Hadoop Ecosystem

 

(III) Hive Architecture

 

 

(IV) Hive Application Scenarios and Advantages

 

(V) Hive Download, Installation, and Deployment

1. Download Hive

Hive comes in two common distributions: the Apache version and the Cloudera (CDH) version.

Here we download the stable Apache release apache-hive-0.13.1-bin.tar.gz and upload it to the /opt/softwares/ directory on the bigdata-pro03.kfk.com node.
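A minimal sketch of the download and upload, assuming the Apache archive URL and a kfk user with SSH access to the node (both are assumptions):

# Download the release from the Apache archive (URL assumed)
wget https://archive.apache.org/dist/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz

# Upload it to the target node (user "kfk" is assumed)
scp apache-hive-0.13.1-bin.tar.gz kfk@bigdata-pro03.kfk.com:/opt/softwares/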

2. Extract and install Hive

tar -zxf apache-hive-0.13.1-bin.tar.gz -C /opt/modules/
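The archive extracts to a directory named apache-hive-0.13.1-bin; the rest of this guide refers to it as hive-0.13.1-bin, so rename it to keep the paths consistent:

mv /opt/modules/apache-hive-0.13.1-bin /opt/modules/hive-0.13.1-bin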

3. Modify the hive-log4j.properties configuration file

cd /opt/modules/hive-0.13.1-bin/conf

mv hive-log4j.properties.template hive-log4j.properties

vi hive-log4j.properties

# Log directory needs to be created in advance

hive.log.dir=/opt/modules/hive-0.13.1-bin/logs
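As the comment above notes, the log directory must exist before Hive starts, so create it first:

mkdir -p /opt/modules/hive-0.13.1-bin/logs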

4. Modify the hive-env.sh configuration file

mv hive-env.sh.template hive-env.sh

vi hive-env.sh

export HADOOP_HOME=/opt/modules/hadoop-2.5.0

export HIVE_CONF_DIR=/opt/modules/hive-0.13.1-bin/conf

5. Start HDFS first, then create the Hive warehouse directory

bin/hdfs dfs -mkdir -p /user/hive/warehouse

bin/hdfs dfs -chmod g+w /user/hive/warehouse
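A quick check that the warehouse directory was created with the group-write permission:

bin/hdfs dfs -ls /user/hive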

6. Start Hive

./hive

# View databases

show databases;

# Use the default database

use default;

# View tables

show tables;

(VI) Hive Integration with MySQL

1. Create a hive-site.xml file in the /opt/modules/hive-0.13.1-bin/conf directory to configure MySQL as the metastore (metadata) database.

vi hive-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://bigdata-pro01.kfk.com/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
</configuration>

2. Configure MySQL user connections

1) View User Information

mysql -uroot -p123456

show databases;

use mysql;

show tables;

select User,Host,Password from user;

2) Update user information

update user set Host='%' where User='root' and Host='localhost';

3) Delete User Information

delete from user where user='root' and host='127.0.0.1';

select User,Host,Password from user;

delete from user where host='localhost';

4) Flush privileges

flush privileges;

3. Copy the MySQL JDBC driver package to Hive's lib directory

cp mysql-connector-java-5.1.27.jar /opt/modules/hive-0.13.1-bin/lib/
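After Hive is started once with this configuration, you can confirm that the metastore tables were created in MySQL (the database name metastore comes from the JDBC URL configured above):

mysql -uroot -p123456 metastore -e "show tables;"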

4. Ensure the third node can log in to the other cluster nodes via passwordless SSH.
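A minimal sketch of setting up passwordless login from bigdata-pro03.kfk.com, assuming a kfk user on every node (the user name is an assumption):

# Generate an RSA key pair (accept the default prompts)
ssh-keygen -t rsa

# Copy the public key to the other two nodes (user "kfk" is assumed)
ssh-copy-id kfk@bigdata-pro01.kfk.com
ssh-copy-id kfk@bigdata-pro02.kfk.com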

(VII) Starting and Testing the Hive Service

1. Start the HDFS and YARN services

2. Start Hive

./hive

3. Create a test table through the Hive CLI

CREATE TABLE stu(id INT,name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;

4. Create a data file (fields must be tab-separated to match the table definition)

vi /opt/datas/stu.txt

00001 zhangsan

00002 lisi

00003 wangwu

00004 zhaoliu

5. Load the data into the Hive table

load data local inpath '/opt/datas/stu.txt' into table stu;
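To verify the load, query the table from the Hive CLI:

# Should return the four rows from stu.txt
select * from stu;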

(VIII) Hive Integration with HBase

1. Configure the ZooKeeper quorum in the hive-site.xml file; Hive uses this parameter to connect to the HBase cluster.

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>bigdata-pro01.kfk.com,bigdata-pro02.kfk.com,bigdata-pro03.kfk.com</value>
</property>

2. Link the nine HBase jar packages into Hive's lib directory (a loop version is sketched after the list of links). With the CDH distribution this step is unnecessary, since the integration packages are already bundled.

export HBASE_HOME=/opt/modules/hbase-0.98.6-cdh5.3.0

export HIVE_HOME=/opt/modules/hive-0.13.1-bin

ln -s $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar

 

ln -s $HBASE_HOME/lib/hbase-client-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-client-0.98.6-cdh5.3.0.jar

 

ln -s $HBASE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.0.jar

 

ln -s $HBASE_HOME/lib/hbase-it-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-it-0.98.6-cdh5.3.0.jar

 

ln -s $HBASE_HOME/lib/htrace-core-2.04.jar $HIVE_HOME/lib/htrace-core-2.04.jar

 

ln -s $HBASE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar

 

ln -s $HBASE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.0.jar

 

ln -s $HBASE_HOME/lib/high-scale-lib-1.1.1.jar $HIVE_HOME/lib/high-scale-lib-1.1.1.jar

 

ln -s $HBASE_HOME/lib/hbase-common-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-common-0.98.6-cdh5.3.0.jar
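Equivalently, all nine links can be created in a single loop (a sketch, assuming the same HBASE_HOME/HIVE_HOME exports and jar versions as above):

# Symlink each HBase jar into Hive's lib directory
for jar in hbase-server-0.98.6-cdh5.3.0.jar hbase-client-0.98.6-cdh5.3.0.jar \
           hbase-protocol-0.98.6-cdh5.3.0.jar hbase-it-0.98.6-cdh5.3.0.jar \
           htrace-core-2.04.jar hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar \
           hbase-hadoop-compat-0.98.6-cdh5.3.0.jar high-scale-lib-1.1.1.jar \
           hbase-common-0.98.6-cdh5.3.0.jar; do
    ln -s $HBASE_HOME/lib/$jar $HIVE_HOME/lib/$jar
done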

3. Create a Hive external table mapped to the HBase table

create external table weblogs(
    id string, datatime string, userid string, searchname string,
    retorder string, cliorder string, cliurl string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:datatime,info:userid,info:searchname,info:retorder,info:cliorder,info:cliurl")
TBLPROPERTIES ("hbase.table.name" = "weblogs");

# Count the records in the HBase-backed table

select count(*) from weblogs;
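As a cross-check, the HBase shell should report the same row count for the underlying table (run from the HBase installation directory):

# Open the HBase shell
bin/hbase shell
# Inside the shell, count the rows of the mapped table
count 'weblogs'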

4. Using HiveServer2 and Beeline

1) Start hiveserver2

bin/hiveserver2
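bin/hiveserver2 runs in the foreground; to keep the service alive after the terminal closes, it can be launched in the background instead (a sketch, reusing the logs directory created earlier):

nohup bin/hiveserver2 > /opt/modules/hive-0.13.1-bin/logs/hiveserver2.log 2>&1 &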

2) Start beeline

bin/beeline

# Connect to the HiveServer2 service (Beeline prompts for a user name and password)

!connect jdbc:hive2://bigdata-pro03.kfk.com:10000

# View tables

show tables;

# View the first 10 rows

select * from weblogs limit 10;
