First, what is Hive?
1. Hive is a translator: SQL ---> Hive engine ---> MR program
2. Hive is a data warehouse (Data Warehouse) built on top of HDFS; Hive concepts map to HDFS like this:
	Hive			HDFS
	table			directory
	partition		directory
	data			file
	bucket			file
3. Hive supports SQL (a subset of the SQL-99 standard)
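As a sketch of the first point: a query like the one below (against a hypothetical emp table; one like it is created in the data model section) is compiled by the Hive engine into a MapReduce job.

```sql
-- Hypothetical example: Hive compiles this into one MapReduce job.
-- Map phase: scan and project rows; Reduce phase: aggregate per deptno.
SELECT deptno, avg(sal) AS avg_sal
FROM   emp
GROUP  BY deptno;
```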
Second, the architecture of Hive (diagram)
Third, installation and configuration
Extract the archive into the ~/training/ directory:
tar -zxvf apache-hive-2.3.0-bin.tar.gz -C ~/training/
Set the environment variables:
HIVE_HOME=/root/training/apache-hive-2.3.0-bin
export HIVE_HOME
PATH=$HIVE_HOME/bin:$PATH
export PATH
Core configuration file: conf/hive-site.xml
1. Embedded mode
	(*) Does not need MySQL; uses Hive's built-in Derby database
	(*) Limitation: only one connection at a time
	Settings in hive-site.xml:
		javax.jdo.option.ConnectionURL        = jdbc:derby:;databaseName=metastore_db;create=true
		javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.EmbeddedDriver
		hive.metastore.local                  = true
		hive.metastore.warehouse.dir          = file:///root/training/apache-hive-2.3.0-bin/warehouse
Initialize the Derby database:
	schematool -dbType derby -initSchema
Note the log message:
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
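Per that warning, the execution engine can be switched away from MR — a sketch, assuming Tez (or Spark) is actually installed on the cluster; plain MR remains the default in Hive 2.3:

```sql
-- Switch the execution engine for the current session (assumes Tez is installed):
set hive.execution.engine=tez;
-- or make it permanent by setting the same property in hive-site.xml
```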
2. Local mode and remote mode: require MySQL
	(*) MySQL client: MySQL-Front, http://www.mysqlfront.de/
Hive installation
(1) Install MySQL on the virtual machine (remove the old libraries first, then install in dependency order):
	yum remove mysql-libs
	rpm -ivh mysql-community-common-5.7.19-1.el7.x86_64.rpm
	rpm -ivh mysql-community-libs-5.7.19-1.el7.x86_64.rpm
	rpm -ivh mysql-community-client-5.7.19-1.el7.x86_64.rpm
	rpm -ivh mysql-community-server-5.7.19-1.el7.x86_64.rpm
	rpm -ivh mysql-community-devel-5.7.19-1.el7.x86_64.rpm   (optional)
(2) Start MySQL: service mysqld start, or: systemctl start mysqld.service
Find the initial root password: cat /var/log/mysqld.log | grep password
Log in and change the password: alter user 'root'@'localhost' identified by 'Sjm_123456';
MySQL database configuration:
Create a new database: create database hive;
Create a new user:
create user 'hiveowner'@'%' identified by 'Sjm_123456';
Grant privileges to the user:
	grant all on hive.* to 'hiveowner'@'%';
	grant all on hive.* to 'hiveowner'@'localhost' identified by 'Sjm_123456';
Remote mode
Metadata is stored in a remote MySQL database
Note: be sure to use a recent MySQL JDBC driver (version 5.1.43 or later)
Parameter file: hive-site.xml
	javax.jdo.option.ConnectionURL        = jdbc:mysql://localhost:3306/hive?useSSL=false
	javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
	javax.jdo.option.ConnectionUserName   = hiveowner
	javax.jdo.option.ConnectionPassword   = Welcome_1
Initialize the MetaStore: schematool -dbType mysql -initSchema
(*) Re-create hive-site.xml with:
	javax.jdo.option.ConnectionURL        = jdbc:mysql://localhost:3306/hive?useSSL=false
	javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
	javax.jdo.option.ConnectionUserName   = hiveowner
	javax.jdo.option.ConnectionPassword   = Sjm_123456
(*) Copy the MySQL JDBC driver jar into Hive's lib directory
	Note: be sure to use a recent driver (version 5.1.43 or later)
	Directory: /training/apache-hive-2.3.0-bin/lib
(*) Initialize the MySQL metastore
	(*) Old versions: initialized automatically the first time Hive starts
	(*) New versions: run it manually:
schematool -dbType mysql -initSchema
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
Four, the Hive data model (the most important part)
Note: the default column delimiter is the tab character
Test data: an employee table and a department table. A sample row of emp.csv:
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
First, look at Hive's directory structure on HDFS:
create database hive;
1. Internal table: equivalent to a MySQL table; corresponds to an HDFS directory under /user/hive/warehouse
create table emp
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int);
Insert data with insert or load statements:
	load data inpath '/scott/emp.csv' into table emp;
		-- imports data from HDFS; the file is moved into the Hive table (like cut, Ctrl+X)
	load data local inpath '/root/temp/*****' into table emp;
		-- imports data from the local Linux filesystem; the file is copied into the Hive table (like copy, Ctrl+C)
To import delimited data, the table must be created with the delimiter specified:
create table emp1
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int)
row format delimited fields terminated by ',';
Create the department table and import data:
create table dept
(deptno int,
dname string,
loc string)
row format delimited fields terminated by ',';
2. Partition table: can improve query efficiency ----> verify by viewing the SQL execution plan
Create a partition table of employees, partitioned by department number:
create table emp_part
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int)
partitioned by (deptno int)
row format delimited fields terminated by ',';
Import data into specific partitions (via subqueries) ----> runs as MapReduce programs:
insert into table emp_part partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=10;
insert into table emp_part partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=20;
insert into table emp_part partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=30;
Hive silent mode: hive -S. Advantage: the console does not print log messages, keeping the screen clean.
How to view a SQL execution plan? Use the explain keyword.
1) Execution plan of a query on an ordinary (internal) table:
explain select * from emp_1 where deptno=10;
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: emp_1
Statistics: Num rows: 1 Data size: 619 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (deptno = 10) (type: boolean)
Statistics: Num rows: 1 Data size: 619 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: int), comm (type: int), 10 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 1 Data size: 619 Basic stats: COMPLETE Column stats: NONE
ListSink
2) Execution plan of a query on the partition table:
explain select * from emp_part where deptno=10;
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: emp_part
Statistics: Num rows: 3 Data size: 121 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: int), comm (type: int), 10 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 3 Data size: 121 Basic stats: COMPLETE Column stats: NONE
ListSink
How should an execution plan be read?
Remember one principle: read from the bottom up, and from right to left.
3. External table: essentially a "shortcut" to an existing file or directory on HDFS
create external table t1
(sid int,sname string,age int)
row format delimited fields terminated by ','
location '/students';
Note: for an external table, when the table is dropped, the data is not deleted.
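A quick sketch of that behavior (hypothetical, assuming the t1 table above was created over /students):

```sql
-- Dropping an external table removes only the metadata:
drop table t1;
-- The files are still on HDFS afterwards; from the shell,
--   hdfs dfs -ls /students
-- still lists the original data files.
```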
4. Bucket table: data is stored as files, distributed by a hash of the bucketing column. The difference from a partition: a partition is a directory, a bucket is a file.
	(*) Hash partitioning
	(*) Bucket table example:
create table emp_bucket
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int)
clustered by (job) into 4 buckets
row format delimited fields terminated by ',';
Note: before inserting data into a bucket table, a setting must be enabled; otherwise the data is inserted but not distributed into buckets.
Log in to hive (hive -S), then execute:
set hive.enforce.bucketing = true;
Insert data via a subquery:
insert into emp_bucket select * from emp_1;
This statement is converted into an MR program for execution.
When it finishes, look at the bucket table's directory structure on HDFS:
the data is stored in four different bucket files; the contents of any one file can be viewed with:
hdfs dfs -cat /user/hive/warehouse/hive02.db/emp_bucket/000000_0
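One practical benefit of bucketing is efficient sampling. A sketch, using Hive's TABLESAMPLE clause on the emp_bucket table above:

```sql
-- Reads only one of the four bucket files instead of scanning the whole table:
select * from emp_bucket tablesample(bucket 1 out of 4 on job);
```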
5. View: a virtual table
	(1) A view stores no data itself; it depends on a table, called the base table
	(2) A view is operated on in the same way as a table
	(3) Can a view improve query efficiency?
		No. The purpose of a view is to simplify complex queries.
(4) Example: query employee information, showing the department name and the employee name
create view myview
as
select dept.dname,emp1.ename
from emp1,dept
where emp1.deptno=dept.deptno;
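Once created, the view is queried like any table; Hive expands it into the underlying join at compile time (the 'SALES' filter below is only an illustrative assumption):

```sql
select * from myview where dname = 'SALES';
```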
Some operations:
Hive tables
-------------------
1.managed table
	Managed (internal) table.
	When the table is dropped, the data is deleted too.
2.external table
	External table.
	When the table is dropped, the data is not deleted.
hive command
---------------
// create a table; the external keyword makes it an external table
CREATE external TABLE IF NOT EXISTS t2(id int,name string,age int)
COMMENT 'xx' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE ;
// view the table schema
desc t2;
desc formatted t2 ;
// load data into a hive table
load data local inpath '/home/centos/customers.txt' into table t2;        // copies the local file
load data inpath '/user/centos/customers.txt' [overwrite] into table t2;  // moves the HDFS file
// copy a table
mysql> create table tt as select * from users;   // copies both the data and the table structure
mysql> create table tt like users;               // no data, table structure only
hive>create table tt as select * from users ;
hive>create table tt like users ;
// count() queries are converted into MR jobs
$hive>select count(*) from t2 ;
$hive>select id,name from t2 ;
$hive>select * from t2 order by id desc ; //MR
// enable / disable dropping of a table
ALTER TABLE t2 ENABLE NO_DROP;    // the table cannot be dropped
ALTER TABLE t2 DISABLE NO_DROP;   // the table can be dropped
// partition table: an optimization that narrows the data searched at the directory level
// create a partition table:
CREATE TABLE t3(id int,name string,age int) PARTITIONED BY (Year INT, Month INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
// show the table's partitions
SHOW PARTITIONS t3;
// add a partition (creates a directory)
alter table t3 add partition (year=2014, month=12);
// drop a partition
ALTER TABLE t3 DROP IF EXISTS PARTITION (year=2014, month=11);
// partition directory structure on HDFS
	/user/hive/warehouse/mydb2.db/t3/year=2014/month=11
	/user/hive/warehouse/mydb2.db/t3/year=2014/month=12
// loading data into a partitioned table
load data local inpath '/home/centos/customers.txt' into table t3 partition(year=2014,month=11);
// create a bucket table
CREATE TABLE t4(id int,name string,age int) CLUSTERED BY (id) INTO 3 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
// loading data with load does not perform the bucketing
load data local inpath '/home/centos/customers.txt' into table t4 ;
// insert data queried from t3 into t4 (this does distribute it into buckets)
insert into t4 select id,name,age from t3 ;
// How should the number of buckets be set?
// Estimate the data volume so that each bucket holds about twice the size of an HDFS data block.
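As a worked example of that rule of thumb (illustrative assumption: ~1024 MB of table data and the default 128 MB HDFS block size):

```sql
-- buckets ≈ table_size / (2 × block_size)
--         = 1024 MB / (2 × 128 MB)
--         = 4 buckets
CREATE TABLE t5(id int,name string,age int) CLUSTERED BY (id) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
```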
// join query
CREATE TABLE customers(id int,name string,age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
CREATE TABLE orders(id int,orderno string,price float,cid int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
// load data into tables
// inner join query
select a.*,b.* from customers a , orders b where a.id = b.cid ;
// left / right / full outer joins
select a.*,b.* from customers a left outer join orders b on a.id = b.cid ;
select a.*,b.* from customers a right outer join orders b on a.id = b.cid ;
select a.*,b.* from customers a full outer join orders b on a.id = b.cid ;
// explode: a table-generating function that expands an array into rows
// word count implemented with hive
// 1. create the table
CREATE TABLE doc(line string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
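The word count itself can then be sketched with split() and explode() (assuming words within each line are separated by spaces; the doc table above stores one line of text per row):

```sql
select word, count(*) as cnt
from (select explode(split(line, ' ')) as word from doc) w
group by word
order by cnt desc;
```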
Five, Hive queries
Essentially just SQL: select ---> MapReduce