An Introduction to Hive (in Five Parts)

  First, what is Hive?

  1. Hive is a translator: SQL ---> Hive engine ---> MapReduce (MR) program

  2. Hive is a data warehouse (Data Warehouse) built on HDFS; its concepts map to HDFS storage as follows:

  Hive            HDFS
  table     --->  directory
  partition --->  directory
  data      --->  file
  bucket    --->  file

  3. Hive supports SQL (a subset of the SQL-99 standard)

  Second, the architecture of Hive (diagram)

  Third, installation and configuration

  Extract the installation package to the ~/training/ directory:

  tar -zxvf apache-hive-2.3.0-bin.tar.gz -C ~/training/

  Set Environment Variables

  HIVE_HOME=/root/training/apache-hive-2.3.0-bin

  export HIVE_HOME

  PATH=$HIVE_HOME/bin:$PATH

  export PATH

  Core configuration file: conf/hive-site.xml

  1. Embedded mode

  (*) Does not need MySQL; uses Hive's built-in Derby database

  (*) Limitation: only one connection at a time

  <configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    </property>
    <property>
      <name>hive.metastore.local</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>file:///root/training/apache-hive-2.3.0-bin/warehouse</value>
    </property>
  </configuration>

  Initialize the Derby database:

  schematool -dbType derby -initSchema

  Log:

  Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

  2. Local mode and remote mode: both require MySQL

  (*) MySQL client: MySQL-Front, http://www.mysqlfront.de/

  Hive installation

  (1) install MySQL on a virtual machine:

  First remove the conflicting built-in MySQL libraries, then install the packages in dependency order:

  yum remove mysql-libs

  rpm -ivh mysql-community-common-5.7.19-1.el7.x86_64.rpm

  rpm -ivh mysql-community-libs-5.7.19-1.el7.x86_64.rpm

  rpm -ivh mysql-community-client-5.7.19-1.el7.x86_64.rpm

  rpm -ivh mysql-community-server-5.7.19-1.el7.x86_64.rpm

  rpm -ivh mysql-community-devel-5.7.19-1.el7.x86_64.rpm (optional)

  (2) Start MySQL: service mysqld start, or: systemctl start mysqld.service

  Find the root user's initial password: cat /var/log/mysqld.log | grep password

  After logging in, change the password: alter user 'root'@'localhost' identified by 'Sjm_123456';

  MySQL database configurations:

  Create a new database: create database hive;

  Create a new user:

  create user 'hiveowner'@'%' identified by 'Sjm_123456';

  Grant privileges to the user:

  grant all on hive.* TO 'hiveowner'@'%';

  grant all on hive.* TO 'hiveowner'@'localhost' identified by 'Sjm_123456';

  Remote mode

  Metadata is stored in a remote MySQL database.

  Note: be sure to use a recent MySQL JDBC driver (version 5.1.43 or above).

  Parameter file: hive-site.xml

  Configuration parameter                  Reference value
  ---------------------------------------  ---------------------------------------------
  javax.jdo.option.ConnectionURL           jdbc:mysql://localhost:3306/hive?useSSL=false
  javax.jdo.option.ConnectionDriverName    com.mysql.jdbc.Driver
  javax.jdo.option.ConnectionUserName      hiveowner
  javax.jdo.option.ConnectionPassword      Welcome_1

  Initialize the MetaStore: schematool -dbType mysql -initSchema

  (*) Recreate hive-site.xml:

  <configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/hive?useSSL=false</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hiveowner</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>Sjm_123456</value>
    </property>
  </configuration>

  (*) Copy the MySQL JDBC driver jar into Hive's lib directory (upload the MySQL driver package)

  Note: be sure to use a recent MySQL driver version (5.1.43 or above)

  Directory: /training/apache-hive-2.3.0-bin/lib

  (*) Initialize MySQL

  (*) Old versions: initialized automatically the first time Hive starts

  (*) New versions: run the schematool manually:

  schematool -dbType mysql -initSchema

  Starting metastore schema initialization to 2.3.0

  Initialization script hive-schema-2.3.0.mysql.sql

  Initialization script completed

  schemaTool completed

  Fourth, the Hive data model (the most important part)

  Note: the default column delimiter is the tab character

  Test data: an employee table and a department table; a sample employee row:

  7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30

  First, look at Hive's directory structure on HDFS:

  create database hive;

  1. Internal tables: equivalent to a table in MySQL; each corresponds to a directory on HDFS under /user/hive/warehouse

  create table emp

  (empno int,

  ename string,

  job string,

  mgr int,

  hiredate string,

  sal int,

  comm int,

  deptno int);

  Insert data with the insert or load statement:

  load data inpath '/scott/emp.csv' into table emp; imports data from HDFS (the file is moved from its HDFS directory into the Hive table, like Ctrl+X)

  load data local inpath '/root/temp/*****' into table emp; imports data from the local Linux filesystem (the file is copied into the Hive table, like Ctrl+C)

  When creating the table, you must specify the delimiter:

  create table emp1

  (empno int,

  ename string,

  job string,

  mgr int,

  hiredate string,

  sal int,

  comm int,

  deptno int)

  row format delimited fields terminated by ',';

  Create the department table and import its data:

  create table dept

  (deptno int,

  dname string,

  loc string)

  row format delimited fields terminated by ',';

  2. Partition tables: can improve query efficiency ----> verify by viewing the SQL execution plan

  Create a table partitioned by the employees' department number:

  create table emp_part

  (empno int,

  ename string,

  job string,

  mgr int,

  hiredate string,

  sal int,

  comm int)

  partitioned by (deptno int)

  row format delimited fields terminated by ',';

  Import data into a specified partition (via a sub-query) ----> runs as a MapReduce program

  insert into table emp_part partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=10;

  insert into table emp_part partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=20;

  insert into table emp_part partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=30;
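  The point of the three inserts above is that each partition lives in its own directory, so a query filtered on deptno only reads one directory. A minimal Python sketch of that idea (table contents and names here are made up for illustration):

```python
# Conceptual model of a partition table: each partition key maps to its own
# "directory" of rows, so a filter on the key scans only one partition.
emp_part = {
    10: [("KING", 5000), ("CLARK", 2450)],
    20: [("SMITH", 800), ("FORD", 3000)],
    30: [("MARTIN", 1250), ("ALLEN", 1600)],
}

def query_partition(deptno):
    # Partition pruning: only the matching partition is read,
    # instead of scanning every row of the table.
    return emp_part.get(deptno, [])

print(query_partition(10))
```

  A full table scan would have to touch every list; the pruned query touches exactly one.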

  Hive silent mode: hive -S. Advantage: the console does not print log messages, so the screen stays clean.

  How to view the SQL execution plan? Use the explain keyword.

  1) View the execution plan for an ordinary (internal) table:

  explain select * from emp_1 where deptno=10;

  STAGE DEPENDENCIES:

  Stage-0 is a root stage

  STAGE PLANS:

  Stage: Stage-0

  Fetch Operator

  limit: -1

  Processor Tree:

  TableScan

  alias: emp_1

  Statistics: Num rows: 1 Data size: 619 Basic stats: COMPLETE Column stats: NONE

  Filter Operator

  predicate: (deptno = 10) (type: boolean)

  Statistics: Num rows: 1 Data size: 619 Basic stats: COMPLETE Column stats: NONE

  Select Operator

  expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: int), comm (type: int), 10 (type: int)

  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7

  Statistics: Num rows: 1 Data size: 619 Basic stats: COMPLETE Column stats: NONE

  ListSink

  2) View the execution plan for a partition table:

  explain select * from emp_part where deptno=10;

  STAGE DEPENDENCIES:

  Stage-0 is a root stage

  STAGE PLANS:

  Stage: Stage-0

  Fetch Operator

  limit: -1

  Processor Tree:

  TableScan

  alias: emp_part

  Statistics: Num rows: 3 Data size: 121 Basic stats: COMPLETE Column stats: NONE

  Select Operator

  expressions: empno (type: int), ename (type: string), job (type: string), mgr (type: int), hiredate (type: string), sal (type: int), comm (type: int), 10 (type: int)

  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7

  Statistics: Num rows: 3 Data size: 121 Basic stats: COMPLETE Column stats: NONE

  ListSink

  How do you read an execution plan?

  Remember one principle: read from the bottom up, from right to left. Note that the partition table's plan has no Filter Operator: the deptno=10 predicate is resolved by partition pruning (only the deptno=10 directory is scanned), which is why partitioning improves query efficiency.

  3. External tables: essentially a "shortcut" to an existing file or directory on HDFS

  create external table t1

  (sid int,sname string,age int)

  row format delimited fields terminated by ','

  location '/students';

  Note: for an external table, when the table is deleted, the data is not deleted.

  4. Bucket tables: essentially use a hash algorithm to store data, in the form of files. The difference from a partition is that a partition is a directory, while a bucket is a file.

  (*) hash partitioning

  (*) bucket table

  create table emp_bucket

  (empno int,

  ename string,

  job string,

  mgr int,

  hiredate string,

  sal int,

  comm int,

  deptno int)

  clustered by (job) into 4 buckets

  row format delimited fields terminated by ',';

  Note: before inserting data into a bucket table, you must first set a Hive parameter; otherwise, even though the insert succeeds, Hive will not distribute the data into buckets.

  Log in to Hive: hive -S

  Then execute the following command:

  set hive.enforce.bucketing = true;


  Insert data via a sub-query:

  insert into emp_bucket select * from emp_1;

  This statement is converted into a MapReduce job for execution:

  When it finishes, look at the bucket table's directory structure on HDFS:

  The data is stored in four different bucket files; you can view the contents of any one of them:

  hdfs dfs -cat /user/hive/warehouse/hive02.db/emp_bucket/000000_0
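  The bucketing rule itself is simple: Hive hashes the clustered-by column and takes it modulo the number of buckets, and each bucket becomes one output file (000000_0 through 000003_0 here). A rough Python sketch of the idea; note that Hive's real hash for strings follows Java's String.hashCode, so this stand-in only illustrates the distribution, not Hive's exact assignment:

```python
# Sketch of how a bucket table assigns rows to files:
# bucket = hash(clustered-by column) % number_of_buckets.
NUM_BUCKETS = 4

def bucket_of(job):
    # Simple 32-bit polynomial string hash (a stand-in for Hive's own hash).
    h = 0
    for ch in job:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h % NUM_BUCKETS

for job in ["CLERK", "SALESMAN", "MANAGER", "ANALYST"]:
    print(job, "-> file 00000%d_0" % bucket_of(job))
```

  Because the hash is deterministic, all rows with the same job value always land in the same bucket file.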

  5. Views: a view is a virtual table

  (1) a view holds no data of its own; the table a view depends on is called its base table

  (2) a view is operated on in the same way as a table

  (3) can a view improve query efficiency?

  No; the purpose of a view is to simplify complex queries

  (4) For example, query employee information showing the department name and employee name:

  create view myview

  as

  select dept.dname,emp1.ename

  from emp1,dept

  where emp1.deptno=dept.deptno;

  Some operations:

  Hive tables

  -------------------

  1.managed table

  Managed (internal) table.

  When you delete the table, the data is deleted too.

  2.external table

  External table.

  When you delete the table, the data is not deleted.

  hive command

  ---------------

  // create a table; the external keyword makes it an external table

  CREATE external TABLE IF NOT EXISTS t2(id int,name string,age int)

  COMMENT 'xx' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE ;

  // view the table definition

  desc t2;

  desc formatted t2 ;

  // load data into a hive table

  load data local inpath '/home/centos/customers.txt' into table t2; // uploads (copies) the local file

  load data inpath '/user/centos/customers.txt' [overwrite] into table t2; // moves the HDFS file

  // copy table

  mysql> create table tt as select * from users; // copies both the data and the table structure

  mysql> create table tt like users; // no data, only the table structure

  hive>create table tt as select * from users ;

  hive>create table tt like users ;

  // count(*) queries are converted into MR jobs

  $hive>select count(*) from t2 ;

  $hive>select id,name from t2 ;

  $hive>select * from t2 order by id desc ; //MR

  // enable/disable drop protection for a table

  ALTER TABLE t2 ENABLE NO_DROP; // the table cannot be dropped

  ALTER TABLE t2 DISABLE NO_DROP; // the table can be dropped

  // partition table: one means of optimization; restricts the data searched at the directory level.

  // Create the partition table.

  CREATE TABLE t3(id int,name string,age int) PARTITIONED BY (Year INT, Month INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

  // show the table's partition information

  SHOW PARTITIONS t3;

  // add a partition, which creates a directory

  alter table t3 add partition (year=2014, month=12);

  // delete the partition

  ALTER TABLE t3 DROP IF EXISTS PARTITION (year=2014, month=11);

  // partition directory structure on HDFS

  /user/hive/warehouse/mydb2.db/t3/year=2014/month=11

  /user/hive/warehouse/mydb2.db/t3/year=2014/month=12

  // loading data into a partitioned table

  load data local inpath '/home/centos/customers.txt' into table t3 partition(year=2014,month=11);

  // Create a bucket list

  CREATE TABLE t4(id int,name string,age int) CLUSTERED BY (id) INTO 3 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

  // load alone does not perform the bucketing operation

  load data local inpath '/home/centos/customers.txt' into table t4 ;

  // inserting data from t3 into t4 via a query does bucket the data.

  insert into t4 select id,name,age from t3 ;

  // how should the number of buckets be set?

  // estimate the data volume, and make each bucket about twice the size of an HDFS data block.
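  As a worked example of that sizing rule (assuming the default 128 MB HDFS block size and a hypothetical 10 GB table):

```python
# Rule of thumb from above: each bucket ~ 2x the HDFS block size.
block_size_mb = 128                    # default HDFS block size (assumption)
target_bucket_mb = 2 * block_size_mb   # 256 MB per bucket
data_size_mb = 10 * 1024               # suppose the table holds 10 GB

num_buckets = data_size_mb // target_bucket_mb
print(num_buckets)  # 40
```

  So a 10 GB table would be created with roughly 40 buckets under this rule.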

  // join query

  CREATE TABLE customers(id int,name string,age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

  CREATE TABLE orders(id int,orderno string,price float,cid int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

  // load data into tables

  // inner join query

  select a.*,b.* from customers a , orders b where a.id = b.cid ;

  // left outer, right outer, and full outer joins

  select a.*,b.* from customers a left outer join orders b on a.id = b.cid ;

  select a.*,b.* from customers a right outer join orders b on a.id = b.cid ;

  select a.*,b.* from customers a full outer join orders b on a.id = b.cid ;
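  The difference between the inner and outer joins above can be checked with a tiny in-memory simulation (the sample rows are made up):

```python
# In-memory stand-ins for the two tables.
customers = [(1, "Tom"), (2, "Jerry"), (3, "Spike")]          # (id, name)
orders = [(101, "no1", 1), (102, "no2", 1), (103, "no3", 2)]  # (id, orderno, cid)

def inner_join(cs, os):
    # Keeps only customer/order pairs whose keys match (a.id = b.cid).
    return [(c, o) for c in cs for o in os if c[0] == o[2]]

def left_outer_join(cs, os):
    # Keeps every customer; an unmatched customer gets None (NULL) for the order.
    out = []
    for c in cs:
        matched = [o for o in os if o[2] == c[0]]
        if matched:
            out.extend((c, o) for o in matched)
        else:
            out.append((c, None))
    return out

print(len(inner_join(customers, orders)))       # 3 pairs: Spike has no order
print(len(left_outer_join(customers, orders)))  # 4 rows: Spike kept with None
```

  A right outer join is the same with the roles swapped, and a full outer join keeps unmatched rows from both sides.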

  // explode: a table-generating function that turns an array into rows.

  // use hive to implement word count

  // 1. create the table

  CREATE TABLE doc(line string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
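  The word count itself is typically written as select word, count(1) from (select explode(split(line, ' ')) as word from doc) t group by word. The same split/explode/group-by pipeline can be sketched in Python (the doc contents are made up):

```python
from collections import Counter

# Each string stands in for one row of the doc table.
doc = ["hello world", "hello hive", "world of data"]

# split(line, ' ') produces an array; explode turns the array into rows.
words = [w for line in doc for w in line.split(" ")]

# group by word + count(1)
counts = Counter(words)
print(counts["hello"], counts["world"])  # 2 2
```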

  Fifth, Hive queries

  It is just SQL: select ---> MapReduce

 


Origin www.cnblogs.com/djw12333/p/11114571.html