1. What is Hive, and what is it used for?

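Hive was developed and open-sourced by Facebook.
It analyzes structured data such as logs across multiple dimensions using SQL-style queries.
It is built on top of Hadoop: HQL queries are translated into Hadoop MapReduce jobs that analyze structured data stored in HDFS.
Because it runs as Hadoop MapReduce jobs, it is suited only to offline (batch) analysis of the data in a distributed data warehouse.
Typical use cases are log analysis, such as users' purchase histories and page views (PV) and unique visitors (UV) on video sites, as well as low-cost data analysis in general.
Hive is easy to pick up. Metadata such as table definitions is kept in a separate storage system (for example MySQL), which makes it easy to share metadata with Pig.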
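To make the idea of "HQL becomes MapReduce" concrete, here is a minimal in-memory Python sketch of how a PV/UV query such as SELECT page, COUNT(*) AS pv, COUNT(DISTINCT user_id) AS uv FROM access_log GROUP BY page maps onto map, shuffle, and reduce phases. The table name access_log, its columns, and the sample rows are hypothetical, and the plan Hive actually generates differs in detail (for example, it may aggregate on the map side first).

```python
from collections import defaultdict

# Hypothetical access-log rows: (page, user_id)
ACCESS_LOG = [
    ("/home", "u1"),
    ("/home", "u2"),
    ("/home", "u1"),
    ("/video/42", "u3"),
    ("/video/42", "u1"),
]


def map_phase(rows):
    """Map: emit the GROUP BY column as the key and the user as the value."""
    for page, user_id in rows:
        yield page, user_id


def shuffle(pairs):
    """Shuffle: collect all values sharing a key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: COUNT(*) gives PV, COUNT(DISTINCT user_id) gives UV, per page."""
    return {page: (len(users), len(set(users))) for page, users in groups.items()}


def pv_uv(rows):
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for page, (pv, uv) in sorted(pv_uv(ACCESS_LOG).items()):
        print(f"{page}\tpv={pv}\tuv={uv}")
```

The key point is that the GROUP BY column becomes the MapReduce key, so all rows for one page meet in the same reducer.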

2. Hive Basic Architecture

Users can access Hive through the Hive command line, or through the Web UI, the Console UI, and JDBC/ODBC.

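Hive consists of the following modules:

User interfaces: CLI, JDBC/ODBC, and WebUI
Metadata store (metastore): stored by default in the embedded Derby database; in production it is usually switched to MySQL
Driver: the interpreter, compiler, optimizer, and executor
Hadoop: MapReduce for computation and HDFS for storage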
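How the metastore and the Driver's stages divide the work can be illustrated with a deliberately tiny Python model. It supports only queries of the form SELECT col1, col2 FROM table (or SELECT *), uses a dict in place of the metastore and in-memory lists in place of HDFS files, and every name in it (METASTORE, HDFS, compile_query, and so on) is hypothetical. Hive's real Driver parses with an ANTLR grammar and compiles the query into a DAG of tasks such as MapReduce jobs.

```python
import re

# Stand-in for the metastore: table name -> column names (schema is kept apart from the data)
METASTORE = {"access_log": ["page", "user_id", "ts"]}

# Stand-in for HDFS: table name -> stored rows
HDFS = {"access_log": [("/home", "u1", 1), ("/video/42", "u3", 2)]}

# Only "SELECT col[, col...] FROM table" or "SELECT * FROM table" is supported
QUERY_RE = re.compile(r"^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*$", re.IGNORECASE)


def parse(hql):
    """Interpreter/parser: turn the query text into a minimal syntax tree."""
    match = QUERY_RE.match(hql)
    if match is None:
        raise ValueError(f"unsupported query: {hql!r}")
    columns = [c.strip() for c in match.group(1).split(",")]
    return {"table": match.group(2), "columns": columns}


def compile_query(ast):
    """Compiler: resolve names against the metastore and emit a logical plan."""
    schema = METASTORE.get(ast["table"])
    if schema is None:
        raise KeyError(f"table not in metastore: {ast['table']}")
    wanted = schema if ast["columns"] == ["*"] else ast["columns"]
    unknown = [c for c in wanted if c not in schema]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return [("scan", ast["table"]), ("project", [schema.index(c) for c in wanted])]


def optimize(plan):
    """Optimizer: drop a projection that keeps every column in its original order."""
    table = plan[0][1]  # the first operator is always the scan
    full_width = list(range(len(METASTORE[table])))
    return [(op, arg) for op, arg in plan if not (op == "project" and arg == full_width)]


def execute(plan):
    """Executor: apply each operator to the stored rows in turn."""
    rows = []
    for op, arg in plan:
        if op == "scan":
            rows = list(HDFS[arg])
        elif op == "project":
            rows = [tuple(row[i] for i in arg) for row in rows]
    return rows


def run(hql):
    """Driver: parse, compile, optimize, then execute."""
    return execute(optimize(compile_query(parse(hql))))


if __name__ == "__main__":
    print(run("SELECT page, user_id FROM access_log"))
    print(optimize(compile_query(parse("SELECT * FROM access_log"))))
```

Note that the compiler cannot do anything without the metastore: the column list lives there, not in the data files, which is why the metastore is a separate module.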

3. Hive Deployment

3.1 Deploying Hive in a test environment

3.2 Deploying Hive in a production environment

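In a production environment, Hive is installed on one machine of the Hadoop cluster, such as a DataNode, and usually runs a HiveServer so that clients can connect to it remotely.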
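Once a HiveServer is running, clients connect to it over JDBC. The Python sketch below only assembles the connection URL and a beeline command line. It assumes HiveServer2, whose JDBC URLs use the jdbc:hive2:// scheme and whose default port (hive.server2.thrift.port) is 10000; the host hive-gw.example.com and the user hadoop are hypothetical.

```python
import shlex

DEFAULT_HS2_PORT = 10000  # HiveServer2's default Thrift port (hive.server2.thrift.port)


def hs2_jdbc_url(host, port=DEFAULT_HS2_PORT, database="default"):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host, user, port=DEFAULT_HS2_PORT, database="default"):
    """Return an argv list that opens beeline against HiveServer2 as the given user."""
    return ["beeline", "-u", hs2_jdbc_url(host, port, database), "-n", user]


if __name__ == "__main__":
    # Hypothetical gateway host and user; the command is printed, not executed
    print(shlex.join(beeline_command("hive-gw.example.com", "hadoop")))
```

Any JDBC client (a Java program, a BI tool) can use the same URL; beeline is simply the CLI that ships with Hive.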

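4. Hive Deployment Architecture as Seen from the Metastore

The metastore is a standalone Hive metadata service that multiple clients can share. The help output of the hive command shows that metastore is one of the services Hive provides: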

[hadoop@hadoop bin]$ ./hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version 
Parameters parsed:
  --auxpath : Auxillary jars 
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help:  ./hive --debug --help

5. How do you use the metastore service?
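One common setup, offered here as an assumption rather than a documented procedure from this post: start the service on one host with hive --service metastore (it listens on port 9083 by default), then point each client's hive-site.xml at it through the hive.metastore.uris property. The Python snippet below only generates that client-side configuration; the host name metastore-host is hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_METASTORE_PORT = 9083  # default Thrift port of "hive --service metastore"


def metastore_client_config(hosts, port=DEFAULT_METASTORE_PORT):
    """Build a hive-site.xml <configuration> that sends clients to remote metastore(s)."""
    # hive.metastore.uris accepts a comma-separated list of thrift:// URIs
    uris = ",".join(f"thrift://{host}:{port}" for host in hosts)
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    ET.SubElement(prop, "value").text = uris
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(metastore_client_config(["metastore-host"]))
```

With this setting, clients no longer open the metadata database (Derby or MySQL) themselves; they all go through the single metastore service.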

6. The hwi service

Starting the hwi service:

[hadoop@hadoop bin]$ ./hive --service hwi
ls: cannot access /home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-hwi-*.war: No such file or directory
15/03/09 21:02:32 INFO hwi.HWIServer: HWI is starting up
15/03/09 21:02:36 FATAL hwi.HWIServer: HWI WAR file not found at /home/hadoop/software/apache-hive-0.14.0-bin/${env:HWI_WAR_FILE}

Access URL: http://localhost:9999/hwi
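The log above shows why this particular start failed: no lib/hive-hwi-*.war file exists, so the WAR path stays as the unexpanded ${env:HWI_WAR_FILE} placeholder and HWIServer aborts. The address only works once a WAR file is in place. The Python sketch below mimics that pre-flight lookup; the function name find_hwi_war is hypothetical and it only inspects the filesystem.

```python
import glob
import os


def find_hwi_war(hive_home):
    """Return the first lib/hive-hwi-*.war under hive_home, or None if none exists.

    This is the same lookup whose failure appears in the log above.
    """
    matches = sorted(glob.glob(os.path.join(hive_home, "lib", "hive-hwi-*.war")))
    return matches[0] if matches else None


if __name__ == "__main__":
    home = os.environ.get("HIVE_HOME", ".")
    war = find_hwi_war(home)
    if war is None:
        print(f"no hive-hwi-*.war under {home}/lib; HWI will not start")
    else:
        print(f"HWI_WAR_FILE={war}")
```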

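7. Hive Database Structure

A database contains multiple tables, a table contains multiple partitions (similar to table partitioning in MySQL), and a partition contains multiple buckets. Buckets split a partition's data further by the hash of a column, which makes HQL queries more fine-grained; in that sense they play a role somewhat like an index.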
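Partitions and buckets map onto storage differently: each partition is a subdirectory named column=value under the table's directory, while bucketing (CLUSTERED BY (col) INTO n BUCKETS) hashes a column and sends each row to one of n files. The Python sketch assumes an INT bucketing column and the hashing used before Hive 3.0 as we understand it, where an INT hashes to its own value and the bucket is (hash & Integer.MAX_VALUE) % n. The warehouse path, table name, and partition column are hypothetical examples.

```python
import posixpath

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def bucket_for_int(value, num_buckets):
    """Bucket index for an INT bucketing column: (hash & Integer.MAX_VALUE) % num_buckets.

    Assumes the pre-Hive-3 scheme in which an INT's hash is the value itself.
    """
    if not -2**31 <= value < 2**31:
        raise ValueError("value outside Hive INT range")
    # Python's & on negative ints matches Java's two's-complement result in this range
    return (value & INT_MAX) % num_buckets


def partition_dir(warehouse, table, partition):
    """Directory of one partition, e.g. <warehouse>/<table>/dt=2015-03-09."""
    parts = [f"{key}={val}" for key, val in partition.items()]
    return posixpath.join(warehouse, table, *parts)


if __name__ == "__main__":
    print(partition_dir("/user/hive/warehouse", "access_log", {"dt": "2015-03-09"}))
    for user_id in (7, 8, -1):
        print(user_id, "-> bucket", bucket_for_int(user_id, 4))
```

Because a partition filter selects whole directories and a bucket number selects a single file, queries that filter on the partition column or sample by bucket read only a fraction of the data.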

Skewed Keys: keys that are likely to cause data skew. If these keys are specified explicitly, Hive applies special handling to them.
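Skewed values are declared in the table DDL, for example CREATE TABLE ... SKEWED BY (key) ON (value1, value2). Picking those values is left to the user; the Python sketch below shows one hypothetical way to do it, flagging keys whose share of the rows exceeds a threshold and rendering the SKEWED BY clause. The 20% threshold and the sample keys are arbitrary assumptions.

```python
from collections import Counter

# Hypothetical sample of a user_id column where anonymous traffic dominates
SAMPLE_KEYS = ["guest"] * 6 + ["u1", "u2", "u3", "u4"]


def skewed_candidates(keys, threshold=0.2):
    """Return keys whose share of all rows is strictly above threshold, most frequent first."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [key for key, n in counts.most_common() if n / total > threshold]


def skewed_by_clause(column, values):
    """Render a single-column SKEWED BY clause; assumes values contain no single quotes."""
    literals = ", ".join(f"'{v}'" for v in values)
    return f"SKEWED BY ({column}) ON ({literals})"


if __name__ == "__main__":
    print(skewed_by_clause("user_id", skewed_candidates(SAMPLE_KEYS)))
```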

[Hive Part 2] Hive Architecture