Hive: Core Concepts and Architecture (Day 1)

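I. Core Concepts

Hive is a data warehouse tool built on Hadoop. It maps structured data files onto tables and provides SQL-style querying over them.
Differences between Hive and a relational database system (RDBMS)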

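| Aspect | Hive | RDBMS |
| --- | --- | --- |
| Query language | HQL | SQL |
| Data storage | HDFS | Local disk |
| Execution engine | MapReduce | Executor |
| Data insertion | Batch / single-row | Single-row / batch |
| Data operations | Overwrite / append | Row-level insert, delete, update, and query |
| Data scale | Large | Small |
| Execution latency | High | Low |
| Partitioning | Supported | Supported |
| Indexes | Supported (removed in Hive 3.0) | Supported |
| Scalability | High | Low |
| Data loading (validation) | Schema on read | Schema on write |
| Use case | Queries over massive data sets | Real-time queries |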

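Hive is suited only to offline statistical analysis of massive data sets, that is, data warehousing.
**Advantages:** it is operated through SQL, which keeps it simple, and it supports user-defined functions.
**Disadvantages:** ordinary (non-ACID) tables support neither row-level updates and deletes nor transactions, and query latency is high.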
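
The "schema on read" entry in the comparison above can be illustrated with a small sketch. The table name `demo_schema_on_read` and the file path `/tmp/demo.txt` are hypothetical and not part of the original notes; the point is only that loading does not validate the data, while reading does apply the declared column types.

```sql
-- Hypothetical table, for illustration only.
create table if not exists demo_schema_on_read (
  id   int,
  name string
)
row format delimited fields terminated by '\t'
stored as textfile;

-- LOAD DATA copies the file into the table directory as-is;
-- the contents are not checked against the column types here.
-- '/tmp/demo.txt' is an assumed local path.
load data local inpath '/tmp/demo.txt' into table demo_schema_on_read;

-- The schema is applied only now, at query time. A field that cannot be
-- parsed as the declared type (for example the text "abc" in the id column)
-- is returned as NULL rather than causing the load to fail.
select id, name from demo_schema_on_read;
```

A write-time-validating database would reject the bad row on insert; Hive defers that check, which keeps loading cheap at the cost of surprises at query time.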

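II. Architecture

Figure: Hive architecture diagram
Metadata (metastore): stored in a Derby or MySQL database.
Data: stored in Hadoop's HDFS file system.
Starting Hive and entering the interactive shell

1. Run the hive command.
2. Start the HiveServer2 service: nohup hive --service hiveserver2 &
Then run the beeline command and connect:
!connect jdbc:hive2://node1:10000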

III. Hive Commands

hive -e "SQL statement"    # run a SQL statement
hive -f xxx.sql            # run a SQL file

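IV. Data Types

1. Primitive Types

| Type | Description |
| --- | --- |
| int | 4-byte signed integer |
| bigint | 8-byte signed integer |
| float | 4-byte single-precision floating-point number |
| double | 8-byte double-precision floating-point number |
| string | String (no declared length) |
| varchar | String (length 1 to 65535) |
| timestamp | Timestamp |
| date | Date |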

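2. Complex Types

| Type | Description | Example |
| --- | --- | --- |
| array | Ordered collection of elements, all of the same type | array(elem1, elem2); read a value with col[index], where indexes start at 0 |
| map | Unordered key-value pairs | map(k1, v1, k2, v2); read a value with col['key'] |
| struct | Named fields, whose types may differ | struct<a:type1, b:type2, c:type3>; read a value with col.a, col.b, col.c |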
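
As a minimal sketch of how the three complex types are declared and read, assuming a hypothetical table `demo_complex` and the delimiters shown (neither comes from the original notes):

```sql
-- Hypothetical table combining all three complex types.
create table if not exists demo_complex (
  name    string,
  hobbies array<string>,
  scores  map<string, int>,
  address struct<city:string, zip:string>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
stored as textfile;

-- A matching line in the data file would look like (tab-separated):
--   Tom    reading,chess    math:90,english:85    Beijing,100000

select
  name,
  hobbies[0]      as first_hobby,  -- array: index starts at 0
  scores['math']  as math_score,   -- map: look up by key
  address.city    as city          -- struct: dot notation
from demo_complex;

-- The built-in constructors build the same kinds of values inline.
select array(1, 2, 3),
       map('k1', 'v1', 'k2', 'v2'),
       named_struct('a', 1, 'b', 'x');
```

Note that struct fields share the collection-item delimiter with arrays, which is why the address above is written as `Beijing,100000` in the data file.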

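3. Type Conversion
**Implicit conversion:** the system converts types automatically.
**Explicit conversion:** cast(column as type); if the conversion fails, the result is NULL.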
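
The two kinds of conversion can be tried directly in the Hive shell. This is a small sketch using literal values only, so it needs no table:

```sql
-- Explicit conversion with cast().
select cast('123' as int);     -- the string parses as an integer
select cast('abc' as int);     -- conversion fails, so the result is NULL
select cast(3.9 as int);       -- a numeric value converted to an integer type

-- Implicit conversion: the int operand is widened automatically
-- so that it can be added to the double operand.
select 1 + 2.5;
```

Because a failed cast yields NULL instead of an error, it is worth checking for unexpected NULLs after casting columns loaded from text files.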

V. DDL Operations

1. Database DDL

create database db_hive; -- create a database
create database if not exists db_hive;

show databases; -- list databases
show databases like "db*";

desc database db_hive; -- show database details
desc database extended db_hive;

use db_hive; -- switch to a database

drop database db_hive; -- drop a database
drop database if exists db_hive;

drop database if exists db_hive cascade; -- force the drop, including any tables in the database

2. Table DDL

create [external] table [if not exists] table_name
[(col_name data_type [comment col_comment], ...)]
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)] -- partitioning
[clustered by (col_name, col_name, ...) -- bucketing
[sorted by (col_name [asc|desc], ...)] into num_buckets buckets] -- sort order within each bucket
[row format delimited
[fields terminated by "delimiter"]
[collection items terminated by "delimiter"]
[map keys terminated by "delimiter"]
[lines terminated by "delimiter"]]
[stored as file_format]
[location hdfs_path]

file_format: sequencefile (binary serialized key-value file), textfile (plain text), rcfile (columnar storage)
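
To make the syntax template above concrete, here is one possible statement that fills in most of the optional clauses. The table `demo_orders`, its columns, and the bucket count are hypothetical choices made for illustration.

```sql
-- Hypothetical partitioned and bucketed table.
create table if not exists demo_orders (
  order_id bigint comment 'order identifier',
  user_id  int    comment 'buyer identifier',
  amount   double comment 'order amount'
)
comment 'illustrative table using the optional clauses'
-- One subdirectory per dt value; dt is not stored inside the data files.
partitioned by (dt string comment 'order date')
-- Rows are hashed on user_id into 4 files per partition,
-- each file sorted by order_id.
clustered by (user_id) sorted by (order_id asc) into 4 buckets
row format delimited
  fields terminated by '\t'
  lines terminated by '\n'
stored as textfile;

-- Inspect the result, including partition and bucket information.
desc formatted demo_orders;
```

The partition column appears only in the `partitioned by` clause and must not be repeated in the main column list.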

3. Creating an Internal Table

Method 1

create table if not exists student(
id int,
name string
)
row format delimited fields terminated by "\t"
stored as textfile;

Method 2

create table if not exists student1 as select * from student; -- copies both the structure and the data

Method 3

create table if not exists student2 like student; -- copies the structure only, without data

Checking the table type

desc formatted student;

4. Creating an External Table

create external table if not exists teacher(
id int,
name string,
age int
)
row format delimited fields terminated by "\t"
location "/bigdata/hive";

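5. Converting Between Internal and External Tables

alter table student set tblproperties('EXTERNAL'='TRUE'); -- internal table to external table
alter table student set tblproperties('EXTERNAL'='FALSE'); -- external table to internal table

6. Differences Between Internal and External Tables
Creation syntax: external tables are created with the external keyword.
Dropping: dropping an internal table also deletes its data. Dropping an external table leaves the data in place, and the table can be restored by recreating it with the same storage location and format.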
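
The drop-and-restore behavior of an external table can be checked with a short session in the Hive shell. This sketch reuses the teacher table and its /bigdata/hive location from the external-table example; it assumes data files already exist at that location.

```sql
-- Confirm the table type; for teacher the Table Type field
-- should read EXTERNAL_TABLE.
desc formatted teacher;

-- Dropping an external table removes only its metadata.
drop table teacher;

-- The data files should still be listed under the table location.
dfs -ls /bigdata/hive;

-- Recreating the table with the same columns, format, and location
-- makes the existing files queryable again.
create external table if not exists teacher(
  id   int,
  name string,
  age  int
)
row format delimited fields terminated by '\t'
location '/bigdata/hive';

select * from teacher;
```

Running the same sequence against an internal table such as student would leave nothing to recover, because the drop deletes the table directory as well.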

VI. The Hive Interactive Shell

1. Run a local shell command: !ls /;
2. Operate on HDFS: dfs -ls /;

果不其燃
