Data Warehouse: Hive
- Based on:
  - executor: MapReduce
  - data storage: HDFS
  - metadata storage: a relational database
* Architecture principles
- Metadata (MetaStore): table definitions and the mapping between tables and HDFS files. Metadata includes the table name, field names, field types, the associated HDFS file path, etc., and is kept in a relational database.
- Data storage: the table data itself is stored in HDFS.
- Query process: Hive receives user instructions (SQL); its Driver, combined with metadata from the MetaStore, translates the SQL into MapReduce jobs, submits them to Hadoop for execution, and finally returns the execution results to the user interface.
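As an illustration of this translation step (the table and column names here are made up), Hive's EXPLAIN statement shows the execution stages a query is compiled into:

```sql
-- Show the stage plan (e.g. map/reduce phases) Hive generates for a query;
-- emp and dept are hypothetical names for illustration only.
EXPLAIN
SELECT dept, count(*) FROM emp GROUP BY dept;
```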
DDL
library and table operations
* Managed (Internal) Table & External Table
- Managed (internal) table: dropping it deletes both the metadata and the data on HDFS.
- External table: dropping it deletes the metadata only; the data on HDFS is preserved.
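A minimal sketch of the two table types (table and column names are made up for illustration):

```sql
-- Managed table: data lives under Hive's warehouse directory;
-- DROP TABLE removes both the metadata and the HDFS files.
CREATE TABLE logs_managed (id INT, msg STRING);

-- External table: Hive only records the location;
-- DROP TABLE removes the metadata, but the files under /data/logs remain.
CREATE EXTERNAL TABLE logs_external (id INT, msg STRING)
LOCATION '/data/logs';
```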
* Partition Table
(multiple files classified into different directories)
- Background: the table holds too much data; partitioning the storage path avoids a full table scan at query time.
- Principle: a Hive partition is a directory split — data is stored in subdirectories of the table directory.
- Common usage: partition by date, creating one directory per date, each containing that day's data.
- Secondary partition: partition by day, then partition by hour within each day.
- Dynamic partition: partition values are derived from the query result at insert time instead of being specified manually.
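A hedged sketch of date partitioning and dynamic-partition insert (table and column names are assumptions; `hive.exec.dynamic.partition.mode` is a real Hive setting):

```sql
-- Each dt value becomes its own subdirectory, e.g.
-- /user/hive/warehouse/access_log/dt=2023-01-01/
CREATE TABLE access_log (ip STRING, url STRING)
PARTITIONED BY (dt STRING);

-- A filter on the partition column only scans that directory
SELECT count(*) FROM access_log WHERE dt = '2023-01-01';

-- Dynamic partition: dt is taken from the SELECT result per row
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE access_log PARTITION (dt)
SELECT ip, url, dt FROM raw_log;
```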
* Bucket Table
(one file is split into multiple files for storage)
- Background: the single data file of a table or partition is too large.
- Principle: carry out a finer-grained division of the data range and split the data file into buckets by the hash of a column.
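A minimal sketch of a bucketed table (names are made up):

```sql
-- Rows are distributed into 4 files by hash(user_id) % 4;
-- bucketing also enables bucket map joins and efficient sampling.
CREATE TABLE user_action (user_id INT, action STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS;
```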
Data import and export
- CREATE TABLE ... AS SELECT: import the selected data when creating the table
create table table_name1
as select xxx from table_name2;
- LOCATION: point the table at an existing data path when creating it
create table table_name (field)
location '/data_path';
- INSERT: insert data
-- append
insert into table table_name values ('value');
-- overwrite (INSERT OVERWRITE takes a SELECT source, not VALUES;
-- src_table is a placeholder name)
insert overwrite table table_name select * from src_table;
-- insert into a local directory = export
insert overwrite local directory 'local_path' select * from table_name;
-- insert into HDFS
insert overwrite directory 'hdfs_path' select * from table_name;
- LOAD: load data from a path
load data [local] inpath '/data_path' [overwrite] into table table_name [partition (partcol=val, ...)];
- EXPORT, IMPORT
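EXPORT copies a table's data together with its metadata to an HDFS directory, and IMPORT recreates the table from it; a sketch with hypothetical paths and names:

```sql
-- Write the table's data and metadata to an HDFS directory
EXPORT TABLE table_name TO '/tmp/table_name_backup';
-- Recreate the table (optionally under a new name) from that directory
IMPORT TABLE table_name_copy FROM '/tmp/table_name_backup';
```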
DML
- SELECT and filters
- GROUP BY
- JOIN
- multi-table insert
- streaming (TRANSFORM)
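Of these, multi-table insert is the least obvious: one scan of the source table feeds several targets. A hedged sketch (table names are made up):

```sql
-- The FROM clause comes first; src_table is read once,
-- and each INSERT routes matching rows to a different target.
FROM src_table
INSERT OVERWRITE TABLE target_a SELECT id, name WHERE id < 100
INSERT OVERWRITE TABLE target_b SELECT id, name WHERE id >= 100;
```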