Hive Getting Started: Concepts

Data Warehouse: Hive

  • Based on:
    execution engine: MapReduce
    data storage: HDFS
    metadata storage: a relational database

* Architecture principle

(architecture diagram)

  • MetaStore
    table definitions, and the mapping between tables and HDFS
    Metadata includes the table name, column names, column types, the associated HDFS file paths, and other such information, kept in a relational database.

  • Data storage
    table data itself is stored in HDFS

  • Query process
    Hive receives the user's SQL, and its Driver, combined with the metadata in the MetaStore, translates the SQL into MapReduce jobs, submits them to Hadoop for execution, and finally returns the results to the user interface.
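
You can watch this translation happen by asking Hive for a query's execution plan; a minimal sketch, assuming a hypothetical orders table:

-- orders is a made-up example table; EXPLAIN prints the plan
-- (the stages that become MapReduce jobs) without running the query
explain
select customer_id, count(*)
from orders
group by customer_id;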

DDL

Database and table operations

Managed (Internal) Table & External Table

  • Managed (internal) table
    drop: both the metadata and the data in HDFS are deleted

  • External table
    drop: only the metadata is deleted; the data in HDFS is kept
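
A minimal sketch of the difference; the table names and the /data/logs path are made up for illustration:

-- managed table: drop table removes the metadata and the HDFS data
create table managed_logs (line string);

-- external table: drop table removes only the metadata;
-- the files under the (hypothetical) /data/logs path remain in HDFS
create external table external_logs (line string)
location '/data/logs';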

* Partition Table

(data files are grouped into separate directories)

  • Background
    When a table holds too much data, partitioning the storage path avoids a full table scan on every query.

  • Principle:
    A Hive partition is a directory split: data is stored in one subdirectory per partition value.

  • Common usage:
    Partition by date: create one directory per day, each holding that day's data (see the sketch below).
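
A minimal sketch with a hypothetical access_logs table:

-- each dt value becomes a subdirectory, e.g. .../access_logs/dt=2023-01-01/
create table access_logs (ip string, url string)
partitioned by (dt string);

-- filtering on the partition column reads only the matching directories
select count(*) from access_logs where dt = '2023-01-01';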

Secondary (multi-level) partition

Partition by day, then by hour within each day (see the sketch below).
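
A minimal sketch of a two-level layout; the table is hypothetical, and dt/hr are chosen over day/hour to avoid reserved keywords:

-- directories nest one level per partition column: .../dt=2023-01-01/hr=08/
create table access_logs_hourly (ip string, url string)
partitioned by (dt string, hr string);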

Dynamic partition

Partition values are taken from the inserted data at run time instead of being written out by hand (see the sketch below).
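
A minimal sketch, reusing the hypothetical access_logs table from above and assuming a staging_logs source table:

-- allow Hive to create partitions from the data itself
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- the last select column (dt) decides which partition each row lands in
insert overwrite table access_logs partition (dt)
select ip, url, dt from staging_logs;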

Bucketed table

(a single file is split into multiple files for storage)

Used when the single data file of a table or partition is too large:
bucketing divides the data into finer-grained ranges by hashing rows into a fixed number of files (see the sketch below).
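
A minimal sketch with hypothetical names:

-- rows are hashed on user_id into 4 bucket files
create table user_events (user_id int, event string)
clustered by (user_id) into 4 buckets;
-- older Hive versions also need: set hive.enforce.bucketing=true;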

Data import and export


  • CREATE TABLE ... AS SELECT: import the result of a query when creating a table
create table table_name1
as select xxx from table_name2;

  • CREATE TABLE ... LOCATION: attach an existing data path when creating a table
create table table_name (field string)
location '/data_path';
  • INSERT: insert data
-- append
insert into table table_name values('value');
-- overwrite
insert overwrite table table_name values('value');


-- insert into a local directory = export
insert overwrite local directory 'local_path' select * from table_name;
-- insert into an HDFS directory
insert overwrite directory 'hdfs_path' select * from table_name;
  • LOAD: load data from a path
load data [local] inpath '/data_path' [overwrite] into table table_name [partition (partcol='val')];
  • EXPORT, IMPORT: copy a table's data together with its metadata to and from an HDFS path
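A minimal sketch; the table names and the export path are made up:

-- export writes the table's data plus its metadata to an HDFS path
export table table_name to '/tmp/table_name_export';
-- import recreates the table (here under a new name) from that path
import table table_name_copy from '/tmp/table_name_export';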

DML

SELECT and filters
GROUP BY

join

multi-table insert (see the sketch below)
streaming
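
With multi-table insert, one scan of the source can feed several targets at once; a minimal sketch reusing the hypothetical staging_logs and access_logs tables, plus a made-up error_logs table:

-- one pass over staging_logs populates two tables
from staging_logs
insert overwrite table access_logs partition (dt = '2023-01-01')
  select ip, url where dt = '2023-01-01'
insert overwrite table error_logs
  select ip, url where url like '%error%';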

Origin: blog.csdn.net/xyc1211/article/details/128835756