Hive basic table creation

1. Special data types of Hive

        Hive's data types are broadly similar to MySQL's, but Hive adds collection data types:

        ARRAY: an ordered list of elements that all have the same type

        MAP: key-value pairs in which all keys share one type and all values share one type

        STRUCT: a group of named fields, each with its own type

Type      Format                      Definition
array     ['aaa','bbb','bbb']         ARRAY<string>
map       {'A':'Apex','B':'Bee'}      MAP<string,string>
struct    {'aaa',666}                 STRUCT<fruit:string,weight:int>
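
As a rough illustration of how values of these three types are built and read, the sketch below uses Hive's built-in array(), map() and named_struct() constructors. The aliases arr, m, s and t are made up for this example, and it assumes a Hive version that accepts a select without a from clause in the inner query.

```sql
-- Build one value of each collection type, then read a piece of each.
-- arr, m, s and t are illustrative aliases, not names from this article.
select
    arr[0]   as first_item,    -- array element by zero-based index
    m['A']   as value_for_a,   -- map value looked up by key
    s.fruit  as fruit_field    -- struct field accessed by name
from (
    select
        array('aaa', 'bbb', 'bbb')                  as arr,
        map('A', 'Apex', 'B', 'Bee')                as m,
        named_struct('fruit', 'aaa', 'weight', 666) as s
) t;
```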

2. Create a static table

        Statement to create a static table:

create table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

row format delimited    : begins the delimiter settings

fields terminated by '|'    : sets the separator between fields to "|"
collection items terminated by ','  : sets the separator between the items of a complex-type field (array elements, struct members, map entries) to ","
map keys terminated by ':'    : sets the separator between the key and the value of a map entry to ":"
lines terminated by '\n'    : sets the separator between lines to "\n"
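
To make the delimiters concrete, here is a sketch of how one line of a data file would map onto the employee columns. The sample line is hypothetical (the article does not show the contents of its data file), and the query only returns rows once such a file has been loaded into the table.

```sql
-- Hypothetical line in the data file (made up for illustration):
--   Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer
--
-- '|' splits the line into five fields:
--   name          = Michael
--   work_place    = Montreal,Toronto   -> ',' separates the array items
--   gender_age    = Male,30            -> ',' separates the struct members
--   skills_score  = DB:80              -> ':' separates map key from value
--   depart_title  = Product:Developer  -> ':' separates map key from value
select
    name,
    work_place[1],            -- second array item
    gender_age.age,           -- struct member
    skills_score['DB'],       -- map lookup by key
    depart_title['Product']   -- map lookup by key
from employee;
```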

        Import data from a local file into a table:

load data local inpath '/opt/employee.txt' into table employee;

        Import data from a file on HDFS into a table (without local, the path refers to HDFS, and the file is moved into the table's directory):

load data inpath '/employee.txt' into table employee;

        Load a file and replace the data already in the table:

load data inpath '/employee.txt' overwrite into table employee;
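
One simple way to see the difference between appending and overwriting is to count rows between loads. This is only a sketch, and it assumes the local file /opt/employee.txt from the earlier example is still present.

```sql
-- Append: each plain load adds the file's rows to what is already there.
load data local inpath '/opt/employee.txt' into table employee;
select count(*) from employee;

-- Overwrite: the table's previous contents are replaced by the file's rows.
load data local inpath '/opt/employee.txt' overwrite into table employee;
select count(*) from employee;
```

After the second load, the count should equal the number of lines in the file, however many times the file had been appended before.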

3. Create a partitioned table

        The statement to create a partitioned table:

create table employee2(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int) -- use age as the partition column
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

        Import data into a partitioned table:

load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);

        View the partitions of a partitioned table:

show partitions employee2;
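
As a sketch of how the partitions are used afterwards: the partition column can be queried like any other column, and partitions can be added or dropped explicitly. The value age=40 is arbitrary and chosen only for this example.

```sql
-- The partition column behaves like an ordinary column in queries;
-- filtering on it lets Hive read only the matching partition directory.
select name, age
from employee2
where age = 20;

-- Partitions can also be added or dropped explicitly.
-- age=40 is an arbitrary value chosen for this example.
alter table employee2 add if not exists partition (age=40);
alter table employee2 drop if exists partition (age=40);
```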

4. Internal and external tables

Data tables are divided into internal tables and external tables.

Internal table (managed table)

  • In HDFS, the table is a subfolder under the directory of the database it belongs to
  • The data is fully managed by Hive; dropping the table deletes both the metadata and the data

External table

  • The data is stored at the HDFS path given by location
  • Hive does not fully manage the data; dropping the table deletes only the metadata and leaves the data in place

The two employee tables created above are both internal tables.

        Statement to create an external table:


create external table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';

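To create an external table, add the external keyword after create. Note that because of if not exists, the statement is silently skipped when a table named employee already exists, so drop the earlier internal table or pick a different name first.

location '/tmp/hivedata/employee' specifies the HDFS path where the table's data is stored.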
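
To check which kind of table you ended up with, describe formatted can be used. The sketch below assumes the table is named employee as in the statement above. The drop at the end is optional and should only be run if you really want to remove the table.

```sql
-- Look for the "Table Type" row in the output:
-- MANAGED_TABLE for an internal table, EXTERNAL_TABLE for an external one.
describe formatted employee;

-- Dropping an external table removes only its metadata;
-- the files under the location path remain on HDFS.
-- Dropping an internal table would delete the data as well.
drop table if exists employee;
```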

Origin blog.csdn.net/Alcaibur/article/details/129188080