1. Special data types of Hive
Hive's data types are largely the same as MySQL's, but Hive adds collection (complex) data types:
ARRAY: an ordered list of elements, all of the same type
MAP: key-value pairs; all keys share one type and all values share one type
STRUCT: a group of named fields, each of which can have its own type
Type | Example value | Type declaration |
array | ['aaa','bbb','bbb'] | ARRAY<string> |
map | {'A':'Apex','B':'Bee'} | MAP<string,string> |
struct | {'aaa',666} | STRUCT<fruit:string,weight:int> |
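If you are more familiar with a general-purpose language, the three collection types correspond roughly to Python's list, dict, and a record type. The sketch below is only an analogy. Python does not enforce element types, so a small helper checks the "same type" rule. The variable names, the `all_of_type` helper and the `Fruit` class are hypothetical and simply mirror the examples in the table.

```python
from dataclasses import dataclass


def all_of_type(values, expected_type):
    """Return True if every value has the given type (the ARRAY/MAP rule)."""
    return all(isinstance(v, expected_type) for v in values)


# ARRAY<string>: an ordered list whose elements all share one type
fruits = ["aaa", "bbb", "bbb"]

# MAP<string,string>: every key has one type, every value has one type
names = {"A": "Apex", "B": "Bee"}


# STRUCT<fruit:string,weight:int>: named fields, each with its own type
@dataclass
class Fruit:
    fruit: str
    weight: int


item = Fruit("aaa", 666)

print(all_of_type(fruits, str))                                              # True
print(all_of_type(names.keys(), str) and all_of_type(names.values(), str))  # True
print(all_of_type([item.fruit, item.weight], str))  # False: struct fields may differ
```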
2. Create a static (non-partitioned) table
Statement to create a static table:
create table if not exists employee(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
row format delimited: begins the delimiter settings (rows are stored as delimited text)
fields terminated by '|': separate the fields (columns) with "|"
collection items terminated by ',': separate the items of a complex-type field (array elements, struct fields and map entries) with ","
map keys terminated by ':': separate each key from its value in a map field with ":"
lines terminated by '\n': separate rows with "\n" (the only line terminator Hive currently supports)
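The following minimal Python sketch shows how one line of `employee.txt` would be split under these three delimiter levels. The sample line is hypothetical because the actual file contents are not shown. This sketch also raises an error on malformed input, whereas Hive returns NULL for values it cannot convert.

```python
# Delimiters declared in the CREATE TABLE statement above
FIELD_SEP = "|"   # fields terminated by '|'
ITEM_SEP = ","    # collection items terminated by ','
KEY_SEP = ":"     # map keys terminated by ':'


def parse_map(text, value_type=str):
    """Split 'k1:v1,k2:v2' into a dict, converting each value with value_type."""
    result = {}
    for entry in text.split(ITEM_SEP):          # map entries are collection items
        key, value = entry.split(KEY_SEP, 1)    # key and value split on the map-key delimiter
        result[key] = value_type(value)
    return result


def parse_employee(line):
    """Split one line of the data file into the five employee columns."""
    name, work_place, gender_age, skills_score, depart_title = line.rstrip("\n").split(FIELD_SEP)
    gender, age = gender_age.split(ITEM_SEP)    # struct fields are collection items too
    return {
        "name": name,
        "work_place": work_place.split(ITEM_SEP),             # array<string>
        "gender_age": {"gender": gender, "age": int(age)},    # struct<gender:string,age:int>
        "skills_score": parse_map(skills_score, int),         # map<string,int>
        "depart_title": parse_map(depart_title),              # map<string,string>
    }


# Hypothetical sample line; the real contents of employee.txt are not shown here
row = parse_employee("Michael|Montreal,Toronto|Male,30|DB:80,Network:88|Product:DeveloperLead\n")
print(row["work_place"])    # ['Montreal', 'Toronto']
print(row["gender_age"])    # {'gender': 'Male', 'age': 30}
print(row["skills_score"])  # {'DB': 80, 'Network': 88}
```

The "," delimiter appears at three different places: between array elements, between struct fields and between map entries. Only the ":" inside a map entry uses a separate delimiter.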
Load data from a local (non-HDFS) file into the table; the file is copied:
load data local inpath '/opt/employee.txt' into table employee;
Load data from a file that is already in HDFS into the table; the file is moved into the table's directory:
load data inpath '/employee.txt' into table employee;
Load a file from HDFS into the table, replacing the table's existing data:
load data inpath '/employee.txt' overwrite into table employee;
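The three LOAD DATA variants differ in two ways: where the file comes from, and what happens to it. With LOCAL, Hive copies the file from the local file system. Without LOCAL, Hive moves a file that is already in HDFS into the table directory. OVERWRITE first removes the table's existing data. In all three cases Hive does not parse or validate the file at load time. The toy model below uses dictionaries as stand-ins for the two file systems. The `load_data` function and its parameters are hypothetical and are not a Hive API.

```python
def load_data(local_fs, hdfs, table_files, inpath, local=False, overwrite=False):
    """Toy model of LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE.

    local_fs / hdfs: dicts mapping a path to file contents (stand-ins for file systems)
    table_files: list of file contents currently in the table's directory
    """
    if local:
        data = local_fs[inpath]      # LOCAL: copy from the local file system, source is kept
    else:
        data = hdfs.pop(inpath)      # no LOCAL: move within HDFS, source path disappears
    if overwrite:
        table_files.clear()          # OVERWRITE: drop the table's existing data first
    table_files.append(data)         # file is added as-is; nothing is parsed here


local_fs = {"/opt/employee.txt": "local rows"}
hdfs = {"/employee.txt": "hdfs rows"}
table = []

load_data(local_fs, hdfs, table, "/opt/employee.txt", local=True)
load_data(local_fs, hdfs, table, "/employee.txt")
print(table)                    # ['local rows', 'hdfs rows']
print("/employee.txt" in hdfs)  # False: the HDFS file was moved
```

The non-LOCAL load moves the file. Running the OVERWRITE statement right after it with the same path would therefore fail unless `/employee.txt` is uploaded to HDFS again.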
3. Create a partitioned table
The statement to create a partitioned table:
create table employee2(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
partitioned by (age int) -- use age as the partition column
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
Load data into a specific partition of the partitioned table:
load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);
List the partitions of the table:
show partitions employee2;
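Hive stores each partition in its own subdirectory of the table directory, named `column=value`. The partition column's value comes from that directory name, not from the data file. As a result, `partition(age=20)` places the whole file in the age=20 partition without checking the age stored inside gender_age. The sketch below builds those paths. It assumes the default warehouse directory `/user/hive/warehouse` and the default database, and it ignores the escaping Hive applies to special characters in partition values. The helper names are hypothetical.

```python
import posixpath

# Assumption: default warehouse directory and the default database,
# whose tables live directly under the warehouse directory.
WAREHOUSE = "/user/hive/warehouse"


def partition_path(table, partition):
    """HDFS directory for one partition, e.g. {'age': 20} -> .../employee2/age=20."""
    dirs = [f"{column}={value}" for column, value in partition.items()]
    return posixpath.join(WAREHOUSE, table, *dirs)


def list_partitions(paths, table):
    """Roughly what SHOW PARTITIONS lists: the path below the table directory."""
    prefix = posixpath.join(WAREHOUSE, table) + "/"
    return sorted(p[len(prefix):] for p in paths if p.startswith(prefix))


loaded = [partition_path("employee2", {"age": 20}),
          partition_path("employee2", {"age": 30})]
print(loaded[0])                             # /user/hive/warehouse/employee2/age=20
print(list_partitions(loaded, "employee2"))  # ['age=20', 'age=30']
```

With several partition columns, the directories nest (for example `year=2024/month=1`). This is why a query that filters on a partition column only needs to read the matching subdirectories.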
4. Internal and external tables
Hive tables are either internal (managed) tables or external tables.
Internal table (managed table)
- Its data is stored in HDFS, in a subdirectory of the directory of the database it belongs to
- Hive fully manages the data: dropping the table deletes both the metadata and the data
External table
- Its data is stored at the HDFS path given in the LOCATION clause
- Hive does not fully manage the data: dropping the table deletes only the metadata, not the data
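The practical difference shows up on DROP TABLE. In the toy model below, a dict stands in for HDFS, and the class name, method names and table names are all hypothetical. Dropping a table removes its metadata in both cases, but the data files are deleted only for an internal table.

```python
class ToyMetastore:
    """Toy model: table metadata plus a dict standing in for HDFS (path -> contents)."""

    def __init__(self):
        self.hdfs = {}
        self.tables = {}  # table name -> (location, is_external)

    def create_table(self, name, location, external=False):
        self.tables[name] = (location, external)

    def drop_table(self, name):
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            # internal (managed) table: the data under its directory is deleted too
            doomed = [p for p in self.hdfs if p.startswith(location + "/")]
            for path in doomed:
                del self.hdfs[path]


ms = ToyMetastore()
ms.create_table("employee", "/user/hive/warehouse/employee")
ms.create_table("employee_external", "/tmp/hivedata/employee", external=True)
ms.hdfs["/user/hive/warehouse/employee/employee.txt"] = "rows"
ms.hdfs["/tmp/hivedata/employee/employee.txt"] = "rows"

ms.drop_table("employee")
ms.drop_table("employee_external")
print(sorted(ms.hdfs))  # ['/tmp/hivedata/employee/employee.txt']
```

Because the external data survives, recreating an external table with the same LOCATION makes the old data queryable again.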
The two tables created above (employee and employee2) are both internal tables.
Statement to create an external table:
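-- employee already exists as an internal table, so "if not exists" would skip this statement; use an illustrative new name
create external table if not exists employee_external(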
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';
To create an external table, add the EXTERNAL keyword after CREATE.
location '/tmp/hivedata/employee' specifies the HDFS directory where the table's data is stored.