Common Hive operation statements: the create table statement

One: the Hive table creation statement

create table page_view
(
  page_id bigint comment 'Page ID',
  page_name string comment 'Page name',
  page_url string comment 'Page URL'
)
comment 'Page view'
partitioned by (ds string comment 'The current time, used as the partition field')
row format delimited
stored as rcfile
location '/user/hive/test';

 
Here we need to say something about the stored as clause. Hive currently supports three storage formats:

1: textfile, the most common format and Hive's default. The data is stored uncompressed, so the disk overhead is large and the parsing overhead is also large.

2: sequencefile, a binary format provided by the Hadoop API. It is easy to use, splittable, and compressible.

3: rcfile, a combination of row and column storage. It first divides the data horizontally into row groups, which guarantees that all the fields of a record sit in the same block, so reading a record never requires reading more than one block. Within each block the data is then stored column by column, which makes the storage compact and column access fast.

Because rcfile stores data by column, loading data into it costs more, but it delivers good query response times and a better compression ratio.
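A quick way to confirm which format a table actually ended up with is describe formatted, which prints the table's InputFormat, OutputFormat, and location among other metadata:

describe formatted page_view;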

If the table needs to be partitioned, the statement is as follows. Here partitioned by specifies which field to partition on, usually a time field:

create table test_ds
(
  id int comment 'User ID',
  name string comment 'username'
)
comment 'Test partition table'
partitioned by(ds string comment 'time partition field')
row format delimited
fields terminated by '\t'
stored as rcfile;
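
Once the partitioned table exists, partitions can be added and listed like this (the date value below is only a hypothetical example):

alter table test_ds add partition (ds = '2014-04-01');
show partitions test_ds;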

 

If some fields need to be clustered into buckets, so that Hive can sample the table efficiently on those columns, the SQL is written like this:

create table test_ds
(
  id int comment 'User ID',
  name string comment 'username'
)
comment 'Test partition table'
partitioned by(ds string comment 'time partition field')
clustered by(id) sorted by(name) into 32 buckets
row format delimited
fields terminated by '\t'
stored as rcfile;

This means rows are hashed on id into 32 buckets, and within each bucket the rows are kept sorted by name.
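
Bucketing is what makes sampling cheap: Hive can read a single bucket file instead of scanning the whole table. A minimal sketch of using it (on older Hive versions the setting below is needed so that inserts actually honor the bucket definition):

-- make inserts respect the clustered by definition (pre-Hive 2.x)
set hive.enforce.bucketing = true;

-- sample roughly 1/32 of the table by reading a single bucket
select * from test_ds tablesample(bucket 1 out of 32 on id);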

If you want to change where the table is stored in HDFS, specify the path explicitly with the location clause:

create table test_another_location
(
   id int,
   name string,
   url string
)
comment 'Test another location'
row format delimited
fields terminated by '\t'
stored as textfile
location '/tmp/test_location';

The directory /tmp/test_location does not need to be created beforehand; Hive creates it automatically.
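
One caveat: test_another_location is still a managed table, so drop table deletes the files under /tmp/test_location as well. If the data should survive a drop, the table can be declared external instead; a minimal sketch (the table name and path here are made up for illustration):

create external table test_external
(
   id int,
   name string,
   url string
)
comment 'External table: dropping it leaves the files in place'
row format delimited
fields terminated by '\t'
stored as textfile
location '/tmp/test_external';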

 

 
