Hadoop Big Data Development Fundamentals Series, Part 9: HiveQL

I. What is HiveQL?

1. HiveQL overview
Hive is a data warehouse analysis system, as discussed earlier; it executes SQL queries mainly by translating them into MapReduce jobs.
HiveQL is essentially a dialect of SQL. Hive parses and analyzes HiveQL queries, so users who are not familiar with MapReduce can still easily query, aggregate, and analyze data with SQL.
2. Features of HiveQL and Hive
(1) HiveQL differs slightly from the SQL of relational databases, but it supports the vast majority of statements, such as DDL and DML, common aggregate functions, join queries, and query conditions.
(2) Hive is not suited to online workloads and does not provide real-time queries.
It is best suited to large batch jobs over immutable data. Traditionally, HiveQL has not supported row-level updates or transactions (row-level UPDATE/DELETE arrived only in Hive 0.14, for transactional tables), its index support is limited, and its subqueries and joins are restricted; these limits follow from its dependence on the underlying Hadoop platform. On the other hand, some of its features go beyond what standard SQL offers.
(3) Hive is scalable: nodes can be added to the Hadoop cluster dynamically, and Hive inherits Hadoop's scalability and fault tolerance.
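As a small illustration of how close HiveQL is to ordinary SQL, the sketch below aggregates a hypothetical `person` table (its columns `name`, `age`, and `city` are assumed for illustration only) and then asks Hive for the execution plan with `EXPLAIN`, which on the MapReduce engine lists the map/reduce stages the query is compiled into.

```sql
-- Hypothetical table person(name STRING, age INT, city STRING), for illustration only.
-- An ordinary GROUP BY aggregation; Hive compiles it into MapReduce job(s).
SELECT city, COUNT(*) AS people, AVG(age) AS avg_age
FROM person
WHERE age >= 18
GROUP BY city;

-- EXPLAIN prints the stage plan Hive would run instead of executing the query.
EXPLAIN
SELECT city, COUNT(*) AS people
FROM person
GROUP BY city;
```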
 
Next, a brief introduction to common operations:
 

II. Creating and dropping tables

1. Create a table

    
    
    CREATE [①TEMPORARY] [②EXTERNAL] TABLE [③IF NOT EXISTS] [db_name.]table_name
      [(col_name data_type [⑨COMMENT col_comment], ... [constraint_specification])]
      [COMMENT table_comment]
      [④PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
      [⑤CLUSTERED BY (col_name1, col_name2, ...) [⑩SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
      [SKEWED BY (col_name, col_name, ...)
         ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
         [STORED AS DIRECTORIES]]
      [
        [⑥ROW FORMAT row_format]
        [⑦STORED AS file_format]
        | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
      ]
      [LOCATION hdfs_path]
      ... (more clauses are described in the official documentation)

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
      ⑧LIKE existing_table_or_view_name
      [LOCATION hdfs_path];

The marked keywords are explained below.

①TEMPORARY
Creates a temporary table, which exists only for the current session and is deleted automatically when the session ends.
Note: temporary tables do not support partition columns or the creation of indexes.
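A minimal sketch of a session-scoped scratch table, assuming a hypothetical source table `person(name, age)`; the names `tmp_adults` and `person` are illustrative only. Temporary tables are available from Hive 0.14 onward.

```sql
-- Visible only in the current session; dropped automatically when the session ends.
CREATE TEMPORARY TABLE tmp_adults (name STRING, age INT);

-- Fill it from a (hypothetical) permanent table.
INSERT INTO TABLE tmp_adults
SELECT name, age FROM person WHERE age >= 18;
```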
 
②EXTERNAL
Creates an external table. When the table is created you specify a path (LOCATION) pointing to the actual data; Hive records only where the data lives and does not move or modify it. Dropping an external table deletes only its metadata, not the data.
Used together with LOCATION hdfs_path.
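The sketch below shows an external table laid over files that already sit in HDFS; the table name `access_log` and the path `/data/logs/access` are hypothetical. It also uses IF NOT EXISTS (③), so re-running the statement is harmless.

```sql
-- The tab-separated files under LOCATION are read in place; Hive only records the path.
CREATE EXTERNAL TABLE IF NOT EXISTS access_log (
  ip  STRING,
  url STRING,
  ts  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/logs/access';

-- Removes only the metastore entry; the files in /data/logs/access remain.
DROP TABLE access_log;
```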
 
③IF NOT EXISTS
Creates a table with the specified name. If a table with that name already exists, an exception is thrown; adding the IF NOT EXISTS option makes Hive skip the statement instead of raising the exception.
 
④PARTITIONED BY
A table can have one or more partition columns. Each partition is stored in its own directory, named after the partition column and its value (for example y=WT0228).
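A minimal sketch of a two-level partitioned table, reusing the y/m partition values that appear later in this article; the table `sales`, its columns, and the file `/tmp/sales_201501.csv` are hypothetical. Partition columns are declared only in PARTITIONED BY, not in the regular column list.

```sql
CREATE TABLE sales (
  item   STRING,
  amount DOUBLE
)
PARTITIONED BY (y STRING, m STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Each partition gets its own directory under the table directory, e.g.
--   <warehouse dir>/sales/y=WT0228/m=201501/
LOAD DATA INPATH '/tmp/sales_201501.csv'
INTO TABLE sales PARTITION (y = 'WT0228', m = '201501');

-- Filtering on partition columns lets Hive read only the matching directories.
SELECT SUM(amount) FROM sales WHERE y = 'WT0228' AND m = '201501';
```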
 
⑤CLUSTERED BY
Divides the table into buckets. Bucketing serves two purposes: more efficient queries and more efficient sampling.
Physically, each bucket is a file in the table (or partition) directory, and each file corresponds to one output partition (one reducer) of the MapReduce job that wrote the data.
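The sketch below buckets a hypothetical `users` data set by `user_id` and then samples a single bucket; all table and column names are assumptions. The `hive.enforce.bucketing` setting was needed in Hive versions before 2.0; later versions always enforce bucketing on insert.

```sql
-- Rows are assigned to one of 4 buckets by hashing user_id; each bucket is one file.
CREATE TABLE user_bucketed (
  user_id INT,
  name    STRING
)
CLUSTERED BY (user_id) INTO 4 BUCKETS;

-- Needed only on Hive versions before 2.0.
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE user_bucketed
SELECT user_id, name FROM users;

-- Sampling reads only bucket 1 of 4, i.e. roughly a quarter of the rows.
SELECT * FROM user_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON user_id);
```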
 
⑥ROW FORMAT
Specifies the row format of the data loaded into the table, such as the column delimiter.
 
⑦STORED AS
[STORED AS file_format]
Specifies the file format Hive uses to store the table's data (covered in the previous section). It is usually combined with ROW FORMAT.
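Two hedged examples of how ROW FORMAT and STORED AS combine; the table and column names are hypothetical. A delimited text table spells out its separators, while a self-describing columnar format such as ORC needs no delimiters.

```sql
-- Text files: one row per line, fields separated by ',', array items by '|'.
CREATE TABLE person_text (
  name    STRING,
  age     INT,
  hobbies ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Columnar ORC files carry their own structure, so no ROW FORMAT is given.
CREATE TABLE person_orc (
  name STRING,
  age  INT
)
STORED AS ORC;
```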
 
⑧LIKE
Creates an empty table with the same definition as an existing table or view.
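A one-line sketch, assuming a hypothetical existing table `person`: only the definition is copied, so the new table starts out empty.

```sql
-- Copies the table definition of person, not its rows.
CREATE TABLE IF NOT EXISTS person_empty LIKE person;

-- The new table holds no data yet, so this count returns 0.
SELECT COUNT(*) FROM person_empty;
```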
 
⑨COMMENT
Adds a comment to a column or to the table.
 
⑩SORTED BY
Specifies the column(s) by which rows are sorted within each bucket.
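A sketch that puts several of the keywords above together in the order the syntax requires (COMMENT, PARTITIONED BY, CLUSTERED BY ... SORTED BY, ROW FORMAT, STORED AS); the `orders` table and all of its columns are hypothetical.

```sql
CREATE TABLE IF NOT EXISTS orders (
  order_id BIGINT COMMENT 'order identifier',
  user_id  INT    COMMENT 'buyer',
  amount   DOUBLE COMMENT 'order total'
)
COMMENT 'orders, partitioned by day and bucketed by user'
PARTITIONED BY (dt STRING COMMENT 'order date, yyyy-MM-dd')
-- 8 bucket files per partition, each sorted by order_id.
CLUSTERED BY (user_id) SORTED BY (order_id ASC) INTO 8 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```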
 
CTAS (CREATE TABLE AS SELECT)
create table person2 as select * from person;
Creates a new table and fills it with the query result in a single statement.
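A CTAS variant that also chooses the storage format and keeps only part of the data; `person` and its columns are the same hypothetical table used above, and `adults_orc` is an illustrative name.

```sql
-- The new table's columns come from the SELECT list; its files are written as ORC.
CREATE TABLE adults_orc
STORED AS ORC
AS
SELECT name, age FROM person WHERE age >= 18;
```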
 
2. Dropping tables
(1) Drop a table
DROP TABLE [IF EXISTS] table_name;
    Deletes the table definition together with its metadata. For a managed (internal) table, all rows of data in the table are deleted as well; for an external table, only the metadata is removed and the data files are kept.
(2) Truncate a table
TRUNCATE TABLE table_name [PARTITION partition_spec];
    Deletes all rows from the table (or from the specified partitions) but keeps the table itself; its structure remains until the table is dropped with DROP TABLE as described above. TRUNCATE works only on managed tables, not on external tables.
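A short sketch of the two operations on hypothetical managed tables `person` and `sales` (the latter partitioned by y and m):

```sql
-- Remove all rows but keep the table definition (managed tables only).
TRUNCATE TABLE person;

-- Remove the rows of a single partition only.
TRUNCATE TABLE sales PARTITION (y = 'WT0228', m = '201501');

-- Remove the table itself; IF EXISTS avoids an error when it is already gone.
DROP TABLE IF EXISTS person;
```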
 

III. Altering tables

1. Rename the table
ALTER TABLE table_name RENAME TO new_table_name;
 
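2. Modify table properties
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
Used to change the table comment and other table properties. SerDe settings, such as the column delimiter, are changed with the companion statement ALTER TABLE table_name SET SERDEPROPERTIES serde_properties;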
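A sketch on a hypothetical delimited text table `person`; the property values are illustrative. The table comment lives in the `comment` table property, and `field.delim` is the SerDe property holding the field delimiter of a delimited text table.

```sql
ALTER TABLE person RENAME TO people;

-- The table comment is stored as the 'comment' table property.
ALTER TABLE people SET TBLPROPERTIES ('comment' = 'basic person records');

-- Change how the existing files are parsed: fields are now split on '|'.
ALTER TABLE people SET SERDEPROPERTIES ('field.delim' = '|');
```

Both statements change metadata only; the existing data files are not rewritten, so a new delimiter must match how the files are actually written.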
 
3. Modify partitions
(1) Add a partition
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'];
    
    

(2) Rename a partition

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION new_partition_spec;
  
  

(3) Recover partitions automatically (MSCK): Hive scans the table directory and adds to the metastore any partition directories it does not yet know about

MSCK REPAIR TABLE table_name;
    
    

(4) Drop a partition

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec;
For example: PARTITION (y='WT0228', m='201501')
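The partition operations above, run in sequence against a hypothetical table `sales` partitioned by (y, m); the partition values follow the article's own example.

```sql
ALTER TABLE sales ADD IF NOT EXISTS PARTITION (y = 'WT0228', m = '201501');

ALTER TABLE sales PARTITION (y = 'WT0228', m = '201501')
  RENAME TO PARTITION (y = 'WT0228', m = '201502');

-- Directories created directly on HDFS (e.g. .../sales/y=WT0228/m=201503/)
-- are invisible to queries until MSCK registers them in the metastore.
MSCK REPAIR TABLE sales;
SHOW PARTITIONS sales;

ALTER TABLE sales DROP IF EXISTS PARTITION (y = 'WT0228', m = '201502');
```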
    
    

4. Rename a column (CHANGE)

    ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
      [COMMENT col_comment];
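A one-statement sketch, assuming a hypothetical table `person` with an INT column `age`. The new name, the type, and the optional comment are all given in the same clause.

```sql
-- Rename age to person_age, keep the INT type, and attach a comment.
ALTER TABLE person CHANGE COLUMN age person_age INT COMMENT 'age in years';
```

CHANGE only updates the table metadata; the data files are not modified, so the new type must still be able to read the existing data.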

5. Add or replace columns
ADD appends new columns to the existing ones; REPLACE replaces the entire column list.

    ALTER TABLE table_name
      ADD|REPLACE COLUMNS (col_name data_type, ...);
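A sketch contrasting the two forms on a hypothetical table `person(name, age)`; the column `email` is illustrative.

```sql
-- Adds email after the existing columns (and before any partition columns).
ALTER TABLE person ADD COLUMNS (email STRING COMMENT 'contact email');

-- Replaces the whole column list: email, not listed here, is removed from the schema.
ALTER TABLE person REPLACE COLUMNS (name STRING, age INT);
```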

IV. Views

1. Features of Hive views
(1) A Hive view is only a logical view, not a materialized view (a materialized view physically stores its data, like a table).
(2) A view can only be queried; you cannot LOAD, INSERT, UPDATE, or DELETE data through it.
(3) Creating a view saves only its metadata; the corresponding subquery is executed only when the view is queried.
 
2. View operations
(1) Create

    CREATE VIEW [IF NOT EXISTS] [db_name.]view_name
      AS SELECT ...;

(2) Modify

ALTER VIEW [db_name.]view_name SET TBLPROPERTIES table_properties;
    
    

(3) Drop

DROP VIEW [IF EXISTS] [db_name.]view_name;
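A full life cycle of a view over a hypothetical `person` table; the view name and the `owner_team` property are illustrative assumptions.

```sql
CREATE VIEW IF NOT EXISTS adult_person
COMMENT 'people aged 18 or over'
AS SELECT name, age FROM person WHERE age >= 18;

-- Only now, when the view is queried, does its SELECT actually run.
SELECT COUNT(*) FROM adult_person;

ALTER VIEW adult_person SET TBLPROPERTIES ('owner_team' = 'analytics');

DROP VIEW IF EXISTS adult_person;
```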
    
    

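V. Indexes

Note: indexes are available only in Hive versions before 3.0; the feature was removed in Hive 3.0.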

(1) Create

    CREATE INDEX index_name
      ON TABLE base_table_name (col_name, ...)
      AS index_type;

(2) Drop

DROP INDEX [IF EXISTS] index_name ON table_name;
    
    

(3) Modify (rebuild)

ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;
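A sketch for Hive versions before 3.0, using the built-in 'COMPACT' index type on a hypothetical `person` table. With WITH DEFERRED REBUILD the index stays empty until it is explicitly rebuilt.

```sql
CREATE INDEX person_age_idx
ON TABLE person (age)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Build (or refresh after the base table changes) the index data.
ALTER INDEX person_age_idx ON person REBUILD;

SHOW INDEX ON person;

DROP INDEX IF EXISTS person_age_idx ON person;
```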
    
    

VI. Displaying information

1. List all databases
SHOW DATABASES;

2. List all tables in a database
SHOW TABLES IN database_name;

3. List all views
SHOW VIEWS [IN|FROM database_name];

4. List the partitions of a table
SHOW PARTITIONS table_name;

5. Show indexes
SHOW (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];

6. Show all columns of a table
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];

7. Show functions (including custom functions); the quoted pattern is a regular expression
SHOW FUNCTIONS "a.*";

8. Show lock information for a table (locks acquired for read and write access)
SHOW LOCKS table_name;
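A few concrete invocations, assuming the hypothetical tables `sales` (partitioned by y, m) and `person` exist in the `default` database. SHOW LOCKS returns lock information only when concurrency support (`hive.support.concurrency=true`) is enabled.

```sql
SHOW DATABASES;
SHOW TABLES IN default;
SHOW PARTITIONS sales PARTITION (y = 'WT0228');  -- only partitions with y='WT0228'
SHOW COLUMNS IN person;
SHOW FUNCTIONS "x.*";                            -- functions whose names match x.*
SHOW LOCKS person;
```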
 

VII. User-defined functions (UDFs)

Hive has three kinds of user-defined functions:
1. UDF:
Operates on a single row and produces one output value per input row (one-to-one).
2. UDAF:
Aggregate function: accepts multiple input rows and produces one output value (many-to-one).
A custom UDAF is implemented by extending the UDAF class.
Package it as a jar, add the jar to the Hive server, and then create the function.
3. UDTF:
Table-generating function: turns one input row into multiple output rows (one-to-many).
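The three kinds can be seen with built-in functions before writing custom ones. In the sketch below, `person` and `person_text` (with an ARRAY<STRING> column `hobbies`) are hypothetical tables, and the jar path `/opt/hive/udfs/my-udfs.jar` and class `com.example.hive.udf.MyLower` are placeholders for your own compiled function, not real artifacts.

```sql
SELECT upper(name) FROM person;                   -- UDF: one value in, one value out per row
SELECT city, count(*) FROM person GROUP BY city;  -- UDAF: many rows in, one value per group
SELECT explode(array(1, 2, 3)) AS n;              -- UDTF: one row in, three rows out

-- UDTFs are usually joined back to the source row through LATERAL VIEW.
SELECT p.name, h.hobby
FROM person_text p
LATERAL VIEW explode(p.hobbies) h AS hobby;

-- Registering a custom function (hypothetical jar path and class name).
ADD JAR /opt/hive/udfs/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.MyLower';
SELECT my_lower(name) FROM person;
```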

Origin blog.csdn.net/weixin_45678149/article/details/104943527