HiveQL: Data Definition

  1. Databases in Hive

    A Hive database is essentially a catalog or namespace of tables.

    Creating a database:

    hive> CREATE DATABASE [IF NOT EXISTS] financials;

    Hive creates a directory for each database, and the tables in that database are stored in subdirectories of the database directory. The exception is the default database, which does not have its own directory.

    The database directory is created under the top-level directory specified by the property hive.metastore.warehouse.dir.

    Database directory names end in .db.

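    For example, creating the database financials makes Hive create the directory /user/hive/warehouse/financials.db (/user/hive/warehouse is the default value of hive.metastore.warehouse.dir).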
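The placement rule above can be written as a small Python function. This is a toy model for illustration, not Hive's code: the function name `database_location` is hypothetical, the warehouse path is assumed to be the default `/user/hive/warehouse`, and names are lower-cased here only because Hive identifiers are case-insensitive.

```python
import posixpath
from typing import Optional

DEFAULT_WAREHOUSE = "/user/hive/warehouse"  # assumed value of hive.metastore.warehouse.dir


def database_location(name: str,
                      warehouse_dir: str = DEFAULT_WAREHOUSE,
                      location: Optional[str] = None) -> str:
    """Toy model of where Hive places a database directory (hypothetical helper)."""
    if location is not None:
        # CREATE DATABASE ... LOCATION '...' overrides the default placement.
        return location
    name = name.lower()  # identifiers are case-insensitive; normalized in this sketch
    if name == "default":
        # The default database has no directory of its own.
        return warehouse_dir
    # Every other database gets <warehouse>/<name>.db
    return posixpath.join(warehouse_dir, name + ".db")


print(database_location("financials"))  # /user/hive/warehouse/financials.db
```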

    Overriding the default database location

    hive > CREATE DATABASE financials
         > LOCATION '/my/preferred/directory';

    Adding a descriptive comment, and viewing it

    hive > CREATE DATABASE financials 
         > COMMENT 'Holds all financial tables';
      
    hive > DESCRIBE DATABASE financials;
    financials Holds all financial tables
    hdfs://master-server/user/hive/warehouse/financials.db

    master-server is the URL authority: the DNS name of the master node (NameNode) plus an optional port number.

    Hive takes the master server name and port number from the Hadoop configuration property fs.default.name, which is set in the Hadoop configuration files under the $HADOOP_HOME/conf directory.

    Therefore hdfs:///user/hive/warehouse/financials.db and hdfs://master-server/user/hive/warehouse/financials.db are equivalent when master-server is the server configured in fs.default.name.

    Remarks:

    For completeness: when the user specifies a relative path, Hive places it under the user's home directory in the distributed file system (for HDFS, typically hdfs:///user/<user-name>). If the user is running in local mode, however, the current local working directory is used as the parent directory of the relative path.

    For portability, the server and port information is normally omitted; it is given only when referring to a different distributed file system instance.
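The path rules above (default server filled in, relative paths under the home directory) can be sketched in Python. This is a toy model, not Hadoop's path-qualification code; `qualify` is a hypothetical name, the host and port `master-server:8020` and the user `mark` are made up, and local mode is not modeled.

```python
import posixpath
from urllib.parse import urlsplit


def qualify(path: str, default_fs: str, home_dir: str) -> str:
    """Toy model: turn a path into a fully qualified file system URI.

    default_fs stands in for fs.default.name (e.g. "hdfs://master-server:8020");
    home_dir stands in for the user's home directory (e.g. "/user/mark").
    """
    parts = urlsplit(path)
    if parts.scheme and parts.netloc:
        # Already names a server (and optional port): use it as is.
        return path
    fs = urlsplit(default_fs)
    p = parts.path
    if not p.startswith("/"):
        # Relative path: resolve it against the user's home directory.
        p = posixpath.join(home_dir, p)
    # Missing server ("hdfs:///...") or missing scheme: fill in the default file system.
    return f"{parts.scheme or fs.scheme}://{fs.netloc}{p}"


fs = "hdfs://master-server:8020"  # hypothetical host and port
print(qualify("hdfs:///user/hive/warehouse/financials.db", fs, "/user/mark"))
# hdfs://master-server:8020/user/hive/warehouse/financials.db
```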

    Adding key-value properties to the database

    hive > CREATE DATABASE financials
         > WITH DBPROPERTIES ('creator'='Mark', 'date'='2012-01-02');
    
    hive > DESCRIBE DATABASE financials;
    financials hdfs://master-server/user/hive/warehouse/financials.db
    
    hive > DESCRIBE DATABASE EXTENDED financials;
    financials hdfs://master-server/user/hive/warehouse/financials.db
    {date=2012-01-02, creator=Mark}
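When database definitions are generated by a script, a small helper can assemble the statement with its optional clauses. The sketch below is hypothetical (`create_database_sql` and `quote` are illustrative names, not part of any Hive client library). It emits the clauses in the order COMMENT, LOCATION, WITH DBPROPERTIES and escapes quotes with backslashes inside single-quoted literals.

```python
from typing import Dict, Optional


def quote(s: str) -> str:
    """Quote a value as a single-quoted HiveQL string literal (backslash escapes)."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_database_sql(name: str,
                        comment: Optional[str] = None,
                        location: Optional[str] = None,
                        properties: Optional[Dict[str, str]] = None,
                        if_not_exists: bool = True) -> str:
    """Build a CREATE DATABASE statement (hypothetical helper)."""
    parts = ["CREATE DATABASE"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(name)
    if comment is not None:
        parts.append("COMMENT " + quote(comment))
    if location is not None:
        parts.append("LOCATION " + quote(location))
    if properties:
        pairs = ", ".join(f"{quote(k)}={quote(v)}" for k, v in properties.items())
        parts.append(f"WITH DBPROPERTIES ({pairs})")
    return " ".join(parts) + ";"


print(create_database_sql("financials",
                          properties={"creator": "Mark", "date": "2012-01-02"},
                          if_not_exists=False))
# CREATE DATABASE financials WITH DBPROPERTIES ('creator'='Mark', 'date'='2012-01-02');
```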

    USE switches the user's current working database

    hive > USE financials;

    Remarks:

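    There is no command that shows which database is the current working database (in the CLI, setting hive.cli.print.current.db=true displays it in the prompt). Hive has no concept of nested databases, so it is always safe to repeat USE.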

    Dropping a database

    hive > DROP DATABASE [IF EXISTS] financials;

    Remarks:

    By default, Hive does not allow dropping a database that still contains tables. Either drop the tables first and then drop the database, or append the keyword CASCADE to the command so that Hive drops the tables in the database itself.

    hive > DROP DATABASE IF EXISTS financials CASCADE;

    Using the keyword RESTRICT is equivalent to the default behavior. When a database is dropped, its directory is deleted as well.
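The RESTRICT/CASCADE rules can be captured in a few lines of Python. This is a toy model of the behavior described above, not Hive's metastore code; the catalog is just a hypothetical dictionary from database name to the set of its table names.

```python
from typing import Dict, Set


class DropError(Exception):
    pass


def drop_database(catalog: Dict[str, Set[str]], name: str,
                  if_exists: bool = False, cascade: bool = False) -> None:
    """Toy model of DROP DATABASE [IF EXISTS] name [RESTRICT|CASCADE]."""
    if name not in catalog:
        if if_exists:
            return  # IF EXISTS: silently do nothing
        raise DropError(f"database {name} does not exist")
    if catalog[name] and not cascade:
        # RESTRICT (the default) refuses to drop a database that still has tables.
        raise DropError(f"database {name} is not empty")
    # CASCADE, or an empty database: the tables, the database, and its directory go.
    del catalog[name]


catalog = {"financials": {"stocks"}}  # hypothetical contents
drop_database(catalog, "financials", cascade=True)
print(catalog)  # {}
```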

    Altering a database (DBPROPERTIES can be set, but there is no way to delete or "reset" a database property)

    hive > ALTER DATABASE financials SET DBPROPERTIES ('edited-by'='Joe');

  2. Tables

    Creating a table

    CREATE TABLE [IF NOT EXISTS] xxx_db.xxx (...)
    COMMENT '...'
    LOCATION '...'
    TBLPROPERTIES (...);

    In most cases, the main role of TBLPROPERTIES is to attach additional documentation to the table as key-value pairs.

    Hive automatically adds two table properties: last_modified_by, which holds the user name of the last user to modify the table, and last_modified_time, which holds the time of that modification in seconds since the Unix epoch.
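Because last_modified_time is stored in epoch seconds, it needs converting before it is readable. A minimal Python sketch, assuming the properties have already been fetched into a dictionary of strings (the property values below are made up):

```python
from datetime import datetime, timezone


def modified_at(tblproperties: dict) -> datetime:
    """Convert the last_modified_time property (seconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(int(tblproperties["last_modified_time"]), tz=timezone.utc)


props = {"last_modified_by": "mark", "last_modified_time": "1325462400"}  # hypothetical values
print(modified_at(props).isoformat())  # 2012-01-02T00:00:00+00:00
```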

    By default, Hive always creates a table's directory under the directory of the database the table belongs to (for example, /user/hive/warehouse/mydb.db/employees).

    The default database is the exception: it has no directory of its own under /user/hive/warehouse, so its tables are located directly under /user/hive/warehouse.
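The default table location, including the special case of the default database, can be sketched as follows. This is a toy model rather than Hive's code; `table_location` is a hypothetical name and the warehouse directory is assumed to be the default.

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # assumed value of hive.metastore.warehouse.dir


def table_location(db: str, table: str, warehouse_dir: str = WAREHOUSE) -> str:
    """Toy model of the default directory of a managed table."""
    db, table = db.lower(), table.lower()
    if db == "default":
        # Tables of the default database sit directly under the warehouse directory.
        return posixpath.join(warehouse_dir, table)
    # Otherwise: <warehouse>/<db>.db/<table>
    return posixpath.join(warehouse_dir, db + ".db", table)


print(table_location("mydb", "employees"))  # /user/hive/warehouse/mydb.db/employees
```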

    Copying a table definition (only the schema is copied, not the data)

    CREATE TABLE IF NOT EXISTS XXX_db.XXX2 LIKE XXX_db.XXX;

    Displaying table information

    SHOW TABLES [IN DB];
    DESCRIBE EXTENDED|FORMATTED XXX_db.XXX[.colXXX];

    Managed tables and external tables

    Managed tables: Hive controls the lifecycle of their data. By default, Hive stores these tables in subdirectories of the directory defined by hive.metastore.warehouse.dir. For example, when a managed table is dropped, Hive deletes the data in the table.

    Because Hive uses schema on read, it has virtually no control over the data that users place in a managed table's directory.

    External tables:

    CREATE EXTERNAL TABLE [IF NOT EXISTS] ...

    The keyword EXTERNAL tells Hive that the table is external. Because the table is external, Hive does not assume it fully owns the data, so dropping the table does not delete the data; only the metadata describing the table is deleted.
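The difference between dropping a managed table and dropping an external table can be modeled in a few lines. This is a toy sketch, not Hive's implementation: the metastore and the file system are plain hypothetical dictionaries, and the paths are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Table:
    name: str
    external: bool
    location: str


def drop_table(metastore: Dict[str, Table], files: Dict[str, List[str]], name: str) -> None:
    """Toy model of DROP TABLE; files maps a directory to the data files in it."""
    table = metastore.pop(name)  # the metadata is always removed
    if not table.external:
        # Managed table: Hive owns the data, so its directory is deleted too.
        files.pop(table.location, None)
    # External table: the data under table.location is left untouched.


metastore = {"a": Table("a", False, "/w/a"), "b": Table("b", True, "/data/b")}
files = {"/w/a": ["f1"], "/data/b": ["f2"]}
drop_table(metastore, files, "a")
drop_table(metastore, files, "b")
print(files)  # {'/data/b': ['f2']}
```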

    Partitioned tables

    Both managed tables and external tables can be partitioned.

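    Hive creates subdirectories under the table directory that reflect the partition structure. For a table whose directory is

    hdfs://master-server/user/hive/warehouse/mydb.db/employees

    and that is partitioned by country and state, each partition gets a nested subdirectory such as employees/country=CN/state=BK.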
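The key=value directory naming can be sketched in Python. This toy model is hypothetical (`partition_path` is an illustrative name) and deliberately skips the escaping of special characters that Hive applies to partition values.

```python
import posixpath
from typing import Dict


def partition_path(table_dir: str, spec: Dict[str, str]) -> str:
    """Toy model: one key=value subdirectory per partition column, in declaration order.

    Column names are lower-cased; escaping of special characters is not modeled.
    """
    subdirs = [f"{col.lower()}={val}" for col, val in spec.items()]
    return posixpath.join(table_dir, *subdirs)


table_dir = "hdfs://master-server/user/hive/warehouse/mydb.db/employees"
print(partition_path(table_dir, {"country": "CN", "state": "BK"}))
# hdfs://master-server/user/hive/warehouse/mydb.db/employees/country=CN/state=BK
```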

    Remarks:

    A query over all partitions of a large table can trigger an enormous MapReduce job. Hive can therefore be set to "strict" mode (hive.mapred.mode=strict): a query against a partitioned table whose WHERE clause has no partition filter is then rejected instead of submitted.
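The strict-mode guard amounts to a simple check. The Python sketch below is a toy model: `check_strict` is hypothetical, and it assumes the set of columns referenced in the WHERE clause is already known, whereas Hive derives that from the parsed query.

```python
from typing import Iterable, Set


def check_strict(mode: str, partition_cols: Set[str], where_cols: Iterable[str]) -> None:
    """Toy model of the strict-mode check on partitioned tables.

    mode stands in for hive.mapred.mode; where_cols are the columns the
    query's WHERE clause references.
    """
    if mode != "strict" or not partition_cols:
        return  # nonstrict mode, or the table is not partitioned
    if not partition_cols & {c.lower() for c in where_cols}:
        raise ValueError("strict mode: no partition filter in the WHERE clause")


check_strict("strict", {"country", "state"}, ["country"])  # allowed
```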

    Listing specific partitions

    SHOW PARTITIONS EMPLOYEES PARTITION(COUNTRY='CN');

    Creating a partition by loading data

    LOAD DATA LOCAL INPATH '${env:HOME}/CALIFORNIA-EMPLOYEES'
    INTO TABLE EMPLOYEES
    PARTITION(COUNTRY='CN', STATE='BK');

    Hive does not check whether the directory for a partition exists or whether it contains any files. If the partition directory does not exist or contains no files, a query that filters on that partition simply returns no results.

    The table's storage format is specified with STORED AS; when creating a table, the user can also specify the various delimiters.

    TEXTFILE means that all fields are encoded as alphanumeric characters, including characters from international character sets.

    With TEXTFILE, each line is treated as a separate record.

    SEQUENCEFILE and RCFILE use binary encoding and compression to optimize disk space usage and I/O bandwidth.

    Hive uses an InputFormat object to split an input stream into records and an OutputFormat object to format records into an output stream. It uses a SerDe to parse records into columns when reading and to encode columns into records when writing.

    Hive's WITH SERDEPROPERTIES clause lets users pass configuration information to the SerDe.
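The division of labor among InputFormat, SerDe, and OutputFormat can be illustrated with a toy Python pipeline for delimited text. These functions and classes are hypothetical stand-ins, not Hive's Java interfaces. The default field delimiter `\x01` (^A) is Hive's default for text tables. The property key `field.delim` mirrors the SERDEPROPERTIES key used by Hive's LazySimpleSerDe, to show how WITH SERDEPROPERTIES configures a SerDe.

```python
from typing import Dict, Iterator, List, Optional

DEFAULT_FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables (^A)


def input_format(stream: str) -> Iterator[str]:
    """Toy InputFormat: split a text stream into records, one per line."""
    lines = stream.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another record
    return iter(lines)


class DelimitedSerDe:
    """Toy SerDe: record -> columns when reading, columns -> record when writing."""

    def __init__(self, serde_properties: Optional[Dict[str, str]] = None):
        # Configuration arrives as key-value pairs, as with WITH SERDEPROPERTIES.
        props = serde_properties or {}
        self.delim = props.get("field.delim", DEFAULT_FIELD_DELIM)

    def deserialize(self, record: str) -> List[str]:
        return record.split(self.delim)

    def serialize(self, columns: List[str]) -> str:
        return self.delim.join(columns)


def output_format(records: List[str]) -> str:
    """Toy OutputFormat: write each record as one line."""
    return "".join(record + "\n" for record in records)


serde = DelimitedSerDe({"field.delim": ","})  # like WITH SERDEPROPERTIES ('field.delim'=',')
rows = [serde.deserialize(r) for r in input_format("John,CN\nMary,US\n")]
print(rows)  # [['John', 'CN'], ['Mary', 'US']]
```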

    Dropping a table

    DROP TABLE IF EXISTS employees;

    Remarks:

    Hadoop trash

    If the user enables this feature (it is off by default), deleted data is moved to the .Trash directory under the user's home directory in the distributed file system, which in HDFS is /user/$USER/.Trash. To enable it, set fs.trash.interval to a reasonable positive integer: the interval between "trash checkpoints", in minutes. (Not every version necessarily supports this.) Data deleted by mistake can be restored by re-creating the table and its partitions and then moving the deleted files from the .Trash directory back into the correct directories.
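When recovering files by hand, it helps to know where a deleted path ends up. A minimal sketch, assuming HDFS's usual layout in which trashed files keep their original absolute path beneath `/user/<user>/.Trash/Current`; the function names and the user `mark` are hypothetical.

```python
import posixpath


def trash_path(user: str, deleted_path: str) -> str:
    """Where a deleted HDFS file is moved while trash is enabled (sketch)."""
    return posixpath.join("/user", user, ".Trash", "Current", deleted_path.lstrip("/"))


def trash_enabled(fs_trash_interval_minutes: int) -> bool:
    """fs.trash.interval = 0 (the default) disables trash."""
    return fs_trash_interval_minutes > 0


print(trash_path("mark", "/user/hive/warehouse/mydb.db/employees"))
# /user/mark/.Trash/Current/user/hive/warehouse/mydb.db/employees
```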

    Altering a table

    ALTER TABLE  -- modifies only the table's metadata

    Renaming a table

    ALTER TABLE xxx RENAME TO new_xxx;

    Adding, relocating, and dropping partitions

    ALTER TABLE XXX ADD IF NOT EXISTS PARTITION(, ,) LOCATION '//';
    ALTER TABLE XXX PARTITION(, ,) SET LOCATION '//';  -- changes the partition's path; does not move the data and does not delete the old data
    ALTER TABLE XXX DROP IF EXISTS PARTITION(, ,);
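Because SET LOCATION changes only metadata, a query afterwards reads whatever is in the new directory, and an empty or missing directory simply yields no rows. The toy model below illustrates this. The metastore and file system are hypothetical dictionaries and the paths are made up; this is not how Hive stores partitions internally.

```python
from typing import Dict, List, Tuple

PartitionSpec = Tuple[Tuple[str, str], ...]


def set_partition_location(metastore: Dict[PartitionSpec, str],
                           spec: PartitionSpec, new_location: str) -> None:
    """Toy model of ALTER TABLE ... PARTITION(...) SET LOCATION: metadata only."""
    metastore[spec] = new_location  # no files are moved or deleted


def read_partition(metastore: Dict[PartitionSpec, str],
                   files: Dict[str, List[str]], spec: PartitionSpec) -> List[str]:
    """A query reads the partition's current directory; missing or empty means no rows."""
    return files.get(metastore[spec], [])


spec = (("country", "CN"), ("state", "BK"))
metastore = {spec: "/old/cn/bk"}
files = {"/old/cn/bk": ["row1"]}
set_partition_location(metastore, spec, "/new/cn/bk")
print(read_partition(metastore, files, spec))  # [] (the old files are still in /old/cn/bk)
```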

    Altering table properties

    ALTER TABLE XXX SET TBLPROPERTIES(  ); 

    Altering storage properties

    ALTER TABLE XXX PARTITION(, ,) SET FILEFORMAT SEQUENCEFILE;

    Remarks:

    SERDEPROPERTIES lets users customize the properties that the various SerDe implementations allow to be configured.

    Column operations

    Changing a column

    ALTER TABLE xxx
    CHANGE COLUMN old_name new_name new_type
    COMMENT '--------'
    AFTER other_column;

    Adding columns

    ALTER TABLE xxx ADD COLUMNS (, ,);

    Replacing or removing columns

    ALTER TABLE XXX REPLACE COLUMNS(, , ,);
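    ALTER TABLE ... TOUCH ....  -- triggers the table's hooks
    ALTER TABLE ... ARCHIVE(/UNARCHIVE) PARTITION( , ,)  -- packs the files in a partition into a Hadoop archive (HAR) file
    ALTER TABLE ... ENABLE(/DISABLE) NO_DROP/OFFLINE;  -- prevents a partition from being dropped (NO_DROP) or queried (OFFLINE)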
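Since these ALTER TABLE commands change only metadata, existing data files are not rewritten. For a delimited text table, fields are still matched to the current column list by position when the data is read. The toy schema-on-read model below shows the consequence of REPLACE COLUMNS. `read_row` is hypothetical and the row is made up; trailing columns with no field read as None (NULL in Hive), and extra fields are ignored.

```python
from typing import Dict, List, Optional


def read_row(line: str, columns: List[str], delim: str = "\x01") -> Dict[str, Optional[str]]:
    """Toy schema-on-read: match fields to the table's current columns by position."""
    fields = line.split(delim)
    return {col: (fields[i] if i < len(fields) else None)
            for i, col in enumerate(columns)}


line = "John\x01CN\x0150000"  # hypothetical row: name, country, salary
print(read_row(line, ["name", "country", "salary"]))
# After REPLACE COLUMNS (name, salary) the file is unchanged, so "salary" now reads "CN".
print(read_row(line, ["name", "salary"]))
```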

Origin www.cnblogs.com/ganshuoos/p/11851512.html