Hadoop - Hive Data Operations

I. Hive Data Types

  1. Basic data types

    [Table: Hive basic data types and their sizes]

     From the table above we can see that Hive has no native date type; dates in Hive are represented as strings, and common date-format conversions are performed with custom functions.

    Hive is written in Java, and its basic data types correspond one-to-one with Java's primitive types, with the exception of the string type. The signed integer types TINYINT, SMALLINT, INT, and BIGINT are equivalent to Java's byte, short, int, and long: 1-byte, 2-byte, 4-byte, and 8-byte signed integers respectively. Hive's floating-point types FLOAT and DOUBLE correspond to Java's float and double, and Hive's BOOLEAN corresponds to Java's boolean.

    Hive's STRING type is equivalent to a database VARCHAR: it is a variable-length string, except that you cannot declare a maximum number of characters; in theory it can store up to 2 GB of characters.

    Hive supports conversions between basic types. A lower-byte type can be implicitly converted to a higher-byte type: for example, TINYINT, SMALLINT, and INT can be converted to FLOAT, and all integer types, FLOAT, and STRING can be converted to DOUBLE. These conversions mirror Java's widening conversions, since Hive is written in Java. Converting a higher-byte type down to a lower-byte type is also supported, but requires Hive's CAST function.
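As a quick sketch (the `employees` table and `salary` column are illustrative, not from this article), widening happens implicitly while narrowing needs an explicit CAST:

```sql
-- implicit widening: an INT value used in a DOUBLE context converts automatically;
-- explicit narrowing requires CAST (table and column names are hypothetical)
SELECT CAST('100'   AS INT),     -- STRING -> INT
       CAST(3.99    AS INT),     -- DOUBLE -> INT (fraction is truncated)
       CAST(salary  AS BIGINT)   -- DOUBLE column -> BIGINT
FROM employees;
```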

  2. Complex Data Types

    Complex data types include arrays (ARRAY), maps (MAP), and structs (STRUCT, which can be thought of as an object), as shown in the following table:

    [Table: complex data types with declaration syntax and examples]
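A minimal sketch (table and column names are illustrative) of declaring and accessing the three complex types:

```sql
-- declare one column of each complex type
CREATE TABLE person (
    name    STRING,
    hobby   ARRAY<STRING>,                -- e.g. ['reading', 'music']
    address MAP<STRING, STRING>,          -- e.g. {'city': 'beijing'}
    info    STRUCT<age:INT, job:STRING>   -- a record with named fields
);

-- arrays are indexed, maps are keyed, struct fields use dot notation
SELECT hobby[0], address['city'], info.age FROM person;
```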

  3. Text file delimiters

    For text-format files, users are no doubt familiar with comma- or tab-separated fields, and Hive supports these formats whenever users need them. Both formats share a drawback, however: the user must take care that field values do not themselves contain the delimiter characters (commas or tabs). For this reason, Hive's default delimiters are several control characters that rarely appear in field values; these defaults can be replaced with the user's own field delimiters when a table is defined.
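Hive's documented defaults are a newline between rows, Ctrl-A (\001) between fields, Ctrl-B (\002) between collection items, and Ctrl-C (\003) between map keys and values. A table that restates these defaults explicitly would look like this (the table name is illustrative):

```sql
CREATE TABLE default_delims (
    id      INT,
    hobby   ARRAY<STRING>,
    address MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'             -- Ctrl-A between fields
COLLECTION ITEMS TERMINATED BY '\002'   -- Ctrl-B between array/map items
MAP KEYS TERMINATED BY '\003'           -- Ctrl-C between map keys and values
LINES TERMINATED BY '\n';               -- newline between rows
```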

    

 II. Database Operations

  1. Create a database

CREATE DATABASE database_name;

 

  2. View databases

-- view all databases
SHOW DATABASES;
-- check for a specific database
SHOW DATABASES LIKE 'database_name';
-- fuzzy match with LIKE, e.g. all databases starting with "hive"
SHOW DATABASES LIKE 'hive*';
-- view a database's detailed description
DESC DATABASE hive_01;

 

 

  3. Use Database

USE database_name;

 

  4. Delete database

-- plain drop: every table in the database must be dropped before the
-- database itself can be deleted
DROP DATABASE database_name;
-- force drop: also deletes all the tables inside the database
DROP DATABASE database_name CASCADE;

 

III. Table Operations

  1. Create a data table

    

-- create an internal (managed) table
CREATE TABLE inner_table (
    id      INT,
    name    STRING,
    hobby   ARRAY<STRING>,
    address MAP<STRING, STRING>
)
ROW FORMAT DELIMITED                    -- fixed clause
FIELDS TERMINATED BY ','                -- field delimiter
COLLECTION ITEMS TERMINATED BY '-'      -- array-element delimiter
MAP KEYS TERMINATED BY ':';             -- map key/value delimiter

 

 

  Precautions:

    The delimiters in the table created above are only examples; rewrite them to match the format of your own data.

            

   Loading data

-- the path after LOCAL INPATH is a local Linux file path; OVERWRITE empties
-- all existing data in the table before loading the new data
LOAD DATA LOCAL INPATH '/hivetest/person.txt' OVERWRITE INTO TABLE person;

-- create an external table: this requires the EXTERNAL keyword
CREATE EXTERNAL TABLE outer_table (
    id      INT,
    name    STRING,
    hobby   ARRAY<STRING>,
    address MAP<STRING, STRING>
)
ROW FORMAT DELIMITED                    -- fixed clause
FIELDS TERMINATED BY ','                -- field delimiter
COLLECTION ITEMS TERMINATED BY '-'      -- array-element delimiter
MAP KEYS TERMINATED BY ':'              -- map key/value delimiter
LOCATION '/outter/data';                -- must be an existing HDFS path

  2. The difference between internal and external tables

    Although only one keyword separates internal from external tables, their nature is completely different. An internal (managed) table's data lives inside Hive: once the table is dropped, the data cannot be recovered. When an external table is dropped, its data files remain at the HDFS path it was created over; if you later create an external table with matching field types over the same path, you do not need to import the data again, because the existing files are automatically associated with the new table. (If the files do not match the new table's field types, the table's contents will read back incorrectly.)
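The difference shows up when each kind of table is dropped; as a sketch (table names are illustrative):

```sql
-- managed (internal) table: DROP deletes both the metadata and the data
-- files under the warehouse directory; the data is gone for good
DROP TABLE inner_table;

-- external table: DROP deletes only the metadata; the files under the
-- table's LOCATION stay on HDFS, and creating an external table with the
-- same schema over the same path picks them up again without re-importing
DROP TABLE outer_table;
```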

    If data files are placed directly into an external table's location without going through Hive, the table's metadata in the Hive metastore will not know about them; running MSCK REPAIR TABLE table_name writes the metadata found on HDFS into the metastore.
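For example, for a partitioned external table whose partition directory was copied onto HDFS directly (the table name `logs` is hypothetical):

```sql
-- files placed on HDFS outside of Hive are invisible to queries until
-- the metastore is synchronized with what is actually on disk
MSCK REPAIR TABLE logs;   -- scans the table's location and registers
                          -- any partitions missing from the metastore
SHOW PARTITIONS logs;     -- the newly found partitions now appear
```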

  3. View tables

-- view the entire contents of a table
SELECT * FROM table_name;
-- view the table structure
DESC FORMATTED table_name;
-- list the tables in the current database
SHOW TABLES;
-- create a new table from a query result
CREATE TABLE new_table AS SELECT ...;
-- create a table with the same structure but no data
CREATE TABLE new_table LIKE existing_table;

 

 

 

   4. Delete a table

  DROP TABLE table_name;

 

  5. Modify a table

-- 1: rename a table
ALTER TABLE old_table_name RENAME TO new_table_name;
-- 2: modify a column's name and type
ALTER TABLE table_name CHANGE old_column new_column new_type;
-- 3: add new columns (several can be inserted at once)
ALTER TABLE table_name ADD COLUMNS (
    column_name_1 type_1,
    column_name_2 type_2
);
-- 4: a single column cannot be dropped in Hive, so REPLACE COLUMNS
-- is used to achieve the effect of deletion
ALTER TABLE person_info REPLACE COLUMNS (
    id      STRING,
    name    STRING,
    hobby   ARRAY<STRING>,
    address MAP<STRING, STRING>
);
-- 5: change the table's storage format, e.g. to SEQUENCEFILE
ALTER TABLE t1 SET FILEFORMAT SEQUENCEFILE;
-- 6: view a table's CREATE TABLE statement
SHOW CREATE TABLE table_name;
-- 7: set a table comment
ALTER TABLE person_info SET
  TBLPROPERTIES ("comment" = "person detail");
-- 8: modify a table's collection delimiter (note: Hive really does
-- spell this property 'colelction.delim')
ALTER TABLE person_info SET SERDEPROPERTIES ('colelction.delim' = '~');
-- 9: set the table's SerDe (serialization) class
CREATE TABLE t1 (id INT, name STRING, age INT);
ALTER TABLE t1 SET SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "id=(.*), name=(.*), age=(.*)");

   Note: when modifying a column's type, the new type must be compatible with the old one.
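For instance, widening an integer column is a compatible change, while narrowing or crossing type families generally is not (names are illustrative):

```sql
-- compatible: INT -> BIGINT is a widening conversion
ALTER TABLE person_info CHANGE id id BIGINT;

-- incompatible changes (e.g. a STRING column holding non-numeric text
-- changed to INT) either fail or leave existing values reading as NULL
```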

 

 


Origin www.cnblogs.com/wuxuewei/p/11469037.html