1. Create a database
1) Create a database. The default storage path for databases on HDFS is /user/hive/warehouse/*.db.
hive > create database db_hive;
2) To avoid an error when the database to be created already exists, add an if not exists check (the standard form).
hive > create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive > create database if not exists db_hive;
3) Create a database and specify its storage location on HDFS.
hive > create database db_hive2 location '/db_hive2.db';
2. Query databases
2.1 Show databases
1. Display Database
hive> show databases;
2. Filter the databases shown by the query
hive> show databases like 'db_hive*';
OK
db_hive
db_hive_1
2.2 View database details
1. Display database information
hive> desc database db_hive;
OK
db_hive hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db atguigu USER
2. Display database details, extended
hive> desc database extended db_hive;
OK
db_hive hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db atguigu USER
2.3 Switch the current database
hive > use db_hive;
3. Modify a database
Users can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, describing properties of that database. No other database metadata can be changed, including the database name and the directory location of the database.
hive (default)> alter database db_hive set dbproperties('createtime'='20170830');
View the result of the change in hive:
hive> desc database extended db_hive;
db_name comment location owner_name owner_type parameters
db_hive hdfs://master:8020/user/hive/warehouse/db_hive.db root USER {createtime=20170830}
4. Delete a database
1. Delete empty database
hive>drop database db_hive2;
2. If the database to be deleted may not exist, it is best to use if exists to check whether it exists:
hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive2;
3. If the database is not empty, use the cascade keyword to force deletion:
hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;
5. Create a table
1. Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
2. Field explanations
(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option can be used to ignore this exception.
(2) The EXTERNAL keyword lets the user create an external table, specifying a path (LOCATION) that points to the actual data while the table is created. When Hive creates an internal table, it moves the data into the data warehouse path; when creating an external table, it only records the path where the data resides and makes no change to the data's location. When a table is dropped, an internal table's metadata and data are deleted together, whereas for an external table only the metadata is deleted and the data is kept.
(3) COMMENT: add comments to tables and columns.
(4) PARTITIONED BY creates a partitioned table.
(5) CLUSTERED BY creates a bucketed table.
(6) SORTED BY is rarely used.
(7)ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
When creating a table, users may define a custom SerDe or use the built-in SerDe. If neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When creating a table, the user also needs to specify the table's columns; while specifying the columns, the user may also specify a custom SerDe, and Hive uses the SerDe to determine the concrete data of the table's columns.
SerDe is short for Serializer/Deserializer; it handles serialization and deserialization.
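As an illustrative sketch (the table name and property values here are hypothetical, but OpenCSVSerde ships with Hive), a custom SerDe can be specified with ROW FORMAT SERDE instead of ROW FORMAT DELIMITED:

```sql
-- Hypothetical example: parse CSV files with Hive's bundled OpenCSVSerde
-- rather than the default delimited-text SerDe.
-- Note: OpenCSVSerde reads every column as string.
create table if not exists csv_demo (
  id string,
  name string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
stored as textfile;
```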
(8) STORED AS designated storage file type
Common storage file formats: SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), RCFILE (columnar storage format)
If the data file is plain text, you can use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
(9) LOCATION: specifies the HDFS location where the table data is stored.
(10) LIKE allows the user to copy an existing table's structure without copying its data.
5.1 Managed tables
1. Theory
The tables created by default are so-called managed tables, sometimes also called internal tables. For these tables, Hive (more or less) controls the life cycle of the data. By default, Hive stores the data of these tables in a subdirectory of the path defined by the configuration property hive.metastore.warehouse.dir (e.g., /user/hive/warehouse). When we delete a managed table, Hive also deletes the data in the table. Managed tables are not well suited for sharing data with other tools.
2. Practical examples
(1) Create a general table
create table if not exists student2(
id int, name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student2';
(2) Create a table from query results (the query results are inserted into the newly created table)
create table if not exists student3 as select id, name from student;
(3) Create a table based on an existing table structure
create table if not exists student4 like student;
(4) Check the table type
hive > desc formatted student2;
Table Type: MANAGED_TABLE
5.2 External tables
1. Theory
Because a table is external, Hive does not treat itself as the sole owner of the data. Dropping the table does not delete the data, but the metadata describing the table is deleted.
2. Usage scenarios for managed and external tables
Website logs collected every day regularly flow into HDFS as text files. Large amounts of statistical analysis are done on top of external tables (the raw log tables); intermediate tables and result tables are stored as internal tables, and data enters the internal tables via SELECT + INSERT.
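This workflow could be sketched as follows (all table names and the HDFS path are hypothetical):

```sql
-- Hypothetical sketch: raw logs land in an HDFS directory and are exposed
-- through an external table; aggregated results go into an internal table.
create external table if not exists ods_web_log (
  line string
)
location '/warehouse/web_log';

create table if not exists log_count (
  dt string,
  cnt bigint
);

-- SELECT + INSERT: statistics computed over the external raw-log table
-- are written into the internal result table.
insert into table log_count
select '2017-07-02', count(*) from ods_web_log;
```

Dropping ods_web_log later would remove only its metadata, leaving the raw logs in /warehouse/web_log untouched.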
3. Practical examples
Create external tables for departments and employees, and import data into the tables.
(1) Raw data
dept.txt
10 ACCOUNTING 1700
20 RESEARCH 1800
30 SALES 1900
40 OPERATIONS 1700
emp.txt
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
(2) Table creation statements
Create the department table
create external table if not exists default.dept(
deptno int,
dname string,
loc int
)
row format delimited fields terminated by '\t';
Create the employee table
create external table if not exists default.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t';
(3) View the created tables
hive > show tables;
OK
tab_name
dept
emp
(4) Import data into the external tables
Import data
hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept;
hive > load data local inpath '/opt/module/datas/emp.txt' into table default.emp;
Query the results
hive > select * from emp;
hive > select * from dept;
(5) View formatted table information
hive > desc formatted dept;
Table Type: EXTERNAL_TABLE
5.3 Converting between managed and external tables
Only single quotes may be used and the value is strictly case-sensitive; if it does not match exactly, the property is merely added as an ordinary key-value pair and the conversion does not take effect.
(1) Check the table type
hive > desc formatted student2;
Table Type: MANAGED_TABLE
(2) Change the internal table student2 to an external table
alter table student2 set tblproperties('EXTERNAL'='TRUE');
(3) Check the table type
hive > desc formatted student2;
Table Type: EXTERNAL_TABLE
(4) Change the external table student2 back to an internal table
alter table student2 set tblproperties('EXTERNAL'='FALSE');
(5) Check the table type
hive > desc formatted student2;
Table Type: MANAGED_TABLE
Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed spellings and are case-sensitive!
6. Partitioned tables
A partitioned table actually corresponds to a separate folder on the HDFS file system; that folder contains all the data files of the partition. A partition in Hive is a subdirectory, which splits a large data set into smaller data sets according to business needs. When a query selects the required partition via an expression in the WHERE clause, query efficiency improves considerably.
6.1 Basic partitioned table operations
1. Introducing partitioned tables (logs need to be managed by date)
/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log
2. Partitioned table creation syntax
hive > create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';
3. Load data into the partitioned table
hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');
hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');
hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');
4. Query data in the partitioned table
Single partition query
hive > select * from dept_partition where month='201709';
Multi-partition combined query (union)
hive > select * from dept_partition where month='201709'
union
select * from dept_partition where month='201708'
union
select * from dept_partition where month='201707';
_u3.deptno _u3.dname _u3.loc _u3.month
10 ACCOUNTING NEW YORK 201707
10 ACCOUNTING NEW YORK 201708
10 ACCOUNTING NEW YORK 201709
20 RESEARCH DALLAS 201707
20 RESEARCH DALLAS 201708
20 RESEARCH DALLAS 201709
30 SALES CHICAGO 201707
30 SALES CHICAGO 201708
30 SALES CHICAGO 201709
40 OPERATIONS BOSTON 201707
40 OPERATIONS BOSTON 201708
40 OPERATIONS BOSTON 201709
5. Add partitions
Create a single partition
hive > alter table dept_partition add partition(month='201706') ;
Create multiple partitions (separated by spaces)
hive > alter table dept_partition add partition(month='201705') partition(month='201704');
6. Delete partitions
Delete a single partition
hive > alter table dept_partition drop partition (month='201704');
Delete multiple partitions (separated by commas)
hive > alter table dept_partition drop partition (month='201705'), partition (month='201706');
7. View how many partitions the partitioned table has
hive > show partitions dept_partition;
8. View the partitioned table structure
hive > desc formatted dept_partition;
# Partition Information
# col_name data_type comment
month string
6.2 Partitioned table notes
1. Create a two-level partitioned table
create table dept_partition2(
deptno int, dname string, loc string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';
2. Load data normally
(1) Load data into the two-level partitioned table
hive > load data local inpath '/opt/module/datas/dept.txt' into table
default.dept_partition2 partition(month='201709', day='13');
(2) Query the partitioned data
hive > select * from dept_partition2 where month='201709' and day='13';
3. Three ways to associate data uploaded directly to a partition directory with the partitioned table
(1) Method one: upload the data, then repair
Upload the data
hive > dfs -mkdir -p
/user/hive/warehouse/dept_partition2/month=201709/day=12;
hive > dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;
Query the data (the data just uploaded cannot be queried yet)
hive > select * from dept_partition2 where month='201709' and day='12';
Run the repair command
hive > msck repair table dept_partition2;
Query the data again
hive > select * from dept_partition2 where month='201709' and day='12';
(2) Method two: upload the data, then add the partition
Upload the data
hive > dfs -mkdir -p
/user/hive/warehouse/dept_partition2/month=201709/day=11;
hive > dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
Execute the add-partition statement
hive > alter table dept_partition2 add partition(month='201709',
day='11');
Query data
hive > select * from dept_partition2 where month='201709' and day='11';
(3) Method three: create the directory, then load the data into the partition
Create a directory
hive > dfs -mkdir -p
/user/hive/warehouse/dept_partition2/month=201709/day=10;
Load the data
hive > load data local inpath '/opt/module/datas/dept.txt' into table
dept_partition2 partition(month='201709',day='10');
Query data
hive > select * from dept_partition2 where month='201709' and day='10';
7. Modify a table
7.1 Rename a table
1. Syntax
ALTER TABLE table_name RENAME TO new_table_name
2. Practical example
hive > alter table dept_partition2 rename to dept_partition3;
7.2 Add, modify, and delete partitions
See the basic partitioned table operations in section 6.1 above.
7.3 Add/modify/replace column information
1. Syntax
Update a column
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Add or replace columns
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Note: ADD adds new fields positioned after all existing columns (but before the partition columns), while REPLACE replaces all fields in the table.
2. Practical example
(1) Check the table structure
hive> desc dept_partition;
(2) Add a column
hive > alter table dept_partition add columns(deptdesc string);
(3) Check the table structure
hive> desc dept_partition;
(4) Update a column
hive > alter table dept_partition change column deptdesc desc int;
(5) Check the table structure
hive> desc dept_partition;
(6) Replace columns
hive > alter table dept_partition replace columns(deptno string, dname
string, loc string);
(7) Check the table structure
hive> desc dept_partition;
8. Delete a table
hive > drop table dept_partition;