Hive: deleting part of a table's data / deleting data in a partition

Deleting part of a table's data is a common operation in Hive, typically to clear out data that is no longer needed or to prepare for an update. Hive offers several ways to do this, and this article introduces the commonly used ones.

1. Deleting data in Hive

1.1. Delete the entire table

The easiest way is to delete the entire table, which will delete all data in the table. You can use the DROP TABLE statement to accomplish this operation. Here's an example:

DROP TABLE  my_table;

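This drops the table named my_table. For a managed (internal) table, both the metadata and the data files are removed. For an external table, only the metadata is removed and the underlying files stay in place.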
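
If the table might not exist, or the data should bypass the trash directory, DROP TABLE also accepts IF EXISTS and PURGE. A short sketch reusing the my_table name (use whichever form fits):

-- Do not fail if the table is missing
DROP TABLE IF EXISTS my_table;

-- Skip the trash directory; the data cannot be restored afterwards
DROP TABLE IF EXISTS my_table PURGE;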

1.2. Delete specific rows in the table

If you only need to delete some of the rows in a table, you can use the DELETE statement, which removes the rows that match a condition. In Apache Hive, DELETE works only on transactional (ACID) tables. Here's an example:

DELETE FROM my_table WHERE condition;

Here, my_table is the table to delete from, and condition is an expression that selects the rows to delete. For example, to delete all rows in my_table with an age greater than 30, you can use the following statement:

DELETE FROM my_table WHERE age > 30;

This will delete all rows with age greater than 30.
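
For reference, a table that accepts DELETE in Apache Hive is typically an ORC table marked as transactional. The sketch below shows one way such a table might be declared, followed by a rewrite-based alternative for tables that are not transactional. The columns name and age are hypothetical, the session settings depend on the cluster configuration, and Hive releases before 3.0 also required the table to be bucketed.

-- Hypothetical columns; DELETE needs a transactional table
CREATE TABLE my_table (
  name STRING,
  age  INT
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Session settings commonly needed for ACID operations
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

DELETE FROM my_table WHERE age > 30;

-- Alternative for a non-transactional table: rewrite it with only the rows to keep
INSERT OVERWRITE TABLE my_table
SELECT name, age FROM my_table WHERE age <= 30 OR age IS NULL;

The alternative keeps rows whose age is NULL, which matches DELETE: a NULL age does not satisfy age > 30, so DELETE would not remove those rows either.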

1.3. Delete specific partitions in the table

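If the table is partitioned, you can remove a whole partition at once. Partitions are defined by the values of the partition columns, so one partition can be removed without affecting the others. The standard Apache Hive form is ALTER TABLE ... DROP PARTITION. The DELETE ... PARTITION form is not part of the DELETE syntax documented for Apache Hive and is accepted only by some Hive-compatible distributions. Here are both forms: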

DELETE FROM my_table PARTITION (partition_column = partition_value);
or
ALTER TABLE my_table DROP PARTITION (partition_column = partition_value);

Here, my_table is the table to delete from, partition_column is the name of the partition column, and partition_value is the value identifying the partition to delete. For example, to delete the partition of my_table whose date column is '2022-01-01', you can use the following statement (date is a reserved word in recent Hive versions, so the column name may need to be quoted with backticks):

DELETE FROM my_table PARTITION (date = '2022-01-01');
or
ALTER TABLE my_table DROP PARTITION (date = '2022-01-01');

This removes the partition whose date value is '2022-01-01', together with all the rows in it.
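
Before dropping, it can help to list the existing partitions, and IF EXISTS avoids an error when the partition is absent. The sketch below assumes a hypothetical string partition column named dt, used here instead of date to sidestep the reserved word:

-- List the partitions that currently exist
SHOW PARTITIONS my_table;

-- Drop one partition, without failing if it is already gone
ALTER TABLE my_table DROP IF EXISTS PARTITION (dt = '2022-01-01');

-- A comparison operator can drop a range of partitions in one statement
ALTER TABLE my_table DROP IF EXISTS PARTITION (dt < '2022-01-01');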

1.4. Delete some data in the partition

If you only want to delete certain rows inside a partition, you can use the following command:

DELETE FROM my_table PARTITION (partition_column = partition_value) WHERE condition;

For example, to delete the male rows in the partition for the year 2020:

-- '男' is the value stored for "male"
DELETE FROM my_table PARTITION (year = '2020') WHERE sex = '男';
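
Where DELETE does not accept a PARTITION clause, or the table is not transactional, a similar result can be obtained by overwriting the partition with only the rows to keep. A sketch, assuming the table's non-partition columns are the hypothetical name and sex:

-- Rewrite the 2020 partition, keeping every row that is not '男'
INSERT OVERWRITE TABLE my_table PARTITION (year = '2020')
SELECT name, sex
FROM my_table
WHERE year = '2020'
  AND (sex <> '男' OR sex IS NULL);

The SELECT list has to contain every non-partition column in table order, because the statement rewrites the whole partition. The IS NULL branch keeps rows with no sex value, which a DELETE on sex = '男' would also leave untouched.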

1.5. Clear all data in the table

If you only need to delete all the data in the table without deleting the table itself, you can use the TRUNCATE statement. The TRUNCATE statement is used to delete all rows in a table but retain the table's metadata. Here's an example:

TRUNCATE TABLE my_table;
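
TRUNCATE can also be limited to a single partition, which removes that partition's rows while the partition itself stays registered. A sketch reusing the year partition from the earlier example:

-- Empty only the 2020 partition
TRUNCATE TABLE my_table PARTITION (year = '2020');

In Apache Hive, TRUNCATE is intended for managed tables. External tables are rejected by default.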

2. Additional notes

2.1. Error: dynamic partition on Crud si not disabled, please set hive.crud.dynamic.partition=true to enable it

Using DELETE with the partition column in the WHERE clause to remove data from a partition can fail with this error. The SQL and the error message are as follows:

# The user table is partitioned by the year field; delete the male rows ('男') in the 2020 partition
SQL: delete from user where year = '2020' and sex = '男'

Error message: dynamic partition on Crud si not disabled, please set hive.crud.dynamic.partition=true to enable it

Solution:
Move the partition column out of the WHERE clause and into a PARTITION clause placed before WHERE. The syntax is as follows:

delete from user partition(year = '2020') where sex = '男'
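
The error message itself names a second option: enabling hive.crud.dynamic.partition for the session, after which the original statement with the partition column in WHERE may be accepted. This is taken only from the wording of the message and has not been verified here. The parameter appears to be specific to the Hive distribution that raised the error.

-- Assumption: enable the setting named in the error message, then retry the original statement
set hive.crud.dynamic.partition=true;
delete from user where year = '2020' and sex = '男';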

Origin blog.csdn.net/weixin_49114503/article/details/134461085