Difference between Hive internal tables and external tables

  The difference shows up mainly in two operations: loading data (LOAD) and dropping a table (DROP), that is, whether metadata and data are deleted together.

  When data is loaded into an internal (managed) table, Hive moves it into the table's directory under the data warehouse path, and Hive manages the life cycle of the data;
  when an external table is created, Hive only records the path where the data is located and does not move or modify the data.
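
  As a rough illustration of the two cases, the HiveQL sketch below creates one table of each kind. The table names, column names, and HDFS paths are hypothetical and only serve the example; the tab-delimited text layout is likewise an assumption.

```sql
-- Managed (internal) table: Hive picks the storage directory under its warehouse path.
CREATE TABLE page_views_managed (
  user_id BIGINT,
  url     STRING,
  view_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOAD DATA INPATH (without LOCAL) moves the HDFS file into the table's directory.
-- The staging path is a hypothetical example.
LOAD DATA INPATH '/staging/page_views/part-00000' INTO TABLE page_views_managed;

-- External table: Hive only records the LOCATION; the files stay where they are.
CREATE EXTERNAL TABLE page_views_external (
  user_id BIGINT,
  url     STRING,
  view_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/shared/page_views';
```

  After the LOAD statement, the source file no longer sits under the staging path, because it now lives in the managed table's directory. The directory named in LOCATION is left untouched.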
 
  When an internal table is dropped, its metadata and data are deleted together. When an external table is dropped, only the metadata is deleted and the data is left in place.
  As a result, external tables are safer against accidental data loss, allow more flexible data organization, and make it convenient to share source data. When creating an external table, you do not even need the external data to exist yet; you can defer creating the data until after the table is created. Some Hive operations are not suitable for external tables, however, such as creating a table and populating it in a single query statement (CREATE TABLE ... AS SELECT).
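
  The drop behavior can be sketched as follows. The two table names are hypothetical (one managed table, one external table), and the comments describe default behavior; on clusters with HDFS trash enabled, dropped managed data may be moved to the trash rather than removed immediately.

```sql
-- Check the table kind before dropping: the output contains a "Table Type" row
-- showing MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED page_views_external;

-- Managed table: metadata and the data files in the warehouse directory are removed.
DROP TABLE page_views_managed;

-- External table: only the metastore entry is removed; the files at LOCATION remain.
DROP TABLE page_views_external;
```

  Because the files of the external table survive the drop, the table can be recreated later over the same location without reloading anything.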

  Choosing between them: in everyday use, internal and external tables behave much the same. If all the data is processed by Hive, create an internal table; if the data is processed by Hive together with other tools, create an external table.

  Managed (internal) tables, however, are inconvenient for sharing data with other jobs. For example, suppose we have data created by Pig or another tool and used primarily by that tool, and we also want to run some Hive queries on it without giving Hive ownership of the data. We can create an external table that points to this data without Hive needing to own it.
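
  A minimal sketch of that scenario, assuming the other tool writes tab-delimited text files into an HDFS directory. The path, table name, and columns here are hypothetical.

```sql
-- Point Hive at files that another tool (for example a Pig job) produces and owns.
-- Hive registers the schema and location but does not move or take over the files.
CREATE EXTERNAL TABLE IF NOT EXISTS pig_events (
  event_id STRING,
  payload  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/pig/output/events';

-- Query the shared data from Hive; the producing tool keeps using the same files.
SELECT COUNT(*) FROM pig_events;
```

  Dropping pig_events later removes only Hive's view of the data, so the producing job is unaffected.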

