Data organization in Hive

Section I: Database

      A Hive database plays the same role as a MySQL database: it allows finer-grained data management, with the data of different business modules kept in separate databases.

Section II: Tables

First, classified by management authority

1, the internal (managed) table

      An internal table is managed by Hive itself: Hive has full authority to add or delete the table's data (the raw data). When an internal table is dropped, the table's data (the corresponding HDFS directory) is deleted, and the metadata is deleted as well.

2, the external table

      With an external table, Hive acts like a user of data that lives in HDFS: the data is managed on the HDFS side, and Hive only has the right to use it. When an external table is dropped, the metadata is deleted (so the table no longer exists in Hive), but the table's data, that is, the corresponding data in HDFS, is not deleted.

3, summary

      The most essential difference between internal and external tables shows up when the table is dropped: an external table's data is not deleted, while an internal table's data is. In both cases the table's metadata is deleted.
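
      The drop behaviour summarized above can be sketched with a toy model. This is not Hive's API: the `metastore` and `hdfs` dictionaries, the function names, and the table names and paths are all hypothetical stand-ins chosen for illustration.

```python
# Toy model (not Hive's API): a metastore dict plus a fake HDFS dict.
metastore = {}   # table name -> {"location": str, "external": bool}
hdfs = {}        # directory path -> list of data file names


def create_table(name, location, external=False):
    """Register table metadata and make sure the data directory exists."""
    metastore[name] = {"location": location, "external": external}
    hdfs.setdefault(location, [])


def drop_table(name):
    """Always delete the metadata; delete the data only for internal tables."""
    meta = metastore.pop(name)
    if not meta["external"]:
        hdfs.pop(meta["location"], None)


# Hypothetical tables and paths, used only for this illustration.
create_table("orders_internal", "/user/hive/warehouse/db/orders_internal")
create_table("orders_external", "/data/orders", external=True)
hdfs["/user/hive/warehouse/db/orders_internal"].append("000000_0")
hdfs["/data/orders"].append("part-00000")

drop_table("orders_internal")
drop_table("orders_external")

print(metastore)  # metadata of both tables is gone
print(hdfs)       # only the external table's directory is left
```

      After both drops the metadata is empty, while the directory behind the external table still holds its file.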

Second, classified by function

1, the partitioned table

      Purpose of a partitioned table: reduce the range of data that has to be scanned, and so improve query performance.

      Partitioning basis: a table is partitioned by its partition field. The partition field is usually a field that queries frequently filter on.

      Storage of a partitioned table

      (1) How an ordinary table is stored

      hdfs: /user/hive/warehouse/database/table/data files (default path; it can be changed in the configuration file)

      (2) A partitioned table is stored in separate directories, one per partition value

      hdfs: /user/hive/warehouse/database/table/age=18/

      /user/hive/warehouse/database/table/age=19/

      /user/hive/warehouse/database/table/age=20/
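
      The directory layout above, and why it narrows the scan range, can be sketched as follows. This is a minimal illustration rather than Hive's implementation: `partition_dir` and `prune` are hypothetical helper names, and `database` and `table` are the placeholders used in the paths above.

```python
WAREHOUSE = "/user/hive/warehouse"  # default root, as given in the text


def partition_dir(database, table, field, value):
    """Directory that holds the rows whose partition field equals `value`."""
    return f"{WAREHOUSE}/{database}/{table}/{field}={value}"


def prune(partition_dirs, field, wanted):
    """Keep only the directories that a filter `field = wanted` has to scan."""
    suffix = f"/{field}={wanted}"
    return [d for d in partition_dirs if d.endswith(suffix)]


dirs = [partition_dir("database", "table", "age", v) for v in (18, 19, 20)]
to_scan = prune(dirs, "age", 19)
print(to_scan)  # one directory instead of three
```

      A query that filters on the partition field only needs to read the matching directory; the other partitions are never touched.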

2, the bucketed table

      Concept of a bucketed table: the two tables that will be joined in a query are each split according to the same rule. Each small file produced by the split is called a bucket. Both tables are bucketed, and the bucketing rule is: hash of the join key % number of buckets. As a result, rows with the same join key end up in the corresponding buckets of the two tables.

      Purpose of a bucketed table: (1) Improve join query performance. There is a restriction on the bucket counts of the two joined tables: they must be equal, or one must be a multiple of the other. (2) Improve sampling query performance. The data in one bucket can be treated as a sample of the whole table.
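
      The bucketing rule and the bucket-count restriction can be checked with a small sketch. It assumes non-negative integer join keys whose hash is simply the key itself; `bucket_of` and `fill_buckets` are hypothetical names, not Hive functions.

```python
def bucket_of(join_key, num_buckets):
    """Bucket index under the rule: hash of the join key % number of buckets.

    Assumption: keys are non-negative integers and the hash of an integer
    is the integer itself.
    """
    return join_key % num_buckets


def fill_buckets(keys, num_buckets):
    """Distribute keys into num_buckets lists, one list per bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_of(key, num_buckets)].append(key)
    return buckets


keys = list(range(12))
table_a = fill_buckets(keys, 4)  # 4 buckets
table_b = fill_buckets(keys, 8)  # 8 buckets, a multiple of 4

# Multiple relationship: every key in bucket j of table_b
# can only match keys in bucket j % 4 of table_a.
for j, bucket in enumerate(table_b):
    assert all(key in table_a[j % 4] for key in bucket)

sample = table_a[0]  # a single bucket used as a sample of the table
print(table_a)
print(sample)
```

      Because matching keys can only sit in corresponding buckets, a join needs to compare bucket against bucket rather than table against table, and reading a single bucket gives a sample without scanning everything.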

3, summary

      Both partitioned tables and bucketed tables are physically divided at storage time, into separate directories or separate files. If you need to improve both query performance and multi-table join performance, you can create buckets inside a partitioned table.

      Partitioning divides a table into directories and bucketing divides it into files; both are physical divisions.

Section III: View

      1, a view represents a query statement

      2, the Hive views described here are logical views, not materialized views (materialized views were only added in Hive 3.0): only the SQL statement that the view represents is saved. A materialized view, by contrast, stores the results returned by executing the view's SQL query.

      3, the statement that a view represents is actually executed only when the view is queried, for example: select * from view_name;

      4, the only purpose of a view is to improve the readability of SQL statements

      5, Hive views do not support insert, delete, or update
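
      The behaviour of a logical view can be tried out locally. The sketch below uses Python's built-in sqlite3 module as a stand-in, since SQLite views are also plain stored queries; it is not Hive, and the `users` table and `adults` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES ('a', 18), ('b', 25)")

# Only the SELECT statement is stored; no rows are copied into the view.
conn.execute("CREATE VIEW adults AS SELECT name, age FROM users WHERE age >= 20")

# A row added after the view was created still shows up,
# because the stored query runs when the view is queried.
conn.execute("INSERT INTO users VALUES ('c', 30)")
rows = conn.execute("SELECT * FROM adults ORDER BY name").fetchall()
print(rows)

# Writing through the view is rejected.
try:
    conn.execute("INSERT INTO adults VALUES ('d', 40)")
    insert_rejected = False
except sqlite3.OperationalError:
    insert_rejected = True
print(insert_rejected)
```

      The view returns the row inserted after its creation, which shows that nothing was materialized, and the attempted insert into the view fails.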

Origin www.cnblogs.com/zhangxiaofan/p/11037383.html