Hive storage formats

  • The storage formats Hive mainly supports are TEXTFILE, SEQUENCEFILE, ORC, and PARQUET.

  • TEXTFILE and SEQUENCEFILE are row-oriented storage formats; ORC and PARQUET are columnar storage formats.

    • Row storage characteristics: when a query needs an entire row that satisfies a condition, a column store has to look up each field's value in that field's separately stored column, whereas a row store only needs to locate one value, because the rest of that row's values are stored right next to it. Row storage is therefore faster for this kind of query.

    • Columnar storage characteristics: because the values of each field are stored together, a query that needs only a few fields can greatly reduce the amount of data read. And since all values in a column share the same data type, columnar storage can use compression algorithms designed specifically for that data.

  • TEXTFILE: the default format. Data is stored uncompressed, so disk usage is high, and so is the cost of parsing the data during analysis.

  • PARQUET: files are stored in binary form and are not directly human-readable. Each file contains both the data and the file's metadata, so the Parquet format is self-describing (a file can be parsed using only what it contains).
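To make the row-versus-column trade-off concrete, here is a minimal Python sketch that lays out the same small table both ways and counts how many values each layout has to touch. The table, column names, and function names are all hypothetical, and the "values read" counts are a simplified cost model; real ORC and Parquet files split rows into stripes or row groups and store columns inside those, which this toy model ignores.

```python
# Toy model of row vs. columnar layout. The table and all names below are
# hypothetical; real ORC/Parquet files are far more elaborate.
COLUMNS = ("id", "name", "city", "amount")
ROWS = [
    (1, "alice", "beijing", 30),
    (2, "bob", "shanghai", 45),
    (3, "carol", "beijing", 12),
]


def to_row_layout(rows):
    """Row store: all fields of one record are adjacent."""
    return [value for row in rows for value in row]


def to_column_layout(rows):
    """Column store: all values of one field are adjacent."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def fetch_row_rowstore(flat, row_index):
    # The whole record is one contiguous slice.
    width = len(COLUMNS)
    start = row_index * width
    return tuple(flat[start:start + width])


def fetch_row_colstore(cols, row_index):
    # One lookup per column to stitch the record back together.
    return tuple(cols[name][row_index] for name in COLUMNS)


def sum_column_rowstore(flat, col):
    # Whole rows are read from storage, so the read cost is modeled as
    # every value in the table, even though only one field is summed.
    idx, width = COLUMNS.index(col), len(COLUMNS)
    total = sum(flat[i] for i in range(idx, len(flat), width))
    return total, len(flat)  # (result, values read)


def sum_column_colstore(cols, col):
    # Only the requested column is read.
    values = cols[col]
    return sum(values), len(values)  # (result, values read)


if __name__ == "__main__":
    flat, cols = to_row_layout(ROWS), to_column_layout(ROWS)
    print(fetch_row_rowstore(flat, 1))          # (2, 'bob', 'shanghai', 45)
    print(fetch_row_colstore(cols, 1))          # (2, 'bob', 'shanghai', 45)
    print(sum_column_rowstore(flat, "amount"))  # (87, 12)
    print(sum_column_colstore(cols, "amount"))  # (87, 3)
```

Fetching one full record is a single contiguous slice in the row layout but four separate lookups in the column layout, while summing one column reads 3 values instead of 12.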
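Why same-typed columns compress better can be sketched with run-length encoding (RLE), one simple scheme that collapses repeated adjacent values. ORC and Parquet do use RLE-style and dictionary encodings, but the functions below are a hypothetical toy, not either format's actual on-disk encoding, and the sample table is made up.

```python
# Toy run-length encoding (RLE). ORC and Parquet use RLE-style and dictionary
# encodings, but this is NOT their actual on-disk format.
def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [v for v, n in runs for _ in range(n)]


# Hypothetical table of (id, city) rows, ordered by city.
rows = [(i, "beijing" if i < 4 else "shanghai") for i in range(7)]
city_column = [city for _, city in rows]       # column layout: one field only
row_layout = [v for row in rows for v in row]  # row layout: fields interleaved

print(run_length_encode(city_column))      # [('beijing', 4), ('shanghai', 3)]
print(len(run_length_encode(row_layout)))  # 14 (no adjacent repeats to collapse)
```

In the column layout the seven city values shrink to two runs; in the row layout the same values are interleaved with ids, so nothing collapses.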
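The parsing and disk overhead of TEXTFILE can be illustrated with a small Python sketch. It relies on one Hive fact, that the default field delimiter for TEXTFILE tables is `\x01` (Ctrl-A); the schema, sample lines, and function names are hypothetical. Every line has to be split and every field converted from its string form even when a query needs a single column, and numbers stored as text usually take more bytes than a fixed-width binary encoding.

```python
import struct

# Hive's default field delimiter for TEXTFILE tables is \x01 (Ctrl-A).
FIELD_DELIM = "\x01"

# Hypothetical schema, for illustration only.
SCHEMA = [("id", int), ("name", str), ("amount", int)]


def parse_text_line(line):
    """Split one text line and convert every field from its string form."""
    raw = line.rstrip("\n").split(FIELD_DELIM)
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, raw)}


def sum_amount(lines):
    # Each line is split and fully converted even though only "amount" is used.
    return sum(parse_text_line(line)["amount"] for line in lines)


def text_vs_binary_size(n):
    """Bytes for n as ASCII text vs. as a little-endian 32-bit integer."""
    return len(str(n).encode("ascii")), len(struct.pack("<i", n))


lines = ["1\x01alice\x0130\n", "2\x01bob\x0145\n"]
print(sum_amount(lines))                # 75
print(text_vs_binary_size(1234567890))  # (10, 4)
```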
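Parquet's self-describing layout is what lets a reader find the metadata on its own. Per the Parquet format specification, a file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the little-endian length of the file metadata block that precedes them. The minimal sketch below only locates that block; decoding it (it is Thrift-encoded) is left to a real library such as pyarrow. The function name is hypothetical and the input bytes are synthetic, not a real file.

```python
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes.


def read_parquet_footer(data: bytes) -> bytes:
    """Return the raw, still Thrift-encoded file metadata of a Parquet file.

    Assumed tail layout: <metadata><4-byte little-endian length>PAR1
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic")
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - meta_len
    if start < 4:
        raise ValueError("corrupt metadata length")
    return data[start:-8]


# Synthetic bytes standing in for a real file (hypothetical content).
fake = MAGIC + b"column-data" + b"META" + struct.pack("<I", 4) + MAGIC
print(read_parquet_footer(fake))  # b'META'
```

Reading from the end of the file means the reader can learn the schema and where each column chunk lives before touching any data, which is also what allows it to skip columns a query does not need.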


Origin www.cnblogs.com/xiangyuguan/p/11410826.html