-
Hive mainly supports the following storage formats: TEXTFILE, SEQUENCEFILE, ORC, and PARQUET
-
TEXTFILE and SEQUENCEFILE are row-based storage formats; ORC and PARQUET are columnar storage formats
-
Row storage features: when a query needs an entire row that satisfies a condition, columnar storage has to look in each column's grouped field values to find that row's value for every column, whereas row storage only needs to locate one of the values, because the remaining values are stored right next to it. Row storage is therefore faster for this kind of query
-
Column storage features: because the data for each field is stored together, a query that needs only a few fields reads far less data; and because every value of a field has the same data type, columnar storage can use compression algorithms designed specifically for that type
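The difference between the two layouts can be illustrated with a small in-memory model. This is only a sketch: the table, the field names, and the "touched" counter are made up for illustration, and the counter models how many stored values are scanned, not real disk I/O in Hive.

```python
# Toy table with three fields per record (hypothetical data).
FIELDS = ("id", "name", "age")
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 35),
]

# Row-based layout: the values of one record sit next to each other.
row_store = [value for row in rows for value in row]

# Columnar layout: the values of one field sit next to each other.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(FIELDS)
}


def fetch_row_from_row_store(index):
    """Find the start of the record; the rest of it is adjacent."""
    width = len(FIELDS)
    return tuple(row_store[index * width:(index + 1) * width])


def fetch_row_from_column_store(index):
    """One lookup per column is needed to reassemble the record."""
    return tuple(column_store[name][index] for name in FIELDS)


def ages_from_row_store():
    """Reading one field still scans every stored value in this model."""
    width = len(FIELDS)
    touched = len(row_store)
    return row_store[2::width], touched


def ages_from_column_store():
    """Reading one field scans only that field's values."""
    values = column_store["age"]
    return values, len(values)
```

In this model, fetching the full record at index 1 takes one contiguous slice from the row layout but three separate lookups from the columnar layout, while reading only `age` scans 9 values in the row layout and 3 in the columnar one.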
-
-
TextFile format: the default format. Data is not compressed, so disk overhead is large, and parsing the data is also expensive
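Both costs can be seen in a rough sketch that mimics a delimited text file. The records are made-up sample data, `zlib` is only a stand-in for whatever codec a compressed format would use, and the `\x01` delimiter is assumed here because it is Hive's default field separator for text tables.

```python
import zlib

# Assumed field delimiter (Ctrl-A), as used by default in Hive text tables.
DELIM = "\x01"

# Hypothetical sample records: (id, name, age).
records = [(i, f"user{i % 10}", 20 + i % 5) for i in range(1000)]

# Every value is written out as plain text, one record per line.
text = "\n".join(DELIM.join(str(v) for v in rec) for rec in records)
raw = text.encode("utf-8")

# The same bytes after general-purpose compression, for comparison.
compressed = zlib.compress(raw)


def parse(line):
    """Each line must be split, and each field converted back from text."""
    id_, name, age = line.split(DELIM)
    return int(id_), name, int(age)


# Reading the data back means parsing every field of every line.
parsed = [parse(line) for line in text.split("\n")]

print(len(raw), len(compressed))
```

The uncompressed text is noticeably larger than its compressed form, and every read pays the cost of splitting lines and converting strings back into typed values.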