Hive Storage Formats in Production

A comparison of the same data stored as TextFile, SequenceFile, RCFile, ORC, and Parquet.

Original size: 19M

1. TextFile (the default): the file is 18.1M
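
The post does not show how the source table page_views was defined. A minimal sketch, assuming it has the same seven string columns as the tables created from it and is loaded from a tab-delimited file; the path /tmp/page_views.dat is hypothetical:

```sql
-- Assumed definition of the source table; the column list mirrors page_views_seq.
create table page_views(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS TEXTFILE;

-- Hypothetical local path; point it at the real tab-delimited data file.
load data local inpath '/tmp/page_views.dat' overwrite into table page_views;
```

STORED AS TEXTFILE could be left out here, since TextFile is what Hive uses when hive.default.fileformat is at its default value.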

2. SequenceFile

create table page_views_seq(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS SEQUENCEFILE;

-- Copy the rows from the TextFile source table.
insert into table page_views_seq select * from page_views;

Stored as SequenceFile, the file is 19.6M.

3. RcFile

create table page_views_rcfile(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS RCFILE;

insert into table page_views_rcfile select * from page_views;

Stored as RCFile, the file is 17.9M.

4. ORCFile

create table page_views_orc
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="NONE")
as select * from page_views;

Stored as ORC with compression disabled, the file is 7.7M.
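
Note that the statement above turns ORC compression off. As a hedged variation (not measured in this post), the same table can be created with ZLIB, which is ORC's default codec in Hive; the table name page_views_orc_zlib is made up for illustration:

```sql
-- Hypothetical table name; same data as page_views_orc, but ZLIB-compressed.
create table page_views_orc_zlib
STORED AS ORC
TBLPROPERTIES("orc.compress"="ZLIB")
as select * from page_views;
```

Comparing the size of this table with page_views_orc would separate the effect of the columnar layout from the effect of the codec.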

5. Parquet

create table page_views_parquet
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET
as select * from page_views;

Stored as Parquet, the file is 13.1M.

Summary: disk space usage compared

ORCFile (7.7M) < Parquet (13.1M) < RCFile (17.9M) < TextFile (18.1M) < SequenceFile (19.6M)
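
The post does not say how the file sizes were read. One way to check them, sketched here under the assumption that the tables live in the default warehouse directory (the path below is hypothetical; adjust it to your cluster):

```sql
-- Refresh the basic table statistics kept in the metastore.
analyze table page_views_orc compute statistics;

-- totalSize, in bytes, is listed under "Table Parameters" in the output.
describe formatted page_views_orc;

-- From the Hive CLI: show the on-disk size of the table directory.
dfs -du -h /user/hive/warehouse/page_views_orc;
```

Repeating these commands for each of the other tables gives the figures needed for a comparison like the one above.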

Reposted from blog.csdn.net/qq_34897849/article/details/102691366