Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission. https://blog.csdn.net/kwu_ganymede/article/details/62434616
Presto / Spark / MapReduce: compute engine comparison
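The test table has 146 columns and 15,920,816 rows; the data is 15 GB before compression.
The figures below are the execution time of each statement, in seconds.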
TextFile format
SQL statement | presto (s) | spark (s) | mr (s) |
SELECT COUNT(*) FROM tmp.mb_crm1 | 5 | 9.264 | 21.711 |
SELECT sum(lately_land_btw) FROM tmp.mb_crm1; | 7 | 17.23 | 25.781 |
SELECT sum(cast(lately_land_btw as bigint)) num,mb_name FROM tmp.mb_crm1 where age>=25 group by mb_name order by num desc | 8 | 20.265 | 128.811 |
Parquet format
SQL statement | presto (s) | spark (s) | mr (s) |
SELECT COUNT(*) FROM tmp.mb_crm1 | 1 | 5.255 | 24.142 |
SELECT sum(lately_land_btw) FROM tmp.mb_crm1; | 1 | 3.181 | 42.893 |
SELECT sum(cast(lately_land_btw as bigint)) num,mb_name FROM tmp.mb_crm1 where age>=25 group by mb_name order by num desc | 3 | 11.486 | 66.903 |
Presto is clearly the fastest in every test, Spark comes second, and MR is the slowest.
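After switching to columnar storage (Parquet), Presto speeds up markedly, from 5 to 8 seconds per query down to 1 to 3 seconds. Spark is also faster on all three queries, while MR is slower on the first two and faster only on the GROUP BY query.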
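To put rough numbers on these conclusions, the timings from the two tables can be turned into ratios. The snippet below is a minimal sketch: the figures are copied from the tables, while the labels `q1`, `q2`, `q3` and the helper names `ratio_to_presto` and `parquet_speedup` are introduced here purely for illustration and do not appear in the original measurements.

```python
# Timings in seconds, copied from the TextFile and Parquet tables.
# q1 = COUNT(*), q2 = sum(lately_land_btw), q3 = filtered GROUP BY / ORDER BY.
# The q1..q3 labels are shorthand assumed for this sketch.
TIMINGS = {
    "TextFile": {
        "q1": {"presto": 5, "spark": 9.264, "mr": 21.711},
        "q2": {"presto": 7, "spark": 17.23, "mr": 25.781},
        "q3": {"presto": 8, "spark": 20.265, "mr": 128.811},
    },
    "Parquet": {
        "q1": {"presto": 1, "spark": 5.255, "mr": 24.142},
        "q2": {"presto": 1, "spark": 3.181, "mr": 42.893},
        "q3": {"presto": 3, "spark": 11.486, "mr": 66.903},
    },
}


def ratio_to_presto(fmt, query, engine):
    """Return how many times longer `engine` took than Presto for one query."""
    row = TIMINGS[fmt][query]
    return row[engine] / row["presto"]


def parquet_speedup(query, engine):
    """Return TextFile time / Parquet time; a value above 1 means Parquet was faster."""
    return TIMINGS["TextFile"][query][engine] / TIMINGS["Parquet"][query][engine]


if __name__ == "__main__":
    for query in ("q1", "q2", "q3"):
        for engine in ("presto", "spark", "mr"):
            speedup = parquet_speedup(query, engine)
            print(f"{query} {engine:6s} TextFile/Parquet = {speedup:.2f}x")
```

The Presto timings appear to be reported in whole seconds, so ratios involving them are coarse, and each figure looks like a single run; the output is best read as an order-of-magnitude comparison rather than a precise speedup.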