Hive Interview Question 4: Explain Sort By, Order By, Cluster By, and Distribute By in Hive - 代码天地

Category: Other, posted 2020-04-04 20:35:03

Hive's four sorting clauses come up often in interviews. Here is my own summary of how to answer:

1. ORDER BY performs a global sort over all of its input. To guarantee the global order it uses only one reducer, so it can take a long time when the input is large.
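2. SORT BY is not a global sort: it sorts the rows before they are fed to each reducer. A query using SORT BY therefore guarantees only that each reducer's output is sorted, not that the overall output is.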
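3. DISTRIBUTE BY (column) assigns rows to reducers based on the specified column using hash partitioning, so rows with the same value always go to the same reducer. It is often combined with SORT BY, and Hive requires the DISTRIBUTE BY clause to appear before SORT BY.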
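4. CLUSTER BY (column) does everything DISTRIBUTE BY does (it distributes rows to different reducers) and additionally sorts by that column. However, the sort is always ascending; you cannot specify ASC or DESC.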
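5. Therefore:
When the data volume is large, avoid ORDER BY and use DISTRIBUTE BY + SORT BY instead, accepting that the result is sorted only within each reducer.
When DISTRIBUTE BY and SORT BY use the same column, CLUSTER BY is equivalent to DISTRIBUTE BY + SORT BY (in ascending order).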
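To make the four clauses concrete, here is a toy Python model of which reducer each row reaches and how rows are ordered there. It is a sketch for intuition only, not Hive code: reducers are modeled as plain lists, `partition` is a hypothetical stand-in for Hive's hash function (CRC-32 modulo the reducer count, chosen only because it is deterministic), and `round_robin` is an arbitrary stand-in for row placement when no DISTRIBUTE BY is given. All function names and the sample `(dept_id, name)` rows are hypothetical.

```python
"""Toy model of Hive's ORDER BY, SORT BY, DISTRIBUTE BY and CLUSTER BY.

Reducers are modeled as Python lists; this illustrates the semantics only
and is not how Hive is implemented.
"""
import zlib
from operator import itemgetter
from typing import Callable, List, Tuple

Row = Tuple[int, str]          # hypothetical sample row: (dept_id, name)
KeyFn = Callable[[Row], int]   # extracts the column named in the BY clause
Reducers = List[List[Row]]     # one list of rows per reducer


def partition(value: int, num_reducers: int) -> int:
    # Hypothetical stand-in for Hive's hash partitioner: CRC-32 of the value's
    # text modulo the reducer count. Hive's real hash differs; what matters is
    # that equal values always map to the same reducer.
    return zlib.crc32(str(value).encode("utf-8")) % num_reducers


def order_by(rows: List[Row], key: KeyFn, descending: bool = False) -> Reducers:
    # ORDER BY: every row goes to a single reducer, which sorts them all.
    # The output is globally ordered, but one reducer does all the work.
    return [sorted(rows, key=key, reverse=descending)]


def round_robin(rows: List[Row], num_reducers: int) -> Reducers:
    # Arbitrary stand-in for row placement without DISTRIBUTE BY:
    # the query does not control which reducer a row reaches.
    reducers: Reducers = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        reducers[i % num_reducers].append(row)
    return reducers


def sort_by(reducers: Reducers, key: KeyFn, descending: bool = False) -> Reducers:
    # SORT BY: each reducer sorts only its own rows; no order across reducers.
    return [sorted(part, key=key, reverse=descending) for part in reducers]


def distribute_by(rows: List[Row], key: KeyFn, num_reducers: int) -> Reducers:
    # DISTRIBUTE BY: the hash of the key picks the reducer, so all rows with
    # the same key end up together. This clause alone does not sort them.
    reducers: Reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[partition(key(row), num_reducers)].append(row)
    return reducers


def cluster_by(rows: List[Row], key: KeyFn, num_reducers: int) -> Reducers:
    # CLUSTER BY col behaves like DISTRIBUTE BY col SORT BY col, ascending
    # only (hence no `descending` parameter).
    return sort_by(distribute_by(rows, key, num_reducers), key)


if __name__ == "__main__":
    rows = [(3, "c"), (1, "a"), (2, "b"), (1, "d"), (3, "e"), (2, "f")]
    dept = itemgetter(0)
    print(order_by(rows, dept))
    # → [[(1, 'a'), (1, 'd'), (2, 'b'), (2, 'f'), (3, 'c'), (3, 'e')]]
    print(sort_by(round_robin(rows, 2), dept))
    # → [[(2, 'b'), (3, 'c'), (3, 'e')], [(1, 'a'), (1, 'd'), (2, 'f')]]
    print(cluster_by(rows, dept, 2))
```

The SORT BY output shows why it is not a global sort: each reducer's list is sorted, yet concatenating them puts dept 3 before dept 1. Replacing `round_robin` with `distribute_by` gives the DISTRIBUTE BY + SORT BY pattern, which keeps each key's rows on one reducer, and `cluster_by` is exactly that composition on the same key, ascending.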

Sql Boy

Reposted from blog.csdn.net/u012955829/article/details/102847736