[Spark SQL Basics] -- Basic Syntax: select [hints ...]

Background

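Today I happened to see a former classmate use the MAPJOIN hint to broadcast a small table as a join optimization. That prompted me to dig into the hints part of the SELECT syntax. I'm sharing what I found for your reference; corrections are welcome!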

Contents

  • Basic select syntax
  • Origin of hints
  • Hint syntax and options
  • Using hints in combination

Content

1 Basic structure of the select syntax

SELECT [hints, ...] [ALL|DISTINCT] named_expression[, named_expression, ...]
  FROM relation[, relation, ...]
  [lateral_view[, lateral_view, ...]]
  [WHERE boolean_expression]
  [aggregation [HAVING boolean_expression]]
  [ORDER BY sort_expressions]
  [CLUSTER BY expressions]
  [DISTRIBUTE BY expressions]
  [SORT BY sort_expressions]
  [WINDOW named_window[, WINDOW named_window, ...]]
  [LIMIT num_rows]

named_expression:
  : expression [AS alias]

relation:
  : join_relation
  | (table_name|query|relation) [sample] [AS alias]
  | VALUES (expressions)[, (expressions), ...]
        [AS (column_name[, column_name, ...])]

expressions:
  : expression[, expression, ...]

sort_expressions:
  : expression [ASC|DESC][, expression [ASC|DESC], ...]
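To make the position of [hints, ...] in the grammar above concrete, here is a small Python sketch. The two helpers are hypothetical and written only for this article; they are not Spark's parser (Spark parses SQL with an ANTLR grammar). They only recognize a /*+ ... */ comment placed directly after SELECT and split its content into hint names and arguments.

```python
import re

# Hypothetical helpers for illustration only; Spark itself parses SQL with an
# ANTLR grammar. This sketch only looks for a /*+ ... */ comment placed
# directly after the SELECT keyword, as in "SELECT [hints, ...] ...".
_SELECT_WITH_HINTS = re.compile(
    r"^\s*SELECT\b\s*(?:/\*\+(?P<hints>.*?)\*/\s*)?(?P<rest>.*)$",
    re.IGNORECASE | re.DOTALL,
)
# One hint: a name optionally followed by a parenthesized argument list.
_HINT = re.compile(r"([A-Za-z_]\w*)\s*(?:\(([^)]*)\))?")


def extract_select_hints(sql):
    """Return (hint text or None, text after SELECT and the hint comment)."""
    m = _SELECT_WITH_HINTS.match(sql)
    if m is None:
        raise ValueError("not a SELECT statement")
    hints = m.group("hints")
    return (hints.strip() if hints is not None else None), m.group("rest").strip()


def split_hints(hint_text):
    """Split 'A(x), B(y, z)' into [('A', ['x']), ('B', ['y', 'z'])]."""
    result = []
    for name, args in _HINT.findall(hint_text):
        arg_list = [a.strip() for a in args.split(",") if a.strip()]
        # Names are upper-cased in this sketch so MAPJOIN and mapjoin match.
        result.append((name.upper(), arg_list))
    return result


if __name__ == "__main__":
    sql = "SELECT /*+ REPARTITION(4), BROADCAST(b) */ a.id FROM a JOIN b ON a.id = b.id"
    hints, rest = extract_select_hints(sql)
    print(split_hints(hints))  # [('REPARTITION', ['4']), ('BROADCAST', ['b'])]
    print(rest)                # a.id FROM a JOIN b ON a.id = b.id
```

A comment placed anywhere other than right after SELECT is not treated as a hint by this sketch.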

2 Origin of hints

The hint framework was proposed by Reynold Xin, a co-founder of Databricks, and has been available since Spark 2.2.

Patch: https://issues.apache.org/jira/browse/SPARK-20857

3 Hint syntax and options

SELECT /*+ MAPJOIN(table_name) */

SELECT /*+ BROADCASTJOIN(table_name) */ 

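SELECT /*+ BROADCAST(table_name) */

MAPJOIN, BROADCASTJOIN and BROADCAST are aliases for the same hint: each asks Spark to broadcast table_name to every executor and use a broadcast join, even when the table is larger than spark.sql.autoBroadcastJoinThreshold.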
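What these hints request is a broadcast join: the small table is shipped to every executor and joined there, so the large table does not need to be shuffled. The Python sketch below is a toy model of that idea under simplifying assumptions (rows are tuples, a table is a list of partitions, inner equi-join only); it is not Spark's BroadcastHashJoinExec. The table names t_test and t_map are borrowed from the example in section 4.

```python
from collections import defaultdict

# Toy model of a broadcast hash (inner equi-)join, NOT Spark's
# BroadcastHashJoinExec. Assumptions: rows are tuples, a table is a list of
# partitions, and each partition is a list of rows.
def broadcast_hash_join(large_partitions, small_rows, large_key, small_key):
    # "Broadcast" step: build one hash table from the small table. In Spark
    # this table would be shipped to every executor.
    table = defaultdict(list)
    for row in small_rows:
        table[small_key(row)].append(row)

    # Probe step: every partition of the large table is joined locally against
    # the same hash table, so the large table is never shuffled and the output
    # keeps its partitioning.
    return [
        [(left, right) for left in part for right in table.get(large_key(left), [])]
        for part in large_partitions
    ]


if __name__ == "__main__":
    t_test = [[(1, "x"), (2, "y")], [(3, "z"), (1, "w")]]  # large side, 2 partitions
    t_map = [(1, "one"), (3, "three")]                       # small, broadcast side
    result = broadcast_hash_join(t_test, t_map, lambda r: r[0], lambda r: r[0])
    for i, part in enumerate(result):
        print(i, part)
    # 0 [((1, 'x'), (1, 'one'))]
    # 1 [((3, 'z'), (3, 'three')), ((1, 'w'), (1, 'one'))]
```

The sketch only pays off when the broadcast side is small: it is copied in full to every task, which is why the hint should name the small table.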

-- Added in Spark 2.4.0
-- Proposed and contributed by a Chinese contributor
-- https://issues.apache.org/jira/browse/SPARK-24940

SELECT /*+ REPARTITION(number) */ 

SELECT /*+ COALESCE(number) */ 
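The difference between the two hints can be sketched with partitions modeled as Python lists. This is a simplified illustration, not Spark's implementation: REPARTITION is modeled as a full shuffle that deals rows round-robin into n partitions (Spark's round-robin partitioner additionally starts at a random offset), and COALESCE as merging existing partitions without a shuffle, which is why it can only reduce the partition count.

```python
# Simplified models of the two hints; partitions are Python lists of rows.

def repartition(partitions, n):
    """REPARTITION(n) modeled as a full shuffle: every row is dealt
    round-robin into n new partitions, so n may be larger or smaller than the
    current count and the result is balanced. (Spark's round-robin
    partitioner also starts at a random offset; omitted here.)"""
    if n <= 0:
        raise ValueError("n must be positive")
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


def coalesce(partitions, n):
    """COALESCE(n) modeled as merging existing partitions without a shuffle.
    It can only reduce the count; this sketch merges adjacent partitions,
    while Spark's coalescer also considers data locality."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        return [list(p) for p in partitions]
    out = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        out[idx * n // len(partitions)].extend(part)
    return out


if __name__ == "__main__":
    parts = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]]
    print(repartition(parts, 2))    # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
    print(coalesce(parts, 2))       # [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10]]
    print(len(coalesce(parts, 8)))  # 4: coalesce cannot add partitions
```

The coalesced partitions in the example are uneven (4 rows vs 6): skipping the shuffle is cheaper, but it does not rebalance the data.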


4 Using hints in combination

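  • MAPJOIN with a join (inside a UNION ALL, each SELECT branch can carry its own hint): SELECT /*+ MAPJOIN(a) */ a.*, b.* FROM t_test a JOIN t_map b ON a.id = b.id;
  • REPARTITION and COALESCE combined with GROUP BY, to adjust the parallelism and the number of partitions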
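A sketch of the GROUP BY combination, under the assumption (based on how Spark's parser attaches a hint to the whole SELECT block) that the COALESCE hint applies to the output of the aggregation, e.g. SELECT /*+ COALESCE(2) */ key, count(*) FROM t GROUP BY key. The aggregation shuffle still hash-partitions rows into spark.sql.shuffle.partitions partitions (200 by default), and COALESCE then merges them into 2 output partitions without another shuffle. The table t, the column key and both helpers are hypothetical names for this toy model.

```python
from collections import Counter

# Toy model of: SELECT /*+ COALESCE(n) */ key, count(*) FROM t GROUP BY key
# Assumption: the hint applies to the aggregated output. The names t and key
# and both helpers are hypothetical.

def group_count(partitions, num_shuffle_partitions):
    """GROUP BY key + count(*): hash-partition rows by key (the shuffle), then
    aggregate every shuffle partition independently."""
    buckets = [Counter() for _ in range(num_shuffle_partitions)]
    for part in partitions:
        for key in part:
            buckets[hash(key) % num_shuffle_partitions][key] += 1
    return [sorted(b.items()) for b in buckets]


def coalesce(partitions, n):
    """Merge adjacent partitions into at most n, without another shuffle."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        return [list(p) for p in partitions]
    out = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        out[idx * n // len(partitions)].extend(part)
    return out


if __name__ == "__main__":
    t = [[1, 2, 1, 3], [2, 2, 4], [1, 5]]   # input partitions of column key
    aggregated = group_count(t, 8)           # 8 stands in for spark.sql.shuffle.partitions
    output = coalesce(aggregated, 2)         # COALESCE(2)
    print(len(aggregated), len(output))      # 8 2
    print(dict(kv for part in output for kv in part))  # {1: 3, 2: 3, 3: 1, 4: 1, 5: 1}
```

Using REPARTITION(2) instead would give evenly sized output partitions at the cost of an extra shuffle.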

References

https://docs.databricks.com/spark/latest/spark-sql/language-manual/select.html

https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-hint-framework.html

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala

https://issues.apache.org/jira/browse/SPARK-16475

https://issues.apache.org/jira/browse/SPARK-20857

https://issues.apache.org/jira/browse/SPARK-24940

Reposted from blog.csdn.net/high2011/article/details/90489028