Spark SQL: how multi-table joins are executed

Copyright notice: please credit the source when reposting. Thanks! https://blog.csdn.net/sinat_36755318/article/details/79877332

Prepare the base table yxl_test and five small tables derived from it, for example:

select * from yxl_test;
+-------+-------+-------+
|  id   | name  |  val  |
+-------+-------+-------+
| 3     | au    | 90.0  |
| 6     | pp    | 92.0  |
| 8     | we    | 57.0  |
| 8     | we    | 27.0  |
| 6     |       | 85.0  |
| 3     | tom   | 30.0  |
| 12    |       | 78.0  |
| NULL  | jay   | 49.0  |
| 7     | jy    | 28.0  |
| 9     |       | NULL  |
+-------+-------+-------+


cache table yxl_test1 as select id,name,val from yxl_test order by id limit 2;

select * from yxl_test1;
+-------+-------+-------+
|  id   | name  |  val  | 
+-------+-------+-------+
| NULL  | jay   | 49.0  | 
| 3     | tom   | 30.0  |
+-------+-------+-------+

cache table yxl_test2 as select id,name,val from yxl_test order by id desc limit 2;

select * from yxl_test2;
+-----+-------+-------+--+
| id  | name  |  val  |
+-----+-------+-------+--+
| 12  |       | 78.0  |
| 9   |       | NULL  |
+-----+-------+-------+--+


cache table yxl_test3 as select id,name,val from yxl_test where id = 6;

select * from yxl_test3;
+-----+-------+-------+--+
| id  | name  |  val  |
+-----+-------+-------+--+
| 6   | pp    | 92.0  |
| 6   |       | 85.0  |
+-----+-------+-------+--+


cache table yxl_test4 as select id,name,val from yxl_test where id=3;

 select * from yxl_test4;
+-----+-------+-------+--+
| id  | name  |  val  |
+-----+-------+-------+--+
| 3   | au    | 90.0  |
| 3   | tom   | 30.0  |
+-----+-------+-------+--+


cache table yxl_test5 as select id,name,val from yxl_test where id=3 and name='au';

 select * from yxl_test5;
+-----+-------+-------+--+
| id  | name  |  val  |
+-----+-------+-------+--+
| 3   | au    | 90.0  |
+-----+-------+-------+--+
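
The table preparation above can be replayed outside Spark as a rough sanity check. The sketch below is a stand-in, not the author's setup: it uses Python's built-in sqlite3 module instead of Spark SQL and assumes the blank names are empty strings. It only checks which ids `order by id limit 2` picks. Like Spark's default ascending sort, SQLite places NULL first, which is why yxl_test1 contains the NULL row. Which of the two id = 3 rows ends up in yxl_test1 is not determined by `order by id` alone, so only the ids are compared.

```python
import sqlite3

# Rows of yxl_test as listed above; blank names are assumed to be empty strings.
ROWS = [
    (3, "au", 90.0), (6, "pp", 92.0), (8, "we", 57.0), (8, "we", 27.0),
    (6, "", 85.0), (3, "tom", 30.0), (12, "", 78.0), (None, "jay", 49.0),
    (7, "jy", 28.0), (9, "", None),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Spark here
conn.execute("CREATE TABLE yxl_test (id INTEGER, name TEXT, val REAL)")
conn.executemany("INSERT INTO yxl_test VALUES (?, ?, ?)", ROWS)

# Ascending sort: NULL comes first, then the smallest id.
asc_ids = [r[0] for r in conn.execute(
    "SELECT id FROM yxl_test ORDER BY id LIMIT 2")]
# Descending sort: the two largest ids, NULL comes last.
desc_ids = [r[0] for r in conn.execute(
    "SELECT id FROM yxl_test ORDER BY id DESC LIMIT 2")]

print(asc_ids)   # [None, 3]  -> ids of yxl_test1
print(desc_ids)  # [12, 9]    -> ids of yxl_test2
```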


====================================TEST====================================

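1: a LEFT JOIN b, a LEFT JOIN c

Conclusion: a is the driving table. The result has three times as many columns: a's columns first, then b's, then c's, and columns from a side with no match are set to NULL. The row count equals a's here because each row of a matches at most one row in b and in c; duplicate join keys on the right side would multiply rows (compare test 3, example 2).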

select * from yxl_test a left join yxl_test1 b on a.id=b.id left join yxl_test2 c on a.id=c.id

+-------+-------+-------+-------+-------+-------+-------+-------+-------+--+
|  id   | name  |  val  |  id   | name  |  val  |  id   | name  |  val  |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+--+
| 3     | au    | 90.0  | 3     | tom   | 30.0  | NULL  | NULL  | NULL  |
| 6     | pp    | 92.0  | NULL  | NULL  | NULL  | NULL  | NULL  | NULL  |
| 8     | we    | 57.0  | NULL  | NULL  | NULL  | NULL  | NULL  | NULL  |
| 8     | we    | 27.0  | NULL  | NULL  | NULL  | NULL  | NULL  | NULL  |
| 6     |       | 85.0  | NULL  | NULL  | NULL  | NULL  | NULL  | NULL  |
| 3     | tom   | 30.0  | 3     | tom   | 30.0  | NULL  | NULL  | NULL  |
| 12    |       | 78.0  | NULL  | NULL  | NULL  | 12    |       | 78.0  |
| NULL  | jay   | 49.0  | NULL  | NULL  | NULL  | NULL  | NULL  | NULL  |
| 7     | jy    | 28.0  | NULL  | NULL  | NULL  | NULL  | NULL  | NULL  |
| 9     |       | NULL  | NULL  | NULL  | NULL  | 9     |       | NULL  |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+--+

Key part of the physical plan (id#2821 is yxl_test1; id#2897 is yxl_test2):

== Physical Plan ==
*BroadcastHashJoin [id#3224], [id#2897], LeftOuter, BuildRight
:- *BroadcastHashJoin [id#3224], [id#2821], LeftOuter, BuildRight
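
The result shape of test 1 can be reproduced with a small sketch. It is only an approximation of the setup: Python's sqlite3 stands in for Spark SQL, the table contents are copied from the listings above, and blank names are assumed to be empty strings. It checks the row count, the column count, and that the row with a NULL id in a does not match the NULL id row in yxl_test1, since `NULL = NULL` is not true in a join condition.

```python
import sqlite3

# Data copied from the tables above; blank names are assumed to be empty strings.
TABLES = {
    "yxl_test": [
        (3, "au", 90.0), (6, "pp", 92.0), (8, "we", 57.0), (8, "we", 27.0),
        (6, "", 85.0), (3, "tom", 30.0), (12, "", 78.0), (None, "jay", 49.0),
        (7, "jy", 28.0), (9, "", None),
    ],
    "yxl_test1": [(None, "jay", 49.0), (3, "tom", 30.0)],
    "yxl_test2": [(12, "", 78.0), (9, "", None)],
}

conn = sqlite3.connect(":memory:")  # SQLite stands in for Spark here
for table, data in TABLES.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, val REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)

rows = conn.execute(
    "SELECT * FROM yxl_test a "
    "LEFT JOIN yxl_test1 b ON a.id = b.id "
    "LEFT JOIN yxl_test2 c ON a.id = c.id"
).fetchall()

print(len(rows))               # 10: one output row per row of yxl_test
print({len(r) for r in rows})  # {9}: three columns each from a, b and c

# The a row with id NULL gets no partner, even though yxl_test1 has a NULL id.
null_key_rows = [r for r in rows if r[0] is None]
print(null_key_rows)
```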


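2: a LEFT JOIN b, b INNER JOIN c

Conclusion: the execution plan shows a inner-joined with b first, then b inner-joined with c. The left join is planned as Inner because the later condition b.id = c.id is never true for rows where b's columns are NULL-padded, so those rows would be discarded anyway.

Example 1: select * from yxl_test a left join yxl_test1 b on a.id=b.id inner join yxl_test5 c on b.id=c.id;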

+-----+-------+-------+-----+-------+-------+-----+-------+-------+--+
| id  | name  |  val  | id  | name  |  val  | id  | name  |  val  |
+-----+-------+-------+-----+-------+-------+-----+-------+-------+--+
| 3   | au    | 90.0  | 3   | tom   | 30.0  | 3   | au    | 90.0  |
| 3   | tom   | 30.0  | 3   | tom   | 30.0  | 3   | au    | 90.0  |
+-----+-------+-------+-----+-------+-------+-----+-------+-------+--+

Key part of the physical plan (id#4371 is yxl_test; id#2821 is yxl_test1; id#4150 is yxl_test5):

== Physical Plan ==
*BroadcastHashJoin [id#2821], [id#4150], Inner, BuildRight
:- *BroadcastHashJoin [id#4371], [id#2821], Inner, BuildRight

Example 2: select * from yxl_test a left join yxl_test1 b on a.id=b.id inner join yxl_test3 c on b.id=c.id;
+-----+-------+------+-----+-------+------+-----+-------+------+--+
| id  | name  | val  | id  | name  | val  | id  | name  | val  |
+-----+-------+------+-----+-------+------+-----+-------+------+--+

+-----+-------+------+-----+-------+------+-----+-------+------+--+
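
Why the left join in test 2 can be planned as an inner join is easiest to see by comparing results. The sketch below is an illustration under assumptions: Python's sqlite3 stands in for Spark SQL, the helper name `run` is made up for this example, and blank names are assumed to be empty strings. It runs example 1 as written and again with the left join replaced by an inner join, and checks that both return the same rows. It does not inspect any execution plan; the rewrite itself is what Spark's optimizer is understood to do (outer-join elimination).

```python
import sqlite3

# Data copied from the tables above; blank names are assumed to be empty strings.
TABLES = {
    "yxl_test": [
        (3, "au", 90.0), (6, "pp", 92.0), (8, "we", 57.0), (8, "we", 27.0),
        (6, "", 85.0), (3, "tom", 30.0), (12, "", 78.0), (None, "jay", 49.0),
        (7, "jy", 28.0), (9, "", None),
    ],
    "yxl_test1": [(None, "jay", 49.0), (3, "tom", 30.0)],
    "yxl_test5": [(3, "au", 90.0)],
}

conn = sqlite3.connect(":memory:")  # SQLite stands in for Spark here
for table, data in TABLES.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, val REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)


def run(first_join):
    """Run example 1 with the given join type between a and b (hypothetical helper)."""
    sql = (
        "SELECT * FROM yxl_test a "
        f"{first_join} JOIN yxl_test1 b ON a.id = b.id "
        "INNER JOIN yxl_test5 c ON b.id = c.id"
    )
    return sorted(conn.execute(sql).fetchall())


as_written = run("LEFT")    # the query from example 1
as_planned = run("INNER")   # what the physical plan executes

print(as_written == as_planned)  # True: b.id = c.id drops every NULL-padded row
print(len(as_written))           # 2
```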


3: a INNER JOIN b, a LEFT JOIN c

Conclusion: a is inner-joined with b first, and the resulting dataset is then left-joined with c.

Example 1: select * from yxl_test a inner join yxl_test1 b on a.id=b.id left join yxl_test2 c on a.id=c.id;

+-----+-------+-------+-----+-------+-------+-------+-------+-------+--+
| id  | name  |  val  | id  | name  |  val  |  id   | name  |  val  |
+-----+-------+-------+-----+-------+-------+-------+-------+-------+--+
| 3   | au    | 90.0  | 3   | tom   | 30.0  | NULL  | NULL  | NULL  |
| 3   | tom   | 30.0  | 3   | tom   | 30.0  | NULL  | NULL  | NULL  |
+-----+-------+-------+-----+-------+-------+-------+-------+-------+--+

Key part of the physical plan (id#2821 is yxl_test1; id#2897 is yxl_test2):

== Physical Plan ==
*BroadcastHashJoin [id#3562], [id#2897], LeftOuter, BuildRight
:- *BroadcastHashJoin [id#3562], [id#2821], Inner, BuildRight

Example 2: select * from yxl_test a inner join yxl_test1 b on a.id=b.id left join yxl_test4 c on a.id=c.id;
+-----+-------+-------+-----+-------+-------+-----+-------+-------+--+
| id  | name  |  val  | id  | name  |  val  | id  | name  |  val  |
+-----+-------+-------+-----+-------+-------+-----+-------+-------+--+
| 3   | au    | 90.0  | 3   | tom   | 30.0  | 3   | tom   | 30.0  |
| 3   | au    | 90.0  | 3   | tom   | 30.0  | 3   | au    | 90.0  |
| 3   | tom   | 30.0  | 3   | tom   | 30.0  | 3   | tom   | 30.0  |
| 3   | tom   | 30.0  | 3   | tom   | 30.0  | 3   | au    | 90.0  |
+-----+-------+-------+-----+-------+-------+-----+-------+-------+--+

Key part of the physical plan (id#2821 is yxl_test1; id#3730 is yxl_test4):

== Physical Plan ==
*BroadcastHashJoin [id#3964], [id#3730], LeftOuter, BuildRight
:- *BroadcastHashJoin [id#3964], [id#2821], Inner, BuildRight
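
The two examples in test 3 differ only in the table on the right of the left join, and the row counts show how duplicate join keys multiply rows. The following sketch reproduces both counts under the same assumptions as before: Python's sqlite3 is used in place of Spark SQL, the data is copied from the listings above, and the helper name `count_rows` is invented for this illustration.

```python
import sqlite3

# Data copied from the tables above; blank names are assumed to be empty strings.
TABLES = {
    "yxl_test": [
        (3, "au", 90.0), (6, "pp", 92.0), (8, "we", 57.0), (8, "we", 27.0),
        (6, "", 85.0), (3, "tom", 30.0), (12, "", 78.0), (None, "jay", 49.0),
        (7, "jy", 28.0), (9, "", None),
    ],
    "yxl_test1": [(None, "jay", 49.0), (3, "tom", 30.0)],
    "yxl_test2": [(12, "", 78.0), (9, "", None)],
    "yxl_test4": [(3, "au", 90.0), (3, "tom", 30.0)],
}

conn = sqlite3.connect(":memory:")  # SQLite stands in for Spark here
for table, data in TABLES.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, val REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)


def count_rows(right_table):
    """Inner join a with yxl_test1, then left join the given table (hypothetical helper)."""
    sql = (
        "SELECT * FROM yxl_test a "
        "INNER JOIN yxl_test1 b ON a.id = b.id "
        f"LEFT JOIN {right_table} c ON a.id = c.id"
    )
    return len(conn.execute(sql).fetchall())


rows_example1 = count_rows("yxl_test2")  # no id 3 in yxl_test2: c columns are NULL
rows_example2 = count_rows("yxl_test4")  # two id 3 rows in yxl_test4: rows double

print(rows_example1)  # 2
print(rows_example2)  # 4
```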


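4: a INNER JOIN b, a INNER JOIN c

Conclusion: both joins are planned as Inner, in the order written. The result is empty because no id appears in both yxl_test1 (NULL, 3) and yxl_test3 (6).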

select * from yxl_test a inner join yxl_test1 b on a.id=b.id inner join yxl_test3 c on a.id=c.id;
+-----+-------+------+------------+-----------------+---------+-----+-------+------+-----+-------+------+--+
| id  | name  | val  | bandclass  | p_provincecode  | p_date  | id  | name  | val  | id  | name  | val  |
+-----+-------+------+------------+-----------------+---------+-----+-------+------+-----+-------+------+--+

+-----+-------+------+------------+-----------------+---------+-----+-------+------+-----+-------+------+--+

Key part of the physical plan (id#2821 is yxl_test1; id#2973 is yxl_test3):

== Physical Plan ==
*BroadcastHashJoin [id#5332], [id#2973], Inner, BuildRight
:- *BroadcastHashJoin [id#5332], [id#2821], Inner, BuildRight


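5: a LEFT JOIN b, a INNER JOIN c

Conclusion: a is the driving table; the left join runs first, then the inner join. The left join stays LeftOuter in the plan because the inner join condition is on a.id, not on b.id.

select * from yxl_test a left join yxl_test1 b on a.id=b.id inner join yxl_test2 c on a.id=c.id;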

+-----+-------+-------+------------+-----------------+-------------+-------+-------+-------+-----+-------+-------+--+
| id  | name  |  val  | bandclass  | p_provincecode  |   p_date    |  id   | name  |  val  | id  | name  |  val  |
+-----+-------+-------+------------+-----------------+-------------+-------+-------+-------+-----+-------+-------+--+
| 12  |       | 78.0  | NULL       | 510000          | 2018-04-10  | NULL  | NULL  | NULL  | 12  |       | 78.0  |
| 9   |       | NULL  | NULL       | 510000          | 2018-04-10  | NULL  | NULL  | NULL  | 9   |       | NULL  |
+-----+-------+-------+------------+-----------------+-------------+-------+-------+-------+-----+-------+-------+--+

Key part of the physical plan (id#6488 is yxl_test1; id#6540 is yxl_test2):

== Physical Plan ==
*BroadcastHashJoin [id#6776], [id#6540], Inner, BuildRight
:- *BroadcastHashJoin [id#6776], [id#6488], LeftOuter, BuildRight
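
The contrast between test 5 and test 2 comes down to which side the inner join condition refers to. As a hedged illustration, the sketch below again uses Python's sqlite3 instead of Spark SQL and leaves out the extra partition columns (bandclass, p_provincecode, p_date) that appear in the output above. It runs the test 5 query, then a variant, not taken from the article, whose inner join condition is on b.id instead of a.id.

```python
import sqlite3

# Data copied from the tables above; blank names are assumed to be empty strings.
TABLES = {
    "yxl_test": [
        (3, "au", 90.0), (6, "pp", 92.0), (8, "we", 57.0), (8, "we", 27.0),
        (6, "", 85.0), (3, "tom", 30.0), (12, "", 78.0), (None, "jay", 49.0),
        (7, "jy", 28.0), (9, "", None),
    ],
    "yxl_test1": [(None, "jay", 49.0), (3, "tom", 30.0)],
    "yxl_test2": [(12, "", 78.0), (9, "", None)],
}

conn = sqlite3.connect(":memory:")  # SQLite stands in for Spark here
for table, data in TABLES.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, val REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)

# Test 5 as written: the inner join keys on a.id, so NULL-padded b rows survive.
on_a = conn.execute(
    "SELECT * FROM yxl_test a "
    "LEFT JOIN yxl_test1 b ON a.id = b.id "
    "INNER JOIN yxl_test2 c ON a.id = c.id"
).fetchall()

# Variant for comparison (not in the article): the inner join keys on b.id.
on_b = conn.execute(
    "SELECT * FROM yxl_test a "
    "LEFT JOIN yxl_test1 b ON a.id = b.id "
    "INNER JOIN yxl_test2 c ON b.id = c.id"
).fetchall()

print(sorted(r[0] for r in on_a))  # [9, 12]
print([r[3:6] for r in on_a])      # b columns are all None for both rows
print(len(on_b))                   # 0: yxl_test1 and yxl_test2 share no id
```

Keying the last join on a.id keeps the rows that the left join padded with NULLs, which is consistent with the plan above retaining LeftOuter for the first join.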

Reposted from blog.csdn.net/sinat_36755318/article/details/79877332