Greenplum (gp) execution plans are similar to PostgreSQL (pg) execution plans, but because gp is a distributed shared-nothing architecture, its plans inevitably differ from pg's in some respects.
As in pg, you view the execution plan of a SQL statement in gp with the EXPLAIN statement. The syntax is as follows:
Command: EXPLAIN
Description: show the execution plan of a statement
Syntax:
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
Compared with EXPLAIN in pg, the statement has fewer options here: only ANALYZE and VERBOSE. Their roles are:
analyze: executes the statement and displays the actual run time.
verbose: displays the full internal structure of the query plan tree, rather than just a summary.
Distributed execution plan:
The defining feature of the shared-nothing architecture in gp is that the underlying data is not shared at all: each segment holds only part of the data, and all nodes are connected through a network.
- Redistribution and broadcast
Because data in gp is spread across different segments, how the data is brought together becomes crucial. This involves moving data between segments, which is done by redistribution or by broadcast.
Broadcast: each segment sends all of its data for a table to every segment, so that every segment ends up holding the full table.
Redistribution: when a join or aggregation spans segments and the data does not meet the conditions for broadcasting, gp redistributes the data instead. It selects a new distribution key (the join key), re-hashes the rows on that key, and sends them to the segments they now belong to.
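The difference between the two movements can be sketched with a toy simulation. This is only an illustration, not how gp is implemented: the three-segment cluster, the modulo placement rule, and the helper names (segment_for, distribute, broadcast, redistribute) are all assumptions made for the example.

```python
from collections import defaultdict

NUM_SEGMENTS = 3  # hypothetical cluster size, chosen so the re-placement is visible


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Toy placement rule: integer key modulo segment count (not gp's real hash)."""
    return key % num_segments


def distribute(rows, key_fn, num_segments=NUM_SEGMENTS):
    """Place each row on the segment chosen by its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(key_fn(row), num_segments)].append(row)
    return [segments[i] for i in range(num_segments)]


def broadcast(segments):
    """Every segment receives the rows of all segments, i.e. a full copy each."""
    full = [row for seg in segments for row in seg]
    return [list(full) for _ in segments]


def redistribute(segments, new_key_fn):
    """Re-place every row according to a new distribution key."""
    all_rows = [row for seg in segments for row in seg]
    return distribute(all_rows, new_key_fn, len(segments))


t2 = [{"id": i} for i in range(6)]

# Initial layout: rows are placed by the table's distribution key, id.
by_id = distribute(t2, lambda r: r["id"])

# Joining on id + 100 needs matching rows on the same segment, so re-place by that key.
by_join_key = redistribute(by_id, lambda r: r["id"] + 100)

# Broadcasting instead gives every segment all six rows.
copies = broadcast(by_id)
```

In this sketch a broadcast moves num_segments copies of the table, while a redistribution moves each row at most once, which is one reason redistribution tends to be preferred for large tables.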
Compared with a pg execution plan, a gp distributed execution plan contains a few additional terms, namely:
1, Gather Motion (N:1)
Gather operation: collects the data from N nodes onto a single node.
postgres=# EXPLAIN select * from t3 join t2 on t3.id=t2.id+100 limit 10;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=24028.00..24028.69 rows=10 width=74)
   ->  Gather Motion 2:1  (slice2; segments: 2)  (cost=24028.00..24028.69 rows=10 width=74)
         ->  Limit  (cost=24028.00..24028.49 rows=5 width=74)
               ->  Hash Join  (cost=24028.00..72660.00 rows=500000 width=74)
                     Hash Cond: (t2.id + 100) = t3.id
                     ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..31132.00 rows=500000 width=37)
                           Hash Key: t2.id + 100
                           ->  Append-only Columnar Scan on t2  (cost=0.00..11132.00 rows=500000 width=37)
                     ->  Hash  (cost=11528.00..11528.00 rows=500000 width=37)
                           ->  Append-only Scan on t3  (cost=0.00..11528.00 rows=500000 width=37)
 Optimizer status: legacy query optimizer
(11 rows)
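The Limit / Gather Motion / Limit shape at the top of this plan can be mimicked in a few lines. This is a rough sketch under stated assumptions: the two lists stand in for the join output on two segments, and gather and limit are hypothetical helpers, not gp functions.

```python
def gather(segments):
    """N:1 motion: bring the rows of every segment together on one node."""
    return [row for seg in segments for row in seg]


def limit(rows, n):
    """Keep at most the first n rows."""
    return rows[:n]


# Hypothetical join output sitting on two segments (50 rows each).
segment_results = [list(range(0, 100, 2)), list(range(1, 100, 2))]

# Each segment applies the limit locally, so little data crosses the network.
local = [limit(seg, 10) for seg in segment_results]

# The gathering node collects the partial results and applies the limit again.
final = limit(gather(local), 10)
```

Here only 20 rows travel to the gathering node instead of 100, which is the point of placing a Limit below the Gather Motion as well as above it.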
2, Broadcast Motion (N:N)
Broadcast: each node sends its data for a table to all segments, so that every segment holds the full table.
3, Redistribute Motion (N:N)
Redistribution: re-hashes the data on a new key and sends the rows to the segments they now belong to. It typically occurs with joins, GROUP BY, and window functions.
4, Slice
Slice: when gp executes a distributed plan, it splits the SQL statement into multiple slices. Each slice is a portion of the statement that a single database instance can execute on its own.
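To make the idea concrete, the plan from the EXPLAIN output above can be grouped by slice. The sketch below assumes that the slice name printed on a Motion node labels the part of the plan below that Motion. It also counts the Motion node itself with the receiving side, which is a simplification. The name "root" for the top of the plan and the helper group_by_slice are hypothetical.

```python
# The plan above as (label, slice_name_on_motion_or_None, children).
plan = ("Limit", None, [
    ("Gather Motion 2:1", "slice2", [
        ("Limit", None, [
            ("Hash Join", None, [
                ("Redistribute Motion 2:2", "slice1", [
                    ("Append-only Columnar Scan on t2", None, []),
                ]),
                ("Hash", None, [
                    ("Append-only Scan on t3", None, []),
                ]),
            ]),
        ]),
    ]),
])


def group_by_slice(node, current="root", out=None):
    """Walk the plan tree and collect node labels per slice."""
    out = {} if out is None else out
    label, motion_slice, children = node
    out.setdefault(current, []).append(label)
    # A Motion node starts a new slice for everything beneath it.
    below = motion_slice or current
    for child in children:
        group_by_slice(child, below, out)
    return out


slices = group_by_slice(plan)
for name, nodes in slices.items():
    print(name, "->", nodes)
```

Under these assumptions, the scan of t2 forms slice1 on its own, the hash join and the scan of t3 run together in slice2, and the final Limit sits above both.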