Optimizing GROUP BY and JOIN statements in Vertica

To optimize a GROUP BY statement in Vertica, first run EXPLAIN on the statement to view its execution plan. In the plan, the GROUP BY is carried out as either GROUPBY PIPELINED or GROUPBY HASH, so you can see exactly which method Vertica has chosen. GROUPBY PIPELINED relies on the data already being sorted, so the goal of optimization is generally to turn a GROUPBY HASH into a GROUPBY PIPELINED.
Let's walk through the example from the official documentation.

CREATE TABLE sortopt (
    a INT NOT NULL, 
    b INT NOT NULL,
    c INT,
    d INT
);
CREATE PROJECTION sortopt_p (
   a_proj,
   b_proj,
   c_proj,
   d_proj )
AS SELECT * FROM sortopt
ORDER BY a,b,c 
UNSEGMENTED ALL NODES;
INSERT INTO sortopt VALUES(5,2,13,84);
INSERT INTO sortopt VALUES(14,22,8,115);
INSERT INTO sortopt VALUES(79,9,401,33);
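  • First case: the GROUP BY columns compared with the projection's sort columns
    GROUP BY a
    GROUP BY a,b
    GROUP BY b,a
    GROUP BY a,b,c
    GROUP BY c,a,b
    Each of the GROUP BY clauses above runs as GROUPBY PIPELINED, because every column after GROUP BY is presorted in the projection (ORDER BY a,b,c).
    GROUP BY a,b,c,d
    This one runs as GROUPBY HASH, which is not recommended, because column d is not part of the projection's sort order.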
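The first case can be checked with EXPLAIN. This is a minimal sketch, assuming the sortopt table and sortopt_p projection defined above; the COUNT(*) aggregate is an addition that only makes each statement a complete query, and the operator Vertica actually picks may depend on the version and the projections available.

```sql
-- All GROUP BY columns are in the projection sort order (a, b, c):
-- per the first case, the plan is expected to show GROUPBY PIPELINED.
EXPLAIN SELECT c, a, b, COUNT(*) FROM sortopt GROUP BY c, a, b;

-- Column d is not in the projection sort order:
-- per the first case, the plan is expected to show GROUPBY HASH.
EXPLAIN SELECT a, b, c, d, COUNT(*) FROM sortopt GROUP BY a, b, c, d;
```

In the plan text, look for the GROUPBY line to see which method was chosen.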

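  • Second case: the GROUP BY uses only some of the projection's sort columns
    GROUP BY a,c
    runs as GROUPBY HASH, because columns a and c are not adjacent in the projection's sort order. By contrast,
    GROUP BY a,b or GROUP BY b,a runs as GROUPBY PIPELINED, because a and b are adjacent and start from the first sort column.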
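A short sketch of the adjacency rule, again assuming the sortopt example above, with COUNT(*) added only to complete each query:

```sql
-- a and c are separated by b in the sort order (a, b, c):
-- per the second case, the plan is expected to show GROUPBY HASH.
EXPLAIN SELECT a, c, COUNT(*) FROM sortopt GROUP BY a, c;

-- a and b are the leading, adjacent sort columns:
-- the plan is expected to show GROUPBY PIPELINED.
EXPLAIN SELECT a, b, COUNT(*) FROM sortopt GROUP BY a, b;
```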

  • Third case: there is a WHERE condition before the GROUP BY

SELECT SUM(a) FROM sortopt WHERE a = 10 GROUP BY b runs as GROUPBY PIPELINED.

SELECT SUM(a) FROM sortopt WHERE a = 10 GROUP BY c runs as GROUPBY HASH, because not every projection sort column that precedes c appears in an equality condition of the WHERE clause (b is missing). If the query is changed to SELECT SUM(a) FROM sortopt WHERE a = 10 AND b = 10 GROUP BY c, it runs as GROUPBY PIPELINED.
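The third case can be tried the same way. The sketch below assumes the sortopt table from the example above and uses SUM(a) so that the select list is valid alongside the GROUP BY; both choices are assumptions made for illustration.

```sql
-- a is fixed by an equality predicate, so grouping by b can use the sort order:
-- expected to show GROUPBY PIPELINED.
EXPLAIN SELECT SUM(a) FROM sortopt WHERE a = 10 GROUP BY b;

-- b precedes c in the sort order but has no equality predicate:
-- expected to show GROUPBY HASH.
EXPLAIN SELECT SUM(a) FROM sortopt WHERE a = 10 GROUP BY c;

-- Both a and b are fixed by equality predicates:
-- expected to show GROUPBY PIPELINED.
EXPLAIN SELECT SUM(a) FROM sortopt WHERE a = 10 AND b = 10 GROUP BY c;
```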

With these three cases covered, you should now have a working understanding of how Vertica handles GROUP BY.
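When a frequent query falls into the GROUPBY HASH cases, one possible remedy is an additional projection whose sort order matches the GROUP BY. The sketch below assumes the sortopt table from the example; the projection name sortopt_ac_p is hypothetical, and whether the optimizer actually chooses the new projection depends on its cost estimates.

```sql
-- Hypothetical extra projection sorted on (a, c), so that GROUP BY a, c
-- can read presorted data.
CREATE PROJECTION sortopt_ac_p AS
SELECT a, b, c, d FROM sortopt
ORDER BY a, c
UNSEGMENTED ALL NODES;

-- The table already holds rows, so the new projection has to be
-- populated before the optimizer can use it.
SELECT REFRESH('sortopt');

-- Re-check the plan; the aim is GROUPBY PIPELINED instead of GROUPBY HASH.
EXPLAIN SELECT a, c, COUNT(*) FROM sortopt GROUP BY a, c;
```

Each extra projection costs storage and load time, so this only pays off for queries that run often.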

Regarding joins, Vertica has two execution methods: Merge Join and Hash Join. Merge Join is the recommended one.

The necessary condition for a Merge Join is that the join columns are presorted in both tables, that is, the join columns appear in the ORDER BY of both tables' projections.
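A minimal sketch of a pair of tables set up for a merge join. The table and projection names (fact_t, dim_t, fact_t_p, dim_t_p) are hypothetical and not from the original example; both projections are sorted on the join column id.

```sql
-- Hypothetical tables joined on id.
CREATE TABLE fact_t (id INT NOT NULL, val INT);
CREATE TABLE dim_t  (id INT NOT NULL, name VARCHAR(32));

-- Both projections are sorted on the join column.
CREATE PROJECTION fact_t_p AS
SELECT id, val FROM fact_t
ORDER BY id
UNSEGMENTED ALL NODES;

CREATE PROJECTION dim_t_p AS
SELECT id, name FROM dim_t
ORDER BY id
UNSEGMENTED ALL NODES;

-- Inspect the plan: with both inputs presorted on id, the join step
-- can be a merge join rather than a hash join.
EXPLAIN SELECT f.id, f.val, d.name
FROM fact_t f
JOIN dim_t d ON f.id = d.id;
```

The optimizer still decides based on cost, so presorted projections make a merge join possible but do not guarantee it.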

For more details, see "Avoiding GROUPBY HASH with Projection Design" in the official Vertica documentation.

