Greenplum Optimization: SQL Tuning

Once GPDB is started, it only cares about segment instances and is not tied to specific physical machines, so the number of segments configured on each host can differ, and other settings (for example, the data directories) can differ as well. Our company's cloud service runs in such a heterogeneous setup in some cases and has operated smoothly for a long time, so it is possible in principle. The main problem is that, in both the open source version of GPDB and Pivotal's official version, almost all management scripts, including the system initialization and expansion scripts, assume that all physical machines are homogeneous, so it is almost impossible to initialize a heterogeneous GPDB cluster with them.


Database query preparation

1. VACUUM

  • vacuum simply reclaims space and makes it available for reuse. It does not require an exclusive lock, so the table can still be read and written during the operation.
  • vacuum full performs more extensive processing, including moving rows across blocks to compact the table into the fewest possible disk blocks. Compared with vacuum it is slower and requires an exclusive lock.
  • Regular execution: during routine maintenance, the data dictionary should be vacuumed regularly, which can be done every day when the database is idle. A vacuum full of the system tables should be performed at longer intervals (every two or three months); this operation requires downtime and is time-consuming, and large tables may take several hours.
  • reindex: after performing vacuum, it is best to rebuild the indexes on the table (see the sketch below)
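
For reference, a minimal maintenance sketch; the table name test and the catalog table pg_catalog.pg_attribute are only examples:

    vacuum pg_catalog.pg_attribute;  -- reclaim space in a data dictionary table, no exclusive lock
    vacuum full test;                -- compact the table; takes an exclusive lock, plan for downtime
    reindex table test;              -- rebuild the table's indexes after vacuuming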

2. ANALYZE

  • Command: analyze [table [(column, ...)]]
  • Collects statistics on the table contents so that the planner can generate better execution plans. If this command is executed after an index is created, the index will be used by subsequent queries.
  • Automatic statistics collection
  • The parameter gp_autostats_mode in postgresql.conf controls automatic collection. gp_autostats_mode has three values: none, on_change, on_no_stats (default)
    • none: disables automatic statistics collection
    • on_change: when the number of rows affected by a DML statement exceeds the value of the gp_autostats_on_change_threshold parameter, an analyze is automatically performed after the DML to collect the table's statistics.
    • on_no_stats: when create table as select, insert, or copy is executed and the target table has no statistics yet, analyze is automatically run to collect statistics for that table. GP uses on_no_stats by default, which puts little load on the database, but for tables that change constantly, no further statistics are collected after the first collection, so you need to run analyze at regular intervals.
  • If you have many SQL statements that run in under a minute, you will find that a lot of time is spent collecting statistics. To reduce this overhead, you can specify that statistics are not collected for certain columns, as follows:

    create table test(id int, name text, note text);

    Suppose we know that the column note will never appear in a join column or in a where filter condition; this column can then be set to not collect statistics:

    alter table test alter note SET STATISTICS 0;

3. EXPLAIN execution plan

Displays the execution plan generated by the planner for the provided statement.

  • cost: the estimated startup cost before the first row is returned, and the total cost to return all rows, measured in units of disk page accesses
  • rows: the estimated number of rows in the result set returned by the SQL, based on statistics
  • width: the estimated width of each row of the result set, calculated from the statistics in the pg_statistic table.
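
A minimal illustration with the table test from above (run analyze first so the estimates reflect current statistics; the plan shape and numbers below are only an example and will differ by data and settings):

    analyze test;
    explain select * from test where id = 1;
    -- Sample output (illustrative only):
    -- Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..431.00 rows=1 width=45)
    --   ->  Seq Scan on test  (cost=0.00..431.00 rows=1 width=45)
    --         Filter: (id = 1)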

4. Two aggregation methods

  • hashaggregate
    calculates a hash value from the group by fields and, for each aggregate function used in the query, maintains a corresponding array in memory (one array per aggregate function). For the same amount of data, the fewer duplicate values there are in the group by fields, the more memory is used.
  • groupaggregate
    first sorts the table data by the group by fields, then scans the sorted data once and computes the aggregate functions. Memory consumption is basically constant.

  • When a SQL statement contains many aggregate functions and the group by fields have few duplicate values, groupaggregate should be used; the plan node name shows which method the planner chose (see below).
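
For reference, a way to see which method a query uses (the query is illustrative; the actual plan depends on statistics and planner settings):

    explain select name, count(*) from test group by name;
    -- the plan contains either a HashAggregate or a GroupAggregate node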

5. Association

Joins fall into three categories: hash join, nested loop join, and merge join. Provided the SQL still executes correctly, the planner prefers hash join.

  • Hash join: first compute hash values for one of the joined tables and keep them in an in-memory hash table, then do a full scan of the other table and probe the hash table with each row.
  • Nested loop join: used when the data being joined is small enough to be broadcast, for example a Cartesian product: select * from test1, test2
  • Merge join: sort both tables on the join key and then join the data as in a merge sort. It is generally less efficient than hash join; a full outer join can only be implemented with a merge join.
  • For the analysis of broadcast and redistribution during joins see P133; in general the planner automatically selects the optimal execution plan.
  • Joins sometimes cause redistribution or broadcast, which are time-consuming operations (see the sketch below).
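
A small sketch for checking which join method the planner picked, assuming test1 and test2 both have an id column (the node names are typical; actual plans vary):

    explain select * from test1 a join test2 b on a.id = b.id;             -- usually a Hash Join node
    explain select * from test1 a full outer join test2 b on a.id = b.id;  -- usually a Merge Full Join node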

6. Redistribution

In some SQL queries, data needs to be redistributed across the nodes. Redistribution is limited by network transmission and disk I/O, so it is relatively slow.

  • Forced type conversion on the join key
    In general, a table is hashed according to its distribution key. If one table is distributed by id:integer and the other by id:numeric, then when they are joined the id column of one table must be cast, and because different types produce different hash values, the data has to be redistributed (see the sketch after this list).
  • The join key does not match the distribution key
  • group by, window functions, and grouping sets will cause redistribution
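
A sketch of the type-mismatch case (table and column names are only examples):

    -- t1 and t2 are distributed by id columns of different types; joining them
    -- forces a cast, and different types hash differently, so a redistribution is needed
    create table t1 (id integer, val text) distributed by (id);
    create table t2 (id numeric, val text) distributed by (id);
    explain select * from t1 join t2 on t1.id = t2.id;  -- plan typically includes a Redistribute Motion
    -- declaring both join keys with the same type when the tables are created avoids this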

Query optimization

Observe the execution plan through explain to determine how to optimize the SQL.

1. explain parameter

Displays the execution plan generated by the planner for the provided statement.

  • cost: the estimated startup cost before the first row is returned, and the total cost to return all rows, measured in units of disk page accesses
  • rows: the estimated number of rows in the result set returned by the SQL, based on statistics
  • width: the estimated width of each row of the result set, calculated from the statistics in the pg_statistic table.

2. Choose the appropriate distribution key

Improper selection of the distribution key leads to redistribution and to uneven data distribution, and uneven data distribution concentrates SQL execution on a single segment node, limiting the overall speed of GP.

  • Data should be stored evenly across all nodes; only with even data distribution can queries make full use of multiple machines and exploit the advantages of the distributed architecture.
  • For joins and window functions, try to use the distribution key as the join key and partition key. Note in particular that joins and window functions redistribute or broadcast data based on the join key and partition key, so if the distribution key and the join key are inconsistent, the data must be redistributed no matter how the distribution key is changed.
  • Try to ensure that the result set produced by the where condition is also stored as evenly as possible.
  • At the segment level, you can run select gp_segment_id, count(*) from fact_table group by gp_segment_id to check whether a table's data is stored evenly across segments
  • At the system level, you can directly use df -h or du -h to check whether the disk or directory data is uniform
  • View data-skewed tables in the database
    First define the data skew rate as: maximum segment data volume / average segment data volume. To prevent a nearly empty table from distorting the result, a small value is added to the average segment data volume. The SQL is as follows:
    SELECT tabname,
           max(size)/(avg(size) + 0.001) AS max_div_avg,
           sum(size) AS total_size
    FROM (SELECT gp_segment_id,
                 oid::regclass AS tabname,
                 pg_relation_size(oid) AS size
          FROM gp_dist_random('pg_class')
          WHERE relkind = 'r'
            AND relstorage IN ('a', 'h')) t
    GROUP BY tabname
    ORDER BY 2 DESC;

3. Partition table

Partitioning a table by a field does not affect how data is distributed across the data nodes; it only determines how the data is partitioned and stored within each data node. It can speed up queries that filter on the partition field.
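
A minimal sketch of a range-partitioned table (table name, column names, and the date range are only examples):

    create table sales (id int, sale_date date, amount numeric)
    distributed by (id)
    partition by range (sale_date)
    (
        start (date '2024-01-01') inclusive
        end   (date '2025-01-01') exclusive
        every (interval '1 month')
    );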

4. Compress the table

Use compression for large AO tables and partitioned tables to save storage space and improve system I/O. Compression can also be configured at the column level. Application scenarios (a DDL sketch follows the list):

  • The table does not need updates or deletes
  • Access to the table is basically a full table scan, and no index needs to be built
  • Fields cannot be added frequently and field types cannot be modified frequently
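
A minimal sketch of a compressed append-only table (the table name and compression settings are only examples):

    create table fact_compressed (id int, note text)
    with (appendonly=true, compresstype=zlib, compresslevel=5)
    distributed by (id);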

5. Group expansion

The GROUP BY extension to Greenplum Database can perform some common calculations more efficiently than applications or stored procedures.

    GROUP BY ROLLUP(col1, col2, col3)
    GROUP BY CUBE(col1, col2, col3)
    GROUP BY GROUPING SETS((col1, col2), (col1, col3))

ROLLUP computes aggregates on the grouping fields (or expressions) from the most detailed level up to the top level. The argument to ROLLUP is an ordered list of grouping fields, and it computes the aggregation at each level from right to left. For example, ROLLUP(c1, c2, c3) will compute aggregates for the following grouping criteria:

    (c1, c2, c3)
    (c1, c2)
    (c1)
    ()

CUBE computes aggregates for all combinations of grouping fields. For example CUBE(c1, c2, c3) will compute the aggregate:

    (c1, c2, c3)
    (c1, c2)
    (c2, c3)
    (c1, c3)
    (c1)
    (c2)
    (c3)
    ()

GROUPING SETS specifies exactly which groupings the aggregates are computed over, and it controls the grouping conditions more precisely than ROLLUP and CUBE.
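
A small illustrative query; sales, region, product, and amount are hypothetical names:

    select region, product, sum(amount)
    from sales
    group by rollup(region, product);
    -- equivalent to the union of grouping by (region, product), by (region), and by ()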

6. Window functions

Window functions perform aggregation or ranking over grouped subsets of the result set, for example sum(population) over (partition by city). Window functions are powerful and perform well, because the computation is done inside the database and data transfer is avoided.

  • The window function row_number() computes the row number of a row within its grouped subset, e.g. row_number() over (order by id).
  • If the query plan shows that a table is scanned multiple times, it may be possible to reduce the number of scans by using window functions.
  • Window functions can often avoid self-joins (see the sketch below).
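
A sketch of using row_number() to take the latest record per group instead of a self-join (orders, cust_id, and order_time are hypothetical names):

    select *
    from (select o.*,
                 row_number() over (partition by cust_id order by order_time desc) as rn
          from orders o) t
    where rn = 1;  -- the most recent order per customer, without joining orders to itself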

7. Column Store and Row Store

In column storage, the data of each column is stored contiguously in its own physical file. It achieves a higher compression ratio and suits scenarios that filter on only some of a table's columns.
Note that if the cluster has many nodes and the table has many columns, at least one file is generated per column on each node, so many files are generated overall and DDL operations on the table become slower. Combined with a partitioned table, even more files are generated and the Linux limit on open file handles may even be exceeded, so pay special attention.

  • Row storage: if records need to be updated or deleted, only the uncompressed row storage format can be used. For queries, if the number of selected columns often exceeds about 30, row storage should also be chosen.
  • Column storage: if only a limited number of columns are selected and you want to trade a higher compression ratio for better I/O performance when querying massive data, choose column storage. In a column-stored partitioned table, every column of every partition has its own physical file, so take care to avoid too many files, which may exceed the Linux limit on simultaneously open files and make DDL commands very slow.
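
A minimal sketch of a column-oriented table with column-level compression (names and settings are only examples):

    create table fact_col (
        id int,
        note text encoding (compresstype=zlib, compresslevel=5)
    )
    with (appendonly=true, orientation=column)
    distributed by (id);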

8. Functions and Stored Procedures

Although cursors are supported, try not to use cursors to process data row by row; operate on the data as a set instead.

9. Index usage

  • If you are returning a very small result set (no more than 5%) from a very large table, a BTREE index is recommended (this is atypical for data warehouse operations)
  • The physical storage order of the table rows should ideally match the index order, which further reduces IO (good index clustering)
  • When columns in the where condition are combined with or, an index can be considered
  • When a key value has many duplicates, a bitmap index is more suitable (see the sketch below)
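
A sketch of the two index types on the example table test; status stands in for a hypothetical low-cardinality column:

    create index idx_test_id on test using btree (id);          -- selective lookups on a large table
    create index idx_test_status on test using bitmap (status); -- column whose values repeat a lot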

For tests on index usage, see GP Index Tuning Test - Basic and GP Index Tuning Test - Sorting.

10. NOT IN

  • This has been optimized in GP 4.3, which uses a hash left anti semi join for the connection.
  • The following applies only to GP 4.1 and earlier

    • SQL with not in is executed as a Cartesian product, using a nested loop join, which is extremely inefficient.
    • not in ==> implement it with a left join against the deduplicated subquery table
    • example

       select * from test1 where col1 not in (select col2 from test1)
      

      change to

       select * from test1 a left join (select col2 from test1 group by col2) b on a.col1=b.col2 where b.col2 is null
      

      The runtime has been improved from more than 30 seconds to 92 milliseconds.

11. Too many aggregate functions

  • When there are too many aggregate functions in one SQL statement, hashaggregate may be chosen by mistake because the statistics are not detailed enough or the SQL is too complex, resulting in insufficient memory.
  • Solution:
    • Split the statement into multiple SQL statements to reduce the memory used by hashaggregate
    • Set enable_hashagg=off to turn off hashaggregate and force the planner not to use it. groupaggregate will be used instead, so sorting takes longer, but memory use is controllable. This method is recommended for its simplicity (a sketch follows).
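
A minimal sketch of the session-level switch (assumes the parameter is changed only around the problematic statement):

    set enable_hashagg = off;   -- planner falls back to groupaggregate
    -- run the memory-heavy aggregation here
    reset enable_hashagg;       -- restore the default afterwards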

12. Resource queue

Use different users for data loading and for querying, and when creating users in GP, assign different resource queues to the different users (see the sketch below).
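
A minimal sketch (queue names, limits, and role names are only examples):

    create resource queue etl_queue with (active_statements=3);
    create resource queue report_queue with (active_statements=10, priority=high);
    create role etl_user login resource queue etl_queue;
    create role report_user login resource queue report_queue;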

13. Other Optimization Tips

  • Rewrite distinct with group by, because distinct needs to sort the data
  • Rewrite UNION with UNION ALL plus GROUP BY (illustrative rewrites below)
  • Try to use the aggregate functions and window functions provided by GREENPLUM to complete complex analysis
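
Illustrative rewrites (test1 and test2 are the example tables used earlier; both are assumed to have a column col1):

    -- distinct rewritten as group by
    select distinct col1 from test1;
    select col1 from test1 group by col1;

    -- union rewritten as union all plus group by
    select col1 from test1 union select col1 from test2;
    select col1 from (select col1 from test1 union all select col1 from test2) t group by col1;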
