AnalyticDB for PostgreSQL 6 New Feature Analysis: Index Only Scan

Principle

All PostgreSQL (PG) indexes are secondary indexes: a query that goes through an index normally has to access both the index data and the source (heap) table. As the name suggests, an Index Only Scan needs to scan only the index data during the query. This scan method has one prerequisite: the index must contain all the data the query needs (a so-called covering index), that is, every column referenced in SELECT, WHERE, ORDER BY, and so on.
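
As a rough illustration of the covering requirement, the sketch below uses a hypothetical table orders_demo and index orders_demo_idx (neither appears in this article). The first query references only indexed columns and is therefore a candidate for Index Only Scan; the second also reads a non-indexed column, so the source table has to be accessed. Whether the planner actually chooses Index Only Scan still depends on its cost estimates and on the state of the visibility map.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE orders_demo (id int, status int, note text);
CREATE INDEX orders_demo_idx ON orders_demo (status, id);

-- Every referenced column (status, id) is in the index:
-- this query is eligible for Index Only Scan.
EXPLAIN SELECT id FROM orders_demo WHERE status > 1 ORDER BY status;

-- "note" is not in the index, so the source table has to be read
-- and Index Only Scan is not possible.
EXPLAIN SELECT id, note FROM orders_demo WHERE status > 1;
```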

Because of PG's MVCC mechanism, without Index Only Scan every index query has to check tuple visibility against the source table data, as shown:

[Figure 1: Index Scan, with visibility checked through the source table]

During an Index Scan, the visibility information for each record has to be fetched from the source table.
PG has supported Index Only Scan since version 9.2. If the index fully covers the data a query needs, Index Only Scan becomes a new candidate scan path. It uses the visibility map to avoid going to the source table for visibility checks, which improves query performance, as shown:

[Figure 2: Index Only Scan, with visibility checked through the visibility map]

This relies mainly on the visibility map mechanism. The visibility map keeps a flag for each page that records whether all tuples on that page are visible; in practice this means the page has not been touched by a DELETE or UPDATE since it was last vacuumed.
If the visibility map confirms that the page an index entry points to is all-visible, the source table record does not need to be fetched to determine visibility. Otherwise, the source tuple still has to be fetched and checked for visibility.
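
One way to observe the state of the visibility map is sketched below. The table name my_table is a placeholder, not a table from this article. pg_class.relallvisible is a standard PostgreSQL catalog column (available since 9.2) and is only an estimate that VACUUM and ANALYZE refresh; how closely the master's value reflects each segment in GP is not covered here.

```sql
-- my_table is a placeholder name.
-- VACUUM marks pages as all-visible when every tuple on them
-- is visible to all transactions.
VACUUM my_table;

-- relpages:      estimated number of pages in the table
-- relallvisible: estimated number of pages flagged all-visible
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'my_table';
```

The closer relallvisible is to relpages, the fewer source table fetches an Index Only Scan should need.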

Examples of Use

GP6 merges PG 9.4 and therefore supports Index Only Scan.
For example, consider a table with an index on one of its columns:

postgres=# \d customer_reviews_hp
         Table "public.customer_reviews_hp"
        Column        |      Type       | Modifiers
----------------------+-----------------+-----------
 customer_id          | text            |
 review_date          | date            |
 review_rating        | integer         |
 review_votes         | integer         |
 review_helpful_votes | integer         |
 product_id           | character(10)   |
 product_title        | text            |
 product_sales_rank   | bigint          |
 product_group        | text            |
 product_category     | text            |
 product_subcategory  | text            |
 similar_product_ids  | character(10)[] |
Indexes:
    "c_review_rating" btree (review_rating)
Distributed by: (customer_id)

Query:

postgres=# explain analyze select count(*), review_rating from customer_reviews_hp where review_rating > 1 group by 2;
                                                                                       QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
 Gather Motion 4:1  (slice2; segments: 4)  (cost=49979.36..49979.50 rows=5 width=12) (actual time=782.673..782.726 rows=4 loops=1)
   ->  GroupAggregate  (cost=49979.36..49979.50 rows=2 width=12) (actual time=782.384..782.385 rows=2 loops=1)
         Group Key: customer_reviews_hp.review_rating
         ->  Sort  (cost=49979.36..49979.37 rows=2 width=12) (actual time=782.376..782.377 rows=8 loops=1)
               Sort Key: customer_reviews_hp.review_rating
               Sort Method:  quicksort  Memory: 132kB
               ->  Redistribute Motion 4:4  (slice1; segments: 4)  (cost=0.18..49979.30 rows=2 width=12) (actual time=76.538..782.345 rows=8 loops=1)
                     Hash Key: customer_reviews_hp.review_rating
                     ->  GroupAggregate  (cost=0.18..49979.20 rows=2 width=12) (actual time=5.102..73.709 rows=4 loops=1)
                           Group Key: customer_reviews_hp.review_rating
                           ->  Index Only Scan using c_review_rating on customer_reviews_hp  (cost=0.18..41742.09 rows=411854 width=4) (actual time=0.128..643.718 rows=1061311 loops=1)
                                 Index Cond: (review_rating > 1)
                                 Heap Fetches: 0
 Planning time: 0.212 ms
   (slice0)    Executor memory: 220K bytes.
   (slice1)    Executor memory: 156K bytes avg x 4 workers, 156K bytes max (seg0).
   (slice2)    Executor memory: 92K bytes avg x 4 workers, 92K bytes max (seg0).  Work_mem: 33K bytes max.
 Memory used:  2047000kB
 Optimizer: Postgres query optimizer
 Execution time: 783.308 ms
(20 rows)

The plan shows that Index Only Scan is used.

Whether Index Only Scan is used can be controlled with the enable_indexonlyscan parameter. For example, after setting enable_indexonlyscan to off, running the same query again gives:

postgres=# explain analyze select count(*), review_rating from customer_reviews_hp where review_rating > 1 group by 2;
                                                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--
 Gather Motion 4:1  (slice2; segments: 4)  (cost=49979.36..49979.50 rows=5 width=12) (actual time=951.830..951.840 rows=4 loops=1)
   ->  GroupAggregate  (cost=49979.36..49979.50 rows=2 width=12) (actual time=951.566..951.567 rows=2 loops=1)
         Group Key: customer_reviews_hp.review_rating
         ->  Sort  (cost=49979.36..49979.37 rows=2 width=12) (actual time=951.556..951.556 rows=8 loops=1)
               Sort Key: customer_reviews_hp.review_rating
               Sort Method:  quicksort  Memory: 132kB
               ->  Redistribute Motion 4:4  (slice1; segments: 4)  (cost=0.18..49979.30 rows=2 width=12) (actual time=75.010..951.527 rows=8 loops=1)
                     Hash Key: customer_reviews_hp.review_rating
                     ->  GroupAggregate  (cost=0.18..49979.20 rows=2 width=12) (actual time=5.211..77.359 rows=4 loops=1)
                           Group Key: customer_reviews_hp.review_rating
                           ->  Index Scan using c_review_rating on customer_reviews_hp  (cost=0.18..41742.09 rows=411854 width=4) (actual time=0.118..817.460 rows=1061311 loops=1)
                                 Index Cond: (review_rating > 1)
 Planning time: 0.217 ms
   (slice0)    Executor memory: 156K bytes.
   (slice1)    Executor memory: 92K bytes avg x 4 workers, 92K bytes max (seg0).
   (slice2)    Executor memory: 92K bytes avg x 4 workers, 92K bytes max (seg0).  Work_mem: 33K bytes max.
 Memory used:  2047000kB
 Optimizer: Postgres query optimizer
 Execution time: 952.473 ms
(19 rows)
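
The two runs above can be reproduced in a single session roughly as follows. This is a minimal sketch: the setting is changed only for the current session and reset afterwards, and the timings will differ from system to system.

```sql
-- Turn off the Index Only Scan path for this session only.
SET enable_indexonlyscan = off;

EXPLAIN ANALYZE
SELECT count(*), review_rating
FROM customer_reviews_hp
WHERE review_rating > 1
GROUP BY 2;

-- Return to the configured default.
RESET enable_indexonlyscan;
```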

With a plain Index Scan instead of Index Only Scan, the execution time rises by about 170 ms (from 783 ms to 952 ms), roughly 20% slower.
Note, however, that Index Only Scan is not a silver bullet. It often requires creating a composite (multicolumn) index, and a composite index brings performance costs of its own, for example on writes and updates. Each case has to be analyzed individually; Index Only Scan is simply one more optimized path to choose from.
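
To make the trade-off concrete, the sketch below adds a hypothetical composite index c_rating_votes (not part of the original example) so that a query reading two columns can be answered from the index alone. The index name and the query are assumptions for illustration.

```sql
-- Hypothetical composite index covering both columns used below.
CREATE INDEX c_rating_votes
    ON customer_reviews_hp (review_rating, review_votes);

-- Both referenced columns are in the index, so Index Only Scan
-- becomes possible. The price is a wider index that has to be
-- maintained on every INSERT and UPDATE.
EXPLAIN
SELECT review_rating, sum(review_votes)
FROM customer_reviews_hp
WHERE review_rating > 1
GROUP BY review_rating;
```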

GP restrictions

  1. The Orca optimizer does not support Index Only Scan. In GP6, only the native PG optimizer supports it.
  2. Column-oriented tables do not support Index Only Scan. Index Only Scan relies on the visibility map mechanism, so column-oriented tables evidently cannot use it.
  3. When Index Only Scan is used in GP, the Heap Fetches value shown by explain analyze is inaccurate, for example:
create table test (a int, b int, c int);
insert into test values(generate_series(1,100000),generate_series(1,100000),generate_series(1,100000));
create index a_ind on test(a,b,c);

-- Run on the master:
postgres=# explain analyze select * from test where a > 1 order by a;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.17..2463.87 rows=99990 width=12) (actual time=1.169..84.196 rows=99999 loops=1)
   Merge Key: a
   ->  Index Only Scan using a_ind on test  (cost=0.17..2463.87 rows=99990 width=12) (actual time=0.116..44.373 rows=99999 loops=1)
         Index Cond: (a > 1)
         Heap Fetches: 0
 Planning time: 0.685 ms
   (slice0)    Executor memory: 216K bytes.
   (slice1)    Executor memory: 148K bytes (seg0).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution time: 96.809 ms
(11 rows)

Heap Fetches is displayed as 0 on the master. Connecting directly to the segment and running explain analyze gives:

postgres=# explain analyze select * from test where a > 1 order by a;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using a_ind on test  (cost=0.29..1255.62 rows=33334 width=12) (actual time=0.072..39.561 rows=99999 loops=1)
   Index Cond: (a > 1)
   Heap Fetches: 99999
 Planning time: 0.148 ms
   (slice0)
 Optimizer: Postgres query optimizer
 Execution time: 47.481 ms
(7 rows)

Judging from the execution time, heap fetches do in fact occur, so the Heap Fetches value displayed on the master is wrong.
Avoiding heap fetches depends on VACUUM keeping the visibility map up to date. Under normal circumstances, even running VACUUM ANALYZE cannot guarantee that no heap fetches are needed.
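
Putting the restrictions together, a session for experimenting with the test table might look like the sketch below. It assumes the optimizer setting can be changed at session level and that the test table from the example above exists. No output is shown because the Heap Fetches value depends on the state of the visibility map at the time.

```sql
-- Use the native PG optimizer instead of Orca for this session
-- (see restriction 1).
SET optimizer = off;

-- Refresh the visibility map of the test table.
VACUUM test;

-- Re-run the query and compare the Heap Fetches value reported
-- on the master with the one reported on a segment.
EXPLAIN ANALYZE SELECT * FROM test WHERE a > 1 ORDER BY a;
```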

Reference

https://www.postgresql.org/docs/current/indexes-index-only-scans.html

Origin yq.aliyun.com/articles/720749