PostgreSQL Internals (7): Index Scans

Index overview

A database index is a database object that reorganizes the data in certain fields of a table. Indexes can greatly accelerate some database operations, and the idea behind them is simple: trading space for time.

An index can be compared to the table of contents of a book. When we look for a piece of information in the book, the table of contents lets us jump straight to the right chapter, which avoids browsing through the entire book and speeds up the search.

Index classification

Common index types in Postgres include the following. The BTree index is the most widely used and is the default when creating an index.

| Index type | Index name | Description |
| --- | --- | --- |
| btree | B+ tree index | Implemented with a B+ tree. It has rich index features (multi-value, sorting, clustering, etc.), its insert, delete, and update performance is stable, and it is widely used. It is the default index type. |
| hash | Hash index | Based on a hash table structure and suited to equality comparisons. Lookups are very fast, but range queries and sorting are not supported. |
| gin | Generalized Inverted Index | Suited to values that contain multiple elements, such as full-text search documents and arrays. A GIN index stores each element key together with the list of rows that contain it, so queries are fast, but index updates and inserts are slower. |
| gist | Generalized Search Tree index | A general-purpose tree index commonly used for spatial queries and range queries. GiST indexes can handle complex spatial data types such as points, lines, and polygons, so they suit scenarios such as geographic information systems (GIS). Queries are fast, but index updates and inserts are slower. |

Example of an index scan

Let's use an example to see how indexes affect the performance of table scans. We first create a test table called articles and insert some test data into it.

CREATE TABLE articles (
  id SERIAL8 NOT NULL PRIMARY KEY,
  a text,
  b text,
  c text
);

INSERT INTO articles (a, b, c)
SELECT
  md5(random()::text),
  md5(random()::text),
  md5(random()::text)
FROM (
  SELECT * FROM generate_series(1, 1000000) AS id
) AS x;

We then look up a single row in this table, for example the one where a = '65c966eb2be73daf418c126df8dc33b5', and inspect the query plan.

The plan uses a sequential scan (Seq Scan) with an estimated cost of 22450. If we add an index on column a (BTree by default) with create index on articles (a) and run the same statement again, the plan switches to an Index Scan with an estimated cost of 8. These costs are planner estimates in arbitrary units rather than run times, but the drop from 22450 to 8 reflects a large improvement in query performance.

Scan methods

Sequential scan

The query optimizer uses a sequential scan when the query filters on a field that is not indexed, or when it estimates that the query will return most of the table. Taking the articles table again, a query for id > 100 matches almost every row, so a sequential scan is still chosen even though the id column is indexed.

Index scan

If the optimizer estimates that the query will hit only a very small amount of data, it chooses an index scan, as in the example above. The same holds for an index range scan: when the matching rows are a small fraction of the table, the index scan is the most efficient choice.
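To make the role of selectivity concrete, here is a deliberately simplified cost comparison. The five constants are the documented default planner settings; the two formulas and the function names are assumptions made for illustration and are much cruder than the real planner, which also accounts for caching, index pages, and correlation.

```python
# Default values of the PostgreSQL planner cost settings.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages: int, rows: int) -> float:
    """Read every page sequentially and evaluate the filter on every row."""
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)


def index_scan_cost(matched_rows: int) -> float:
    """Toy assumption: one random heap page read per matching row."""
    io = matched_rows * RANDOM_PAGE_COST
    cpu = matched_rows * (CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
    return io + cpu


def cheaper_scan(pages: int, rows: int, selectivity: float) -> str:
    """Pick whichever toy estimate is lower for the given selectivity."""
    matched = int(rows * selectivity)
    if index_scan_cost(matched) < seq_scan_cost(pages, rows):
        return "index scan"
    return "seq scan"


# Hypothetical table: 10,000 pages holding 1,000,000 rows.
print(cheaper_scan(10_000, 1_000_000, 0.00001))  # index scan
print(cheaper_scan(10_000, 1_000_000, 0.9))      # seq scan
```

Under these assumptions the index scan only wins while the number of matching rows stays small, because each match is charged a random page read.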

Bitmap index scan

An index scan generally reads less data, but it relies on random I/O, so it is not always cheaper than the sequential I/O of a sequential scan. When a query matches a moderate amount of data (neither few rows nor many), sequential scans and index scans each have drawbacks. A bitmap index scan usually fits this case: it orders the pages that need to be accessed, turning random I/O into sequential I/O.

The general steps are as follows:

  • Scan the index to collect the TIDs of all rows that meet the condition

  • Build a bitmap from the list of TIDs, ordered by heap page

  • Read the data records page by page in that order, so each page is read only once
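The steps above can be sketched in a few lines. This is a toy model, not the Postgres implementation: a TID is a plain (block, offset) pair, the "bitmap" is a dictionary of sets, and page reads are counted naively without any buffer cache.

```python
from collections import defaultdict

# TIDs as (block_number, offset). An index returns them in key order, which
# is effectively random with respect to heap pages.
tids_from_index = [(7, 2), (1, 5), (7, 1), (3, 4), (1, 1), (3, 9)]


def plain_index_scan_page_reads(tids):
    """Visit pages in index order; count a read whenever the page changes."""
    reads, last_block = 0, None
    for block, _offset in tids:
        if block != last_block:
            reads += 1
            last_block = block
    return reads


def bitmap_heap_scan(tids):
    """Group offsets by page, then visit each page once in ascending order."""
    bitmap = defaultdict(set)
    for block, offset in tids:
        bitmap[block].add(offset)
    return [(block, sorted(bitmap[block])) for block in sorted(bitmap)]


visits = bitmap_heap_scan(tids_from_index)
print(plain_index_scan_page_reads(tids_from_index))  # 6
print(visits)  # [(1, [1, 5]), (3, [4, 9]), (7, [1, 2])]
```

In this example the index order touches a page six times, while the bitmap order visits each of the three pages once, in ascending block order.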

Postgres therefore has several ways to scan table data, and the query optimizer selects the one with the lowest estimated cost.

Index physical storage

Indexes in Postgres are secondary indexes: in physical storage, the index data is kept separate from the corresponding table data. Each index is stored as its own relation and can be looked up in the pg_class system catalog.

Taking BTree as an example, the general characteristics of a B+ tree are:

  • Fewer tree levels: internal nodes do not store row data, so each one holds more key values. The tree has fewer levels and lookups are faster (less random I/O).

  • More stable query speed: all data is stored in the leaf nodes, so every lookup descends the same number of levels (one random I/O per level of tree height) and takes about the same time.

  • Easier range traversal: the leaf nodes of a B+ tree form an ordered linked list. A traversal first locates the first key value and then follows the linked list to read the rest.
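The last point can be illustrated with a small sketch. The Leaf class and both functions are hypothetical names for this example; the list of first keys stands in for the internal nodes, keys are assumed to be unique, and nothing here mirrors the actual Postgres page format.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys stored in this leaf
        self.next = None      # link to the right sibling


def build_leaves(sorted_keys, capacity):
    """Pack sorted keys into leaves and chain them left to right."""
    leaves = [Leaf(sorted_keys[i:i + capacity])
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Locate the leaf that may hold `low` once, then follow the chain."""
    if not leaves:
        return []
    first_keys = [leaf.keys[0] for leaf in leaves]  # stands in for inner nodes
    start = max(bisect_right(first_keys, low) - 1, 0)
    leaf, result = leaves[start], []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result       # past the range: stop walking the chain
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


leaves = build_leaves(list(range(0, 100, 5)), capacity=4)
print(range_scan(leaves, 12, 41))  # [15, 20, 25, 30, 35, 40]
```

Only the first leaf is found by searching; every later key in the range comes from following the sibling links.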

Each BTree node is physically stored as one page, and the layout of that page is similar to the layout of a heap table page.

Taking BTree as an example, the content of the index can be understood as a mapping from key value to the TID of a data tuple, where a TID consists of a block number and an offset within that block.
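A minimal sketch of that mapping, under stated assumptions: the heap is a hypothetical list of pages, offsets are 0-based for simplicity (Postgres item offsets start at 1), and the index is a sorted list searched with bisect rather than a real tree.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional, Tuple


class TID(NamedTuple):
    block: int    # heap page number
    offset: int   # slot within that page (0-based in this sketch)


# Hypothetical heap: a list of pages, each page a list of (id, a) rows.
heap = [[(1, "pear"), (2, "apple")], [(3, "fig"), (4, "kiwi")]]

# Index on column a: (key, TID) pairs kept in key order, apart from the heap.
index = sorted(
    (row[1], TID(block, offset))
    for block, page in enumerate(heap)
    for offset, row in enumerate(page)
)


def lookup(key: str) -> Optional[Tuple[TID, tuple]]:
    """Binary-search the index for `key`, then follow its TID into the heap."""
    pos = bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        tid = index[pos][1]
        return tid, heap[tid.block][tid.offset]
    return None


print(lookup("kiwi"))  # (TID(block=1, offset=1), (4, 'kiwi'))
print(lookup("plum"))  # None
```

The index itself never holds the row; it only tells the reader which page and slot to fetch.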

Index creation

When a user runs a create index on table (col) statement, it goes through stages such as syntax analysis and permission checking. Postgres then creates the index relation, updates the system metadata, and finally uses the data in the table to build the complete B-Tree index.

The main function call path is as follows:

ProcessUtility()          entry point for utility statements
  DefineIndex()           define an index (error checks, prepare the inputs of index_create())
    index_create()        create the index (create the relation file and update system catalog data)
      index_build()       outer interface for building the index
        btbuild()         B-Tree index build logic

| Function | Description |
| --- | --- |
| ProcessUtility | The unified entry point for database utility statements. For index creation it forwards to DefineIndex. |
| DefineIndex | Performs permission and error checks and initializes the parameters required by index_create. |
| index_create | Creates the index relation and the system catalog records. |
| index_build | The outer interface for building an index. It mainly calls the ambuild function of the index type. |
| btbuild | The BTree index build logic. |

Taking BTree as an example, building the index from the table data generally takes two steps: first sort the data in the table, then build the entire BTree from the bottom up out of the sorted tuples.
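The two steps can be sketched as follows. This is a toy model of sort-then-load, not the btbuild code: pages are plain Python lists, fanout is a hypothetical page capacity, and a parent page simply records the first key and position of each child.

```python
def build_bottom_up(entries, fanout):
    """Sort (key, tid) entries, fill leaf pages, then build parents upward.

    Returns the levels from leaves (index 0) to root (last index). A parent
    page holds (first_key_of_child, child_position) pairs.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    ordered = sorted(entries)                               # step 1: sort
    level = [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)]
    levels = [level]
    while len(level) > 1:                                   # step 2: go upward
        downlinks = [(page[0][0], pos) for pos, page in enumerate(level)]
        level = [downlinks[i:i + fanout]
                 for i in range(0, len(downlinks), fanout)]
        levels.append(level)
    return levels


# Hypothetical input: keys in arrival order, each with a made-up TID.
entries = [(k, (0, i)) for i, k in enumerate([9, 3, 7, 1, 5, 8, 2], start=1)]
levels = build_bottom_up(entries, fanout=2)
print([len(level) for level in levels])  # [4, 2, 1]
print(levels[-1])                        # [[(1, 0), (7, 1)]]
```

Because the input is already sorted, each level is written once from left to right and no page ever has to be split.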

Different index types call different ambuild methods here; the one for BTree is btbuild. The index access methods are abstracted behind the IndexAM interface, through which the upper-layer executor calls them.
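The dispatch idea can be shown with a small registry. Everything in this sketch is hypothetical: the AM_BUILD dictionary plays the role of the per-index-type handler table, and the two build functions are trivial stand-ins rather than real index builds.

```python
from typing import Callable, Dict, List


def btbuild_sketch(keys: List[int]) -> List[int]:
    """Stand-in for a B-Tree build: keep keys in sorted order."""
    return sorted(keys)


def hashbuild_sketch(keys: List[int]) -> Dict[int, List[int]]:
    """Stand-in for a hash build: place keys into four buckets."""
    buckets: Dict[int, List[int]] = {b: [] for b in range(4)}
    for key in keys:
        buckets[key % 4].append(key)
    return buckets


# Hypothetical registry: index type name -> build routine.
AM_BUILD: Dict[str, Callable] = {
    "btree": btbuild_sketch,
    "hash": hashbuild_sketch,
}


def index_build_sketch(am_name: str, keys: List[int]):
    """The outer interface only looks up and calls the type-specific routine."""
    return AM_BUILD[am_name](keys)


print(index_build_sketch("btree", [3, 1, 2]))  # [1, 2, 3]
print(index_build_sketch("hash", [1, 4, 5]))   # {0: [4], 1: [1, 5], 2: [], 3: []}
```

The caller never needs to know which index type it is building, which is the point of putting the access methods behind one interface.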

Index scan execution

The three steps of an index scan in the executor are:

  • ExecInitIndexScan

  • ExecIndexScan

  • ExecEndIndexScan

ExecInitIndexScan

This step is mainly responsible for initializing the index scan state structure, IndexScanState. Its core task is to convert the filter conditions of the index scan into scan keys (ScanKey).

  • A ScanKey mainly stores the index column, the comparison function, and the value to compare against. It describes one complete filter condition and is used by the index scan

  • If the filter condition is a complex expression, iss_RuntimeKeys is used to handle it

The main fields of IndexScanState:

| Type | Field | Description |
| --- | --- | --- |
| List* | indexqualorig | The index filter conditions |
| ScanKeyData | iss_ScanKeys | Scan keys for quals whose right-hand operand is a constant |
| IndexRuntimeKeyInfo | iss_RuntimeKeys | Used when the right-hand operand of a qual is not a constant and the expression value must be computed during execution; the expression information is stored in IndexRuntimeKeyInfo |

The main focus of the Init phase is the ExecIndexBuildScanKeys function, which converts the scan filter conditions into scan keys.

Index filter conditions fall into the following five cases:

  • Constants or common operations: stored directly in the ScanKey

  • Non-constant expressions: the executor node cannot obtain the value of the expression during initialization, so the expression is kept in iss_RuntimeKeys until execution

  • RowCompareExpr, for example the filter condition "(indexkey1, indexkey2) > (1, 2)": this is a combination of several filter conditions, so all sub-conditions are traversed and stored in iss_ScanKeys or iss_RuntimeKeys as appropriate

  • ScalarArrayOpExpr, for example the filter condition "indexkey1 = ANY (1,10,20)": if the index supports array-based searches, the values are stored in ScanKey or RuntimeKey as appropriate; if it does not, as with Hash, GIN, and GiST indexes, the filter condition is stored in arrayKeys

  • NullTest, which checks whether the index key is NULL, such as "indexkey IS NULL/IS NOT NULL": only the corresponding flag of the ScanKey needs to be set
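The first two cases can be sketched as below. The classes and function names are hypothetical and model only the split between constants and runtime expressions; row comparisons, array searches, and NULL tests are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple, Union


@dataclass
class Const:
    value: Any                    # value known at initialization time


@dataclass
class RuntimeExpr:
    compute: Callable[[], Any]    # value only known during execution


@dataclass
class Qual:
    column: str
    op: str
    rhs: Union[Const, RuntimeExpr]


def build_scan_keys(quals: List[Qual]) -> Tuple[list, list]:
    """Create one scan key per qual; record the ones to fill in later."""
    scan_keys, runtime_keys = [], []
    for qual in quals:
        key = {"column": qual.column, "op": qual.op, "argument": None}
        if isinstance(qual.rhs, Const):
            key["argument"] = qual.rhs.value
        else:
            # Remember which key to patch once the expression is evaluated.
            runtime_keys.append((key, qual.rhs))
        scan_keys.append(key)
    return scan_keys, runtime_keys


def eval_runtime_keys(runtime_keys: list) -> None:
    """At execution time, compute each expression and fill in its scan key."""
    for key, expr in runtime_keys:
        key["argument"] = expr.compute()


quals = [Qual("id", "=", Const(42)),
         Qual("a", ">", RuntimeExpr(lambda: "x" + "y"))]
scan_keys, runtime_keys = build_scan_keys(quals)
eval_runtime_keys(runtime_keys)
print(scan_keys[1]["argument"])  # xy
```

The runtime entry keeps a reference to its scan key, so evaluating the expression later completes the key in place.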

ExecIndexScan

This step is responsible for reading tuples through the index and returning them to the parent node in the executor. The function IndexNext repeatedly scans the index, reads tuples, wraps each one in a TupleTableSlot, and passes it to the parent node.

  • The main parameter of this function is IndexScanDesc, which holds the status information of the scan in progress

  • If xs_heap_continue is set, the scan is still on a HOT chain, so no new TID is requested

  • Otherwise, call index_getnext_tid to return a TID

    • Find the internal interface function corresponding to amgettuple in the pg_am table

    • Call this function (such as btgettuple for BTree), which returns a TID according to the specific index implementation

  • Call index_fetch_heap to get the actual tuple
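The loop described above can be sketched as a generator. ToyIndexScan is a hypothetical stand-in for the scan descriptor, its two methods only play the roles of index_getnext_tid and index_fetch_heap, and HOT chain handling is omitted.

```python
from typing import Dict, Iterator, List, Optional, Tuple

TID = Tuple[int, int]   # (block number, offset within the block)


class ToyIndexScan:
    """Hypothetical stand-in for the scan descriptor: holds scan state."""

    def __init__(self, tids: List[TID], heap: Dict[TID, dict]):
        self._tids = iter(tids)     # TIDs the index would return, in key order
        self._heap = heap

    def getnext_tid(self) -> Optional[TID]:
        """Role of index_getnext_tid: ask the index for the next match."""
        return next(self._tids, None)

    def fetch_heap(self, tid: TID) -> Optional[dict]:
        """Role of index_fetch_heap: fetch the tuple, or None if not found."""
        return self._heap.get(tid)


def index_next(scan: ToyIndexScan) -> Iterator[dict]:
    """Hand one tuple at a time to the caller, as a slot is passed upward."""
    while True:
        tid = scan.getnext_tid()
        if tid is None:
            return                      # index exhausted: the scan is finished
        row = scan.fetch_heap(tid)
        if row is not None:             # skip TIDs with no fetchable tuple
            yield row


heap = {(0, 1): {"id": 1}, (2, 3): {"id": 7}}
scan = ToyIndexScan([(0, 1), (5, 5), (2, 3)], heap)
print(list(index_next(scan)))  # [{'id': 1}, {'id': 7}]
```

The generator mirrors the pull-based style of the executor: the parent asks for one tuple, and the scan advances only as far as needed to produce it.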

ExecEndIndexScan

This step is mainly responsible for cleanup: it releases the memory context used to compute RuntimeKey values and closes the related index and data tables.

Origin blog.csdn.net/m0_54979897/article/details/130198903