Doris-07-Detailed introduction to index (prefix index, Ordinal index, Zone Map index, Bitmap index, Bloom Filter index, NGram BloomFilter index, inverted index)

Index

Introduction

The Apache Doris storage engine uses an LSM tree-like structure to provide fast data writing support. When importing data, the data will first be written into the MemTable corresponding to the Tablet. When the MemTable is full, the data in the MemTable will be flushed to the disk to generate immutable Segment files of no more than 256MB.

MemTable uses the data structure of SkipList to temporarily store data in memory. SkipList sorts the data rows according to Key. Therefore, the Segment files written to the disk are also sorted by Key. The bottom layer of Apache Doris uses column storage to store data, and each column of data is divided into multiple Data Pages.

In order to improve data reading efficiency, the underlying storage engine of Apache Doris provides a variety of index types. Currently, Doris mainly supports two types of indexes:

  1. Built-in smart indexes, including prefix indexes and ZoneMap indexes.
  2. Secondary indexes created manually by users, including inverted indexes, BloomFilter indexes, NGram BloomFilter indexes, and Bitmap indexes.

The process of flushing data from MemTable to disk is divided into two stages:

  • The first stage is to convert the row storage structure in MemTable into a column storage structure in memory, and generate a corresponding index structure for each column;
  • The second stage is to write the converted column storage structure to disk and generate a Segment file.

Prefix index

Index generation

Unlike traditional database designs, Doris does not support creating indexes on arbitrary columns. OLAP databases with MPP architecture such as Doris usually process large amounts of data by improving concurrency.

Essentially, Doris' data is stored in a data structure similar to SSTable (Sorted String Table). This structure is an ordered data structure that can be sorted and stored according to specified columns. In this data structure, searching based on sorted columns will be very efficient.

In the three data models (Aggregate, Unique, and Duplicate), the underlying data is sorted and stored according to the columns specified in AGGREGATE KEY, UNIQUE KEY, and DUPLICATE KEY in the respective CREATE TABLE statements. The prefix index is an index that, based on this sorted storage, quickly locates data for a given prefix of the key columns.

Doris uses the first 36 bytes of a row's key columns as the prefix index of that row. When a VARCHAR column is encountered, the prefix index is truncated at that column.

The prefix index is a sparse index. During data flushing, a prefix index entry is generated for every fixed number of data rows (1024 rows by default). Each entry encodes the prefix fields of the first data row in its index interval, and the encoding preserves the sort order of the prefix fields: the earlier a prefix value sorts, the earlier its encoded value sorts. Because Segment files are sorted by Key, the prefix index entries are also sorted by Key.
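As a rough illustration of how such a short key might be assembled from the leading key columns (a simplified sketch; the column layout and encoding here are assumptions, not Doris's actual order-preserving encoding):

#include <algorithm>
#include <string>
#include <vector>

// Simplified sketch: build a short key of at most 36 bytes from the leading
// key columns of one row. `encoded` is assumed to already be an
// order-preserving byte encoding of the column value.
static constexpr size_t kMaxShortKeyBytes = 36;

struct KeyColumnValue {
    std::string encoded;  // order-preserving encoding of one key column
    bool is_varchar;      // variable-length string column?
};

std::string build_short_key(const std::vector<KeyColumnValue>& key_columns) {
    std::string short_key;
    for (const auto& col : key_columns) {
        size_t remaining = kMaxShortKeyBytes - short_key.size();
        if (remaining == 0) break;
        // Append at most `remaining` bytes of this column's encoding.
        short_key.append(col.encoded, 0, std::min(col.encoded.size(), remaining));
        // A VARCHAR column truncates the prefix index: later columns are ignored.
        if (col.is_varchar) break;
    }
    return short_key;
}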

The prefix index data of a Segment file is stored in an independent Short Key Page, which contains the encoded data of each prefix index entry, the offset of each prefix index entry, the footer of the Short Key Page, and the Checksum information of the Short Key Page. The footer of the Short Key Page records the Page type, the size of the prefix index encoding data, the size of the prefix index offset data, the number of prefix index entries, and other information.

The offset and size of the Short Key Page in the Segment will be saved in the footer of the Segment file so that the prefix index data can be correctly loaded from the Segment file when data is read. The storage structure of the prefix index is shown in the figure.

Query filtering

During data query, the Segment file will be opened, the offset and size of the Short Key Page will be obtained from the footer, then the index data in the Short Key Page will be read from the Segment file, and each prefix index item will be parsed.

If the query filter condition contains a prefix field, you can use the prefix index to quickly filter rows. Query filter conditions will be divided into multiple Key Ranges. The method for row filtering on a Key Range is as follows:

  • Find the row number upper_rowid corresponding to the upper bound of the Key Range within the entire row range of the Segment (that is, find the first row in the Segment that is greater than the upper bound of the Key Range).

    • Encode the prefix field key of the upper bound of the Key Range.
    • Find the lower bound start of the range where the key may appear. Search the prefix index for the first index entry whose encoding is equal to or greater than the key's encoding. If such an entry is found and it is not the first prefix index entry, record the row number of the previous entry as start (the prefix index is sparse, so the first data row equal to or greater than the key may lie after the row corresponding to the previous entry); if the entry found is the first prefix index entry, record its row number as start. If no entry whose encoding is equal to or greater than the key's encoding is found, record the row number of the last prefix index entry as start (the first row equal to or greater than the key may lie after the last entry).
    • Find the upper bound end of the range where the key may appear. Search the prefix index for the first index entry whose encoding is greater than the key's encoding. If such an entry is found, record its row number as end; if no entry greater than the key is found, record the row number of the last row of the Segment as end.
    • Use binary search to find, within the row range between start and end, the first row whose encoding is greater than the key, and record its row number as upper_rowid.

    Note: The prefix index is a sparse index and cannot pinpoint the row where the key is located. It can only roughly locate the range in which the key may appear; binary search then finds the exact position, as shown in the figure.

  • Find the row number lower_rowid corresponding to the lower bound of the Key Range within the range 0 ~ upper_rowid (that is, find the first row in the Segment that is equal to or greater than the lower bound of the Key Range). The method is the same as for the upper bound and is not repeated here.

  • Get the row range of the Key Range. All data rows between lower_rowid and upper_rowid form the row range that the current Key Range needs to scan.
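Putting the steps above together, the lookup is a coarse search over the sparse index entries followed by a binary search over the data rows. A minimal sketch, assuming the index entries and a helper that reads the encoded key of a data row (names and container types are illustrative, not Doris's actual classes):

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct ShortKeyEntry {
    std::string key;  // encoded prefix of the first row in this index interval
    uint32_t rowid;   // row number of that row within the segment
};

// Sketch: find the first row whose encoded prefix key is greater than `target`.
// `read_row_key(r)` is assumed to return the encoded prefix key of row r.
uint32_t seek_first_row_greater(const std::vector<ShortKeyEntry>& sparse_index,
                                uint32_t num_rows,
                                const std::string& target,
                                const std::function<std::string(uint32_t)>& read_row_key) {
    // Coarse lower bound `start`: the row of the index entry just before the
    // first entry whose key is >= target (the real boundary may lie inside
    // that interval because the index is sparse).
    size_t i = 0;
    while (i < sparse_index.size() && sparse_index[i].key < target) ++i;
    uint32_t start = (i == 0) ? 0
                   : (i == sparse_index.size() ? sparse_index.back().rowid
                                               : sparse_index[i - 1].rowid);
    // Coarse upper bound `end`: the row of the first index entry whose key is
    // greater than target, or the end of the segment if there is none.
    size_t j = 0;
    while (j < sparse_index.size() && !(sparse_index[j].key > target)) ++j;
    uint32_t end = (j == sparse_index.size()) ? num_rows : sparse_index[j].rowid;

    // Binary search inside [start, end) for the first row with key > target.
    while (start < end) {
        uint32_t mid = start + (end - start) / 2;
        if (read_row_key(mid) > target) end = mid; else start = mid + 1;
    }
    return start;  // first row whose key is greater than target (upper_rowid)
}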

Ordinal index

Index generation

The bottom layer of Apache Doris uses column storage to store data, and each column of data is divided into multiple Data Pages.

When data is flushed, an Ordinal index item is generated for each Data Page. It records the offset of the Data Page in the Segment file, the size of the Data Page, and the starting row number of the Data Page. The Ordinal index items of all Data Pages are saved in an Ordinal Index Page, and the offset and size of the Ordinal Index Page in the Segment file are saved in the footer of the Segment file, so that the Data Page can be located through a two-level lookup when reading data (first find the Ordinal Index Page through the Segment footer, then find the Data Page through the index items in the Ordinal Index Page).

The Ordinal Index Page contains the following information:

  • All Ordinal index item data;
  • The footer of the Ordinal Index Page: contains information such as the type of the current Page, the size of the Ordinal index item data, the number of Ordinal index items, etc.;
  • Checksum information of the Ordinal Index Page.

If a column has only one Data Page, that is, the column has only one Ordinal index item, then the Ordinal index data of the column does not need to be saved in the Segment file. It is enough to store the offset and size of this single Data Page in the footer of the Segment file; when reading data, this Data Page can be found directly through the Segment footer. The storage structure of the Ordinal index is shown in the figure.

The purpose of the Ordinal index is to give other index types a uniform way to locate a Data Page, hiding from them the physical offset of the Data Page in the Segment file.

Query filtering

During data query, the Ordinal index data of each column will be loaded.

Determine whether the current column has an Ordinal Index Page through the Meta information of the Ordinal index recorded in the Segment footer, that is, determine whether the current column has multiple Data Pages.

If an Ordinal Index Page exists in the current column, obtain the offset of the Ordinal Index Page in the Segment and the size of the Ordinal Index Page from the Segment footer, then read the Ordinal Index Page data from the Segment file, and parse out each Ordinal index item. You can obtain the starting row number of each Data Page in the current column, the offset of the Data Page in the Segment, and the size of the Data Page through the Ordinal index item.

If there is no Ordinal Index Page in the current column, you can directly obtain the offset of the only Data Page in the current column in the Segment and the size of the Data Page from the Segment footer.
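For illustration, once the Ordinal index items are parsed, finding the Data Page that contains a given row number is just a binary search over the starting row numbers (a simplified sketch; the struct fields are assumptions mirroring the description above):

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

struct OrdinalIndexEntry {
    uint64_t first_rowid;  // starting row number of the Data Page
    uint64_t page_offset;  // offset of the Data Page in the Segment file
    uint32_t page_size;    // size of the Data Page in bytes
};

// Returns the position of the Data Page containing `rowid`.
// Entries are sorted by first_rowid (the first entry starts at row 0).
size_t find_data_page(const std::vector<OrdinalIndexEntry>& entries, uint64_t rowid) {
    // First entry whose first_rowid is strictly greater than rowid ...
    auto it = std::upper_bound(entries.begin(), entries.end(), rowid,
                               [](uint64_t r, const OrdinalIndexEntry& e) {
                                   return r < e.first_rowid;
                               });
    // ... the Data Page we want is the one just before it.
    return static_cast<size_t>(std::distance(entries.begin(), it)) - 1;
}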

Zone Map Index

Apache Doris adds a Zone Map index to each column in a Segment file, and also to each Data Page within the column. A Zone Map index item records, for the whole column or for a single Data Page, the maximum value (max value), the minimum value (min value), whether there are null values (has null), and whether there are non-null values (has not null). At initialization, max value is set to the minimum value of the column type, min value is set to the maximum value of the column type, and has null and has not null are set to false.

Index generation

When data is refreshed, a Zone Map index item will be created for each Data Page. Every time a piece of data is added to the Data Page, the Zone Map index item of the Data Page will be updated.

  • If the added data is null, set the has null flag of the Zone Map index item to true, otherwise, set the has not null flag of the Zone Map index item to true.
  • If the added data is less than the min value of the Zone Map index item, use the current data to update the min value; if the added data is greater than the max value of the Zone Map index item, use the current data to update the max value.

When a Data Page is full, the column-level Zone Map index item is updated. If the min value of the Data Page index item is less than the min value of the column index item, the column's min value is updated with the Data Page's min value; if the max value of the Data Page index item is greater than the column's max value, the column's max value is updated with the Data Page's max value; if the has null flag of the Data Page index item is true, the column's has null flag is set to true; if the has not null flag of the Data Page index item is true, the column's has not null flag is set to true. The process of updating the Zone Map index is shown in Figure 4.
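A minimal sketch of this update logic, assuming an int64 column for simplicity (Doris keeps typed min/max per column; the struct here is only illustrative):

#include <algorithm>
#include <cstdint>
#include <limits>
#include <optional>

struct ZoneMap {
    int64_t min_value = std::numeric_limits<int64_t>::max();  // initialized to the type's maximum
    int64_t max_value = std::numeric_limits<int64_t>::min();  // initialized to the type's minimum
    bool has_null = false;
    bool has_not_null = false;
};

// Update a Data Page's zone map with one incoming value (nullopt means NULL).
void update_page_zone_map(ZoneMap& page, std::optional<int64_t> value) {
    if (!value) { page.has_null = true; return; }
    page.has_not_null = true;
    page.min_value = std::min(page.min_value, *value);
    page.max_value = std::max(page.max_value, *value);
}

// When a Data Page is full, merge its zone map into the column-level zone map.
void merge_into_column_zone_map(ZoneMap& column, const ZoneMap& page) {
    column.min_value = std::min(column.min_value, page.min_value);
    column.max_value = std::max(column.max_value, page.max_value);
    column.has_null = column.has_null || page.has_null;
    column.has_not_null = column.has_not_null || page.has_not_null;
}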

The Zone Map index item of each Data Page in the column is serialized and saved in a Zone Map Index Page.

The Zone Map Index Page contains the following information: Zone Map index item data, Zone Map Index Page footer, and Zone Map Index Page Checksum information.

The footer of the Zone Map Index Page contains the type of the current Page, the size of the Zone Map index item data in the current Page, the number of Zone Map index items in the current Page, the sequence number of the first index item in the current Page among all Zone Map index items of the column, and other information.

After a Zone Map Index Page is full, a new Zone Map Index Page will be created to record the subsequent Zone Map index items of the column.

If a column has multiple Zone Map Index Pages, the Zone Map index of that column uses a two-level indexing mechanism. The second level consists of the Zone Map Index Pages, which store the Zone Map index data of the Data Pages. Each Zone Map Index Page generates an Ordinal index item, and the Ordinal index items of all Zone Map Index Pages are saved in one Ordinal Index Page as the first-level index (note that the Ordinal index here is different from the Ordinal index described earlier: here it points to Zone Map Index Pages, whereas the earlier one points to Data Pages).

Each Ordinal index item consists of a key and a value. The key records the sequence number, among all Zone Map index items of the column, of the first index item in the current Zone Map Index Page; the value records the offset and size of the current Zone Map Index Page in the Segment file.

The Ordinal Index Page contains the following information: the Ordinal index data of all Zone Map Index Pages, the footer of the Ordinal Index Page, and the Checksum information of the Ordinal Index Page. The footer of the Ordinal Index Page contains the type of the current Page, the size of the index data in the current Page, the number of index items in the current Page, etc.

The offset and size of the first-level Ordinal Index Page in the Segment file are recorded in the footer of the Segment file. If a column has only one Zone Map Index Page, no two-level index is needed, and the offset and size of this single Zone Map Index Page in the Segment file are recorded in the footer of the Segment file. The storage structure of the Zone Map index is shown in the figure.

Query filtering

During data query, the Zone Map index data of each column will be loaded, and the Zone Map index data of each Data Page will be parsed.

Determine whether the Zone Map of the current column contains a two-level index through the Meta information of the Zone Map index recorded in the Segment footer.

If a two-level index exists, the Segment footer records the offset and size of the first-level Ordinal Index Page in the Segment file. The first-level Ordinal Index Page is loaded first, and the key and value of each Ordinal index item are parsed: the key records the sequence number of the first index item of each Zone Map Index Page among all Zone Map index items of the column, and the value records the offset and size of each Zone Map Index Page in the Segment file.

Otherwise, the Zone Map index of the current column only contains one Zone Map Index Page, and the offset and size of the Zone Map Index Page in the Segment file are recorded in the Segment footer. The Zone Map index data of each Data Page can be parsed through the Zone Map Index Page, including the maximum value (max value), minimum value (min value), whether there is a null value (has null) and whether there is a non-null value (has not null) information.

The method of using Zone Map to filter Data Page is as follows:

  • If the operator of the filter condition is not IS and the has null flag of the Zone Map index is true (the Data Page contains NULL values), the Data Page cannot be filtered out.
  • The filter condition is field = value. If the value is between the maximum value and the minimum value of the Zone Map index, the Data Page cannot be filtered out.
  • The filter condition is field != value. Unless the minimum value and the maximum value of the Zone Map index are both equal to value (that is, the Data Page contains only value), the Data Page cannot be filtered out.
  • The filter condition is field < value. If the value is greater than the minimum value of the Zone Map index, the Data Page cannot be filtered out.
  • The filter condition is field <= value. If value is greater than or equal to the minimum value of the Zone Map index, the Data Page cannot be filtered out.
  • The filter condition is field > value. If the value is less than the maximum value of the Zone Map index, the Data Page cannot be filtered out.
  • The filter condition is field >= value. If the value is less than or equal to the maximum value of the Zone Map index, the Data Page cannot be filtered out.
  • The filter condition is field IN {value1, value2, …}. If at least one value among value1, value2, ... is between the maximum value and minimum value of the Zone Map index, the Data Page cannot be filtered out.
  • The filter condition is field IS NULL. If the has null of the Zone Map index is true (the Data Page contains NULL values), the Data Page cannot be filtered out.
  • The filter condition is field IS NOT NULL. If the has not null of the Zone Map index is true (the Data Page contains non-NULL values), the Data Page cannot be filtered out.
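Putting these rules together, a Data Page can be skipped only when none of the conditions above hold. A condensed sketch over int64 values (illustrative only; Doris evaluates these rules on the column's native type):

#include <cstdint>

enum class Op { EQ, NE, LT, LE, GT, GE, IS_NULL, IS_NOT_NULL };

struct ZoneMap {
    int64_t min_value;
    int64_t max_value;
    bool has_null;
    bool has_not_null;
};

// Returns true when the Data Page can be skipped for the predicate
// `field op value`, i.e. when none of the rules above say it must be kept.
bool can_prune(const ZoneMap& zm, Op op, int64_t value) {
    // For non-IS operators, a page that contains NULL values is never pruned.
    if (op != Op::IS_NULL && op != Op::IS_NOT_NULL && zm.has_null) return false;
    switch (op) {
        case Op::EQ:          return value < zm.min_value || value > zm.max_value;
        case Op::NE:          return zm.min_value == zm.max_value && zm.min_value == value;
        case Op::LT:          return value <= zm.min_value;  // no row can be < value
        case Op::LE:          return value <  zm.min_value;
        case Op::GT:          return value >= zm.max_value;
        case Op::GE:          return value >  zm.max_value;
        case Op::IS_NULL:     return !zm.has_null;
        case Op::IS_NOT_NULL: return !zm.has_not_null;
    }
    return false;
}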

For Data Pages that are not filtered out by the Zone Map index, the Ordinal index can be used to quickly obtain their row ranges. The Ordinal index item of a Data Page gives the starting row number start of that Data Page, and the next Ordinal index item gives the starting row number end of the next Data Page; the half-open interval [start, end) is the row range of the current Data Page.

Bitmap index

To speed up queries, Apache Doris allows users to add a Bitmap index to selected columns. A Bitmap index consists of two parts:

  • Ordered dictionary: stores all the distinct values of the column in sorted order.
  • Roaring bitmap of each dictionary value: stores, for each value in the ordered dictionary, a Roaring bitmap of the row numbers in which that value appears in the column.

For example, as shown in the figure, a column of data is [x, x, y, y, y, z, y, x, z, x] and contains 10 rows in total. The ordered dictionary of the column's Bitmap index is {x, y, z}, and the bitmaps corresponding to x, y, and z are:

Bitmap of x: [0, 1, 7, 9]

Bitmap of y: [2, 3, 4, 6]

Bitmap of z: [5, 8]
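This example can be reproduced in a few lines of code; here std::map stands in for the ordered dictionary and a sorted vector of row ids stands in for each Roaring bitmap:

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> column = {"x", "x", "y", "y", "y",
                                       "z", "y", "x", "z", "x"};
    // std::map keeps its keys sorted, acting as the ordered dictionary; the
    // vector of row ids plays the role of the Roaring bitmap of each value.
    std::map<std::string, std::vector<uint32_t>> bitmap_index;
    for (uint32_t rowid = 0; rowid < column.size(); ++rowid) {
        bitmap_index[column[rowid]].push_back(rowid);
    }
    for (const auto& [value, rows] : bitmap_index) {
        std::cout << "Bitmap of " << value << ":";
        for (uint32_t r : rows) std::cout << ' ' << r;
        std::cout << '\n';  // x: 0 1 7 9, y: 2 3 4 6, z: 5 8
    }
    return 0;
}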

Create statement:

CREATE INDEX [IF NOT EXISTS] index_name ON table1 (siteid) USING BITMAP COMMENT 'balabala';

Index generation

When data is flushed, a Bitmap index is created for each column specified by the user. Each time a value is added to the column, the Bitmap index of the column is updated: first check whether the added value already exists in the ordered dictionary of the Bitmap index. If it does, the Roaring bitmap corresponding to that dictionary value is updated directly. If it does not, the value is added to the ordered dictionary and a Roaring bitmap is created for it. NULL values also get their own separate Roaring bitmap.

Dictionary data of Bitmap index and Roaring bitmap data are stored separately (one-to-one correspondence).

The dictionary values of the column's Bitmap index are saved in Dict Pages in sorted order. A Dict Page contains the following information: the dictionary data of the Bitmap index, the footer of the Dict Page, and the Checksum information of the Dict Page. The footer of the Dict Page contains the type of the current Page, the size of the dictionary data in the current Page, the number of dictionary values in the current Page, the sequence number of the first dictionary value in the current Page among all dictionary values of the column, and other information. The dictionary data of the Bitmap index is compressed in LZ4F format.

After a Dict Page is full, a new Dict Page will be created to record the subsequent dictionary data of the column.

If a column has multiple Dict Pages, a two-level indexing mechanism is used: the second level consists of the Dict Pages, which store the dictionary data of the Bitmap index. Each Dict Page generates a Value index item, which records the encoding of the first dictionary value in the current Dict Page together with the offset and size of the current Dict Page in the Segment file. The Value index items of all Dict Pages are saved in a Value Index Page as the first-level index.

The Value Index Page contains the following information: Value index data of all Dict Pages, footer of the Value Index Page, and Checksum information of the Value Index Page. The footer of the Value Index Page contains the type of the current Page, the size of the index data in the current Page, the number of index items in the current Page, etc.

The offset and size of the primary index Value Index Page in the Segment file will be recorded in the footer of the Segment file. If there is only one Dict Page for a certain column, no two-level index is needed. The offset and size of this unique Dict Page in the Segment file will be recorded in the footer of the Segment file. The storage structure of dictionary data indexed by Bitmap is as shown in the figure.

The Roaring bitmap data of the column's Bitmap index is saved in Bitmap Pages.

The Bitmap Page contains the following information: the Roaring bitmap data of the Bitmap index, the footer of the Bitmap Page, and the Checksum information of the Bitmap Page.

The footer of the Bitmap Page contains the type of the current Page, the size of the Roaring bitmap data in the current Page, the number of Roaring bitmaps in the current Page, the sequence number of the first Roaring bitmap in the current Page among all Roaring bitmaps of the column's Bitmap index, and other information. The Roaring bitmap data of the Bitmap index is not compressed.

After a Bitmap Page is full, a new Bitmap Page will be created to record the subsequent Roaring bitmap data of the column.

If a column has multiple Bitmap Pages, a two-level indexing mechanism is used. The second level consists of the Bitmap Pages, which store the Roaring bitmap data of the Bitmap index. Each Bitmap Page generates an Ordinal index item, and the Ordinal index items of all Bitmap Pages are saved in an Ordinal Index Page as the first-level index.

Each Ordinal index item consists of a key and a value. The key records the sequence number of the first Roaring bitmap in the current Bitmap Page among all Roaring bitmaps of the column's Bitmap index, and the value records the offset and size of the current Bitmap Page in the Segment file.

The Ordinal Index Page contains the following information: Ordinal index data of all Bitmap Pages, footer of the Ordinal Index Page, and Checksum information of the Ordinal Index Page. The footer of the Ordinal Index Page contains the type of the current Page, the size of the index data in the current Page, the number of index items in the current Page, etc.

The offset and size of the primary index Ordinal Index Page in the Segment file will be recorded in the footer of the Segment file. If there is only one Bitmap Page for a certain column, no two-level index is needed. The offset and size of this unique Bitmap Page in the Segment file will be recorded in the footer of the Segment file. The storage structure of Roaring bitmap data of Bitmap index is as shown in the figure.

Query filtering

During data query, the Bitmap index data of the column will be loaded, and the ordered dictionary and Roaring bitmap data will be parsed.

  • First, use the dictionary Meta information of the Bitmap index recorded in the Segment footer to determine whether the dictionary of the current column's Bitmap index has a two-level index. If it does, the Segment footer records the offset and size of the first-level Value Index Page in the Segment file; load the Value Index Page first and parse each Value index item to obtain the first dictionary value of each Dict Page and the offset and size of each Dict Page in the Segment file. Otherwise, the Bitmap index of the current column contains only one Dict Page, and the Segment footer records the offset and size of that Dict Page in the Segment file. Each dictionary value can then be parsed from the Dict Page.
  • Then, use the Roaring bitmap Meta information of the Bitmap index recorded in the Segment footer to determine whether the Roaring bitmap part of the current column's Bitmap index has a two-level index. If it does, the Segment footer records the offset and size of the first-level Ordinal Index Page in the Segment file; load the Ordinal Index Page first and parse each Ordinal index item to obtain, for each Bitmap Page, the sequence number of its first Roaring bitmap among all Roaring bitmaps of the column's Bitmap index, along with the offset and size of each Bitmap Page in the Segment file. Otherwise, the Bitmap index of the current column contains only one Bitmap Page, and the Segment footer records the offset and size of that Bitmap Page in the Segment file. The Roaring bitmap corresponding to each dictionary value can then be parsed from the Bitmap Page.

Dict Page and Bitmap Page are loaded only when Bitmap index is actually used for data filtering.

The method of using a certain query filter condition to filter rows is as follows:

  • The filter condition is field = value. Find the first dictionary value that is equal to or greater than value in the Dict Page and obtain its ordinal in the ordered dictionary. If the dictionary value found is exactly equal to value, read the bitmap at position ordinal from the Bitmap Page; this bitmap represents the rows that remain after filtering by the query condition.
  • The filter condition is field != value. Find the first dictionary value that is equal to or greater than value in the Dict Page and obtain its ordinal in the ordered dictionary. If the dictionary value found is exactly equal to value, read the bitmap at position ordinal from the Bitmap Page; this bitmap represents the rows that need to be filtered out.
  • The filter condition is field < value. Find the first dictionary value that is equal to or greater than value in the Dict Page and obtain its ordinal in the ordered dictionary. Read the first ordinal bitmaps (positions 0 through ordinal - 1) from the Bitmap Page; the union of these bitmaps represents the rows that remain after filtering by the query condition.
  • The filter condition is field <= value. Find the first dictionary value that is equal to or greater than value in the Dict Page and obtain its ordinal in the ordered dictionary. If the dictionary value found is exactly equal to value, read the first ordinal + 1 bitmaps from the Bitmap Page; if the dictionary value found is greater than value, read the first ordinal bitmaps. The union of these bitmaps represents the rows that remain after filtering by the query condition.
  • The filter condition is field > value. Find the first dictionary value that is equal to or greater than value in the Dict Page and obtain its ordinal in the ordered dictionary. If the dictionary value found is exactly equal to value, read all bitmaps after position ordinal from the Bitmap Page; if the dictionary value found is greater than value, read the bitmap at position ordinal and all bitmaps after it. The union of these bitmaps represents the rows that remain after filtering by the query condition.
  • The filter condition is field >= value. Find the first dictionary value that is equal to or greater than value in the Dict Page and obtain its ordinal in the ordered dictionary. Read the bitmap at position ordinal and all bitmaps after it from the Bitmap Page. The union of these bitmaps represents the rows that remain after filtering by the query condition.
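For instance, evaluating field <= value with the index amounts to a binary search in the ordered dictionary followed by a union of bitmaps. A simplified sketch (sorted row-id vectors again stand in for Roaring bitmaps; the real implementation unions Roaring bitmaps directly):

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// dict[i] is the i-th dictionary value; bitmaps[i] is its sorted row-id list.
std::vector<uint32_t> rows_less_equal(const std::vector<std::string>& dict,
                                      const std::vector<std::vector<uint32_t>>& bitmaps,
                                      const std::string& value) {
    // First dictionary value that is equal to or greater than `value`.
    size_t ordinal = std::lower_bound(dict.begin(), dict.end(), value) - dict.begin();
    // field <= value keeps the first `ordinal` bitmaps, plus bitmap `ordinal`
    // itself when the dictionary value found is exactly equal to `value`.
    size_t count = ordinal + ((ordinal < dict.size() && dict[ordinal] == value) ? 1 : 0);
    std::vector<uint32_t> result;
    for (size_t i = 0; i < count; ++i) {
        std::vector<uint32_t> merged;
        std::set_union(result.begin(), result.end(),
                       bitmaps[i].begin(), bitmaps[i].end(),
                       std::back_inserter(merged));
        result = std::move(merged);
    }
    return result;  // sorted row ids that remain after the filter
}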

Applicable scenarios

Apache Doris supports creating Bitmap indexes on specified columns when creating a table. You can also execute the Alter Table command to add a Bitmap index to an already created table.

ALTER TABLE table_name ADD INDEX index_name (column_name) USING BITMAP COMMENT '';

Currently, Bitmap indexes are only supported on columns of TINYINT, SMALLINT, INT, UNSIGNEDINT, BIGINT, LARGEINT, CHAR, VARCHAR, DATE, DATETIME, BOOL, and DECIMAL types; other column types are not supported. Bitmap indexes are better suited to equality or range queries on columns with low cardinality.

Bloom Filter Index

Apache Doris allows users to add a Bloom Filter index to columns with a large number of distinct values. Bloom Filter indexes are generated at the granularity of the Data Page. When data is written, each value written to the Data Page is recorded, and when a Data Page is full, a Bloom Filter index is generated for it based on all the distinct values in the page. When querying, filter conditions on columns with a Bloom Filter index are checked against the Bloom Filter of each Data Page; if the Bloom Filter of a Data Page does not hit, the required data is not in that Data Page, so the Data Page can be quickly filtered out and unnecessary data reads avoided.

Create statement:

CREATE TABLE IF NOT EXISTS sale_detail_bloom (
    sale_date date NOT NULL COMMENT "sale date",
    customer_id int NOT NULL COMMENT "customer id",
    saler_id int NOT NULL COMMENT "salesperson id",
    sku_id int NOT NULL COMMENT "SKU id",
    category_id int NOT NULL COMMENT "product category",
    sale_count int NOT NULL COMMENT "sales quantity",
    sale_price DECIMAL(12,2) NOT NULL COMMENT "unit price",
    sale_amt DECIMAL(20,2) COMMENT "total sales amount"
)
DUPLICATE KEY(sale_date, customer_id, saler_id, sku_id, category_id)
PARTITION BY RANGE(sale_date)
(
PARTITION P_202111 VALUES [('2021-11-01'), ('2021-12-01'))
)
DISTRIBUTED BY HASH(saler_id) BUCKETS 10
PROPERTIES (
"replication_num" = "3",
"bloom_filter_columns"="saler_id,category_id",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "MONTH",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-2147483648",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "P_",
"dynamic_partition.replication_num" = "3",
"dynamic_partition.buckets" = "3"
);

Index generation

When data is flushed, a Bloom Filter index item is created for each Data Page. Apache Doris uses a block-based Bloom Filter algorithm: the Bloom Filter index data of each Data Page is divided into multiple Blocks, the data length of each Block is BYTES_PER_BLOCK (32 bytes, i.e. 256 bits, by default), and each bit in a Block is initialized to 0. When writing data to the Data Page, each distinct value sets BITS_SET_PER_BLOCK (8 by default) bits in one Block to 1. The structure of the Bloom Filter index is shown in the figure.

The Bloom Filter index data length BLOOM_FILTER_BIT of a single Data Page is calculated by the following formula:
BLOOM_FILTER_BIT = -N * ln(FPP) / (ln(2))^2
Here, N is the number of distinct values in the current Data Page, and FPP (False Positive Probability) is the expected false positive rate, with a default value of 0.05. (Note: the calculated Bloom Filter data length, in bits, must be an integer power of 2.)

In Bloom Filter, the length of each Block is BYTES_PER_BLOCK (32 bytes). Therefore, the number of Blocks in Bloom Filter is calculated by the following formula:
BLOCK_NUM = (BLOOM_FILTER_BIT / 8) / BYTES_PER_BLOCK
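As a quick illustration of the two formulas, the following sketch computes the Bloom Filter size and the number of Blocks for a Data Page (rounding the bit length up to a power of two to satisfy the note above; the constants follow the defaults mentioned in the text):

#include <cmath>
#include <cstdint>

static constexpr double kFpp = 0.05;            // default false positive probability
static constexpr uint64_t kBytesPerBlock = 32;  // BYTES_PER_BLOCK, 256 bits per Block

// Number of bits of Bloom Filter data for a Data Page with n distinct values,
// rounded up to an integer power of two.
uint64_t bloom_filter_bits(uint64_t n, double fpp = kFpp) {
    double bits = -static_cast<double>(n) * std::log(fpp) / (std::log(2.0) * std::log(2.0));
    uint64_t rounded = 1;
    while (rounded < static_cast<uint64_t>(std::ceil(bits))) rounded <<= 1;
    return rounded;
}

uint64_t block_num(uint64_t bloom_filter_bit) {
    return (bloom_filter_bit / 8) / kBytesPerBlock;
}

// Example: n = 1024, FPP = 0.05 -> about 6,385 bits, rounded up to 8192 bits
// = 1024 bytes = 32 Blocks.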
The method to generate Bloom Filter index items for Data Page is as follows:

  • For each different value in the Data Page, a 64-bit HASH_CODE is calculated. In Apache Doris, the Hash strategy of Bloom Filter is HASH_MURMUR3.

  • Take the high 32 bits of HASH_CODE to calculate the Block corresponding to the current value in the Bloom Filter. The method is as follows:
    BLOCK_INDEX = (HASH_CODE >> 32) & (BLOCK_NUM - 1)
    Among them, BLOCK_INDEX represents the serial number of Block, and BLOCK_NUM is an integer power of 2, so BLOCK_INDEX must be less than BLOCK_NUM.

  • Take the lower 32 bits of HASH_CODE to calculate which Bits in the Block will be set to 1 by the current value. The method is as follows:

    uint32_t key = (uint32_t)HASH_CODE;  // lower 32 bits of the 64-bit hash
    uint32_t SALT[8] = {0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d, 0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31};
    uint32_t masks[BITS_SET_PER_BLOCK];
    for (int i = 0; i < BITS_SET_PER_BLOCK; ++i) {
        masks[i] = key * SALT[i];
        masks[i] = masks[i] >> 27;   // choose one of 32 bit positions
        masks[i] = 0x1 << masks[i];  // a 32-bit mask with exactly one bit set
    }
    

    Among them, masks[i] contains 32 Bits, of which only 1 Bit is set to 1, and the other 31 Bits are all 0.

    Take the bitwise OR of masks[i] and the i-th 32-bit word in the Block to update the Bloom Filter index data of the Data Page. (A Block contains 256 bits, that is, BITS_SET_PER_BLOCK = 8 words of 32 bits each.)

    // BLOOM_FILTER_OFFSET points to the start of this Data Page's Bloom Filter data (byte pointer)
    uint32_t* BLOCK_OFFSET = (uint32_t*)((uint8_t*)BLOOM_FILTER_OFFSET + BYTES_PER_BLOCK * BLOCK_INDEX);
    for (int i = 0; i < BITS_SET_PER_BLOCK; ++i) {
        *(BLOCK_OFFSET + i) |= masks[i];  // set the chosen bit in the i-th 32-bit word
    }
    

The Bloom Filter index item sets a separate flag to indicate whether the Data Page contains NULL values.

The Bloom Filter index item of each Data Page in the column is saved in a Bloom Filter Index Page. The Bloom Filter Index Page contains the following information: the Bloom Filter index item data, the footer of the Bloom Filter Index Page, and the Checksum information of the Bloom Filter Index Page. The footer contains the type of the current Page, the size of the Bloom Filter index item data in the current Page, the number of Bloom Filter index items in the current Page, the sequence number of the first index item in the current Page among all Bloom Filter index items of the column, and other information.

After a Bloom Filter Index Page is full, a new Bloom Filter Index Page will be created to record the subsequent Bloom Filter index items of the column. If a column has multiple Bloom Filter Index Pages, the Bloom Filter index for this column will use a two-level indexing mechanism. The second-level index is multiple Bloom Filter Index Pages, which store the Bloom Filter index data of the Data Page. Each Bloom Filter Index Page generates an Ordinal index item. The Ordinal index items of all Bloom Filter Index Pages will be saved in an Ordinal Index Page as a primary index.

Each Ordinal index item consists of a key and a value. The key records the sequence number of the first index item in the current Bloom Filter Index Page among all Bloom Filter index items of the column, and the value records the offset and size of the current Bloom Filter Index Page in the Segment file. The Ordinal Index Page contains the following information: the Ordinal index data of all Bloom Filter Index Pages, the footer of the Ordinal Index Page, and the Checksum information of the Ordinal Index Page. The footer of the Ordinal Index Page contains the type of the current Page, the size of the index data in the current Page, the number of index items in the current Page, and so on. The offset and size of the first-level Ordinal Index Page in the Segment file are recorded in the footer of the Segment file. If a column has only one Bloom Filter Index Page, no two-level index is needed, and the offset and size of this single Bloom Filter Index Page in the Segment file are recorded in the footer of the Segment file. The storage structure of the Bloom Filter index is shown in the figure.

Query filtering

During data query, the Bloom Filter index data of the column is loaded and the Bloom Filter index item of each Data Page is parsed. First, use the Meta information of the Bloom Filter index recorded in the Segment footer to determine whether the Bloom Filter of the current column has a two-level index. If it does, the Segment footer records the offset and size of the first-level Ordinal Index Page in the Segment file; load the Ordinal Index Page first and parse the key and value of each Ordinal index item: the key records the sequence number of the first index item of each Bloom Filter Index Page among all Bloom Filter index items of the column, and the value records the offset and size of each Bloom Filter Index Page in the Segment file. Otherwise, the Bloom Filter index of the current column contains only one Bloom Filter Index Page, and the Segment footer records its offset and size in the Segment file. The Bloom Filter index data of each Data Page can then be parsed from the Bloom Filter Index Page.

The method to determine whether a certain value hits the Bloom Filter is as follows:

  • First, based on the HASH_MURMUR3 method, the 64-bit HASH_CODE is calculated for the value of the query filter condition;
  • Then, use the same method used to generate Bloom Filter index data to calculate the Block corresponding to the value in the Bloom Filter and the corresponding BITS_SET_PER_BLOCK Bits in the Block.
  • Determine whether the BITS_SET_PER_BLOCK bits at the corresponding positions in that Block of the Bloom Filter index data are all 1. If they are all 1, the Bloom Filter hits and the value may exist in the Data Page corresponding to this Bloom Filter; otherwise, the Bloom Filter does not hit and the value definitely does not exist in that Data Page.
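Combining the pieces above, here is a self-contained sketch of the block-based Bloom Filter insert and membership test. The 64-bit hash is taken as an input (Doris computes it with HASH_MURMUR3); class and method names are illustrative, not Doris's actual API:

#include <cstddef>
#include <cstdint>
#include <vector>

static constexpr int kBitsSetPerBlock = 8;   // BITS_SET_PER_BLOCK
static constexpr uint32_t kSalt[kBitsSetPerBlock] = {
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31};

struct BlockBloomFilter {
    // block_num Blocks, each stored as 8 words of 32 bits (256 bits per Block).
    std::vector<uint32_t> words;

    explicit BlockBloomFilter(size_t block_num)
        : words(block_num * kBitsSetPerBlock, 0) {}

    size_t block_index(uint64_t hash) const {
        size_t block_num = words.size() / kBitsSetPerBlock;
        return (hash >> 32) & (block_num - 1);  // block_num must be a power of two
    }

    static void make_masks(uint32_t key, uint32_t masks[kBitsSetPerBlock]) {
        for (int i = 0; i < kBitsSetPerBlock; ++i) {
            masks[i] = key * kSalt[i];
            masks[i] = masks[i] >> 27;       // choose one of 32 bit positions
            masks[i] = 0x1u << masks[i];
        }
    }

    void insert(uint64_t hash) {
        uint32_t masks[kBitsSetPerBlock];
        make_masks(static_cast<uint32_t>(hash), masks);
        uint32_t* block = &words[block_index(hash) * kBitsSetPerBlock];
        for (int i = 0; i < kBitsSetPerBlock; ++i) block[i] |= masks[i];
    }

    bool maybe_contains(uint64_t hash) const {
        uint32_t masks[kBitsSetPerBlock];
        make_masks(static_cast<uint32_t>(hash), masks);
        const uint32_t* block = &words[block_index(hash) * kBitsSetPerBlock];
        for (int i = 0; i < kBitsSetPerBlock; ++i) {
            if ((block[i] & masks[i]) != masks[i]) return false;  // miss: definitely absent
        }
        return true;  // hit: the value may be present in the Data Page
    }
};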

When querying data, the query filter conditions ("=", "IS" or "IN" statements) filter each Data Page in sequence on the columns with Bloom Filter indexes. When performing NULL value query, you can directly use the NULL value flag in the Bloom Filter index item to perform Data Page filtering. When querying non-NULL values, the method to filter the Data Page using query filter conditions is as follows:

  • The filter condition is field = value. If the value does not hit the Bloom Filter corresponding to a Data Page, the Data Page can be filtered out.
  • The filter condition is field IN {value1, value2, …}. If all the values ​​in value1, value2, ... do not hit the Bloom Filter corresponding to a certain Data Page, the Data Page can be filtered out.

  • The filter condition is field IS NULL. If the NULL value does not hit the Bloom Filter corresponding to a Data Page, the Data Page can be filtered out.

Applicable scenarios

Apache Doris supports creating Bloom Filter indexes on specified columns when creating a table. You can also execute the Alter Table command to add a Bloom Filter index to an already created table.

 ALTER TABLE table_name SET ("bloom_filter_columns"="c1, c2, c3"); 

Currently, Bloom Filter indexes are only supported on columns of SMALLINT, INT, UNSIGNEDINT, BIGINT, LARGEINT, CHAR, VARCHAR, DATE, DATETIME, and DECIMAL types; other column types are not supported. For columns with a Bloom Filter index, the index is used to filter Data Pages only when the query condition is an "=", "IS", or "IN" predicate. The Bloom Filter index is better suited to equality queries on columns with high cardinality.

NGram BloomFilter index

To improve the performance of LIKE queries, Doris adds the NGram BloomFilter index, whose implementation mainly follows ClickHouse's ngrambf.

  1. The NGram BloomFilter index only supports string columns.
  2. The NGram BloomFilter index and the BloomFilter index are mutually exclusive: a column can have at most one of the two.
  3. The gram size and the BloomFilter byte size can be adjusted according to the actual situation. If the gram size is relatively small, the BloomFilter size can be increased appropriately.
  4. To check whether a query hits the NGram BloomFilter index, inspect the profile information of the query.

Specify when creating the table:

CREATE TABLE `table3` (
    `siteid` int(11) NULL DEFAULT "10" COMMENT "",
    `citycode` smallint(6) NULL COMMENT "",
    `username` varchar(32) NULL DEFAULT "" COMMENT "",
    INDEX idx_ngrambf (`username`) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="256") COMMENT 'username ngram_bf index'
) ENGINE=OLAP
AGGREGATE KEY(`siteid`, `citycode`, `username`) COMMENT "OLAP"
DISTRIBUTED BY HASH(`siteid`) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);

-- PROPERTIES("gram_size"="3", "bf_size"="256") specify the gram size and the number of bytes of the bloom filter, respectively.
-- The gram size depends on the actual query scenario and is usually set to the length of most query strings. The bloom filter byte size can be determined by testing; generally, the larger it is, the better the filtering effect. 256 is a reasonable starting point for verification, but larger sizes also increase index storage and memory cost.
-- If the data cardinality is high, the byte size does not need to be large; if the cardinality is low, the filtering effect can be improved by increasing the byte size.
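To make gram_size concrete, the following sketch shows how a string can be split into n-grams before they are inserted into the Bloom Filter (a simplified illustration; it does not reproduce Doris's exact tokenization):

#include <string>
#include <vector>

// Split a string into consecutive n-grams of `gram_size` bytes.
std::vector<std::string> ngrams(const std::string& text, size_t gram_size = 3) {
    std::vector<std::string> grams;
    if (text.size() < gram_size) return grams;
    for (size_t i = 0; i + gram_size <= text.size(); ++i) {
        grams.push_back(text.substr(i, gram_size));
    }
    return grams;
}

// ngrams("doris", 3) -> {"dor", "ori", "ris"}; a LIKE '%ori%' predicate only
// needs to probe the Bloom Filter with the n-grams of the pattern "ori".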

Inverted index

Introduction

Starting from version 2.0.0, Doris supports inverted indexes, which can be used for full-text retrieval on text types and for equality and range queries on ordinary numeric and date types, quickly filtering out the rows that meet the conditions from massive data.

Inverted index: the inverted index is a commonly used indexing technique in information retrieval. It splits text into words and builds a word -> document id index, which makes it possible to quickly find which documents contain a given word.

Doris uses CLucene as the underlying inverted index library. CLucene is a high-performance, stable Lucene inverted index library implemented in C++. Doris further optimizes CLucene, making it simpler, faster, and more suitable for database scenarios.

In Doris's inverted index implementation, a row of the table corresponds to a document and a column corresponds to a field of the document. Therefore, the inverted index can quickly locate the rows that contain a given keyword, thereby accelerating the WHERE clause.
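Conceptually the mapping is just keyword -> sorted list of row ids. A toy sketch of building and using such a structure (purely illustrative; it is not CLucene's on-disk format):

#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Build a toy inverted index: each row of a string column is tokenized by
// whitespace, and every token maps to the sorted list of rows containing it.
std::map<std::string, std::vector<uint32_t>>
build_inverted_index(const std::vector<std::string>& column) {
    std::map<std::string, std::vector<uint32_t>> index;
    for (uint32_t rowid = 0; rowid < column.size(); ++rowid) {
        std::istringstream tokens(column[rowid]);
        std::string word;
        while (tokens >> word) {
            auto& rows = index[word];
            if (rows.empty() || rows.back() != rowid) rows.push_back(rowid);
        }
    }
    return index;
}

// A MATCH_ANY 'keyword' query then reduces to looking up index["keyword"]
// and scanning only those rows.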

Different from other indexes in Doris, the inverted index uses independent files at the storage layer: these files correspond logically to the Segment files but are stored independently of them. The advantage is that indexes can be created and deleted without rewriting the tablet and Segment files, which greatly reduces processing overhead.

A brief introduction to the functions of Doris inverted index is as follows:

  • Added full-text search of string type
    • Supports string full-text search, including matching multiple keywords MATCH_ALL at the same time and matching any keyword MATCH_ANY
    • Support full-text search of string array type
    • Support English and Chinese word segmentation
  • Accelerates common equality and range queries, covers the functionality of the bitmap index, and will replace the bitmap index in the future
    • Supports string, numeric, date and time types =, !=, >, >=, <, <= quick filtering
    • Supports string, number, date and time array types =, !=, >, >=, <, <=
  • Supports complete logical combinations
    • The new index pushes down OR and NOT logic
    • Supports any AND OR NOT combination of multiple conditions
  • Flexible and fast index management
    • Supports defining inverted index on created table
    • Supports adding inverted indexes to existing tables, and supports incremental construction of inverted indexes without rewriting existing data in the table
    • Support to delete the inverted index on the existing table without rewriting the existing data in the table

Usage

Define the inverted index when creating the table:

CREATE TABLE table_name
(
  columns_definition,
  INDEX idx_name1(column_name1) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment'],
  INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment']
)
table_properties;
  • USING INVERTED is required and is used to specify that the index type is an inverted index.
  • PROPERTIES is optional and is used to specify additional properties for the inverted index. Currently there is an attribute parser that specifies the word segmenter.
    • If parser is not specified, no word segmentation is performed (the default)
    • english is English word segmentation, which is suitable when the indexed column is in English. It uses spaces and punctuation marks to segment words and has high performance.
    • chinese is a Chinese word segmentation, which is suitable for indexed columns that contain Chinese or a mixture of Chinese and English. The jieba word segmentation library is used, and its performance is lower than that of english word segmentation.
  • COMMENT is optional and is used to specify comments

Add an inverted index to an existing table:

-- Syntax 1
CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment'];
-- Syntax 2
ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment'];

Delete the inverted index:

-- Syntax 1
DROP INDEX idx_name ON table_name;
-- Syntax 2
ALTER TABLE table_name DROP INDEX idx_name;

Use inverted index to speed up queries:

-- 1. Full-text keyword search, done with MATCH_ANY / MATCH_ALL
SELECT * FROM table_name WHERE column_name MATCH_ANY | MATCH_ALL 'keyword1 ...';

-- 1.1 Rows whose logmsg contains keyword1
SELECT * FROM table_name WHERE logmsg MATCH_ANY 'keyword1';

-- 1.2 Rows whose logmsg contains keyword1 or keyword2 (more keywords can be appended)
SELECT * FROM table_name WHERE logmsg MATCH_ANY 'keyword1 keyword2';

-- 1.3 Rows whose logmsg contains both keyword1 and keyword2 (more keywords can be appended)
SELECT * FROM table_name WHERE logmsg MATCH_ALL 'keyword1 keyword2';

-- 2. Ordinary equality, range, IN and NOT IN predicates work with normal SQL, for example
SELECT * FROM table_name WHERE id = 123;
SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00';
SELECT * FROM table_name WHERE op_type IN ('add', 'delete');

Source: blog.csdn.net/qq_44766883/article/details/131353636