Note | Elasticsearch | Methods for optimizing ES pagination performance

Optimization methods at the query syntax level

1. If only the document ID is needed, set "_source": false

If we only need the document ID (_id) and none of the fields in _source, we can add "_source": false. ES then no longer has to load and return _source in the fetch phase, which greatly speeds up the query.

Before modification:

GET /my-index-000001/_search
{
    "query": {
        "match_all": {}
    },
    "_source": ""
}

After modification:

GET /my-index-000001/_search
{
    "query": {
        "match_all": {}
    },
    "_source": false
}

2. Change the query into a filter clause

Use filter context instead of query context, because a filter clause performs better than a query clause: a filter clause does not need to calculate a relevance score, whereas a query clause does. In addition, the result of a filter clause can be cached.

Before modification:

GET /my-index-000001/_search
{
    "query": {
        "term": {
            "field_name": "field_value"
        }
    }
}

After modification:

GET /my-index-000001/_search
{
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "field_name": "field_value"
                }
            }
        }
    }
}

3. Increase the number of records returned by each scroll, as long as it does not time out

We can make each scroll return more data by increasing size, which reduces the number of query, fetch, and response rounds and improves efficiency. Because each scroll then returns more data, remember to increase the timeout as well to avoid timing out.

Before modification:

GET /my-index-000001/_search
{
    "query": {
        "match_all": {}
    },
    "size": 1000
}

After modification:

GET /my-index-000001/_search
{
    "query": {
        "match_all": {}
    },
    "size": 10000
}
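The trade-off between page size and timeout can be sketched with the Python client. This is only a minimal sketch, assuming a 7.x-style elasticsearch client object passed in as client; the function name scroll_all, the 5m keep-alive, and the default page size are illustrative assumptions, not values from the tests in this note.

```python
def scroll_all(client, index, query, size=10000, keep_alive="5m"):
    """Yield every hit matching `query`, fetching `size` hits per scroll round trip.

    `client` is assumed to be an elasticsearch.Elasticsearch instance (7.x-style API).
    A larger `size` means fewer round trips, but each page takes longer to build
    and process, so `keep_alive` has to cover the time spent on one page.
    """
    response = client.search(
        index=index,
        scroll=keep_alive,
        size=size,
        body={"query": query},
    )
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            for hit in response["hits"]["hits"]:
                yield hit
            # Ask for the next page and renew the keep-alive at the same time.
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response["_scroll_id"]
    finally:
        # Release the server-side search context as soon as we are done.
        client.clear_scroll(scroll_id=scroll_id)
```

If pages start timing out after size is raised, lengthening keep_alive is usually the first thing to try before lowering size again.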

4. Sort by _doc

The official Elasticsearch documentation explains that _doc means sorting by index order, which is the most efficient sort order. If you don't care about the order in which documents are returned, you should sort by _doc to improve query performance. This works especially well when using scroll.

Document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/sort-search-results.html

Before modification:

{
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "field_name": "field_value"
                }
            }
        }
    }
}

After modification:

{
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "field_name": "field_value"
                }
            }
        }
    },
    "sort": ["_doc"]
}

5. Reduce unnecessary query fields (use _source filtering)

By reducing unnecessary fields, the time consumption of the fetch phase of the query can be effectively reduced.

Before modification:

GET /my-index-000001/_search
{
    "query": {
        "match_all": {}
    }
}

After modification:

GET /my-index-000001/_search
{
    "query": {
        "match_all": {}
    },
    "_source": ["need-field-1", "need-field-2"]
}

6. Avoid fuzzy matching

7. Use filter_path to filter the returned results

Adding filter_path reduces network I/O. Note that if scroll is used, the _scroll_id field must be kept in filter_path, otherwise the next page cannot be requested.

Document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.7/common-options.html#common-options-response-filtering
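As a rough illustration of response filtering combined with scroll, the sketch below returns only document IDs. It assumes a 7.x-style elasticsearch Python client passed in as client; the function name scroll_ids and the FILTER_PATH constant are hypothetical names chosen for this example.

```python
# Keep only what the loop needs: the scroll cursor and the ID of each hit.
FILTER_PATH = "_scroll_id,hits.hits._id"


def scroll_ids(client, index, query, size=10000, keep_alive="5m"):
    """Yield document IDs only, letting ES strip everything else from each response."""
    response = client.search(
        index=index,
        scroll=keep_alive,
        size=size,
        body={"query": query, "_source": False},
        filter_path=FILTER_PATH,
    )
    scroll_id = response["_scroll_id"]
    try:
        while True:
            # With filter_path applied, an empty page has no "hits" key at all.
            hits = response.get("hits", {}).get("hits", [])
            if not hits:
                break
            for hit in hits:
                yield hit["_id"]
            response = client.scroll(
                scroll_id=scroll_id,
                scroll=keep_alive,
                filter_path=FILTER_PATH,
            )
            scroll_id = response["_scroll_id"]
    finally:
        client.clear_scroll(scroll_id=scroll_id)
```

The loop reads hits with .get() rather than indexing directly, because a filtered response omits keys that have no matching content.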

8. scroll scan (before version 2.1)

The scan search type was deprecated in version 2.1, where a regular scroll sorted by _doc replaces it, and it was removed in version 5.0.

9. search_after

In tests on version 7.10.2 without PIT: when sorting by _doc, a full traversal with search_after is basically as fast as with scroll, but a small amount of data may be missed; when sorting by _id, a full traversal with search_after is significantly slower than with scroll. (These results are inconsistent with those of some articles and need further analysis.)

Document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.7/paginate-search-results.html#search-after

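# es_client is assumed to be an existing elasticsearch.Elasticsearch client.
# The query body {...} is elided in the original and must be filled in.
response = es_client.search(
    index="my-index-0001",
    size=10000,
    body={"query": {...}, "sort": ["_doc"]},
)
while response["hits"]["hits"]:
    last = None
    for item in response["hits"]["hits"]:
        last = item["sort"]  # sort values of the most recent hit
        # ... processing logic
    # Request the next page, starting after the last hit of the previous page.
    response = es_client.search(
        index="my-index-0001",
        size=10000,
        body={"query": {...}, "search_after": last, "sort": ["_doc"]},
    )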

10. search_after + PIT (concurrent method)

Added to X-Pack in version 7.10; added in version 7.11.

Document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.7/point-in-time-api.html
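A sequential version of search_after inside a point in time might look like the sketch below; concurrency is not shown. It assumes a 7.x-style elasticsearch Python client (open_point_in_time, and close_point_in_time taking a body) on a cluster that supports PIT. The function name search_after_with_pit and the 1m keep-alive are assumptions for illustration, and the caller supplies the sort, which should end in a field that is unique per document.

```python
def search_after_with_pit(client, index, query, sort, size=10000, keep_alive="1m"):
    """Yield all hits for `query`, paging with search_after inside a point in time.

    `sort` should end in a tiebreaker that is unique per document; otherwise
    hits that share the same sort values can be skipped between pages.
    """
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    try:
        search_after = None
        while True:
            body = {
                "query": query,
                "size": size,
                "sort": sort,
                # No index is passed to search(): the PIT already identifies it.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                body["search_after"] = search_after
            response = client.search(body=body)
            hits = response["hits"]["hits"]
            if not hits:
                break
            for hit in hits:
                yield hit
            # Continue after the sort values of the last hit on this page.
            search_after = hits[-1]["sort"]
            # ES may hand back an updated PIT id; always use the latest one.
            pit_id = response.get("pit_id", pit_id)
    finally:
        client.close_point_in_time(body={"id": pit_id})
```

Because every page is read from the same point in time, documents indexed or deleted during the traversal do not shift the page boundaries.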

11. slice scroll (concurrent method)

Concurrent scroll can be supported through slice scroll.

However, if there is no suitable field to slice on and the number of slices exceeds the number of shards in the index, ES needs O(N) time and space to compute the split, and the split is only finished after a considerable share of the query has been executed. In tests on version 7.10.2, when the number of slices exceeded the number of shards, ES had to get through about 60% - 70% of the query before the split was complete. Until then, the combined scroll speed of all processes is basically the same as that of a single-process scroll.

Document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.7/paginate-search-results.html#slice-scroll
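One way to drive a sliced scroll from Python is sketched below. It uses threads rather than the separate processes mentioned above, and it assumes a 7.x-style elasticsearch client passed in as client; the helper names sliced_bodies, scroll_slice, and sliced_scroll are hypothetical. To avoid the slow split described above, num_slices would be kept at or below the number of shards.

```python
from concurrent.futures import ThreadPoolExecutor


def sliced_bodies(query, num_slices):
    """Build one search body per slice (ES requires num_slices to be at least 2)."""
    return [
        {"slice": {"id": i, "max": num_slices}, "query": query, "sort": ["_doc"]}
        for i in range(num_slices)
    ]


def scroll_slice(client, index, body, size=10000, keep_alive="5m"):
    """Drain a single slice with a regular scroll and return its hits as a list."""
    hits = []
    response = client.search(index=index, scroll=keep_alive, size=size, body=body)
    while response["hits"]["hits"]:
        hits.extend(response["hits"]["hits"])
        response = client.scroll(scroll_id=response["_scroll_id"], scroll=keep_alive)
    client.clear_scroll(scroll_id=response["_scroll_id"])
    return hits


def sliced_scroll(client, index, query, num_slices):
    """Scroll all slices in parallel threads and concatenate their hits."""
    bodies = sliced_bodies(query, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        parts = pool.map(lambda body: scroll_slice(client, index, body), bodies)
        return [hit for part in parts for hit in part]
```

Each slice holds its own scroll context, so the slices can be consumed independently and merged afterwards.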

Optimization methods in index design

1. Change string fields to number or date types

Because of how they are indexed, range filtering is very inefficient on string fields but very efficient on number and date fields. Therefore, if a field represents a number or a date, it should not be stored as a string.
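As a small illustration, the sketch below builds a mapping with native numeric and date types, plus a range filter against such a field. The field names price and created_at and both helper names are hypothetical examples, not fields from this note.

```python
def typed_mapping():
    """Index body that stores a number and a date with native types, not strings."""
    return {
        "mappings": {
            "properties": {
                "price": {"type": "double"},     # hypothetical numeric field
                "created_at": {"type": "date"},  # hypothetical date field
            }
        }
    }


def range_filter(field, gte, lt):
    """Range clause in filter context, so no relevance score is calculated."""
    return {"bool": {"filter": {"range": {field: {"gte": gte, "lt": lt}}}}}
```

With a 7.x-style Python client, the mapping could be applied with client.indices.create(index="my-index-000001", body=typed_mapping()), and range_filter("created_at", "2023-01-01", "2023-02-01") could then be used as the query of a search.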

2. Lower the nesting level

Nesting levels can be reduced by flattening deeply nested fields (for example, "field-1": {"field-1-1": 1, "field-1-2": 2}).
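Flattening can be done on the client before indexing. The sketch below is one possible approach, not the author's method; the function name flatten and the "_" separator are assumptions. An underscore is used because Elasticsearch interprets dots in field names as object paths, which would bring the nesting back.

```python
def flatten(doc, sep="_", prefix=""):
    """Return a copy of `doc` in which nested dicts are collapsed into top-level keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested object, carrying the joined name along.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


print(flatten({"field-1": {"field-1-1": 1, "field-1-2": 2}}))
# {'field-1_field-1-1': 1, 'field-1_field-1-2': 2}
```

Lists are left untouched here; arrays of objects would need their own handling depending on how they are queried.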

3. Reduce refresh frequency

If search results do not need to be near real time, you can lengthen the refresh interval to reduce the number of refreshes, but this also means higher memory usage.

4. Reduce the number of replicas

Reducing the number of replicas speeds up index writes, at the cost of lower availability.
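The refresh and replica settings from items 3 and 4 are both dynamic index settings, so they can be changed on an existing index. The sketch below assumes a 7.x-style elasticsearch Python client passed in as client; the helper names and the default values (30s, 0 replicas) are illustrative assumptions, not recommendations from this note.

```python
def index_settings(refresh_interval="30s", number_of_replicas=0):
    """Build a settings body with a longer refresh interval and fewer replicas."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": number_of_replicas,
        }
    }


def apply_index_settings(client, index, **kwargs):
    """Send the settings to an existing index and return the body that was sent."""
    body = index_settings(**kwargs)
    client.indices.put_settings(index=index, body=body)
    return body
```

For example, apply_index_settings(client, "my-index-000001", refresh_interval="60s", number_of_replicas=1) keeps one replica for availability while still refreshing less often.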

Origin blog.csdn.net/Changxing_J/article/details/130139631