How can Elasticsearch 8.X return documents in the order of user-specified IDs?

1. Practical problem

How can the results be returned in the same order as the input IDs, given that there are 500 IDs and the results must be paginated?

Question source: https://t.zsxq.com/0cdyq7tzr

2. Solution discussion

2.1 Elasticsearch default sorting mechanism

  • In Elasticsearch, if no sort is specified, search results are sorted by relevance score (_score) in descending order. The relevance score indicates how well a document matches the query: the higher the score, the better the match.

  • In some cases no relevance score is computed. Elasticsearch does not calculate scores in filter context, for example for term, terms, or ids queries used as filters, or in the filter and must_not clauses of a bool query. In these cases documents are typically assigned a constant score of 1.0, or 0 for documents matched only through filter or must_not clauses.
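To see why the default ordering does not help with this problem, here is a small local sketch in Python (it is not Elasticsearch code). The hits are made up, and it assumes every hit of an ids query carries the same constant score of 1.0. Sorting by score then says nothing about the order in which the IDs were requested.

```python
# Hypothetical hits from an ids query: every document has the same score.
hits = [
    {"_id": "1", "_score": 1.0},
    {"_id": "3", "_score": 1.0},
    {"_id": "5", "_score": 1.0},
    {"_id": "7", "_score": 1.0},
]
requested_ids = ["3", "1", "5", "7"]

# Default behaviour: sort by _score, descending. All scores tie, so the
# stable sort simply keeps the incoming order of the hits.
by_score = sorted(hits, key=lambda h: h["_score"], reverse=True)
by_score_ids = [h["_id"] for h in by_score]

print(by_score_ids)
print(by_score_ids == requested_ids)
```

Because all scores are equal, the score carries no ordering information, which is why a custom sort key is needed.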

2.2 How to return data in the order of user-specified IDs?

Elasticsearch has no native feature for this, so we have to implement it ourselves.

How? Treat the sequence given by the user (an irregular sequence that is neither ascending nor descending, such as 3, 1, 5, 7) as a one-dimensional array.

The subscripts of that array are always 0, 1, 2, 3, ..., so the subscripts themselves are in ascending order. In the example, ID 3 has subscript 0, ID 1 has subscript 1, ID 5 has subscript 2, and ID 7 has subscript 3.


The problem then becomes: how do we sort in ascending order by array subscript?

This can be done with script-based sorting, that is, the _script option of sort.
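The idea can be tried out locally before involving Elasticsearch. The following is a minimal Python sketch with made-up hits, not the actual search code: each ID is mapped to its subscript in the user's list, and the hits are sorted in ascending order by that subscript.

```python
# Sketch: rank documents by the position of their ID in the user's list.
requested_ids = ["3", "1", "5", "7"]  # order supplied by the user

# Map each ID to its array subscript: {"3": 0, "1": 1, "5": 2, "7": 3}
position = {doc_id: i for i, doc_id in enumerate(requested_ids)}

# Hypothetical hits, in whatever order the index happens to return them.
hits = [{"_id": "1"}, {"_id": "3"}, {"_id": "5"}, {"_id": "7"}]

# Sorting ascending by subscript restores the user's order.
ordered = sorted(hits, key=lambda hit: position[hit["_id"]])
ordered_ids = [hit["_id"] for hit in ordered]

print(ordered_ids)
```

The script sort in Elasticsearch plays the role of the key function here: it computes the subscript for each document on the server side.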

3. Preconditions

PUT /_cluster/settings
{
  "transient": {
    "indices.id_field_data.enabled": true
  }
}

Explanation:

PUT /_cluster/settings is the Elasticsearch API for updating cluster settings. This particular request updates the cluster's transient settings.

{"transient": {"indices.id_field_data.enabled": true}}

The request sets indices.id_field_data.enabled to true.

This setting controls whether Elasticsearch allows fielddata access on the _id field, which the sort script in section 5 needs in order to read doc['_id'].value.

In Elasticsearch 8.X the setting is disabled (false) by default, because loading fielddata for the _id field can consume a lot of memory and degrade performance.

The transient scope means the change is temporary: it lasts only until a full cluster restart, after which the setting returns to its default. To make the change survive restarts, use the persistent scope instead:

PUT /_cluster/settings
{"persistent": {"indices.id_field_data.enabled": true}}

Note that enabling fielddata access on the _id field is generally not recommended in production, because it can cause memory and performance problems.
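If the setting is only needed for a short time, it can be switched back afterwards. The Python sketch below only builds request bodies for PUT /_cluster/settings and does not contact a cluster. The helper name id_field_data_body is hypothetical. It relies on the cluster settings convention that assigning null to a setting resets it to its default.

```python
import json


def id_field_data_body(enabled, scope="transient"):
    """Build a body for PUT /_cluster/settings (hypothetical helper).

    enabled: True, False, or None. None is serialised as JSON null,
             which asks the cluster to restore the default value.
    scope:   "transient" or "persistent".
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    return {scope: {"indices.id_field_data.enabled": enabled}}


enable_body = json.dumps(id_field_data_body(True))
reset_body = json.dumps(id_field_data_body(None, scope="persistent"))

print(enable_body)
print(reset_body)
```

Resetting the value after the ordered retrieval is finished limits how long _id fielddata can be loaded.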

4. Sample data

Bulk-index a few documents for use in the following steps.

PUT test_index/_bulk
{"index":{"_id":1}}
{"title":"001"}
{"index":{"_id":3}}
{"title":"003"}
{"index":{"_id":5}}
{"title":"005"}
{"index":{"_id":7}}
{"title":"007"}
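For larger test sets, the bulk body can be generated rather than typed by hand. This is a sketch in Python with a hypothetical helper, bulk_body, which only produces the NDJSON text and does not send it. It writes the IDs as strings, which is how the sort script later compares them.

```python
import json


def bulk_body(doc_ids):
    """Return an NDJSON body for the _bulk API (hypothetical helper)."""
    lines = []
    for doc_id in doc_ids:
        # Action line: index the document under the given _id.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # Source line: the title is the ID zero-padded to three digits.
        lines.append(json.dumps({"title": f"{doc_id:03d}"}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([1, 3, 5, 7])
print(body, end="")
```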

5. Implementation

POST test_index/_search
{
  "query": {
    "ids": {
      "values": [
        "3",
        "1",
        "5",
        "7"
      ]
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": """
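          // IDs in the order requested by the user
          List ids_list = params.ids;
          // _id of the document currently being sorted
          String cur_id = doc['_id'].value;
          for(int i = 0; i < ids_list.length; i++)
          {
            if(cur_id.equals(ids_list[i]))
            {
              // the position in the list becomes the sort key
              return i;
            }
          }
          // not in the list (cannot happen with the ids query above)
          return -1;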
          """,
          "params": {
            "ids": ["3","1","5","7"]
          }
        },
        "order": "asc"
      }
    }
  ]
}

Explanation of the implementation:

This Elasticsearch query is used to search for documents from an index named test_index. The main purpose of the query is to retrieve documents based on the given ID list and sort the retrieved documents in the order of the ID list.

Here is a detailed explanation of each part of the query:

  • size: not set in this request, so the default of 10 applies and the query returns at most 10 documents. The ID list here contains only 4 IDs, so at most 4 documents come back. For a longer list (such as the 500 IDs in the question), set size explicitly or paginate.

  • query: use the ids query to filter documents in the given list of ids. In this example, we want to retrieve documents with IDs "3", "1", "5", and "7".

  • sort: Uses script sort (_script) to order the returned documents according to the given list of IDs.

  • type: Set to "number", so the value returned by the script is treated as a number.

  • script: Defines a Painless script that calculates the rank value for each document.

  • lang: Set to "painless" to indicate that the script is written in the Painless language.

  • source: The source code of the script. It iterates through the given list of IDs, looking for the entry that equals the current document's _id. If a match is found, its index in the ID list is returned as the sort value. If no match is found, -1 is returned (this never happens in this example, because the ids query only returns documents whose ID is in the list).

  • params: The parameters of the script, including a list called ids, which contains the IDs to be sorted. Here, we pass a list of IDs as parameters to the script.

  • order: set to "asc" to sort documents in ascending order. This means that query results will be returned in the order of the ID list.

With this query, you can fetch the documents with the specified ID from the test_index index and sort the results by the given ID order ("3", "1", "5", "7").
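In the request above the ID list appears twice, once in the ids query and once in the script params, and the two copies must stay identical. Here is a minimal sketch of how a client might build the body from a single list. It is written in Python, the helper name build_ordered_search is hypothetical, and the Painless source is copied from the request above. It also sets size to the number of IDs, since the default of 10 would truncate a longer list.

```python
import json

# Painless source copied from the search request above.
SORT_SCRIPT = """
List ids_list = params.ids;
String cur_id = doc['_id'].value;
for (int i = 0; i < ids_list.length; i++) {
  if (cur_id.equals(ids_list[i])) {
    return i;
  }
}
return -1;
"""


def build_ordered_search(ids):
    """Build a _search body that returns `ids` in the given order."""
    ids = [str(i) for i in ids]  # _id values are compared as strings
    return {
        "size": len(ids),  # the default size is only 10
        "query": {"ids": {"values": ids}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        "source": SORT_SCRIPT,
                        "params": {"ids": ids},
                    },
                    "order": "asc",
                }
            }
        ],
    }


body = build_ordered_search([3, 1, 5, 7])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of POST test_index/_search with any HTTP or Elasticsearch client.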

6. Summary

For pagination, use the standard from and size parameters of the search request together with the sort above.
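As a rough illustration of how page numbers map to from and size, the Python sketch below simulates paging locally. The helper page_params is hypothetical, and the simulation assumes every requested ID exists in the index, so that the sorted result equals the requested list.

```python
def page_params(page, page_size):
    """Translate a 1-based page number into from/size (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


# Local simulation: with the script sort applied, the full result is the
# requested ID list itself, so each page is a slice of that list.
requested_ids = ["3", "1", "5", "7"]
pages = []
for page in (1, 2):
    p = page_params(page, page_size=3)
    pages.append(requested_ids[p["from"]: p["from"] + p["size"]])

print(pages)
```

In a real request, the two keys returned by page_params would be added at the top level of the search body, next to query and sort.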

This article used script sorting to return results in the order specified by the user.

Do you know a better way to achieve this? Feel free to leave a comment and discuss.


Origin blog.csdn.net/wojiushiwo987/article/details/129964806