Elasticsearch 8.X Terms Set Query: Principles and Applications

1. Introduction to Terms Set Search

The Terms Set query (terms_set) is an Elasticsearch query type designed for matching documents on multi-valued fields.

At its core, it returns documents that contain at least a minimum number of the exact terms supplied in the query. That minimum can be effectively fixed (a constant returned by a script) or dynamic (read per document from a numeric field, or computed by a script). This makes the query a good fit for data that carries several attributes, categories, or labels per document.
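
The matching rule described above can be sketched in a few lines of plain Python. This is only a simplified model of the idea, not Elasticsearch code: the function name and the sample tags are made up for illustration, and terms are compared exactly, as they would be on a keyword field.

```python
def matches_terms_set(doc_values, query_terms, minimum_should_match):
    """Return True if the document holds at least `minimum_should_match`
    of the distinct query terms (exact, keyword-style comparison)."""
    matched = set(doc_values) & set(query_terms)
    return len(matched) >= minimum_should_match


# Hypothetical document with a multi-valued "tags" field.
doc_tags = ["comedy", "action", "sci-fi"]

# Two of the three query terms are present, so the document matches.
print(matches_terms_set(doc_tags, ["comedy", "action", "romance"], 2))  # True

# Only one query term is present, so the document does not match.
print(matches_terms_set(doc_tags, ["romance", "family", "comedy"], 2))  # False
```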

2. Background of the Terms Set Query

The Terms Set query was introduced in Elasticsearch 6.1. Earlier versions already offered a variety of query types, but expressing "match at least N of these terms" on a multi-valued field required more complex queries or custom scripts.

The main purpose of the Terms Set query is to simplify this scenario. With it, users can find documents that match at least a given number of terms, and the required number can be computed dynamically from another field or from a script.

3. Terms Set retrieval application scenarios

Terms Set queries are useful when dealing with multivalued fields and specific matching conditions.

Here are some common application scenarios:

Tagging system

In applications with a tagging system, such as blogs, social media, or news sites, users may assign multiple tags to articles, posts, or products. With a Terms Set query, you can find content that carries at least a certain number of the given tags, which is useful for filtering and recommendation features.

Search engine

In a search engine, a user may enter multiple keywords to find relevant content. With a Terms Set query, results can be restricted to documents that match enough of those keywords. For example, you can return only documents that match at least half of the keywords the user entered.
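
As a rough sketch of the "at least half of the keywords" idea, the Python snippet below builds a request body whose script uses params.num_terms, the number of terms in the query as exposed to terms_set scripts. The field name keywords, the helper names, and the rounding choice (half, rounded up) are assumptions for illustration, and the body has not been run against a cluster here.

```python
import json
import math


def at_least_half_query(field, keywords):
    """Build a terms_set request body that requires half of the keywords (rounded up)."""
    return {
        "query": {
            "terms_set": {
                field: {
                    "terms": list(keywords),
                    "minimum_should_match_script": {
                        # params.num_terms is the number of terms in this query.
                        "source": "Math.ceil(params.num_terms / 2.0)"
                    },
                }
            }
        }
    }


def required_matches(keywords):
    """Python mirror of the script above, handy for checking the threshold."""
    return math.ceil(len(keywords) / 2)


# "keywords" is a hypothetical keyword field holding each document's terms.
user_keywords = ["elasticsearch", "query", "terms"]
body = at_least_half_query("keywords", user_keywords)
print(json.dumps(body, indent=2))
print(required_matches(user_keywords))  # 2
```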

E-commerce

In an e-commerce application, a product may have multiple attributes such as color, size, or brand. With Terms Set queries, you can find products that satisfy conditions on several attributes at once. For example, you can find products that offer at least 2 of the specified colors and at least 3 of the specified sizes, using one terms_set clause per attribute.
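
One way to express "at least 2 colors and at least 3 sizes" is to wrap two terms_set clauses in a bool query. The sketch below builds such a request body in Python. The field names colors and sizes and the helper functions are hypothetical, and the fixed thresholds are written as constant scripts, which is an assumption about how one might encode a fixed number.

```python
import json


def terms_set_at_least(field, terms, required):
    """One terms_set clause with a fixed threshold, written as a constant script."""
    return {
        "terms_set": {
            field: {
                "terms": list(terms),
                "minimum_should_match_script": {"source": str(int(required))},
            }
        }
    }


def product_query(colors, sizes):
    """Require at least 2 of the colors AND at least 3 of the sizes."""
    return {
        "query": {
            "bool": {
                # Both clauses must hold for a product to be returned.
                "must": [
                    terms_set_at_least("colors", colors, 2),
                    terms_set_at_least("sizes", sizes, 3),
                ]
            }
        }
    }


body = product_query(["red", "blue", "black"], ["S", "M", "L", "XL"])
print(json.dumps(body, indent=2))
```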

Document management system

In a document management system, documents may have multiple categories or tags. With a Terms Set query, you can filter documents by how many of the given categories or tags they match, for example keeping only documents that match at least a certain number of them.

Skill matching

In a recruiting or job-search application, a candidate may have multiple skills. With a Terms Set query, you can find candidates who have at least a certain number of the given skills, which is useful for screening and recommending suitable candidates.

In summary, Terms Set queries fit any data with multiple attributes, categories, or labels: by setting the required number of matches flexibly, you can easily find the documents that meet a specific requirement.

4. How the Terms Set retrieval works

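The basic syntax of a Terms Set query is as follows. Both threshold parameters are shown for reference, but a real query specifies only one of them:

{
  "query": {
    "terms_set": {
      "<field name>": {
        "terms": ["<term 1>", "<term 2>", ...],
        "minimum_should_match_field": "<name of the field holding the required match count>",
        "minimum_should_match_script": {
          "source": "<script>"
        }
      }
    }
  }
}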

A Terms Set query works in the following steps:

  • Specify the name of the field to query, usually a multi-valued field such as an array of keywords.

  • Provide the set of terms to match against that field.

  • Set the required number of matches in one of two ways (the two cannot be combined; choose exactly one):

    • Use the minimum_should_match_field parameter to name a numeric field that holds the required number of matches for each document.

    • Use the minimum_should_match_script parameter to provide a script that computes the required number of matches dynamically.

  • Elasticsearch returns the documents whose number of matching terms reaches the required number.
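
The steps above can be sketched as a small Python helper that assembles the request body and enforces the "exactly one threshold parameter" rule before anything is sent. The function name and keyword arguments are made up for this illustration; only the JSON keys come from the query syntax shown earlier.

```python
import json


def build_terms_set_query(field, terms, *, count_field=None, script_source=None):
    """Assemble a terms_set request body with exactly one threshold source."""
    # The two threshold parameters are mutually exclusive.
    if (count_field is None) == (script_source is None):
        raise ValueError("provide exactly one of count_field or script_source")

    clause = {"terms": list(terms)}
    if count_field is not None:
        clause["minimum_should_match_field"] = count_field
    else:
        clause["minimum_should_match_script"] = {"source": script_source}
    return {"query": {"terms_set": {field: clause}}}


# Threshold read per document from a numeric field.
print(json.dumps(
    build_terms_set_query("tags", ["a", "b", "c"], count_field="tags_count"),
    indent=2,
))

# Threshold computed by a script.
print(json.dumps(
    build_terms_set_query("tags", ["a", "b", "c"], script_source="params.num_terms"),
    indent=2,
))
```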

5. Terms Set search application example

Suppose we have a movie database in which each movie carries multiple tags, and we want to find movies that have at least a certain number of the given tags.

The example below walks through this with a Terms Set query.

5.1 Data preparation

First, create an index called movies:

PUT movies
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "tags": {
        "type": "keyword"
      },
      "tags_count": {
        "type": "integer"
      }
    }
  }
}

Then, add some movie data to the index:

POST /movies/_bulk
{"index":{"_id":1}}
{"title":"电影1","tags":["喜剧","动作","科幻"],"tags_count":3}
{"index":{"_id":2}}
{"title":"电影2","tags":["喜剧","爱情","家庭"],"tags_count":3}
{"index":{"_id":3}}
{"title":"电影3","tags":["动作","科幻","喜剧"],"tags_count":3}
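
Because tags_count is meant to hold the number of tags in each document, it can be safer to derive it from the tags array than to type it by hand. The Python sketch below generates the same bulk payload with tags_count computed as len(tags). The helper name to_bulk_lines is hypothetical, and sending the payload to the _bulk endpoint is left out.

```python
import json

# Same sample movies as above, without a hand-written tags_count.
movies = [
    {"_id": 1, "title": "电影1", "tags": ["喜剧", "动作", "科幻"]},
    {"_id": 2, "title": "电影2", "tags": ["喜剧", "爱情", "家庭"]},
    {"_id": 3, "title": "电影3", "tags": ["动作", "科幻", "喜剧"]},
]


def to_bulk_lines(docs):
    """Return newline-delimited JSON for the _bulk API: one action line, then one source line."""
    lines = []
    for doc in docs:
        source = {key: value for key, value in doc.items() if key != "_id"}
        # Derive the count from the array so the two can never disagree.
        source["tags_count"] = len(source["tags"])
        lines.append(json.dumps({"index": {"_id": doc["_id"]}}, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # A bulk body must end with a newline.
    return "\n".join(lines) + "\n"


print(to_bulk_lines(movies))
```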

5.2 Retrieve Movies Using Terms Set

Now we want to find movies that carry enough of the given tags: "喜剧" (comedy), "动作" (action), and "科幻" (sci-fi). The Terms Set query offers two ways to set the required number of matches:

Search based on minimum_should_match_field

GET /movies/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": ["喜剧", "动作", "科幻"],
        "minimum_should_match_field": "tags_count"
      }
    }
  }
}

The query above uses terms_set to search the movies index for the tags "喜剧" (comedy), "动作" (action), and "科幻" (sci-fi). The required number of matches is read per document from the tags_count field, which is 3 for every sample document, so a movie must contain all three tags to match. With the sample data above, documents 1 and 3 each carry all three tags and satisfy this condition, while document 2 matches only "喜剧".

Next, consider the script-based variant.

Search based on minimum_should_match_script

GET /movies/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [
          "喜剧",
          "动作",
          "科幻"
        ],
        "minimum_should_match_script": {
          "source": "doc['tags_count'].value * 0.7"
        }
      }
    }
  }
}

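The query above searches the movies index for the same three tags ("喜剧", "动作", "科幻"), but the required number of matches is computed per document by the script doc['tags_count'].value * 0.7, that is, 70% of the document's own tag count. For the sample data this gives 3 × 0.7 = 2.1 for every document. Documents 1 and 3 match all three terms and are returned; document 2 matches only one term and is not. To base the threshold on the number of query terms instead of a document field, the script can use params.num_terms.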
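
To make the arithmetic concrete, here is a small pure-Python simulation of the script-based query over the three sample movies. It is a simplified model rather than Elasticsearch itself, and the helper name terms_set_hits is made up. It assumes the script result is truncated to a whole number; with this sample data the outcome is the same whichever way 2.1 is rounded.

```python
# The three sample movies, keyed by _id.
movies = {
    1: {"tags": ["喜剧", "动作", "科幻"], "tags_count": 3},
    2: {"tags": ["喜剧", "爱情", "家庭"], "tags_count": 3},
    3: {"tags": ["动作", "科幻", "喜剧"], "tags_count": 3},
}


def terms_set_hits(docs, field, terms, threshold_fn):
    """Return the ids of documents whose matched-term count reaches the threshold."""
    hits = []
    for doc_id, doc in docs.items():
        matched = len(set(doc[field]) & set(terms))
        # Assumption: the script result is truncated to a whole number.
        required = int(threshold_fn(doc))
        if matched >= required:
            hits.append(doc_id)
    return hits


hits = terms_set_hits(
    movies,
    "tags",
    ["喜剧", "动作", "科幻"],
    lambda doc: doc["tags_count"] * 0.7,  # mirrors doc['tags_count'].value * 0.7
)
print(hits)  # [1, 3]
```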

6. Summary

The Terms Set query is a powerful Elasticsearch query type, well suited to data with multiple attributes, categories, or labels.

By setting the required number of matches flexibly, through a field or a script, we can easily find documents that meet specific requirements.

Note, however, that Terms Set queries may cause performance problems, especially on large data sets. To improve query performance, consider preprocessing the data, for example by grouping tags with a clustering algorithm and then querying documents by group.


Origin blog.csdn.net/wojiushiwo987/article/details/130591798