In-depth exploration of Elasticsearch 8.X: function_score parameters explained, with hands-on case analysis

In Elasticsearch, function_score lets us customize the scores of search results at query time.

function_score provides a set of functions and parameters that we can combine flexibly to fit our needs.

Recently, some readers told us that the parameters of function_score are hard to understand. This article discusses its core parameters and functions in depth.


1. Purpose and typical use cases of function_score

Elasticsearch's function_score query is a powerful tool for modifying the relevance scores of the documents a query returns. This helps us get better search results in specific application scenarios.

It works through a set of built-in functions (script_score, weight, random_score, field_value_factor, and the decay functions) and a set of parameters, such as boost_mode and score_mode, that control how the resulting scores are combined.

Here are some scenarios where function_score can be applied:

1.1 User preference scenarios

If we know a user's interests or behavior, we can use function_score to boost results the user is likely to be interested in.

For example, in a recommendation system, if we know that a user likes a particular author, we can increase the scores of that author's articles.

A real-world example: the recently popular song "Luocha Haishi" was pushed to the top of NetEase Cloud Music's recommendations.


1.2 Random sampling scenario

If we need to randomly sample from a large dataset, we can use the random_score function.

It assigns each document a random score, so the search results come back in random order.

1.3 Time-sensitive query scenarios

For some time-sensitive data, such as news, blog posts, or forum posts, newer documents are usually more relevant than older ones.

In this case, we can use the decay functions to lower the scores of older documents.

1.4 Location-sensitive query scenarios

If our application cares about geographic location, as real estate and travel applications do, decay functions can boost the scores of documents that are close to a given location.

1.5 Field-value-driven scenarios

If some field values in our documents should influence the relevance score, we can use the field_value_factor function.

For example, in e-commerce, a product's sales, rating, or number of reviews may affect how it ranks in the search results.

Overall, function_score provides a flexible way to meet various complex relevance scoring needs.

2. function_score parameters

2.1 boost_mode parameter

boost_mode determines how the score of the base query is combined with the function score.

Accepted values are:

Value      Description
multiply   The query score and the function score are multiplied (default)
replace    Only the function score is used; the query score is ignored
sum        The query score and the function score are added
avg        The average of the query score and the function score
max        The larger of the query score and the function score
min        The smaller of the query score and the function score
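To make the table concrete, here is a minimal Python sketch of how each boost_mode value merges a document's query score with its function score. It is a toy model, not Elasticsearch source code. It ignores max_boost and the query-level boost, and the function name combine_boost_mode is hypothetical.

```python
def combine_boost_mode(query_score: float, function_score: float,
                       mode: str = "multiply") -> float:
    """Toy model: merge the base query score with the (already combined)
    function score the way each boost_mode value is described above."""
    if mode == "multiply":
        return query_score * function_score
    if mode == "replace":
        return function_score  # the query score is ignored
    if mode == "sum":
        return query_score + function_score
    if mode == "avg":
        return (query_score + function_score) / 2
    if mode == "max":
        return max(query_score, function_score)
    if mode == "min":
        return min(query_score, function_score)
    raise ValueError(f"unknown boost_mode: {mode}")


# A document whose base query scored 2.0 and whose functions scored 3.0.
for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, combine_boost_mode(2.0, 3.0, mode))
# multiply 6.0 / replace 3.0 / sum 5.0 / avg 2.5 / max 3.0 / min 2.0
```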

2.2 score_mode parameter

score_mode determines how the scores of multiple functions are combined into one function score.

Accepted values are:

Value      Description
multiply   The function scores are multiplied (default)
sum        The function scores are added
avg        The average of the function scores (a weighted average when weights are set)
first      The score of the first function whose filter matches the document
max        The largest function score
min        The smallest function score
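A similar toy model in Python for score_mode follows. It makes these assumptions:

- Each function's output is multiplied by its weight, which defaults to 1.
- Functions whose filter does not match the document are skipped.
- At least one function matches.

The class Fn and the function combine_score_mode are hypothetical names. The avg branch reproduces the weighted average described in the Elasticsearch reference: scores 1 and 2 with weights 3 and 4 combine to (1*3 + 2*4) / (3 + 4), not (1*3 + 2*4) / 2.

```python
import math
from dataclasses import dataclass


@dataclass
class Fn:
    """One entry of the "functions" array (hypothetical toy model)."""
    score: float          # raw output of the function for this document
    weight: float = 1.0   # optional "weight" of the entry
    matches: bool = True  # whether the entry's filter matches the document


def combine_score_mode(functions, mode="multiply"):
    matching = [f for f in functions if f.matches]
    if not matching:
        # Assumption: the no-match case is not modeled in this sketch.
        raise ValueError("no function matched the document")
    weighted = [f.score * f.weight for f in matching]
    if mode == "multiply":
        return math.prod(weighted)
    if mode == "sum":
        return sum(weighted)
    if mode == "avg":
        # Weighted average: divide by the sum of weights, not the count.
        return sum(weighted) / sum(f.weight for f in matching)
    if mode == "first":
        return weighted[0]  # first function whose filter matched
    if mode == "max":
        return max(weighted)
    if mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {mode}")


fns = [Fn(score=1.0, weight=3.0), Fn(score=2.0, weight=4.0)]
print(combine_score_mode(fns, "avg"))  # (1*3 + 2*4) / (3 + 4) = 11/7, about 1.571
```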

2.3 Available functions

function_score provides the following function types for custom scoring:

Function             Description
script_score         Computes the score with a script
weight               Multiplies the score by a constant weight, independent of any field value
random_score         Generates a random score, uniformly distributed between 0 (inclusive) and 1 (exclusive)
field_value_factor   Computes the score from the value of a field
decay functions      gauss, exp and linear: the score decreases as a field value moves away from an origin, so values closer to the origin score higher
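The next example combines the weight function with the user-preference scenario from section 1.1. This Python sketch builds a request body that boosts articles by a favorite author in the articles index created in section 3.1. The helper name author_preference_query and the weight of 2 are illustrative assumptions.

Because author is mapped as text, the filter uses match_phrase. A plain match on "John Doe" would also match "Jane Doe" through the shared token "doe".

```python
import json


def author_preference_query(text: str, favorite_author: str,
                            weight: float = 2.0) -> dict:
    """Hypothetical helper: boost articles written by a favorite author."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"content": text}},
                "functions": [
                    {
                        # Only documents matching this filter receive the weight.
                        "filter": {"match_phrase": {"author": favorite_author}},
                        "weight": weight,
                    }
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = author_preference_query("Elasticsearch", "John Doe")
# Paste the printed JSON into Kibana Dev Tools after: GET /articles/_search
print(json.dumps(body, indent=2))
```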

3. function_score in practice

3.1 Prepare the data

To make things concrete, we'll create a simple index, insert a few documents, and run function_score queries against them.

Suppose we have an index called articles that stores blog posts. Each post has the fields title, author, content, and likes (the number of likes the article has received).

First, create an index and add some documents:

PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "text" },
      "content": { "type": "text" },
      "likes": { "type": "integer" }
    }
  }
}


POST /_bulk
{ "index" : { "_index" : "articles", "_id" : "1" } }
{ "title": "Elasticsearch Basics", "author": "John Doe", "content": "This article introduces the basics of Elasticsearch.", "likes": 100 }
{ "index" : { "_index" : "articles", "_id" : "2" } }
{ "title": "Advanced Elasticsearch", "author": "Jane Doe", "content": "This article covers advanced topics in Elasticsearch.", "likes": 500 }
{ "index" : { "_index" : "articles", "_id" : "3" } }
{ "title": "Elasticsearch Function Score Query", "author": "John Doe", "content": "This article discusses the function_score query in Elasticsearch.", "likes": 250 }
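You may prefer to generate the bulk body in code, for example to send it with an HTTP client instead of Kibana Dev Tools. The sketch below builds the same newline-delimited JSON with Python's standard library. The _bulk API expects one action line followed by one source line per document, and the body must end with a newline. The helper name to_bulk_ndjson is hypothetical.

```python
import json

articles = [
    {"title": "Elasticsearch Basics", "author": "John Doe",
     "content": "This article introduces the basics of Elasticsearch.", "likes": 100},
    {"title": "Advanced Elasticsearch", "author": "Jane Doe",
     "content": "This article covers advanced topics in Elasticsearch.", "likes": 500},
    {"title": "Elasticsearch Function Score Query", "author": "John Doe",
     "content": "This article discusses the function_score query in Elasticsearch.", "likes": 250},
]


def to_bulk_ndjson(index: str, docs: list) -> str:
    """Build a _bulk body: an action line plus a source line per document."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(i)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the final newline is required


body = to_bulk_ndjson("articles", articles)
print(body)
```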

Now that we have some documents, let's perform a function_score query on them.

3.2 script_score: logarithmic weighting by the likes field

GET /articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "boost": "5",
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "Math.log(1 + doc['likes'].value)"
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

The query above uses Elasticsearch's function_score query.

It first matches every document in the articles index (match_all). A script_score function then computes the natural logarithm of each document's likes value plus one (Math.log(1 + doc['likes'].value)). Because boost_mode is "multiply", this value is multiplied by the base query score. The result is then multiplied by 5, the query-level boost. The logarithm rewards articles with more likes while compressing the difference between very large like counts, so the results are ranked by a log-weighted likes value.
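A short Python sketch can check the arithmetic. It assumes two things: match_all gives every document a query score of 1.0, and the top-level boost multiplies the combined score. Painless's Math.log is the natural logarithm, like Python's math.log. Scores returned by Elasticsearch are 32-bit floats, so their trailing digits may differ from these double-precision values.

```python
import math

likes = {"1": 100, "2": 500, "3": 250}  # _id -> likes, from the sample data

QUERY_SCORE = 1.0  # assumption: match_all scores every document 1.0
BOOST = 5.0        # the "boost" of the function_score query


def expected_score(like_count: int) -> float:
    function_score = math.log(1 + like_count)  # natural log, as in Painless
    combined = QUERY_SCORE * function_score    # boost_mode: multiply
    return BOOST * combined


ranking = sorted(likes, key=lambda doc_id: expected_score(likes[doc_id]),
                 reverse=True)
for doc_id in ranking:
    print(doc_id, round(expected_score(likes[doc_id]), 4))
```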

The execution results are as follows:

[Figure: execution results]

3.3 random_score: returning results in random order

GET /articles/_search
{
  "query": {
    "function_score": {
      "query": { 
        "match_all": {} 
      },
      "functions": [
        {
          "random_score": {
            "field": "likes"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

The query above combines function_score with the random_score function, which gives each document a random score between 0 (inclusive) and 1 (exclusive).

Importantly, since no seed is provided, each execution of this query returns the documents in a new random order. Despite the field setting, random_score does not rank by likes. The field only serves as a source of randomness, and the Elasticsearch reference describes it as something to set together with seed when scores need to be reproducible.

match_all is the base query and matches every document.

Setting boost_mode to "replace" discards the base query score and uses the random_score value as the final score. As a result, this query returns a freshly shuffled list each time it is executed.
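Sometimes the order should be reproducible instead, for example to page through the same random sample. For that case, the Elasticsearch reference recommends setting both seed and field. Documents in the same shard with the same field value get the same score, so a field with unique values is preferable. The reference suggests _seq_no, with the caveat that a document's score changes when the document is updated. The sketch below builds such a request body in Python. The seed value and the helper name random_sample_query are illustrative.

```python
import json


def random_sample_query(seed: int, field: str = "_seq_no") -> dict:
    """Hypothetical helper: reproducible random order for a given seed."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    # Same seed + same field values -> same scores across requests.
                    {"random_score": {"seed": seed, "field": field}}
                ],
                "boost_mode": "replace",  # keep only the random score
            }
        }
    }


print(json.dumps(random_sample_query(seed=10), indent=2))
```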

The execution result is shown in the figure below:

[Figure: execution results]

3.4 field_value_factor: adjusting _score by a field value

This is useful for fields like likes: an article with many likes is likely to be more relevant than one with few.

Here is an example:

GET /articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "content": "Elasticsearch"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "likes",
            "factor": 1.2,
            "modifier": "sqrt",
            "missing": 1
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

In this query:

  • "match": { "content": "Elasticsearch" }

The base query matches articles whose content field contains "Elasticsearch".

  • field_value_factor

Adjusts the score based on the likes field. It reads the document's likes value; if the document has no likes value, the value 1 from the missing parameter is used instead. The value is multiplied by factor (1.2), and the modifier (sqrt) is then applied to the product, so the function score is sqrt(1.2 × likes).

  • boost_mode

Set to "multiply", so the base query score is multiplied by the field_value_factor score to produce the final document score.

As a result, the query returns articles containing "Elasticsearch", with scores adjusted by likes: for the same text relevance, more likes means a higher score.
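The field_value_factor formula is modifier(factor × value). The Python sketch below reproduces it for the three sample documents, together with the modifier names listed in the Elasticsearch reference (log is base 10, ln is natural). It computes only the function score. In the example above, the final score is this value multiplied by the BM25 score of the match query, which the sketch does not compute. The helper name field_value_factor and the MODIFIERS table belong to the sketch and are not an Elasticsearch API.

```python
import math

# Modifier names as documented for field_value_factor.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,                    # common (base-10) logarithm
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,                       # natural logarithm
    "ln1p": math.log1p,
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Function score = modifier(factor * value); `missing` replaces an absent value."""
    if value is None:
        if missing is None:
            raise ValueError("no value and no 'missing' default")  # treated as an error here
        value = missing
    return MODIFIERS[modifier](factor * value)


# The three sample documents, plus a hypothetical document without likes.
for likes in (100, 500, 250, None):
    score = field_value_factor(likes, factor=1.2, modifier="sqrt", missing=1)
    print(likes, round(score, 3))
```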

The execution results are as follows:

[Figure: execution results]

3.5 Decay functions: adjusting _score by distance from an origin

The closer a field value is to a given center point (the origin), the higher the score. This is especially useful for date and geolocation fields.

Elasticsearch provides three decay functions: linear, exp (exponential), and gauss (Gaussian).

Here is an example using the gauss function:

GET /articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "content": "Elasticsearch"
        }
      },
      "functions": [
        {
          "gauss": {
            "likes": {
              "origin": "100",
              "scale": "20",
              "offset": "0",
              "decay": 0.5
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

In short, the query uses function_score with a gauss function to apply a Gaussian decay, based on the likes field, to the scores of articles containing "Elasticsearch".

In this query:

  • "match": { "content": "Elasticsearch" }

The base query matches articles whose content field contains "Elasticsearch".

  • gauss

Applies a Gaussian decay based on the value of the likes field.

Its parameters are:

Parameter   Value   Description
origin      100     The center point: the ideal likes value, which receives the full score
scale       20      The distance from origin (beyond offset) at which the score drops to the decay value
offset      0       Documents within this distance of origin are not decayed; 0 means decay starts right at origin
decay       0.5     The score at distance scale (plus offset) from origin; here a document with 80 or 120 likes gets half the full score

  • boost_mode

Set to "multiply", so the base query score is multiplied by the gauss score to produce the final document score.

Therefore, this query will return articles containing "Elasticsearch", and the score of the article will be processed by Gaussian decay according to the value of the "likes" field. The closer the "likes" value is to 100, the higher the score will be.
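The following Python sketch shows how the score falls off. It implements the three decay functions for numeric fields, following the formulas given in the Elasticsearch function_score reference, where distance is max(0, |value - origin| - offset). The helper name decay_score is hypothetical.

```python
import math


def decay_score(value, origin, scale, offset=0.0, decay=0.5, kind="gauss"):
    """Decay factor for a numeric field. Returns 1.0 within `offset` of
    `origin` and exactly `decay` at a distance of offset + scale."""
    distance = max(0.0, abs(value - origin) - offset)
    if kind == "gauss":
        sigma_sq = -scale ** 2 / (2 * math.log(decay))  # log(decay) < 0
        return math.exp(-distance ** 2 / (2 * sigma_sq))
    if kind == "exp":
        lam = math.log(decay) / scale
        return math.exp(lam * distance)
    if kind == "linear":
        s = scale / (1.0 - decay)
        return max((s - distance) / s, 0.0)
    raise ValueError(f"unknown decay function: {kind}")


# Same parameters as the gauss query above: origin 100, scale 20, offset 0, decay 0.5.
for likes in (100, 120, 250, 500):
    print(likes, decay_score(likes, origin=100, scale=20))
```

With the parameters of the query above, the factor is 1.0 for 100 likes and about 0.5 for 120 likes. The articles with 250 and 500 likes are 7.5 and 20 scales away from the origin. Their gauss factor is therefore practically zero, and with boost_mode set to "multiply" their final scores collapse toward zero. A larger scale would spread the decay over the range of likes values in this data.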

[Figure: execution results]

4. Summary

Having looked at Elasticsearch's function_score in depth, we can see how much it adds to search applications. Whether we sort by specific field values or fine-tune results with custom scripts, function_score handles the job well.

function_score has many parameters and options and may look complicated at first glance. Once we understand what each parameter does, however, we can combine them flexibly to fit our needs. In the hands-on examples, we used script_score, random_score, field_value_factor, and the gauss decay function to show how function_score meets complex search requirements.

Performance deserves careful attention. function_score evaluates its functions for every matching document, so complex functions and scripts can consume significant computing resources. In practice, keep this cost in mind to maintain good system performance.

In addition, as data and user behavior change, we need to keep observing, learning, and adjusting our search strategy to improve the user experience. function_score is a powerful tool in that process.

Overall, Elasticsearch's function_score is a powerful and flexible tool. Understood deeply and used properly, it can substantially improve both the effectiveness of our search application and the user experience.


Source: blog.csdn.net/wojiushiwo987/article/details/131950286