[ES] ES 7.6 fuzzy search

1. Overview

Reprinted: https://www.cnblogs.com/sanduzxcvbnm/p/12085219.html

When searching, we sometimes make typos, and the search then fails to find anything. In Elasticsearch, we can use the fuzziness parameter to run fuzzy queries, so that a search still matches even when the search term contains typos.

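The match query has a "fuzziness" parameter. It can be set to "0", "1", "2" or "auto". "auto" is the recommended option; it chooses the maximum edit distance based on the length of the query term: by default, terms of 1 or 2 characters must match exactly, terms of 3 to 5 characters may differ by one edit, and longer terms may differ by two edits.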
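As a rough illustration of how "auto" maps term length to an allowed edit distance, here is a minimal Python sketch. The function name auto_max_edits and its low/high parameters are made up for this example; the thresholds 3 and 6 are assumed to be the documented defaults (AUTO:3,6), and this is not Elasticsearch's own code.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum number of edits allowed for `term` under fuzziness "auto".

    Assumes the default thresholds (AUTO:3,6): terms shorter than `low`
    must match exactly, terms shorter than `high` may differ by one edit,
    and all longer terms may differ by two edits.
    """
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


for term in ["is", "ski", "bxxe", "elastic"]:
    print(term, auto_max_edits(term))
```

Under these assumptions, a three-character term such as ski is allowed one edit, and so is a four-character term such as bxxe.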

2. Fuzzy query

The fuzzy query returns documents that contain terms similar to the search term, as measured by Levenshtein edit distance.

Edit distance is the number of character changes required to convert one term into another. These changes can include:

Changing a character (box→fox)
Removing a character (black→lack)
Inserting a character (sic→sick)
Transposing two adjacent characters (act→cat)
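The four kinds of change listed above can be counted with a small dynamic-programming routine. The sketch below is only an illustration in Python, not Elasticsearch's implementation; edit_distance is a name chosen for this example, and it uses the "optimal string alignment" variant, which counts a swap of two adjacent characters as a single edit.

```python
def edit_distance(a: str, b: str) -> int:
    """Edits needed to turn `a` into `b`, where a substitution, deletion,
    insertion, or transposition of two adjacent characters costs 1 each."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # transposition of two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for source, target in [("box", "fox"), ("black", "lack"),
                       ("sic", "sick"), ("act", "cat")]:
    print(source, target, edit_distance(source, target))
```

Each of the four example pairs comes out at a distance of 1.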

To find similar terms, the fuzzy query creates a set of all possible variations, or expansions, of the search term within the specified edit distance. The query then returns exact matches for each expansion.
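To make the idea of expansions concrete, the following Python sketch enumerates every string within a given number of edits of a term and keeps only those that exist in a set of indexed terms. This brute-force enumeration is an assumption made for illustration; Elasticsearch's internal implementation is more efficient and is not shown here. The names one_edit_variants and expand, and the lowercase ASCII alphabet, are hypothetical.

```python
import string


def one_edit_variants(term: str, alphabet: str = string.ascii_lowercase) -> set:
    """All strings reachable from `term` with at most one change."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + ch + right[1:]
                for left, right in splits if right for ch in alphabet}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    return {term} | deletes | transposes | replaces | inserts


def expand(term: str, indexed_terms, max_edits: int = 1) -> set:
    """Indexed terms that can be reached from `term` in at most `max_edits` changes."""
    candidates = {term}
    for _ in range(max_edits):
        # Apply one more round of single-character changes to every candidate.
        candidates |= {v for c in candidates for v in one_edit_variants(c)}
    return candidates & set(indexed_terms)


terms = ["i", "like", "blue", "sky"]
print(expand("ski", terms, max_edits=1))
print(expand("bxxe", terms, max_edits=2))
```

The number of candidates grows quickly with max_edits, which gives some intuition for why fuzzy matching is more expensive than exact matching.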

3. Examples

We first index the following document into the fuzzyindex index:

PUT fuzzyindex/_doc/1
{
  "content": "I like blue sky"
}

If we now run the following search:

GET fuzzyindex/_search
{
  "query": {
    "match": {
      "content": "ski"
    }
  }
}

There are no search results, because after "I like blue sky" is analyzed (tokenized), none of the resulting terms is ski.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
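The empty result above follows from how the text field is analyzed at index time. As a rough stand-in for the default analyzer (word tokenization plus lowercasing), the Python sketch below shows which terms end up in the index. The analyze function is a simplification written for this example; the real analyzer handles many more cases.

```python
import re


def analyze(text: str) -> list:
    """Very rough approximation of the default analyzer:
    split the text into word tokens and lowercase each one."""
    return [token.lower() for token in re.findall(r"\w+", text)]


indexed_terms = analyze("I like blue sky")
print(indexed_terms)           # ['i', 'like', 'blue', 'sky']
print("ski" in indexed_terms)  # False
```

Because a plain match query looks up the exact term ski, and only i, like, blue and sky were indexed, nothing is returned.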

If we instead use the following search:

GET fuzzyindex/_search
{
  "query": {
    "match": {
      "content": {
        "query": "ski",
        "fuzziness": "1"
      }
    }
  }
}

Then the displayed result is:

{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.19178805,
    "hits" : [
      {
        "_index" : "fuzzyindex",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.19178805,
        "_source" : {
          "content" : "I like blue sky"
        }
      }
    ]
  }
}

This time we find the document we need, because sky and ski differ by only one letter, that is, by an edit distance of 1.

Similarly, we can try the "auto" option:

GET fuzzyindex/_search
{
  "query": {
    "match": {
      "content": {
        "query": "ski",
        "fuzziness": "auto"
      }
    }
  }
}

It returns the same result as above, so the document still matches: ski has three characters, and for a term of that length "auto" allows one edit.

If we search as follows:

GET fuzzyindex/_search
{
  "query": {
    "match": {
      "content": {
        "query": "bxxe",
        "fuzziness": "auto"
      }
    }
  }
}

This matches nothing: with "auto", a four-character term such as bxxe is allowed only one edit, while blue is two edits away. But if we perform the following search:

GET fuzzyindex/_search
{
  "query": {
    "match": {
      "content": {
        "query": "bxxe",
        "fuzziness": "2"
      }
    }
  }
}

We can also write the same search as a fuzzy query:

GET /_search
{
  "query": {
    "fuzzy": {
      "content": {
        "value": "bxxe",
        "fuzziness": "2"
      }
    }
  }
}

Then the search returns the document, because a fuzziness of 2 tolerates two edits.

Fuzziness is a simple solution for spelling errors, but it has high CPU overhead and very low precision.

Origin blog.csdn.net/qq_21383435/article/details/108939468