ES Search Suggestions

1. Overview
Search usually needs a "search suggestion" or "search completion" feature: while the user is typing a query, the system completes the input automatically or corrects spelling errors. This improves how accurately queries match documents and, in turn, the user's search experience. In Elasticsearch this feature is called Suggest.

## Four kinds of suggester
2. Term suggester
The term suggester, as its name implies, suggests corrections for each single term produced by the analyzer, and does not consider the relationship between multiple terms.

POST <index>/_search
{
    "suggest": {
        "<suggest_name>": {
            "text": "<search_content>",
            "term": {
                "suggest_mode": "<suggest_mode>",
                "field": "<field_name>"
            }
        }
    }
}

Options:

  • text: the text the user searched for
  • field: the field to fetch candidate suggestions from
  • analyzer: the analyzer used to analyze the suggest text
  • size: the maximum number of suggestions to return for each term of the suggest text
  • sort: how suggestions are sorted for each term. Only two values are allowed:
    score: sort by score first, then document frequency, then the term itself
    frequency: sort by document frequency first, then score, then the term itself
  • suggest_mode: controls which suggestions are included. Only the following values are allowed:
    missing: only suggest for terms that are not in the index (the default)
    popular: only suggest terms that occur in more documents than the original term
    always: suggest any matching term
  • max_edits: the maximum edit distance a candidate may have from the original term to be considered a suggestion. Only 1 or 2 is allowed; any other value causes a bad request error. Default is 2
  • prefix_length: the minimum number of leading characters that must match for a term to be a candidate suggestion
  • min_word_length: the minimum length a term must have to be included
  • min_doc_freq: the minimum number of documents a suggestion must appear in
  • max_term_freq: the maximum document frequency a term of the suggest text may have and still be considered for correction; this is used to skip high-frequency terms, which are usually spelled correctly
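
The options above can be combined into one request body. The following is a minimal sketch in Python that only builds and prints the JSON body; it does not contact a cluster. The suggestion name `my_term_suggestion`, the field `title`, and the helper `term_suggest_body` are assumptions made up for illustration, and the option names use the snake_case spelling (`min_word_length` and so on) that Elasticsearch expects.

```python
import json


def term_suggest_body(text, field, suggest_mode="missing", size=5, max_edits=2):
    """Build a term suggester request body (hypothetical helper)."""
    # Elasticsearch rejects any max_edits other than 1 or 2, so fail early.
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    return {
        "suggest": {
            "my_term_suggestion": {  # hypothetical suggestion name
                "text": text,
                "term": {
                    "field": field,
                    "suggest_mode": suggest_mode,
                    "size": size,
                    "sort": "score",
                    "max_edits": max_edits,
                    "prefix_length": 1,
                    "min_word_length": 4,
                },
            }
        }
    }


# "title" is an assumed field name; the body would be sent to POST <index>/_search.
body = term_suggest_body("elasticsaerch", "title")
print(json.dumps(body, indent=2))
```

The response should contain one entry per analyzed term of `text`, each with its own list of suggested options.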

3. Phrase suggester
Compared with the term suggester, the phrase suggester takes the context of the suggest text into account, that is, the other tokens in the sentence. Rather than matching each token by edit distance alone, it selects better suggestions based on co-occurrence and frequency.

Options:

  • real_word_error_likelihood: the likelihood that a term is misspelled even though it exists in the index. The default value is 0.95, which tells Elasticsearch that 5% of the terms in the index are misspelled. As this value gets lower, Elasticsearch treats more and more terms that exist in the index as misspelled, even when they are correct
  • max_errors: the maximum percentage of terms that may be considered misspelled in order to form a correction. The default value is 1
  • confidence: a factor applied to the score of the input phrase, used as a threshold for the suggestions. The default value is 1.0. Only suggestions that score above the threshold are returned. For example, a confidence of 1.0 only returns suggestions that score higher than the input phrase
  • collate: tells Elasticsearch to check each suggestion against the specified query and to prune suggestions for which no matching document exists in the index. The query is typically a match query. Because it is a template query, the current suggestion is available inside it as the {{suggestion}} variable, and more variables can be added in the "params" object next to the query. When the parameter "prune" is set to true, suggestions are no longer dropped; instead each returned suggestion carries an extra field "collate_match" that indicates whether matching documents were found for it
  • direct_generator: the phrase suggester uses candidate generators to produce a list of possible terms for each term in the given text. A single candidate generator is similar to calling the term suggester for each individual term in the text. The output of the generators is then scored in combination with the candidates from the other terms. Currently only one type of candidate generator is supported, the direct_generator. The suggest API accepts a list of generators under the key direct_generator; each generator in the list is invoked for each term in the original text.
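
As a rough sketch of how these options fit together, the Python snippet below builds a phrase suggester request body with one direct generator and a collate query. It only constructs and prints JSON. The suggestion name `my_phrase_suggestion`, the field `title`, the template parameter `field_name`, and the helper `phrase_suggest_body` are assumptions chosen for illustration.

```python
import json


def phrase_suggest_body(text, field):
    """Build a phrase suggester request body (hypothetical helper)."""
    return {
        "suggest": {
            "my_phrase_suggestion": {  # hypothetical suggestion name
                "text": text,
                "phrase": {
                    "field": field,
                    "real_word_error_likelihood": 0.95,
                    "max_errors": 1,
                    "confidence": 1.0,
                    # One candidate generator, invoked for each term of the text.
                    "direct_generator": [
                        {"field": field, "suggest_mode": "always"}
                    ],
                    "collate": {
                        # Template query: {{suggestion}} is filled in for each
                        # candidate, {{field_name}} comes from "params".
                        "query": {
                            "source": {
                                "match": {"{{field_name}}": "{{suggestion}}"}
                            }
                        },
                        "params": {"field_name": field},
                        # Keep all suggestions and flag them with collate_match.
                        "prune": True,
                    },
                },
            }
        }
    }


body = phrase_suggest_body("noble prize", "title")
print(json.dumps(body, indent=2))
```

With `"prune": True`, each returned option should carry a `collate_match` flag instead of being dropped when no document matches.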

4. Completion suggester
The completion suggester provides auto-completion and supports three types of queries: prefix, fuzzy, and regular expression (regex). Its main application scenario is "Auto Completion". In this scenario, every time the user types a character, a query is sent to the backend to find matching items. When the user types fast, the backend must respond with very low latency. The implementation therefore uses a different data structure from the previous two suggesters. Instead of an inverted index, the analyzed data is encoded into an FST (finite state transducer) and stored together with the index. For an open index, ES loads the FST into memory, so prefix lookups are extremely fast. However, the FST can only be used for prefix lookup, which is the limitation of the completion suggester.
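
To illustrate the prefix-only limitation, here is a small Python sketch. It is not an FST and not how Elasticsearch is implemented; it uses a sorted list with binary search as a stand-in, because that structure shares the relevant property: terms with a common prefix sit next to each other, so prefix lookup is cheap, while a substring in the middle of a term cannot be found this way. The function name and sample terms are made up for illustration.

```python
from bisect import bisect_left


def prefix_lookup(sorted_terms, prefix, size=5):
    """Return up to `size` terms that start with `prefix`.

    `sorted_terms` must be sorted; all terms sharing a prefix are then
    contiguous, starting at the position found by binary search.
    """
    start = bisect_left(sorted_terms, prefix)
    results = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the contiguous block of matching terms
        results.append(term)
        if len(results) == size:
            break
    return results


terms = sorted(["elastic", "elasticsearch", "elk", "kibana", "logstash"])
print(prefix_lookup(terms, "el"))   # terms starting with "el"
print(prefix_lookup(terms, "ast"))  # infix of "elastic": nothing is found
```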

  • completion: a field type unique to ES, provided specifically for suggestions; it is memory-based and offers high performance.

  • prefix query: suggestions based on a prefix lookup; this is the most commonly used kind of suggestion query. It takes the options below.

  • prefix: the text typed by the user

  • field: the completion field to fetch suggestions from

  • size: the number of suggestions to return (default 5)

  • skip_duplicates: whether to filter out duplicate suggestions (default false)
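
A minimal sketch of the two pieces involved, written as Python dicts that are printed as JSON: a mapping that declares a `completion` field, and a prefix query against it. The field name `suggest`, the suggestion name `my_completion`, and the helper `completion_prefix_body` are assumptions for illustration; nothing is sent to a cluster.

```python
import json

# Mapping body for index creation: "suggest" is an assumed field name.
mapping = {
    "mappings": {
        "properties": {
            "suggest": {"type": "completion"}
        }
    }
}


def completion_prefix_body(prefix, field="suggest", size=5, skip_duplicates=False):
    """Build a completion suggester prefix query (hypothetical helper)."""
    return {
        "suggest": {
            "my_completion": {  # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "size": size,
                    "skip_duplicates": skip_duplicates,
                },
            }
        }
    }


query = completion_prefix_body("ela", skip_duplicates=True)
print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```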

fuzzy query

  • fuzziness: the allowed edit distance, default AUTO
  • transpositions: if set to true, a transposition is counted as one change instead of two; defaults to true
  • min_length: the minimum input length before fuzzy suggestions are returned, default 3
  • prefix_length: the number of leading input characters that are not checked for fuzzy alternatives, defaults to 1
  • unicode_aware: if true, all measurements such as fuzzy edit distance, transpositions, and lengths are in Unicode code points rather than bytes. This is slightly slower than raw bytes, so it is set to false by default

  • regex query: the prefix is expressed as a regular expression; not recommended
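
The fuzzy and regex variants differ from the prefix query only in a few keys. The sketch below, under the same assumptions as before (a completion field named `suggest`, made-up suggestion names and helper names), builds both request bodies in Python and prints them as JSON.

```python
import json


def completion_fuzzy_body(prefix, field="suggest"):
    """Completion query that tolerates typos in the prefix (hypothetical helper)."""
    return {
        "suggest": {
            "my_fuzzy_completion": {  # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "fuzzy": {
                        "fuzziness": "AUTO",
                        "transpositions": True,
                        "min_length": 3,
                        "prefix_length": 1,
                        "unicode_aware": False,
                    },
                },
            }
        }
    }


def completion_regex_body(pattern, field="suggest"):
    """Completion query whose prefix is a regular expression (hypothetical helper)."""
    return {
        "suggest": {
            "my_regex_completion": {  # hypothetical suggestion name
                "regex": pattern,  # replaces the "prefix" key
                "completion": {"field": field},
            }
        }
    }


fuzzy = completion_fuzzy_body("elsa")
regex = completion_regex_body("e[l|s]a")
print(json.dumps(fuzzy, indent=2))
print(json.dumps(regex, indent=2))
```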

5. Context suggester
The completion suggester considers all documents in the index. In practice, however, when making intelligent recommendations it is usually better to filter suggestions by certain conditions, and possibly to increase the weight of suggestions with certain features.

  • contexts: the context objects; more than one can be defined

  • name: the name of the context, used to distinguish different context objects in the same index. The name must be specified when querying

  • type: the type of the context object. Two types are currently supported, category and geo, which classify the suggestion by category and by geographic location respectively

  • boost: a weight used to raise the ranking of suggestions that match the context

  • path: the document field from which context values are read. Without path, the context values must be supplied under the context's name every time a document is indexed. If path is specified in the mapping, this is not needed when indexing documents. Because the mapping is defined once while documents are indexed frequently, this simplifies the code.
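
The following Python sketch shows how the pieces could fit together: a mapping with one category context, a document that supplies context values explicitly (the case without `path`), and a query that filters by context and boosts one category. It only builds and prints JSON. The field name `suggest`, the context name `place_type`, the category values, and the suggestion name are all assumptions for illustration.

```python
import json

# Mapping: a completion field with one category context.
# Adding "path": "<field_name>" to the context would make ES read the
# category from that document field instead of from the suggest field.
mapping = {
    "mappings": {
        "properties": {
            "suggest": {
                "type": "completion",
                "contexts": [
                    {"name": "place_type", "type": "category"}
                ],
            }
        }
    }
}

# Document: no "path" in the mapping, so contexts are given under the
# context's name when indexing.
document = {
    "suggest": {
        "input": ["timmy's", "starbucks"],
        "contexts": {"place_type": ["cafe", "food"]},
    }
}

# Query: only suggestions in the listed categories are returned;
# "restaurants" is ranked higher through boost.
query = {
    "suggest": {
        "place_suggestion": {  # hypothetical suggestion name
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "size": 10,
                "contexts": {
                    "place_type": [
                        {"context": "cafe"},
                        {"context": "restaurants", "boost": 2},
                    ]
                },
            },
        }
    }
}

for body in (mapping, document, query):
    print(json.dumps(body, indent=2))
```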

Origin blog.csdn.net/qq_38747892/article/details/129671346