Elasticsearch study notes 6: Synonym search implementation

Elasticsearch (ES) implements synonym search through a custom analyzer.

An analyzer is a wrapper that packages three components, which are executed in this order:

  1. char_filter, e.g. emoticons, html_strip
  2. tokenizer, e.g. standard, ik_smart
  3. filter, e.g. lowercase, english_stop

The char_filter processes the raw text before it is tokenized. The tokenizer splits the text into individual terms (tokens). The filter then processes the tokens output by the tokenizer, for example by deleting, modifying, or adding tokens.
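The three stages can be tried out without creating an index, because the _analyze API accepts a char_filter, tokenizer, and filter chain directly in the request body. The built-in html_strip, standard, and lowercase components are used here only as an illustration:

```
GET /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "<p>Quick Brown Fox</p>"
}
```

html_strip should remove the `<p>` tags, standard should split the remaining text into Quick, Brown, and Fox, and lowercase should turn them into quick, brown, and fox.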

Synonym search works by defining a custom filter: when it processes the tokens output by the tokenizer, it looks up the synonyms of each token and adds them to the token stream that is searched.
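A quick way to see this behavior is to define a synonym filter inline in an _analyze request. The synonym pair couch/sofa and the whitespace tokenizer are illustrative choices, not part of the original setup:

```
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "synonym",
      "synonyms": ["couch, sofa"]
    }
  ],
  "text": "couch"
}
```

The response should contain both couch and sofa at the same position, which is exactly the "add the synonym to the token stream" step described above.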

Create an analyzer:

```
PUT /synonym
{
  "settings": {
    "analysis": {
      "filter": {
        "word_sync": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      },
      "analyzer": {
        "ik_sync_smart": {
          "filter": [
            "word_sync"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  }
}
```

The request above creates an index named synonym containing an analyzer named ik_sync_smart. The analyzer uses ik_smart as its tokenizer (provided by the IK analysis plugin) and word_sync as its filter. word_sync is a custom filter of type synonym, and synonyms_path specifies the path of the synonym dictionary. This path is relative to the Elasticsearch config directory, so we need to create an analysis directory under config and put the synonyms.txt file in it.
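For quick experiments, the synonym filter also accepts an inline synonyms list instead of synonyms_path, so no file has to be placed on the nodes. A sketch, where the index name synonym_inline and the couch/sofa rule are hypothetical:

```
PUT /synonym_inline
{
  "settings": {
    "analysis": {
      "filter": {
        "word_sync": {
          "type": "synonym",
          "synonyms": [
            "couch, sofa"
          ]
        }
      },
      "analyzer": {
        "ik_sync_smart": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["word_sync"]
        }
      }
    }
  }
}
```

The file-based approach is easier to maintain once the dictionary grows; the inline list is mainly handy for testing rules.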

The synonyms.txt file contains one group of synonyms per line. Each group uses one of the following two formats. Suppose tomato_a and tomato_b are two different words that both mean "tomato":

  1. tomato_a, tomato_b
  2. tomato_a, tomato_b => tomato_b

In the first case, whether tomato_a or tomato_b is indexed, the analyzer outputs both tokens: ['tomato_a', 'tomato_b'].

In the second case, whether tomato_a or tomato_b is indexed, the analyzer outputs only the word on the right-hand side of =>: ['tomato_b'].
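A small synonyms.txt illustrating both formats might look like this. The word groups are illustrative only; lines starting with # are comments and blank lines are ignored:

```
# Format 1: equivalent synonyms.
# Every word in the group expands to all words in the group.
couch, sofa, settee

# Format 2: explicit mapping.
# Any word on the left of => is replaced by the word on the right.
tv, television => television
```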

You can check the analyzer with:

```
GET /synonym/_analyze?analyzer=ik_sync_smart&text=注册
```

and see whether the synonym results meet your needs.
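In more recent Elasticsearch versions the analyzer and text are passed in the request body rather than in the query string. The equivalent request, assuming the same index and analyzer names, is:

```
GET /synonym/_analyze
{
  "analyzer": "ik_sync_smart",
  "text": "注册"
}
```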

Finally, when defining the mapping of the index, set the analyzer of each field to be searched to the custom ik_sync_smart analyzer.
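A minimal sketch of such a mapping, followed by a search against it. The field name title is hypothetical, and the typeless mapping syntax assumes Elasticsearch 7.x or later (older versions nest properties under a mapping type name):

```
PUT /synonym/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_sync_smart"
    }
  }
}

GET /synonym/_search
{
  "query": {
    "match": {
      "title": "注册"
    }
  }
}
```

Because the match query analyzes its input with the field's analyzer by default, the query text is expanded with the same synonyms as the indexed documents.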


Origin http://43.154.161.224:23101/article/api/json?id=325132897&siteId=291194637