028 ElasticSearch -- 03 Full-Text Search Technology Basics, Explained

1. The IK tokenizer

(1) Installation

 

The IK analyzer provides Chinese word segmentation for Elasticsearch.
Download the IK analyzer from GitHub: https://github.com/medcl/elasticsearch-analysis-ik

Download the zip, unzip it, and copy the files into an ik directory under the plugins directory of the Elasticsearch installation (note: the IK version must match the Elasticsearch version).

Then restart Elasticsearch.

Test the segmentation:
Send: POST localhost:9200/_analyze
{ "text": "Testing segmentation, test content follows: spring cloud in action", "analyzer": "ik_max_word" }
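The POST request above can be sketched in Python with the standard library. This is a minimal illustration, assuming Elasticsearch with the IK plugin is listening on localhost:9200; if it is not running, the sketch reports the connection error instead of failing.

```python
import json
from urllib import request, error

# Body of the _analyze request shown above: analyze the given text
# with IK's fine-grained ik_max_word mode.
body = json.dumps({
    "text": "Testing segmentation, test content follows: spring cloud in action",
    "analyzer": "ik_max_word",
}).encode("utf-8")

req = request.Request(
    "http://localhost:9200/_analyze",  # assumed local ES endpoint
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with request.urlopen(req, timeout=5) as resp:
        # Each entry in "tokens" carries the term text, offsets, and type.
        for token in json.loads(resp.read())["tokens"]:
            print(token["token"])
except (error.URLError, OSError) as exc:
    print("Elasticsearch not reachable:", exc)
```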

(2) Two segmentation modes

The IK analyzer has two segmentation modes: ik_max_word and ik_smart.
<1> ik_max_word
Splits the text at the finest granularity. For example, "Great Hall of the People of the People's Republic of China" (中华人民共和国人民大会堂) is split into terms such as "People's Republic of China", "Chinese people", "China", "People's Republic", "Great Hall", "hall", and so on.
<2> ik_smart
Splits the text at the coarsest granularity. The same example is split into just "People's Republic of China" and "Great Hall".
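The difference between the two modes can be illustrated with a toy dictionary-based segmenter. This is not IK's actual algorithm, just a sketch: enumerating every dictionary term found in the text mimics ik_max_word, while greedy longest-match from the left mimics ik_smart. The dictionary entries are assumptions chosen for the example.

```python
# Toy dictionary for the "Great Hall of the People's Republic of China" example.
DICT = {"中华人民共和国", "中华", "人民", "共和国",
        "人民大会堂", "大会堂", "大会", "会堂"}

def max_word(text):
    """Fine-grained: return every dictionary term found anywhere in the text."""
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in DICT:
                hits.append(text[i:j])
    return hits

def smart(text):
    """Coarse-grained: greedy longest match from the left."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in DICT:
                out.append(text[i:j])
                i = j
                break
        else:
            i += 1  # character not covered by the dictionary
    return out

text = "中华人民共和国人民大会堂"
print(max_word(text))  # many overlapping terms
print(smart(text))     # ['中华人民共和国', '人民大会堂']
```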

(3) Custom dictionary

If you want the analyzer to support proprietary terms, you can define a custom dictionary.
The IK analyzer ships with a main.dic file, which is its main dictionary.

Create a my.dic file in the same directory (note: the file must be saved as UTF-8 without a BOM; use an editor such as EditPlus rather than the built-in Notepad), with one custom term per line.
Then register my.dic as an extension dictionary in the IK configuration file (IKAnalyzer.cfg.xml).
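The encoding requirement above is the step people most often get wrong, so here is a small sketch that writes a dictionary file as UTF-8 without a BOM and verifies it. The file name my.dic comes from the text; the terms listed are example assumptions.

```python
# Write a custom IK dictionary file: one term per line, UTF-8, no BOM.
terms = ["spring cloud", "elasticsearch"]  # hypothetical custom terms

with open("my.dic", "w", encoding="utf-8", newline="\n") as f:
    f.write("\n".join(terms) + "\n")

# A UTF-8 BOM would be the three bytes EF BB BF at the start of the file;
# IK requires the file *not* to begin with them.
with open("my.dic", "rb") as f:
    head = f.read(3)
assert head != b"\xef\xbb\xbf"
```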

 

Restart Elasticsearch, then test the segmentation again:
Send: POST localhost:9200/_analyze
{ "text": "Testing segmentation, test content follows: spring cloud in action", "analyzer": "ik_max_word" }

 

Origin: www.cnblogs.com/luckyplj/p/11593433.html