1. ik configuration files
ik configuration file location: the es/plugins/ik/config directory
IKAnalyzer.cfg.xml: used to configure custom dictionaries
main.dic: ik's built-in Chinese dictionary, with more than 270,000 entries in total; any text that matches one of these entries is kept together as a single word
quantifier.dic: contains units and other measure-related words
suffix.dic: contains some suffixes
surname.dic: Chinese surnames
stopword.dic: English stop words
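As a rough illustration of why words listed in main.dic are kept together, the following is a minimal sketch of dictionary-driven segmentation using forward maximum matching. This is a simplification chosen for illustration and should not be read as ik's actual algorithm; the small `dictionary` set stands in for main.dic, and the function name `segment` is hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Split text greedily, preferring the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink; fall back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# A tiny stand-in for main.dic (sample entries only).
dictionary = {"中华", "人民", "共和国", "中华人民共和国", "国歌"}
tokens = segment("中华人民共和国国歌", dictionary, max_len=7)
print(tokens)
```

Characters that match no dictionary entry come out one by one, which is roughly what happens to new words that are missing from the dictionary.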
The two most important native ik configuration files:
main.dic: contains the native Chinese words; text is segmented according to the words in this file
stopword.dic: contains English stop words
Example stop words: a, the, and, at, but
Stop words are generally discarded during word segmentation and are not added to the inverted index.
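The effect of stop words on the inverted index can be sketched as follows. This is a toy example that splits English text on whitespace, not what ik or es does internally; the function name `build_inverted_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict

# The English stop words listed above.
STOP_WORDS = {"a", "the", "and", "at", "but"}


def build_inverted_index(docs, stop_words=STOP_WORDS):
    """Map each non-stop-word term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term in stop_words:
                continue  # stop words are dropped and never reach the index
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "the cat and the dog", 2: "a dog at home"}
index = build_inverted_index(docs)
print(index)
```

Because "the", "and", "a", and "at" never enter the index, a search for any of them finds nothing.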
2. Custom dictionaries
(1) Build your own dictionary: new buzzwords emerge every year, such as 网红 (Internet celebrity), 蓝瘦香菇, 喊麦, and 鬼畜 (rendered literally as "blue and thin mushrooms", "shouting wheat", and "ghost animals"), and they are generally not in ik's native dictionary
Add your own new words to ik's dictionary
In IKAnalyzer.cfg.xml, set ext_dict to custom/mydict.dic
After adding your own words to that file, you need to restart es for them to take effect
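The two steps above can be sketched in Python as follows. It assumes the properties-style `<entry key="...">` layout of IKAnalyzer.cfg.xml and a one-word-per-line UTF-8 dictionary file, so check both against the files shipped with your plugin version. The helper names `write_dic` and `build_config` are hypothetical, a temporary directory stands in for es/plugins/ik/config, and the real config file contains more than this single entry.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_dic(path, words):
    """Write one word per line, UTF-8 encoded."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


def build_config(ext_dict):
    """Return a properties-style XML string holding only an ext_dict entry."""
    root = ET.Element("properties")
    entry = ET.SubElement(root, "entry", key="ext_dict")
    entry.text = ext_dict  # path relative to the ik config directory
    return ET.tostring(root, encoding="unicode")


# A temporary directory stands in for es/plugins/ik/config (assumption).
config_dir = Path(tempfile.mkdtemp())
dic_path = config_dir / "custom" / "mydict.dic"
write_dic(dic_path, ["网红", "蓝瘦香菇"])  # sample entries; use your own words

config_xml = build_config("custom/mydict.dic")
(config_dir / "IKAnalyzer.cfg.xml").write_text(config_xml, encoding="utf-8")
print(config_xml)
```

In practice you would edit the existing IKAnalyzer.cfg.xml rather than overwrite it, and es still has to be restarted afterwards.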
(2) Build your own stop word dictionary: for example, filler words such as "yes" and "what", which we may not want to index or let people search for
custom/ext_stopword.dic already contains commonly used Chinese stop words; you can add your own stop words to it and then restart es
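Adding stop words without creating duplicates could look roughly like this. The sketch assumes the file holds one word per line in UTF-8; `add_stop_words` is a hypothetical helper, the sample words are illustrative only, and a temporary file stands in for custom/ext_stopword.dic.

```python
import tempfile
from pathlib import Path


def add_stop_words(dic_path, new_words):
    """Append words not already present; return the words actually added."""
    existing = []
    if dic_path.exists():
        existing = dic_path.read_text(encoding="utf-8").splitlines()
    kept = [w.strip() for w in existing if w.strip()]
    seen = set(kept)
    added = []
    for word in new_words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            added.append(word)
    dic_path.write_text("\n".join(kept + added) + "\n", encoding="utf-8")
    return added


# A temporary file stands in for custom/ext_stopword.dic (assumption).
dic_path = Path(tempfile.mkdtemp()) / "ext_stopword.dic"
dic_path.write_text("的\n了\n", encoding="utf-8")  # pretend these already exist

added = add_stop_words(dic_path, ["了", "啥", "么", "啥"])
print(added)
```

Only the words that were missing are appended; as noted above, the change takes effect after es is restarted.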