Using the IK Tokenizer in Elasticsearch

IK is a word-segmentation (tokenizer) plugin for es. Meaningful search results depend on proper word segmentation. For example, when we search for "你好寒冰之光" ("Hello Ice Light"), we want content containing "你好" ("Hello") and "寒冰之光" ("Ice Light") to be returned, rather than everything that contains any of the single characters "你", "好", "寒", "冰", "之", "光". Only a search on the meaningful words is correct. Splitting text into such words is the job of a tokenizer. IK was developed by Chinese developers and is currently one of the most popular tokenizers for Chinese text.
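To make the difference concrete, here is a minimal sketch that contrasts character-by-character splitting with a greedy dictionary lookup (forward maximum matching). This is only an illustration of the idea; it is not IK's actual algorithm, and `toy_dictionary` is a hypothetical word list, not IK's real dictionary.

```python
def split_by_character(text):
    """Naive segmentation: every character becomes its own token."""
    return list(text)


def forward_max_match(text, dictionary, max_len=4):
    """Greedy dictionary segmentation: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary for illustration only.
toy_dictionary = {"你好", "寒冰", "之光"}

print(split_by_character("你好寒冰之光"))
print(forward_max_match("你好寒冰之光", toy_dictionary))
# Adding the full phrase to the dictionary changes the result.
print(forward_max_match("你好寒冰之光", toy_dictionary | {"寒冰之光"}))
```

With the toy dictionary the phrase is split into three words; once "寒冰之光" is added as an entry, the greedy match keeps it whole.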

 

1. Download the latest IK tokenizer release from GitHub: https://github.com/medcl/elasticsearch-analysis-ik

 

2. Unzip the archive into the plugins directory of es and rename the extracted folder to ik:
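If you prefer to script this step, a sketch along the following lines should work. The function name `install_ik_plugin` and the example paths are hypothetical, and the sketch assumes the release zip has its files at the top level of the archive rather than inside a wrapping folder.

```python
import zipfile
from pathlib import Path


def install_ik_plugin(zip_path, es_home):
    """Extract the downloaded archive into <es_home>/plugins/ik."""
    target = Path(es_home) / "plugins" / "ik"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (hypothetical paths, adjust to your own installation):
# install_ik_plugin("elasticsearch-analysis-ik.zip", "/opt/elasticsearch")
```

Extracting directly into a folder named ik avoids the separate rename.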

  

 

3. The config directory holds all of the tokenizer's configuration. Open any of the .dic files and you will see that it simply stores the words that are considered meaningful, on the order of hundreds of thousands of entries:

  

 

4. Restart es so that it loads ik. Open http://localhost:9100/ and try a search through elasticsearch-head. Let's analyze "你好寒冰之光" ("Hello Ice Light") to see how it gets segmented. In the query box, enter http://localhost:9200/_analyze/ , which means we want to run an analysis. As the POST body, enter {"analyzer": "ik_smart", "text": "你好寒冰之光"}. This requests analysis with ik_smart, the smart segmentation mode. There is also ik_max_word, which maximizes segmentation, that is, it splits the text as finely as possible. The text to analyze is "你好寒冰之光". Submit the request:
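The same request can also be sent without elasticsearch-head. The following is a minimal sketch using only the Python standard library; the helper names are made up for this example, and it assumes es is listening on http://localhost:9200 without authentication.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="ik_smart",
                          base_url="http://localhost:9200"):
    """Build the POST request for the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_body):
    """Pull the token strings out of an _analyze JSON response."""
    return [item["token"] for item in json.loads(response_body)["tokens"]]


def analyze(text, analyzer="ik_smart"):
    """Send the request to a running es node and return the tokens."""
    with urllib.request.urlopen(build_analyze_request(text, analyzer)) as resp:
        return extract_tokens(resp.read().decode("utf-8"))


# Requires a running es node with ik installed:
# print(analyze("你好寒冰之光"))
# print(analyze("你好寒冰之光", analyzer="ik_max_word"))
```

Passing `analyzer="ik_max_word"` lets you compare the two modes on the same text.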

  

 

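We find that the tokens ik returns are "你好", "寒冰", and "之光". This shows that it did not treat "寒冰之光" as a single meaningful word. You have probably guessed why: we never defined the word "寒冰之光", so of course ik does not know it. So let's try defining it ourselves: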

 

5. In the config directory, create a new file custom.dic (you can choose the name yourself). Put the word "寒冰之光" in it and save.
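The file can also be created from a script. The sketch below assumes the dictionary format is one word per line in UTF-8 (without a BOM), which matches the layout of the bundled .dic files; `write_custom_dictionary` and the example path are hypothetical.

```python
from pathlib import Path


def write_custom_dictionary(config_dir, words, filename="custom.dic"):
    """Write one word per line, encoded as UTF-8 without a BOM."""
    path = Path(config_dir) / filename
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


# Example call (hypothetical path, adjust to your own installation):
# write_custom_dictionary("/opt/elasticsearch/plugins/ik/config", ["寒冰之光"])
```

Writing the file explicitly as UTF-8 avoids the encoding problems that some text editors introduce with Chinese characters.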

  

 

6. Edit the configuration file IKAnalyzer.cfg.xml and register our custom file:
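As a rough sketch of what this edit amounts to, the snippet below fills in the extension-dictionary entry of the config text. It assumes the entry uses the key `ext_dict`, as in the default IKAnalyzer.cfg.xml as far as I recall. `sample_cfg` is a shortened stand-in for the real file, and `set_ext_dict` is a hypothetical helper.

```python
import re


def set_ext_dict(cfg_xml, dict_file):
    """Return cfg_xml with the ext_dict entry pointing at dict_file."""
    pattern = re.compile(r'(<entry key="ext_dict">)[^<]*(</entry>)')
    new_xml, count = pattern.subn(
        lambda m: m.group(1) + dict_file + m.group(2), cfg_xml
    )
    if count == 0:
        raise ValueError("no ext_dict entry found")
    return new_xml


# Shortened sample resembling the default file (assumed layout).
sample_cfg = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict"></entry>
    <entry key="ext_stopwords"></entry>
</properties>
"""

print(set_ext_dict(sample_cfg, "custom.dic"))
```

A plain text substitution is used here instead of an XML parser so that the DOCTYPE line and comments in the file are left untouched.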

  

 

7. Restart es and try again:

  

 

This time the tokens ik returns are "你好" and "寒冰之光", which shows that it now recognizes "寒冰之光" as a word.

 

8. Now let's see how ik segments the same text in ik_max_word mode:

  

 

We can see that the text is split into 4 tokens. The two modes use different segmentation strategies; which one to use depends on our own needs.

Origin www.cnblogs.com/coldlight/p/12048780.html