Learning Elasticsearch: Installing the IK Analysis Plugin

Reposted from: https://blog.csdn.net/chengyuqiang/article/details/78991570 (ES version 6.3.0)

Plugin Installation

Offline installation
Download the release package: https://github.com/medcl/elasticsearch-analysis-ik/releases
Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins directory and create an ik directory.
Unzip the downloaded archive into F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik, then restart ES.
Online installation
Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\bin\ directory.
In a command prompt window, run:

elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip

Note: with an online install, the IK plugin's configuration files live under the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\config directory.
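
Either way, you can confirm the plugin loaded after restarting ES. A quick check from the Kibana Dev Tools console (the plugin registers itself as analysis-ik, which should show up in the output):

GET _cat/plugins?v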

Testing the IK Chinese Analyzer
(1) ik_smart

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text":"安徽省长江流域"
}

Response:

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
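
For comparison, the built-in standard analyzer breaks Chinese text into single characters, which is exactly why a dedicated Chinese analyzer like IK is needed. You can verify this with the same text (response omitted; each character comes back as its own token):

GET _analyze?pretty
{
  "analyzer": "standard",
  "text": "安徽省长江流域"
}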

(2) ik_max_word

GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text":"安徽省长江流域"
}

Response:

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "安徽",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "省长",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "长江",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "江流",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "流域",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    }
  ]
}
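
In a real index, a common convention is to analyze with ik_max_word at index time for better recall and with ik_smart at search time for better precision. A minimal sketch, assuming a hypothetical index named news using the 6.x single-type mapping:

PUT news
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}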

(3) Tokenizing a new word

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}

Response:

{
  "tokens": [
    {
      "token": "王者",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "荣耀",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}

Extending the Dictionary

As the previous test shows, 王者荣耀 is not in IK's built-in dictionary, so ik_smart splits it into two words; adding the word to a custom dictionary fixes this.

step1. Go to F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config and create a custom folder.
step2. In F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config\custom, create a file named my_word.dic with the content below (one word per line). Make sure the file is encoded as UTF-8 without BOM; I was stuck on this for half a day.

王者荣耀

step3. Edit the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config\IKAnalyzer.cfg.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- configure your own extension dictionary here -->
    <entry key="ext_dict">custom/my_word.dic</entry>
    <!-- configure your own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

step4. Restart ES and Kibana.

If the ES startup log shows the custom dictionary being loaded, the extension dictionary is in effect.

step5. Test the analyzer.

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}

Response:

{
  "tokens": [
    {
      "token": "王者荣耀",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 0
    }
  ]
} 
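
To see the effect end to end, you can index a document into the hypothetical news index sketched earlier and then search for the new word; with the custom dictionary loaded, 王者荣耀 is treated as a single term on both the index and search side (the sample document text here is made up for illustration):

POST news/doc/1
{
  "content": "王者荣耀是一款手机游戏"
}

GET news/_search
{
  "query": {
    "match": { "content": "王者荣耀" }
  }
}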
