1. IK analysis plugin (GitHub: medcl/elasticsearch-analysis-ik)
1.1 Installing the IK plugin
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.3/elasticsearch-analysis-ik-6.2.3.zip
sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service
Note: in a cluster, the plugin must be installed on every node.
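One way to check the per-node requirement is to compare the plugin list of each node against the full node list. The sketch below assumes the rows come from `GET _cat/plugins?format=json` (objects with `name`, `component` and `version` fields) and that the node names were collected separately, for example from `GET _cat/nodes`. The helper `nodes_missing_plugin`, the node names and the sample rows are hypothetical.

```python
import json


def nodes_missing_plugin(all_nodes, plugin_rows, component):
    """Return the sorted node names on which `component` is not installed."""
    installed_on = {row["name"] for row in plugin_rows if row["component"] == component}
    return sorted(set(all_nodes) - installed_on)


# Hypothetical sample of what GET _cat/plugins?format=json might return.
sample = json.loads("""
[
  {"name": "node-1", "component": "analysis-ik", "version": "6.2.3"},
  {"name": "node-1", "component": "analysis-pinyin", "version": "6.2.3"},
  {"name": "node-2", "component": "analysis-pinyin", "version": "6.2.3"}
]
""")

# node-3 has no plugins at all, so it never shows up in the plugin rows.
missing = nodes_missing_plugin(["node-1", "node-2", "node-3"], sample, "analysis-ik")
print(missing)
```

A node without any plugin does not appear in the plugin listing at all, which is why the full node list is passed in separately.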
1.2 Local dictionary configuration
vi /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml
Edit the "ext_dict" entry. For example, create a directory named custom under /etc/elasticsearch/analysis-ik/ and copy fresh.dic into it.
<entry key="ext_dict">custom/fresh.dic</entry>
Then restart Elasticsearch.
Note: in a cluster, every node needs this configuration.
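The dictionary file itself is plain text: IK's README asks for UTF-8 encoding with one word per line. A minimal sketch for producing such a file follows. The helper `write_dictionary` and the sample words are hypothetical, and the demo writes to a temporary directory rather than to /etc/elasticsearch/analysis-ik/custom/.

```python
import os
import tempfile


def write_dictionary(path, words):
    """Write words one per line as UTF-8 (no BOM), dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return cleaned


# Demo in a temporary directory; on a real node the target would be
# /etc/elasticsearch/analysis-ik/custom/fresh.dic.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "fresh.dic")
written = write_dictionary(demo_path, ["车厘子", " 牛油果 ", "", "车厘子"])
print(written)
```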
1.3 Remote dictionary configuration (hot update)
vi /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml
Edit the "remote_ext_dict" entry:
<entry key="remote_ext_dict">http://ip:port/products/freshdictrequest</entry>
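Configuring a remote extension dictionary makes hot-word updates possible without restarting Elasticsearch.
Implementation: query the word store for hot words added in the last minute; if any exist, change the ETag response header in OpenResty (ETag = os.time()). IK polls the URL and reloads the dictionary when the ETag or Last-Modified header changes.
Note: in a cluster, every node needs this configuration.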
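The server side of this scheme can be sketched in a few lines. The example below uses Python's standard http.server instead of OpenResty, purely as an illustration: the in-memory `STORE`, the helper names and the port are assumptions, and a real service would read the words from its own database. It follows the idea described above, where the ETag is derived from the time of the last change, so the header only moves when new words are added.

```python
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory word store; a real service would query its database.
STORE = {"words": ["车厘子", "牛油果"], "updated_at": 1700000000}


def add_hot_words(store, new_words, now):
    """Append unseen words and bump the timestamp only if something was added."""
    fresh = []
    for word in new_words:
        if word not in store["words"] and word not in fresh:
            fresh.append(word)
    if fresh:
        store["words"].extend(fresh)
        store["updated_at"] = int(now)
    return fresh


def dictionary_response(store):
    """Build the headers and body that the dictionary URL would return."""
    body = ("\n".join(store["words"]) + "\n").encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(body)),
        "ETag": '"%d"' % store["updated_at"],
        "Last-Modified": formatdate(store["updated_at"], usegmt=True),
    }
    return headers, body


class DictionaryHandler(BaseHTTPRequestHandler):
    def _reply(self, with_body):
        headers, body = dictionary_response(STORE)
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if with_body:
            self.wfile.write(body)

    def do_HEAD(self):
        # Header-only reply, enough for a change check.
        self._reply(with_body=False)

    def do_GET(self):
        # Full word list, one word per line.
        self._reply(with_body=True)


def serve(port=8080):
    """Blocking call; run manually, e.g. serve(8080)."""
    HTTPServer(("0.0.0.0", port), DictionaryHandler).serve_forever()
```

Calling `add_hot_words(STORE, ["榴莲"], time.time())` from a scheduled job would change both headers, while a call that adds nothing leaves them untouched.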
This approach has one pitfall:
after the entry is added to IKAnalyzer.cfg.xml and ES is restarted, the following exception is thrown:
java.security.AccessControlException: access denied (java.net.SocketPermission ip:port connect,resolve)
To fix this, edit
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/lib/security/java.policy
and add the remote host as a trusted site inside its grant { } block:
permission java.net.SocketPermission "ip:port","accept";
permission java.net.SocketPermission "ip:port","listen";
permission java.net.SocketPermission "ip:port","resolve";
permission java.net.SocketPermission "ip:port","connect";
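The host and port in these entries have to match the ones used in the remote_ext_dict URL. As a small sketch, the permission entry can be derived from the URL itself. The helper `socket_permission_line` and the address 192.0.2.10:8080 are hypothetical, and the default action string here is limited to the two actions named in the exception, which is an assumption narrower than the four entries listed above.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def socket_permission_line(url, actions="connect,resolve"):
    """Build a java.policy permission entry for the host and port of `url`."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host: %r" % url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return 'permission java.net.SocketPermission "%s:%d", "%s";' % (
        parts.hostname, port, actions)


# 192.0.2.10:8080 is a documentation placeholder, not a real dictionary host.
line = socket_permission_line("http://192.0.2.10:8080/products/freshdictrequest")
print(line)
```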
2. Pinyin analysis plugin (GitHub: medcl/elasticsearch-analysis-pinyin)
/usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.2.3/elasticsearch-analysis-pinyin-6.2.3.zip
sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service
3. Combined IK + pinyin analyzer (for reference only; corrections welcome!)
PUT test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": [
              "my_pinyin",
              "word_delimiter"
            ]
          }
        },
        "filter": {
          "my_pinyin": {
            "type": "pinyin",
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": false,
            "limit_first_letter_length": 10,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
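Once the index exists, the analyzer can be exercised through the `_analyze` API to see which tokens it emits. The sketch below only builds the request. The base URL http://localhost:9200, the sample text and the helper names are assumptions, and `run_analyze` needs a reachable node with both plugins installed.

```python
import json
import urllib.request


def analyze_request(base_url, index, text, analyzer="ik_pinyin_analyzer"):
    """Build (without sending) a POST request for the index's _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s/_analyze" % (base_url.rstrip("/"), index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_analyze(request):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return [token["token"] for token in payload["tokens"]]


# Assumed local node and the index name used in the PUT request above.
req = analyze_request("http://localhost:9200", "test", "生鲜车厘子")
print(req.get_method(), req.full_url)
```

Calling `run_analyze(req)` against a live node returns the list of token strings, which makes it easy to compare the effect of options such as keep_original or keep_full_pinyin.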