IK adopts a unique "forward iterative, finest-grained segmentation" algorithm with a high-speed processing capacity of about 800,000 characters per second. Using a multi-sub-processor analysis mode, it supports segmentation of English letters (IP addresses, email addresses, URLs), numerals (dates, common Chinese quantifiers, Roman numerals, scientific notation), Chinese vocabulary (including personal-name and place-name handling), and more. Its dictionary storage is optimized for a smaller memory footprint.
IK tokenizer Elasticsearch plugin repository: https://github.com/medcl/elasticsearch-analysis-ik
# Installation:
# Download the release zip and place it in the elasticsearch-6.4.2/plugins directory
# (the plugin version must match your Elasticsearch version exactly)
elasticsearch-analysis-ik-6.4.2.zip
# unpack
unzip elasticsearch-analysis-ik-6.4.2.zip
# restart Elasticsearch
./bin/elasticsearch
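Alternatively, on recent Elasticsearch versions the plugin can usually be installed directly from a release URL with the bundled `elasticsearch-plugin` tool; a sketch, assuming Elasticsearch 6.4.2 installed in the current directory (adjust the version in the URL to match your cluster):

```shell
# Install the IK plugin straight from its GitHub release.
# Run from the Elasticsearch home directory; restart Elasticsearch afterwards.
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.4.2/elasticsearch-analysis-ik-6.4.2.zip
```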
Test:
To verify the installation, follow the test steps at https://github.com/medcl/elasticsearch-analysis-ik/tree/v6.4.2
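As a quick sanity check, you can call the `_analyze` API with one of the analyzers the plugin registers (`ik_max_word` for the finest-grained split, `ik_smart` for the coarsest). A sketch, assuming Elasticsearch is running on the default `localhost:9200`:

```shell
# Ask Elasticsearch to tokenize a sample sentence with IK's fine-grained analyzer.
# The response lists each token the analyzer produced.
curl -H 'Content-Type: application/json' -X POST 'http://localhost:9200/_analyze?pretty' -d '
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}'
```

If the plugin is installed correctly, the response contains the segmented tokens; with `ik_smart` the same text yields fewer, longer tokens.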