Solr - Introduction to and Configuration of the Chinese Tokenizer IK Analyzer

Brief introduction

IK Analyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. It started out as a Chinese segmentation component built around the open-source project Lucene, combining dictionary-based segmentation with grammar analysis algorithms. IK also implements a simple algorithm for eliminating segmentation ambiguity, which marks its evolution from purely dictionary-based segmentation toward simulated semantic segmentation.

Effect: it segments Chinese text well, helped by this approximate semantic analysis.

 

Configuration

First, add IKAnalyzer2012FF_u1.jar to the lib directory of the Solr web application. Adjust the paths below to match your own installation.

① Change into the IKAnalyzer directory: cd /usr/local/solr/IKAnalyzer/

② Copy the jar: cp IKAnalyzer2012FF_u1.jar /usr/local/solr/tomcat/webapps/solr/WEB-INF/lib/

Second, create a WEB-INF/classes folder

① Change into the WEB-INF folder under Tomcat: cd /usr/local/solr/tomcat/webapps/solr/WEB-INF/

② Create the classes folder: mkdir classes

Third, place the extended dictionary, the stop word dictionary, and the IK configuration file in the web application's WEB-INF/classes directory

① Change into the IKAnalyzer directory: cd /usr/local/solr/IKAnalyzer/

② Copy the configuration file: cp IKAnalyzer.cfg.xml /usr/local/solr/tomcat/webapps/solr/WEB-INF/classes

③ Copy the stop word dictionary: cp ext_stopword.dic /usr/local/solr/tomcat/webapps/solr/WEB-INF/classes

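④ Rename the copied stop word dictionary (the rename must happen in the classes directory, not in the IKAnalyzer directory): cd /usr/local/solr/tomcat/webapps/solr/WEB-INF/classes && mv ext_stopword.dic stopword.dic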
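Steps one to three can also be collected into one small shell function. This is only a sketch: install_ik is a hypothetical helper name, not part of Solr or IK Analyzer, and it copies the stop word dictionary directly under its new name instead of renaming it afterwards. It assumes WEB-INF/lib already exists, which should be the case for an unpacked Solr web application.

```shell
#!/bin/sh
# Sketch only: install_ik is a hypothetical helper, not shipped with Solr or IK.
#   $1 = directory holding the IK Analyzer files
#   $2 = WEB-INF directory of the Solr web application
install_ik() {
    ik_dir=$1
    webinf=$2

    # Step one: the jar goes into WEB-INF/lib
    cp "$ik_dir/IKAnalyzer2012FF_u1.jar" "$webinf/lib/" || return 1

    # Step two: create WEB-INF/classes if it does not exist yet
    mkdir -p "$webinf/classes" || return 1

    # Step three: configuration file, plus the stop word dictionary
    # copied under the name stopword.dic
    cp "$ik_dir/IKAnalyzer.cfg.xml" "$webinf/classes/" || return 1
    cp "$ik_dir/ext_stopword.dic" "$webinf/classes/stopword.dic" || return 1
}

# Example call with the paths used in this article:
# install_ik /usr/local/solr/IKAnalyzer /usr/local/solr/tomcat/webapps/solr/WEB-INF
```

Passing the two directories as arguments keeps the function usable when Solr is installed somewhere other than /usr/local/solr.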

Fourth, edit the IKAnalyzer.cfg.xml configuration file. stopword.dic already exists at this point, but ext.dic does not.

① Create ext.dic in the classes directory: touch ext.dic

② Set the extended dictionary and stop word dictionary entries: vim IKAnalyzer.cfg.xml
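For reference, the edited file usually ends up looking like the one written by the sketch below. The keys ext_dict and ext_stopwords are the ones IK Analyzer reads; the helper name write_ik_config and the comment text are assumptions made for illustration, so compare the result with the file shipped in your own IK Analyzer download before relying on it.

```shell
#!/bin/sh
# Sketch only: write_ik_config is a hypothetical helper.
#   $1 = target directory (WEB-INF/classes in this article)
write_ik_config() {
    # The quoted 'EOF' keeps the shell from expanding anything inside the XML
    cat > "$1/IKAnalyzer.cfg.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extended dictionary, resolved from the classpath -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- stop word dictionary, resolved from the classpath -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
EOF
}

# Example call with the path used in this article:
# write_ik_config /usr/local/solr/tomcat/webapps/solr/WEB-INF/classes
```

Both dictionary names are relative to the classpath, which is why the files were placed in WEB-INF/classes. Several dictionaries can be listed in one entry, separated by semicolons.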

Fifth, what the stop word dictionary and the extended dictionary do

stopword.dic - stop word dictionary: during segmentation, every word that appears in the stop word dictionary is filtered out.

ext.dic - extended dictionary: proper nouns go here. If a term is not a word in everyday language, add it here so that Solr keeps it as a single word when segmenting.
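Both dictionaries are plain text files with one word per line, and they should be saved as UTF-8 (as far as I know, without a byte order mark). A minimal sketch for maintaining them follows; add_dict_word is a hypothetical helper name, and the example word is an assumption chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: add_dict_word is a hypothetical helper.
#   $1 = dictionary file (ext.dic or stopword.dic)
#   $2 = word to add
# The word is appended on its own line unless that exact line already exists.
add_dict_word() {
    dict=$1
    word=$2
    # -x matches whole lines, -F treats the word as a literal string
    if ! grep -qxF -- "$word" "$dict" 2>/dev/null; then
        printf '%s\n' "$word" >> "$dict"
    fi
}

# Example: keep the term 云计算 ("cloud computing") as a single word
# add_dict_word /usr/local/solr/tomcat/webapps/solr/WEB-INF/classes/ext.dic 云计算
```

Dictionary changes made this way only take effect after the web application is restarted, as in the restart step later in this article.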

Sixth, configure the tokenizer

1. Modify the schema.xml file in the Solr home

① Change into the conf folder: cd /usr/local/solr/solrhome/collection1/conf

② Edit schema.xml and add the following to the file (note: it must be placed inside the <schema> element, that is, before the closing </schema> tag)

<fieldType name="text_ik" class="solr.TextField">
     <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

③ Define a custom field that uses the field type you just created

<field name="content_ik" type="text_ik" indexed="true" stored="true"/>

2. Shut down and restart Tomcat

cd /usr/local/solr/tomcat/bin/

./shutdown.sh

./startup.sh

3. Test
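One way to test from the command line is to send a sample sentence to Solr's field analysis handler and inspect the tokens it returns. The sketch below rests on several assumptions: Tomcat listens on port 8080, the web application is deployed under /solr, the core is collection1, and solrconfig.xml registers the /analysis/field request handler (the stock Solr 4 example configuration does). ik_analyze is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: ik_analyze is a hypothetical helper.
#   $1 = text to analyze
#   $2 = base URL of the core (optional; the default below is an assumption)
ik_analyze() {
    text=$1
    base=${2:-http://localhost:8080/solr/collection1}
    # -G sends the parameters as a query string;
    # --data-urlencode takes care of encoding the Chinese characters
    curl -s -G "$base/analysis/field" \
        --data-urlencode "analysis.fieldtype=text_ik" \
        --data-urlencode "analysis.fieldvalue=$text" \
        --data-urlencode "wt=json"
}

# Example call:
# ik_analyze "我爱中国"
```

If the configuration works, the response lists the tokens produced by org.wltea.analyzer.lucene.IKAnalyzer for the text_ik field type. The Analysis screen of the Solr admin page offers the same check interactively: select the text_ik field type, enter some Chinese text, and run the analysis.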

 

Origin blog.csdn.net/qq_40885085/article/details/104103242