Solr configuration and installation (2): the Chinese tokenizer IKAnalyzer

Solr's built-in tokenizers handle English well, but they are not suited to Chinese word segmentation. Here we use IKAnalyzer as an example to explain how to configure a tokenizer in Solr.

1. Download

        see attached

2. Copy the IKAnalyzer2012FF_u1_custom.jar file to 'application path'/WEB-INF/lib

       Note: There are generally two possible "application paths". With the second one, the solr-webapp directory is re-extracted from example/webapps/solr.war every time the service restarts, so the jar must be copied again after each restart.

       1. Solr deployed under Tomcat: apache-tomcat/webapps/solr/WEB-INF/lib

       2. Solr's bundled Jetty: solr/example/solr-webapp/webapp/WEB-INF/lib

3. Open the core's conf/schema.xml (e.g. example/solr/collection1/conf/schema.xml)

       Add the following just before the closing </schema> tag:

<fieldType name="text_ik" class="solr.TextField">
   <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
   <analyzer type="query" isMaxWordLength="true"  class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

 
4. Using the custom tokenizer

        In each field that should use the Chinese tokenizer, reference the fieldType defined above: the field's type="text_ik" must match the name attribute of the fieldType.

<field name="shortName" type="text_ik" indexed="true" required="false" stored="true"/>

 

Postscript:

1. The role of copyField in schema.xml: when a document is indexed, the content of the source field is copied into the destination field, so that several fields can be searched through a single one
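For illustration, a minimal copyField declaration; the field names `shortName` and `text` here are assumptions for the sketch, not taken from any particular setup:

```xml
<!-- At index time, Solr copies the content of shortName into the
     catch-all field "text" (hypothetical names), so one field can
     be queried instead of many. -->
<copyField source="shortName" dest="text"/>
```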

2. If Solr fails to start with java.lang.UnsupportedClassVersionError: org/wltea/analyzer/lucene/IKAnalyzer

     The jar was compiled for a newer JDK than the one running Solr. Download another version of IKAnalyzer and test again; if the problem persists, upgrade the JDK.
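UnsupportedClassVersionError means the classes in the jar target a newer JVM than the one running Solr. The major version stored in a .class file header tells you which JDK it targets (50 = Java 6, 51 = Java 7, 52 = Java 8). As a side note, here is a small illustrative Python sketch of that check; it is not part of the Solr setup itself:

```python
import struct

# Partial map of class-file major versions to JDK releases.
JDK_BY_MAJOR = {49: "Java 5", 50: "Java 6", 51: "Java 7", 52: "Java 8"}

def class_target_jdk(class_bytes: bytes) -> str:
    """Read the major version from a .class file header (bytes 6-7)."""
    magic, _minor, major = struct.unpack(">IHH", class_bytes[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return JDK_BY_MAJOR.get(major, f"major version {major}")

# Example header: magic, minor=0, major=51, i.e. compiled for Java 7.
header = struct.pack(">IHH", 0xCAFEBABE, 0, 51)
print(class_target_jdk(header))  # -> Java 7
```

To check a real jar, extract org/wltea/analyzer/lucene/IKAnalyzer.class from it and pass the file's bytes to this function.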

After the configuration is complete, restart Solr.
