Adding IKAnalyzer Chinese Word Segmentation to Solr

1. Download the Chinese tokenizer IKAnalyzer

URL: http://code.google.com/p/ik-analyzer/downloads/list

2. Edit schema.xml and add the following configuration:

<fieldType name="textik" class="solr.TextField">
  <!-- Index-time analysis: IK tokenizer followed by the filter chain.
       isMaxWordLength="false" selects fine-grained segmentation. -->
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Query-time analysis: same chain as index time, so query terms
       are segmented and normalized the same way as indexed text. -->
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
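
If the extra English-oriented filters are not needed, a simpler variant may be enough. The sketch below hands the whole analysis to the IKAnalyzer class from the same package; the field type name textik_simple is made up for illustration and is not part of the setup described here.

```xml
<!-- Minimal sketch: textik_simple is a hypothetical name.
     With a class-based analyzer, Solr uses the same analyzer for indexing
     and querying, and no separate tokenizer or filter elements apply. -->
<fieldType name="textik_simple" class="solr.TextField">
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
```

A field would reference it with type="textik_simple" in the same way the title field references textik.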

Then define the fields that should use Chinese word segmentation. In my case that is title:

 <fields>
  <field name="title" type="textik" indexed="true" stored="true" required="true" /> 
 </fields>

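3. Copy IKAnalyzer3.2.8.jar from the downloaded IKAnalyzer directory into TOMCAT/webapps/[solr-webapp]/WEB-INF/lib, where [solr-webapp] is the directory of your Solr web application.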

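4. Copy IKAnalyzer.cfg.xml and ext_stopword.dic from the downloaded IKAnalyzer directory into TOMCAT/webapps/[solr-webapp]/WEB-INF/classes, so that they sit at the root of the web application's classpath. You can also define your own stop-word dictionaries and register them in IKAnalyzer.cfg.xml; separate multiple stop-word dictionaries with semicolons (;).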
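
For reference, an IKAnalyzer.cfg.xml that registers stop-word dictionaries might look roughly like the sketch below. It assumes the Java properties XML layout and the ext_stopwords key of the configuration file shipped with IK Analyzer 3.x. my_stopword.dic is a hypothetical second dictionary, and the semicolon separator is what the 3.x loader splits on as far as I know, so compare it with the file in your download before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- Stop-word dictionaries, resolved from the classpath root.
       my_stopword.dic is a hypothetical file added for illustration. -->
  <entry key="ext_stopwords">/ext_stopword.dic;/my_stopword.dic</entry>
</properties>
```

Each dictionary is expected to be a plain text file with one word per line; saving it as UTF-8 without a BOM is the safe choice.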

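5. Restart Tomcat, then open http://host:port/[solr-webapp]/admin/analysis.jsp and analyze some Chinese text against the textik field type to check the segmentation result.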
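
To see the field type at work on real data, a test document can also be posted to the update handler. The sketch below uses Solr's XML update format; the title value is arbitrary sample text, and it assumes title is the only required field, so add your schema's uniqueKey field (often id) if it defines one.

```xml
<!-- Sketch of an XML update message; the title value is sample text. -->
<add>
  <doc>
    <field name="title">中文分词测试</field>
  </doc>
</add>
```

After posting this message followed by a commit, a query such as title:分词 is a quick way to check whether the text was segmented into words rather than single characters.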



 


Reposted from kobe-hz.iteye.com/blog/1976305