Implementing pinyin / Chinese-character suggest

This feature is implemented mainly following this blog post:
http://lucien-zzy.iteye.com/admin/blogs/2008291
-----------------------------------------------------------------------------------
The basic idea behind the suggest implementation:
When indexing documents, copy the main text fields into a single text field:
   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>
   <copyField source="content_type" dest="text"/>
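Then build a separate dictionary index from the tokens of the text field. The dictionary index built by Solr's built-in SpellCheckComponent has only one field, word. That is enough for English, but not for pinyin or pinyin-initial suggestions in Chinese, so the author added a second field, key, which indexes the pinyin and the pinyin initials of every token (word) taken from the text field.
For example, for the token 国防部 (Ministry of National Defense), the key field contains tokens such as: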
----------------------------------

国防
国防部
g
gf
gfb
...
guofangbu
-----------------------------------
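The key tokens listed above can be sketched as a small Java routine. This is a hypothetical illustration, not the author's PYLookup code: the class name PinyinKeys and the pinyin table are stand-ins (a real implementation would use a pinyin library), and because the original list is elided ("..."), the exact prefix scheme here (character prefixes, initial-letter prefixes, full pinyin cut at syllable boundaries) is an assumption. The original's Chinese tokens may instead come from a word segmenter.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derive "key" tokens for one dictionary word.
class PinyinKeys {
    // Assumption: a hard-coded character -> toneless pinyin table covering only 国防部
    static final Map<Character, String> PINYIN = Map.of('国', "guo", '防', "fang", '部', "bu");

    static Set<String> keys(String word) {
        Set<String> keys = new LinkedHashSet<>();
        StringBuilder chars = new StringBuilder();
        StringBuilder initials = new StringBuilder();
        StringBuilder full = new StringBuilder();
        for (char c : word.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                throw new IllegalArgumentException("no pinyin for " + c);
            }
            chars.append(c);
            initials.append(py.charAt(0));
            full.append(py);
            keys.add(chars.toString());    // character prefixes: 国, 国防, 国防部
            keys.add(initials.toString()); // initial-letter prefixes: g, gf, gfb
            keys.add(full.toString());     // full pinyin at syllable boundaries: guo, guofang, guofangbu
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [国, g, guo, 国防, gf, guofang, 国防部, gfb, guofangbu]
        System.out.println(keys("国防部"));
    }
}
```

Indexing prefixes as ordinary tokens is what lets a plain term match on key behave like prefix search: typing "gf" hits the token gf exactly.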
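    // Lucene classes used: Document, Field, StringField, TextField (org.apache.lucene.document)
    Document doc = new Document();
    // "word": the original token, indexed as a single term and stored so it can be returned
    Field contents = new StringField("word", word, Field.Store.YES);
    doc.add(contents);
    LOG.info("Adding word: " + word);
    // "key": analyzed, not stored; the analyzer used for this field emits the pinyin/initial tokens
    Field pys = new TextField("key", word, Field.Store.NO);
    doc.add(pys);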
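To show what the two fields do without pulling in Lucene, here is a simplified in-memory model of the dictionary index. It is a stand-in, not the Lucene index: the class DictionaryIndex and its methods are hypothetical, and the key tokens in main are written out by hand as example data rather than produced by an analyzer.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model: "word" is the value returned to the user,
// "key" tokens are only searchable (like Field.Store.NO).
class DictionaryIndex {
    // Inverted index over the "key" field: key token -> words whose entry contains it
    private final Map<String, Set<String>> keyToWords = new HashMap<>();

    // Rough equivalent of adding one Document with a "word" field and an analyzed "key" field
    void addWord(String word, List<String> keyTokens) {
        for (String token : keyTokens) {
            keyToWords.computeIfAbsent(token, t -> new LinkedHashSet<>()).add(word);
        }
    }

    // Words whose key field contains exactly this token
    Set<String> lookup(String keyToken) {
        return keyToWords.getOrDefault(keyToken, Set.of());
    }

    public static void main(String[] args) {
        DictionaryIndex index = new DictionaryIndex();
        // Hypothetical example data
        index.addWord("国防部", List.of("国防", "国防部", "g", "gf", "gfb", "guofangbu"));
        index.addWord("国防", List.of("国防", "g", "gf", "guofang"));
        System.out.println(index.lookup("gf")); // [国防部, 国防]
    }
}
```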

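At suggest time, the Suggester queries the key field of this dictionary index and returns the word field for display:
    String queryString = "key:" + key;
    LOG.info("Query: " + queryString);
    Query query = parser.parse(queryString);
    // return at most num matching dictionary entries
    TopDocs results = searcher.search(query, num);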
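The lookup step can be sketched in plain Java as: normalize the user's input, find the words whose key contains it, and return at most num of them. Ranking by frequency is an assumption here (it mirrors comparatorClass=freq in the configuration rather than Lucene's relevance score), and the class SuggestQuery, the key index, and the frequency numbers are all hypothetical example data.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class SuggestQuery {
    // Hypothetical dictionary: key token -> words a "key:" query would hit
    static final Map<String, Set<String>> KEY_INDEX = Map.of(
            "gf", Set.of("国防部", "国防", "规范"),
            "gfb", Set.of("国防部"));
    // Hypothetical frequency of each word in the text field
    static final Map<String, Integer> FREQ = Map.of("国防部", 12, "国防", 30, "规范", 7);

    // Most frequent first; ties broken alphabetically so the order is deterministic
    static final Comparator<String> MOST_FREQUENT_FIRST =
            Comparator.comparing((String w) -> FREQ.getOrDefault(w, 0)).reversed()
                    .thenComparing(Comparator.<String>naturalOrder());

    // Plays the role of searcher.search(query, num): at most num suggestions
    static List<String> suggest(String key, int num) {
        String normalized = key.trim().toLowerCase(Locale.ROOT);
        return KEY_INDEX.getOrDefault(normalized, Set.of()).stream()
                .sorted(MOST_FREQUENT_FIRST)
                .limit(num)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(suggest("GF ", 2)); // [国防, 国防部]
    }
}
```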


Configuration:-------------------------------------------------------------------
<searchComponent  name="suggest" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">string</str>
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">text</str>
        <str name="fieldType">string</str><!-- specifies how the query keyword is tokenized before the token lookup -->
        <float name="threshold">0.0001</float>
        <str name="spellcheckIndexDir">spellchecker</str>
        <str name="comparatorClass">freq</str>
        <!--<str name="buildOnOptimize">true</str>-->
        <str name="buildOnCommit">true</str>
    </lst>
    <!-- used for pinyin input suggestions -->
    <lst name="spellchecker">
        <str name="name">pysuggest</str>
        <str name="classname">shentong.tsearch.spelling.suggest.Suggester</str>
        <str name="lookupImpl">shentong.tsearch.spelling.suggest.py.PYLookup</str>
        <str name="field">text</str>
        <str name="fieldType">string</str><!-- specifies how the query keyword is tokenized before the token lookup -->
        <float name="threshold">0.0001</float>
        <str name="pySuggestIndexDir">suggestIndex</str>
        <str name="comparatorClass">freq</str>
        <!--<str name="buildOnOptimize">true</str>-->
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
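The threshold and comparatorClass settings above decide which terms make it into the dictionary and in what order. A minimal sketch of that idea, assuming a term is kept when its document frequency reaches threshold × numDocs (Solr's exact rounding may differ) and that freq ordering means highest frequency first; the class ThresholdFilter and the sample counts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ThresholdFilter {
    // Keep terms whose document frequency reaches threshold * numDocs,
    // then order them by frequency, highest first, ties alphabetically
    static List<String> dictionary(Map<String, Integer> docFreq, int numDocs, double threshold) {
        double minDocs = threshold * numDocs;
        return docFreq.entrySet().stream()
                .filter(e -> e.getValue() >= minDocs)
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new LinkedHashMap<>(); // hypothetical document frequencies
        df.put("国防部", 12);
        df.put("国防", 30);
        df.put("错别字", 1);
        // 20000 docs * 0.0001 is about 2 documents minimum, so 错别字 is dropped
        System.out.println(dictionary(df, 20000, 0.0001)); // [国防, 国防部]
    }
}
```

With a threshold as low as 0.0001, only terms that are extremely rare across the collection are filtered out, which is why it mainly serves to drop typos and noise.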
<requestHandler  name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">pysuggest</str>
        <!-- When there are more candidates than spellcheck.count, return the more popular (higher-frequency) ones -->
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>
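A client calls the /suggest handler above with the user's partial input as q. The sketch below only builds the request URL; the base URL and core name (localhost:8983, collection1) and the class SuggestUrl are hypothetical placeholders, and wt=json is just one possible response format.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SuggestUrl {
    // Builds a request to the /suggest handler; baseUrl is a hypothetical placeholder.
    static URI suggestUri(String baseUrl, String input) {
        // Percent-encode the input so Chinese characters survive in the query string
        String q = URLEncoder.encode(input, StandardCharsets.UTF_8);
        // pysuggest is already the handler default; passing "suggest" instead
        // would query the plain TSTLookup dictionary
        return URI.create(baseUrl + "/suggest?q=" + q + "&spellcheck.dictionary=pysuggest&wt=json");
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/collection1/suggest?q=gf&spellcheck.dictionary=pysuggest&wt=json
        System.out.println(suggestUri("http://localhost:8983/solr/collection1", "gf"));
    }
}
```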
----------------------------------------------------------------------------------
[Figure: screenshots of the suggest results]
Reposted from myq526180048.iteye.com/blog/2149862