Solr: Clustering documents with Carrot2

1. Configure clustering in solrconfig.xml

<searchComponent name="clustering"
                   enable="true"
                   class="solr.clustering.ClusteringComponent" >
    <lst name="engine">
      <str name="name">lingo</str>

      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
      <str name="carrot.resourcesDir">clustering/carrot2</str>
    </lst>

    <lst name="engine">
      <str name="name">stc</str>
      <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
    </lst>

    <lst name="engine">
      <str name="name">kmeans</str>
      <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
    </lst>

  </searchComponent>

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="true"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">lingo</str>
      <bool name="clustering.results">true</bool>
      <!-- Field name with the logical "title" of each document (optional) -->
      <str name="carrot.title">content</str>
      <!-- Field name with the logical "URL" of each document (optional) -->
      <str name="carrot.url">id</str>
      <!-- Field name with the logical "content" of each document (optional) -->
      <str name="carrot.snippet">content</str>
      <!-- Apply the highlighter to the title/content and use the result for clustering. -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- the maximum number of labels per cluster -->
      <!--<int name="carrot.numDescriptions">5</int>-->
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">false</bool>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>

2. Edit clustering/carrot2/lingo-attributes.xml to set the default language

          <attribute key="MultilingualClustering.defaultLanguage">
            <value type="org.carrot2.core.LanguageCode" value="CHINESE_SIMPLIFIED"/>
          </attribute>
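
For context, the attribute above does not stand alone in the file: it sits inside a Carrot2 attribute-set wrapper. The following is only a sketch of what the complete lingo-attributes.xml might look like, assuming the attribute-set layout used by Carrot2 3.x. The id and label values ("attributes") are assumptions and may differ in your copy of the file.

```xml
<attribute-sets default="attributes">
  <attribute-set id="attributes">
    <value-set>
      <!-- Label of this value set (assumed name) -->
      <label>attributes</label>
      <!-- Language Lingo assumes for the clustered documents -->
      <attribute key="MultilingualClustering.defaultLanguage">
        <value type="org.carrot2.core.LanguageCode" value="CHINESE_SIMPLIFIED"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
```

The file is picked up because carrot.resourcesDir for the lingo engine in step 1 points at clustering/carrot2.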

3. Add the Chinese tokenizer jar to the classpath in solrconfig.xml

The jar needed is lucene-analyzers-smartcn-4.7.0.jar. Load it with a lib directive such as:

<lib dir="../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
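
The post only covers the clustering side, but the same jar can also analyze the indexed field. As a hedged sketch, a schema.xml entry for the content field used in step 1 might look like the following. The field type name text_smartcn is hypothetical, and the two factory classes are the ones provided by the smartcn module in the 4.7 line (later releases replace them). The field is marked stored because the clustering component reads stored field values.

```xml
<!-- Hypothetical field type backed by the smartcn analyzers (Solr 4.7) -->
<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split text into sentences, then segment each sentence into words -->
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Field referenced by carrot.title and carrot.snippet; must be stored -->
<field name="content" type="text_smartcn" indexed="true" stored="true"/>
```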

References

http://wiki.apache.org/solr/ClusteringComponent

http://www.cnblogs.com/tomcattd/archive/2013/08/20/3270143.html

http://carrot2.github.io/solr-integration-strategies/carrot2-3.6.3/index.html

Reposted from ylzhj02.iteye.com/blog/2149394