1. Configure clutering in solrconfig.xml
<searchComponent name="clustering" enable="true" class="solr.clustering.ClusteringComponent" > <lst name="engine"> <str name="name">lingo</str> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str> <str name="carrot.resourcesDir">clustering/carrot2</str> </lst> <lst name="engine"> <str name="name">stc</str> <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str> </lst> <lst name="engine"> <str name="name">kmeans</str> <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str> </lst> </searchComponent> <requestHandler name="/clustering" startup="lazy" enable="true" class="solr.SearchHandler"> <lst name="defaults"> <bool name="clustering">true</bool> <str name="clustering.engine">lingo</str> <bool name="clustering.results">true</bool> <!-- Field name with the logical "title" of a each document (optional) --> <str name="carrot.title">content</str> <!-- Field name with the logical "URL" of a each document (optional) --> <str name="carrot.url">id</str> <!-- Field name with the logical "content" of a each document (optional) --> <str name="carrot.snippet">content</str> <!-- Apply highlighter to the title/ content and use this for clustering. --> <bool name="carrot.produceSummary">true</bool> <!-- the maximum number of labels per cluster --> <!--<int name="carrot.numDescriptions">5</int>--> <!-- produce sub clusters --> <bool name="carrot.outputSubClusters">false</bool> <!-- Configure the remaining request handler parameters. --> <str name="defType">edismax</str> <str name="q.alt">*:*</str> <str name="rows">10</str> <str name="fl">*,score</str> </lst> <arr name="last-components"> <str>clustering</str> </arr> </requestHandler>
2. alter clustering/carrot2/lingo-attributes.xml
<attribute key="MultilingualClustering.defaultLanguage">
<value type="org.carrot2.core.LanguageCode" value="CHINESE_SIMPLIFIED"/>
</attribute>
3. add chinese tokenizer jar to classpath in solrconfig.xml
lucene-analyzers-smartcn-4.7.0.jar
<lib dir="../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
References
http://wiki.apache.org/solr/ClusteringComponent
http://www.cnblogs.com/tomcattd/archive/2013/08/20/3270143.html
http://carrot2.github.io/solr-integration-strategies/carrot2-3.6.3/index.html