Tracking down why a custom Solr FieldType analyzer did not take effect

 

  I recently worked on a project that needed pinyin search on a field. To support it, I configured a fieldType in the schema as follows:

  <fieldType name="cn_pinyin" class="solr.TextField" positionIncrementGap="100"
             autoGeneratePhraseQueries="true" omitNorms="true" omitPositions="false">
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*" />
      <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="7" />
      <filter class="solr.StandardFilterFactory" />
      <filter class="solr.TrimFilterFactory" />
      <filter class="com.dfire.tis.solrextend.fieldtype.pinyin.PinyinTokenFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>

 The index-time analyzer chains a tokenizer and a series of filters to process the input value. First, the tokenizer splits the value on commas into title units; next, each title unit is broken into n-grams of 1 to 7 characters; then leading and trailing whitespace is trimmed from each token; finally, the custom PinyinTokenFilterFactory processes the Chinese tokens, so that both the original Chinese terms and their pinyin forms end up in the term list.
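To see what the index chain is meant to produce, here is a minimal plain-Java sketch that imitates its first steps (comma tokenizer, 1-to-7 n-grams, trim) without Lucene. It is only an approximation under stated assumptions: the custom pinyin filter is left out, StandardFilterFactory is treated as a pass-through, and the class and method names are hypothetical. The gram order is chosen for readability and may not match Lucene's exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a stdlib-only approximation of the cn_pinyin index chain.
final class CnPinyinIndexChainSketch {

    // Approximates PatternTokenizerFactory with pattern=",\s*": split on a comma
    // plus any whitespace after it, skipping empty pieces.
    static List<String> tokenize(String value) {
        List<String> titles = new ArrayList<>();
        for (String part : value.split(",\\s*")) {
            if (!part.isEmpty()) {
                titles.add(part);
            }
        }
        return titles;
    }

    // Approximates NGramFilterFactory: every substring whose length (in code
    // points) is between minGram and maxGram, grouped by start position.
    static List<String> nGrams(String token, int minGram, int maxGram) {
        int[] cps = token.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < cps.length; start++) {
            for (int len = minGram; len <= maxGram && start + len <= cps.length; len++) {
                grams.add(new String(cps, start, len));
            }
        }
        return grams;
    }

    // Index chain without the custom pinyin step:
    // tokenize -> n-grams (1..7) -> StandardFilter (assumed pass-through) -> trim.
    static List<String> analyze(String value) {
        List<String> terms = new ArrayList<>();
        for (String title : tokenize(value)) {
            for (String gram : nGrams(title, 1, 7)) {
                // TrimFilterFactory only strips surrounding whitespace; it does not drop tokens.
                terms.add(gram.trim());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("ab, cd")); // [a, ab, b, c, cd, d]
    }
}
```

For a title such as 刘德华 the sketch yields 刘, 刘德, 刘德华, 德, 德华, 华; in the real chain the pinyin filter would then run on these terms.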

 

After publishing this configuration to production, I did not get the expected results. Testing showed that not even the standard PatternTokenizerFactory, which should split the value on commas, was taking effect. I tried many approaches without success. After a long investigation, I finally traced the problem to a piece of code in our full-index build center:

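import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.RAMDirectory;

public static IndexWriter createRAMIndexWriter(IndexConf indexConf)
        throws IOException {
    RAMDirectory ramDirectory = new RAMDirectory();

    // BUG: this single analyzer is applied to every field added through the writer,
    // so the analyzers declared per field type in the schema are never used.
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(
            new StandardAnalyzer());
    // Effectively flush only on explicit commit: no doc-count or RAM-size trigger.
    indexWriterConfig.setMaxBufferedDocs(Integer.MAX_VALUE);
    indexWriterConfig.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
    indexWriterConfig.setOpenMode(OpenMode.CREATE);

    IndexWriter addWriter = new IndexWriter(ramDirectory, indexWriterConfig);
    addWriter.commit();
    return addWriter;
}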

 The build center has a method that creates the IndexWriter instance, and the problem lies in how it builds the IndexWriterConfig: the config is constructed with a freshly created StandardAnalyzer. That is the root cause. The analyzer passed to IndexWriterConfig is the one IndexWriter uses to analyze every field of every document it indexes, so all fields, including those of type cn_pinyin, were processed by StandardAnalyzer. This explains why the analyzers configured for cn_pinyin in the schema never took effect.
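The difference between one global analyzer and a field-aware analyzer can be sketched in plain Java. In this minimal model an analyzer is just a function from text to terms; GENERIC is a hypothetical stand-in (it is not StandardAnalyzer), COMMA_SPLIT stands for the first step of the cn_pinyin chain, and the field name title_py is made up. analyzeGlobal mimics passing a single analyzer to IndexWriterConfig, while analyzePerField mimics the kind of per-field delegation that Lucene offers as PerFieldAnalyzerWrapper and that Solr's IndexSchema.getIndexAnalyzer() performs based on each field's type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class PerFieldAnalysisSketch {

    // Hypothetical generic analyzer: split on any run of non-letter/non-digit
    // characters and lowercase. A stand-in only, NOT Lucene's StandardAnalyzer.
    static final Function<String, List<String>> GENERIC = text ->
            Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                    .filter(t -> !t.isEmpty())
                    .map(t -> t.toLowerCase(Locale.ROOT))
                    .collect(Collectors.toList());

    // Stand-in for the first step of the cn_pinyin index chain: split on ",\s*".
    static final Function<String, List<String>> COMMA_SPLIT = text ->
            Arrays.stream(text.split(",\\s*"))
                    .filter(t -> !t.isEmpty())
                    .collect(Collectors.toList());

    // Hypothetical mini "schema": field name -> analyzer of that field's type.
    static final Map<String, Function<String, List<String>>> SCHEMA =
            Map.of("title_py", COMMA_SPLIT);

    // Like new IndexWriterConfig(new StandardAnalyzer()): one analyzer for all fields.
    static List<String> analyzeGlobal(String field, String text) {
        return GENERIC.apply(text); // the field name is ignored
    }

    // Like a schema-aware analyzer: use the field's own analyzer, else a default.
    static List<String> analyzePerField(String field, String text) {
        return SCHEMA.getOrDefault(field, GENERIC).apply(text);
    }

    public static void main(String[] args) {
        String value = "Andy Lau, Jacky Cheung";
        System.out.println(analyzeGlobal("title_py", value));   // [andy, lau, jacky, cheung]
        System.out.println(analyzePerField("title_py", value)); // [Andy Lau, Jacky Cheung]
    }
}
```

With the global analyzer the comma-separated titles are broken into individual lowercase words and the comma split never happens, which matches the symptom that the PatternTokenizerFactory appeared to do nothing.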

The fix is to pass the schema's index analyzer instead:

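// schema is the collection's org.apache.solr.schema.IndexSchema (assumed in scope);
// its index analyzer delegates to the analyzer of each field's FieldType.
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(
        schema.getIndexAnalyzer());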

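With this change, each field is analyzed by the analyzer its field type declares in the schema, including the cn_pinyin chain.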

 This bug bears out the old saying "pull one hair and the whole body moves": a single line of code can affect the entire system. We should treat our code with respect.

