Some notes on Lucene tokenization (word segmentation)

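Lucene 3.6 and 7.2 differ considerably in how objects are declared and constructed. For example, the QueryParser constructor in 3.6 takes a leading Version argument, which the 7.2 constructor no longer has.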

 

Comparing these two versions gives a rough picture of how the Lucene API changed between the older and the newer releases.

 

// Lucene 7.2: f is the default field name (String), a is the Analyzer
QueryParser qp = new QueryParser(f, a);
Query query = qp.parse(queryStr);
 
QueryParser already covers the other query types. Writing the corresponding syntax in the query string (for example, a quoted phrase or a trailing * wildcard) has the same effect as constructing that query class directly.

 

 

The tokenizers used at index time and at search time both affect the search results

 

Chinese word segmentation

If the index is built with a unigram tokenizer, the text is split at the finest granularity, one token per character, so a search can match on any single character.

For example, take the text:

百度 (Baidu)

A unigram tokenizer segments it into:

百 / 度

Each character is now its own term in the index, so a search for a single character finds the document.

 

If a dedicated Chinese tokenizer is used instead, the content is indexed as whole words. A single-character search then returns no results: the segmentation done at index time was not at the finest granularity, so only the whole words exist as terms and only they can be searched.

For example:

The intelligent Chinese tokenizer segments 百度 (Baidu) into a single token:

百度

In this case a search for a single character, such as 百, finds nothing.
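
The effect of token granularity can be sketched without Lucene at all. The following is a minimal illustration in plain Java, not Lucene's implementation. The class TokenGranularityDemo, its methods, and the one-word dictionary are hypothetical names made up for this example, and the word tokenizer is a simple greedy longest-match stand-in for a real Chinese tokenizer. The escaped string "\u767e\u5ea6" is 百度, written with escapes so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class TokenGranularityDemo {

    // Unigram tokenization: every character (code point) becomes its own token.
    static List<String> unigramTokens(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Stand-in for a word-level tokenizer (assumption, not Lucene code):
    // greedy longest match against a word list, falling back to one char.
    static List<String> wordTokens(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fallback: a single character
            for (int j = text.length(); j > i + 1; j--) {
                if (dictionary.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    // Toy inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs,
                                                Function<String, List<String>> tokenizer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : tokenizer.apply(docs.get(id))) {
                index.computeIfAbsent(token, k -> new HashSet<>()).add(id);
            }
        }
        return index;
    }

    // Exact term lookup; the term is not tokenized again in this sketch.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        String baidu = "\u767e\u5ea6";   // the two-character word for Baidu
        String firstChar = "\u767e";     // its first character
        List<String> docs = Arrays.asList(baidu);
        Set<String> dictionary = new HashSet<>(Arrays.asList(baidu));

        Map<String, Set<Integer>> unigramIndex =
                buildIndex(docs, TokenGranularityDemo::unigramTokens);
        Map<String, Set<Integer>> wordIndex =
                buildIndex(docs, t -> wordTokens(t, dictionary));

        System.out.println(search(unigramIndex, firstChar)); // prints [0]
        System.out.println(search(wordIndex, firstChar));    // prints []
        System.out.println(search(wordIndex, baidu));        // prints [0]
    }
}
```

With the unigram index the single character is a term of its own and matches document 0. With the word-level index only the whole word is a term, so the single character matches nothing. In real Lucene the query string is also passed through an analyzer, which is one reason the same analyzer is normally used for both indexing and searching.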
