Lucene inverted index: finding records by the value of an attribute (for text, finding the articles that contain a given keyword)
The steps are explained below with two example articles:
Article 1: Tom lives in Guangzhou, I live in Guangzhou too.
Article 2: He once lived in Shanghai.
1. Extract keywords
1.1 Tokenize: split English text on spaces.
1.2 Filter out words with no real meaning, and punctuation.
Words such as "in", "once", and "too" carry no specific meaning, so they are filtered out along with the punctuation.
1.3 Normalize case and tense (past tense "-ed", present tense, future tense):
"lived" and "lives" are both converted to "live".
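Final result:
Article 1 keywords: [tom] [live] [guangzhou] [i] [live] [guangzhou]
Article 2 keywords: [he] [live] [shanghai]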
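Steps 1.1 to 1.3 can be sketched as a tiny analyzer. This is a minimal illustration, not Lucene's analyzer chain: the stop-word set and the tense table are hypothetical stand-ins that cover only the words in the two example articles (a real system would use a full stop-word list and a stemmer).

```python
import re

# Hypothetical stop-word list: only the words the example filters out.
STOP_WORDS = {"in", "once", "too"}

# Hypothetical tense table standing in for real stemming.
NORMALIZE = {"lives": "live", "lived": "live"}


def analyze(text):
    """Return the keywords of `text` after steps 1.1-1.3."""
    # 1.1 split on whitespace; 1.2 strip punctuation characters from each token
    tokens = [re.sub(r"\W", "", t) for t in text.split()]
    keywords = []
    for token in tokens:
        token = token.lower()  # 1.3 unify case
        if not token or token in STOP_WORDS:
            continue  # 1.2 drop empty (punctuation-only) tokens and stop words
        keywords.append(NORMALIZE.get(token, token))  # 1.3 unify tense
    return keywords


print(analyze("Tom lives in Guangzhou, I live in Guangzhou too."))
print(analyze("He once lived in Shanghai."))
```

The two calls print ['tom', 'live', 'guangzhou', 'i', 'live', 'guangzhou'] and ['he', 'live', 'shanghai'], matching the keyword lists above.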
2. Build the inverted index
2.1 Keywords, sorted alphabetically so that binary search can locate a keyword quickly
2.2 Frequency of occurrence
2.3 Positions of occurrence. A position can be counted in characters or in keywords; Lucene uses keyword positions.
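As a rough sketch of what step 2 produces, the function below turns the analyzed keyword lists into a map from keyword to (article number, frequency, positions). It is an in-memory illustration under the assumption that positions are 1-based keyword positions, as in 2.3; Lucene's real on-disk structures differ.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to a list of (doc_id, frequency, positions).

    `docs` maps an article number to its already-analyzed keyword list.
    Positions are 1-based keyword positions (2.3), not character offsets.
    """
    postings = defaultdict(dict)  # keyword -> {doc_id: [positions]}
    for doc_id, keywords in docs.items():
        for pos, kw in enumerate(keywords, start=1):
            postings[kw].setdefault(doc_id, []).append(pos)
    # 2.1: emit keywords in alphabetical order so they can be binary-searched
    return {
        kw: [(doc_id, len(positions), positions)  # 2.2 frequency, 2.3 positions
             for doc_id, positions in sorted(per_doc.items())]
        for kw, per_doc in sorted(postings.items())
    }


docs = {
    1: ["tom", "live", "guangzhou", "i", "live", "guangzhou"],
    2: ["he", "live", "shanghai"],
}
for kw, entries in build_inverted_index(docs).items():
    print(kw, entries)
```

For the two articles, "live" maps to [(1, 2, [2, 5]), (2, 1, [2])]: twice in article 1 at positions 2 and 5, once in article 2 at position 2.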
3. Lucene stores the above in three files:
3.1 Dictionary file: holds the keywords, plus pointers into the frequency file and the position file
3.2 Frequency file
3.3 Position file
These correspond to the keyword, frequency, and position columns of the index.
3.4 Lucene also uses the concept of a field to express where information occurs (for example title, body, or URL). Field information is recorded in the dictionary file: each keyword carries field information, because every keyword belongs to one or more fields.
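The split into a dictionary file with pointers into a frequency file and a position file can be modelled with plain lists. This is a toy sketch only: real Lucene files are binary and versioned, and the tuple layout, the helper names `write_index` and `read_postings`, and the "title" entry in the sample data are all hypothetical.

```python
def write_index(index):
    """Split an index into a dictionary 'file' plus frequency and position 'files'.

    `index` maps (field, keyword) -> list of (doc_id, positions).
    """
    dictionary, freq_file, pos_file = [], [], []
    for (field, kw), entries in sorted(index.items()):
        # 3.1 / 3.4: one entry per (field, keyword), with pointers into the other files
        dictionary.append((field, kw, len(entries), len(freq_file), len(pos_file)))
        for doc_id, positions in entries:
            freq_file.append((doc_id, len(positions)))  # 3.2 frequency file
            pos_file.extend(positions)                  # 3.3 position file
    return dictionary, freq_file, pos_file


def read_postings(dictionary, freq_file, pos_file, i):
    """Follow the pointers of dictionary entry i back to (doc_id, positions) pairs."""
    _field, _kw, doc_count, freq_ptr, pos_ptr = dictionary[i]
    result = []
    for doc_id, freq in freq_file[freq_ptr:freq_ptr + doc_count]:
        result.append((doc_id, pos_file[pos_ptr:pos_ptr + freq]))
        pos_ptr += freq
    return result


index = {
    ("body", "live"): [(1, [2, 5]), (2, [2])],
    ("body", "tom"): [(1, [1])],
    ("title", "live"): [(2, [1])],  # hypothetical: "live" also appears in a title field
}
dictionary, freq_file, pos_file = write_index(index)
print(read_postings(dictionary, freq_file, pos_file, 0))  # body:live postings
```

Keying the dictionary by (field, keyword) is one simple way to let the same word appear separately in several fields, as 3.4 describes.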
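4. Compression
4.1 Compressing keywords
For example, if the first word is "Arabia" and the second word is "Arabic",
the two share the 5-character prefix "Arabi",
so the second word can be stored as <5, c>: the length of the shared prefix plus the remaining suffix.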
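The prefix compression in 4.1 can be sketched as follows: each term in the sorted dictionary is stored as (length of prefix shared with the previous term, remaining suffix). This is an illustration of the idea; Lucene's actual term-dictionary encoding is more involved.

```python
def prefix_compress(sorted_terms):
    """Encode each term as (shared-prefix length, suffix) relative to the previous term."""
    out, prev = [], ""
    for term in sorted_terms:
        n = 0
        while n < min(len(prev), len(term)) and prev[n] == term[n]:
            n += 1
        out.append((n, term[n:]))
        prev = term
    return out


def prefix_decompress(pairs):
    """Rebuild the terms by reusing n characters of the previous term."""
    terms, prev = [], ""
    for n, suffix in pairs:
        prev = prev[:n] + suffix
        terms.append(prev)
    return terms


print(prefix_compress(["Arabia", "Arabic"]))
```

The first term has nothing to share, so it is stored as (0, 'Arabia'); the second becomes (5, 'c'). Because the dictionary is sorted, neighbouring terms tend to share long prefixes, which is what makes this pay off.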
4.2 Compressing numbers
Store only the difference from the previous value instead of the value itself.
For example, if the previous article number is 16382
and the current article number is 16389,
only the difference 7 is stored, which takes a single byte.
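Delta storage only saves space when small numbers are also written in fewer bytes. The sketch below pairs it with a 7-bits-per-byte variable-length integer (the high bit of each byte means "more bytes follow"), which is the same idea as Lucene's VInt; the function names are illustrative.

```python
def encode_vint(n):
    """Encode a non-negative int using 7 bits per byte, low-order group first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # set the high bit: another byte follows
        n >>= 7
    out.append(n)  # last byte has the high bit clear
    return bytes(out)


def encode_doc_ids(doc_ids):
    """Delta-encode ascending article numbers, then VInt-encode each gap."""
    out, prev = bytearray(), 0
    for doc_id in doc_ids:
        out += encode_vint(doc_id - prev)
        prev = doc_id
    return bytes(out)


print(len(encode_vint(16389)))           # full number
print(len(encode_vint(16389 - 16382)))   # stored gap
```

With this encoding, 16389 on its own needs 3 bytes (it exceeds 16383, the largest 2-byte value), while the gap 7 needs 1 byte.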
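5. Application scenario
To search, Lucene binary-searches the dictionary for the keyword, then follows its pointers to read the frequency and position data of the articles that contain it. Because the dictionary is small, the whole lookup takes milliseconds. An ordinary sequential-matching algorithm, by contrast, has no index and must compare the keyword against the entire content of every article, which is much slower.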
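The two search strategies in step 5 can be contrasted in a few lines. This is a sketch using an in-memory sorted term list built from the example articles; `lookup` and `scan` are hypothetical names, and `bisect` stands in for the dictionary's binary search.

```python
import bisect

# Toy sorted dictionary; POSTINGS[i] holds (doc_id, frequency, positions) for TERMS[i].
TERMS = ["guangzhou", "he", "i", "live", "shanghai", "tom"]
POSTINGS = [
    [(1, 2, [3, 6])],
    [(2, 1, [1])],
    [(1, 1, [4])],
    [(1, 2, [2, 5]), (2, 1, [2])],
    [(2, 1, [3])],
    [(1, 1, [1])],
]


def lookup(term):
    """Indexed search: binary-search the dictionary, then follow the 'pointer'."""
    i = bisect.bisect_left(TERMS, term)
    if i < len(TERMS) and TERMS[i] == term:
        return POSTINGS[i]
    return []


def scan(docs, term):
    """Unindexed search: compare the term against every word of every article."""
    return [doc_id for doc_id, words in docs.items() if term in words]


print(lookup("live"))
```

`lookup` touches about log2(number of terms) dictionary entries no matter how long the articles are, while `scan` grows with the total amount of text.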