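Before going further, note that the core of Elasticsearch's search engine is the inverted index. Each field maintains its own inverted index unless indexing is explicitly disabled for that field (for example with `"index": false` in the mapping). An inverted index is made up of the following parts: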
- Term Dictionary: records every term that appears in the documents. It is usually large, and it maps each term to its posting list. It is generally implemented with a B+ tree or a similar sorted structure so that terms can be looked up quickly;
- Posting List: the set of documents that contain a given term from the dictionary. It is made up of posting entries (Postings), each of which includes:
  - Document ID: used to retrieve the original document
  - Term Frequency (TF): the number of times the term occurs in the document, one of the inputs to relevance scoring
  - Position: the position(s) of the term among the tokens of the original document, used for phrase queries
  - Offset: the start and end character offsets of the term in the original document, used for example to highlight matches in search results
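To make these parts concrete, here is a minimal, hypothetical sketch in Python. It is not how Lucene or Elasticsearch actually store the index. It assumes a naive analyzer (lowercased `\w+` runs), keeps the term dictionary as a sorted list searched with `bisect` as a stand-in for a B+ tree, and stores the document ID, positions and offsets in each posting; term frequency is derived from the number of positions. The names `InvertedIndex`, `Posting`, `prefix`, `phrase` and `highlight` are illustrative only.

```python
import re
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Posting:
    doc_id: int                                                   # document ID
    positions: list[int] = field(default_factory=list)            # token positions
    offsets: list[tuple[int, int]] = field(default_factory=list)  # (start, end) chars

    @property
    def tf(self) -> int:
        # Term frequency: how many times the term occurs in this document.
        return len(self.positions)


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, list[Posting]] = {}  # term -> posting list
        self.docs: dict[int, str] = {}                # doc ID -> original text

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        per_doc: dict[str, Posting] = {}
        # Hypothetical analyzer: every run of word characters, lowercased.
        for pos, m in enumerate(re.finditer(r"\w+", text)):
            p = per_doc.setdefault(m.group().lower(), Posting(doc_id))
            p.positions.append(pos)
            p.offsets.append((m.start(), m.end()))
        for term, p in per_doc.items():
            self.postings.setdefault(term, []).append(p)

    def prefix(self, prefix: str) -> list[str]:
        # Sorted term dictionary (B+ tree stand-in): binary search, then scan.
        terms = sorted(self.postings)
        i = bisect_left(terms, prefix)
        out = []
        while i < len(terms) and terms[i].startswith(prefix):
            out.append(terms[i])
            i += 1
        return out

    def phrase(self, *words: str) -> list[int]:
        # Doc IDs where the words occur at consecutive positions.
        lists = [{p.doc_id: p for p in self.postings.get(w.lower(), [])} for w in words]
        hits = []
        for doc_id in sorted(set(lists[0]).intersection(*lists[1:])):
            starts = set(lists[0][doc_id].positions)
            for i, plist in enumerate(lists[1:], start=1):
                starts &= {pos - i for pos in plist[doc_id].positions}
            if starts:
                hits.append(doc_id)
        return hits

    def highlight(self, doc_id: int, term: str) -> str:
        # Wrap each occurrence of the term using its stored offsets.
        text = self.docs[doc_id]
        hit = next((p for p in self.postings.get(term.lower(), []) if p.doc_id == doc_id), None)
        if hit is None:
            return text
        out, last = [], 0
        for start, end in hit.offsets:
            out.append(text[last:start] + "<em>" + text[start:end] + "</em>")
            last = end
        out.append(text[last:])
        return "".join(out)


idx = InvertedIndex()
idx.add(1, "Elasticsearch is a search engine")
idx.add(2, "The search engine uses an inverted index; search is fast")
print(idx.phrase("search", "engine"))  # [1, 2]
print(idx.highlight(1, "engine"))      # Elasticsearch is a search <em>engine</em>
```

Each field of a posting maps to one use: the sorted dictionary makes prefix lookups a binary search plus a short scan, positions let `phrase` check that terms are adjacent, and offsets let `highlight` mark matches in the original text without re-analyzing it.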