What doc values are in Elasticsearch

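Why doc values exist:

The inverted index structure used by ES makes it easy and efficient to find the documents that contain a given term.

For example, suppose the inverted index of some index looks like this: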

Term      Doc_1   Doc_2   Doc_3
------------------------------------
brown   |   X   |   X   |
dog     |   X   |       |   X
dogs    |       |   X   |   X
fox     |   X   |       |   X
foxes   |       |   X   |
in      |       |   X   |
jumped  |   X   |       |   X
lazy    |   X   |   X   |
leap    |       |   X   |
over    |   X   |   X   |   X
quick   |   X   |   X   |   X
summer  |       |   X   |
the     |   X   |       |   X
------------------------------------
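
The term lookup that this table supports can be sketched in a few lines of Python. This is only an in-memory illustration, not how Lucene stores its index; the names `inverted_index` and `docs_containing` are made up for the example, and the data is copied from the table above.

```python
# Toy inverted index: each term maps to the set of documents containing it.
# Data copied from the table above; names are illustrative only.
inverted_index = {
    "brown":  {"Doc_1", "Doc_2"},
    "dog":    {"Doc_1", "Doc_3"},
    "dogs":   {"Doc_2", "Doc_3"},
    "fox":    {"Doc_1", "Doc_3"},
    "foxes":  {"Doc_2"},
    "in":     {"Doc_2"},
    "jumped": {"Doc_1", "Doc_3"},
    "lazy":   {"Doc_1", "Doc_2"},
    "leap":   {"Doc_2"},
    "over":   {"Doc_1", "Doc_2", "Doc_3"},
    "quick":  {"Doc_1", "Doc_2", "Doc_3"},
    "summer": {"Doc_2"},
    "the":    {"Doc_1", "Doc_3"},
}


def docs_containing(term):
    """Return the documents that contain `term` with a single dict lookup."""
    return inverted_index.get(term, set())


matches = docs_containing("brown")
print(sorted(matches))  # ['Doc_1', 'Doc_2']
```

Finding the documents for a term is one lookup, no matter how many documents the index holds.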

Now take the following query:

GET /my_index/_search
{
  "query" : { #(1)
    "match" : {
      "body" : "brown"
    }
  },
  "aggs" : { #(2)
    "popular_terms": {
      "terms" : {
        "field" : "body"
      }
    }
  }
}

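We know that the query in (1) is simple and efficient to answer with an inverted index.

The aggregation in (2), however, is hard to do with the inverted index. For every document matched by the query you have to find out which terms it contains, and the inverted index can only answer that by walking through all of its terms.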
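
To make the cost concrete, here is a rough Python sketch of answering (2) with nothing but the inverted index. It is a simplification under stated assumptions: the `postings` dict is a toy stand-in for the table above (documents are numbered 1 to 3), and the loop only illustrates the access pattern, not what Elasticsearch actually executes.

```python
from collections import Counter

# Toy postings lists copied from the table above (1 = Doc_1, 2 = Doc_2, 3 = Doc_3).
postings = {
    "brown": {1, 2}, "dog": {1, 3}, "dogs": {2, 3}, "fox": {1, 3},
    "foxes": {2}, "in": {2}, "jumped": {1, 3}, "lazy": {1, 2},
    "leap": {2}, "over": {1, 2, 3}, "quick": {1, 2, 3},
    "summer": {2}, "the": {1, 3},
}

# Step (1): the query is a single lookup.
matched = postings["brown"]

# Step (2): with only term -> docs, every term in the index has to be
# visited to learn which terms the matched documents contain.
term_counts = Counter()
terms_scanned = 0
for term, doc_ids in postings.items():
    terms_scanned += 1
    hits = len(doc_ids & matched)
    if hits:
        term_counts[term] = hits

print(terms_scanned)         # 13, i.e. every term in the index
print(term_counts["brown"])  # 2
```

The work grows with the number of distinct terms in the index rather than with the number of matched documents.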

While the inverted index maps terms to the documents containing the term, doc values maps documents to the terms contained by the document:

Doc      Terms
-----------------------------------------------------------------
Doc_1 | brown, dog, fox, jumped, lazy, over, quick, the
Doc_2 | brown, dogs, foxes, in, lazy, leap, over, quick, summer
Doc_3 | dog, dogs, fox, jumped, over, quick, the
-----------------------------------------------------------------

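Doc values use an uninverted index structure, so finding the terms that each document contains becomes easy.
Doc values suit fields mapped with index:"not_analyzed"; they are not suitable for analyzed fields.
Doc values can be used for aggregations, sorts, and scripts.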
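
The terms aggregation from (2) can be sketched over this document-to-terms layout as follows. This is a minimal in-memory illustration: real doc values are an on-disk, column-oriented structure built at index time, and the names `doc_values`, `matched_docs`, and `popular_terms` are assumptions made for the example.

```python
from collections import Counter

# Toy doc values: each document maps to the terms it contains,
# copied from the Doc/Terms table above.
doc_values = {
    "Doc_1": ["brown", "dog", "fox", "jumped", "lazy", "over", "quick", "the"],
    "Doc_2": ["brown", "dogs", "foxes", "in", "lazy", "leap", "over", "quick", "summer"],
    "Doc_3": ["dog", "dogs", "fox", "jumped", "over", "quick", "the"],
}

# Documents matched by query (1) for "brown", as found via the inverted index.
matched_docs = ["Doc_1", "Doc_2"]

# Aggregation (2): read each matched document's terms directly and count them.
popular_terms = Counter()
for doc_id in matched_docs:
    popular_terms.update(doc_values[doc_id])

print(popular_terms["brown"])  # 2
print(popular_terms["dog"])    # 1
```

Only the matched documents are touched, so the inverted index handles the search and the doc values handle the aggregation.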

For more details, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues.html

Reposted from blog.csdn.net/smithallenyu/article/details/52210556