Elasticsearch aggregation queries on an array of objects using the nested type

I have been working on aggregation queries in ES recently and ran into a problem when querying one-to-many product data. This post records the problem and its solution.

1. The ES document structure is as follows

===== Document 1
{
    "id": "IEO29R12KN912NDF893",
    "products": [
        {
            "product_name": "电视机",
            "budget": 2000
        },
        {
            "product_name": "手机",
            "budget": 851
        }
    ],
    "publish_year": "2020"
}
===== Document 2
{
    "id": "IQFJ019238AHJDFK1L9",
    "products": [
        {
            "product_name": "电视机",
            "budget": 2000
        },
        {
            "product_name": "相机",
            "budget": 5000
        },
        {
            "product_name": "扑克牌",
            "budget": 2
        }
    ],
    "publish_year": "2019"
}

2. Query requirement

Group by product name (product_name) and sum the budget (budget) within each group.

3. Expected query results

Product name              Budget
TV set (电视机)            4000
cell phone (手机)          851
camera (相机)              5000
playing cards (扑克牌)     2
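
As a reference for what the aggregation should return, the expected totals can be reproduced in plain Python. This is only a sketch over the two sample documents above; the helper name expected_budget_by_product is made up for illustration and has nothing to do with the Elasticsearch API.

```python
from collections import defaultdict

# The two sample documents from section 1, reduced to the fields used here.
docs = [
    {"id": "IEO29R12KN912NDF893", "products": [
        {"product_name": "电视机", "budget": 2000},
        {"product_name": "手机", "budget": 851},
    ]},
    {"id": "IQFJ019238AHJDFK1L9", "products": [
        {"product_name": "电视机", "budget": 2000},
        {"product_name": "相机", "budget": 5000},
        {"product_name": "扑克牌", "budget": 2},
    ]},
]


def expected_budget_by_product(documents):
    """Group every product entry by name and add up only its own budget."""
    totals = defaultdict(int)
    for doc in documents:
        for product in doc["products"]:
            totals[product["product_name"]] += product["budget"]
    return dict(totals)


print(expected_budget_by_product(docs))
```

Each product entry contributes only its own budget to its own group, which is exactly the behaviour the ES query needs to reproduce.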

4. Incorrect ES query and its results

4.1 Query statement and results

ES query statement (simplified)

{
    "from": 0,
    "size": 14,
    "aggs": {
        "aggs_of_product": {
            "terms": {
                "field": "products.product_name.keyword"
            },
            "aggs": {
                "aggs_sum_of_budget": {
                    "sum": {
                        "field": "products.budget"
                    }
                }
            }
        }
    }
}

ES query results (simplified)

{
    "aggregations": {
        "aggs_of_product": {
            "buckets": [
                {
                    "key": "电视机",
                    "doc_count": 2,
                    "aggs_sum_of_budget": {
                        "value": 9853
                    }
                },
                {
                    "key": "手机",
                    "doc_count": 1,
                    "aggs_sum_of_budget": {
                        "value": 2851
                    }
                },
                {
                    "key": "相机",
                    "doc_count": 1,
                    "aggs_sum_of_budget": {
                        "value": 7002
                    }
                },
                {
                    "key": "扑克牌",
                    "doc_count": 1,
                    "aggs_sum_of_budget": {
                        "value": 7002
                    }
                }
            ]
        }
    }
}

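4.2 Cause of the error

Note that the doc_count of 电视机 (TV set) is 2: it counts matching documents, not product entries. With the default object type, Elasticsearch flattens the products array, so each document holds one multi-valued products.product_name field and one multi-valued products.budget field, and the link between a product name and its own budget is lost. Inside each bucket, the sum therefore adds up every budget of every document that contains that product name. Take 相机 (camera) as an example: only document 2 matches, so its sum is 2000 + 5000 + 2 = 7002, the total of all budgets in document 2. 扑克牌 (playing cards) gets the same 7002, and 电视机 matches both documents, giving 2851 + 7002 = 9853. A plain terms + sum aggregation is not suitable for this data.

If every document contained only one product, this query would return the correct result.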
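
A rough way to see where 9853, 2851 and 7002 come from is to imitate, in plain Python, how a default object field stores an array of objects: as one multi-valued list per sub-field. The sketch below is a simulation under that assumption, not Elasticsearch code, and the function names flatten and flattened_terms_sum are hypothetical.

```python
from collections import defaultdict

# The two sample documents from section 1, reduced to the fields used here.
docs = [
    {"id": "IEO29R12KN912NDF893", "products": [
        {"product_name": "电视机", "budget": 2000},
        {"product_name": "手机", "budget": 851},
    ]},
    {"id": "IQFJ019238AHJDFK1L9", "products": [
        {"product_name": "电视机", "budget": 2000},
        {"product_name": "相机", "budget": 5000},
        {"product_name": "扑克牌", "budget": 2},
    ]},
]


def flatten(doc):
    """Imitate the object type: each sub-field becomes one multi-valued list."""
    return {
        "products.product_name": [p["product_name"] for p in doc["products"]],
        "products.budget": [p["budget"] for p in doc["products"]],
    }


def flattened_terms_sum(documents):
    """Simulate terms + sum on flattened documents."""
    totals = defaultdict(int)
    for doc in documents:
        flat = flatten(doc)
        # terms: the whole document lands in one bucket per distinct name
        for name in set(flat["products.product_name"]):
            # sum: every budget value of that document is added to the bucket
            totals[name] += sum(flat["products.budget"])
    return dict(totals)


print(flattened_terms_sum(docs))
```

The simulated totals match the wrong values in section 4.1, which suggests the aggregation logic itself is fine and the field type is what loses the name-to-budget pairing.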

5. Solution

After a lot of searching on Google and Baidu, I found that my aggregation approach was correct; the real problem was the type of the products field. It had been mapped with the default (object) type, and it needs to be mapped as the nested type instead.

5.1 Modification of field index mapping

Previous mapping of products

{
    "mappings": {
        "properties": {
            "products": {
                "properties": {
                    "product_name": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "budget": {
                        "type": "float"
                    }
                }
            }
        }
    }
}

Mapping of products after the change

Change the field type of products to nested. The only difference is one extra line: "type": "nested"

{
    "mappings": {
        "properties": {
            "products": {
                "type": "nested",
                "properties": {
                    "product_name": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "budget": {
                        "type": "float"
                    }
                }
            }
        }
    }
}
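
Since the two mappings differ by a single key, the nested version can be derived from the old one in code. The sketch below only manipulates Python dicts; the helper with_nested_field is a made-up name, and sending the result to a cluster is left out. Keep in mind that Elasticsearch does not allow an existing object field to be switched to nested in place, so the new mapping would normally go into a new index that the data is then reindexed into.

```python
import copy
import json

# The previous mapping from section 5.1 (products uses the default object type).
object_mapping = {
    "mappings": {
        "properties": {
            "products": {
                "properties": {
                    "product_name": {
                        "type": "text",
                        "fields": {
                            "keyword": {"type": "keyword", "ignore_above": 256}
                        },
                    },
                    "budget": {"type": "float"},
                }
            }
        }
    }
}


def with_nested_field(mapping, field):
    """Return a copy of the mapping with the given top-level field declared as nested."""
    updated = copy.deepcopy(mapping)  # leave the original mapping untouched
    field_def = updated["mappings"]["properties"][field]
    updated["mappings"]["properties"][field] = {"type": "nested", **field_def}
    return updated


nested_mapping = with_nested_field(object_mapping, "products")
print(json.dumps(nested_mapping, indent=4))
```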

5.2 Modified ES query statement

{
    "aggs": {
        "nested_name": {
            "nested": {
                "path": "products"
            },
            "aggs": {
                "aggs_of_product": {
                    "terms": {
                        "field": "products.product_name.keyword"
                    },
                    "aggs": {
                        "aggs_sum_of_budget": {
                            "sum": {
                                "field": "products.budget"
                            }
                        }
                    }
                }
            }
        }
    }
}
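
If the same nested terms + sum pattern is needed for other fields, the request body can be built by a small helper. This is a minimal sketch that only produces the JSON body; the function name nested_terms_sum and its parameters are assumptions for illustration. The top-level "size": 0 (skip search hits) and the explicit terms size (10 is the Elasticsearch default) are additions that the query above does not contain.

```python
import json


def nested_terms_sum(path, group_field, sum_field, size=10):
    """Build a nested aggregation: terms on group_field, sum of sum_field per bucket."""
    return {
        "size": 0,  # addition: return aggregations only, no search hits
        "aggs": {
            "nested_name": {
                # step into the nested objects stored under `path`
                "nested": {"path": path},
                "aggs": {
                    "aggs_of_product": {
                        "terms": {"field": f"{path}.{group_field}", "size": size},
                        "aggs": {
                            "aggs_sum_of_budget": {
                                "sum": {"field": f"{path}.{sum_field}"}
                            }
                        },
                    }
                },
            }
        },
    }


body = nested_terms_sum("products", "product_name.keyword", "budget")
print(json.dumps(body, indent=4))
```

The aggregation names are kept identical to the query above so that the response keys stay the same.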

5.3 Modified query results

ES query results (simplified; aggregation names aligned with the query in 5.2, and in the full response these buckets sit inside the nested_name aggregation)

{
    "aggregations": {
        "aggs_of_product": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 36126,
            "buckets": [
                {
                    "key": "电视机",
                    "doc_count": 2,
                    "aggs_sum_of_budget": {
                        "value": 4000
                    }
                },
                {
                    "key": "手机",
                    "doc_count": 1,
                    "aggs_sum_of_budget": {
                        "value": 851
                    }
                },
                {
                    "key": "相机",
                    "doc_count": 1,
                    "aggs_sum_of_budget": {
                        "value": 5000
                    }
                },
                {
                    "key": "扑克牌",
                    "doc_count": 1,
                    "aggs_sum_of_budget": {
                        "value": 2
                    }
                }
            ]
        }
    }
}
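
To turn such a response back into the product/budget table from section 3, the buckets can be read out with a few lines of Python. The sketch assumes the response mirrors the aggregation names of the query in 5.2 (nested_name, aggs_of_product, aggs_sum_of_budget). The sample_response below is hand-written for the two sample documents, including a hypothetical doc_count of 5 for the nested aggregation, and is not real cluster output.

```python
def budgets_from_response(response,
                          nested_agg="nested_name",
                          terms_agg="aggs_of_product",
                          sum_agg="aggs_sum_of_budget"):
    """Map each bucket key (product name) to the value of its sum sub-aggregation."""
    buckets = response["aggregations"][nested_agg][terms_agg]["buckets"]
    return {bucket["key"]: bucket[sum_agg]["value"] for bucket in buckets}


# Hand-written sample shaped like a nested aggregation response (assumption).
sample_response = {
    "aggregations": {
        "nested_name": {
            "doc_count": 5,  # hypothetical: five product entries in two documents
            "aggs_of_product": {
                "buckets": [
                    {"key": "电视机", "doc_count": 2, "aggs_sum_of_budget": {"value": 4000}},
                    {"key": "手机", "doc_count": 1, "aggs_sum_of_budget": {"value": 851}},
                    {"key": "相机", "doc_count": 1, "aggs_sum_of_budget": {"value": 5000}},
                    {"key": "扑克牌", "doc_count": 1, "aggs_sum_of_budget": {"value": 2}},
                ]
            },
        }
    }
}

for name, budget in budgets_from_response(sample_response).items():
    print(name, budget)
```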

Origin blog.csdn.net/weixin_42581660/article/details/128560202