Elasticsearch (21): Parent-Child Documents

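Modeling with nested objects has a drawback: it works like redundant (denormalized) data, packing multiple entities into a single document, so the maintenance cost is relatively high.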

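Parent-child modeling is closer to third-normal-form modeling in a relational database: each entity is stored separately, and entities are linked through a parent-child relationship. The data does not all have to live in one document, and updating a parent doc or a child doc does not affect the other.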

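This makes one-to-many relationships easy to maintain. As discussed earlier, relational-style modeling with application-side joins performs poorly because it needs multiple searches. The parent-child data model avoids that: although the entities are stored separately, ES resolves the underlying relationship for us at search time and uses several mechanisms to keep searches fast.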

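Compared with the nested data model, the advantage of the parent-child data model is that parent docs and child docs can be changed independently without affecting each other.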

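Key point: ES keeps metadata that maps the parent-child relationship in order to keep queries fast, but there is one restriction: a parent and its children must live in the same shard.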

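Because parent and child docs sit in the same shard, together with the metadata that maps their relationship, a parent-child search never has to cross shards. Each shard resolves the relationship locally, which is why performance is good.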

Case background: R&D center employee management. An IT company has multiple R&D centers, and each R&D center has multiple employees.

PUT /company
{
  "mappings": {
    "rd_center": {},
    "employee": {
      "_parent": {
        "type": "rd_center"
      }
    }
  }
}

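This is the core of parent-child modeling: types are related as parent and child, and the child type's mapping uses _parent to name its parent type.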

POST /company/rd_center/_bulk
{ "index": { "_id": "1" }}
{ "name": "北京研发总部", "city": "北京", "country": "中国" }
{ "index": { "_id": "2" }}
{ "name": "上海研发中心", "city": "上海", "country": "中国" }
{ "index": { "_id": "3" }}
{ "name": "硅谷人工智能实验室", "city": "硅谷", "country": "美国" }

For shard routing, the rd_center doc with id=1 is routed by its own id by default and lands on some shard.

PUT /company/employee/1?parent=1
{
  "name": "张三",
  "birthday": "1970-10-24",
  "hobby": "爬山"
}

This is the core of maintaining the relationship: parent=1 specifies the id of this doc's parent doc.

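With this, the parent-child relationship guarantees that the parent doc and the child doc are stored on the same shard. The underlying mechanism is still doc routing: an rd_center doc is routed by its own id, and an employee doc uses its parent id as the routing value, so both resolve to the same shard.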

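In other words, the employee doc with id=1 is not routed by its own id. It is routed by parent=1, the id of its parent doc, and this routing mechanism is what keeps parent and child data in the same shard.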
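The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib.crc32 is a stand-in for the hash function Elasticsearch actually uses, the helper names (shard_for, route_parent, route_child) are hypothetical, and the shard count of 5 is taken from the "total": 5 shown in the search responses of this article.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumption: matches "_shards.total" in the responses


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard as hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash for illustration only; Elasticsearch
    uses its own hash function internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def route_parent(doc_id: str) -> int:
    # A parent doc has no explicit routing, so its own _id is the routing value.
    return shard_for(doc_id)


def route_child(doc_id: str, parent_id: str) -> int:
    # A child doc ignores its own _id for routing and uses the parent's _id.
    return shard_for(parent_id)


# employee _id -> parent rd_center _id, as indexed in this article
employees = {"1": "1", "2": "1", "3": "2", "4": "3"}

for emp_id, parent_id in employees.items():
    child_shard = route_child(emp_id, parent_id)
    # Parent and child always agree, whatever the hash function is.
    assert child_shard == route_parent(parent_id)
    print(f"employee {emp_id} and rd_center {parent_id} -> shard {child_shard}")
```

Whatever hash is used, the child lands on its parent's shard because both sides hash the same value (the parent's id). Employee 4, for example, follows rd_center 3 rather than its own id.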

POST /company/employee/_bulk
{ "index": { "_id": 2, "parent": "1" }}
{ "name": "李四", "birthday": "1982-05-16", "hobby": "游泳" }
{ "index": { "_id": 3, "parent": "2" }}
{ "name": "王二", "birthday": "1979-04-01", "hobby": "爬山" }
{ "index": { "_id": 4, "parent": "3" }}
{ "name": "赵五", "birthday": "1987-05-11", "hobby": "骑马" }

With the parent-child data model in place, we can now run searches and aggregations against it.

1. Search for R&D centers that have employees born in 1980 or later

GET /company/rd_center/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "range": {
          "birthday": {
            "gte": "1980-01-01"
          }
        }
      }
    }
  }
}

{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "company",
        "_type": "rd_center",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "北京研发总部",
          "city": "北京",
          "country": "中国"
        }
      },
      {
        "_index": "company",
        "_type": "rd_center",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "硅谷人工智能实验室",
          "city": "硅谷",
          "country": "美国"
        }
      }
    ]
  }
}

2. Search for R&D centers that have an employee named 张三

GET /company/rd_center/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "match": {
          "name": "张三"
        }
      }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "company",
        "_type": "rd_center",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "北京研发总部",
          "city": "北京",
          "country": "中国"
        }
      }
    ]
  }
}

3. Search for R&D centers that have at least 2 employees

GET /company/rd_center/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "min_children": 2,
      "query": {
        "match_all": {}
      }
    }
  }
}

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "company",
        "_type": "rd_center",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "北京研发总部",
          "city": "北京",
          "country": "中国"
        }
      }
    ]
  }
}
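
To make the semantics of the three has_child searches concrete, here is a small in-memory Python sketch over the same sample data. It is not how ES executes the query. The has_child function is a hypothetical helper, the name match is simplified to exact string equality (a real match query is full-text), and the birthday range relies on ISO date strings comparing correctly as plain strings.

```python
from collections import defaultdict

# Sample data copied from the bulk requests in this article.
rd_centers = {
    "1": {"name": "北京研发总部", "city": "北京", "country": "中国"},
    "2": {"name": "上海研发中心", "city": "上海", "country": "中国"},
    "3": {"name": "硅谷人工智能实验室", "city": "硅谷", "country": "美国"},
}
employees = [
    {"_id": "1", "_parent": "1", "name": "张三", "birthday": "1970-10-24"},
    {"_id": "2", "_parent": "1", "name": "李四", "birthday": "1982-05-16"},
    {"_id": "3", "_parent": "2", "name": "王二", "birthday": "1979-04-01"},
    {"_id": "4", "_parent": "3", "name": "赵五", "birthday": "1987-05-11"},
]


def has_child(child_filter, min_children=1):
    """Return sorted ids of parents with >= min_children matching children."""
    counts = defaultdict(int)
    for emp in employees:
        if child_filter(emp):
            counts[emp["_parent"]] += 1
    return sorted(
        pid for pid, n in counts.items()
        if n >= min_children and pid in rd_centers
    )


# 1. centers with an employee born in 1980 or later
born_1980_or_later = has_child(lambda e: e["birthday"] >= "1980-01-01")
# 2. centers with an employee named 张三 (exact match, a simplification)
named_zhang_san = has_child(lambda e: e["name"] == "张三")
# 3. centers with at least 2 employees
at_least_two = has_child(lambda e: True, min_children=2)

print(born_1980_or_later, named_zhang_san, at_least_two)
```

The three results are ['1', '3'], ['1'] and ['1'], the same rd_center ids returned by the three responses above.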

4. Search for employees of R&D centers located in China (中国)

GET /company/employee/_search
{
  "query": {
    "has_parent": {
      "parent_type": "rd_center",
      "query": {
        "term": {
          "country.keyword": "中国"
        }
      }
    }
  }
}

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "company",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_routing": "2",
        "_parent": "2",
        "_source": {
          "name": "王二",
          "birthday": "1979-04-01",
          "hobby": "爬山"
        }
      },
      {
        "_index": "company",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_routing": "1",
        "_parent": "1",
        "_source": {
          "name": "张三",
          "birthday": "1970-10-24",
          "hobby": "爬山"
        }
      },
      {
        "_index": "company",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_routing": "1",
        "_parent": "1",
        "_source": {
          "name": "李四",
          "birthday": "1982-05-16",
          "hobby": "游泳"
        }
      }
    ]
  }
}
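
The has_parent direction can be sketched the same way: first find the parents that match, then keep the children that point at them. This is again an in-memory illustration rather than ES internals. The has_parent helper is a hypothetical name, and the sample data is trimmed to the fields the query needs.

```python
# Parent docs (rd_center) keyed by _id, trimmed to the queried field.
rd_centers = {
    "1": {"name": "北京研发总部", "country": "中国"},
    "2": {"name": "上海研发中心", "country": "中国"},
    "3": {"name": "硅谷人工智能实验室", "country": "美国"},
}
# Child docs (employee); _parent holds the parent's _id.
employees = [
    {"_id": "1", "_parent": "1", "name": "张三"},
    {"_id": "2", "_parent": "1", "name": "李四"},
    {"_id": "3", "_parent": "2", "name": "王二"},
    {"_id": "4", "_parent": "3", "name": "赵五"},
]


def has_parent(parent_filter):
    """Return sorted ids of children whose parent satisfies parent_filter."""
    matching_parents = {
        pid for pid, center in rd_centers.items() if parent_filter(center)
    }
    return sorted(
        emp["_id"] for emp in employees if emp["_parent"] in matching_parents
    )


# Exact comparison mirrors the term query on country.keyword.
in_china = has_parent(lambda c: c["country"] == "中国")
print(in_china)
```

The result is ['1', '2', '3'], the same three employees as in the response above. Only the order differs, because the sketch sorts by id while the response order depends on the shards.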
