[ElasticSearch Series 08] Handling Relationships Between Objects in ElasticSearch

ElasticSearch series overview

Title	Link
[1] Downloading and installing ElasticSearch	https://zhenghuisheng.blog.csdn.net/article/details/129260827
[2] ElasticSearch concepts and basic operations	https://blog.csdn.net/zhenghuishengq/article/details/134121631
[3] Advanced queries in ElasticSearch with Query DSL	https://blog.csdn.net/zhenghuishengq/article/details/134159587
[4] Aggregation queries in ElasticSearch	https://blog.csdn.net/zhenghuishengq/article/details/134159587
[5] Integrating ElasticSearch with SpringBoot	https://blog.csdn.net/zhenghuishengq/article/details/134212200
[6] Building an ES cluster architecture and core cluster concepts	https://blog.csdn.net/zhenghuishengq/article/details/134258577
[7] ES development scenarios and configuring and optimizing index shards	https://blog.csdn.net/zhenghuishengq/article/details/134302130
[8] Handling relationships between objects in ElasticSearch	https://blog.csdn.net/zhenghuishengq/article/details/134327295

I. Handling Relationships Between Objects in ES

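ES is a NoSQL, non-relational data store, and handling relationships between entities is not one of its strengths. MySQL, by contrast, models such relationships through normalization.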

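Normalization reduces data redundancy, saves storage space, and makes the data easier to maintain, but reads need multiple steps, and join queries add to the query time. Denormalization stores redundant copies of the data, so there are no relationships to resolve and no joins to run, which makes reads faster. Its drawback is just as clear: updates are cumbersome, because changing a single field may require rewriting many documents.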
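
To make the update cost concrete, here is a minimal plain-Java sketch with no ElasticSearch involved. The class and method names (DenormalizationSketch, renameNormalized, renameDenormalized) are made up for illustration, and the in-memory maps are only stand-ins for tables or documents. It simply counts how many records have to be rewritten when one user name changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: in-memory stand-ins for a normalized and a denormalized layout.
final class DenormalizationSketch {

    // Normalized: articles hold only a userId, the name lives in one place.
    static int renameNormalized(Map<Integer, String> users, int userId, String newName) {
        users.put(userId, newName);
        return 1; // exactly one record is rewritten
    }

    // Denormalized: every article carries its own copy of the author name.
    static int renameDenormalized(List<Map<String, String>> articles, String oldName, String newName) {
        int rewritten = 0;
        for (Map<String, String> article : articles) {
            if (oldName.equals(article.get("username"))) {
                article.put("username", newName);
                rewritten++;
            }
        }
        return rewritten;
    }

    // Three hypothetical articles, all written by the same user.
    static List<Map<String, String>> sampleArticles() {
        List<Map<String, String>> articles = new ArrayList<>();
        for (String title : new String[] {"a", "b", "c"}) {
            Map<String, String> article = new HashMap<>();
            article.put("title", title);
            article.put("username", "zhenghuisheng");
            articles.add(article);
        }
        return articles;
    }

    public static void main(String[] args) {
        Map<Integer, String> users = new HashMap<>();
        users.put(1, "zhenghuisheng");
        System.out.println(renameNormalized(users, 1, "zhs"));
        System.out.println(renameDenormalized(sampleArticles(), "zhenghuisheng", "zhs"));
    }
}
```

The normalized rename touches one record regardless of how many articles exist, while the denormalized rename grows with the number of articles that embed the name.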

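ElasticSearch generally follows the non-relational approach and offers four main ways of handling related data: the object type, the nested type, the parent-child (join) type, and application-side joins.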
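
The fourth approach, application-side joins, can be pictured with the following plain-Java sketch. It is only an assumption about how such a join might look: two maps stand in for two separate indices, and every name here (ApplicationSideJoinSketch, ARTICLES, USERS, articlesWithAuthor, the sample titles and IDs) is hypothetical. In a real application each lookup would be a separate request to ES.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: two maps stand in for two separate indices.
final class ApplicationSideJoinSketch {

    // Hypothetical "article" index: article id -> {title, userId}.
    static final Map<String, String[]> ARTICLES = new HashMap<>();
    // Hypothetical "user" index: user id -> username.
    static final Map<String, String> USERS = new HashMap<>();

    static {
        ARTICLES.put("1", new String[] {"ES basics", "u1"});
        ARTICLES.put("2", new String[] {"ES tutorial", "u2"});
        USERS.put("u1", "zhenghuisheng");
        USERS.put("u2", "Tom");
    }

    // Step 1: read the articles. Step 2: look up each referenced user.
    // Step 3: merge both results in the application.
    static List<String> articlesWithAuthor(List<String> articleIds) {
        List<String> result = new ArrayList<>();
        for (String id : articleIds) {
            String[] article = ARTICLES.get(id);
            if (article == null) {
                continue; // unknown article id, nothing to join
            }
            String username = USERS.get(article[1]);
            result.add(article[0] + " by " + username);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(articlesWithAuthor(Arrays.asList("1", "2")));
    }
}
```

The trade-off is that the relationship is resolved entirely in application code, at the price of more than one round trip per read.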

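1. Object type

1.1 Object type in Kibana

A document can contain a field whose value is an object. For example, in the article index below, each article carries a user object holding the author's information. The index is created as follows: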

PUT /article
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "createTime": {
        "type": "date"
      },
      "user": {
        "properties": {
          "username": {
            "type": "keyword"
          },
          "age": {
            "type": "long"
          },
          "sex": {
            "type": "text"
          }
        }
      }
    }
  }
}

Next, insert a document into the index, including the user information:

PUT /article/_doc/1
{
  "title": "ElasticSearch学习",
  "createTime": "2023-11-09T00:00:00",
  "user": {
    "username": "zhenghuisheng",
    "age": 18,
    "sex": "男"
  }
}

To query by user name, address the sub-field directly using the object.property form:

GET /article/_search
{
  "query": {
    "match": {
      "user.username": "zhenghuisheng"
    }
  }
}
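
The object.property path works because ES indexes object fields as a flat list of key-value pairs with dotted names. The following plain-Java sketch mimics that flattening on an in-memory map; it is an illustration of the idea, not ES code, and the names ObjectFlattenSketch, flatten, and sampleArticle are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics how an object field becomes dotted field names.
final class ObjectFlattenSketch {

    // Recursively turns nested maps into "parent.child" keys.
    static Map<String, Object> flatten(String prefix, Map<String, Object> source) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flat.putAll(flatten(path, child));
            } else {
                flat.put(path, value);
            }
        }
        return flat;
    }

    // A cut-down version of the article document used in this section.
    static Map<String, Object> sampleArticle() {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("username", "zhenghuisheng");
        user.put("age", 18);
        Map<String, Object> article = new LinkedHashMap<>();
        article.put("title", "ElasticSearch");
        article.put("user", user);
        return article;
    }

    public static void main(String[] args) {
        // prints {title=ElasticSearch, user.username=zhenghuisheng, user.age=18}
        System.out.println(flatten("", sampleArticle()));
    }
}
```

After flattening there is no user key left, only user.username and user.age, which is why the query above names the field by its full path.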

1.2 Object type in Java

Before working with the index, obtain an ES connection through a configuration class such as the following:

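// Declared inside the Spring configuration class
@Bean
public RestHighLevelClient esRestClient() {
    // Replace the host with the address of your own ES node
    RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(new HttpHost("xxx.33.xxx.xxx", 9200, "http")));
    return client;
}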

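The Java code for inserting a document into the article index is shown below; the client is the one configured in the SpringBoot integration article. Note that the user must be wrapped inside the article document, otherwise its properties end up as top-level fields instead of under user.

// Build the user object
User user = new User();
user.setUsername("zhenghuisheng");
user.setAge(18);
user.setSex("男");
// Wrap the user inside the article document so it is indexed under the "user" field
// (requires java.util.Map and java.util.HashMap)
Map<String, Object> article = new HashMap<>();
article.put("title", "ElasticSearch学习");
article.put("createTime", "2023-11-09T00:00:00");
article.put("user", user);
// Index the document
IndexRequest articleIndex = new IndexRequest("article");
articleIndex.source(JSON.toJSONString(article), XContentType.JSON);
// client is the SpringBoot-integrated client, injected via @Resource
client.index(articleIndex, ElasticSearchConfig.COMMON_OPTIONS);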

Querying works as follows. Even for a sub-field, the field name is simply the dotted path user.username:

SearchRequest request = new SearchRequest("article");
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchQuery("user.username","zhenghuisheng"));
request.source(builder);
SearchResponse search = client.search(request, RequestOptions.DEFAULT);
System.out.println(search);

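2. Nested type

2.1 Nested type in Kibana

The nested type allows each object in an array of objects to be indexed independently. With the plain object type, ES flattens an array of objects into per-field lists of values, so the association between fields of the same object is lost and a query may return documents that should not match. Take English names with firstName and lastName: if a document holds the two people zhan san and li si, a query for first name zhan and last name si still matches that document, although no such person exists.

The nested type solves this problem. Internally, each nested object is stored as a separate hidden Lucene document alongside its parent document, and a join is performed at query time.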
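
The difference between the two behaviours can be sketched in plain Java. This is only a model of the matching logic, not ES code: the class NestedMatchSketch and its methods are hypothetical, and the two sample authors (zheng huisheng and li si) are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models flattened versus per-object matching.
final class NestedMatchSketch {

    // Each author object is a {first_name, last_name} pair.
    static final List<String[]> AUTHORS = Arrays.asList(
            new String[] {"zheng", "huisheng"},
            new String[] {"li", "si"});

    // Object-array behaviour: each field becomes a multi-valued list,
    // so the pairing between first_name and last_name is lost.
    static boolean matchesFlattened(List<String[]> authors, String first, String last) {
        List<String> firstNames = new ArrayList<>();
        List<String> lastNames = new ArrayList<>();
        for (String[] author : authors) {
            firstNames.add(author[0]);
            lastNames.add(author[1]);
        }
        return firstNames.contains(first) && lastNames.contains(last);
    }

    // Nested behaviour: both conditions must hold inside the same object.
    static boolean matchesNested(List<String[]> authors, String first, String last) {
        for (String[] author : authors) {
            if (author[0].equals(first) && author[1].equals(last)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesFlattened(AUTHORS, "zheng", "si")); // true
        System.out.println(matchesNested(AUTHORS, "zheng", "si"));    // false
    }
}
```

The flattened check wrongly accepts the combination zheng si because it looks at each field in isolation, while the per-object check rejects it and still accepts a real pair such as li si.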

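In the example below we again create an article index, this time with an author field holding the author information. If the article index from section 1 still exists, delete it first (DELETE /article), since an existing index cannot be created again.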

PUT /article
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "nested",
        "properties": {
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Insert a document into this index as follows; it has two authors, stored as an array:

POST /article/_doc/1
{
  "title": "ElasticSearch教学",
  "author": [
    {
      "first_name": "zheng",
      "last_name": "huisheng"
    },
    {
      "first_name": "li",
      "last_name": "si"
    }
  ]
}

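When querying, use a nested query, where path names the nested object to search. Conditions are then evaluated per author object, so the query below, which asks for first_name zheng and last_name si within the same author, does not return the document: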

GET /article/_search
{
  "query": {
    "nested": {    // fixed form: nested goes directly under query
      "path": "author",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "author.first_name": "zheng"
              }
            },
            {
              "match": {
                "author.last_name": "si"
              }
            }
          ]
        }
      }
    }
  }
}

Aggregations on nested fields likewise need a nested aggregation whose path points to the nested object:

GET /article/_search
{
  "aggs": {
    "author": {
      "nested": {
        "path": "author"
      }
    }
  }
}

2.2 Nested type in Java

The corresponding Java code is shown below. First create the index and configure its settings and mapping:

    @Test
    public void createIndex() throws Exception {
        // Mapping: title is text, author is nested with two keyword sub-fields
        XContentBuilder mapping = XContentFactory.jsonBuilder()
                .startObject()
                    .startObject("properties")
                        .startObject("title")
                            .field("type", "text")
                        .endObject()
                        .startObject("author")
                            .field("type", "nested")
                            .startObject("properties")
                                .startObject("first_name")
                                    .field("type", "keyword")
                                .endObject()
                                .startObject("last_name")
                                    .field("type", "keyword")
                                .endObject()
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject();
        CreateIndexRequest request = new CreateIndexRequest("article")
                .settings(Settings.builder()
                        .put("number_of_shards", 3)      // number of primary shards
                        .put("number_of_replicas", 1)    // number of replicas
                        .build())
                .mapping(mapping);
        // Create the index
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println("Result: " + response);
    }

Insert the data as follows, matching the Kibana example above: one article with two authors.

// Insert data
IndexRequest userIndex = new IndexRequest("article");
// Build the two authors
List<Author> list = new ArrayList<>();
Author author1 = new Author();
author1.setFirstName("zheng");
author1.setLastName("huisheng");
Author author2 = new Author();
author2.setFirstName("li");
author2.setLastName("si");
list.add(author1);
list.add(author2);
// Build the article that holds both authors
Article article = new Article();
article.setTitle("ElasticSearch教学");
article.setAuthor(list);
// Serialize the article to JSON and index it
userIndex.source(JSON.toJSONString(article), XContentType.JSON);
client.index(userIndex, ElasticSearchConfig.COMMON_OPTIONS);

Next, query the data by building a NestedQueryBuilder:

// Query data
SearchRequest request = new SearchRequest("article");
String path = "author";
QueryBuilder builder =
        new NestedQueryBuilder(path, QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("author.first_name", "zheng"))
                .must(QueryBuilders.matchQuery("author.last_name","si")), ScoreMode.None);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(builder);
request.source(searchSourceBuilder);
SearchResponse search = client.search(request, RequestOptions.DEFAULT);
System.out.println(search);

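3. Parent-child type

3.1 Parent-child type in Kibana

ElasticSearch also supports parent-child documents, linked through a join field. Parent and child documents are independent of each other: updating a parent document does not require touching its children, whereas with the object type described at the top, changing some data may force updates across many documents. Because the documents are independent, operations on one do not affect the others.

The drawback is that joining large volumes of data at query time is slow; the benefit is cheaper updates.

To use parent-child documents, declare the relationship when creating the index. As shown below, the field type join marks the field as a join field, and in relations the key is the parent relation name and the value is the child relation name:

"teacher_student_relation": {
  "type": "join",            // field type
  "relations": {             // declare the relationship
    "teacher": "student"     // teacher is the parent, student is the child
  }
}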

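For example, create a user index that holds both student and teacher documents, with 3 shards, where teacher is the parent and student is the child. Here the join field is named relation: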

PUT /user
{
  "settings": {
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": {
          "teacher": "student"
        }
      },
      "username": {
        "type": "keyword"
      },
      "sex": {
        "type": "text"
      }
    }
  }
}

Next, insert a parent document. It also needs the relation field, whose name marks it as a teacher (parent) document:

PUT /user/_doc/1
{
  "username": "Tom",
  "sex": "男",
  "relation": {
    "name": "teacher"    // marks this document as a parent
  }
}

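Next, insert a child document. ES requires a parent and its children to live on the same shard, which also keeps the join local to one shard, so the child is indexed with the routing parameter set to the parent's ID. The relation field must give the child's relation name in name and, in parent, the ID of the parent document (here 1), not the relation name teacher:

PUT /user/_doc/student1?routing=1
{
  "username": "zhenghuisheng",
  "sex": "男",
  "relation": {
    "name": "student",
    "parent": "1"
  }
}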
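
Why routing=1 puts the child next to its parent can be sketched in plain Java. The sketch assumes the simplified rule shard = hash(routing) mod number_of_shards, and String.hashCode() is only a stand-in for the hash function ES really uses, so the shard numbers it prints will not match a real cluster. The class and method names (RoutingSketch, shardFor) are hypothetical.

```java
// Illustrative only: a simplified model of routing a document to a shard.
final class RoutingSketch {

    // Simplified: shard = hash(routing) mod number_of_shards.
    // String.hashCode() is a stand-in; Elasticsearch uses its own hash function.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 3;
        // Parent "1" is routed by its own ID by default.
        int parentShard = shardFor("1", shards);
        // Child "student1" indexed with ?routing=1 is routed by the parent's ID.
        int childWithRouting = shardFor("1", shards);
        // Without routing the child's own ID is hashed and may land elsewhere.
        int childWithoutRouting = shardFor("student1", shards);
        System.out.println(parentShard + " " + childWithRouting + " " + childWithoutRouting);
    }
}
```

Because the same routing value always hashes to the same shard, every child indexed with its parent's ID as routing ends up on the parent's shard. This is also why the routing value has to be supplied again when fetching the child by ID.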

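Here are some ways to query the data. Lookup by ID works as follows:

// Fetch the parent document by ID
GET /user/_doc/1
// Fetch the child document by ID, passing the same routing value used when indexing it
GET /user/_doc/student1?routing=1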

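The has_child query returns the parent documents whose children match the inner query. The child relation name goes in type:

// Find teachers that have a student matching the inner query
GET /user/_search
{
  "query": {
    "has_child": {
      "type": "student",
      "query": {
        "match": {
          "username": "zhenghuisheng"
        }
      }
    }
  }
}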

Conversely, the has_parent query returns the child documents whose parent matches the inner query. The parent relation name goes in parent_type:

GET /user/_search
{
  "query": {
    "has_parent": {
      "parent_type": "teacher",
      "query": {
        "match": {
          "username": "Tom"
        }
      }
    }
  }
}
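
Which side each query returns is easy to mix up, so here is a plain-Java model of the two semantics. It is a sketch over an in-memory list, not ES code: the Doc class, the DOCS sample (teacher Tom with ID 1 and student zhenghuisheng with ID student1 pointing at parent 1), and the method names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models which documents has_child and has_parent return.
final class ParentChildQuerySketch {

    // Minimal document: id, relation name, parent id (null for parents), username.
    static final class Doc {
        final String id;
        final String relation;
        final String parentId;
        final String username;

        Doc(String id, String relation, String parentId, String username) {
            this.id = id;
            this.relation = relation;
            this.parentId = parentId;
            this.username = username;
        }
    }

    static final List<Doc> DOCS = Arrays.asList(
            new Doc("1", "teacher", null, "Tom"),
            new Doc("student1", "student", "1", "zhenghuisheng"));

    // has_child: the inner query runs on children, the hits are their parents.
    static List<String> hasChild(List<Doc> docs, String childType, String username) {
        List<String> parentIds = new ArrayList<>();
        for (Doc child : docs) {
            if (childType.equals(child.relation) && username.equals(child.username)
                    && !parentIds.contains(child.parentId)) {
                parentIds.add(child.parentId);
            }
        }
        return parentIds;
    }

    // has_parent: the inner query runs on parents, the hits are their children.
    static List<String> hasParent(List<Doc> docs, String parentType, String username) {
        List<String> childIds = new ArrayList<>();
        for (Doc parent : docs) {
            if (!parentType.equals(parent.relation) || !username.equals(parent.username)) {
                continue;
            }
            for (Doc child : docs) {
                if (parent.id.equals(child.parentId)) {
                    childIds.add(child.id);
                }
            }
        }
        return childIds;
    }

    public static void main(String[] args) {
        System.out.println(hasChild(DOCS, "student", "zhenghuisheng")); // [1]
        System.out.println(hasParent(DOCS, "teacher", "Tom"));          // [student1]
    }
}
```

In both cases the condition is tested on one side of the relationship and the documents on the other side are returned.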

3.2 Parent-child type in Java

Again, start by creating the index together with its shard and replica settings:

// The join field is named "relation" so that it matches the field
// written by user.setRelation(...) in the insert code below
XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject()
            .startObject("properties")
                .startObject("relation")
                    .field("type", "join")
                    .startObject("relations")
                        .field("teacher", "student")    // teacher is the parent, student the child
                    .endObject()
                .endObject()
                .startObject("username")
                    .field("type", "keyword")
                .endObject()
                .startObject("sex")
                    .field("type", "text")
                .endObject()
            .endObject()
        .endObject();
CreateIndexRequest request = new CreateIndexRequest("user")
        .settings(Settings.builder()
                .put("number_of_shards", 3)
                .put("number_of_replicas", 1)
                .build())
        .mapping(mapping);
CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
System.out.println("Result: " + response);

Then insert the data, starting with the parent document. Set the index name, and preferably set the document ID explicitly: child documents must later pass the parent's ID as their routing value, so it is best to choose it yourself.

// Specify the index and the document ID
IndexRequest request = new IndexRequest("user");
request.id("teacher1");
User user = new User();
user.setUsername("Tom");
user.setSex("男");
// Mark the document as a parent (teacher)
Relation relation = new Relation();
relation.setName("teacher");
user.setRelation(relation);
request.source(JSON.toJSONString(user), XContentType.JSON);
IndexResponse response = client.index(request, ElasticSearchConfig.COMMON_OPTIONS);
System.out.println(response);

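Then insert the child document. It must specify routing so that it lands on the same shard as its parent, which the join query relies on. In addition, parent must be set to the ID of the parent document (teacher1 here), not to the relation name:

// Specify the index and route by the parent's document ID
IndexRequest request = new IndexRequest("user").routing("teacher1");
User user = new User();
user.setUsername("zhenghuisheng");
user.setSex("男");
// Mark the document as a child (student) and point it at its parent
Relation relation = new Relation();
relation.setName("student");
relation.setParent("teacher1");
user.setRelation(relation);
request.source(JSON.toJSONString(user), XContentType.JSON);
IndexResponse response = client.index(request, ElasticSearchConfig.COMMON_OPTIONS);
System.out.println(response);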

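The main differences between nested and parent-child documents are as follows. Nested documents use the nested type and are stored together with their parent as part of one document, so reads are relatively fast, but updates are slow because changing any nested object means reindexing the whole document. Parent-child documents use the join type; parent and child are independent documents, but the relationship needs extra maintenance and reads are relatively slow.

Nested documents suit read-heavy data that is rarely updated; parent-child documents suit cases where child documents are updated frequently.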