Comparing the data differences between two Elasticsearch indexes

I. Introduction

      After changing an index mapping, we usually create a new index and reindex the data into it so that the online service is not affected. It is hard, however, to verify whether the data in the new index is correct and how it differs from the data in the old index.

Fortunately, an article by an experienced community author covers this topic, and two of its approaches are worked through in detail below to compare the data of the old and new indexes. The article is: Diagram | Elasticsearch: Four schemes for obtaining the difference between the data of two indexes

II. Using Kibana

1. Comparing the data of two indexes in Kibana

      Sometimes we need to compare one field across two indexes, for example the id values, in order to find missing data. The query below does this. (It works in a local environment or in any other environment.)

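(1) Open Kibana Dev Tools.
(2) Enter the query below.
(3) index_old and index_new are the names of the indexes to compare.
(4) id is the field to compare; ideally it is a field that is unique at the business level.
(5) Run the query and inspect the result.
How it works: the query aggregates on id. If an id exists in both indexes, the cardinality of _index for that id is 2. Keeping only the buckets where that value is less than 2 leaves exactly the ids that are missing from one of the indexes.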


POST index_new,index_old/_search
{
  "size": 0,
  "aggs": {
    "group_by_uid": {
      "terms": {
        "field": "id",
        "size": 1000000
      },
      "aggs": {
        "count_indices": {
          "cardinality": {
            "field": "_index"
          }
        },
        "values_bucket_filter_by_index_count": {
          "bucket_selector": {
            "buckets_path": {
              "count": "count_indices"
            },
            "script": "params.count < 2"
          }
        }
      }
    }
  }
}

Result:

Note: "key" : 6418 means that the record with id 6418 is part of the difference. You have to check for yourself why the difference occurs.

{
  "took" : 1851,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 21969,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_uid" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 6418,
          "doc_count" : 1,
          "count_indices" : {
            "value" : 1
          }
        },
        {
          "key" : 6419,
          "doc_count" : 1,
          "count_indices" : {
            "value" : 1
          }
        }
      ]
    }
  }
}
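
To work with a long result list, the bucket keys can be pulled out of the response programmatically. The following is a minimal Go sketch, not part of the original article. It assumes the response has the shape shown above, that the aggregation is named group_by_uid as in the query, and that id is numeric; the names aggResponse and missingIDs are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aggResponse mirrors only the parts of the search response used here.
// The aggregation name "group_by_uid" matches the query above.
type aggResponse struct {
	Aggregations struct {
		GroupByUID struct {
			Buckets []struct {
				Key      json.Number `json:"key"` // assumes a numeric id field
				DocCount int         `json:"doc_count"`
			} `json:"buckets"`
		} `json:"group_by_uid"`
	} `json:"aggregations"`
}

// missingIDs (hypothetical helper) returns the bucket keys, that is,
// the ids that were found in only one of the two indexes.
func missingIDs(body []byte) ([]string, error) {
	var resp aggResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	buckets := resp.Aggregations.GroupByUID.Buckets
	ids := make([]string, 0, len(buckets))
	for _, b := range buckets {
		ids = append(ids, b.Key.String())
	}
	return ids, nil
}

func main() {
	// Shortened copy of the response shown above.
	body := []byte(`{"aggregations":{"group_by_uid":{"buckets":[
		{"key":6418,"doc_count":1,"count_indices":{"value":1}},
		{"key":6419,"doc_count":1,"count_indices":{"value":1}}]}}}`)

	ids, err := missingIDs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```

The buckets only say that an id is present in a single index, not which one, so a follow-up lookup of each id against index_old and index_new is still needed to see where it is missing.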

III. Other ready-made tools

GitHub: esdiff
PS: this tool was written by the author of olivere/elastic, an experienced developer, so it is worth trying.

1. Running it locally

1. Install
go install github.com/olivere/esdiff@latest

2. Run the command
./esdiff -u=true -d=false 'http://localhost:9200/index_old/type' 'http://localhost:9200/index_new/type'

3. Output
Unchanged       1
Updated 3       {*diff.Document}.Source["message"]:
        -: "Playing the piano is fun as well"
        +: "Playing the guitar is fun as well"
 
Created 4       {*diff.Document}:
        -: (*diff.Document)(nil)
        +: &diff.Document{ID: "4", Source: map[string]interface {}{"message": "Climbed that mountain", "user": "sandrae"}}

2. Common parameters

      When fields have been added or removed, it is easier to use exclude or include, so that only the data outside (or inside) the specified fields is checked for accuracy.

esdiff [flags] <source-url> <destination-url>

-dsort string    [sort the destination index by a field] "id" or "-id"
-ssort string    [sort the source index by a field] "id" or "-id"
-exclude string  [exclude certain fields from the source] "hash_value,sub.*"
-include string  [include only certain fields from the source] "obj.*"
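
To make the include and exclude values more concrete: Elasticsearch itself supports source filtering through the "_source" object of a search body, with "includes" and "excludes" lists. The Go sketch below shows how comma-separated values like the ones above could be turned into such a filter. It is an illustration only; whether esdiff builds its requests this way is an assumption, and splitPatterns and sourceFilter are made-up names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// splitPatterns turns "a,b.*" into ["a", "b.*"], dropping empty entries.
func splitPatterns(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// sourceFilter (hypothetical helper) builds the "_source" part of a
// search body from comma-separated include and exclude patterns.
func sourceFilter(include, exclude string) map[string][]string {
	f := map[string][]string{}
	if inc := splitPatterns(include); len(inc) > 0 {
		f["includes"] = inc
	}
	if exc := splitPatterns(exclude); len(exc) > 0 {
		f["excludes"] = exc
	}
	return f
}

func main() {
	// Same exclude value as in the parameter list above.
	body := map[string]interface{}{
		"_source": sourceFilter("", "hash_value,sub.*"),
	}
	out, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Fields removed this way never reach the comparison, which is why a field that exists only in the new mapping no longer shows up as a difference.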

3. Custom Document Id

      In my case the document IDs are derived from the index name, for example:

// The id is 1 in both cases, but the document IDs differ, so the documents show up as differences
index_old_1
index_new_1

Our requirement is mainly to compare the fields inside source, so a new -replace-with parameter was added to specify which field to use as the unique ID.
For example:

// Use id in place of the document ID, so that the source fields are compared and the real differences are found

go run main.go -ssort=unit_id -dsort=unit_id -replace-with=id 'http://localhost:9200/index_old/type' 'http://localhost:9200/index_new/type'
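
The idea behind -replace-with can be sketched in a few lines of Go. This is not the actual patch, only an illustration of the principle under simple assumptions: Document is a hypothetical stand-in for the tool's document type, and replaceID is a made-up helper.

```go
package main

import "fmt"

// Document is a hypothetical stand-in for the tool's document type.
type Document struct {
	ID     string
	Source map[string]interface{}
}

// replaceID returns a shallow copy of doc whose ID is taken from
// Source[field]. If the field is absent, the original ID is kept.
func replaceID(doc Document, field string) Document {
	if v, ok := doc.Source[field]; ok {
		doc.ID = fmt.Sprint(v)
	}
	return doc
}

func main() {
	oldDoc := Document{
		ID:     "index_old_1",
		Source: map[string]interface{}{"id": 1, "message": "hello"},
	}
	newDoc := Document{
		ID:     "index_new_1",
		Source: map[string]interface{}{"id": 1, "message": "hello"},
	}

	a, b := replaceID(oldDoc, "id"), replaceID(newDoc, "id")
	// Both documents now share the ID "1", so they can be paired up
	// and their source fields compared.
	fmt.Println(a.ID, b.ID, a.ID == b.ID)
}
```

Once both sides carry the same ID, the two documents are matched with each other instead of being reported as one deleted and one created document.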

4. How the tool computes the differences

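1. It reads the ES data in batches according to the parameters, using scroll (cursor) queries with 100 documents per batch by default.
2. It compares the data with cmp.Equal(srcDoc.Source, dstDoc.Source) from the go-cmp package.
3. It prints the created, updated, deleted and other differences according to the parameters.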
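
The three steps above can be approximated with a small Go sketch. It is a simplified illustration, not the tool's code: it assumes both sides are already loaded and sorted by ascending ID, it uses reflect.DeepEqual from the standard library as a stand-in for cmp.Equal so that the example has no external dependency, and Document and diffSorted are made-up names.

```go
package main

import (
	"fmt"
	"reflect"
)

// Document is a hypothetical stand-in for the tool's document type.
type Document struct {
	ID     string
	Source map[string]interface{}
}

// diffSorted walks two slices sorted by ascending ID and groups the IDs
// into "created", "updated", "deleted" and "unchanged".
func diffSorted(src, dst []Document) map[string][]string {
	out := map[string][]string{}
	i, j := 0, 0
	for i < len(src) || j < len(dst) {
		switch {
		case j >= len(dst) || (i < len(src) && src[i].ID < dst[j].ID):
			// Present only in the source index.
			out["deleted"] = append(out["deleted"], src[i].ID)
			i++
		case i >= len(src) || dst[j].ID < src[i].ID:
			// Present only in the destination index.
			out["created"] = append(out["created"], dst[j].ID)
			j++
		default:
			// Same ID on both sides: compare the source fields.
			// reflect.DeepEqual stands in for cmp.Equal here.
			kind := "updated"
			if reflect.DeepEqual(src[i].Source, dst[j].Source) {
				kind = "unchanged"
			}
			out[kind] = append(out[kind], src[i].ID)
			i++
			j++
		}
	}
	return out
}

func main() {
	src := []Document{
		{ID: "1", Source: map[string]interface{}{"message": "Hello"}},
		{ID: "2", Source: map[string]interface{}{"message": "Only in the old index"}},
		{ID: "3", Source: map[string]interface{}{"message": "Playing the piano is fun as well"}},
	}
	dst := []Document{
		{ID: "1", Source: map[string]interface{}{"message": "Hello"}},
		{ID: "3", Source: map[string]interface{}{"message": "Playing the guitar is fun as well"}},
		{ID: "4", Source: map[string]interface{}{"message": "Climbed that mountain", "user": "sandrae"}},
	}
	fmt.Println(diffSorted(src, dst))
}
```

Walking both sorted lists in step keeps memory use low, because only the current batch from each side needs to be held at a time; this is one reason a sort field has to be given for both the source and the destination.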

end


Origin blog.csdn.net/LJFPHP/article/details/125882840