Elasticsearch Must Know - Advanced

JD Logistics: Kang Rui, Yao Zaiyi, Li Zhen, Liu Bin, Wang Beiyong

Note: everything below is based on Elasticsearch version 8.1

1. Cross-cluster search - CCS

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html

Background and significance of cross-cluster search

Definition of cross-cluster search

Setting up a cross-cluster search environment

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html

Step 1: Set up two local single-node clusters; for local practice, the security configuration can be disabled

Step 2: Run the following command on each cluster to register the remote clusters

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [ "172.21.0.14:9301" ]
        },
        "cluster_two": {
          "seeds": [ "172.21.0.14:9302" ]
        }
      }
    }
  }
}

Step 3: Verify that the clusters can communicate with each other

Option 1: Check visually in Kibana: Stack Management -> Remote Clusters -> the status should be Connected, marked with a green checkmark.

Option 2: GET _remote/info
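For reference, `GET _remote/info` returns one entry per configured remote. A sketch of the response shape (actual values will differ):

GET _remote/info

# Expected shape (abridged):
# {
#   "cluster_one": {
#     "connected": true,               <- must be true
#     "seeds": ["172.21.0.14:9301"],
#     "num_nodes_connected": 1,
#     "skip_unavailable": false
#   },
#   "cluster_two": { ... }
# }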

Cross-cluster search exercise

# Step 1: add the following data in cluster 1
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster01..."}

# Step 2: add the following data in cluster 2
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster02..."}

# Step 3: run the cross-cluster search
# Syntax: POST <cluster_name_1>:<index>,<cluster_name_2>:<index>/_search
POST cluster_one:test01,cluster_two:test01/_search

# Response - note that each hit's _index is prefixed with its cluster alias:
{
  "took" : 7,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cluster_two:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster02..."
        }
      },
      {
        "_index" : "cluster_one:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster01..."
        }
      }
    ]
  }
}
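CCS also accepts wildcards in the cluster alias, and local and remote indices can be mixed in one request. A sketch, assuming the two clusters registered above:

# Search the local test01 index plus test01 on every remote whose alias matches cluster_*
POST test01,cluster_*:test01/_search
{
  "query": { "match_all": {} }
}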



2. Cross-cluster replication - CCR (requires a paid license)

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

How to ensure high availability of a cluster

  1. The replica mechanism
  2. Snapshot and restore
  3. Cross-cluster replication (similar to MySQL master-slave replication)

Overview of cross-cluster replication

Configuring cross-cluster replication

  1. Prepare two clusters that can communicate over the network
  2. Activate the trial license (free for 30 days)
  • Activation location: Stack Management -> License management.
  3. Decide which cluster is the leader and which is the follower
  4. Register the leader cluster on the follower cluster
  5. Configure the leader cluster's index synchronization rules on the follower cluster (via the Kibana UI): Stack Management -> Cross Cluster Replication -> Create a follower index.
  6. Enable the configuration from step 5
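The Kibana steps above can also be performed with the CCR API. A minimal sketch, run on the follower cluster; the names `cluster_one`, `leader_index`, and `follower_index` are placeholders for your own setup:

# Create a follower index that tracks leader_index on the remote cluster_one
PUT /follower_index/_ccr/follow
{
  "remote_cluster": "cluster_one",
  "leader_index": "leader_index"
}

# Check replication progress
GET /follower_index/_ccr/stats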


3. Index Templates

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

8.x component templates

1. Create a component template - index settings

# Component template - index settings
PUT _component_template/template_sttting_part
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  }
}


2. Create a component template - index mappings

# Component template - index mappings
PUT _component_template/template_mapping_part
{
  "template": {
    "mappings": {
      "properties": {
        "host_name":{
          "type": "keyword"
        },
        "created_at":{
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}


3. Create the index template - associate the component templates with an index pattern

# Note: with composed_of, if multiple component templates define the same setting,
# later entries override earlier ones - the order matters
# Associate the component templates with indices via an index template:
# every index whose name matches tem_* is created with the rules below
PUT _index_template/template_1
{
  "index_patterns": [
    "tem_*"
  ],
  "composed_of": [
    "template_sttting_part",
    "template_mapping_part"
  ]
}


4. Test

# Create a test index
PUT tem_001
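Before creating a real index, the simulate API can preview which settings and mappings a given index name would receive from the matching templates. A sketch; `tem_002` is just an example name matching the `tem_*` pattern:

# Preview the settings and mappings an index named tem_002 would get
POST _index_template/_simulate_index/tem_002

# Verify the settings actually applied to the test index
GET tem_001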


Basic operations of index templates

Practical exercise

Requirement 1: By default, when no mapping is explicitly specified, numeric values are dynamically mapped to the long type. In practice the business values are small, so long wastes storage; the default numeric type should be integer.

Index template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

mapping-dynamic template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

# Combine mapping dynamic templates with index templates
# 1. Create a component template - mapping template
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        }
      ]
    }
  }
}

# 2. Create the index template that references the component template
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}

# 3. Insert test data
POST tem1_001/_doc/1
{
  "age":18
}

# 4. Inspect the mapping to verify
GET tem1_001/_mapping



Requirement 2: Fields whose names begin with date_ should be uniformly mapped to the date type.

Index template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html

mapping-dynamic template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html

# Combine mapping dynamic templates with index templates
# 1. Create a component template - mapping template
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "date_type_process": {
            "match": "date_*",
            "mapping": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}

# 2. Create the index template that references the component template
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}


# 3. Insert test data
POST tem1_001/_doc/2
{
  "age":19,
  "date_aoe":"2022-01-01 18:18:00"
}

# 4. Inspect the mapping to verify
GET tem1_001/_mapping


4. ILM - Index Lifecycle Management

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-lifecycle-management.html

What is an index life cycle

The life of an index: birth -> aging -> sickness -> death

Have you ever considered what happens if an index, once created, is never managed again?

What is Index Lifecycle Management

What happens if an index grows too large?

Recovering a large index is much slower than recovering a small one. Once indices grow large, searches become very slow, and writes and updates are affected to varying degrees. When a huge index develops health problems, it can take the entire cluster's core business down with it.

Best Practices

The maximum number of documents in a single shard is 2^31 - 1, roughly 2.1 billion (a Lucene hard limit). Official recommendation: keep each shard between 30 GB and 50 GB. If index data grows without bound, it will inevitably exceed these limits.
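To check actual shard sizes against the 30-50 GB guideline, the _cat API can list per-shard on-disk sizes. A sketch:

# List shards with their doc count and store size, largest first
GET _cat/shards?v&h=index,shard,prirep,store,docs&s=store:desc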

Users mostly care about recent data

In some business scenarios, the business mainly queries recent data - for example, the last 3 or the last 7 days. A single large index that accumulates all historical data is a poor fit for such queries.

The Historical Evolution of Index Lifecycle Management

ILM prelude - the rollover index

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-rollover.html

# 0. Precondition for local testing: the ILM poll interval for rollover checks (default: 10 minutes)
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}

# 1. Create an index and assign it a write alias
PUT test_index-0001
{
  "aliases": {
    "my-test-index-alias": {
      "is_write_index": true
    }
  }
}

# 2. Bulk-load data
PUT my-test-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}

# 3. Configure the rollover conditions
POST my-test-index-alias/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}

# 4. With the conditions met, create the rollover index
PUT my-test-index-alias/_bulk
{"index":{"_id":7}}
{"title":"testing 07"}

# 5. Query to verify the rollover succeeded
POST my-test-index-alias/_search
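A useful variant while practicing: the `dry_run` parameter evaluates the rollover conditions without actually rolling over. A sketch using the same conditions as above:

# Check whether the conditions are currently met, without rolling over
POST my-test-index-alias/_rollover?dry_run
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}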


ILM prelude - the shrink index

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-shrink.html

Core steps:

1. Relocate all shards to a single node

2. Block writes to the index

3. Execute the shrink

# 1. Prepare test data
DELETE kibana_sample_data_logs_ext
PUT kibana_sample_data_logs_ext
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
POST _reindex
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "kibana_sample_data_logs_ext"
  }
}


# 2. Required settings before shrinking
# index.number_of_replicas: set replicas to 0
# index.routing.allocation.include._tier_preference: route all shards to hot-tier nodes
# index.blocks.write: block further writes to the index
PUT kibana_sample_data_logs_ext/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.include._tier_preference": "data_hot",
    "index.blocks.write": true
  }
}

# 3. Execute the shrink
POST kibana_sample_data_logs_ext/_shrink/kibana_sample_data_logs_ext_shrink
{
  "settings":{
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.codec":"best_compression"
  },
  "aliases":{
    "kibana_sample_data_logs_alias":{}
  }
}
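Note that the target's primary shard count must be a factor of the source's (here 5 -> 1 is valid; 5 -> 2 would be rejected). To compare the source and the shrunken index afterwards, a sketch:

# Compare shard count, doc count, and size before and after the shrink
GET _cat/indices/kibana_sample_data_logs_ext,kibana_sample_data_logs_ext_shrink?v&h=index,pri,docs.count,store.size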


ILM in practice

Building the global picture - the four phases

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/overview-index-lifecycle-management.html

Lifecycle management phase (Policy):
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-index-lifecycle.html

Hot phase (birth)

  • Set priority
  • Unfollow
  • Rollover
  • Read-only
  • Shrink
  • Force merge
  • Searchable snapshot

Warm phase (aging)

  • Set priority
  • Unfollow
  • Read-only
  • Allocate
  • Migrate
  • Shrink
  • Force merge

Cold phase (sickness)

  • Searchable snapshot

Delete phase (death)

  • Delete

Hands-on exercise

1. Create a policy

  • Hot phase: rollover when max_age: 3d, max_docs: 5, or max_size: 50gb; priority: 100

  • Warm phase: min_age: 15s; forcemerge segments; migrate from hot nodes to warm nodes; replicas set to 0; priority: 50

  • Cold phase: min_age: 30s; migrate from warm to cold

  • Delete phase: min_age: 45s; delete the index

PUT _ilm/policy/kr_20221114_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_size": "50gb",
            "max_primary_shard_size": "50gb",
            "max_age": "3d",
            "max_docs": 5
          }
        }
      },
      "warm": {
        "min_age": "15s",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30s",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}


2. Create index template

PUT _index_template/kr_20221114_template
{
  "index_patterns": ["kr_index-**"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "kr_20221114_policy",
          "rollover_alias": "kr-index-alias"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "number_of_shards": "3",
        "number_of_replicas": "1"
      }
    },
    "aliases": {},
    "mappings": {}
  }
}



3. For testing, shorten the ILM rollover poll interval

PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}


4. Run the test

# Create the initial index and designate the write alias
PUT kr_index-0001
{
  "aliases": {
    "kr-index-alias": {
      "is_write_index": true
    }
  }
}
# Add data through the alias
PUT kr-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# Add more data through the alias to trigger the rollover
PUT kr-index-alias/_bulk
{"index":{"_id":6}}
{"title":"testing 06"}
# Inspect the resulting indices
GET kr_index-0001

GET _cat/indices?v
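The ILM explain API shows which phase, action, and step each index is currently in - handy for watching the 15s/30s/45s transitions configured above. A sketch:

# Show the current ILM phase/action/step for every matching index
GET kr_index-*/_ilm/explain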


Process summary

Step 1: Configure the ILM policy

  • Horizontal: phases (Hot, Warm, Cold, Delete) - birth, aging, sickness, death

  • Vertical: actions (rollover, forcemerge, readonly, delete)

Step 2: Create an index template that binds the policy and specifies the alias

Step 3: Create the initial index

Step 4: The index rolls over according to the policy configured in step 1


5. Data Stream

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-actions.html

Feature analysis

A data stream stores time-series data across multiple indices while exposing a single named endpoint (the data stream name)

  • Write and search requests are sent to the data stream

  • The data stream routes these requests to its backing indices

Backing indices

Each data stream is made up of multiple hidden backing indices (named like .ds-<data-stream>-<yyyy.MM.dd>-<generation>)

  • Created automatically

  • Require a matching index template

New backing indices are generated automatically by the rollover mechanism

  • The newest backing index becomes the data stream's write index

Application scenarios

  1. Logs, events, metrics, and other continuously generated (rarely updated) business data
  2. Two core characteristics:
  • Time-series data
  • Data that is rarely or never updated

Create Data Stream core steps

Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html

Set up a data stream

To set up a data stream, follow these steps:

  1. Create an index lifecycle policy
  2. Create component templates
  3. Create an index template
  4. Create the data stream
  5. Secure the data stream

Hands-on exercise

1. Create a data stream named my-data-stream

2. The index template is named my-index-template

3. It applies to all indices matching the pattern ["my-data-stream*"]

4. Newly written data lands on the data_hot nodes

5. Roll over to the data_warm nodes after 3 minutes

6. Move to the data_cold nodes after another 5 minutes

# Step 1: create the ILM policy
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "3m",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }, 
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "6m",
        "actions": {
          "freeze":{}
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Step 2: create a component template - mappings
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        },
        "message": {
          "type": "wildcard"
        }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# Step 3: create a component template - settings
PUT _component_template/my-settings
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-lifecycle-policy",
      "index.routing.allocation.include._tier_preference":"data_hot"
    }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# Step 4: create the index template
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

# Step 5: create the data stream and write test data
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }

POST my-data-stream/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}


# Step 6: inspect the data stream's backing indices
GET /_resolve/index/my-data-stream*
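The data stream APIs can also inspect and manage the stream directly. A sketch:

# List the stream's backing indices, current generation, and ILM policy
GET _data_stream/my-data-stream

# Force an immediate rollover (creates the next backing index)
POST my-data-stream/_rollover

# Delete the stream together with all of its backing indices
DELETE _data_stream/my-data-stream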


Origin my.oschina.net/u/4090830/blog/6304775