JD Logistics: Kang Rui, Yao Zaiyi, Li Zhen, Liu Bin, Wang Beiyong
Note: everything below is based on Elasticsearch version 8.1.
1. Cross-cluster search (CCS)
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html
Background and significance of cross-cluster search
Definition of cross-cluster search
Setting up a cross-cluster search environment
Step 1: Set up two local single-node clusters; for local practice, the security configuration can be disabled.
Step 2: Run the following command on each cluster:
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": { "seeds": [ "172.21.0.14:9301" ] },
        "cluster_two": { "seeds": [ "172.21.0.14:9302" ] }
      }
    }
  }
}
Step 3: Verify that the clusters can reach each other.
Option 1: Visual check in Kibana: Stack Management -> Remote Clusters -> the status should be "Connected" with a green check mark.
Option 2: GET _remote/info
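When both remotes are reachable, GET _remote/info reports "connected": true for each registered cluster. A sketch of a typical (abridged) response — the exact set of fields varies by version:

```
{
  "cluster_one": {
    "connected": true,
    "mode": "sniff",
    "seeds": ["172.21.0.14:9301"],
    "num_nodes_connected": 1,
    "skip_unavailable": false
  },
  "cluster_two": {
    "connected": true,
    "mode": "sniff",
    "seeds": ["172.21.0.14:9302"],
    "num_nodes_connected": 1,
    "skip_unavailable": false
  }
}
```

If "connected" is false for a remote, check the seed address and port before attempting any cross-cluster query.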
Cross-cluster query exercise
# Step 1: add data in cluster 1
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster01..."}
# Step 2: add data in cluster 2
PUT test01/_bulk
{"index":{"_id":1}}
{"title":"this is from cluster02..."}
# Step 3: run the cross-cluster search. Syntax: POST <cluster_name_1>:<index>,<cluster_name_2>:<index>/_search
POST cluster_one:test01,cluster_two:test01/_search
{
  "took" : 7,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cluster_two:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster02..."
        }
      },
      {
        "_index" : "cluster_one:test01",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "this is from cluster01..."
        }
      }
    ]
  }
}
2. Cross-cluster replication (CCR) - this feature requires a paid license
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html
How to ensure high availability of a cluster:
- Replica mechanism
- Snapshot and restore
- Cross-cluster replication (similar to MySQL master-slave replication)
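Of these, snapshot and restore is the most common baseline. A minimal sketch, assuming a filesystem repository at the hypothetical path /mnt/es_backups, which must be registered in path.repo on every node:

```
# Register a filesystem snapshot repository
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups"
  }
}
# Take a snapshot of the whole cluster and wait for it to finish
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
```

Snapshots are point-in-time copies; unlike cross-cluster replication they do not keep a second cluster continuously in sync.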
Overview of cross-cluster replication
Configuring cross-cluster replication
1. Prepare two clusters that can reach each other over the network
2. Activate the license; a 30-day trial can be enabled under Stack Management -> License management
3. Decide which cluster is the leader and which is the follower
4. Register the leader cluster as a remote cluster on the follower cluster
5. Configure the index replication rules for the leader cluster on the follower cluster (via the Kibana UI):
   Stack Management -> Cross Cluster Replication -> Create a follower index
6. Enable the configuration from step 5
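Steps 4-6 can also be done through the CCR API instead of the Kibana UI. A sketch, assuming the leader cluster was registered under the name cluster_one and has an index named leader_index (both names hypothetical):

```
# Run on the follower cluster: create a follower index that replicates leader_index
PUT /follower_index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "cluster_one",
  "leader_index": "leader_index"
}
```

The follower index is read-only while it is following; writes must go to the leader.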
3. Index templates
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
Component templates in 8.x
1. Create a component template - index settings
# Component template - index settings
PUT _component_template/template_setting_part
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  }
}
2. Create a component template - index mappings
# Component template - index mappings
PUT _component_template/template_mapping_part
{
  "template": {
    "mappings": {
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}
3. Create an index template - associate the component templates with an index pattern
# Note: if the same setting appears in several of the templates listed in composed_of,
# later entries override earlier ones - the order matters.
# Build the index template from the component templates: every index whose name
# matches the expression tem_* will be created with the rules below.
PUT _index_template/template_1
{
  "index_patterns": [
    "tem_*"
  ],
  "composed_of": [
    "template_setting_part",
    "template_mapping_part"
  ]
}
4. Test
# Create a test index
PUT tem_001
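To verify that the template took effect, fetch the newly created index and check that it was created with 3 primary shards, 0 replicas, and the mappings from the component templates:

```
# Verify the settings and mappings applied by template_1
GET tem_001
```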
Basic operations of index templates
Practical exercise
Requirement 1: By default, if no mapping is specified explicitly, numeric values are dynamically mapped to the long type. In this business the actual values are small, so long wastes storage; the default should instead be integer.
Index template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
mapping-dynamic template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html
# Combine a dynamic mapping template with an index template
# 1. Create a component template - mapping template
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        }
      ]
    }
  }
}
# 2. Create the index template that references the component template
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}
# 3. Create test data
POST tem1_001/_doc/1
{
  "age": 18
}
# 4. Check the mapping to verify
GET tem1_001/_mapping
Requirement 2: Fields whose names begin with date_ should be uniformly mapped to the date type.
Index template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-templates.html
mapping-dynamic template, official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic-templates.html
# Combine a dynamic mapping template with an index template
# 1. Create a component template - mapping template (replaces the previous definition)
PUT _component_template/template_mapping_part_01
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "date_type_process": {
            "match": "date_*",
            "mapping": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}
# 2. Create the index template that references the component template
PUT _index_template/template_2
{
  "index_patterns": ["tem1_*"],
  "composed_of": ["template_mapping_part_01"]
}
# 3. Create test data
POST tem1_001/_doc/2
{
  "age": 19,
  "date_aoe": "2022-01-01 18:18:00"
}
# 4. Check the mapping to verify
GET tem1_001/_mapping
4. ILM: index lifecycle management
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-lifecycle-management.html
What is an index life cycle?
An index has a life cycle, by analogy with birth -> aging -> sickness -> death.
Have you ever considered what happens if an index, once created, is never managed again?
What is index lifecycle management?
What happens when an index grows too large?
Recovering a large index is much slower than recovering a small one. Once indices grow large, searches become noticeably slow, and writes and updates are affected to varying degrees. Beyond a certain size, a health problem in the index can make the core business of the entire cluster unavailable.
Best practices
A single shard can hold at most 2^32 - 1 documents, roughly 2 billion. The official recommendation is to keep each shard between 30 GB and 50 GB; if index data grows without bound, it will inevitably exceed this. For example, 2 TB (about 2048 GB) of data at a 40 GB per-shard target works out to 2048 / 40 ≈ 51 primary shards, so the data should be split across rolling indices rather than kept in one.
Users rarely care about the full history
In some business scenarios, users mostly query recent data, such as the last 3 or 7 days. A single large index that lumps all historical data together is a poor fit for such queries.
The historical evolution of index lifecycle management
ILM prelude - rollover
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-rollover.html
# 0. Prerequisite for local testing: the ILM rollover poll interval defaults to 10 minutes; shorten it to 1s
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}
# 1. Create an index and give it an alias
PUT test_index-0001
{
  "aliases": {
    "my-test-index-alias": {
      "is_write_index": true
    }
  }
}
# 2. Bulk-import data
PUT my-test-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# 3. Configure the rollover conditions (rollover happens when any one condition is met)
POST my-test-index-alias/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 5,
    "max_primary_shard_size": "50gb"
  }
}
# 4. With the conditions met, further writes go to the newly rolled-over index
PUT my-test-index-alias/_bulk
{"index":{"_id":7}}
{"title":"testing 07"}
# 5. Query to verify that the rollover succeeded
POST my-test-index-alias/_search
ILM prelude - shrink
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-shrink.html
Core steps:
1. Relocate a copy of every shard to a single node
2. Block writes to the index
3. Perform the shrink
# 1. Prepare test data
DELETE kibana_sample_data_logs_ext
PUT kibana_sample_data_logs_ext
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}
POST _reindex
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "kibana_sample_data_logs_ext"
  }
}
# 2. Required settings before shrinking
# number_of_replicas: set replicas to 0
# index.routing.allocation.include._tier_preference: route all data shards to hot nodes
# index.blocks.write: stop accepting writes to the index
PUT kibana_sample_data_logs_ext/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.include._tier_preference": "data_hot",
    "index.blocks.write": true
  }
}
# 3. Perform the shrink
POST kibana_sample_data_logs_ext/_shrink/kibana_sample_data_logs_ext_shrink
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": {
    "kibana_sample_data_logs_alias": {}
  }
}
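To confirm the shrink worked, check the settings of the target index (expect 1 primary shard and the best_compression codec) and see where its shard landed:

```
# Verify the shrunken index
GET kibana_sample_data_logs_ext_shrink/_settings
GET _cat/shards/kibana_sample_data_logs_ext_shrink?v
```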
ILM in practice
Building the big picture - four phases
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/overview-index-lifecycle-management.html
Lifecycle management phases (policy):
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-index-lifecycle.html
Hot phase (birth)
Set priority
Unfollow
Rollover
Read-only
Shrink
Force merge
Searchable snapshot
Warm phase (aging)
Set priority
Unfollow
Read-only
Allocate
Migrate
Shrink
Force merge
Cold phase (sickness)
Searchable snapshot
Delete phase (death)
Delete
Exercise
1. Create a policy
- Hot phase settings - rollover: max_age: 3d, max_docs: 5, max_size: 50gb; priority: 100
- Warm phase settings: min_age: 15s; forcemerge to merge segments; migrate from hot nodes to warm nodes; set replicas to 0; priority: 50
- Cold phase settings: min_age: 30s; migrate from warm to cold
- Delete phase settings: min_age: 45s; perform the delete
PUT _ilm/policy/kr_20221114_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_size": "50gb",
            "max_primary_shard_size": "50gb",
            "max_age": "3d",
            "max_docs": 5
          }
        }
      },
      "warm": {
        "min_age": "15s",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30s",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}
2. Create the index template
PUT _index_template/kr_20221114_template
{
  "index_patterns": ["kr_index-*"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "kr_20221114_policy",
          "rollover_alias": "kr-index-alias"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "number_of_shards": "3",
        "number_of_replicas": "1"
      }
    },
    "aliases": {},
    "mappings": {}
  }
}
3. For testing, shorten the ILM rollover poll interval
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s"
  }
}
4. Run the test
# Create the initial index and designate the write alias
PUT kr_index-0001
{
  "aliases": {
    "kr-index-alias": {
      "is_write_index": true
    }
  }
}
# Add data through the alias
PUT kr-index-alias/_bulk
{"index":{"_id":1}}
{"title":"testing 01"}
{"index":{"_id":2}}
{"title":"testing 02"}
{"index":{"_id":3}}
{"title":"testing 03"}
{"index":{"_id":4}}
{"title":"testing 04"}
{"index":{"_id":5}}
{"title":"testing 05"}
# Add more data through the alias to trigger the rollover
PUT kr-index-alias/_bulk
{"index":{"_id":6}}
{"title":"testing 06"}
# Check the indices
GET kr_index-0001
GET _cat/indices?v
Process summary
Step 1: Configure the ILM policy
- Horizontal: phases (hot, warm, cold, delete) - birth, aging, sickness, death
- Vertical: actions within each phase (rollover, forcemerge, readonly, delete)
Step 2: Create an index template that binds the policy and specifies the rollover alias
Step 3: Create the initial index
Step 4: The index rolls over according to the policy configured in step 1
5. Data streams
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/data-streams.html
Feature analysis
A data stream lets us store append-only time series data across multiple indices while exposing a single named endpoint (the data stream name) to callers:
- Write and search requests are sent to the data stream
- The data stream routes those requests to its backing indices
Backing indices
Each data stream is backed by a series of hidden indices:
- Created automatically
- Require a matching index template
- New backing indices are generated automatically via the rollover mechanism
- Each newly rolled-over backing index becomes the data stream's write index
Application scenarios
- Logs, events, metrics, and other continuously generated (rarely updated) business data
- Two core characteristics:
  - Time series data
  - Data is rarely or never updated
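Once a data stream exists, its hidden backing indices (named in the form .ds-<stream-name>-<yyyy.MM.dd>-<generation>) can be inspected directly. For example, for a stream named my-data-stream, as in the exercise later in this section:

```
# List the data stream, its backing indices, generation, and attached ILM policy
GET _data_stream/my-data-stream
```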
Core steps to create a data stream
Official website document address:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/set-up-a-data-stream.html
Set up a data stream
To set up a data stream, follow these steps:
- Create an index lifecycle policy
- Create component templates
- Create an index template
- Create the data stream
- Secure the data stream
Exercise
1. Create a data stream named my-data-stream
2. Use an index template named my-index-template
3. It must apply to every index matching the pattern ["my-data-stream*"]
4. Newly inserted data lands on data_hot nodes
5. Roll over and move to data_warm nodes after 3 minutes
6. Move to data_cold nodes after another 5 minutes
# Step 1: create the ILM policy
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "3m",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "6m",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "45s",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
# Step 2: create a component template - mappings
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        },
        "message": {
          "type": "wildcard"
        }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Step 3: create a component template - settings
PUT _component_template/my-settings
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-lifecycle-policy",
      "index.routing.allocation.include._tier_preference": "data_hot"
    }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Step 4: create the index template
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Step 5: create the data stream and write test data
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
POST my-data-stream/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}
# Step 6: inspect the data stream's backing indices
GET /_resolve/index/my-data-stream*