"Elasticsearch" - "Hot-Warm" Architecture @20210222

Brief introduction

When Elasticsearch is used for large-scale analysis of time-based data, it is recommended to use time-based indexes and to set up three different types of nodes (master, hot, and warm). This is what we call the "Hot-Warm" architecture.

This article introduces the basic concepts of the "Hot-Warm" architecture, some issues to watch for, and how to build a cluster.

Node type

Each node type has its own role; the roles are described below.

# Master Nodes

Role: Dedicated master nodes manage the cluster and its state, which keeps the cluster stable.

Configuration: Run three dedicated master nodes in the cluster for maximum resilience. Dedicated master nodes do not store data and do not handle indexing or search requests, so they do not suffer long GC pauses, and their CPU, RAM, and disk requirements are lower than those of the data nodes.

Set discovery.zen.minimum_master_nodes to 2, a majority of the three master-eligible nodes, to prevent cluster split brain.
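The value 2 follows from the usual majority rule for master-eligible nodes, (N / 2) + 1 with integer division. A minimal Python sketch of that calculation; the function name is only illustrative:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum size for a number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    # Majority rule: more than half of the master-eligible nodes.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for count in (1, 3, 5):
        print(f"{count} master-eligible nodes -> minimum_master_nodes = "
              f"{minimum_master_nodes(count)}")
```

With three dedicated master nodes, the cluster keeps working if one master is lost, and two isolated halves can never both elect a master.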

# Hot Nodes

Role: Dedicated data nodes that perform all indexing in the cluster. They also hold the most recent indexes, which are usually the most frequently queried.

Configuration: Since indexing is a CPU- and IO-intensive operation, these servers need powerful hardware and SSD storage. It is recommended to run at least three hot nodes for high availability. Depending on the amount of data required, more nodes may be needed to reach your performance goals.

# Warm Nodes

Role: This type of node is designed to handle a large number of read-only indexes that are not queried frequently. Since these indexes are read-only, warm nodes tend to use large-capacity mechanical disks rather than SSDs.

Configuration: As with hot nodes, it is recommended to use at least three warm nodes for high availability. As before, larger amounts of data may require additional nodes to meet the performance requirements. In addition, the CPU and memory configuration usually needs to match that of the hot nodes. The exact configuration can only be determined by testing with a load that simulates the production environment.

Construction overview

# Node distinction

First, there must be a way to distinguish the node types so that Elasticsearch knows which nodes to allocate an index to. This can be done by setting an attribute on each node:

# in elasticsearch.yml on a hot node
node.attr.box_type: hot

# in elasticsearch.yml on a warm node
node.attr.box_type: warm

box_type is a custom attribute; you can use any name you like.
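To illustrate how the three node types might differ in elasticsearch.yml, the sketch below renders one possible set of settings per type. It assumes the node.master, node.data, and node.ingest role settings of Elasticsearch 5.x/6.x together with the box_type attribute; the helper name and the exact role split are assumptions for illustration, not the configuration of the cluster described here.

```python
# Hypothetical per-type settings; adjust to your own cluster layout.
NODE_SETTINGS = {
    "master": {"node.master": "true", "node.data": "false", "node.ingest": "false"},
    "hot": {"node.master": "false", "node.data": "true", "node.attr.box_type": "hot"},
    "warm": {"node.master": "false", "node.data": "true", "node.attr.box_type": "warm"},
}


def render_node_config(node_type: str) -> str:
    """Render the settings of one node type as elasticsearch.yml lines."""
    settings = NODE_SETTINGS[node_type]
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


if __name__ == "__main__":
    for node_type in NODE_SETTINGS:
        print(f"# {node_type} node")
        print(render_node_config(node_type))
        print()
```

Only the data nodes carry the box_type attribute, since dedicated master nodes hold no index data.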

To have an index allocated to the hot nodes, set the matching allocation requirement when creating the index:

PUT /logs_2016-12-26
{
  "settings": {
    "index.routing.allocation.require.box_type": "hot"
  }
}

# Index configuration

If you use Logstash or Beats to manage the index template, update the template to include the index allocation filter. The setting index.routing.allocation.require.box_type: hot causes each new index to be created on the hot nodes. In the template file, you can set:

{
  "template" : "indexname-*",
  # "template" : "*",
  "version" : 50001,
  "settings" : {
    "index.routing.allocation.require.box_type": "hot"
  ...
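The template above uses the single template key of the 5.x template format. On Elasticsearch 6.x, index templates take an index_patterns list instead. A small Python sketch that builds such a body; the template name hot_allocation and the function name are placeholders, and the result would be sent with PUT /_template/<name>:

```python
import json


def build_hot_template(pattern: str, version: int = 50001) -> dict:
    """Build a 6.x-style index template body that pins new indexes to hot nodes."""
    return {
        "index_patterns": [pattern],  # 6.x replacement for the "template" key
        "version": version,
        "settings": {
            "index.routing.allocation.require.box_type": "hot",
        },
    }


if __name__ == "__main__":
    print("PUT /_template/hot_allocation")  # hypothetical template name
    print(json.dumps(build_hot_template("indexname-*"), indent=2))
```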

# Index migration

After a while, when the index is queried less often and should move to the warm nodes, update the same setting:

PUT /logs_2016-12-26/_settings
{
  "settings": {
    "index.routing.allocation.require.box_type": "warm"
  }
}

Once the setting is updated, Elasticsearch automatically migrates the index's shards to the warm nodes.
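The same update can be issued from a script. The sketch below only builds the HTTP request with Python's standard library and does not send it; the base URL http://localhost:9200 and the function name are assumptions.

```python
import json
import urllib.request

ALLOCATION_KEY = "index.routing.allocation.require.box_type"


def build_move_request(index: str, box_type: str,
                       base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the settings update that retargets an index."""
    body = json.dumps({"settings": {ALLOCATION_KEY: box_type}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_move_request("logs_2016-12-26", "warm")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # urllib.request.urlopen(request) would send it to a running cluster.
```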

# Index compression

You can also enable better compression for indexes on the warm data nodes with the index.codec: best_compression option. Note that index.codec is a static index-level setting, so on Elasticsearch 5.x and later it is applied per index (at creation time or while the index is closed) rather than in elasticsearch.yml.

When the data moves to the warm nodes, we can call _forcemerge to merge the segments. Having fewer segments saves memory, disk space, and file handles, and the merge also rewrites the index with the new best_compression codec.

While an index is still allocated on the hot nodes, do not force merge it, because the merge consumes I/O on those nodes and slows down indexing of the current day's logs. The warm nodes do little other work, so it is safe to force merge there.
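Put together, moving an index to the warm tier can be seen as an ordered list of REST calls. The Python sketch below only produces that list; it assumes the codec is changed on the closed index because index.codec is a static setting, and it leaves out waiting for shard relocation to finish, which a real script would need before the later steps. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def warm_migration_plan(index: str, set_codec: bool = False) -> List[Step]:
    """List the REST calls for moving an index to warm nodes and merging it."""
    steps: List[Step] = [
        ("PUT", f"/{index}/_settings",
         {"index.routing.allocation.require.box_type": "warm"}),
    ]
    if set_codec:
        # Static settings can only be changed while the index is closed.
        steps += [
            ("POST", f"/{index}/_close", None),
            ("PUT", f"/{index}/_settings", {"index.codec": "best_compression"}),
            ("POST", f"/{index}/_open", None),
        ]
    # Merging down to one segment rewrites the data with the current codec.
    steps.append(("POST", f"/{index}/_forcemerge?max_num_segments=1", None))
    return steps


if __name__ == "__main__":
    for method, path, body in warm_migration_plan("logs_2016-12-26", set_codec=True):
        print(method, path, body if body is not None else "")
```

Keeping the force merge as the last step means it runs on the warm nodes, not on the hot nodes that are busy indexing.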

# Index automatic migration

The process above is manual, and nobody can watch over it all the time. You can use "Curator" to migrate expired indexes automatically.

Note: You can also use "Index Lifecycle Management" for this, but it requires at least "Elasticsearch 6.6". Although it is part of the X-Pack package, the feature has been opened up. Since my environment runs "Elasticsearch 6.3", I use "Curator" for index management.
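The core of such automation is picking the indexes whose date suffix is older than a threshold. The following Python sketch shows only that selection logic; it is not Curator's API, and the function name, the logs_ prefix, and the date format are assumptions based on the index names used in this article.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def indexes_older_than(names: Iterable[str], today: date, max_age_days: int,
                       prefix: str = "logs_",
                       date_format: str = "%Y-%m-%d") -> List[str]:
    """Return indexes whose date suffix is more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the time-based indexes we manage
        try:
            day = datetime.strptime(name[len(prefix):], date_format).date()
        except ValueError:
            continue  # suffix is not a date, skip it
        if day < cutoff:
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    names = ["logs_2016-12-20", "logs_2016-12-26", "metrics_2016-12-01"]
    for name in indexes_older_than(names, today=date(2016, 12, 27), max_age_days=3):
        print(f"move {name} to warm nodes")
```

Each selected index would then get its box_type requirement changed to warm, as in the manual step.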

Cluster deployment

- "Elasticsearch"-Hot warm cluster deployment

Related articles

"Elasticsearch"-Cluster Construction (Elasticsearch 6.8.6)

References

“Hot-Warm” Architecture in Elasticsearch 5.x
Sizing Hot-Warm Architectures for Logging and Metrics in the Elasticsearch Service on Elastic Cloud
Implementing a Hot-Warm-Cold Architecture with Index Lifecycle Management
What's New in Elastic Stack 6.6.0?
Elasticsearch version 6.6.0

Origin blog.csdn.net/u013670453/article/details/113958271