9 Tips for Configuring a High-Performance Elasticsearch Cluster

Many core functions of the Loggly service use Elasticsearch as their search engine. As Jon Gifford explained in his recent article "Elasticsearch vs Solr", log management imposes some demanding requirements on search technology. To meet them, a search engine must be able to:

Index reliably in near real time on very large datasets - in our case, over 100,000 log events per second

Simultaneously handle very high volumes of search requests against that same index, reliably and efficiently

When we were building our Gen2 log management service, we wanted to be sure that every Elasticsearch configuration setting we used would give us the best possible indexing and search performance. Unfortunately, we found it very difficult to dig that information out of the Elasticsearch documentation, because it is not gathered in one place. This article summarizes what we learned and serves as a reference checklist of configuration properties for optimizing ES in your own applications.


Tip 1: Plan for indexing, sharding, and cluster growth

ES makes it very easy to create a large number of indexes and a very large number of shards, but it is important to understand that every index and shard comes at a cost. If you have too many indexes or shards, the management load alone will degrade the performance of the ES cluster, and potentially its availability as well. Here we focus on the management load, but running a large number of indexes/shards also significantly impacts indexing and retrieval performance.

We found that the biggest factor in the management load was the size of the cluster state data, because it contains all the mapping data for every index in the cluster. At one point we had a single cluster with over 900MB of cluster state data. The cluster was alive, but effectively unusable.

Let's run through some data to understand what's going on.

Suppose an index has 50KB of mapping data (we had about 700 fields at the time). If you generate a new index every hour, that adds 24 x 50KB of cluster state data every day, or 1.2MB. If you need to keep a year's worth of data in the system, the cluster state data grows to 438MB (with 8,760 indexes and 43,800 shards). Compare that with one index per day (18.25MB, 365 indexes, 1,825 shards), and you can see that the hourly indexing strategy puts you in a completely different situation.

Fortunately, these predictions are pretty easy to make once you have some real data in the system: the cluster can tell you how much state data it carries and how many indexes/shards it has to manage. You should do a dry run before going to production, so you don't get paged at 3:00 in the morning because the cluster has hung.
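For example, the following commands (a minimal sketch, assuming a node is reachable on localhost:9200) report the approximate size of the cluster state and the number of indexes and shards the cluster is managing:

   # Rough size of the cluster state, in bytes
   curl -s 'http://localhost:9200/_cluster/state' | wc -c

   # Number of indexes and shards in the cluster (one line each)
   curl -s 'http://localhost:9200/_cat/indices' | wc -l
   curl -s 'http://localhost:9200/_cat/shards' | wc -l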

In terms of configuration, we have full control over how many indexes (and how many shards) we have in the system, which will keep us out of the danger zone.


Tip 2: Know the cluster topology before configuration

Loggly runs ES with separate master and data nodes. I won't go into too many deployment details here (look out for a follow-up blog post), but in order to make the right configuration choices, you first need to determine the topology of your deployment.

Additionally, we use separate ES client nodes for both indexing and searching. This takes some of the load off the data nodes and, more importantly, means that our pipeline can talk to a local client, which then communicates with the rest of the cluster.

Whether a node acts as a master, data, or client node is determined by setting the following two properties to true or false:

Master node: node.master: true, node.data: false

Data node: node.master: false, node.data: true

Client node: node.master: false, node.data: false
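For illustration, a dedicated master node's elasticsearch.yml might contain something like the following sketch (the cluster and node names are made-up values):

   # elasticsearch.yml for a dedicated master node
   cluster.name: logging-prod    # illustrative name
   node.name: esmaster01         # illustrative name
   node.master: true
   node.data: false

A data or client node would use the same file with the two node.* flags flipped accordingly.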

That was the relatively easy part. Now let's look at some advanced ES properties worth paying attention to. The default settings are sufficient for most deployments, but if your ES usage is as demanding as the one we face in log management, you will benefit greatly from the advice below.


Tip 3: Memory Settings

Linux divides its physical RAM into chunks of memory called pages. Swapping is the process of freeing up a page of memory by copying it to a preconfigured space on the hard disk called swap space. The combined size of physical memory and swap space is the amount of virtual memory available.

The downside of swapping is that disks are very slow compared to memory. Memory read and write speeds are measured in nanoseconds, while disk access is measured in milliseconds, so going to disk can be tens of thousands of times slower than reading from memory. The more swapping that happens, the slower everything gets, so swapping should be avoided at all costs.

ES's mlockall property locks the ES node's memory so it cannot be swapped out. (Note that it can only be set on Linux/Unix systems.) The property is set in the yaml file:

bootstrap.mlockall: true

In version 5.x, it has been changed to bootstrap.memory_lock: true.

mlockall is set to false by default, meaning the ES node allows swapping. Once this value is added to the properties file, the ES node must be restarted for it to take effect. You can verify that the value is set correctly by running:

curl http://localhost:9200/_nodes/process?pretty

If you set this property, use the -DXmx option or the ES_HEAP_SIZE environment variable to make sure enough memory is allocated to the ES node.
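On 1.x/2.x nodes started from a shell, that might look like the following sketch (the 8g heap is an illustrative value; the memlock ulimit must also allow locking that much memory):

   # Allow the process to lock memory, give it an 8GB heap, and start the node
   ulimit -l unlimited
   export ES_HEAP_SIZE=8g
   ./bin/elasticsearch -d

   # Verify that mlockall is actually in effect
   curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall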


Tip 4: The discovery.zen properties control ElasticSearch's discovery protocol

Elasticsearch uses Zen discovery by default as the mechanism for discovery and communication between cluster nodes. Other discovery mechanisms exist for Azure, EC2, and GCE. Zen discovery is controlled by the set of properties prefixed with discovery.zen.*.

Both unicast and multicast are supported in versions 0.x and 1.x, with multicast as the default. To use unicast in those versions of ES, you need to set the property discovery.zen.ping.multicast.enabled to false.

From 2.0 onwards, service discovery only supports unicast.

First you need to specify the group of hosts to contact, using the property discovery.zen.ping.unicast.hosts. For convenience, set this property to the same value on every host in the cluster, using the names of your master nodes to define the host list.

The property discovery.zen.minimum_master_nodes determines the minimum number of master-eligible nodes that a node must "see" in order to operate within the cluster. If there are more than 2 master-eligible nodes in the cluster, it is recommended to set this value to something greater than 1. One way to calculate it: if the number of master-eligible nodes is N, set this property to N/2 + 1.

Data and master nodes probe each other in two different ways:

The master node pings all other nodes in the cluster to verify they are up and running

All other nodes ping the master node to verify that it is running, or to determine whether an election process needs to be initiated

The node detection process is controlled by the discovery.zen.fd.ping_timeout property. The default value is 30s, and it determines how long a node will wait for a response before timing out. This property should be adjusted if you are running on a slow or congested network: if the network is slow, increase it. The larger the value, the smaller the chance of a false failure detection.

Loggly's discovery.zen related properties are configured as follows:

discovery.zen.fd.ping_timeout: 30s

discovery.zen.minimum_master_nodes: 2

discovery.zen.ping.unicast.hosts: ["esmaster01", "esmaster02", "esmaster03"]

With this configuration, a node probe has 30 seconds to complete, because of the discovery.zen.fd.ping_timeout setting. In addition, a node must see at least two master-eligible nodes before it can operate (we have 3 masters). Our unicast hosts are esmaster01, esmaster02, and esmaster03.
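To confirm that discovery is behaving as intended, you can ask any node which members it sees and which node is currently the elected master:

   # List all nodes in the cluster, with their master/data roles
   curl -s 'http://localhost:9200/_cat/nodes?v'

   # Show the currently elected master
   curl -s 'http://localhost:9200/_cat/master?v'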


Tip 5: Beware of DELETE _all

One thing you must understand is that ES's DELETE API allows users to delete indexes with a single request, supports wildcards, and even accepts _all as the index name to mean every index. For example:

curl -XDELETE 'http://localhost:9200/*/'

This feature is very useful, but also very dangerous, especially in a production environment. In all of our clusters it has been disabled by setting action.destructive_requires_name: true.

This setting was introduced in version 1.0 and replaced the disable_delete_all_indices property used in version 0.90.
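The setting can live in elasticsearch.yml, and it can also be applied through the cluster settings API if your version allows it to be updated dynamically (a sketch, not verified on every release):

   # In elasticsearch.yml
   action.destructive_requires_name: true

   # Or applied at runtime via the cluster settings API
   curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
     "persistent": { "action.destructive_requires_name": true }
   }'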


Tip 6: Use Doc Values

The Doc Values feature is enabled by default in version 2.0 and above, but must be explicitly enabled in earlier ES versions. When doing large-scale sorting and aggregation operations, Doc Values have obvious advantages over ordinary fields. In essence, ES becomes a columnar store for those fields, and many of ES's analytical features perform far beyond expectations as a result.
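On pre-2.0 versions, doc values are enabled per field in the mapping. A minimal sketch (the index, type, and field names are made up; string fields must be not_analyzed to use doc values in those versions):

   curl -XPUT 'http://localhost:9200/logs-2015.01.01' -d '{
     "mappings": {
       "event": {
         "properties": {
           "status": { "type": "string", "index": "not_analyzed", "doc_values": true },
           "bytes":  { "type": "long", "doc_values": true }
         }
       }
     }
   }'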

To see why, let's compare Doc Values with ordinary fields in ES.

When an ordinary field is used for sorting or aggregation, the field is loaded into the field data cache. The first time a field is cached, ES must allocate heap space large enough to hold every value, and then fill it incrementally with the value from each document. This process can take a while, since those values may need to be read from disk. Once it is complete, any related operation on this data uses the cached copy and is very fast. If you try to fit too many fields into the cache, some fields will be evicted, and subsequent use of those fields forces them to be reloaded, incurring the same startup overhead. To be efficient, you want to minimize or eliminate these evictions, which means the number of fields you can use this way is limited by the size of the cache.

In contrast, Doc Values fields use an on-disk data structure that can be memory-mapped into the process space, so they do not affect heap usage while providing essentially the same performance as the field data cache. There is still a small startup cost the first time these values are read from disk, but that is handled by the OS file cache, so only data that is actually needed is read.

Doc Values thus minimize heap usage (and the resulting garbage collection) and take advantage of the operating system's file cache, which further reduces the pressure on disk reads.


Tip 7: Guidelines for Elasticsearch's allocation-related properties

Shard allocation is the process of assigning shards to nodes. It happens during initial recovery, replica allocation, and cluster rebalancing, and also when nodes join or leave the cluster.

The property cluster.routing.allocation.cluster_concurrent_rebalance determines the number of shards allowed to rebalance concurrently. It needs to be set appropriately for your hardware, such as the number of CPUs and the available IO capacity. If it is set improperly, ES indexing performance will suffer.

cluster.routing.allocation.cluster_concurrent_rebalance: 2

The default value is 2, meaning that no more than 2 shards are allowed to move at any one time. It is best to keep this property low so that shard rebalancing does not interfere with indexing.

Another shard allocation related property is cluster.routing.allocation.disk.threshold_enabled. If this property is true (the default), free disk space is taken into account when allocating shards to a node. Turning it off lets ES allocate shards to nodes that do not have enough free disk space, which can prevent those shards from growing.

When it is turned on, shard allocation considers two watermark thresholds: a low watermark and a high watermark.

The low watermark defines the disk usage percentage at which ES stops allocating new shards to the node (default 85%).

The high watermark defines the disk usage percentage at which ES starts migrating shards away from the node (default 90%).

Both properties can be defined either as a percentage of disk usage (e.g. "80%" means 80% of the disk is in use, or 20% is free) or as a minimum amount of free space (e.g. "20GB" means the node must still have 20GB of free space).

If you have many small shards, the default values are quite conservative. For example, if you have a 1TB drive and your shards are typically 10GB in size, then in theory 100 shards could fit on that node. With the default settings, however, only about 80 shards can be allocated there before ES considers the node full.

To get these settings right, look at how large your shards become over their lifetime and work backwards from there, making sure to include a safety factor. In the example above, suppose only 5 shards are being written to at any time; then 50GB of free space must be guaranteed at all times. For a 1TB drive that corresponds to a 95% low watermark, with no safety margin. Adding, say, a 50% safety factor means guaranteeing 75GB of free space, or a 92.5% low watermark.
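Pulling the allocation-related settings from this tip together, an elasticsearch.yml sketch might look like the following (the low watermark comes from the worked example above; the high watermark is an illustrative value you should derive from your own shard sizes):

   # Limit concurrent rebalancing so it does not interfere with indexing
   cluster.routing.allocation.cluster_concurrent_rebalance: 2

   # Take free disk space into account when allocating shards
   cluster.routing.allocation.disk.threshold_enabled: true
   cluster.routing.allocation.disk.watermark.low: "92.5%"
   cluster.routing.allocation.disk.watermark.high: "95%"    # illustrative value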


Tip 8: Recovery properties allow for fast restarts

ES has many recovery-related properties that can improve the speed of cluster recovery and restart. The best settings depend on the hardware you are using (disk and network are the most common bottlenecks), and the best advice we can give is to test, test, and test again.

To control how many shards can be restored simultaneously on a single node, use:

   cluster.routing.allocation.node_concurrent_recoveries

Recovering a shard is a very IO-intensive operation, so this value should be adjusted with care. In version 5.x, this property was split in two:

   cluster.routing.allocation.node_concurrent_incoming_recoveries

   cluster.routing.allocation.node_concurrent_outgoing_recoveries

To control the number of concurrently initialized primary shards on a single node, use:

   cluster.routing.allocation.node_initial_primaries_recoveries

To control the number of parallel streams opened when restoring a shard, use:

   indices.recovery.concurrent_streams

Closely related to the number of streams is the total available network bandwidth for recovery:

   indices.recovery.max_bytes_per_sec

Beyond all these properties, the optimal configuration depends on the hardware in use. If you have SSDs and a 10-gigabit fiber network, the optimal values will be completely different from those for spinning disks and a gigabit network.
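As a starting point only, a recovery section for a pre-5.x elasticsearch.yml might look like this sketch (every value is an assumption to be validated against your own disks and network):

   cluster.routing.allocation.node_concurrent_recoveries: 4          # assumed value
   cluster.routing.allocation.node_initial_primaries_recoveries: 8   # assumed value
   indices.recovery.concurrent_streams: 4                            # assumed value
   indices.recovery.max_bytes_per_sec: 100mb                         # assumed value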

All the above properties will take effect after the cluster restarts.


Tip 9: Thread Pool Properties Prevent Data Loss

Elasticsearch nodes have many thread pools to improve the efficiency of thread management in a node.

At Loggly, we index using bulk requests, and we found that setting the right size for the bulk thread pool's queue via the threadpool.bulk.queue_size property is extremely important for preventing the data loss that can result when bulk requests are rejected.

   threadpool.bulk.queue_size: 5000

This tells ES how many shard-level requests can be queued for execution on a node when no thread is available to execute a bulk request. This value should be set according to your bulk request load. If the number of bulk requests exceeds the queue size, you will get a RemoteTransportException like the one shown below.

As noted above, a bulk request is queued as one operation per shard, so the queue size needs to be greater than the number of concurrent bulk requests you plan to send multiplied by the number of shards those requests touch. For example, if a single bulk request contains data for 10 shards, then even if you only send one bulk request at a time, the queue size must be at least 10. Setting this value too high will eat up a lot of JVM heap (and signals that you are pushing more data than the cluster can comfortably index), but it does shift some of the queuing into ES, which simplifies your clients.

You should both keep the property value above your expected load and handle RemoteTransportException gracefully in your client code. If the exception is not handled, you will lose data. We simulated this by sending more than 10 bulk requests to a queue of size 10 and got the exception shown below.

RemoteTransportException[[<Bantam>][inet[/192.168.76.1:9300]][bulk/shard]]; nested: 
EsRejectedExecutionException[rejected execution (queue capacity 10) on 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13fe9be];
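To see whether you are getting close to the limit, you can watch the bulk thread pool's queue and rejection counters (column names may vary slightly between versions):

   # Per-node bulk thread pool activity, queue depth, and rejections
   curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'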

Another tip for users on versions before 2.0: minimize mapping refresh requests

If you are still using ES before version 2.0 and frequently update your field mappings, you may find that the cluster's pending task queue contains a large number of refresh_mappings requests. On its own this isn't bad, but there can be a snowball effect that severely impacts cluster performance.

If you do encounter this situation, ES provides a configurable parameter to help deal with it. This parameter can be used as follows:

indices.cluster.send_refresh_mapping: false

So, what does this mean, and why does it work?

When a new field appears in an index, the data node that added the field updates its own mapping and sends the new mapping to the master node. If that update is still sitting in the master's pending task queue when the master publishes its next cluster state, the data node will receive an outdated, older version of the mapping. This normally causes the data node to send a request asking the master to refresh the mapping, because as far as the data node can tell, the master has the wrong mapping information. In itself this is a sensible default behavior - the node should do something to ensure the master ends up with the correct mapping, and re-sending the new mapping is a reasonable option.

However, when there are a lot of mapping updates happening and the master cannot keep up, a stampeding herd effect occurs, and the refresh messages sent by the data nodes can flood the master.

Setting indices.cluster.send_refresh_mapping to false disables this default behavior and eliminates these refresh_mapping requests from the data nodes to the master. The master still ends up in the right state: even without a refresh request, it will eventually see the original mapping change and publish a cluster state update that includes it.


Summary: Elasticsearch's configurable properties are the key to its resiliency

Elasticsearch's deep configurability is a huge advantage for Loggly, because our use case pushes Elasticsearch's parameters to their limits (and sometimes beyond). If the default ES configuration works well enough at the current stage of your application's evolution, rest assured that you will have plenty of room for optimization as it grows.

Translator introduction: Yang Zhentao (Gentle Yang), search engine architect. He currently works at vivo Mobile Internet, where he is responsible for the architecture design, development, and implementation of search-related products and systems. He focuses on the design and engineering of Internet system architecture, especially real-time distributed systems, and on the storage, retrieval, and visualization of big data. He has previously worked at startups, building maternal-and-child B2C, mobile IM, and smart-watch products; before that he worked at BGI on genomics research, focusing on the storage, retrieval, and visualization of genomic data. He reviewed the book "Circos Data Visualization How-to" and participated in the collaborative translation of the forthcoming "Elasticsearch: The Definitive Guide".

 
