Detailed explanation of Elasticsearch memory allocation settings

Elasticsearch ships with a default heap size of 1 GB, which is too small for any real-world use. If you run with this default heap configuration, your cluster is likely to run into trouble quickly. There are two ways to change the Elasticsearch heap size (all of the memory discussed below refers to heap memory). The easiest is to set the ES_HEAP_SIZE environment variable; the server process reads this variable at startup and sizes the heap accordingly. For example:

export ES_HEAP_SIZE=10g

Alternatively, you can pass the heap size as command-line arguments when starting the process:

./bin/elasticsearch -Xmx10g -Xms10g

Remarks: make sure Xmx and Xms are set to the same value. This way the JVM never has to resize the heap after garbage collection, which would waste resources; a fixed-size heap avoids the pressure that a growable and shrinkable heap creates.
Generally speaking, setting the ES_HEAP_SIZE environment variable is preferable to passing -Xmx10g -Xms10g directly.
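
If you installed Elasticsearch from a DEB or RPM package, the environment variable is usually set in the package's defaults file rather than in your shell profile. A minimal sketch, assuming the usual package paths:

# Debian/Ubuntu: /etc/default/elasticsearch
# RHEL/CentOS:   /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=10g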

Give half of the memory to Lucene


A common problem is configuring too large a heap. Suppose you have a machine with 64 GB of memory. Intuitively you might think it is best to hand all 64 GB to Elasticsearch, but is bigger really better? Memory certainly matters to Elasticsearch, since holding more data in memory makes operations faster, but there is another big memory consumer: Lucene.

Lucene is designed to rely on the underlying OS to cache its data in memory. Lucene segments are each stored in individual files, and since segments are immutable these files never change, which makes them ideal candidates for caching; the operating system keeps hot segment files in its file system cache for faster access. Lucene's performance thus depends on this interaction with the OS. If you give all available memory to the Elasticsearch heap and leave nothing for Lucene, your full-text search performance will be poor. The standard recommendation, then, is to give 50% of the memory to Elasticsearch; the other 50% will not go to waste, because Lucene will quickly swallow whatever remains for the file system cache.
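
To see this effect on a running machine, you can check how much RAM the OS is currently devoting to the file system cache with a standard Linux tool:

# The buff/cache column is where Lucene's segment files end up.
free -h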

Do not exceed 32G


There is another reason not to give Elasticsearch a huge heap: below roughly 32 GB, the JVM can use a memory optimization called compressed object pointers. In Java, all objects are allocated on the heap and referenced through pointers. Normally the size of such a pointer is the CPU's word size, either 32 bits or 64 bits depending on your processor, and the pointer holds the exact location of your value in memory. A 32-bit system can use at most 4 GB of memory; a 64-bit system can use far more, but 64-bit pointers mean more waste simply because the pointers themselves are larger. And worse than the wasted memory, larger pointers eat more bandwidth when moving data between main memory and the caches (LLC, L1, and so on).

Java solves this problem with pointer compression. A compressed pointer no longer stores the exact location of an object in memory, but an offset. Because objects are aligned, a 32-bit pointer can then reference 4 billion objects rather than 4 billion bytes, so the heap can grow to about 32 GB of physical memory while still being addressed with 32-bit pointers.

Once you cross that magic 30-32 GB boundary, the JVM switches back to ordinary object pointers. Every object pointer grows longer and more CPU-to-memory bandwidth is consumed, which means you effectively lose memory; in practice, a heap of 40-50 GB has roughly the same effective capacity as a 32 GB heap running with compressed pointers. The point of all this is: even if you have memory to spare, try not to exceed a 32 GB heap, because going beyond it wastes memory, reduces CPU performance, and forces the GC to cope with a huge heap.
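
You can ask the JVM directly whether compressed pointers are in effect for a given heap size; the exact boundary varies slightly between JVM versions, so it is worth probing on your own hardware. A minimal sketch:

# Probe whether compressed oops are enabled just under the ~32 GB boundary.
java -Xmx32600m -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
# On most 64-bit HotSpot JVMs this prints true; retry with -Xmx32g and it flips to false.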

1TB RAM machine


32 GB is effectively the upper limit for an Elasticsearch heap, so what do you do if your machine has far more memory than that? Machine memory has grown considerably in recent years, and you can now see machines with 300-500 GB of RAM.

First, we recommend avoiding such large machines. Second, if you already have such a machine, you have two options:

  • Do you mainly do full-text search? Consider giving Elasticsearch 32 GB of heap and leaving the rest to Lucene via the operating system's file system cache. All the segments will be cached, which speeds up full-text retrieval.

  • Do you do a lot of sorting and aggregation? Then you probably want a larger heap. Instead of deploying one node with a 32+ GB heap, consider running two or more Elasticsearch nodes on the same machine, still sticking to the 50% rule. On a machine with 128 GB of memory, for example, you can run two nodes, each with a 32 GB heap; that gives 64 GB in total to the Elasticsearch heaps and leaves the remaining 64 GB to Lucene.

If you choose the second option, you need to set cluster.routing.allocation.same_shard.host: true, as in the sketch below. This prevents a primary shard and its replica from being allocated on the same physical machine (if both copies sat on one machine, the replica would no longer provide high availability).
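
As an illustration of the two-nodes-per-machine setup, here is a minimal per-node elasticsearch.yml sketch; the node names and port numbers are hypothetical, and transport.tcp.port matches the ES 1.x/2.x era that this article's settings come from:

# Node 1 of 2 on this machine; the second node would use
# node.name: node-2, http.port: 9201, transport.tcp.port: 9301.
node.name: node-1
http.port: 9200
transport.tcp.port: 9300
cluster.routing.allocation.same_shard.host: true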


Swapping is the death of performance


This point is obvious, but it is worth spelling out: swapping memory to disk is fatal to server performance. Memory operations must be fast; if memory is swapped to disk, a 100-microsecond operation becomes a 10-millisecond one, and then think of the cumulative latency across all the other operations that should have taken tens of microseconds. It is not hard to see how terrible swapping is for performance. The best option is to disable swap entirely in your OS. This disables it temporarily:

swapoff -a

To disable it permanently, you will likely need to edit the /etc/fstab file; consult your operating system's documentation. If disabling swap completely is not feasible for you, you can instead lower the swappiness value, which determines how readily the operating system swaps memory. This prevents swapping under normal circumstances while still allowing the OS to swap in an emergency. On most Linux systems it can be configured through sysctl:

vm.swappiness = 1

Note: setting swappiness to 1 is better than setting it to 0, because on some kernel versions swappiness=0 can trigger the OOM (out of memory) killer.

Simply put, this parameter defines how strongly the system tends to use swap. The default value is 60, and the larger the value, the more readily the kernel swaps. It can be set to 0, which does not forbid swapping outright but minimizes the likelihood of it being used.

The current value of the parameter can be viewed with sysctl -q vm.swappiness.

To change the parameter persistently, edit the /etc/sysctl.conf file, add vm.swappiness=xxx, and reboot. This is equivalent to writing the value xxx into the virtual file /proc/sys/vm/swappiness.

If you don't want to reboot, you can apply /etc/sysctl.conf dynamically with sysctl -p, but it is recommended to empty the swap space first.
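
Putting the steps above together, a minimal sketch of the whole procedure on a typical Linux box:

# Check the current value
sysctl vm.swappiness

# Persist the new value across reboots
echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf

# Drain swap back into RAM, re-enable it, then apply the setting
sudo swapoff -a && sudo swapon -a
sudo sysctl -p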

Finally, if none of the above methods is possible, you should turn on the mlockall switch in the configuration file. It allows the JVM to lock its memory, preventing the OS from swapping it out. In elasticsearch.yml, configure it as follows:

bootstrap.mlockall: true
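
After restarting the node, you can verify that the lock took effect through the nodes info API (this is the ES 1.x/2.x-era API matching the bootstrap.mlockall setting above); each node's process section reports an mlockall flag:

curl http://localhost:9200/_nodes/process?pretty
# Look for "mlockall" : true; false usually means the process lacks
# permission to lock memory (check the memlock ulimit for the ES user).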
