Some problems encountered with Elasticsearch, and their solutions

1. A node leaves the cluster because of long GC pauses

     Because the JVM stops the world during GC, if a node's GC pause lasts too long, the master's pings to it fail (Zen Discovery retries a failed ping 3 times by default), the node is removed from the cluster, and its shards are reallocated.
Solution:
(1) Tune GC to shorten pause times. (2) Increase Zen Discovery's retry count (the ping_retries setting) and timeout (the ping_timeout setting). In the end, the root cause turned out to be that the disk holding the system partition on one node was full, which degraded that node's performance.
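For reference, the Zen Discovery fault-detection settings mentioned in (2) live in elasticsearch.yml. In the older ES versions this post appears to describe, they look roughly like this (the values below are illustrative, not recommendations):

```yaml
# elasticsearch.yml — Zen Discovery fault-detection tuning (example values)
discovery.zen.fd.ping_timeout: 60s   # how long to wait for each ping response
discovery.zen.fd.ping_retries: 6     # retries before declaring a node dead (default 3)
```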

2. Out of memory error
     By default the Field Data Cache in ES is unbounded. Queries load field values into memory, and facet queries in particular are memory-hungry: they load the field values of the result set into memory and then sort and aggregate them there. Memory usage keeps growing until it is exhausted, at which point an out of memory error can occur.
Solution:
(1) Set the ES cache type to Soft Reference. A softly referenced object is reclaimed only when memory is running low; while memory is sufficient it is normally left alone, and the JVM guarantees that soft references are cleared before an OutOfMemory exception is thrown. This makes them well suited to caches (a classic example is caching commonly used images): memory is used as fully as possible without causing OutOfMemory. Add index.cache.field.type: soft to the ES configuration file.
(2) Limit the cache size and set an expiry time in ES: index.cache.field.max_size: 50000 caps the number of cached field entries at 50000, and index.cache.field.expire: 10m expires cached entries after 10 minutes.
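Put together, the field data cache settings from (1) and (2) would appear in elasticsearch.yml as follows (these setting names come from the text and apply to the old ES versions discussed here):

```yaml
# elasticsearch.yml — bound the field data cache (settings from the text)
index.cache.field.type: soft       # soft references: reclaimable under memory pressure
index.cache.field.max_size: 50000  # cap the number of cached entries
index.cache.field.expire: 10m      # evict entries after 10 minutes
```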

3. "Unable to create new native thread" problem
ES recovery error: RecoverFilesRecoveryException[[index][3] Failed to transfer [215] files with total size of [9.4gb]]; nested: OutOfMemoryError[unable to create new native thread]
At first I suspected the file handle limit, but that would have produced the familiar "too many open files" error, and that limit had already been raised. According to the references, the maximum number of threads a JVM process can create is roughly virtual memory / (stack size in MB * 1024 * 1024); in other words, the larger the virtual address space or the smaller the thread stack, the more threads can be created. After adjusting these, the error still occurred, even though the computed thread limit should have been more than sufficient, so I suspected some system-level restriction. I then came across the max user processes limit, which defaults to 1024. Although this parameter is described as the maximum number of processes a user may open, it effectively caps the number of threads the user can create (every process has at least one thread, and on Linux each thread counts against this limit), so it indirectly limits thread creation. After increasing this parameter, the error no longer appeared.
Solution:
(1) Increase the JVM heap, or reduce the thread stack size via -Xss (the default here was 512K).
(2) Open /etc/security/limits.conf and raise the 1024 in the "soft nproc 1024" line to a larger value.
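A sketch of the /etc/security/limits.conf change from (2). The user name and the limit value below are placeholders, not values from the original post:

```
# /etc/security/limits.conf — raise max user processes (which also caps threads)
# "elasticsearch" is a placeholder for the user running ES; use * for all users
elasticsearch soft nproc 10240
elasticsearch hard nproc 10240
```

The new limit takes effect on the user's next login session.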

4. Errors when inserting data concurrently while the cluster status is yellow
[7]: index [index], type [index], id [1569133], message [UnavailableShardsException[[index][1] [4] shardIt, [2] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5989fa07]]
That is the error message. It appears when the cluster status is yellow, i.e. replicas are unassigned. At the time, the replica count was set to 2 but there was only one node. When the configured replica count exceeds the number of nodes available to host the copies, inserting data can produce the error above, because ES's write consistency defaults to quorum, which requires at least (number of replicas / 2 + 1) active shard copies. Here that is 2/2 + 1 = 2, meaning the write must reach at least two shard copies. With only one node, only the primary shard is active; the replicas cannot be found, the quorum of 2 cannot be met, and the error above is raised.
Solution: (1) Remove the unassignable replicas (reduce the replica count). (2) Change the write consistency to one, so that a write only needs to reach a single shard copy.
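The quorum arithmetic above can be sketched in a few lines of Python. The formula follows the old ES write-consistency rule as described in the text; the function names are mine:

```python
def quorum(number_of_replicas: int) -> int:
    """Active shard copies required for a 'quorum' write:
    number_of_replicas / 2 + 1, per the rule described above."""
    return number_of_replicas // 2 + 1

def write_can_succeed(number_of_replicas: int, nodes: int) -> bool:
    # With N nodes, at most N copies of a shard (1 primary + replicas)
    # can be active, since copies of the same shard never share a node.
    active_copies = min(1 + number_of_replicas, nodes)
    return active_copies >= quorum(number_of_replicas)

# The situation in the text: 2 replicas on 1 node -> quorum is 2,
# but only the primary is active, so the write times out.
print(quorum(2))                # 2
print(write_can_succeed(2, 1))  # False
print(write_can_succeed(2, 3))  # True
```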

5. Startup warning when the JVM is set to lock memory
With bootstrap.mlockall: true set, ES reports the warning "Unknown mlockall error 0" at startup, because by default Linux only allows a process to lock about 45 KB of memory.
Solution: raise the limit to unlimited with the Linux command ulimit -l unlimited.
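To make this change survive reboots rather than relying on a one-off ulimit call, the equivalent /etc/security/limits.conf entries would look roughly like this (the user name is a placeholder):

```
# /etc/security/limits.conf — let the ES user lock memory without limit
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
```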

6. Misuse of the API freezes the cluster
This was really a very low-level mistake. The feature updates some data and may also delete some. For the delete, however, a colleague used the deleteByQuery interface, passing in the IDs of the documents to delete by building a BoolQuery that matched them. The problem is that a BoolQuery supports at most 1024 clauses, and even 100 clauses is already a lot, so such a query can suddenly freeze the ES cluster.
Solution: use a BulkRequest to delete the documents in batches instead.
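A minimal sketch of the batching idea, independent of any ES client library: split the IDs into chunks well under the 1024-clause limit and issue one bulk delete per chunk. Here `send_bulk_delete` is a stand-in for whatever bulk API the client exposes, not a real client call:

```python
from typing import Callable, Iterable, List

def chunked(ids: List[str], size: int) -> Iterable[List[str]]:
    """Yield successive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def delete_in_batches(ids: List[str],
                      send_bulk_delete: Callable[[List[str]], None],
                      batch_size: int = 500) -> int:
    """Delete documents by id in bulk batches instead of one huge
    deleteByQuery built from thousands of BoolQuery clauses."""
    batches = 0
    for batch in chunked(ids, batch_size):
        send_bulk_delete(batch)  # one bulk request per batch
        batches += 1
    return batches

# Example: 1200 ids in batches of 500 -> 3 bulk requests,
# the last carrying the remaining 200 ids.
sent = []
n = delete_in_batches([str(i) for i in range(1200)], sent.append)
print(n)              # 3
print(len(sent[-1]))  # 200
```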
