ES Manual
How to improve the performance of ES
Do not return large result sets
ES is designed as a search engine and is good at returning the top documents that match a query. If you need to retrieve a large number of documents, use the Scroll API.
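As a rough illustration of paging with Scroll, the sketch below walks a result set one page at a time. The `send` callable is a hypothetical transport (method, path, JSON body in, parsed JSON out) standing in for whatever HTTP client you use; the page size and keep-alive values are assumptions, not recommendations from these notes.

```python
from typing import Callable, Iterator

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, query: dict,
               page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, fetching one scroll page at a time."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once we are done.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because the function is a generator, the caller never holds more than one page of hits in memory.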
Avoid sparsity
Because ES uses Lucene to index and store data, it works most efficiently with dense data, where every document has the same fields. Lucene identifies documents by an integer doc id, and some structures reserve space for every doc id whether or not the document has a value for the field; norms, for example, cost one byte per document. Sparsity mainly affects norms and doc_values. Some recommendations for avoiding it:
Avoid putting unrelated data in the same index
Normalize document structures
Use the same field names to store the same data.
Avoid types
Disable norms and doc_values on sparse fields
Tune for indexing speed
Use bulk requests
Keep each request to at most a few tens of MB, because larger requests put too much memory pressure on the cluster.
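One way to keep bulk requests under a size cap is to build the NDJSON bodies in chunks. The sketch below is only an illustration: the 10 MB default and the function name `bulk_bodies` are assumptions, and each yielded string is meant to be sent as the body of a `POST /_bulk` request by your own client.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(index: str, docs: Iterable[dict],
                max_bytes: int = 10 * 1024 * 1024) -> Iterator[str]:
    """Yield NDJSON bulk bodies, each roughly at most `max_bytes` in size."""
    lines, size = [], 0
    for doc in docs:
        # Each document needs an action line followed by its source line.
        pair = (json.dumps({"index": {"_index": index}}) + "\n"
                + json.dumps(doc) + "\n")
        pair_size = len(pair.encode("utf-8"))
        # Flush the current chunk before it would exceed the cap.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)
```

A single document larger than `max_bytes` is still emitted on its own rather than dropped.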
Send data to ES using multiple workers/threads
When sending from multiple processes or threads, if you see TOO_MANY_REQUESTS (429) responses or EsRejectedExecutionException, it means that ES cannot keep up with the indexing rate. The optimal number of workers is the one at which the I/O or CPU of the cluster becomes saturated.
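A common reaction to 429 responses is to pause and retry with exponential backoff rather than keep pushing. The following is a minimal sketch under stated assumptions: `send` is a hypothetical callable that performs one bulk request and returns its HTTP status code, and the retry count and delays are illustrative values.

```python
import time
from typing import Callable


def send_with_backoff(send: Callable[[], int],
                      max_retries: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it stops returning 429 or retries run out."""
    delay = base_delay
    status = send()
    for _ in range(max_retries):
        if status != 429:
            break
        # The cluster is rejecting work: wait, then double the wait.
        sleep(delay)
        delay *= 2
        status = send()
    return status
```

The `sleep` parameter exists only so the waiting can be replaced in tests.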
Increase refresh interval
index.refresh_interval
The default is 1s, which can be changed to 30s to reduce merging pressure.
When loading a large amount of data, refresh and replicas can be temporarily disabled
Set index.refresh_interval to -1 and index.number_of_replicas to 0, and restore both once the load is finished.
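The two settings above can be expressed as request bodies for `PUT /<index>/_settings`. This sketch only builds the bodies; the helper names and the restore defaults (1s refresh, 1 replica) are assumptions and should be replaced by whatever values the index used before the load.

```python
def bulk_load_settings() -> dict:
    """Settings body that disables refresh and replicas for a bulk load."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval: str = "1s", replicas: int = 1) -> dict:
    """Settings body that re-enables refresh and replicas after the load."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}
```

For example, `restore_settings("30s")` brings the index back with the longer refresh interval mentioned earlier.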
Disable swapping
Leave memory for the filesystem cache
The filesystem cache buffers I/O operations. Leave at least half of the machine's memory to the filesystem cache on nodes running ES.
Use faster hardware
- Use SSD as storage device.
- Use local storage, avoid NFS or SMB
- Beware of using virtual storage such as Amazon's EBS
Index buffer size
indices.memory.index_buffer_size
The default is 10% of the JVM heap. Make sure it is large enough to give up to 512MB of indexing buffer to each shard doing heavy indexing.
Tune for search speed
Give memory to the filesystem cache
Give at least half of the available memory to the filesystem cache.
Use faster hardware
- Use SSD as storage device.
- Use a CPU with better performance, high concurrency
- Use local storage, avoid NFS or SMB
- Beware of using virtual storage such as Amazon's EBS
Document modeling
Avoid joins. Nested fields make queries several times slower, and parent-child relations make queries hundreds of times slower, so if the same question can be answered without joins by denormalizing the documents, queries will be faster.
Pre-index data
Mappings
Numeric data does not always have to be mapped as integer or long. Identifiers that are only ever matched exactly are often better mapped as keyword.
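To make the mapping advice concrete, here is a small sketch of a mapping in which a numeric-looking identifier is stored as keyword while a real quantity stays numeric. The field names `product_id` and `price` are hypothetical, and the typeless `properties` layout assumes a recent ES version.

```python
def product_mappings() -> dict:
    """Mapping body: the identifier is a keyword, the quantity is numeric."""
    return {
        "properties": {
            # Looked up by exact value only, never by range.
            "product_id": {"type": "keyword"},
            # Used in range queries and sorting, so it stays numeric.
            "price": {"type": "integer"},
        }
    }


def product_lookup(product_id: int) -> dict:
    """Term query for one identifier; keyword values are sent as strings."""
    return {"query": {"term": {"product_id": str(product_id)}}}
```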
Avoid scripts
If you really must use scripts, use painless or expressions.
Force-merge read-only indices
https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-forcemerge.html
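Do not force-merge indices that are still being written to.
Warm up global ordinals
Warm up the filesystem cache
Use index.store.preload. If the filesystem cache is not large enough to hold the preloaded data, this makes searches slower.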
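Both warm-up options can be set when an index is created. The sketch below builds a create-index body under some assumptions: the file extensions passed to `index.store.preload` and the field names are illustrative, `eager_global_ordinals` is the mapping parameter assumed here for warming global ordinals, and the typeless `mappings` layout assumes a recent ES version.

```python
from typing import Iterable


def warmup_index_body(preload_extensions: Iterable[str],
                      ordinal_fields: Iterable[str]) -> dict:
    """Create-index body that preloads files and eagerly builds ordinals."""
    return {
        "settings": {
            # File extensions to load into the filesystem cache on open.
            "index.store.preload": list(preload_extensions),
        },
        "mappings": {
            "properties": {
                # Build global ordinals at refresh time, not at first search.
                field: {"type": "keyword", "eager_global_ordinals": True}
                for field in ordinal_fields
            }
        },
    }
```

For example, `warmup_index_body(["nvd", "dvd"], ["category"])` preloads norms and doc values files and warms ordinals for a hypothetical `category` field.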
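Tune for disk usage
Disable features you do not need
- If you do not need to filter on a field, you can disable indexing for it:
"index": false
- If you do not need scoring on a text field, you can disable norms:
"norms": false
- If you do not need phrase queries, you can skip indexing positions:
"index_options": "freqs"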
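The three options above are all per-field mapping parameters. As a sketch, the mapping below applies them to two hypothetical fields, `session_token` and `log_line`; the typeless `properties` layout assumes a recent ES version.

```python
import json


def lean_mappings() -> dict:
    """Mapping body with unneeded per-field features switched off."""
    return {
        "properties": {
            # Stored and returned, but never filtered on.
            "session_token": {"type": "keyword", "index": False},
            # Searched, but without scoring norms or phrase queries.
            "log_line": {
                "type": "text",
                "norms": False,
                "index_options": "freqs",
            },
        }
    }


# Python's False is serialized as JSON false, matching the snippets above.
print(json.dumps(lean_mappings(), indent=2))
```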
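Do not use the default dynamic string mappings
Do not use _all
Use best_compression (set through index.codec)
Use the smallest numeric type that is sufficient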
byte,short,integer,long
half_float,float,double
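Choosing among the integer types listed above comes down to checking the value range of the field. The helper below is a hypothetical sketch for that check only; it does not cover the floating-point types, where the precision you need also matters.

```python
# Integer types in increasing width, with their signed value ranges.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(lowest: int, highest: int) -> str:
    """Return the narrowest integer type that holds [lowest, highest]."""
    if lowest > highest:
        raise ValueError("lowest must not exceed highest")
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError("range does not fit in a 64-bit integer")
```

For example, a field holding HTTP status codes (100 to 599) fits in `short`, while a percentage from 0 to 100 fits in `byte`.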
https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-create-index.html#mappings
https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules.html#dynamic-index-settings
https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-scroll.html