Official Recommendations for ElasticSearch Performance Optimization

ES Manual

How to improve the performance of ES

Do not return large result sets

ES is designed as a search engine, so it is good at returning the top few documents that match a query. It is not good at returning every document that matches. If you need to retrieve a large number of documents, use the Scroll API.
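A minimal sketch of a Scroll loop. It assumes a hypothetical `post(path, body)` helper that sends a JSON request to the cluster and returns the decoded response; the helper, the page size, and the keep-alive value are illustrative choices, not something the manual prescribes.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: (path, JSON body) -> decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def scroll_all(post: Post, index: str, query: Dict[str, Any],
               page_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one scroll page at a time."""
    # The first request opens the scroll context and returns the first page.
    page = post(f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Later requests pass the scroll id returned by the previous page.
        page = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})
```

A real client would probably also clear the scroll context once it has finished reading.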

Avoid sparsity

Because ES relies on Lucene to index and store data, it is more efficient with dense data, that is, when all documents have the same fields. Lucene identifies each document by an integer doc ID, and for norms it reserves one byte per document whether or not the document has a value for the field. Sparsity mainly affects norms and doc_values. Some recommendations for avoiding it:

Avoid putting irrelevant data in the same index

Normalize document structures

Use the same field names to store the same data.

Avoid types

Do not use norms and doc_values on sparse fields

Adjust indexing speed

Use bulk requests

Keep each request under a few tens of megabytes. Requests that are too large put the cluster under memory pressure.
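As a rough sketch of the size limit, the helper below groups documents into NDJSON bodies for the `_bulk` endpoint and starts a new body before a limit is exceeded. The function name and the 10 MB default are assumptions for illustration; the right size depends on the cluster and is best found by testing.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def bulk_bodies(index: str, docs: Iterable[Dict[str, Any]],
                max_bytes: int = 10 * 1024 * 1024) -> Iterator[str]:
    """Group documents into NDJSON bodies for the _bulk endpoint."""
    lines, size = [], 0
    for doc in docs:
        # Each document is an action line followed by a source line.
        pair = (json.dumps({"index": {"_index": index}}) + "\n"
                + json.dumps(doc) + "\n")
        pair_size = len(pair.encode("utf-8"))
        # Start a new body when adding this pair would exceed the limit.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)
```

A single document larger than `max_bytes` is still sent, in a body of its own.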

Send data to ES using multiple workers/threads

Index from multiple processes or threads. If you see TOO_MANY_REQUESTS (429) responses (EsRejectedExecutionException with the Java client), ES cannot keep up with the current indexing rate. To find the right number of workers, increase it step by step until the I/O or CPU of the cluster is saturated.
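One common way to react to a 429 is to pause and retry with randomized exponential backoff. The sketch below assumes a hypothetical `send` callable that performs one bulk request and returns its HTTP status code; the retry count and base delay are arbitrary illustrative values.

```python
import random
import time
from typing import Callable


def send_with_backoff(send: Callable[[], int], max_retries: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` again while it keeps returning HTTP 429."""
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Wait a random time of up to base_delay * 2**attempt seconds.
        sleep(random.uniform(0, base_delay * 2 ** attempt))
        status = send()
    return status
```

The function returns the last status it saw, so the caller can still tell when the cluster kept rejecting the request after all retries.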

Increase refresh interval

index.refresh_interval defaults to 1s. Raising it, for example to 30s, reduces merge pressure.

Disable refresh and replicas when loading a large amount of data

Set index.refresh_interval to -1 and index.number_of_replicas to 0 for the duration of the load, then restore the original values.
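The two settings can be expressed as request bodies for the index settings endpoint (`PUT /<index>/_settings`). This is a sketch only: the helper name is made up, and the values restored afterwards (30s, one replica) are assumptions that should match whatever the index normally uses.

```python
import json
from typing import Any, Dict, Tuple


def bulk_load_settings(refresh_interval: str = "1s",
                       replicas: int = 1) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (before_load, after_load) bodies for PUT /<index>/_settings."""
    before = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
    # After the load, put back the values the index normally runs with.
    after = {"index": {"refresh_interval": refresh_interval,
                       "number_of_replicas": replicas}}
    return before, after


before, after = bulk_load_settings(refresh_interval="30s", replicas=1)
print(json.dumps(before))
print(json.dumps(after))
```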

Disable swapping

Make sure the operating system does not swap the ES Java process out to disk.

Give memory to the filesystem cache

The filesystem cache is used to buffer I/O operations. Give at least half the memory of the machine running ES to the filesystem cache.

Use faster hardware

  • Use SSD as storage device.
  • Use local storage, avoid NFS or SMB
  • Beware of using virtual storage such as Amazon's EBS

Index buffer size

indices.memory.index_buffer_size defaults to 10% of the JVM heap. Make sure it is large enough to give each shard that is doing heavy indexing up to 512MB of indexing buffer.
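A quick way to check the arithmetic, assuming the 10% default and the 512MB-per-shard figure quoted above; the function name is hypothetical.

```python
def heavy_shards_supported(heap_bytes: int, buffer_percent: int = 10,
                           per_shard_bytes: int = 512 * 1024 ** 2) -> int:
    """Count the heavily indexing shards that can each get a full buffer."""
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    buffer_bytes = heap_bytes * buffer_percent // 100
    return buffer_bytes // per_shard_bytes


# A 10GB heap gives a 1GB index buffer, enough for two such shards.
print(heavy_shards_supported(10 * 1024 ** 3))
```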

Adjust search speed

Give memory to the filesystem cache

Give at least half of the available memory to the filesystem cache.

Use faster hardware

  • Use SSD as storage device.
  • Use a CPU with better performance, high concurrency
  • Use local storage, avoid NFS or SMB
  • Beware of using virtual storage such as Amazon's EBS

Document modeling

Avoid joins. nested can make queries several times slower, and parent-child relations can make queries hundreds of times slower. If the same question can be answered without joins by denormalizing documents, queries will be faster.
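As an illustration of denormalizing, the sketch below copies the fields a search needs from a related record into the document itself, so that no join is required at query time. The order and customer structure and the field names are invented for the example.

```python
from typing import Any, Dict


def denormalize(order: Dict[str, Any], customer: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the customer fields a search needs into the order document."""
    doc = dict(order)
    # Flat, prefixed fields can be queried without nested or parent-child.
    doc["customer_name"] = customer["name"]
    doc["customer_country"] = customer["country"]
    return doc


order = {"order_id": 7, "total": 42.5}
customer = {"name": "Ada", "country": "UK", "email": "ada@example.com"}
print(denormalize(order, customer))
```

The trade-off is that a change to the customer record has to be written to every order document that copied it.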

Pre-indexed data

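Use the patterns in your queries to decide how data is indexed. For example, if most queries aggregate a price field over the same fixed list of ranges, index the range as a field of its own and aggregate on that field instead.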
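One way to pre-index data, assuming queries mostly bucket a price field into the same fixed ranges: compute the bucket once at index time and store it as a keyword field. The field name `price_range` and the bucket boundaries below are illustrative.

```python
from typing import Any, Dict


def add_price_range(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `doc` with a precomputed `price_range` label."""
    price = doc["price"]
    if price < 10:
        label = "0-10"
    elif price < 100:
        label = "10-100"
    else:
        label = "100+"
    return {**doc, "price_range": label}


print(add_price_range({"designation": "spoon", "price": 13}))
```

A terms aggregation on `price_range` can then replace a range aggregation on `price`.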

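Mappings

Data that happens to be numeric does not have to be mapped as integer or long. Identifiers such as an ISBN or a record ID from another database may be better mapped as keyword.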

Avoid scripts

If you really have to use them, use painless and expressions.

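Force-merge read-only indices

https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-forcemerge.html
Do not force-merge indices that are still being written to.

Warm up global ordinals

Warm up the filesystem cache

Use index.store.preload. If there is not enough memory to hold the preloaded data, this makes search slower.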

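Adjust disk usage

Disable features you do not need

  • If you do not need to filter on a field, disable indexing for it: "index": false
  • If you do not need scoring on a text field, disable norms: "norms": false
  • If you do not need phrase queries, do not index positions: "index_options": "freqs"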
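The three parameters in the list above might look like this in the properties section of a mapping. The field names and types are made up for illustration; only the mapping parameters come from the list.

```python
import json

# Hypothetical fields; each one switches off a single feature.
properties = {
    # Never filtered on, so no search index is built for it.
    "session_length": {"type": "integer", "index": False},
    # Matched but never scored, so norms are not needed.
    "tags": {"type": "text", "norms": False},
    # Term frequencies are kept, positions are not, so no phrase queries.
    "body": {"type": "text", "index_options": "freqs"},
}

print(json.dumps(properties, indent=2))
```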

Do not use the default dynamic string mappings

Do not use _all

Use best_compression

Use the smallest numeric type that is sufficient

byte,short,integer,long
half_float,float,double
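For the integer types, the choice can be made mechanically from the range of values a field will hold. The sketch below relies on byte, short, integer, and long being signed 8, 16, 32, and 64-bit types; the function name is hypothetical.

```python
def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Pick the narrowest integer type that holds [min_value, max_value]."""
    for name, bits in (("byte", 8), ("short", 16),
                       ("integer", 32), ("long", 64)):
        # Signed range for this width, e.g. -128..127 for 8 bits.
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


print(smallest_integer_type(0, 100))
print(smallest_integer_type(0, 50000))
```

Choosing between half_float, float, and double also depends on the precision required, so it is not covered here.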

https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-create-index.html#mappings
https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules.html#dynamic-index-settings
https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-scroll.html
