Elasticsearch: What does "near real time" mean? Understanding the translog, refresh, and flush mechanisms


This article introduces Elasticsearch's indexing mechanism and explores why its queries are near-real-time, how it uses the translog to keep data safe, and how the translog parameters can be tuned in a production environment to maximize performance.

It focuses on two common Elasticsearch operations, refresh and flush, and on how these two interfaces make data searchable and durable.

1 The translog: a write-ahead log (WAL) for data persistence

1.1 fsync and writing data to disk

When we write data to a file, the operating system usually places it first in the page cache of its virtual file system, which lives in memory. We then need to call fsync to flush the cached data to the physical disk; until fsync completes, a power failure can lose the data. This principle should be familiar to most readers.
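As a minimal sketch of this principle (not Elasticsearch code; the file names are arbitrary), `dd` on a POSIX system can demonstrate the difference: a plain write may sit only in the page cache, while `conv=fsync` forces an fsync(2) before `dd` exits:

```shell
# A plain write usually lands only in the OS page cache first;
# a power failure right now could lose it.
dd if=/dev/zero of=cached.bin bs=4096 count=256 2>/dev/null

# conv=fsync makes dd call fsync(2) before exiting, so the bytes
# are durable on disk even if power is lost immediately afterwards.
dd if=/dev/zero of=durable.bin bs=4096 count=256 conv=fsync 2>/dev/null

wc -c < durable.bin   # prints 1048576
```

Both files end up identical; the only difference is when the data is guaranteed to be on disk.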

1.2 ES Write Ahead Log

Elasticsearch must persist all changes to disk for high reliability.

Under the hood, Elasticsearch uses the Lucene library to implement its inverted index. In Lucene's terminology, each record is called a document; Lucene stores data in segments, and a commit point records the metadata of all segments.

A record can only be searched once it has been written into a segment. This point is important: it is the reason Elasticsearch is near-real-time rather than real-time, as explained below.

Elasticsearch records every operation in the translog, its write-ahead log. When we add a new record, Elasticsearch writes it both to the translog and to an in-memory buffer, as shown in the figure below:

[Figure: a new document is written to both the in-memory buffer and the translog]

1.3 Translog sequential write

The translog is fsynced in real time: as Elasticsearch writes data, the corresponding translog entries are written to disk immediately, as sequential appends to a file, so the disk-write performance is very high.

Once data has been written to the translog, the original operation is guaranteed to be on disk, which further ensures data reliability.
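The append-and-fsync pattern can be sketched in a few lines of shell (a toy illustration only; the file name and operations are invented, and `sync FILE` needs GNU coreutils 8.24 or newer):

```shell
WAL=translog.sketch
rm -f "$WAL"

# Each operation is appended to the end of the file (pure sequential
# I/O) and then fsynced; existing entries are never rewritten in place.
for op in "index doc1" "index doc2" "delete doc1"; do
  printf '%s\n' "$op" >> "$WAL"
  sync "$WAL"    # fsync just this file
done

wc -l < "$WAL"   # prints 3
```

Because every write goes to the end of the file, the disk head (or SSD write path) never seeks, which is why sequential WAL appends are so much cheaper than random writes.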

The in-memory buffer and the translog are the key to near-real-time search. As mentioned above, a new document must be written into a segment before it can be searched, so data that has only reached the in-memory buffer is not yet searchable. If you want a document to be searchable immediately, you need to call the refresh operation manually.

2 refresh: write a new segment to the OS virtual file system and open it for search

By default, Elasticsearch executes a refresh every second; the interval can be changed via the index.refresh_interval setting. What does a refresh actually do?

A refresh does the following:

  1. Write all documents in the in-memory buffer into a new segment, without calling fsync, so this data can still be lost (the segment exists only in the page cache)
  2. Open the new segment so that the documents in it can be searched
  3. Clear the in-memory buffer

The state after a refresh (note: the translog is still on disk) is shown in the figure below:
[Figure: after refresh, the new segment is open in the page cache; the translog remains on disk]

2.1 Reasons for near real time

Because refresh writes the in-memory buffer's data into a segment file, and that segment file lives in the virtual file system's page cache, the new segment can be opened and searched without waiting for an fsync to disk.

Therefore, the data in the memory buffer cannot be retrieved before the new segment file is formed.

Refresh runs once per second by default, so a newly inserted document typically becomes searchable only after up to one second. This is why Elasticsearch is called near-real-time.
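The behavior can be observed against a local cluster. The commands below are a sketch (the index name `my_index`, type name `doc`, and a cluster at localhost:9200 are all assumptions, not from the original article):

```shell
# Index a document; it lands in the in-memory buffer and translog,
# and is not yet searchable.
curl -XPOST 'localhost:9200/my_index/doc' -d '{ "title" : "hello" }'

# An immediate search may miss the new document...
curl 'localhost:9200/my_index/_search?q=title:hello'

# ...unless a refresh runs first, opening a new segment for search.
curl -XPOST 'localhost:9200/my_index/_refresh'
curl 'localhost:9200/my_index/_search?q=title:hello'
```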

2.2 refresh in practice

A refresh is relatively expensive. In my own environment, a refresh after indexing 100,000 records took about 14 ms. When building an index in bulk, you can therefore set the refresh interval to -1 to temporarily disable automatic refresh, and re-enable it once the bulk indexing is done. You can change this setting through the following interface:

curl -XPUT 'localhost:9200/test/_settings' -d '{ "index" : { "refresh_interval" : "-1" }}'

In addition, when bulk indexing you can consider setting the number of replicas to 0. When documents are replicated from the primary shard to replica shards, each replica repeats the same analysis, indexing, and merging work, which is costly. You can instead enable replicas after the index is built, so that the data only needs to be copied from the primary shard to the replicas:

 curl -XPUT 'localhost:9200/my_index/_settings' -d ' { "index" : { "number_of_replicas" : 0 }}'

After executing the batch index, change the refresh interval back:

 curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "refresh_interval" : "1s" } }'

You can also force a refresh and merge of index segments:

 curl -XPOST 'localhost:9200/my_index/_refresh'
 curl -XPOST 'localhost:9200/my_index/_forcemerge?max_num_segments=5'

3 flush: fsync the segments in the virtual file system to disk and clear the translog

As the translog file grows larger and larger, the in-memory data eventually needs to be flushed to disk. This process is called flush, and it mainly performs the following steps:

  1. Write all documents in the in-memory buffer into a new segment
  2. Clear the in-memory buffer
  3. Write a commit point to disk
  4. fsync the segments in the virtual file system's page cache to disk
  5. Delete the old translog file; at this point the in-memory segments have been written to disk, so the translog is no longer needed to guarantee data safety
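A flush can also be triggered by hand, which is occasionally useful before a planned restart. As above, the index name and local cluster are assumptions for illustration:

```shell
# Trigger a flush manually; segments are fsynced and the translog cleared.
curl -XPOST 'localhost:9200/my_index/_flush'

# Inspect translog statistics (size, number of operations) afterwards.
curl 'localhost:9200/my_index/_stats/translog'
```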

The status after flush is as follows:
[Figure: after flush, segments have been fsynced to disk and the translog has been cleared]

Elasticsearch uses several conditions to decide when to flush to disk, and the parameters differ between versions; consult the documentation for your version (see "es translog"). The flush parameters in version 1.7 are:

index.translog.flush_threshold_ops — flush after this many operations; unlimited by default
index.translog.flush_threshold_size — flush once the translog exceeds this size; 512mb by default
index.translog.flush_threshold_period — force a flush after this much time; 30m by default
index.translog.interval — how often Elasticsearch checks whether the translog meets a flush condition

The parameters above control how often Elasticsearch performs a flush. During recovery, Elasticsearch compares the translog against the segments to guarantee data integrity. For safety, Elasticsearch also fsyncs the translog to disk every 5 seconds by default, which means that on a power failure it loses at most 5 seconds of data. If you are more sensitive to data safety, you can shorten this interval, or fsync the translog after every request, at the cost of more resources. The interval is controlled by the following two parameters:

index.translog.sync_interval — how often the translog is fsynced to disk; minimum 100ms
index.translog.durability — whether the translog is fsynced every 5 seconds or on every request. Two values: request (fsync on every request; Elasticsearch returns success only after the translog has been fsynced to disk) and async (the default; the translog is fsynced every 5 seconds)
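For example, an index can be switched to per-request durability via the settings API (the index name is an assumption; the setting names match the two parameters above):

```shell
# Fsync the translog on every request: safest, but slowest.
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{ "index" : { "translog" : { "durability" : "request" } } }'

# Or keep async fsync but shorten the window from 5s to 1s.
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{ "index" : { "translog" : { "durability" : "async", "sync_interval" : "1s" } } }'
```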

Readers should be clear about the difference between flush and the translog fsync: a flush writes the in-memory data (both the segments and the translog state) durably to disk, while the periodic fsync only flushes the translog to disk (to guarantee that no operations are lost).


Origin blog.csdn.net/hugo_lei/article/details/106519069