Refresh and Flush Operations in Elasticsearch

When you first meet these two concepts, they can seem identical: both appear to make documents searchable after they are indexed. They are in fact different operations.
Elasticsearch (ES) is built on Lucene, so we first introduce three Lucene concepts: segment, reopen, and commit.
Segment
In ES, the basic storage unit is the shard. One level down, each ES shard is a Lucene index, and a Lucene index is made up of multiple segments. Each segment is an inverted index over a subset of the documents: a mapping from terms (words) to the documents that contain them.
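
The shard / segment / inverted index relationship described above can be sketched in a few lines of Python. This is a toy model only, not Lucene's actual data structures; the names `build_segment` and `search_shard` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def build_segment(docs):
    """Build a toy inverted index (term -> sorted doc ids) for one segment."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_shard(segments, term):
    """A shard-level search consults every segment and merges the hits."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(term.lower(), []))
    return sorted(hits)


# One shard (a Lucene index) made of two segments.
segment_a = build_segment({1: "refresh makes data searchable", 2: "flush persists data"})
segment_b = build_segment({3: "translog records every operation"})
shard = [segment_a, segment_b]

print(search_shard(shard, "data"))      # [1, 2]
print(search_shard(shard, "translog"))  # [3]
```

Each segment is searched independently, which is why a shard can gain new segments without touching the old ones.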

Newly indexed documents are always written to a new segment, so existing segments never need to be modified. When a document is deleted, it is only marked as deleted in the segment that holds it; it is not physically erased from disk at that point. An update works the same way: the old version is marked as deleted (a tombstone) and the new version is written to a new segment.
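
The write-once behaviour of segments can be illustrated with a small model. This is a sketch under simplifying assumptions: the `Shard` class and its methods are hypothetical, and real Lucene tracks deletions per segment in its own file formats rather than in a Python set.

```python
class Shard:
    """Toy shard: a list of write-once segments plus a set of delete markers."""

    def __init__(self):
        self.segments = []    # each segment maps doc_id -> text and is never edited
        self.deleted = set()  # (segment_index, doc_id) pairs marked as deleted

    def add_segment(self, docs):
        # New data always goes into a brand-new segment.
        self.segments.append(dict(docs))

    def delete(self, doc_id):
        # Mark live copies as deleted instead of rewriting the segment.
        for i, segment in enumerate(self.segments):
            if doc_id in segment:
                self.deleted.add((i, doc_id))

    def update(self, doc_id, text):
        # An update is a tombstone on the old copy plus a new copy in a new segment.
        self.delete(doc_id)
        self.add_segment({doc_id: text})

    def get(self, doc_id):
        for i, segment in enumerate(self.segments):
            if doc_id in segment and (i, doc_id) not in self.deleted:
                return segment[doc_id]
        return None


shard = Shard()
shard.add_segment({1: "v1", 2: "hello"})
shard.update(1, "v2")
shard.delete(2)

print(shard.get(1))       # v2
print(shard.get(2))       # None
print(shard.segments[0])  # {1: 'v1', 2: 'hello'} (old segment left untouched)
```

The old data still occupies space in the first segment after the update and the delete; in this model nothing ever reclaims it.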
Lucene Reopen
A reopen makes newly written data searchable. Data that is searchable after a reopen has not necessarily been persisted to disk.
Lucene Commit
A commit makes data durable: on every commit, the data in the segments is persisted to disk. This makes the data safer, but each commit is expensive because it triggers a large number of I/O operations.
Translog
ES adds its own mechanism for durability, the translog (transaction log). When a document is indexed, it is added to the in-memory buffer and also appended to the translog.

ES Refresh

By default, ES refreshes every second. Each refresh copies the contents of the in-memory buffer into a newly created segment. This step happens in memory, and once it completes the new documents can be searched. This is why ES search is near real-time: it takes about 1 second for newly indexed data to become searchable.
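
The interplay of memory buffer, translog, and refresh can be sketched as follows. This is a minimal model, not how ES is implemented: `ToyIndex` is a hypothetical class, and the refresh is called by hand here instead of by a one-second timer.

```python
class ToyIndex:
    """Toy model of the indexing path: buffer + translog, made searchable by refresh."""

    def __init__(self):
        self.buffer = []    # newly indexed documents, not yet searchable
        self.translog = []  # record of every operation since the last flush
        self.segments = []  # searchable segments (still only in memory here)

    def index(self, doc):
        # A document goes into the buffer and into the translog.
        self.buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        # Move the buffer into a new searchable segment; the translog is untouched.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, word):
        return [doc for segment in self.segments for doc in segment if word in doc.split()]


idx = ToyIndex()
idx.index("refresh makes data searchable")
print(idx.search("refresh"))  # []
idx.refresh()
print(idx.search("refresh"))  # ['refresh makes data searchable']
print(len(idx.translog))      # 1
```

The document is invisible to search until the refresh runs, and the refresh itself leaves the translog alone.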

ES Flush

A flush writes all documents in the in-memory buffer to a new Lucene segment, commits all in-memory segments to disk, and then clears the translog.

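Flushes normally happen at much longer intervals than refreshes: every 30 minutes by default, or earlier when the translog reaches a certain size. Note that the 30-minute timer applies to older ES versions; newer versions rely on the translog size threshold (index.translog.flush_threshold_size).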
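
A flush and its size-based trigger can be modelled like this. It is a sketch with loud simplifications: `ToyShard` and `flush_threshold_ops` are hypothetical, the threshold counts operations where ES measures the translog size in bytes, and "disk" is just another Python list.

```python
class ToyShard:
    """Toy model of flush: commit in-memory segments and clear the translog."""

    def __init__(self, flush_threshold_ops=3):
        self.buffer = []
        self.translog = []
        self.memory_segments = []  # searchable but not yet committed
        self.disk_segments = []    # committed (stands in for data fsynced to disk)
        self.flush_threshold_ops = flush_threshold_ops  # assumption: op count, not bytes
        self.flush_count = 0

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)
        # Trigger a flush once the translog has grown past the threshold.
        if len(self.translog) >= self.flush_threshold_ops:
            self.flush()

    def refresh(self):
        if self.buffer:
            self.memory_segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        self.refresh()                                   # buffer -> new segment
        self.disk_segments.extend(self.memory_segments)  # commit segments
        self.memory_segments = []
        self.translog = []                               # translog is cleared
        self.flush_count += 1


shard = ToyShard(flush_threshold_ops=3)
shard.index("a")
shard.index("b")
print(shard.flush_count, len(shard.translog))  # 0 2
shard.index("c")
print(shard.flush_count, len(shard.translog))  # 1 0
print(shard.disk_segments)                     # [('a', 'b', 'c')]
```

The third write pushes the translog over the threshold, so the flush runs without waiting for any timer.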

Summary

In short, a refresh makes the latest data searchable almost immediately, while a flush persists data to disk. ES searches are served from segments in memory, so whether a flush has run does not affect which data can be searched.
The translog is normally cleared during a flush and is persisted to disk on fsync and commit. In 6.x and later versions, the default is to fsync the translog to disk after every request. This behavior can be changed through the index.translog.* settings.
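
As a rough illustration of the settings involved, the snippet below builds an index settings body in Python. The helper `translog_settings` is hypothetical. The setting names (index.refresh_interval, index.translog.durability, index.translog.sync_interval, index.translog.flush_threshold_size) are real ES index settings, but the default values used here (1s, request, 5s, 512mb) and whether each setting can be changed on a live index should be checked against the documentation for your ES version.

```python
import json


def translog_settings(durability="request", sync_interval="5s",
                      flush_threshold_size="512mb", refresh_interval="1s"):
    """Build an index settings body (hypothetical helper, values are assumptions)."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "translog": {
                # "request": fsync the translog after every request.
                # "async": fsync in the background every sync_interval.
                "durability": durability,
                "sync_interval": sync_interval,
                "flush_threshold_size": flush_threshold_size,
            },
        }
    }


# Trade some durability for indexing throughput by fsyncing in the background.
body = translog_settings(durability="async")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to the index settings API of the target index. With "async" durability, operations acknowledged since the last background fsync can be lost if the node crashes.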

Origin blog.51cto.com/13981400/2402526