Why is Elasticsearch search said to be near real-time?


From the previous two articles, we should already have a rough idea of how Elasticsearch processes data. Between Elasticsearch and the disk sits one more layer of cache: the operating system's filesystem cache. It is this layer that gives Elasticsearch its fast search response.


We all know that an index is made up of several segments. If a newly indexed document could only be searched after its segment had been committed to disk, it could take minutes before the document showed up in search results. Why such a large delay? The bottleneck is mainly the disk.

Persisting a segment requires an fsync, which makes sure the segment is physically written to disk so that data loss is truly avoided. But fsync is expensive, so it cannot be run after every indexed document; if it were, both indexing and search latency would become very large.
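
To make the difference between "written to the cache" and "written to disk" concrete, here is a minimal Python sketch. It is only an illustration of the operating-system behaviour, not Elasticsearch code; `write_segment` and the file name `segment_1` are hypothetical. A normal write plus flush only hands the bytes to the filesystem cache, while `os.fsync` is the slow call that forces them onto the physical disk.

````python
import os
import tempfile


def write_segment(path: str, data: bytes, durable: bool) -> None:
    """Write a toy 'segment' file (hypothetical helper, not Elasticsearch code).

    durable=False: the bytes only reach the OS filesystem cache.
    durable=True: additionally call os.fsync, which blocks until the OS
    has pushed the data to the physical disk (the expensive part).
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # move Python's user-space buffer into the OS cache
        if durable:
            os.fsync(f.fileno())  # force the cached data onto the disk


with tempfile.TemporaryDirectory() as tmp:
    seg = os.path.join(tmp, "segment_1")
    write_segment(seg, b"doc-1", durable=False)
    # Readers can already see the data (served from the filesystem cache),
    # even though it may not have reached the disk yet.
    with open(seg, "rb") as f:
        print(f.read())  # b'doc-1'
````

This is the same property Elasticsearch relies on: data sitting in the filesystem cache is already readable, long before it is durable.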

A lighter-weight mechanism is therefore needed to keep search latency low, and this is where the filesystem cache mentioned above comes in. Newly added documents are first collected in an in-memory indexing buffer. The buffer is then written out as a new segment directly into the filesystem cache, which is cheap. Only later, after a certain interval or when triggered externally, is the segment flushed to disk, which is expensive. As soon as the segment file is in the cache, however, it can be opened and searched, so new documents become findable within a short time without a full commit or fsync. Because this step is so light, it can run at high frequency without hurting Elasticsearch's performance.
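
The flow just described (buffer, then cheap segment in the cache, then expensive flush to disk) can be modelled with a toy Python class. This is a deliberately simplified sketch, not Elasticsearch's real implementation: `ToyShard` and its methods are hypothetical names, "searching" is reduced to exact document matching, and details such as the transaction log are ignored.

````python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of one shard (hypothetical, NOT Elasticsearch internals)."""

    buffer: list = field(default_factory=list)              # in-memory indexing buffer
    cached_segments: list = field(default_factory=list)     # in filesystem cache, searchable
    committed_segments: list = field(default_factory=list)  # fsynced to disk

    def index(self, doc: str) -> None:
        # New documents only land in the buffer: not searchable yet.
        self.buffer.append(doc)

    def refresh(self) -> None:
        # Cheap: write the buffer out as a new segment in the cache and
        # open it for search. No fsync happens here.
        if self.buffer:
            self.cached_segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self) -> None:
        # Expensive: refresh first, then fsync the cached segments to disk.
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments = []

    def search(self, doc: str) -> bool:
        # Every opened segment is searchable, whether or not it is on disk.
        return any(doc in seg for seg in self.cached_segments + self.committed_segments)


shard = ToyShard()
shard.index("hello")
print(shard.search("hello"))  # False: still only in the buffer
shard.refresh()
print(shard.search("hello"))  # True: searchable without any fsync
````

The point of the model is that `search` never looks at the buffer, but it does look at segments that exist only in the cache.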

In Elasticsearch, this lightweight operation of writing a segment into the cache and opening it for search is called a refresh. By default, every shard in the cluster is refreshed automatically once per second. This is why Elasticsearch is described as a near real-time search engine rather than a real-time one: after a document is inserted into an index, it can take up to 1 second before it becomes searchable. The default is a trade-off between writing and querying: it keeps indexing efficient while still letting Elasticsearch retrieve data in near real time.
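
The "up to 1 second" figure follows from simple arithmetic. The sketch below assumes refreshes fire exactly at 0, interval, 2 × interval, and so on, and that a refresh takes no time; both are simplifications, and `visible_at` is a hypothetical helper. Under those assumptions a document waits at most one refresh interval.

````python
import math
from typing import Optional


def visible_at(index_time: float, refresh_interval: float) -> Optional[float]:
    """When does a document indexed at `index_time` (seconds) become searchable?

    Assumes refreshes fire at 0, interval, 2*interval, ... and are instantaneous.
    A non-positive interval is treated here as "refresh disabled" (a toy
    convention standing in for refresh_interval = -1).
    """
    if refresh_interval <= 0:
        return None
    # The first refresh strictly after the document arrives picks it up.
    return (math.floor(index_time / refresh_interval) + 1) * refresh_interval


print(visible_at(0.3, 1.0))   # 1.0  -> waited 0.7 s
print(visible_at(1.2, 30.0))  # 30.0 -> waited 28.8 s with a 30 s interval
````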


A refresh can also be triggered manually:

````
# Refresh all indices
POST /_refresh

# Refresh only the blogs index
POST /blogs/_refresh
````
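
The console commands above are plain HTTP requests. As a sketch of what they look like outside the console, the Python snippet below only builds the requests with the standard library's `urllib.request.Request` and does not send them. The node address `http://localhost:9200` and the helper name `refresh_request` are assumptions for illustration.

````python
from typing import Optional
from urllib.request import Request

# Assumption: a local node listening on the default HTTP port.
ES_URL = "http://localhost:9200"


def refresh_request(index: Optional[str] = None, base_url: str = ES_URL) -> Request:
    """Build (but do not send) `POST /_refresh` or `POST /<index>/_refresh`."""
    path = f"/{index}/_refresh" if index else "/_refresh"
    return Request(base_url + path, method="POST")


req = refresh_request("blogs")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/blogs/_refresh
# urllib.request.urlopen(req) would send it to a running cluster.
````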


A refresh is much lighter than a commit, but it still has a cost, so running a refresh after every insert is not recommended. The default 1-second delay is acceptable for most scenarios.
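
One way to see the cost: each refresh that finds new documents in the buffer creates a new segment. The toy count below (hypothetical helpers and made-up arrival times) compares refreshing after every insert with a periodic refresh. Per-insert refreshes produce one tiny segment per document, while a periodic refresh groups everything that arrived within the same interval into one segment, leaving fewer small segments to search and later merge.

````python
import math


def segments_per_doc_refresh(arrivals: list) -> int:
    # Refreshing after every insert creates one tiny segment per document.
    return len(arrivals)


def segments_periodic_refresh(arrivals: list, interval: float) -> int:
    # Documents arriving within the same interval share one segment;
    # intervals with no new documents create no segment at all.
    return len({math.floor(t / interval) for t in arrivals})


arrivals = [0.1, 0.2, 0.5, 1.3, 1.4]  # hypothetical arrival times in seconds
print(segments_per_doc_refresh(arrivals))        # 5
print(segments_periodic_refresh(arrivals, 1.0))  # 2
````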


Of course, not every use case needs a refresh every second. If you need to index a large amount of data in a short time, you can set a longer refresh interval to improve write throughput, for example when creating the index:

````
PUT /my_logs
{
  "settings": {
    "refresh_interval": "30s"
  }
}
````
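
For readers calling the REST API from code, the sketch below builds the same `PUT /my_logs` request with a JSON settings body using only the Python standard library. The request is constructed but not sent; the node address and the helper name `create_index_request` are assumptions.

````python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: local node on the default port


def create_index_request(index: str, refresh_interval: str, base_url: str = ES_URL) -> Request:
    """Build (but do not send) a `PUT /<index>` request with refresh_interval set."""
    body = {"settings": {"refresh_interval": refresh_interval}}
    return Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("my_logs", "30s")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/my_logs
print(req.data.decode())               # {"settings": {"refresh_interval": "30s"}}
````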


The refresh_interval setting can be updated dynamically on an existing index at any time. When bulk loading a large amount of data, you can disable refresh entirely first and re-enable it once the load has finished, which can greatly speed up indexing.

The commands are:
````
# Disable the refresh mechanism
PUT /my_logs/_settings
{ "refresh_interval": -1 }

# Refresh every second again
PUT /my_logs/_settings
{ "refresh_interval": "1s" }
````
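
The disable-then-re-enable pattern is easy to get wrong if the load fails halfway and refresh is never switched back on. Below is a minimal Python sketch of the pattern. `put_settings(index, body)` is a hypothetical hook standing in for `PUT /<index>/_settings` (wire it to whatever client you use), and `load` is a hypothetical callable that performs the bulk indexing.

````python
from typing import Callable


def bulk_load_with_refresh_disabled(
    put_settings: Callable[[str, dict], None],
    index: str,
    load: Callable[[], None],
    restore_interval: str = "1s",
) -> None:
    """Disable refresh, run the bulk load, then always re-enable refresh.

    `put_settings` and `load` are hypothetical hooks, not a real client API.
    """
    put_settings(index, {"refresh_interval": -1})  # stop periodic refreshes
    try:
        load()  # index the large batch of documents
    finally:
        # Re-enable refresh even if the load fails; otherwise new documents
        # would never be made searchable by the periodic refresh.
        put_settings(index, {"refresh_interval": restore_interval})


calls = []
bulk_load_with_refresh_disabled(lambda idx, body: calls.append((idx, body)), "my_logs", lambda: None)
print(calls)
# [('my_logs', {'refresh_interval': -1}), ('my_logs', {'refresh_interval': '1s'})]
````

Using `try`/`finally` keeps the restore step tied to the disable step, so the index is not left with refresh turned off by accident.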




Note that refresh_interval expects a duration with a time unit, such as 1s or 30s. A bare number such as 1 means the index is refreshed every 1 millisecond, so be careful when setting this parameter.
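
A simple safeguard is to validate the value before sending it. The sketch below is a hypothetical, deliberately strict client-side check (not Elasticsearch's own parser): it accepts -1 and whole numbers followed by one of Elasticsearch's time units (d, h, m, s, ms, micros, nanos), and rejects a bare number like 1.

````python
import re

# Elasticsearch time units: d, h, m, s, ms, micros, nanos.
_DURATION = re.compile(r"^\d+(d|h|m|s|ms|micros|nanos)$")


def check_refresh_interval(value) -> str:
    """Hypothetical client-side guard against a unit-less refresh_interval.

    Allows -1 (disable refresh) and values like "1s" or "30s"; raises
    ValueError for a bare number such as 1, which would mean 1 millisecond.
    """
    text = str(value).strip()
    if text == "-1" or _DURATION.match(text):
        return text
    raise ValueError(f"refresh_interval {value!r} needs a time unit, e.g. '1s' or '30s'")


print(check_refresh_interval("30s"))  # 30s
print(check_refresh_interval(-1))     # -1
````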

If you have any questions, you can follow the author's WeChat public account, woshigcs, and leave a message there. Neither technical debt nor health debt should be left unpaid. Let's walk the road of seeking the Tao together.
