Explore | How does Elasticsearch physically delete historical data for a given period?

1. Preface

When we think of deletion in Elasticsearch, the basic operation is delete, which divides into deleting documents and deleting indexes. For historical data, the obvious approach is to delete the data matching a given condition with delete_by_query.
In practice, two questions came up:
- After deleting documents, why did the disk usage not decrease immediately, and even increase?
- Besides a scheduled task plus delete_by_query, is there a better way?

2. Common delete operations

2.1 Deleting a single document

DELETE /twitter/_doc/1

2.2 Deleting documents that meet a given condition

POST twitter/_delete_by_query
{
  "query": { 
    "match": {
      "message": "some message"
    }
  }
}

Note: Version conflicts may occur during a batch delete, and by default the first conflict aborts the operation. To count conflicts and keep deleting instead, add conflicts=proceed:

POST twitter/_delete_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}
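
The delete_by_query calls above can also be assembled programmatically. The following is a minimal sketch, assuming Python's standard library only; the helper name build_delete_by_query is made up for illustration, and the function only builds the request without sending it.

```python
import json
from urllib.parse import quote, urlencode


def build_delete_by_query(index, query, conflicts_proceed=True):
    """Return (method, path, body) for a _delete_by_query call; nothing is sent."""
    # Keep '*' and ',' unescaped so index patterns such as logstash_* still work.
    path = "/" + quote(index, safe="*,") + "/_delete_by_query"
    if conflicts_proceed:
        # Count version conflicts instead of aborting on the first one.
        path += "?" + urlencode({"conflicts": "proceed"})
    body = json.dumps({"query": query})
    return "POST", path, body


method, path, body = build_delete_by_query(
    "twitter", {"match": {"message": "some message"}}
)
print(method, path)
print(body)
```

Keeping the request construction separate from the HTTP call makes it easy to log or review what will be deleted before running it.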

2.3 Deleting a single index

DELETE /twitter

2.4 Deleting all indexes

DELETE /_all

or

DELETE /*

Deleting all indexes is a very dangerous operation and should be done with caution.

3. What happens in the background when a document is deleted?

The response returned after the delete is executed:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "22",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 17
}

Interpretation:

Every document indexed is versioned.
When deleting a document, a version can be specified to ensure that the related document we are trying to delete is actually deleted and has not been changed in the meantime.

Every write operation performed on a document, including deletes, increments its version.
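
The "delete only if unchanged" check can be sketched as follows. This assumes a recent Elasticsearch version, where the check is expressed with the if_seq_no and if_primary_term parameters (the same _seq_no and _primary_term fields that appear in the response above) rather than with a version number; older versions may differ. The helper name and the seq_no value are hypothetical, and the code only builds the request path.

```python
from urllib.parse import urlencode


def conditional_delete_path(index, doc_id, seq_no, primary_term, doc_type="_doc"):
    """Build the path of a DELETE that succeeds only if the document is unchanged."""
    # seq_no and primary_term must come from the last read of the document.
    params = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"/{index}/{doc_type}/{doc_id}?{params}"


# Hypothetical values: the document was last seen at seq_no 1, primary term 17.
path = conditional_delete_path("test_index", "22", seq_no=1, primary_term=17)
print("DELETE", path)
```

If another write has touched the document in the meantime, its sequence number no longer matches and the delete is rejected with a version conflict instead of removing newer data.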

When the document is actually deleted:

Deleting a document doesn’t immediately remove the document from disk; it just marks it as deleted. Elasticsearch will clean up deleted documents in the background as you continue to index more data.
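
One way to see how many documents are marked as deleted but still on disk is the docs section of the index stats API (GET /<index>/_stats/docs), which reports docs.count and docs.deleted. A minimal sketch, assuming that response shape; the sample numbers are invented and the helper name is hypothetical.

```python
def deleted_ratio(stats):
    """Fraction of stored documents that are only marked as deleted."""
    docs = stats["_all"]["primaries"]["docs"]
    stored = docs["count"] + docs["deleted"]  # live + marked-as-deleted
    return docs["deleted"] / stored if stored else 0.0


# Invented sample shaped like the relevant part of a _stats/docs response.
sample = {"_all": {"primaries": {"docs": {"count": 900, "deleted": 100}}}}
ratio = deleted_ratio(sample)
print(f"{ratio:.0%} of stored documents are marked as deleted")
```

A ratio that stays high long after a bulk delete suggests that the background merges have not yet reclaimed the space.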

4. What is the difference between deleting an index and deleting a document?

1) Deleting an index will release space immediately, and there is no so-called "marking" logic.

2) Deleting a document does not remove it from its segment file; the old document is only marked as deleted, and an update additionally writes the new version elsewhere. This is why disk usage can grow right after a bulk delete. The space is released only when the segment that holds the marked document is merged: the background segment merge in ES drops documents marked as deleted while it rewrites the segment files.

However, a shard may have hundreds of segment files, and there is no guarantee of when a given segment will be picked for merging, so deleted documents can remain on disk for a long time. To release the space manually, run a force merge periodically with max_num_segments set to 1.

POST /_forcemerge?max_num_segments=1
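
The mark-then-merge behaviour can be illustrated with a toy model. This is not Elasticsearch's or Lucene's actual implementation; the Segment class and the delete and merge helpers are invented for illustration, and "size" simply counts stored documents.

```python
class Segment:
    """An immutable batch of documents plus a set of ids marked as deleted."""

    def __init__(self, docs):
        self.docs = dict(docs)   # id -> document body
        self.deleted = set()     # marked ids; their bodies still occupy space

    def size(self):
        return len(self.docs)    # stored documents, live or not


def delete(segments, doc_id):
    """Mark a document as deleted; nothing is removed from any segment."""
    for seg in segments:
        if doc_id in seg.docs:
            seg.deleted.add(doc_id)


def merge(segments):
    """Rewrite all segments into one, copying only the live documents."""
    live = {}
    for seg in segments:
        for doc_id, body in seg.docs.items():
            if doc_id not in seg.deleted:
                live[doc_id] = body
    return [Segment(live)]


segments = [Segment({1: "a", 2: "b"}), Segment({3: "c"})]
delete(segments, 2)
stored_before = sum(s.size() for s in segments)  # document 2 is still stored
segments = merge(segments)
stored_after = sum(s.size() for s in segments)   # document 2 is gone
print(stored_before, stored_after)
```

In this model the delete changes nothing about the stored size; only the merge, which rewrites the data, makes the deleted document disappear.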

5. How to save only the last 100 days of data?

With the above knowledge, the task of keeping only the last 100 days of data breaks down into:
- 1) use delete_by_query to delete the data older than 100 days;
- 2) perform a forcemerge operation to manually release disk space.

The delete script is as follows:

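#!/bin/sh
# Delete every document whose "pt" timestamp is older than 100 days.
# The URL must stay on one line; a line break inside the quotes breaks the request.
curl -XPOST "http://192.168.1.101:9200/logstash_*/_delete_by_query?conflicts=proceed" \
  -H 'Content-Type: application/json' -d '{
    "query": {
        "range": {
            "pt": {
                "lt": "now-100d",
                "format": "epoch_millis"
            }
        }
    }
}'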

The merge script is as follows:

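#!/bin/sh
# Rewrite only the segments that contain deleted documents, reclaiming their space.
# Recent ES versions reject only_expunge_deletes combined with max_num_segments,
# so only one of the two is passed here.
curl -XPOST 'http://192.168.1.101:9200/_forcemerge?only_expunge_deletes=true'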
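
The two steps can also be kept together in one place. The following is a minimal sketch using only Python's standard library; the function name retention_requests is hypothetical, the host is the one from the scripts above, and the requests are built but not sent. Only only_expunge_deletes is passed to the force merge, on the assumption that the target version treats it and max_num_segments as mutually exclusive.

```python
import json
from urllib.request import Request

BASE_URL = "http://192.168.1.101:9200"  # host taken from the scripts above


def retention_requests(index_pattern, field, days):
    """Build the delete and force-merge requests for a retention window."""
    delete_body = {
        "query": {
            "range": {field: {"lt": f"now-{days}d", "format": "epoch_millis"}}
        }
    }
    delete_req = Request(
        f"{BASE_URL}/{index_pattern}/_delete_by_query?conflicts=proceed",
        data=json.dumps(delete_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    merge_req = Request(
        f"{BASE_URL}/{index_pattern}/_forcemerge?only_expunge_deletes=true",
        method="POST",
    )
    return delete_req, merge_req


delete_req, merge_req = retention_requests("logstash_*", "pt", 100)
for req in (delete_req, merge_req):
    print(req.get_method(), req.full_url)
# To run them for real, pass each request to urllib.request.urlopen, in this order.
```

Running the merge only after the delete has finished matters, because the merge can only reclaim space for documents that are already marked as deleted.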

6. Is there a more general method?

Yes. Use Curator, the tool from the official ES website.

6.1 Introduction to curator

Main purpose: planning and managing ES indexes. It supports common operations such as create, delete, merge, reindex, and snapshot.

6.2 Curator official website address

http://t.cn/RuwN0oM

Git address: https://github.com/elastic/curator

6.3 Curator Installation Guide

Address: http://t.cn/RuwCkBD

Note:
There are countless blog tutorials on curator, but the old versions of curator differ considerably from the new ones. Refer to the latest manual on the official website when deploying.
The command-line mode of the old versions is no longer supported by the new versions.

6.4 curator command line operation

$ curator --help
Usage: curator [OPTIONS] ACTION_FILE

  Curator for Elasticsearch indices.

  See http://elastic.co/guide/en/elasticsearch/client/curator/current

Options:
  --config PATH  Path to configuration file. Default: ~/.curator/curator.yml
  --dry-run      Do not perform any changes.
  --version      Show the version and exit.
  --help         Show this message and exit.

Core:

  • Configuration file config.yml: configures the ES address to connect to, logging, log level, etc.;

  • Action file action.yml: configures the operations to perform (in batches) and the index name matching (prefix matching, regular expression matching, etc.).

6.5 Curator Applicable Scenarios

The most important points:

  • Taking the delete operation as an example: curator can delete indexes older than x days on one simple premise, which is that index names follow a specific naming pattern. For example, indexes named by day: logstash_2018.04.05.

  • The naming pattern must match the timestring under delete_indices in action.yml.
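
Why the naming pattern matters can be seen in a small sketch: when the date is part of the index name, deciding which indexes are older than x days only requires parsing the names. This is not Curator's implementation; the function name is hypothetical, and the prefix and strftime-style timestring are assumptions based on the logstash_2018.04.05 example.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days,
                    prefix="logstash_", timestring="%Y.%m.%d"):
    """Return the index names whose embedded date is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the daily indexes
        try:
            day = datetime.strptime(name[len(prefix):], timestring).date()
        except ValueError:
            continue  # suffix does not follow the naming pattern
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logstash_2018.01.01", "logstash_2018.04.05", "other_index"]
old = expired_indices(names, today=date(2018, 4, 22), keep_days=100)
print(old)
```

Each expired index can then be removed with a plain index delete, which releases the disk space immediately and needs neither delete_by_query nor a force merge.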

7. Summary

  • Refer to the latest documentation on the official website; documentation for older versions can easily mislead;
  • Practice hands-on rather than only reading about it;
  • medcl: newer ES versions (6.6 and later) provide Index Lifecycle Management (ILM), which makes it easy to manage how long indexes are kept.
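
For the Index Lifecycle Management option, a policy that deletes indexes after a fixed age might be built as in the sketch below. The policy name keep_100_days and the helper name are hypothetical; the body follows the ILM policy format with a single delete phase, and the policy still has to be attached to the indexes (for example through the index.lifecycle.name setting) to take effect.

```python
import json


def delete_after_policy(days):
    """Build an ILM policy body whose only phase deletes the index after `days`."""
    return {
        "policy": {
            "phases": {
                "delete": {
                    "min_age": f"{days}d",        # age at which the phase starts
                    "actions": {"delete": {}},    # remove the whole index
                }
            }
        }
    }


policy = delete_after_policy(100)
print("PUT /_ilm/policy/keep_100_days")  # hypothetical policy name
print(json.dumps(policy, indent=2))
```

Like Curator, this approach removes whole indexes, so it relies on time-based indexes rather than on deleting individual documents.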



2018-04-22 14:51 Thinking at home in front of the bed

Author: Mingyi World
Please indicate the source when reprinting. Original address:
https://blog.csdn.net/laoyang360/article/details/80038930
