How to back up Elasticsearch index data to HDFS

Elasticsearch has a relatively mature backup mechanism, known as snapshot and restore.

Elasticsearch 5.x supports the following repository types for storing backups:

````
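fs               //shared file system mounted on all master and data nodes (built in)
url              //read-only repository accessed over http, https, ftp (built in)
repository-s3    //Amazon S3 (plugin)
repository-hdfs  //Hadoop HDFS (plugin)
repository-azure //Microsoft Azure storage (plugin)
repository-gcs   //Google Cloud Storage (plugin)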
````

This article focuses on backing up index data to HDFS.

First, let's go over the three concepts involved in backup:

(1) Repository

Before you can back up data in an ES cluster, you must first create (register) a repository in which snapshots are stored. A cluster can have multiple repositories.

(2) Snapshot

Once a repository exists, we can create snapshots. Every snapshot is created in a specific repository and can contain multiple indices. By default, a snapshot includes all open indices in the cluster, but we can also list only the indices we consider important.

(3) Restore

Restore is the reverse operation: recovering index data from a completed snapshot.
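The three concepts map directly onto the REST paths used throughout this article: a repository lives at `/_snapshot/<repository>`, a snapshot at `/_snapshot/<repository>/<snapshot>`, and a restore is a POST to `_restore` on a specific snapshot. The following Python sketch only builds those paths to make the hierarchy explicit; the helper names are hypothetical and nothing is sent to a cluster.

```python
from urllib.parse import quote

def repository_path(repository):
    # A repository is registered under /_snapshot/<repository>
    return "/_snapshot/" + quote(repository, safe="")

def snapshot_path(repository, snapshot):
    # Every snapshot belongs to exactly one repository
    return repository_path(repository) + "/" + quote(snapshot, safe="")

def restore_path(repository, snapshot):
    # A restore is always performed from one specific snapshot
    return snapshot_path(repository, snapshot) + "/_restore"

print(restore_path("my_backup", "snapshot_1"))  # /_snapshot/my_backup/snapshot_1/_restore
```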

With these three concepts in mind, let's look at the actual operations. This article covers two ES versions:

Elasticsearch 2.3.4

Elasticsearch 5.6.4

(1) How to back up index data in Elasticsearch 2.x

(1) Install the repository-hdfs plugin on each node

````
bin/plugin install elasticsearch/elasticsearch-repository-hdfs/2.2.0
````

(2) Modify the config/elasticsearch.yml file on each node and add the following property:

````
security.manager.enabled: false
````

(3) Restart the entire cluster

(4) Create a repository

````
PUT /_snapshot/my_backup
{
  "type": "hdfs",
  "settings": {
    "path": "/back/es/",
    "load_defaults": "true",
    "compress": "true",
    "uri": "hdfs://192.168.10.160:8020"
  }
}
````
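If you prefer to script this step instead of using a console, the same request can be assembled in code. The Python sketch below builds, but does not send, the repository registration request using only the standard library; the node address `http://localhost:9200` and the helper name `hdfs_repository_request` are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster

def hdfs_repository_request(name, namenode_uri, path):
    """Build (but do not send) the PUT request that registers an HDFS repository."""
    body = {
        "type": "hdfs",
        "settings": {
            "uri": namenode_uri,      # HDFS NameNode address
            "path": path,             # HDFS directory that will hold the snapshots
            "load_defaults": "true",  # load the default Hadoop configuration
            "compress": "true",       # compress snapshot metadata files (mappings, settings)
        },
    }
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = hdfs_repository_request("my_backup", "hdfs://192.168.10.160:8020", "/back/es/")
# To actually register the repository: urllib.request.urlopen(req)
```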

View repository information:

````
//View the specified repository
GET /_snapshot/my_backup

//View all registered repositories (both forms are equivalent)
GET /_snapshot
GET /_snapshot/_all
````

Delete a repository:

````
DELETE /_snapshot/my_backup
````

Note that deleting a repository only removes its registration from ES; the backup files on HDFS are not deleted.

(5) Create a snapshot

````
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
{
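  "indices": "index_1,index_2",//Indices to back up; omit this property to back up all open indices
  "ignore_unavailable": true,//Skip listed indices that do not exist instead of failing
  "include_global_state": false//Do not store the cluster's global state in the snapshot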
}
````

Several ways to query snapshots:

````
GET /_snapshot/my_backup/snapshot_1 //Query a specific snapshot
GET /_snapshot/my_backup/snapshot_*,some_other_snapshot //Wildcards and comma-separated lists are supported
GET /_snapshot/my_backup/_all //Query all snapshots in the repository
````
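A typical use of `GET /_snapshot/my_backup/_all` is finding the most recent snapshot that completed successfully, for example before a restore. The Python sketch below parses the response body; it relies on the `snapshots` list and the `snapshot`, `state`, and `start_time_in_millis` fields of each entry, and the sample response is abbreviated and made up for illustration.

```python
import json

def latest_successful_snapshot(response_text):
    """Return the name of the newest snapshot whose state is SUCCESS, or None."""
    snapshots = json.loads(response_text).get("snapshots", [])
    done = [s for s in snapshots if s.get("state") == "SUCCESS"]
    if not done:
        return None
    newest = max(done, key=lambda s: s["start_time_in_millis"])
    return newest["snapshot"]

# Hypothetical, abbreviated response body
sample = '''{"snapshots": [
  {"snapshot": "snapshot_1", "state": "SUCCESS", "start_time_in_millis": 1000},
  {"snapshot": "snapshot_2", "state": "SUCCESS", "start_time_in_millis": 2000},
  {"snapshot": "snapshot_3", "state": "FAILED", "start_time_in_millis": 3000}
]}'''
print(latest_successful_snapshot(sample))  # snapshot_2
```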

Delete a snapshot:

````
DELETE /_snapshot/my_backup/snapshot_1
````

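Unlike deleting a repository, deleting a snapshot does remove data from HDFS: Elasticsearch deletes every file associated with that snapshot that is not used by any other snapshot. If the snapshot is still running, deleting it aborts the snapshot process.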
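Because files shared with other snapshots are kept, old snapshots can be pruned without breaking newer ones. The Python sketch below picks which snapshots to delete so that only the newest `keep` remain; the retention count and the helper name are hypothetical, the entries are the objects from the `snapshots` list returned by `GET /_snapshot/my_backup/_all`, and each returned name would then be passed to `DELETE /_snapshot/my_backup/<name>`.

```python
def snapshots_to_delete(snapshots, keep):
    """Return the names of all but the `keep` newest snapshots, oldest first.

    Snapshots still IN_PROGRESS are skipped, because deleting one aborts it.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    finished = [s for s in snapshots if s.get("state") != "IN_PROGRESS"]
    ordered = sorted(finished, key=lambda s: s["start_time_in_millis"])
    cutoff = max(len(ordered) - keep, 0)
    return [s["snapshot"] for s in ordered[:cutoff]]

# Hypothetical, abbreviated snapshot entries
entries = [
    {"snapshot": "snapshot_c", "state": "SUCCESS", "start_time_in_millis": 3000},
    {"snapshot": "snapshot_a", "state": "SUCCESS", "start_time_in_millis": 1000},
    {"snapshot": "snapshot_b", "state": "FAILED", "start_time_in_millis": 2000},
    {"snapshot": "snapshot_d", "state": "IN_PROGRESS", "start_time_in_millis": 4000},
]
for name in snapshots_to_delete(entries, keep=1):
    print("DELETE /_snapshot/my_backup/" + name)
```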

(6) Restore snapshots

````
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "index_1,index_2", //Indices to restore; if omitted, all indices in the snapshot are restored
  "ignore_unavailable": true, //Skip listed indices that are missing from the snapshot instead of failing
  "include_global_state": false, //Do not restore the cluster's global state (templates, persistent settings)
  "rename_pattern": "index_(.+)", //Regular expression matched against the names of the restored indices
  "rename_replacement": "restored_index_$1" //New name; $1 is the first capture group, so index_1 becomes restored_index_1
}
````
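Renaming matters because a restore fails if an open index with the same name already exists; restoring under a new name lets you inspect the backup next to the live index. The Python sketch below approximates how `rename_pattern` and `rename_replacement` transform index names, which is handy for checking a pattern before running a restore. Elasticsearch evaluates these as Java regular expressions, where `$1` means capture group 1; the sketch translates that into Python's `\g<1>` syntax, so it is only an approximation of Java's regex behavior.

```python
import re

def restored_name(index, rename_pattern, rename_replacement):
    """Approximate how rename_pattern/rename_replacement rename an index on restore.

    Java-style "$n" group references are converted to Python's "\\g<n>" form
    before substituting; names that do not match the pattern are unchanged.
    """
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index)

for name in ["index_1", "index_2", "logs"]:
    print(name, "->", restored_name(name, r"index_(.+)", "restored_index_$1"))
```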

(2) How to back up index data in Elasticsearch 5.x

The backup method in Elasticsearch 5.x is similar to that in Elasticsearch 2.x, so only the differences are covered here.

First, ES 5.x requires JDK 8. If your system has multiple JDK versions installed and you do not want to change the default one, declare the JDK explicitly in the following two scripts:

````
vi bin/elasticsearch
vi bin/elasticsearch-plugin
````

Add the following lines to each script to select JDK 8 (adjust the path to your own installation):

````
export JAVA_HOME=/usr/java/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
````
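To confirm that the scripts now pick up the right JDK, you can run `$JAVA_HOME/bin/java -version`; a JDK 8 build reports a version string such as `"1.8.0_91"`. The Python sketch below extracts the major version from such a string, handling both the legacy `1.x` scheme used up to Java 8 and the scheme used from Java 9 onward. The helper names are hypothetical, and the check encodes only this article's stated requirement of JDK 8.

```python
import re

def java_major_version(version):
    """Extract the major version from a string such as "1.8.0_91" or "11.0.2"."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    # Java 8 and earlier report "1.<major>"; Java 9 and later start with the major number
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first

def suitable_for_es5(version):
    # Requirement stated in this article: Elasticsearch 5.x runs on JDK 8
    return java_major_version(version) == 8

print(suitable_for_es5("1.8.0_91"))  # True
```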

Then install the repository-hdfs plugin on each node:

````
bin/elasticsearch-plugin install repository-hdfs
````

Once the plugin is installed, restart the cluster. Note that, unlike 2.x, Elasticsearch 5.x does not require any change to the elasticsearch.yml file.

Usage is otherwise the same as in Elasticsearch 2.x, so it is not repeated here.

Finally, a note on version compatibility:

Indices backed up from es1.x can be restored directly in es2.x, and indices backed up from es2.x can be restored directly in es5.x.

However, index data created in es1.x cannot be used directly in es5.x: index compatibility spans only one major version.
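The compatibility rule can be written as a small check. What counts is the version in which an index was originally created: an index created in 1.x stays incompatible with 5.x even if it was snapshotted from a 2.x cluster, and a snapshot can never be restored into an older version. The Python sketch below encodes that rule for the release lines discussed here; the function name is hypothetical, and the list deliberately stops at 6.x.

```python
# Elasticsearch major release lines in order; 3.x and 4.x were skipped,
# so 2.x -> 5.x counts as a single major-version step.
MAJOR_RELEASES = [1, 2, 5, 6]

def can_restore(index_created_major, cluster_major):
    """True if an index created in <index_created_major>.x can be restored into
    a cluster running <cluster_major>.x under the one-major-version rule."""
    created = MAJOR_RELEASES.index(index_created_major)  # ValueError if unknown
    target = MAJOR_RELEASES.index(cluster_major)
    # Same or next major version only; restoring into an older version is not allowed
    return 0 <= target - created <= 1

print(can_restore(1, 2), can_restore(2, 5), can_restore(1, 5))  # True True False
```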

Summary:

This article introduced how to back up and restore index data in Elasticsearch 2.x and 5.x and described the differences between the two versions. Data backup is an essential part of any production environment: with backups in place, we can face sudden online failures much more calmly.

If you have any questions, follow the WeChat public account "I am a siege engineer" (woshigcs) and leave a message there. Technical debt should always be repaid, and so should health debt. On the road of seeking the Tao, let us walk together.
