Migrating ES Data to HDFS

ES snapshot backups support the following storage backends:
fs    shared filesystem mount
url   read-only access over a network protocol (http, https, ftp)
s3    Amazon S3
hdfs  Hadoop HDFS
azure Microsoft Azure
gcs   Google Cloud Storage
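
For instance, registering a shared-filesystem (fs) repository looks like the sketch below; the mount point /mnt/es_backups is a placeholder, and it would also have to be listed under path.repo in elasticsearch.yml:

curl -XPUT 'http://hadoop:9200/_snapshot/fs_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups",
    "compress": true
  }
}'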


(1) repository
To back up data from an ES cluster you must first create a repository to hold the snapshots. A cluster can have multiple repositories.


(2) snapshot
Once a repository exists you can create snapshots; every snapshot must be created in, and belongs to, a repository.
A snapshot can contain multiple indices (think databases/schemas). By default the entire cluster's indices are backed up, but you can also name the indices to include.

(3) restore
After the backup has been written to HDFS, the data can be restored from the snapshot.

(4) ES: prepare sample data: index: myindex1, type: mytype2
curl  -XPOST http://hadoop:9200/myindex1/mytype2/6?pretty -d '{
 "name":"Rose6",
 "age":266,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/7?pretty -d '{
 "name":"Rose7",
 "age":267,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/8?pretty -d '{
 "name":"Rose8",
 "age":268,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/9?pretty -d '{
 "name":"Rose9",
 "age":269,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/10?pretty -d '{
 "name":"Rose10",
 "age":2610,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'
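
The five requests above can also be written as one loop (a sketch, same index, type, and field pattern as above):

for i in 6 7 8 9 10; do
  curl -XPOST "http://hadoop:9200/myindex1/mytype2/${i}?pretty" -d '{
    "name": "Rose'"${i}"'",
    "age": 26'"${i}"',
    "addr": "beijing",
    "sex": "male",
    "nickname": "jack lover"
  }'
done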


(5) Install the HDFS repository plugin

ES 2.x:
1. Install the plugin:
bin/plugin install elasticsearch/elasticsearch-repository-hdfs/2.2.0
2. Edit elasticsearch.yml to disable the security manager:
security.manager.enabled: false
3. Restart the cluster.

4. Create the repository
PUT /_snapshot/my_backup
{
   "type": "hdfs",
   "settings": {
     "path": "/back/es/",
     "load_defaults": "true",
     "compress": "true",
     "uri": "hdfs://hadoop:9000"
  }
}

View repository info:
GET /_snapshot/my_backup

// all registered repositories
GET /_snapshot
GET /_snapshot/_all


Delete a repository: this only removes the repository registration; the data in HDFS is not deleted.
DELETE /_snapshot/my_backup

5. Create a snapshot
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
{
 "indices": "index_1,index_2",   // which indices to back up; if omitted, all indices are included
 "ignore_unavailable": true,
 "include_global_state": false
}

Query snapshots:
GET /_snapshot/my_backup/snapshot_1                 // a specific snapshot
GET /_snapshot/my_backup/snapshot_*,otherSnapshot   // wildcards are supported
GET /_snapshot/my_backup/_all                       // all snapshots in the repository

Delete a snapshot: only the snapshot reference is removed; the data in HDFS is not deleted.
DELETE /_snapshot/my_backup/snapshot_1
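
While a snapshot started without wait_for_completion is running, its progress can be checked with the snapshot status API:

GET /_snapshot/my_backup/snapshot_1/_status   // per-shard progress of the named snapshot
GET /_snapshot/my_backup/_status              // all snapshots currently running in the repository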


6. Migrate the data to HDFS (the snapshot step above already wrote the data into HDFS)
7. Restore the data
POST /_snapshot/my_backup/snapshot_1/_restore
{
    "indices": "myindex1,myindex2",            // indices to restore; if omitted, all are restored
    "ignore_unavailable": true,                // ignore indices missing from the snapshot
    "include_global_state": false,             // whether to restore the cluster-wide global state
    "rename_pattern": "index_(.+)",            // optional: pattern matching index names to rename
    "rename_replacement": "restored_index_$1"  // replacement name for matched indices
}
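
A restore runs through the normal shard-recovery machinery, so its completion can be watched with the usual cluster APIs:

GET /_cluster/health?pretty   // goes red while restored primaries allocate, then back to yellow/green
GET /_cat/recovery?v          // per-shard recovery progress, including snapshot restores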

ES 5.x:

1. Install the plugin (it must be installed on every node):
spark@hadoop:/usr/cdh/elasticsearch/bin$ elasticsearch-plugin  install  repository-hdfs
-> Downloading repository-hdfs from elastic
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessClassInPackage.sun.security.krb5
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission loadLibrary.jaas_nt
* java.lang.RuntimePermission loadLibrary.jaas_unix
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission shutdownHooks
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.net.SocketPermission localhost:0 listen,resolve
* java.security.SecurityPermission insertProvider.SaslPlainServer
* java.security.SecurityPermission putProviderProperty.SaslPlainServer
* java.util.PropertyPermission * read,write
* javax.security.auth.AuthPermission doAs
* javax.security.auth.AuthPermission getSubject
* javax.security.auth.AuthPermission modifyPrincipals
* javax.security.auth.AuthPermission modifyPrivateCredentials
* javax.security.auth.AuthPermission modifyPublicCredentials
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KerberosTicket * "*" read
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KeyTab * "*" read
* javax.security.auth.PrivateCredentialPermission org.apache.hadoop.security.Credentials * "*" read
* javax.security.auth.kerberos.ServicePermission * initiate
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed repository-hdfs

2. Once the plugin is installed on every node, no configuration changes are needed (the yml file does not have to be touched).

3. Restart the cluster.
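
To confirm the plugin is active on every node after the restart, list the installed plugins:

curl -XGET 'http://hadoop:9200/_cat/plugins?v'

Each node should report a repository-hdfs row.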

4. Create the repository (use port 9000 instead of 8020 if that is your NameNode's RPC port):
curl -XPUT 'http://hadoop:9200/_snapshot/my_backup' -d '{
   "type": "hdfs",
   "settings": {
     "path": "/back/es/",
     "load_defaults": "true",
     "compress": "true",
     "uri": "hdfs://hadoop:8020"
  }
}'


A second repository (a cluster may register several; in practice each repository should point at its own HDFS path):
curl -XPUT 'http://hadoop:9200/_snapshot/my_backup2' -d '{
   "type": "hdfs",
   "settings": {
     "path": "/back/es/",
     "load_defaults": "true",
     "compress": "true",
     "uri": "hdfs://hadoop:8020"
  }
}'
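
Before taking snapshots it is worth checking that all nodes can actually write to the repository; Elasticsearch provides a verification API for this:

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/_verify?pretty'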
 

View repository info:
curl -XGET 'http://hadoop:9200/_snapshot/my_backup?pretty'


// all registered repositories
curl -XGET 'http://hadoop:9200/_snapshot?pretty'
curl -XGET 'http://hadoop:9200/_snapshot/_all?pretty'

Delete a repository: only the registration is removed; the data in HDFS is not deleted.
curl -XDELETE 'http://hadoop:9200/_snapshot/my_backup'

5. Create a snapshot
curl -XPUT 'http://hadoop:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true' -d '{
 "indices":"myindex1,spark2", 
 "ignore_unavailable":true,
 "include_global_state":false

}'

{"snapshot":
{"snapshot":"snapshot_1",
"uuid":"HIuYAlqxT5OJlZqXhu1HAw",
"version_id":5060499,
"version":"5.6.4","indices":["myindex1","spark2"],
"state":"SUCCESS",
"start_time":"2018-05-05T11:49:47.305Z",
"start_time_in_millis":1525520987305,
"end_time":"2018-05-05T11:49:49.743Z",
"end_time_in_millis":1525520989743,
"duration_in_millis":2438,
"failures":[],
"shards":{"total":10,"failed":0,"successful":10
}
}
}

Query snapshots

Query a specific snapshot:
curl -XGET 'http://hadoop:9200/_snapshot/my_backup/snapshot_1?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "uuid" : "HIuYAlqxT5OJlZqXhu1HAw",
      "version_id" : 5060499,
      "version" : "5.6.4",
      "indices" : [
        "myindex1",
        "spark2"
      ],
      "state" : "SUCCESS",
      "start_time" : "2018-05-05T11:49:47.305Z",
      "start_time_in_millis" : 1525520987305,
      "end_time" : "2018-05-05T11:49:49.743Z",
      "end_time_in_millis" : 1525520989743,
      "duration_in_millis" : 2438,
      "failures" : [ ],
      "shards" : {
        "total" : 10,
        "failed" : 0,
        "successful" : 10
      }
    }
  ]
}

GET /_snapshot/my_backup/snapshot_1                 // a specific snapshot
GET /_snapshot/my_backup/snapshot_*,otherSnapshot   // wildcards are supported
GET /_snapshot/my_backup/_all                       // all snapshots

Delete a snapshot: only the reference is removed, the HDFS data stays. Deleting a snapshot that is still running cancels it.
DELETE /_snapshot/my_backup/snapshot_1

curl -XDELETE 'http://172.20.33.3:9200/_snapshot/backup/snapshot_20171223'
6. Migrate the data to HDFS (again, the snapshot step already did this)
7. Restore the data
If the restore fails with: "[my_backup:snapshot_1/HIuYAlqxT5OJlZqXhu1HAw] cannot restore index [spark2] because it's open"

close the indices first (not needed if the error does not occur):
curl -XPOST 'http://127.0.0.1:9200/myindex1/_close'
curl -XPOST 'http://127.0.0.1:9200/spark2/_close'

Restore the indices:

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/snapshot_1/_restore?pretty'

Reopen the indices afterwards:
curl -XPOST 'http://127.0.0.1:9200/myindex1/_open'
curl -XPOST 'http://127.0.0.1:9200/spark2/_open'


Restore with options (indices: which indices to restore, default all; ignore_unavailable: skip indices missing from the snapshot; include_global_state: whether to restore the cluster-wide global state; rename_pattern / rename_replacement: rename indices on restore):

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/snapshot_1/_restore?pretty' -d '{
    "indices": "myindex1,spark2",
    "ignore_unavailable": true,
    "include_global_state": false,
    "rename_pattern": "index_(.+)",
    "rename_replacement": "restored_index_$1"
}'

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/snapshot_1/_restore?pretty' -d '{
    "indices": "myindex1",
    "ignore_unavailable": true,
    "include_global_state": false
}'
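
The restore itself can be monitored with the index recovery API (restored shards show up with recovery type SNAPSHOT):

curl -XGET 'http://hadoop:9200/myindex1/_recovery?pretty'
curl -XGET 'http://hadoop:9200/_cat/recovery?v'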


8. Snapshot collection script

#!/bin/bash
# take a snapshot named snapshot_<timestamp> in the 'backup' repository
# (URL prefix assumed from the delete example above; the original was truncated)
current_time=$(date +%Y%m%d%H%M%S)
command_prefix="http://172.20.33.3:9200/_snapshot/backup/snapshot_"
command=${command_prefix}${current_time}
echo ${command}
curl -XPUT ${command} -d '{"indices":"index*,logstash*,nginx*,magicianlog*,invokelog*,outside*"}'

Save it as /home/hadoop/elk/script/snapshot_gatewaylog_hdfs.sh, then schedule it with crontab (here: daily at midnight):
0 0 */1 * * /bin/bash /home/hadoop/elk/script/snapshot_all_hdfs.sh >> /home/hadoop/elk/task_log/snapshot_all_day.log 2>&1
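
A snapshot-per-day schedule grows without bound, so a companion cleanup script helps. A sketch, assuming jq is installed and snapshot names carry the %Y%m%d%H%M%S suffix produced by the script above:

#!/bin/bash
# delete snapshots older than 7 days from the 'backup' repository
cutoff=$(date -d '7 days ago' +%Y%m%d%H%M%S)     # GNU date
for snap in $(curl -s 'http://172.20.33.3:9200/_snapshot/backup/_all' | jq -r '.snapshots[].snapshot'); do
  ts=${snap#snapshot_}                           # strip the name prefix
  # fixed-width timestamps compare chronologically as plain strings
  if [ "$ts" \< "$cutoff" ]; then
    curl -XDELETE "http://172.20.33.3:9200/_snapshot/backup/${snap}"
  fi
done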

9. Notes on version compatibility:
An index backed up on ES 1.x can be restored directly on ES 2.x.
An index backed up on ES 2.x can be restored directly on ES 5.x.
An ES 1.x backup cannot be restored on ES 5.x: snapshot compatibility only spans one major version.

Reposted from blog.csdn.net/dymkkj/article/details/81278491