Migrating ES Data to HDFS

ES snapshot backups support the following storage backends:
fs    shared filesystem mount
url   read-only access over a network protocol (http, https, ftp)
s3    Amazon S3
hdfs  Hadoop HDFS
azure Microsoft Azure
gcs   Google Cloud Storage
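
For instance, registering a shared-filesystem (fs) repository looks like the sketch below; the mount point /mnt/es_backups is a placeholder, and it would also have to be listed under path.repo in elasticsearch.yml:

curl -XPUT 'http://hadoop:9200/_snapshot/fs_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups",
    "compress": true
  }
}'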


(1) repository
To back up data from an ES cluster you must first create a repository to hold the snapshots. A cluster can have multiple repositories.


(2) snapshot
Once a repository exists you can create snapshots; every snapshot must be created in, and belongs to, a repository.
A snapshot can contain multiple indices (think databases/schemas). By default the entire cluster's indices are backed up, but you can also name the indices to include.

(3) restore
After the backup has been written to HDFS, the data can be restored from the snapshot.

(4) ES: prepare sample data: index: myindex1, type: mytype2
curl  -XPOST http://hadoop:9200/myindex1/mytype2/6?pretty -d '{
 "name":"Rose6",
 "age":266,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/7?pretty -d '{
 "name":"Rose7",
 "age":267,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/8?pretty -d '{
 "name":"Rose8",
 "age":268,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/9?pretty -d '{
 "name":"Rose9",
 "age":269,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'

curl  -XPOST http://hadoop:9200/myindex1/mytype2/10?pretty -d '{
 "name":"Rose10",
 "age":2610,
 "addr":"beijing",
 "sex":"male",
 "nickname":"jack lover"
 }'
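
The five requests above can also be written as one loop (a sketch, same index, type, and field pattern as above):

for i in 6 7 8 9 10; do
  curl -XPOST "http://hadoop:9200/myindex1/mytype2/${i}?pretty" -d '{
    "name": "Rose'"${i}"'",
    "age": 26'"${i}"',
    "addr": "beijing",
    "sex": "male",
    "nickname": "jack lover"
  }'
done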


(5) Install the HDFS repository plugin

ES 2.x:
1. Install the plugin:
bin/plugin install elasticsearch/elasticsearch-repository-hdfs/2.2.0
2. Edit elasticsearch.yml to disable the security manager:
security.manager.enabled: false
3. Restart the cluster.

4. Create the repository
PUT /_snapshot/my_backup
{
   "type": "hdfs",
   "settings": {
     "path": "/back/es/",
     "load_defaults": "true",
     "compress": "true",
     "uri": "hdfs://hadoop:9000"
  }
}

View repository info:
GET /_snapshot/my_backup

// all registered repositories
GET /_snapshot
GET /_snapshot/_all


Delete a repository: this only removes the repository registration; the data in HDFS is not deleted.
DELETE /_snapshot/my_backup

5. Create a snapshot
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
{
 "indices": "index_1,index_2",   // which indices to back up; if omitted, all indices are included
 "ignore_unavailable": true,
 "include_global_state": false
}

Query snapshots:
GET /_snapshot/my_backup/snapshot_1                 // a specific snapshot
GET /_snapshot/my_backup/snapshot_*,otherSnapshot   // wildcards are supported
GET /_snapshot/my_backup/_all                       // all snapshots in the repository

Delete a snapshot: only the snapshot reference is removed; the data in HDFS is not deleted.
DELETE /_snapshot/my_backup/snapshot_1
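
While a snapshot started without wait_for_completion is running, its progress can be checked with the snapshot status API:

GET /_snapshot/my_backup/snapshot_1/_status   // per-shard progress of the named snapshot
GET /_snapshot/my_backup/_status              // all snapshots currently running in the repository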


6. Migrate the data to HDFS (the snapshot step above already wrote the data into HDFS)
7. Restore the data
POST /_snapshot/my_backup/snapshot_1/_restore
{
    "indices": "myindex1,myindex2",            // indices to restore; if omitted, all are restored
    "ignore_unavailable": true,                // ignore indices missing from the snapshot
    "include_global_state": false,             // whether to restore the cluster-wide global state
    "rename_pattern": "index_(.+)",            // optional: pattern matching index names to rename
    "rename_replacement": "restored_index_$1"  // replacement name for matched indices
}
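
A restore runs through the normal shard-recovery machinery, so its completion can be watched with the usual cluster APIs:

GET /_cluster/health?pretty   // goes red while restored primaries allocate, then back to yellow/green
GET /_cat/recovery?v          // per-shard recovery progress, including snapshot restores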

ES 5.x:

1. Install the plugin (it must be installed on every node):
spark@hadoop:/usr/cdh/elasticsearch/bin$ elasticsearch-plugin  install  repository-hdfs
-> Downloading repository-hdfs from elastic
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessClassInPackage.sun.security.krb5
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission loadLibrary.jaas_nt
* java.lang.RuntimePermission loadLibrary.jaas_unix
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission shutdownHooks
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.net.SocketPermission localhost:0 listen,resolve
* java.security.SecurityPermission insertProvider.SaslPlainServer
* java.security.SecurityPermission putProviderProperty.SaslPlainServer
* java.util.PropertyPermission * read,write
* javax.security.auth.AuthPermission doAs
* javax.security.auth.AuthPermission getSubject
* javax.security.auth.AuthPermission modifyPrincipals
* javax.security.auth.AuthPermission modifyPrivateCredentials
* javax.security.auth.AuthPermission modifyPublicCredentials
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KerberosTicket * "*" read
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KeyTab * "*" read
* javax.security.auth.PrivateCredentialPermission org.apache.hadoop.security.Credentials * "*" read
* javax.security.auth.kerberos.ServicePermission * initiate
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed repository-hdfs

2. Once the plugin is installed on every node, no configuration changes are needed (the yml file does not have to be touched).

3. Restart the cluster.
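
To confirm the plugin is active on every node after the restart, list the installed plugins:

curl -XGET 'http://hadoop:9200/_cat/plugins?v'

Each node should report a repository-hdfs row.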

4. Create the repository (use port 9000 instead of 8020 if that is your NameNode's RPC port):
curl -XPUT 'http://hadoop:9200/_snapshot/my_backup' -d '{
   "type": "hdfs",
   "settings": {
     "path": "/back/es/",
     "load_defaults": "true",
     "compress": "true",
     "uri": "hdfs://hadoop:8020"
  }
}'


A second repository (a cluster may register several; in practice each repository should point at its own HDFS path):
curl -XPUT 'http://hadoop:9200/_snapshot/my_backup2' -d '{
   "type": "hdfs",
   "settings": {
     "path": "/back/es/",
     "load_defaults": "true",
     "compress": "true",
     "uri": "hdfs://hadoop:8020"
  }
}'
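
Before taking snapshots it is worth checking that all nodes can actually write to the repository; Elasticsearch provides a verification API for this:

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/_verify?pretty'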
 

View repository info:
curl -XGET 'http://hadoop:9200/_snapshot/my_backup?pretty'


// all registered repositories
curl -XGET 'http://hadoop:9200/_snapshot?pretty'
curl -XGET 'http://hadoop:9200/_snapshot/_all?pretty'

Delete a repository: only the registration is removed; the data in HDFS is not deleted.
curl -XDELETE 'http://hadoop:9200/_snapshot/my_backup'

5. Create a snapshot
curl -XPUT 'http://hadoop:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true' -d '{
 "indices":"myindex1,spark2", 
 "ignore_unavailable":true,
 "include_global_state":false

}'

{"snapshot":
{"snapshot":"snapshot_1",
"uuid":"HIuYAlqxT5OJlZqXhu1HAw",
"version_id":5060499,
"version":"5.6.4","indices":["myindex1","spark2"],
"state":"SUCCESS",
"start_time":"2018-05-05T11:49:47.305Z",
"start_time_in_millis":1525520987305,
"end_time":"2018-05-05T11:49:49.743Z",
"end_time_in_millis":1525520989743,
"duration_in_millis":2438,
"failures":[],
"shards":{"total":10,"failed":0,"successful":10
}
}
}

Query snapshots

Query a specific snapshot:
curl -XGET 'http://hadoop:9200/_snapshot/my_backup/snapshot_1?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "uuid" : "HIuYAlqxT5OJlZqXhu1HAw",
      "version_id" : 5060499,
      "version" : "5.6.4",
      "indices" : [
        "myindex1",
        "spark2"
      ],
      "state" : "SUCCESS",
      "start_time" : "2018-05-05T11:49:47.305Z",
      "start_time_in_millis" : 1525520987305,
      "end_time" : "2018-05-05T11:49:49.743Z",
      "end_time_in_millis" : 1525520989743,
      "duration_in_millis" : 2438,
      "failures" : [ ],
      "shards" : {
        "total" : 10,
        "failed" : 0,
        "successful" : 10
      }
    }
  ]
}

GET /_snapshot/my_backup/snapshot_1                 // a specific snapshot
GET /_snapshot/my_backup/snapshot_*,otherSnapshot   // wildcards are supported
GET /_snapshot/my_backup/_all                       // all snapshots

Delete a snapshot: only the reference is removed, the HDFS data stays. Deleting a snapshot that is still running cancels it.
DELETE /_snapshot/my_backup/snapshot_1

curl -XDELETE 'http://172.20.33.3:9200/_snapshot/backup/snapshot_20171223'
6. Migrate the data to HDFS (again, the snapshot step already did this)
7. Restore the data
If the restore fails with: "[my_backup:snapshot_1/HIuYAlqxT5OJlZqXhu1HAw] cannot restore index [spark2] because it's open"

close the indices first (not needed if the error does not occur):
curl -XPOST 'http://127.0.0.1:9200/myindex1/_close'
curl -XPOST 'http://127.0.0.1:9200/spark2/_close'

Restore the indices:

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/snapshot_1/_restore?pretty'

Reopen the indices afterwards:
curl -XPOST 'http://127.0.0.1:9200/myindex1/_open'
curl -XPOST 'http://127.0.0.1:9200/spark2/_open'


Restore with options (indices: which indices to restore, default all; ignore_unavailable: skip indices missing from the snapshot; include_global_state: whether to restore the cluster-wide global state; rename_pattern / rename_replacement: rename indices on restore):

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/snapshot_1/_restore?pretty' -d '{
    "indices": "myindex1,spark2",
    "ignore_unavailable": true,
    "include_global_state": false,
    "rename_pattern": "index_(.+)",
    "rename_replacement": "restored_index_$1"
}'

curl -XPOST 'http://hadoop:9200/_snapshot/my_backup/snapshot_1/_restore?pretty' -d '{
    "indices": "myindex1",
    "ignore_unavailable": true,
    "include_global_state": false
}'
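
The restore itself can be monitored with the index recovery API (restored shards show up with recovery type SNAPSHOT):

curl -XGET 'http://hadoop:9200/myindex1/_recovery?pretty'
curl -XGET 'http://hadoop:9200/_cat/recovery?v'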


8. Snapshot collection script

#!/bin/bash
# take a snapshot named snapshot_<timestamp> in the 'backup' repository
# (URL prefix assumed from the delete example above; the original was truncated)
current_time=$(date +%Y%m%d%H%M%S)
command_prefix="http://172.20.33.3:9200/_snapshot/backup/snapshot_"
command=${command_prefix}${current_time}
echo ${command}
curl -XPUT ${command} -d '{"indices":"index*,logstash*,nginx*,magicianlog*,invokelog*,outside*"}'

Save it as /home/hadoop/elk/script/snapshot_gatewaylog_hdfs.sh, then schedule it with crontab (here: daily at midnight):
0 0 */1 * * /bin/bash /home/hadoop/elk/script/snapshot_all_hdfs.sh >> /home/hadoop/elk/task_log/snapshot_all_day.log 2>&1
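
A snapshot-per-day schedule grows without bound, so a companion cleanup script helps. A sketch, assuming jq is installed and snapshot names carry the %Y%m%d%H%M%S suffix produced by the script above:

#!/bin/bash
# delete snapshots older than 7 days from the 'backup' repository
cutoff=$(date -d '7 days ago' +%Y%m%d%H%M%S)     # GNU date
for snap in $(curl -s 'http://172.20.33.3:9200/_snapshot/backup/_all' | jq -r '.snapshots[].snapshot'); do
  ts=${snap#snapshot_}                           # strip the name prefix
  # fixed-width timestamps compare chronologically as plain strings
  if [ "$ts" \< "$cutoff" ]; then
    curl -XDELETE "http://172.20.33.3:9200/_snapshot/backup/${snap}"
  fi
done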

9. Notes on version compatibility:
An index backed up on ES 1.x can be restored directly on ES 2.x.
An index backed up on ES 2.x can be restored directly on ES 5.x.
An ES 1.x backup cannot be restored on ES 5.x: snapshot compatibility only spans one major version.

Reposted from blog.csdn.net/dymkkj/article/details/81278491