KubeSphere log backup and recovery practice

Why do I need log backups

KubeSphere's logging system uses Fluent Bit + ElasticSearch as its log collection and storage solution, and relies on Curator for index lifecycle management, periodically cleaning up old logs. For scenarios with log auditing and disaster recovery requirements, KubeSphere's default 7-day log retention policy is far from enough, and merely backing up the ElasticSearch data disks does not guarantee data integrity and recoverability.

The open-source edition of ElasticSearch provides the Snapshot API to help us achieve long-term snapshot storage and recovery. This article walks through adapting the ElasticSearch component (version 6.7.0) built into KubeSphere (version 2.1.0) to implement log backup, meeting disaster recovery and audit requirements.

Note: for small data volumes and simple log query/export scenarios, you can use KubeSphere's one-click log export feature, or try the elasticsearch-dump tool. KubeSphere users running an external commercial edition of ElasticSearch can directly enable the Snapshot Lifecycle Management feature provided by ElasticSearch X-Pack.
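
For reference, an export with elasticsearch-dump might look like the following minimal sketch; the tool is installed via npm (npm install -g elasticdump), and the endpoint and index name below are examples to adjust for your cluster:

# Export one day's index to a local JSON file (endpoint and index are examples).
elasticdump \
  --input=http://elasticsearch-logging-data.kubesphere-logging-system.svc:9200/ks-logstash-log-2019.11.12 \
  --output=ks-logstash-log-2019.11.12.json \
  --type=data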

Prerequisites

Before taking snapshots, we need to register a snapshot repository in the ElasticSearch cluster. The snapshot repository can be backed by a shared file system, such as NFS. Other storage types, such as AWS S3, require installing a separate repository plugin.
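
For example, using AWS S3 would require installing the repository-s3 plugin on every ElasticSearch node; a sketch (the plugin version must match the ElasticSearch version, and each node needs a restart afterwards):

# Run inside each ElasticSearch node's installation directory, then restart the node.
bin/elasticsearch-plugin install repository-s3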

We take NFS as the example. A shared snapshot repository needs to be mounted on all ElasticSearch master and data nodes, and registered via the path.repo parameter in elasticsearch.yml. NFS supports the ReadWriteMany access mode, which makes it a very good fit here.

As the first step, prepare an NFS server. This tutorial uses the QingCloud vNAS service, with /mnt/shared_dir as the shared directory path.
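
If you run your own Linux NFS server instead, the export could be configured roughly as follows (a sketch; restrict the allowed clients and options to your security requirements):

# /etc/exports on the NFS server: share the directory with the cluster nodes.
/mnt/shared_dir *(rw,sync,no_root_squash)

# Reload the export table.
exportfs -ra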

Then prepare an NFS-backed StorageClass in the KubeSphere environment; it will be used later when we request a Persistent Volume for the snapshot repository. In this article's environment, NFS storage was already configured during installation, so no extra action is needed. Readers who still need to set it up can refer to the KubeSphere official documentation: modify conf/common.yaml and re-run the install.sh script.

1. ElasticSearch Setup

In KubeSphere, the ElasticSearch master nodes belong to the StatefulSet elasticsearch-logging-discovery, and the data nodes to elasticsearch-logging-data. The tutorial environment runs one master and two data nodes:

$ kubectl get sts -n kubesphere-logging-system
NAME                              READY   AGE
elasticsearch-logging-data        2/2     18h
elasticsearch-logging-discovery   1/1     18h

As the first step, prepare the persistent volume for the ElasticSearch cluster's snapshot repository:

cat <<EOF | kubectl create -f -
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-logging-backup
  namespace: kubesphere-logging-system
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100Gi
  # Fill in the storageClassName field according to your environment
  storageClassName: nfs-client
EOF

The second step is to modify the elasticsearch.yml configuration file to register the NFS directory path on every master and data node. In KubeSphere, the elasticsearch.yml configuration can be found in the ConfigMap elasticsearch-logging. On its last line, add: path.repo: ["/usr/share/elasticsearch/backup"]
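
For example, a minimal sketch of that edit, assuming the default ConfigMap name:

# Open the ConfigMap for editing.
kubectl edit cm -n kubesphere-logging-system elasticsearch-logging

# Append to the elasticsearch.yml key in the data section:
#   path.repo: ["/usr/share/elasticsearch/backup"]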

The third step is to modify the StatefulSet YAML to mount the storage volume onto every ElasticSearch node, and use the chown command in the initContainer at startup to initialize the snapshot repository folder's owner user and group to elasticsearch.

Note that at this step we cannot modify the StatefulSets directly with kubectl edit; we need to back up the yaml content first, modify it, and then re-apply it with kubectl apply.

kubectl get sts -n kubesphere-logging-system elasticsearch-logging-data -oyaml > elasticsearch-logging-data.yml
kubectl get sts -n kubesphere-logging-system elasticsearch-logging-discovery -oyaml > elasticsearch-logging-discovery.yml

Modify the yaml files. We take the elasticsearch-logging-data.yml generated above as the example; the master node's yaml file is modified in the same way.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    # omitted for brevity
    # ...
  name: elasticsearch-logging-data
  namespace: kubesphere-logging-system  
  # -------------------------------------------------
  #   Comment out or delete metadata fields other than
  #   labels, name, and namespace
  # -------------------------------------------------
  # resourceVersion: "109019"
  # selfLink: /apis/apps/v1/namespaces/kubesphere-logging-system/statefulsets/elasticsearch-logging-data
  # uid: 423adffe-271f-4657-9078-1a75c387eedc
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      containers:
      - name: elasticsearch
        # ...
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: data
        # --------------------------
        #   Add the backup volume mount
        # --------------------------
        - mountPath: /usr/share/elasticsearch/backup
          name: backup  
        - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          name: config
          subPath: elasticsearch.yml
        # ...
      initContainers:
      - name: sysctl
        # ...
      - name: chown
        # --------------------------------------
        #   Modify the command to set the snapshot
        #   repository folder's owner
        # --------------------------------------
        command:
        - /bin/bash
        - -c
        - |
          set -e; set -x; chown elasticsearch:elasticsearch /usr/share/elasticsearch/data; for datadir in $(find /usr/share/elasticsearch/data -mindepth 1 -maxdepth 1 -not -name ".snapshot"); do
            chown -R elasticsearch:elasticsearch $datadir;
          done; chown elasticsearch:elasticsearch /usr/share/elasticsearch/logs; for logfile in $(find /usr/share/elasticsearch/logs -mindepth 1 -maxdepth 1 -not -name ".snapshot"); do
            chown -R elasticsearch:elasticsearch $logfile;
          done; chown elasticsearch:elasticsearch /usr/share/elasticsearch/backup; for backupdir in $(find /usr/share/elasticsearch/backup -mindepth 1 -maxdepth 1 -not -name ".snapshot"); do
            chown -R elasticsearch:elasticsearch $backupdir;
          done
        # ...
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: data
        # --------------------------
        #   Add the backup volume mount
        # --------------------------
        - mountPath: /usr/share/elasticsearch/backup
          name: backup  
      # ...
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: dedicated
        value: log
      volumes:
      - configMap:
          defaultMode: 420
          name: elasticsearch-logging
        name: config
      # -----------------------
      #   Use the PVC created in the first step
      # -----------------------
      - name: backup
        persistentVolumeClaim:
          claimName: elasticsearch-logging-backup
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: nfs-client
      volumeMode: Filesystem
# --------------------------------------
#   Comment out or delete the status fields
# --------------------------------------
#     status:
#       phase: Pending
# status:
#   ...

After the changes, delete the ElasticSearch StatefulSets and re-apply the new yaml files:

kubectl delete sts -n kubesphere-logging-system elasticsearch-logging-data
kubectl delete sts -n kubesphere-logging-system elasticsearch-logging-discovery

kubectl apply -f elasticsearch-logging-data.yml
kubectl apply -f elasticsearch-logging-discovery.yml
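
You can watch the namespace until all ElasticSearch pods are Running and Ready again, for example:

# Wait for all elasticsearch-logging pods to come back up.
kubectl get pods -n kubesphere-logging-system -w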

As the last step, after all ElasticSearch nodes have started, call the Snapshot API to create a repository named ks-log-snapshots, with compression enabled:

curl -X PUT "elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_snapshot/ks-log-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/backup",
    "compress": true
  }
}
'

A response of "acknowledged": true indicates success. With this, the preparation for the ElasticSearch cluster snapshot feature is done; from here on we only need to call the Snapshot API periodically to get incremental backups. Automated incremental backups of ElasticSearch can be done with Curator.
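
Before wiring up the automation, you can take a one-off snapshot by hand to verify that the repository works; a sketch (the snapshot name here is arbitrary):

# Trigger a manual snapshot and block until it completes.
curl -X PUT "elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_snapshot/ks-log-snapshots/snapshot-test?wait_for_completion=true&pretty"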

2. Scheduled Snapshots with Curator

ElasticSearch Curator helps manage ElasticSearch indices and snapshots. Next, we use Curator to implement automated, scheduled log backups. The KubeSphere logging component already ships Curator (deployed as a CronJob that runs at 1 a.m. every day) to manage indices, and we can piggyback on that same Curator instance. Curator's action rules can be found in its ConfigMap.
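
For example (a sketch; the exact ConfigMap name can vary between releases, so look it up first):

# Locate the Curator ConfigMap, then edit its action_file.yml key.
kubectl get cm -n kubesphere-logging-system | grep -i curator
kubectl edit cm -n kubesphere-logging-system <curator-configmap-name>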

Here we need to add two actions to the action_file.yml field: snapshot and delete_snapshots, and move them ahead of delete_indices in priority. This configuration names snapshots with the pattern snapshot-%Y%m%d%H%M%S and keeps snapshots for 45 days. See the Curator Reference for the meaning of each parameter.

actions:
  1:
    action: snapshot
    description: >-
      Snapshot ks-logstash-log prefixed indices with the default snapshot 
      name pattern of 'snapshot-%Y%m%d%H%M%S'.
    options:
      repository: ks-log-snapshots
      name: 'snapshot-%Y%m%d%H%M%S'
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      # If disable_action is set to True, Curator will ignore the current action
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      # You may change the index pattern below to fit your case
      value: ks-logstash-log-
  2: 
    action: delete_snapshots
    description: >-
      Delete snapshots from the selected repository older than 45 days
      (based on creation_date), for 'snapshot' prefixed snapshots.
    options:
      repository: ks-log-snapshots
      ignore_empty_list: True
      # If disable_action is set to True, Curator will ignore the current action
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: snapshot-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y%m%d%H%M%S'
      unit: days
      unit_count: 45
  3:
    action: delete_indices
    # keep the original content unchanged
    # ...
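
After saving the ConfigMap, you can confirm that the Curator CronJob is in place and note its schedule, for instance:

# The CronJob runs daily at 01:00 by default; snapshots appear after the next run.
kubectl get cronjob -n kubesphere-logging-system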

3. Restoring and Viewing Logs

When we need to review logs from a few days back, say the logs of November 12, we can restore them from snapshots. First, check the latest snapshots:

curl -X GET "elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_snapshot/ks-log-snapshots/_all?pretty"

Then restore the indices of the given date from the latest snapshot (you can also restore everything). This API restores the log indices onto the data disk, so make sure the data disk has sufficient free space. Alternatively, you can back up the corresponding PV directly (the storage volume backing the snapshot repository can be backed up as-is) and mount it to another ElasticSearch cluster to restore the logs there.

curl -X POST "elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_snapshot/ks-log-snapshots/snapshot-20191112010008/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "ks-logstash-log-2019.11.12",
  "ignore_unavailable": true,
  "include_global_state": true,
}
'

Depending on the log volume, the restore can take several minutes. Once it finishes, we can browse the logs through the KubeSphere log dashboard.
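
To confirm the restore has finished, you can check that the index is back before heading to the dashboard; a sketch:

# Verify the restored index exists and its health/doc count look right.
curl -X GET "elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_cat/indices/ks-logstash-log-2019.11.12?v"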

References

ElasticSearch Reference: Snapshot And Restore

Curator Reference: snapshot

KubeSphere (https://github.com/kubesphere/kubesphere) is an open-source, application-centric container management platform that can be deployed on any infrastructure. It provides an easy-to-use UI that greatly reduces the complexity of day-to-day development, testing, and operations. It aims to address pain points of Kubernetes itself, such as storage, networking, security, and usability, and helps enterprises handle scenarios such as agile development and automated monitoring and operations, end-to-end application delivery, microservice governance, multi-tenant management, multi-cluster management, service and network management, image registries, AI platforms, and edge computing.
