Low-cost Elasticsearch cloud backup storage based on JuiceFS

Hangzhou Firestone Creation is a Chinese data intelligence service provider focused on industrial big data. To meet its data storage needs and serve customers efficiently, it chose the Elasticsearch search engine, running in the cloud. For performance and cost reasons, we built our own cluster on Alibaba Cloud using ECS instances with local SSDs. But with a self-managed cluster, how do we solve the data backup problem while keeping costs optimal?

1. Background

Elasticsearch's data backup is implemented through its snapshot mechanism. To snapshot a cluster, we need a shared storage system: every node must mount the same directory of the shared storage, and every node needs read and write permission on that directory. Initially we used NAS (i.e. NFS) for this, and that backup solution ran stably for many years.

Here, I would like to emphasize the importance of data backup. Many people mistakenly believe that because Elasticsearch has a replica mechanism, configuring multiple replicas removes any fear of data loss, so why back up at all? It needs to be pointed out that no number of replicas can survive an accidental DELETE; and replication must also balance cost, so redundancy has a threshold, beyond which data can still be lost. Backup is an important guarantee of business continuity: only by being prepared can we be free of worries!

Continuously optimizing cloud costs is a challenge that operations staff always face. Snowflake's use of S3 storage impressed us a lot in terms of cost efficiency. After coming across JuiceFS, we thought it was a very good storage product. Following the principle of proceeding step by step, backup storage is an excellent entry point, so we built a general low-cost cloud backup storage solution based on JuiceFS and put it into practice.

2. Cost comparison

The title of this article says low cost, so where is the low cost? Let the data speak. The prices of 10 TB NAS and OSS resource packages compare as follows:

Resource type                        Original price (yuan/year)   Discounted price (yuan/year)
NAS storage (General Purpose)        36,864                       27,648
OSS (Standard, locally redundant)    13,272                       9,954

Replacing NAS with OSS cuts the cost to about 36% of the original, roughly one third. The cost reduction is significant, so we had to do it!

Wait, what about the other costs? The JuiceFS Community Edition also needs metadata storage, and that does cost something. But these days, who doesn't already have an RDS instance in the cloud that can be shared or has spare capacity? As a backup system, the demand for random read/write I/O is low, so here we share an existing MySQL RDS instance as the metadata store.

3. Deployment process

The deployment basically follows the official JuiceFS documentation; the JuiceFS part is divided into three steps (installation, creating the file system, and mounting), followed by the Elasticsearch-side configuration:

3.1 Installation

The installation is very simple and takes just one command. The default installation directory is /usr/local/bin. Since not every operating system has that directory in the default PATH, for a more general and trouble-free setup I recommend installing to /usr/sbin instead. The installation command:

curl -sSL https://d.juicefs.com/install | sh - /usr/sbin

Note: this command must be executed on every node (JuiceFS needs to be installed on all nodes).
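To confirm the installation on a node, a quick sanity check is to print the client version (the output varies with the installed release):

juicefs version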

3.2 Create the file system

Two prerequisite steps are skipped here:

  1. Preparing the OSS Bucket and the AccessKey; the Bucket created is named juicefs-backup;

  2. Since MySQL is used as the metadata store, creating the database and account is also skipped; the database name and user name are both juicefs (a minimal sketch of this preparation follows below).
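For reference, here is a minimal sketch of the skipped MySQL preparation using the mysql CLI. The admin account name, host placeholder, and privilege scope are assumptions, not the exact commands we ran:

# connect to the RDS instance with an administrative account (names are placeholders)
mysql -h 【RDS-URL】 -u admin -p -e "
  CREATE DATABASE juicefs CHARACTER SET utf8mb4;
  CREATE USER 'juicefs'@'%' IDENTIFIED BY '【PASSWORD】';
  GRANT ALL PRIVILEGES ON juicefs.* TO 'juicefs'@'%';
  FLUSH PRIVILEGES;"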

There was a small hiccup. Because MySQL is used for metadata, I could not find a reference or example in the two official documentation chapters Getting Started and Metadata Engine Best Practices; PostgreSQL was covered but MySQL was not. At first I followed the PostgreSQL syntax, which turned out to be incorrect. I finally found the relevant instructions in the Reference - How to Set Up the Metadata Engine chapter:

I do not quite understand why the parentheses around the host must be added (presumably it follows the Go MySQL driver's DSN format); I can only note it without knowing the reason. In any case, I would suggest adding a MySQL section to the metadata engine best practices chapter of the official documentation, so that the sections echo each other and readers can find it more easily.

My final creation command is as follows:


juicefs format \
    --storage oss \
    --bucket juicefs-backup.oss-cn-hangzhou-internal.aliyuncs.com \
    --access-key 【KEY】 \
    --secret-key 【SECRET】 \
    mysql://juicefs:【PASSWORD】@\(【RDS-URL】:3306\)/juicefs \
    elasticsearch

Note:

  1. This command only needs to be executed once, on any one node
  2. 【KEY】, 【SECRET】, 【PASSWORD】 and 【RDS-URL】 need to be replaced with the actual values
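To verify that the file system was created successfully, one option is to query the metadata engine with juicefs status (same metadata URL as in the format command):

juicefs status mysql://juicefs:【PASSWORD】@\(【RDS-URL】:3306\)/juicefs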

3.3 Mount the file system

The mount command is as follows:

juicefs mount \
    --update-fstab \
    --background \
    --writeback \
    --cache-dir /data/juicefs-cache \
    --cache-size 10240 \
    -o user_id=$(id -u elasticsearch)  \
    mysql://juicefs:【PASSWORD】@\(【RDS-URL】:3306\)/juicefs \
    /backup

The mount parameters are explained as follows:

  1. --update-fstab: update /etc/fstab so that the file system is automatically mounted again after the node restarts.

  2. --writeback: write data to the local cache first and upload it to OSS asynchronously, improving backup throughput. Recommended for backup scenarios.

  3. --cache-dir /data/juicefs-cache and --cache-size 10240: allocate 10 GiB of cache on the Elasticsearch local SSD to improve read/write performance (the default is 100 GiB; considering cost, we chose 10 GiB).

  4. -o user_id=$(id -u elasticsearch): intended to let the elasticsearch user read and write. After consulting an engineer from the JuiceFS team, this parameter can actually be left out.

Note:

  1. This command needs to be executed once on every node
  2. 【PASSWORD】 and 【RDS-URL】 need to be replaced with the actual values
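After mounting, a quick way to confirm the file system is in place on a node is to check the mount point (it should show up as a JuiceFS/FUSE file system):

df -h /backup
mount | grep /backup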

3.4 Set mount directory permissions

Finally, make sure the mounted directory can be read and written by Elasticsearch:

chown elasticsearch:elasticsearch /backup

Note: This command needs to be executed once on any node.
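Before wiring up Elasticsearch, the permissions can be double-checked with a small write test as the elasticsearch user (the test file name is arbitrary):

ls -ld /backup
sudo -u elasticsearch touch /backup/.write-test && sudo -u elasticsearch rm /backup/.write-test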

3.5 Register the Elasticsearch snapshot repository

First, configure path.repo in the Elasticsearch configuration file elasticsearch.yml, for example:

path:
  repo:
    - /backup

Note: the configuration must be changed on every node, and the service restarted afterwards.
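As a sketch, assuming Elasticsearch runs under systemd with the default unit name, the restart on each node looks like this (restart one node at a time and wait for the cluster to recover before moving on):

sudo systemctl restart elasticsearch
# wait until the cluster is green again before restarting the next node
curl -s 'localhost:9200/_cluster/health?pretty'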

After every node has been restarted, the repository can be registered through Kibana or via the Elasticsearch snapshot API:

PUT _snapshot/es-backup
{
  "type": "fs",
  "settings": {
    "location": "/backup",
    "compress": true,
    "max_snapshot_bytes_per_sec": "100m",
    "max_restore_bytes_per_sec": "100m"
  }
}

Parameter description:

  1. es-backup is the name of the snapshot repository and can be customized

  2. compress: whether to enable compression; we enable it to save space

  3. max_snapshot_bytes_per_sec / max_restore_bytes_per_sec: the maximum snapshot and restore throughput, which can be set according to your own situation; we set both to 100 MB/sec
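After registration, the repository can be checked with the snapshot verify API, which confirms that every data node can write to the repository location:

POST _snapshot/es-backup/_verify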

Finally, the concrete backup operations are not detailed here; please refer to the official Elasticsearch documentation.
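For completeness, a minimal illustrative snapshot call (the snapshot name here is hypothetical; see the official documentation for all options):

PUT _snapshot/es-backup/snapshot-1?wait_for_completion=false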

4. Pitfalls encountered

After completing the preparations above, I waited happily for the backup to succeed. Unexpectedly, I ran into the unavoidable part of trying anything new: stepping into pits!

While creating the snapshot, a few individual nodes reported permission errors. This hit a classic problem of distributed clusters reading and writing shared storage: is the username, and the UID, of the process exactly the same on every node? (A quick way to check this is sketched after the list below.) There are generally two ways to solve it:

  1. Solve it through user mapping, without touching the existing environment; this would undoubtedly be the best approach. However, I read the official documentation several times and tried mounting with the differing UIDs of the Elasticsearch user on each node (see the mount command in 3.3); after verification, the ownership of files on the mounted file system still depended on the actual process. That reminded me of NFS, which has an all_squash option that maps all users to a specific user such as nobody. Unfortunately, JuiceFS currently only implements root_squash, not all_squash. I reported this to the JuiceFS developers; for details, see the PR on GitHub.

  2. Change the existing environment so that the Elasticsearch user has the same UID on all nodes. Thanks to Elasticsearch's excellent fault tolerance and migration capabilities, I finally solved the problem by reinstalling Elasticsearch on the affected nodes (it turned out the inconsistent UIDs were caused by the order in which Elasticsearch and Kibana had been installed).
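Here is a hedged sketch of the consistency check mentioned above; the host names are placeholders for your own node list:

# print the elasticsearch UID on every node; all lines should match
for host in es-node-1 es-node-2 es-node-3; do
    echo -n "$host: "; ssh "$host" id -u elasticsearch
done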

5. Conclusion

Through the steps and measures above, the Elasticsearch snapshot backup solution was finally brought online and has kept running, with backup efficiency no worse than NAS storage.

This article uses a distributed cluster backup as the example, but the same solution can be used to back up all kinds of stand-alone systems. Moreover, given the wide range of object storages and metadata engines JuiceFS supports, it can serve as a general low-cost cloud backup storage solution.

I hope this content is of some help to you. If you have any other questions, feel free to join the JuiceFS community and discuss with everyone.
