Let JuiceFS handle your off-site backup

Xiao Zhang, who lives in Xierqi, Beijing, is an operations engineer at an Internet finance company. Data in the financial industry is extremely valuable, and any corruption or loss is intolerable.

To this end, Xiao Zhang chose the best data center in Beijing, bought top-quality hardware, and designed a comprehensive backup and disaster recovery strategy:

A full backup every 24 hours, a snapshot backup every hour, and an incremental backup every 5 minutes. The backups are stored on a dedicated backup server, with three-copy redundancy in the distributed system and a cross-rack replica placement policy. Every link in the chain has monitoring and alerting, the system runs well, and faults can be located and handled promptly.
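A schedule like this can be sketched as crontab entries; the script names (`backup_full.sh` and friends) are hypothetical placeholders for whatever backup tooling is actually used:

```shell
# Hypothetical crontab for the schedule described above.
# The three scripts are placeholders, not real tools.
0 2 * * *    /opt/backup/backup_full.sh       # full backup every 24 hours
0 * * * *    /opt/backup/backup_snapshot.sh   # snapshot backup every hour
*/5 * * * *  /opt/backup/backup_incr.sh       # incremental backup every 5 minutes
```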

But in this way, can the data disaster recovery strategy be foolproof?

On a hot summer evening, Xiao Zhang finished his day's work and could finally go home to enjoy beer and crayfish. Who would have thought the weather would turn: a storm rolled in with thunder and lightning, and Xiao Zhang's phone rang:

The data center was struck by lightning! Three times in a row!!!

Xiao Zhang rushed back to the rescue. After much effort he managed to recover most of the data, but a small amount was lost for good: the disks holding every replica of that data had been damaged...

This story may sound incredible, but reality is often more dramatic than fiction. Being "struck by lightning" like this has happened in real life: someone I know has been through it, and even a company as large as Google has.

**In August 2015, Google's European data center europe-west1-b was struck by lightning, and despite Google's strict disaster recovery plan, a small amount of data was permanently lost.** At the end of the official incident report, Google offered this advice on backup strategy:

We would like to take this opportunity to highlight an important reminder for our customers: GCE instances and Persistent Disks within a zone exist in a single Google datacenter and are therefore unavoidably vulnerable to datacenter-scale disasters. Customers who need maximum availability should be prepared to switch their operations to another GCE zone.

The gist: compute instances and persistent disks within a single zone of Google Cloud Platform live in a single data center, so datacenter-scale disasters cannot be ruled out; customers should make their own off-site backups to keep their data safe.

Besides natural disasters such as lightning strikes, our systems face all kinds of data-safety threats every day: data center power outages, UPS failures, unplugged network cables, intrusions, human error, and so on.

Another sobering statistic: according to a study by the Ponemon Institute, an information security research organization, only 6% of companies without a disaster recovery plan survive the loss of important data.

Disasters can strike at any time. Facing this reality, what minimizes the risk is not scrambling for a solution after the disaster, but keeping important data backed up at all times.

Backing up important data to a relatively isolated system (an off-site data center) is a very effective approach: it avoids most of the risks above and keeps the company's business data safe.

How to do off-site backup?

Off-site backup, as the name suggests, means backing data up to another, physically isolated location.

On top of an existing local backup (in the same data center), off-site backup means keeping a complete copy of the data somewhere else. Depending on whether the primary business runs in a self-built data center or on a public cloud, the off-site location is usually one of the following:

  1. You operate two or more data centers, and data is backed up between them;

  2. The primary system runs in your own (or leased) data center, and backups go to a public cloud;

  3. The primary system runs on a public cloud, and a copy is backed up to another region/zone;

  4. The primary system runs on one public cloud, and a backup copy is kept on another public cloud.

Building multiple data centers yourself is rare: the cycle is long and labor costs are high. For small and medium-sized companies, and even some non-core business departments of large companies, the mainstream approach today is to use a public cloud for off-site backup, because it is easy to implement and secures the data fastest.

So how to use the public cloud to implement off-site backup?

The ideal and reality of off-site backup

Before implementing "off-site backup", "local backup" is usually done first, that is, backing up within the same data center for easy recovery. Common storage options for local backup include the following:

1. Self-built distributed file system:

  • Advantages: usually HDFS, the default storage in the Hadoop ecosystem, with three-copy redundancy, rack awareness, and support for big data analytics.
  • Disadvantages: you must maintain a highly available NameNode cluster yourself, and capacity planning and expansion consume the ops team's resources. If the HDFS cluster serves both business computing and backups, garbage collection in the JVM-based NameNode can stall the storage system under heavy load. Computing on data in HDFS is convenient, but recovering it is not: the data must first be copied out through the HDFS CLI, a nightmare for ops engineers.

2. Mutual backup between machines in your own data center:

  • Advantages: backups sit on a local file system, so the full set of Linux tools applies and backup and restore are very convenient. Local disk space is fully used, which saves a lot of cost.
  • Disadvantages: as machines multiply, backups end up scattered, which makes management and recovery troublesome. Data safety also depends on the RAID setup; if a RAID card fails, data can be lost.

3. Cloud disks and NAS on a public cloud:

  • Advantages: usually accessed over the NFS protocol; a machine that needs to restore data can mount the storage directly (within the same VPC), skipping the copy step.
  • Disadvantages: most have single points of failure. In addition, many cloud disks have small capacity limits, so with a large amount of data you must keep creating new disks, which is a management headache.

4. Object storage on a public cloud:

  • Advantages: elastic capacity, pay-as-you-go, low price, safe and reliable data.
  • Disadvantages: access requires a dedicated SDK or API; there is no real directory structure, and rename is not supported. Many systems cannot access object storage directly, so recovery requires downloading the data locally first, which delays emergency recovery when the data set is large. Object storage also lacks various consistency guarantees, which can cause unpredictable trouble.

5. Local disks on public cloud VMs (strongly discouraged): local disks on virtual machines carry no data-safety guarantee, and data can be lost when the VM restarts or migrates. They are meant for temporary data; using them for backup is strongly discouraged.

Overall, these five "local backup" options each have their pros and cons. When building "off-site backup" on top of "local backup", options 3 and 4 come out slightly ahead, but each brings its own problems. With option 3, whether you use the cloud's NFS service or self-built NFS on cloud disks, the protocol does not support transport encryption, so mounting directly across the public Internet is very insecure; it must be paired with a VPN or other gateway. Option 4 requires learning the chosen cloud's API and SDK, and if you switch clouds you have to learn them all over again.

An off-site backup design must also account for the transfer path: when the backup destination is not on the same high-speed intranet, transfers are slow, unstable, and easy to eavesdrop on. If the storage system does not allow direct access from the public Internet (e.g. HDFS and NAS), a dedicated line or VPN has to be set up.

We have discussed this with many teams, and only a few have actually implemented off-site backup. Most of them do it by setting up a periodic task that uses rsync to asynchronously copy the full local backup to another POSIX-compliant storage system.

This method is very easy to implement, but it has certain requirements on the storage system:

  1. POSIX compatibility, so there is no extra learning cost and data recovery in an emergency is easy;
  2. Simple configuration and maintenance: 99% of the time we do not touch the backup system, so ideally it is stable and reliable with no maintenance at all;
  3. Works across multiple public clouds/regions, with no lock-in to any one cloud, leaving more choices for the business;
  4. Above all, stability, reliability, and security, and of course, the cheaper the better.

When we designed JuiceFS, we took all of the above into account, hoping to offer a better option for off-site backup. In short, it mounts on a virtual machine like a cloud disk, while offering the elastic capacity and low price of object storage. It is convenient, flexible, and low-threshold, and its price is quite competitive with similar solutions (for the price of a single cloud disk you get better service than NAS).

How to use JuiceFS for off-site backup?

JuiceFS is a distributed POSIX file system designed for the public cloud. It stores data in your own public-cloud object storage and, through a strongly consistent, high-performance metadata service maintained by us, turns it into a POSIX-compliant distributed file system.

Whether your primary business runs in a self-built data center or on a public cloud, you can use JuiceFS for off-site backup. Following the JuiceFS User Guide, mount JuiceFS on your own servers or the public cloud hosts you are using, then write the backup data directly with tools such as rsync. JuiceFS encrypts data in transit, and large files are automatically transferred in parallel blocks, so you get good performance and a good experience even when backing up over an unreliable public network.
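The workflow might look roughly like this; the volume name `myjfs` and all paths are hypothetical, and the exact mount invocation depends on your JuiceFS edition, so check the User Guide:

```shell
# Illustrative sketch only: "myjfs" and the paths are placeholders,
# and the exact juicefs mount command depends on your edition.
# 1. Mount the JuiceFS volume on the host that holds the local backup:
sudo juicefs mount myjfs /jfs

# 2. Sync the local backup into JuiceFS, e.g. from a daily cron job:
rsync -a --partial /data/backup/ /jfs/backup/
```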

If you use JuiceFS to store data directly or as local backup, an even more powerful feature makes off-site backup easy: Replication, which automatically and asynchronously replicates written data into another object storage of your choice (on any supported public cloud and region). Suppose your primary business runs in AWS Beijing and the data must be backed up to UCloud Guangzhou: simply create a file system in AWS Beijing and enable replication (choosing UCloud Guangzhou), and all data written to AWS S3 (including data written before replication was enabled) is automatically copied to UFile in UCloud Guangzhou. When you need to recover data in UCloud Guangzhou, it reads directly from UFile, which is fast and incurs no data transfer charges.

Another killer feature of JuiceFS Enterprise Edition is global data mirroring, which provides near-real-time read-only data mirrors across very long distances, such as from the United States to China or vice versa. Unlike the replication feature above, data mirroring also keeps a read-only mirror of the metadata, to ensure good performance on the mirrored data over ultra-long distances. In our tests syncing from the AWS US East region to the Tencent Cloud Shanghai region, metadata lag was on the order of seconds, and most data was synchronized within 30 seconds. It also actively repairs synchronization to guarantee correct and consistent access to the data (access is slightly slower while this happens). Data mirroring is currently available only to enterprise customers; if you are interested, contact us for more technical details.

We believe JuiceFS is the best data backup solution you can find on the market today, bar none, because it:

  1. Supports automatic replication across nearly 100 regions on 13 public cloud platforms worldwide, with more platforms being added;
  2. Guarantees data consistency and 99.95% availability;
  3. Encrypts data in transit; the Enterprise Edition also supports encryption at rest;
  4. Is POSIX-compatible: JuiceFS mounts on a VM via FUSE and feels just like a local disk;
  5. Is serverless: fully maintained by us and the public cloud, with nothing for customers to operate;
  6. Provides rich monitoring data that can be integrated into the customer's own monitoring system;
  7. Offers Replication, for easy cross-cloud and cross-region backup or migration;
  8. Has a trash mechanism that effectively prevents accidental deletion, with a configurable retention period;
  9. Saves a lot of money: counting labor and time costs, 50% to 80% compared with other solutions.

By the way, if you are an individual Linux or Mac user, JuiceFS can be mounted directly on your own computer, and the approach above works for backing up personal data too. We also offer 1 TiB of permanently free capacity (object storage charges may still apply).

If you find this helpful, please follow our project Juicedata/JuiceFS! (0ᴗ0✿)
