Backing up a Linux system with rsync

Explore the role of rsync in backup solutions. Backups are undoubtedly an important part of a system administrator's job. Without full backups and a well-planned backup policy and implementation, important data will sooner or later be irretrievably lost.

Every company, large or small, runs on its data. Considering the financial and business cost of losing business data, no company, from the smallest sole proprietorship to the largest multinational enterprise, could survive the loss of most of its data. Your office can be rebuilt with an insurance payout, but insurance can never bring your data back.

By loss, I don't mean stolen data; that is a different kind of disaster. What I'm talking about here is the complete destruction of the data.

Even if you are an individual user and not a business, backing up your own data is very important. I have two decades of personal financial data and data from my now-closed business, as well as a ton of electronic invoices. I also have a large number of documents and reports of various kinds that I have written in recent years. I don't want to lose any of this data.

So backups are a necessary guarantee for the long-term safety of my data.

Backup Software Selection

There are many programs that can perform backups. Most Linux distributions offer at least one open source backup program, and there are many commercial options as well. None of them fit my needs, so I decided to use basic Linux tools for my backups.

In my article for the Open Source Yearbook, "Best Couple of 2015: tar and ssh," I showed that expensive commercial backup software is not necessary to design and implement a viable backup plan.

Since last year I have been trying another option, the rsync command, which has many interesting features that I have already benefited from. My main requirements were that users can locate and restore files without having to extract them from a backup archive, and that creating the backups takes less time.

The purpose of this post is just to illustrate the role of rsync in my backup scheme. This is not an overview of rsync's full capabilities or its various use cases.

The rsync command

rsync was written by Andrew Tridgell and Paul Mackerras and first released in 1996. Its goal is to synchronize files to another computer, which explains the name: remote sync. It is open source software available in most distributions.

rsync can synchronize two directories or directory trees, whether they are on the same computer or on different computers, and it can do much more than that. The directory it creates or updates is identical to the source directory. The new directory is not packed into an archive such as a tar or zip file; it consists of ordinary directories and files that can be accessed with common Linux tools, which is exactly what I need.

One of the most important features of rsync is the way it handles files that already exist in the target and have been modified in the source. Rather than copying the entire file from the source to the destination, it uses checksums to compare blocks of the source and destination files. If the checksums of all blocks in the two files match, no data is transferred; otherwise only the changed blocks are transmitted. This saves a great deal of the time and bandwidth that remote synchronization would otherwise consume.

For example, the first time I used my rsync script to back up all my hosts to a large external USB hard drive, it took three hours because all of the data had to be transferred. Subsequent backups take as little as 3 to 8 minutes, depending on how many files have been created or changed since the last backup. I use the time command to record the actual time spent. Last night, it took just three minutes to back up about 750 GB of data from six remote systems and a local workstation, because only the few hundred MB of data that had changed during the day needed to be transferred.

The following command synchronizes the contents of two directories, including any subdirectories. After it finishes, the contents of the new directory are exactly the same as those of the source directory.

rsync -aH sourcedir targetdir

The -a option selects archive mode, which preserves permissions, ownership, and symbolic (soft) links. The -H option preserves hard links. Note that either the source or the target directory can be on a remote host, but not both.

Suppose we synchronized two directories with rsync yesterday, and today we want to synchronize them again after deleting some files from the source directory. By default, rsync only copies new and changed files to the new directory and leaves the files we deleted untouched there. If you want files that were deleted from the source directory to be deleted from the new directory as well, add the --delete option.

Another interesting option, and my personal favorite, is --link-dest, because it greatly increases the power and flexibility of rsync. With --link-dest, daily backups take up very little extra space and very little time.

Pass this option the previous day's backup directory, and give today's backup directory as the target. rsync then creates today's backup directory and, for every file that has not changed since yesterday, creates a hard link in today's directory to the file in yesterday's backup. The file data is not duplicated; only hard links are created. Wikipedia has a very detailed description of hard links. When rsync detects that a file has changed, it does not create a hard link for it but writes a new copy of the file into today's directory. (LCTT translator's note: the original text is unclear on the ordering here. According to the try_dests_reg function in rsync's generator.c, rsync first decides per file, based on match_level, whether to hard-link or copy, rather than creating the hard links first and checking match_level afterwards.)

Now our command looks like this.

rsync -aH --delete --link-dest=yesterdaystargetdir sourcedir todaystargetdir

You may also want to exclude some directories or files that you don't want to back up. The --exclude option does this: follow it with a pattern that matches the files or directories to skip. The command below, for example, excludes your browser's cache.

rsync -aH --delete --exclude Cache --link-dest=yesterdaystargetdir sourcedir todaystargetdir

Note: Each pattern you want to exclude needs its own --exclude option in front of it.

rsync can synchronize with remote hosts, using them either as the source or as the target. As another example, suppose we want to synchronize a directory from the remote host named remote1 to the local host. Since ssh is the standard way to exchange data with remote hosts, I always specify it with the -e ssh option. Now the command looks like this.

rsync -aH -e ssh --delete --exclude Cache --link-dest=yesterdaystargetdir remote1:sourcedir todaystargetdir

This is the final version of my rsync backup command.

You can rely on rsync's large number of options to customize your synchronization process. For the most part, the simple commands I just described suffice for my personal needs. You can read rsync's extensive documentation to learn about its other capabilities.

Deploying Backups

My backups run automatically because, as the saying goes, "everything can be automated". I wrote a Bash script that uses rsync to create the daily backups. It makes sure the backup media is mounted, generates the name of the daily backup directory, creates the proper directory structure on the backup media, performs the actual backup, and finally unmounts the backup media.

I run the script with cron every morning to make sure I never forget to back up.

My script, rsbu, and its configuration file, rsbu.conf, are available on GitHub in the opensourceway/rsync-backup-script repository, which accompanies https://opensource.com/article/17/1/rsync-backup-linux.

Recovery Testing

No backup plan is complete without testing. You should regularly test restoring a file or an entire directory to make sure that the backups are working and that they can be used to recover from a total loss of data. I have seen too many backups fail for one reason or another, with valuable data lost because a lack of testing let the problem go unnoticed.

Choose a file to restore to a test directory like /tmp so you don't overwrite any files that were updated after the backup. Verify that the contents of the file are what you expect. Restoring files backed up with rsync is as simple as finding your backup file and copying it to where you want to restore it.
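A restore test along these lines might look like the sketch below. The backup tree here is a stand-in created for the demo, and the directory and file names are assumptions; in practice the file would come from one of the dated directories on the backup media. The restored copy goes into a fresh directory under /tmp and is compared with the backup using cmp.

```shell
#!/bin/sh
# Sketch: restore one file to a scratch directory and verify it.
# The "backup" below is a hypothetical stand-in for a real backup tree.
set -eu

work=$(mktemp -d)
backup="$work/backups/daily/home"
mkdir -p "$backup"
printf 'budget figures\n' > "$backup/budget.txt"

# Restore into a fresh test directory so nothing current is overwritten.
restore_dir=$(mktemp -d /tmp/restore-test.XXXXXX)
cp -p "$backup/budget.txt" "$restore_dir/"

# Check that the restored file is byte-for-byte identical to the backup.
cmp "$backup/budget.txt" "$restore_dir/budget.txt" && echo "restored copy matches the backup"
```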

I have had to restore individual files, and occasionally entire directories, a number of times. Most of the time it was because I had accidentally deleted a file or directory myself. A couple of times it was because of a hard drive crash. These backups will come in handy sooner or later.

The Last Step

But just creating backups won't save your business. You need to create backups on a regular basis and keep the most recent one at a remote location, in another building or miles away if possible. This ensures that a large-scale disaster doesn't destroy all of your backups.

A reasonable option for small businesses is to do daily backups on removable media, take the most recent backup home at night, and bring the older backup to the office the next morning. You'll then have several copies in rotation. It's even possible to take the latest backup to the bank and put it in your safe deposit box, then bring back the previous backup.

Origin blog.csdn.net/yaxuan88521/article/details/130962339