rsync remote synchronization for distributed applications

1. About rsync
1.1 Introduction to rsync
rsync (Remote Sync) is a fast, open source backup tool that can mirror and synchronize entire directory trees between hosts. It supports incremental backup, preserves links and permissions, and uses an optimized synchronization algorithm that transfers only the changed parts of files and can compress data in transit (-z), which makes it well suited to remote backup, mirror servers, and similar applications.

The URL of rsync's official site is rsync.samba.org/

1.2 rsync features

It can copy special files such as link files and device files.

It can exclude specified files or directories from synchronization, similar to the exclude function of the tar archiving command.

It can preserve the permissions, timestamps, soft and hard links, owner, group, and other attributes of the original files and directories (for example, -p preserves permissions).

It supports incremental synchronization: only changed data is transferred, so data transmission is very efficient (compare tar -N).

It can transfer files over rcp, rsh, ssh, and similar channels (rsync itself does not encrypt data).

It can transfer files and data between server and client over a socket (daemon mode).

It supports anonymous or authenticated (no system user required) daemon-mode transfers, which makes data backup and mirroring convenient and safe.
 

1.3 The rsync synchronization source

In a remote synchronization task, the client responsible for initiating the rsync synchronization operation is called the initiator, and the server responsible for responding to the rsync synchronization operation from the client is called the synchronization source.

In a downlink synchronization (download), the synchronization source provides the original location of the documents, and the initiator needs read access to that location.
In an uplink synchronization (upload), the synchronization source provides the target location of the documents, and the initiator needs write access to that location.
 

1.4 Differences between scp and rsync
(1) Functional difference
rsync can preserve soft and hard links when copying to a remote host (-l keeps soft links, -H keeps hard links).

    scp does not support copying links as links.

(2) Efficiency difference
Put simply, scp copies and rsync synchronizes.

    When the destination folder does not exist yet, rsync and scp take about the same time; when it already exists, the difference can be very large. scp is a copy: if the destination file does not exist it is created, and if it exists it is overwritten. rsync is a synchronization: it compares the files on both sides, does nothing if they are the same, and updates only what differs.

    rsync is faster when the task is synchronization; for a plain copy (no files at the destination yet) either tool works. Choose rsync or scp depending on the situation.
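
The difference can be observed locally, without a remote host. The following is a minimal sketch, assuming rsync and bash are installed; the function name, the temporary directory, and the file names are made up for the demonstration. The -i (--itemize-changes) option makes rsync list each item it updates, so an unchanged tree should produce no file entries on the second run.

```shell
#!/bin/bash
# Shows that rsync only transfers what changed between runs.
# Everything happens inside a throwaway temp directory (hypothetical layout).
demo_incremental_sync() {
    local work
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src" "$work/dst"
    echo "hello" > "$work/src/a.txt"
    echo "world" > "$work/src/b.txt"

    echo "first run:"
    rsync -ai "$work/src/" "$work/dst/"   # destination is empty: both files are copied

    echo "second run:"
    rsync -ai "$work/src/" "$work/dst/"   # nothing changed: no file should be listed

    echo "changed" > "$work/src/b.txt"    # new content with a different size
    echo "third run:"
    rsync -ai "$work/src/" "$work/dst/"   # only b.txt is transferred again

    rm -rf "$work"
}

# Run the demonstration only when rsync is available.
if command -v rsync >/dev/null 2>&1; then
    demo_incremental_sync
fi
```

An scp of the same directory would copy both files on every run, which is why the gap grows with the amount of unchanged data.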
 

2. Using the rsync tool
Basic format:

  rsync [options] source destination

Option      Function
-r          Recursive mode: include all files in the directory and its subdirectories.
-l          Copy symlinks as symlinks.
-v          Display detailed (verbose) information about the synchronization process.
-z          Compress file data during the transfer (compress).
-a          Archive mode: preserve file permissions, attributes, and other information; equivalent to the combined options "-rlptgoD".
-p          Preserve file permissions.
-t          Preserve file modification times.
-g          Preserve the file's group (without superuser rights, only groups the receiving user belongs to).
-o          Preserve the file's owner (superuser only).
-H          Preserve hard links.
-A          Preserve ACL attribute information.
-D          Preserve device files and other special files.
--delete    Delete files that exist in the target location but not in the original location.
--checksum  Use checksums (instead of file size and modification time) to decide whether to skip files.

3. Configure scheduled rsync downlink synchronization
Source server: 192.168.50.25

Client (initiator): 192.168.50.26

(1) Configure the source server
 systemctl stop firewalld
 setenforce 0

 rpm -q rsync    #rsync is usually installed by default

 #Create the /etc/rsyncd.conf configuration file and add the following items.
 #In rsyncd.conf a comment must be on its own line; text after a value is read as part of the value.
 vim /etc/rsyncd.conf
 uid = root
 gid = root
 #Confine the daemon to the source directory
 use chroot = yes
 #Listening address
 address = 192.168.50.25
 #Listening port, tcp/udp 873; check with: cat /etc/services | grep rsync
 port = 873
 #Log file location
 log file = /var/log/rsyncd.log
 #File that stores the process ID
 pid file = /var/run/rsyncd.pid
 #Client addresses allowed to connect; separate multiple addresses with spaces
 hosts allow = 192.168.50.0/24
 #File types that are not compressed again during synchronization
 dont compress = *.gz *.bz2 *.tgz *.zip *.rar *.z

 #Shared module name
 [wwwroot]
 #Actual path of the source directory
 path = /var/www/html
 #Description
 comment = Document Root of www.yang.com
 #Read-only: clients can read the directory but not write to it, so only downlink is allowed
 read only = yes
 #Authorized accounts that are allowed to read; separate multiple accounts with spaces
 auth users = backuper
 #Data file that stores the authorized account information
 secrets file = /etc/rsyncd_users.db
 #For anonymous access, remove the "auth users" and "secrets file" items.

 #Create the data file for the backup account.
 #backuper is the user name and abc123 is the password; no system user with the same name is needed.
 vim /etc/rsyncd_users.db
 backuper:abc123

 chmod 600 /etc/rsyncd_users.db

 mkdir -p /var/www/html
 #Make sure all users have read access to the source directory /var/www/html
 chmod +r /var/www/html/
 ls -ld /var/www/html/

 #Start the rsync service as a daemon
 rsync --daemon
 netstat -anpt | grep rsync

 #Stop the rsync service
 kill $(cat /var/run/rsyncd.pid)
 rm -rf /var/run/rsyncd.pid
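
A wrong mode or a malformed line in the account data file is a common reason for authentication failures. The helper below is a small sketch for checking the file before starting the daemon; the function name check_secrets_file is hypothetical, and it assumes bash and GNU stat (stat -c), as found on typical Linux systems.

```shell
#!/bin/bash
# check_secrets_file FILE
# Returns 0 if FILE has mode 600 and every line that is neither empty nor a
# comment has the form "username:password". Otherwise prints the problem to
# stderr and returns 1.
check_secrets_file() {
    local file="$1" mode line
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }

    mode=$(stat -c '%a' "$file")
    if [ "$mode" != "600" ]; then
        echo "mode is $mode, expected 600: $file" >&2
        return 1
    fi

    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;   # skip blank lines and comment lines
            ?*:?*)   ;;            # looks like username:password
            *) echo "malformed line in $file" >&2; return 1 ;;
        esac
    done < "$file"
    return 0
}
```

For the setup in this section it could be called as: check_secrets_file /etc/rsyncd_users.db && echo "secrets file looks fine"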
 

(2) Initiator configuration
 #Download the specified resources to the local /opt directory for backup. The password is abc123.
 #Format 1: username@host address::shared module name
 rsync -avz backuper@192.168.50.25::wwwroot /opt/
 #backuper is the user identity used for the synchronization
 #wwwroot is the module; the synchronization path and other settings are defined under the module on the server, so only the module name is needed here
 #/opt/ is the local directory to synchronize into

 #Format 2: rsync://username@host address/shared module name
 rsync -avz rsync://backuper@192.168.50.25/wwwroot /opt/

 #Non-interactive configuration:
 #To avoid typing the password during synchronization, save the password of the backuper user in a password file such as /etc/server.pass and pass it with "--password-file=/etc/server.pass".
 echo "abc123" > /etc/server.pass
 #The password file must have mode 600, so that nobody except the owner can read it.
 chmod 600 /etc/server.pass

 #Password-free synchronization
 rsync -avz --password-file=/etc/server.pass backuper@192.168.50.25::wwwroot /opt/

 #Scheduled synchronization
 crontab -e
 30 22 * * * /usr/bin/rsync -az --delete --password-file=/etc/server.pass backuper@192.168.50.25::wwwroot /opt/

 systemctl restart crond
 systemctl enable crond
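
A scheduled job can start while the previous run is still transferring. One common guard is a lock file. The sketch below assumes bash and the flock utility from util-linux; the function name run_exclusive and the lock file path are made up for illustration, and the user backuper and host 192.168.50.25 are taken from the server configuration in this section.

```shell
#!/bin/bash
# run_exclusive LOCKFILE COMMAND [ARGS...]
# Runs COMMAND only if no other process currently holds LOCKFILE.
# Returns 1 without running COMMAND when the lock is already taken.
run_exclusive() {
    local lockfile="$1"
    shift
    (
        # Try to take an exclusive lock on file descriptor 9 without waiting.
        flock -n 9 || { echo "previous run still active, skipping" >&2; exit 1; }
        "$@"
    ) 9>"$lockfile"
}

# Example for the cron job (commented out: it needs the rsync source server):
# run_exclusive /var/run/rsync-wwwroot.lock \
#     /usr/bin/rsync -az --delete --password-file=/etc/server.pass \
#     backuper@192.168.50.25::wwwroot /opt/
```

The lock is released automatically when the subshell exits, so a crashed run does not block later ones.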








4. rsync real-time synchronization (uplink synchronization)
Shortcomings of scheduled synchronization
The backup time is fixed, so the delay is obvious and real-time performance is poor.
When the synchronization source does not change for a long time, frequent periodic tasks are unnecessary.

Advantages of real-time synchronization
Once the synchronization source changes, the backup starts immediately.
As long as the synchronization source does not change, no backup is performed.

The inotify mechanism of the Linux kernel
It is available from kernel version 2.6.13.
It can monitor changes in the file system and send notifications in response.
Auxiliary software: inotify-tools

Configure inotify on the initiator
The inotify notification interface can be used to monitor various changes in the file system, such as file access, deletion, movement, and modification. With this mechanism it is easy to implement file change alerts and incremental backups, and to respond promptly to changes in directories or files.
Combining the inotify mechanism with the rsync tool gives triggered backup (real-time synchronization): as soon as a document in the original location changes, an incremental backup starts immediately; otherwise the process waits silently.
Because the inotify notification mechanism is provided by the Linux kernel, it is mainly used for monitoring the local machine, so in triggered backup it is better suited to uplink synchronization.
 

Procedure
Source server configuration
Modify the rsync configuration file on the source server (the backup source):

vim /etc/rsyncd.conf
......
#Turn off read-only; uplink synchronization needs a writable module
read only = no

kill $(cat /var/run/rsyncd.pid)
rm -rf /var/run/rsyncd.pid
rsync --daemon
netstat -anpt | grep rsync

Initiator configuration
Adjust the inotify kernel parameters
The inotify mechanism in the Linux kernel provides three control parameters by default:

max_queued_events (the length of the monitored event queue; the default value is 16384),
max_user_instances (the maximum number of inotify instances per user; the default value is 128),
max_user_watches (the maximum number of watches per user; the default value is 8192).
When the number of directories and files to be monitored is large, or they change frequently, it is recommended to increase these three values.
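
The kernel exposes these parameters as files under /proc/sys/fs/inotify (the queue parameter is spelled max_queued_events there). The sketch below, assuming bash on Linux, prints the current values; the function name show_inotify_limits is hypothetical, and the numbers in the trailing comment are illustrative assumptions rather than recommendations.

```shell
#!/bin/bash
# Print the current inotify limits read from /proc/sys/fs/inotify.
show_inotify_limits() {
    local dir=/proc/sys/fs/inotify name
    for name in max_queued_events max_user_instances max_user_watches; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")"
        else
            echo "$name = (not available)"
        fi
    done
}

show_inotify_limits

# To raise the limits persistently (as root), entries like these can be added
# to /etc/sysctl.conf and loaded with "sysctl -p". Example values only:
#   fs.inotify.max_queued_events = 32768
#   fs.inotify.max_user_instances = 1024
#   fs.inotify.max_user_watches = 1048576
```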
 

Install inotify-tools on the initiator
To use the inotify mechanism you also need to install inotify-tools, which provides the inotifywait and inotifywatch helper programs for monitoring and summarizing changes.

inotifywait: monitors events such as modify, create, move, delete, and attrib (attribute change), and prints the result immediately when a change occurs.
inotifywatch: collects file system change statistics and prints a summary after it finishes running.
 

First install the build dependencies: yum install -y gcc gcc-c++ make

 cd /opt

 tar zxvf inotify-tools-3.14.tar.gz -C /opt/

 cd /opt/inotify-tools-3.14
 ./configure
 make && make install

 #Run the "inotifywait" command first, then open a new terminal, add and move files in the /data directory, and watch the output in the first terminal.
 inotifywait -mrq -e modify,create,move,delete /data

 #Option "-e": specifies which events to monitor
 #Option "-m": monitor continuously
 #Option "-r": monitor the whole directory recursively
 #Option "-q": reduce the output
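
By default inotifywait prints one line per event with three fields: the watched directory, the event names, and the file name. The sketch below, assuming bash, shows how such lines can be split with read; the function name describe_events and the sample lines are made up so that the example runs without inotify-tools installed.

```shell
#!/bin/bash
# describe_events: reads inotifywait's default output ("DIR EVENTS FILE")
# from stdin and prints one summary line per event.
describe_events() {
    local dir events file
    while read -r dir events file; do
        echo "${events}: ${dir}${file}"
    done
}

# Sample lines in the format inotifywait prints. With a real watch this would be:
#   inotifywait -mrq -e modify,create,move,delete /data | describe_events
printf '%s\n' '/data/ CREATE a.txt' '/data/sub/ MODIFY b.log' | describe_events
# prints:
#   CREATE: /data/a.txt
#   MODIFY: /data/sub/b.log
```

Splitting on whitespace like this breaks for directory names that contain spaces, which is acceptable for a quick check but worth keeping in mind for scripts.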

On the initiator, write a script that triggers synchronization.
Note that the script name must not contain the string "rsync": the script uses pgrep rsync to check for a running rsync, and a script name containing that string would match as well, so the synchronization might never start.

 vim /opt/inotify.sh
 #!/bin/bash

 #Command that makes inotifywait monitor file events in the /data directory. attrib means attribute changes.
 INOTIFY_CMD="inotifywait -mrq -e modify,create,attrib,move,delete /data"

 #Command that performs the rsync uplink synchronization. --delete keeps the directory contents on both sides identical and can be omitted.
 RSYNC_CMD="rsync -azH --delete --password-file=/etc/server.pass /data backuper@192.168.50.25::wwwroot/"

 #Use while and read to keep reading the monitoring output; the fields read can be used for further checks.
 $INOTIFY_CMD | while read DIRECTORY EVENT FILE
 do
     #If rsync is not running, start it immediately
     if [ $(pgrep rsync | wc -l) -le 0 ]; then
         $RSYNC_CMD
     fi
 done

 chmod +x /opt/inotify.sh
 chmod +x /etc/rc.d/rc.local    #Startup script file run at boot
 echo '/opt/inotify.sh' >> /etc/rc.d/rc.local    #Run the script automatically at startup

 #Then run the script in the background
 cd /opt/
 ./inotify.sh &

 #Create a file in /data on the initiator and check whether it appears on the source server

If the files being synchronized are large and synchronization is slow, later events arrive while rsync is still running and their changes are not synchronized. In that case the script needs to queue the events: it waits for the running rsync to finish instead of skipping the event.

 #!/bin/bash
 #Command that makes inotifywait monitor file events in the directory
 INOTIFY_CMD="inotifywait -mrq -e modify,create,attrib,move,delete /data/"
 #Command that performs the rsync uplink synchronization
 RSYNC_CMD="rsync -azH --delete --password-file=/etc/server.pass /data/ backuper@192.168.50.25::wwwroot/"
 #Use while and read to keep reading the monitoring output
 $INOTIFY_CMD | while read DIRECTORY EVENT FILE
 do
     #Wait until no rsync process is running, then synchronize
     until [ $(pgrep rsync | wc -l) -le 0 ]
     do
         sleep 1
     done
     $RSYNC_CMD
 done
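
The loop above starts one rsync run per event, although a single run already synchronizes the whole directory. As a possible refinement, a burst of events can be collapsed into one run by discarding events until none arrives for a short quiet period. This is a sketch, not part of the original script: it assumes bash (for read -t), and the function name coalesce_events is hypothetical.

```shell
#!/bin/bash
# coalesce_events QUIET COMMAND [ARGS...]
# Reads event lines from stdin. After an event arrives, keeps discarding
# further events until none arrives for QUIET seconds (or input ends),
# then runs COMMAND once.
coalesce_events() {
    local quiet="$1" line
    shift
    while read -r line; do
        while read -r -t "$quiet" line; do
            :   # another event arrived within the quiet period; keep draining
        done
        # Keep COMMAND from consuming the event stream on stdin.
        "$@" < /dev/null
    done
}

# Three events that arrive together trigger a single run of the command.
printf '%s\n' 'ev1' 'ev2' 'ev3' | coalesce_events 1 echo "sync started"
# prints "sync started" once
```

In the script above it could be used as: $INOTIFY_CMD | coalesce_events 2 $RSYNC_CMD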
 

Verify the synchronization
The script above watches the /data directory on the local machine. As soon as something changes, it triggers an rsync synchronization that uploads the changes to the wwwroot shared directory on the source server 192.168.50.25.

5. Use rsync to quickly delete a large number of files
To delete a very large number of files under Linux, for example 1 million or 10 million files such as the nginx cache in /usr/local/nginx/proxy_temp, rm -rf * may not work well, because it can take a very long time.

In this case rsync can handle it neatly.

rsync works here by replacement: an empty directory is synchronized onto the target directory, and the delete option removes everything that does not exist in the empty source.

 #Create an empty folder first:
 mkdir /home/blank

 #Empty the target directory with rsync:
 rsync --delete-before -a -H -v --progress --stats /home/blank/ /usr/local/nginx/proxy_temp

 #The target directory will be emptied quickly

Option descriptions

Option            Function
--delete-before   The receiver deletes before the transfer starts
-a                Archive mode: transfer files recursively and preserve almost all file attributes
-H                Preserve hard links
-v                Verbose output
--progress        Show progress during the transfer
--stats           Print file transfer statistics
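
The technique can be tried safely on throwaway data first. The following sketch assumes rsync, bash, and find are installed; the function name and all paths are made up and live under a temporary directory, so nothing outside it is touched.

```shell
#!/bin/bash
# Empties a directory by synchronizing an empty directory onto it.
demo_empty_with_rsync() {
    local work
    work=$(mktemp -d) || return 1
    mkdir -p "$work/blank" "$work/target/sub"
    touch "$work/target/f1" "$work/target/f2" "$work/target/sub/f3"

    echo "entries before: $(find "$work/target" -mindepth 1 | wc -l)"
    # Trailing slashes matter: sync the contents of blank/ onto target/.
    rsync -a --delete-before "$work/blank/" "$work/target/"
    echo "entries after: $(find "$work/target" -mindepth 1 | wc -l)"

    rm -rf "$work"
}

# Run the demonstration only when rsync is available.
if command -v rsync >/dev/null 2>&1; then
    demo_empty_with_rsync
fi
```

The target directory itself remains; only its contents, including subdirectories, are removed, and the entry count drops to 0.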


Origin blog.csdn.net/zl965230/article/details/130803511