Use JuiceFS Sync Commands to Migrate and Synchronize Data Across Clouds

In recent years, cloud computing has become mainstream. Whether out of self-interest, to avoid being locked in by a single cloud service provider, for business and data redundancy, or for cost optimization, enterprises try to migrate some or all of their business from on-premises data centers to the cloud, or from one cloud platform to another. Service migration involves data migration. JuiceFS happens to integrate with a wide range of object storage APIs and also implements data synchronization logic. Let's take a look at the JuiceFS sync command.

What is JuiceFS Sync

JuiceFS's sync subcommand is a full-featured data synchronization utility that can synchronize or migrate data concurrently, with multiple threads, between all object stores supported by JuiceFS. It supports data migration between object storage and JuiceFS, as well as migration across clouds and regions between one object store and another. Similar to rsync, in addition to object storage it also supports synchronizing local directories and accessing remote directories through SSH, HDFS, WebDAV, and so on, and it provides advanced functions such as full synchronization, incremental synchronization, and conditional pattern matching.

Basic usage

Command format

juicefs sync [command options] SRC DST

That is, SRC is synchronized to DST. Both directories and files can be synchronized.

Where:

  • SRC represents the data source address and path
  • DST represents the destination address and path
  • [command options] represents optional synchronization options; see the command reference for details.

The address format is [NAME://][ACCESS_KEY:SECRET_KEY@]BUCKET[.ENDPOINT][/PREFIX]

Where:

  • NAME is the storage type, e.g. s3 or oss. See all supported storage services for details
  • ACCESS_KEY and SECRET_KEY are the API access keys for the object storage
  • BUCKET[.ENDPOINT] is the access address of the object storage
  • PREFIX is optional and restricts synchronization to directory names with that prefix.

The following is an example address for Amazon S3 object storage:

s3://ABCDEFG:[email protected]
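The address format above can be illustrated with a small Python sketch. The helper build_address is hypothetical and not part of JuiceFS; it only concatenates the optional parts in the order the format describes, and it assumes the keys contain no characters that would need escaping.

```python
def build_address(bucket, name=None, access_key=None, secret_key=None,
                  endpoint=None, prefix=None):
    """Assemble [NAME://][ACCESS_KEY:SECRET_KEY@]BUCKET[.ENDPOINT][/PREFIX].

    Hypothetical helper for illustration only; JuiceFS does not ship it.
    """
    parts = []
    if name:
        parts.append(f"{name}://")                 # storage type, e.g. s3 or oss
    if access_key and secret_key:
        parts.append(f"{access_key}:{secret_key}@")
    parts.append(bucket)                           # the only mandatory part
    if endpoint:
        parts.append(f".{endpoint}")
    if prefix:
        parts.append("/" + prefix.lstrip("/"))     # optional directory prefix
    return "".join(parts)


# Example values: bucket "aaa" with the placeholder keys used in this article.
address = build_address(
    "aaa",
    name="s3",
    access_key="ABCDEFG",
    secret_key="HIJKLMN",
    endpoint="s3.us-west-1.amazonaws.com",
    prefix="movies/",
)
print(address)
```

Every part except BUCKET is optional, which is why a bare local path or bucket name is also a valid SRC or DST.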

In particular, SRC and DST are treated as directories if they end with /, for example: movies/. If they do not end with /, they are treated as a "prefix" and matched according to the rules of prefix matching. For example, suppose the current directory contains two directories, test and text. They can be synchronized to the target path ~/mnt/ with the following command:

juicefs sync ./te ~/mnt/te

In this way, the sync command treats te as a prefix and matches all directories or files in the current path whose names start with it, i.e. test and text. In the target path ~/mnt/te, te is also a prefix: it replaces the source prefix of all synchronized directories and files. In this example te is replaced with te, so the names stay unchanged. If you adjust the prefix of the target path, for example by changing the target prefix to ab:

juicefs sync ./te ~/mnt/ab

The test directory will become abst, and text will become abxt.
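The prefix replacement described above can be modeled in a few lines of Python. This is only a sketch of the naming rule, not JuiceFS code; the function map_by_prefix is a hypothetical name introduced for illustration.

```python
def map_by_prefix(names, src_prefix, dst_prefix):
    """Return {source name: target name} for names starting with src_prefix.

    Models the rule described above: the matched source prefix is replaced
    by the target prefix. Names that do not match are not synchronized.
    """
    return {
        name: dst_prefix + name[len(src_prefix):]
        for name in names
        if name.startswith(src_prefix)
    }


entries = ["test", "text", "other"]
print(map_by_prefix(entries, "te", "te"))  # names are kept unchanged
print(map_by_prefix(entries, "te", "ab"))  # "te" is swapped for "ab"
```

To avoid this renaming behavior altogether, end both SRC and DST with / so that they are treated as directories.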

List of resources

This assumes the following storage resources:

  1. Object Storage A

    • Bucket name: aaa
    • Endpoint: https://aaa.s3.us-west-1.amazonaws.com
  2. Object Storage B

    • Bucket name: bbb
    • Endpoint: https://bbb.oss-cn-hangzhou.aliyuncs.com
  3. JuiceFS file system

    • Metadata storage: redis://10.10.0.8:6379/1
    • Object storage: https://ccc-125000.cos.ap-beijing.myqcloud.com

The access keys for all storage services are:

  • ACCESS_KEY: ABCDEFG
  • SECRET_KEY: HIJKLMN

Sync between Object Storage and JuiceFS

Sync the movies directory of Object Storage A to the JuiceFS file system:

# Mount JuiceFS
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
# Run the sync
juicefs sync s3://ABCDEFG:[email protected]/movies/ /mnt/jfs/movies/

Sync the images directory of the JuiceFS file system to Object Storage A:

# Mount JuiceFS
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
# Run the sync
juicefs sync /mnt/jfs/images/ s3://ABCDEFG:[email protected]/images/

Sync between object stores

Synchronize all data in Object Storage A to Object Storage B:

juicefs sync s3://ABCDEFG:[email protected] oss://ABCDEFG:[email protected]

Advanced usage

Incremental synchronization and full synchronization

By default, the sync command works incrementally: it first compares the source and target paths, and then synchronizes only the differences. Use the --update or -u option to update existing files whose source mtime is newer.

For a full sync, i.e. to resync regardless of whether the same file already exists on the target path, use --force-update or -f. For example, to fully synchronize the movies directory to the JuiceFS file system:

# Mount JuiceFS
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
# Run a full sync
juicefs sync --force-update s3://ABCDEFG:[email protected]/movies/ /mnt/jfs/movies/
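The difference between the two modes can be sketched as a toy model in Python. This is not how JuiceFS is implemented: the article only says that differences are compared, so the model below assumes, purely for illustration, that a file counts as different when it is missing on the target or its size differs. The function plan_sync and its inputs are hypothetical.

```python
def plan_sync(source, target, force_update=False):
    """Return the sorted list of file names that would be copied.

    source and target map file names to sizes in bytes.
    Assumption for illustration: "different" means missing or size mismatch.
    """
    if force_update:
        # Full sync: copy everything, even if the target already has it.
        return sorted(source)
    # Incremental sync: copy only files that are missing or differ.
    return sorted(name for name, size in source.items()
                  if target.get(name) != size)


source = {"a.mp4": 100, "b.mp4": 200, "c.mp4": 300}
target = {"a.mp4": 100, "b.mp4": 150}

print(plan_sync(source, target))                     # incremental
print(plan_sync(source, target, force_update=True))  # full
```

In this model the incremental run transfers only b.mp4 and c.mp4, while the full run transfers all three files again.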

Pattern matching

The pattern matching function of the sync command is similar to that of rsync. It can exclude or include certain types of files through rules, and any set of files can be selected by combining multiple rules. The rules are as follows:

  • A pattern ending in / matches only directories; otherwise it matches files, links, or devices;
  • If a pattern contains *, ? or [ characters, it is matched as a wildcard pattern; otherwise it is matched as a regular string;
  • * matches any non-empty path component, stopping at /;
  • ? matches any character except /;
  • [ matches a set of characters, such as [a-z] or [[:alpha:]];
  • In a wildcard pattern, a backslash can be used to escape wildcard characters; in a pattern without wildcards, it is matched literally;
  • A pattern is always matched recursively, as a prefix.

Exclude files/directories

Use the --exclude option to set directories or files to exclude. For example, to fully sync the JuiceFS file system to Object Storage A without syncing hidden files and folders:

On Linux, all names starting with . are treated as hidden.

# Mount JuiceFS
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
# Full sync, excluding hidden files and directories
juicefs sync --exclude '.*' /mnt/jfs/ s3://ABCDEFG:[email protected]/

This option can be repeated to match more rules, for example, to exclude all hidden files, the pic/ directory, and the 4.png file:

juicefs sync --exclude '.*' --exclude 'pic/' --exclude '4.png' /mnt/jfs/ s3://ABCDEFG:[email protected]

Include files/directories

Use the --include option to set directories or files to be included (not excluded), e.g. to sync only the pic/ directory and the 4.png file and exclude everything else:

juicefs sync --include 'pic/' --include '4.png' --exclude '*' /mnt/jfs/ s3://ABCDEFG:[email protected]

When using include/exclude rules, options that are placed first take precedence, so --include should come first. If --exclude '*' is placed first, all files are excluded and the --include 'pic/' --include '4.png' rules that follow it will not take effect.
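The ordering rule can be illustrated with a simplified Python model in which the first matching rule decides. This is a sketch, not JuiceFS's matcher: it uses Python's fnmatch module, whose * also matches /, so it is only meaningful for top-level names, and directories are written with a trailing / as an assumption of this example. The function is_synced is a hypothetical name.

```python
from fnmatch import fnmatchcase


def is_synced(name, rules):
    """Decide whether a top-level entry is synchronized.

    rules is a list of ("include" | "exclude", pattern) tuples in
    command-line order. The first rule whose pattern matches wins.
    """
    for action, pattern in rules:
        if fnmatchcase(name, pattern):
            return action == "include"
    return True  # no rule matched: the entry is synchronized by default


entries = ["pic/", "4.png", "5.png"]

include_first = [("include", "pic/"), ("include", "4.png"), ("exclude", "*")]
exclude_first = [("exclude", "*"), ("include", "pic/"), ("include", "4.png")]

print([e for e in entries if is_synced(e, include_first)])
print([e for e in entries if is_synced(e, exclude_first)])
```

With the include rules first, pic/ and 4.png are kept and 5.png is dropped. With --exclude '*' first, every entry matches the exclusion before any include rule is consulted, so nothing is synchronized.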

Multithreading and bandwidth throttling

By default, juicefs sync uses 10 threads to perform synchronization tasks; you can change this with the --threads option.

In addition, if you need to limit the bandwidth used by synchronization tasks, set the --bwlimit option. The unit is Mbps, and the default value 0 means no limit.

Directory structure and file permissions

By default, the sync command only synchronizes file objects and directories containing file objects. Empty directories are not synchronized. To sync empty directories, use the --dirs option.

Also, if you want to preserve file permissions when synchronizing between file systems such as local, sftp, and hdfs, use the --perms option.

Copy symbolic links

When synchronizing between local directories, sync supports the --links option, which copies a symbolic link itself instead of the object it points to. The path that the synchronized symbolic link points to is the original path stored in the source link. It is not converted, regardless of whether that path is reachable before or after synchronization.

A few other details to note

  1. The symlink's own mtime is not copied;
  2. The --check-new and --perms options are ignored when a symbolic link is encountered.

Multi-machine concurrent synchronization

In essence, synchronizing data between two object stores is to pull data from one end and push it to the other end. As shown in the figure below, the efficiency of synchronization depends on the bandwidth between the client and the cloud.

When synchronizing a large amount of data, the bandwidth of a single machine is often saturated and becomes a bottleneck. For this situation, JuiceFS Sync provides multi-machine concurrent synchronization, as shown in the following figure.

The Manager executes the sync command and defines multiple Worker hosts through the --worker parameter. JuiceFS dynamically splits the synchronization workload according to the total number of Workers and distributes it to the hosts for simultaneous execution. That is, the synchronization tasks originally processed on one host are divided into several parts and handled by multiple hosts at the same time. More data can be processed per unit of time, and the total available bandwidth is multiplied accordingly.

When configuring multi-machine concurrent synchronization, you need to set up passwordless SSH login from the Manager host to each Worker host in advance, so that the client and the tasks can be distributed to the Workers successfully.

The Manager will distribute the JuiceFS client program to the Worker hosts. To avoid client compatibility issues, please ensure that the Manager and Worker use the same type and architecture of operating systems.

For example, to synchronize Object Storage A to Object Storage B using multi-machine concurrent synchronization:

juicefs sync --worker [email protected],[email protected] s3://ABCDEFG:[email protected] oss://ABCDEFG:[email protected]

The current host and the two Worker hosts [email protected] and [email protected] will share the data synchronization task between the two object stores.

If the SSH service of a Worker host does not use the default port 22, set the SSH port for that Worker host in the .ssh/config configuration file.

Scenario application

Data offsite disaster recovery backup

Off-site disaster recovery backup targets the files themselves, so the files stored in JuiceFS should be synchronized to another object store. For example, to synchronize the files in the JuiceFS file system to Object Storage A:

# Mount JuiceFS
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
# Run the sync
sudo juicefs sync /mnt/jfs/ s3://ABCDEFG:[email protected]/

After synchronization, all files can be seen directly in Object Storage A.

Create a JuiceFS data copy

Unlike disaster recovery backup of the files themselves, the purpose of creating a JuiceFS data copy is to build a mirror of the JuiceFS data storage with the same content and structure. When the object storage in use fails, you can switch to the data copy by modifying the configuration and continue working. Note that only the data of the JuiceFS file system is copied here, not the metadata. The metadata engine still needs its own data backup.

This requires operating directly on the underlying object store of JuiceFS and synchronizing it with the target object store. For example, to use Object Storage B as a data copy for the JuiceFS file system:

juicefs sync cos://ABCDEFG:[email protected] oss://ABCDEFG:[email protected]

After synchronization, what you see in Object Store B is exactly the same content and structure as the Object Store used by JuiceFS.

If you find this helpful, please follow our project Juicedata/JuiceFS! (0ᴗ0✿)

Origin my.oschina.net/u/5389802/blog/5514088