Nautilus new features: CephFS improvements

The CephFS file system has seen significant improvements in Ceph Nautilus. As with other parts of Ceph, we have been devoting a lot of developer time to improving usability and stability. The following sections detail that work.


1. MDS Stability

MDS stability has been a major focus for developers over the last two releases. We recently discovered that an MDS with a very large cache (64+ GB) will hang during certain recovery events. For example, when the MDS tries to shrink its cache to respect "mds_cache_memory_limit", it asks clients to release some capabilities: the MDS computes the proportion of capabilities that need to be recalled and asks each client to do its share. However, a large MDS may have tens of millions of outstanding capabilities, and a single long-running client may hold millions of them. Sending all of those recall messages at once can render the MDS effectively unresponsive, and may even cause the monitors to replace it with a standby MDS.

We addressed this problem from two directions:

1. The MDS now throttles the rate at which it recalls capabilities from clients, so that a sudden reduction in cache size (due to a configuration change) or a large client cache does not destabilize the MDS. This behavior is controlled by several new configuration options: mds_recall_max_caps, mds_recall_max_decay_rate, mds_recall_max_decay_threshold, mds_recall_global_max_decay_threshold, mds_recall_warning_threshold, and mds_recall_warning_decay_rate. The default values have been chosen carefully, and we do not expect administrators to need to change them; the documentation for each option describes its role.

2. The MDS now limits the number of capabilities a single client may hold. This is controlled by the new mds_max_caps_per_client configuration option; capabilities are recalled from any client that exceeds the limit until it drops back below it. The client version does not matter. (A brief tuning sketch follows this list.)
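As a minimal sketch of how these knobs can be adjusted through the centralized configuration database (the commands are standard ceph config calls; the values below are purely illustrative, not recommendations):

# Throttle cap recall: limit how many capabilities the MDS asks a client
# to release in a single recall message (illustrative value).
ceph config set mds mds_recall_max_caps 30000

# Cap the number of capabilities any single client may hold; clients over
# the limit have capabilities recalled until they fall back below it.
ceph config set mds mds_max_caps_per_client 1048576

# Shrinking the cache (value in bytes) now triggers a gradual, rate-limited
# recall rather than a single huge burst.
ceph config set mds mds_cache_memory_limit 8589934592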

There is also ongoing work to address this problem by having clients voluntarily release capabilities they are no longer using.

CephFS operators running MDSs with very large caches should now see more predictable behavior. In addition, the MDS no longer tolerates clients holding millions of capabilities, since such clients cause problems of their own during recovery events or major changes to the MDS cache size.

2. NFS Ganesha Gateway cluster

Ganesha is a userspace NFS server that can export local file systems and other storage systems. CephFS is one of the file system types Ganesha supports exporting. This makes it possible to expose a Ceph file system to NFS clients, which may be desirable for a number of reasons, including isolation from the storage cluster, security, and legacy applications.

In Ceph Nautilus, NFS-Ganesha becomes a first-class component of the cluster, with its life cycle managed by Ceph from start to finish. Together with the Rook operator and Kubernetes, Ceph can create clusters of NFS-Ganesha daemons that export a Ceph file system.

Each Ganesha daemon in the cluster is configured in an active-active fashion to export the same file system, so that multiple NFS servers can load-balance client requests to the same CephFS file system. Prior to Nautilus this was an unsafe configuration (done by hand), because NFS client state could be lost during failover.

Work to make setting up these clusters turn-key is still in progress. We expect many of the user interface changes to be backported to Nautilus in the coming months. In the meantime, Jeff Layton's blog post provides a working demonstration.
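As a rough sketch of what the Rook side of this can look like (an assumption on my part: the field layout follows the Rook v1.0-era CephNFS resource, and the pool and namespace names here are hypothetical), a small active-active Ganesha cluster can be requested declaratively:

cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephNFS
metadata:
  name: my-nfs
  namespace: rook-ceph
spec:
  rados:
    # RADOS pool and namespace where Ganesha keeps its shared recovery state.
    pool: myfs-data0
    namespace: nfs-ns
  server:
    # Number of active-active Ganesha daemons to run.
    active: 2
EOF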

3. Changes to Standby Replay

The mechanism for configuring "standby-replay" daemons in CephFS has been reworked. A standby-replay MDS follows the journal of an active MDS in real time, enabling very fast failover if the active MDS goes down. Prior to Nautilus, a daemon had to be started with the mds_standby_replay configuration option so that it could run as a standby-replay MDS. The monitors would receive a message from such daemons indicating that they could operate in standby-replay mode, and would assign them to follow MDS ranks when available.

This configuration was awkwardly different from other file system settings, which are normally changed through the CephFS command suite; here the MDS configuration drove decisions made by the monitors. Instead, it is simpler to let the operator indicate which file systems should have standby-replay daemons. Additionally, enabling standby-replay only for certain ranks is not meaningful.

The new file system setting allow_standby_replay enables standby-replay for a file system. As long as standby daemons are available, the monitors will assign them to the file system's ranks.

Standby-replay daemons count toward the desired number of standbys for the file system, which is configured via "standby_count_wanted". Be sure to adjust this setting as "max_mds" changes so that enough standbys are available; with standby-replay daemons, it is suggested to set it to at least "max_mds" + 1, so that an ordinary standby remains available to replace a standby-replay daemon.
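For example, with a hypothetical file system named "cephfs":

# Enable standby-replay; the monitors will assign available standby daemons
# to follow each active rank of this file system.
ceph fs set cephfs allow_standby_replay true

# With two active ranks, ask for three standbys so that an ordinary standby
# remains available in addition to the standby-replay daemons.
ceph fs set cephfs max_mds 2
ceph fs set cephfs standby_count_wanted 3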

Finally, the "mds_standby_for_*" family of configuration options allowed the operator to specify which file system, rank, or named MDS a standby should follow. These options have been deprecated in Nautilus, so that all MDS daemons are treated uniformly and equally.

4. cephfs-shell tool

There are two main tools for accessing CephFS: the ceph-fuse client and the Linux kernel driver. Now there is also cephfs-shell, a new Python tool written by Pavani Rajula during the summer of 2018. It provides a shell for performing simple operations on the file system without mounting it, e.g.:

CephFS:~/>>> mkdir foo
CephFS:~/>>> put /etc/hosts foo/hosts
CephFS:~/>>> cat foo/hosts
127.0.0.1       localhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouter
CephFS:~/>>>

Keep in mind that cephfs-shell is alpha-quality software and may have defects. We will regularly backport fixes to Nautilus, and ideas for improvement from the community (on the ceph-users mailing list) are most welcome.


5. Blacklisting Older Clients

As in Mimic (v13.2.2+), CephFS now allows cluster administrators to prevent older clients from mounting the file system. Blacklisting these older versions is helpful because they misbehave in some ways (for example, by not releasing capabilities).

This is done with a new file system setting:

ceph fs set tank min_compat_client mimic

Once set, the MDS will blacklist and evict any connected older clients and prevent them from connecting in the future.
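Before raising the bar, it can be useful to check which clients are currently connected and what versions they report. A minimal sketch, assuming an MDS named mds.a and the "tank" file system from above (the exact output format may vary):

# List client sessions on one MDS; the client_metadata in the output
# reports each client's ceph_version.
ceph tell mds.a session ls

# Inspect the file system map to confirm the new requirement took effect.
ceph fs get tank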
