What does the following message mean: “nfs4_reclaim_open_state: Lock reclaim failed!”

Environment

  • Red Hat Enterprise Linux (RHEL) 6
    • all kernels
  • NFSv4
  • NFS client

Issue

  • We’re seeing the following message in /var/log/messages. What does it mean?
Jun 11 19:47:02 foobar kernel: nfs4_reclaim_open_state: Lock reclaim failed!

Resolution

  • This is a fairly generic message generated by the NFSv4 client code. It means the following:
    1. An NFS operation completed with an error status, which triggered the NFS state manager thread to begin recovering state.
    2. While recovering state, the state manager attempted to reclaim locks via nfs4_reclaim_locks(), and at least one lock was not successfully reclaimed.
  • This message often follows a lease expiration caused by a network problem between the NFS client and the NFS server. In that case, the error status returned by the NFS operation is NFS4ERR_EXPIRED.
  • Unfortunately, the message alone is not specific enough to determine what happened. To find the cause, either capture the NFS traffic leading up to the message or instrument the kernel (for example, with systemtap) to trace the logic. See the Diagnostic Steps section for more information.

Root Cause

  • This message is printed by the nfs4_reclaim_open_state() function in fs/nfs/nfs4state.c (excerpt below), which is called as part of the NFSv4 client's state recovery. State recovery is triggered when certain operations complete with certain error codes.
fs/nfs/nfs4state.c
static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs4_state_recovery_ops *ops)
{
        struct nfs4_state *state;
        struct nfs4_lock_state *lock;
        int status = 0;

        /* Note: we rely on the sp->so_states list being ordered
         * so that we always reclaim open(O_RDWR) and/or open(O_WRITE)
         * states first.
         * This is needed to ensure that the server won't give us any
         * read delegations that we have to return if, say, we are
         * recovering after a network partition or a reboot from a
         * server that doesn't support a grace period.
         */
        spin_lock(&sp->so_lock);
        write_seqcount_begin(&sp->so_reclaim_seqcount);
restart:
        list_for_each_entry(state, &sp->so_states, open_states) {
                if (!test_and_clear_bit(ops->state_flag_bit, &state->flags))
                        continue;
                if (state->state == 0)
                        continue;
                atomic_inc(&state->count);
                spin_unlock(&sp->so_lock);
                status = ops->recover_open(sp, state);
                if (status >= 0) {
                        status = nfs4_reclaim_locks(state, ops);
                        if (status >= 0) {
                                spin_lock(&state->state_lock);
                                list_for_each_entry(lock, &state->lock_states, ls_locks) {
                                        if (!(lock->ls_flags & NFS_LOCK_INITIALIZED))
                                                /* <-- the message is printed here */
                                                printk("%s: Lock reclaim failed!\n",
                                                        __func__);
                                }
                                spin_unlock(&state->state_lock);
                                nfs4_put_open_state(state);
                                spin_lock(&sp->so_lock);
                                goto restart;
                        }
                }
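
The check that emits the message can be illustrated outside the kernel. The following is a minimal user-space sketch, not kernel code: the `model_*` types, the `MODEL_LOCK_INITIALIZED` flag, and the `model_report_unreclaimed_locks()` function are hypothetical stand-ins for `struct nfs4_lock_state`, `NFS_LOCK_INITIALIZED`, and the inner loop of the excerpt. It only shows that the message is printed once for every lock attached to an open state that is still not marked initialized after the reclaim pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's NFS_LOCK_INITIALIZED flag. */
#define MODEL_LOCK_INITIALIZED 0x1u

/* Hypothetical, simplified model of one lock state. */
struct model_lock_state {
    unsigned int flags;                   /* models lock->ls_flags */
    const struct model_lock_state *next;  /* models the ls_locks list linkage */
};

/* Hypothetical, simplified model of one open state. */
struct model_open_state {
    const struct model_lock_state *locks; /* models state->lock_states */
};

/*
 * Walk every lock attached to one open state and report each lock that
 * is not marked initialized. Returns the number of such locks.
 */
int model_report_unreclaimed_locks(const struct model_open_state *state)
{
    const struct model_lock_state *lock;
    int failed = 0;

    assert(state != NULL);
    for (lock = state->locks; lock != NULL; lock = lock->next) {
        if (!(lock->flags & MODEL_LOCK_INITIALIZED)) {
            fprintf(stderr, "%s: Lock reclaim failed!\n", __func__);
            failed++;
        }
    }
    return failed;
}
```

In this model, an open state holding three locks of which one is marked initialized produces two messages, which matches the per-lock behavior of the loop in the excerpt.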

Diagnostic Steps

Gathering a tcpdump leading up to “Lock reclaim failed”

Use the tcpdump-watch.sh script attached to the article "Gathering data and logs to troubleshoot NFS issues". Modify the 'match=' line to include "Lock reclaim failed", then run the script as described in that article.

When the script detects the "Lock reclaim failed" message, it stops the tcpdump. Upload the tcpdump output file and the /var/log/messages file to Red Hat for analysis.
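
The detection step amounts to a substring match against log lines. The sketch below is not the tcpdump-watch.sh script and does not reproduce its logic; it is a small C illustration, with hypothetical names (`MATCH_TEXT`, `line_matches()`, `stream_has_match()`), of how a watcher could recognize the message in a stream such as /var/log/messages. It assumes log lines fit in the 4096-byte buffer; a longer line is read in pieces, so a match straddling two pieces would be missed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Text to look for; mirrors the message quoted in this article. */
const char MATCH_TEXT[] = "Lock reclaim failed";

/* Return 1 if a single log line contains the message, 0 otherwise. */
int line_matches(const char *line)
{
    assert(line != NULL);
    return strstr(line, MATCH_TEXT) != NULL;
}

/*
 * Read an already-open log stream to the end and return 1 as soon as
 * any line contains the message, or 0 if none does.
 */
int stream_has_match(FILE *log)
{
    char buf[4096];

    assert(log != NULL);
    while (fgets(buf, sizeof buf, log) != NULL) {
        if (line_matches(buf))
            return 1;
    }
    return 0;
}
```

A caller would open the log with `fopen()`, pass the stream to `stream_has_match()`, and stop the packet capture when it returns 1.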

Reposted from blog.csdn.net/QTM_Gitee/article/details/130488412