Detecting NFS Clients
For many years I have used a “safe-shutdown” script. One of the reasons I have not adopted a systemd distro at home is my reliance on this script. I have not found a palatable way to support this script in systemd. The systemd-inhibit command does not help.
One of my safe-shutdown checks is whether any NFS clients are connected. If so, the safe-shutdown script aborts. My reason for detecting NFS clients is simple: NFS is evil when the client cannot find the NFS server. The client system will hang during any kind of file-related task.
My safe-shutdown script has worked well for some years, but recently, by odd circumstance, I noticed a client system that was not being detected. The system running the NFS server powered down despite the connection. Although the client had the shares mounted, it was idle with respect to using that connection. From the NFS server's perspective, the client was not connected.
My safe-shutdown script relies on the netstat command to detect clients. The common methods suggested online for detecting connected NFS clients are the netstat and showmount commands. My recent circumstance revealed that NFS clients will appear not connected when the client system is idle. That is, netstat will not show any ESTABLISHED connections. From the NFS server's perspective, there are no connected clients.
The showmount -a command parses the /var/lib/nfs/rmtab file. The showmount and rpc.mountd man pages indicate that this command option and file are unreliable. This is a strange design choice: why can’t the NFS server automatically scrub the rmtab file when clients unmount cleanly?
The unreliability of the rmtab file can be verified by noticing that its contents do not change, even over a long period. After several days, a significant number of stale entries are present in the rmtab file.
When a client unmounts cleanly, the netstat command immediately shows that the client is disconnected. The netstat command therefore seems the more sensible choice for detecting NFS clients, except when a client is idle. Being idle does not mean the client is disconnected or that the exported shares were unmounted.
With idle clients, the rmtab file can help only a little, by showing which clients have been connected.
This idleness can be observed using the netstat command. Connect a client and then wait five minutes without any file activity on the NFS shares. The netstat command will show no connection. On the client, running the df command or accessing a file on the NFS shares will refresh the connection to the NFS server, and the server will again show an ESTABLISHED connection.
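The observation above can be reproduced with a small filter. This is a minimal sketch, assuming Linux netstat -an output (local address in column 4, foreign address in column 5, state in column 6) and the standard NFS port 2049; the function name nfs_peers is hypothetical. It prints each client address that holds an ESTABLISHED connection to the server's NFS port.

```shell
# nfs_peers (hypothetical name): read `netstat -an` output on stdin and print
# each peer address with an ESTABLISHED connection to local port 2049.
nfs_peers() {
  awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ /:2049$/ {
         peer = $5
         sub(/:[0-9]+$/, "", peer)   # drop the client-side port number
         print peer
       }' | sort -u
}
```

Running something like while :; do date; netstat -an | nfs_peers; sleep 60; done on the server while a client sits idle should show the client drop out of the list after about five minutes and reappear once the client runs df.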
Resolving this corner case required changes to my safe-shutdown procedure.
First, I added a simple one-liner to my system cleanup script, which runs at shutdown:
cat /dev/null > /var/lib/nfs/rmtab
This does not eliminate the possibility of stale entries during the current session, but it is a good start at trimming long-term stale connections from the file.
Second, I added more checks to my safe-shutdown script. I run the showmount -a command:
showmount -a | awk -F ':' '{print $1}' | grep '^[0-9]' | sort -u
The result might contain stale entries but at least provides a list of recently connected clients.
I run a single ping to each IP address in that list. If the ping exit code is zero, the ping succeeded and the client system is still online. But is the system connected? To find out, I run netstat:
netstat -an | grep -F "${LOCAL_IP_ADDRESS}:2049" | grep ESTABLISHED | awk '{print $5}' | grep -F "${client}:"
If the test returns no ESTABLISHED connections, then this client is online but seemingly not connected. The remaining question is whether the client is truly disconnected or merely idle.
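Putting the showmount, ping, and netstat checks together might look like the sketch below. It assumes Linux iputils ping (where -W is a timeout in seconds), the LOCAL_IP_ADDRESS variable from the netstat command above, and the netstat -an column layout; the function names are hypothetical. Comparing the foreign address field against the client IP plus a trailing colon avoids a client such as 192.168.1.1 also matching 192.168.1.10.

```shell
# nfs_connected (hypothetical): succeed if netstat shows an ESTABLISHED
# connection from client $1 to ${LOCAL_IP_ADDRESS}:2049.
nfs_connected() {
  netstat -an | awk -v lip="${LOCAL_IP_ADDRESS}:2049" -v cip="$1" '
    $6 == "ESTABLISHED" && $4 == lip && index($5, cip ":") == 1 { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# check_recent_clients (hypothetical): classify each client listed by
# showmount -a that still answers a ping.
check_recent_clients() {
  showmount -a | awk -F ':' '/^[0-9]/ { print $1 }' | sort -u |
    while read -r client; do
      # Clients that do not answer a single ping are treated as offline.
      if ping -c 1 -W 2 "$client" > /dev/null 2>&1; then
        if nfs_connected "$client"; then
          echo "connected $client"
        else
          echo "idle-or-disconnected $client"
        fi
      fi
    done
}
```

Any "idle-or-disconnected" line is the ambiguous case the article describes: the client is online, but netstat alone cannot tell whether it still has the shares mounted.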
An NFS client cron job running every three minutes would refresh the connections, for example by running the df command or by sending a listing of a mounted NFS directory to /dev/null.
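A client-side keepalive along those lines could be as small as the sketch below. It assumes a Linux client where /proc/mounts lists NFS shares with file system type nfs or nfs4; the function names and the script path in the comment are hypothetical.

```shell
# nfs_mountpoints (hypothetical): print the mount point of every NFS share
# listed in a mounts table (default: /proc/mounts on Linux).
# Mount points containing spaces appear escaped (\040) and are not handled.
nfs_mountpoints() {
  awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# nfs_keepalive (hypothetical): run df on each NFS mount point so the client
# generates NFS traffic and the server keeps seeing a connection.
# Example crontab entry, assuming a script that calls nfs_keepalive:
#   */3 * * * * /usr/local/bin/nfs-keepalive.sh
nfs_keepalive() {
  nfs_mountpoints "$@" | while read -r mnt; do
    df "$mnt" > /dev/null 2>&1
  done
}
```

Note that df on the client can itself hang if the server is unreachable, which is the same NFS behavior that motivates the safe-shutdown check in the first place.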
As I use SSH keys throughout the home LAN, I could have my safe-shutdown script SSH into the client system and check for NFS connections.
The Linux kernel TCP keepalive attributes are not an option. A common reference on the topic is the TCP Keepalive HOWTO. The kernel attributes are intended for TCP/IP connections and not NFS, even when NFS is used over TCP. I had no success testing these attributes with an idle NFS client connection.
The default idle timeout is five minutes (300 seconds). This timeout is not related to the NFS mount option timeo. I found no configuration or mount option for changing this default.
My options for detecting an idle NFS client seem limited. I added the following one-liner to my safe-shutdown script, which runs when the first netstat command returns nothing:
ssh root@${client} /bin/df
Then I run the netstat command again.
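The SSH fallback and the second netstat check could be wrapped in one function. This is a sketch under stated assumptions: key-based root SSH as in the article, the LOCAL_IP_ADDRESS variable from the earlier netstat command, and Linux netstat -an output. The function names and the one-second pause are hypothetical. The ssh -n option stops ssh from reading standard input, which matters if the function is called inside a while read loop, and BatchMode=yes makes ssh fail rather than prompt for a password.

```shell
# nfs_connected (hypothetical): succeed if netstat shows an ESTABLISHED
# connection from client $1 to ${LOCAL_IP_ADDRESS}:2049.
nfs_connected() {
  netstat -an | awk -v lip="${LOCAL_IP_ADDRESS}:2049" -v cip="$1" '
    $6 == "ESTABLISHED" && $4 == lip && index($5, cip ":") == 1 { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# client_in_use (hypothetical): succeed if client $1 holds an NFS connection,
# nudging an idle client with df over SSH before giving up.
client_in_use() {
  nfs_connected "$1" && return 0
  # -n: do not read stdin; BatchMode: never prompt; ConnectTimeout: bound the wait.
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$1" /bin/df > /dev/null 2>&1
  sleep 1   # assumption: give the refreshed connection time to show in netstat
  nfs_connected "$1"
}
```

If client_in_use succeeds for any client, the safe-shutdown script would abort; if it fails, the client is online but has no active NFS connection even after the df nudge.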
Running the df command on the NFS client refreshes the connections and causes the NFS server to correctly display an established connection.
A client-side cron job running df every three minutes is another option.
Perhaps this monkey-wrenching could be avoided if /var/lib/nfs/rmtab were kept up to date and reliable, or if there were a way to override the default five-minute timeout.
Posted: Usability Tagged: General