I wonder if this is the cause of a nasty NFSv3 issue I was having years ago, where clients would periodically hang indefinitely, necessitating a system reboot. At the time, we were ingesting large videos on the client and transferring them to a shared volume via NFS.
I'd suspect a bug in the NFS implementation. That would hardly be unheard of.
NFS's failure mode of freezing up your system and requiring a full reboot to clear is purestrain NFS though. I never understood why the idea of an eventual soft failure (returning a socket error) was considered unacceptable in NFS land.
> I never understood why the idea of an eventual soft failure (returning a socket error) was considered unacceptable in NFS land.
Problems like this are usually the result of being unable to decide on an appropriate timeout, so no timeout is chosen at all. To get past that impasse, I like to suggest rather long timeouts, like one day or one week, rather than forever. Very few people are going to say, after a read tried for a whole day, that it should have tried longer.
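For what it's worth, NFS does expose knobs for exactly this tradeoff: a soft mount with finite `timeo`/`retrans` values makes the client eventually give up and return an error to the application instead of retrying forever. A rough sketch of an fstab entry (option names are from nfs(5); the server/export path and the specific values here are placeholders to tune, not recommendations):

```
# /etc/fstab — "soft" makes the client return an error once retries expire,
# instead of the default "hard" behavior of retrying indefinitely.
# timeo is in tenths of a second; retrans is the retransmission count.
server:/export  /mnt/share  nfs  soft,timeo=600,retrans=5  0  0
```

The catch, as discussed downthread, is that soft mounts can surface errors to applications that were never written to handle them, which is part of why "hard" remains the default.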
Another issue is that POSIX file I/O doesn't have great error indicators, so it can be tricky to plumb things through in clearly correct ways.
NFS is notorious for breaking kernel and application assumptions about POSIX. Linux falls into this trap in various ways too, in an effort to simplify the common cases. Timeouts might be appropriate for read/open/etc. calls, but in a way the problems are worse on the write/close/etc. side.
Reading the close() manpage hints at some of those problems, but fundamentally POSIX synchronous file I/O isn't well suited to handling space and I/O errors that are deferred from the originating call.

Consider write()s which are buffered by the kernel but can't be completed due to network or out-of-space conditions. A naive reading of write() would imply that errors should be returned immediately, so that the application knows the latest write/record update failed. Yet what really happens is that, for performance reasons, the data from those calls is allowed to be buffered, leading to a situation where an I/O call may return a failure caused by an operation at some point in the past. Given all the ways this can happen, the application cannot accurately determine what was actually written, if anything, since the last serialization event (which is itself another set of problems).
edit: this also gets into the ugly case of the state of the fd being unspecified (per POSIX) following close() failures. So per POSIX the correct response is to retry close(), while simultaneously ensuring that open()s aren't happening anywhere. Linux simplifies this a bit by implying the fd is closed regardless, but that has its own issues.
I understand the reasoning, but at the same time I wonder if this isn't the perfect being the enemy of the good? Since there is no case where a timeout/error-style exit can be guaranteed to never lose data, we instead lock the entire box up when an NFS server goes AWOL. This still causes the data to be lost, but also brings down everything else.
Well, soft mounts should keep the entire machine from dying, unless you're running critical processes off the NFS mount. Reporting/debugging these cases can be fruitful.
OTOH, PXE/HTTPS+NFS root is a valid config, and there isn't really any way to avoid machine/client death when the NFS server goes offline for an extended period. Even without NFS, Linux has gotten better at dealing with full filesystems, but even that is still hit or miss.