GlusterFS communicates over TCP, which allows for stateful handling of file descriptors and locks. If, however, a server fails completely (kernel panic, power loss, some idiot with a reset button…), the client will wait for ping-timeout seconds (42 by default) before abandoning that TCP connection. This is important because re-establishing FDs and locks can be a very expensive operation. As glusterbot says in #gluster:

Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.

When you’re hosting VM images on GlusterFS, that 42-second wait is long enough to make the ext4 filesystems inside your VMs error out and become read-only. You have two options:

  • Shorten the ping-timeout
    You can shorten the ping-timeout by setting the volume option network.ping-timeout.

  • Change ext4’s error behavior
    You can change ext4’s error behavior with the mount option “errors=continue”, or by changing the default in the superblock using tune2fs.
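
As a rough sketch, the two options above might look like this on the command line. The volume name (myvol), the guest device (/dev/vda1), and the 10-second timeout are made-up placeholders, not values from any real setup, and the small run helper is only there so the script prints the commands instead of executing them until you set DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own volume name and guest device.
VOLNAME="myvol"
DEVICE="/dev/vda1"
TIMEOUT=10   # seconds; an arbitrary example value, not a recommendation

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1 (on a Gluster server): shorten the ping-timeout for the volume.
run gluster volume set "$VOLNAME" network.ping-timeout "$TIMEOUT"

# Option 2 (inside the VM): make "continue" the default error behavior
# stored in the ext4 superblock.
run tune2fs -e continue "$DEVICE"

# Alternatively, pass the mount option in the guest's /etc/fstab, e.g.:
#   /dev/vda1  /  ext4  defaults,errors=continue  0  1
```

Afterwards, gluster volume info myvol should list the changed option, and tune2fs -l /dev/vda1 should report the new error behavior. You would normally pick one option or the other, not both.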