Results have been mixed: some users have been reporting problems mounting GlusterFS volumes at boot time. I spun up a VM at Rackspace to see what I could see.
For my volume I used the following fstab entry. The host is defined in /etc/hosts:
server1:testvol /mnt/testvol glusterfs _netdev 0 0
The error in the client log tells me that the fuse module isn't loaded yet when the mount is attempted:
[2013-01-30 17:14:05.307253] E [mount.c:598:gf_fuse_mount] 0-glusterfs-fuse: cannot open /dev/fuse (No such file or directory)
[2013-01-30 17:14:05.307348] E [xlator.c:385:xlator_init] 0-fuse: Initialization of volume 'fuse' failed, review your volfile again
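The same condition can be confirmed from a shell on the client. This is a minimal sketch assuming a Linux client; fuse_ready is a hypothetical helper, not part of GlusterFS. It checks that the device node exists as a character device and that the kernel lists fuse in /proc/filesystems, which happens once the module is loaded (or built in). Both paths are parameters only so the check can be exercised against test files.

```sh
#!/bin/sh
# fuse_ready [DEVICE] [FILESYSTEMS_LIST]
# Hypothetical helper: succeeds only if DEVICE (default /dev/fuse) is a
# character device and FILESYSTEMS_LIST (default /proc/filesystems) has a
# "fuse" entry, i.e. the kernel side of FUSE is available.
fuse_ready() {
    dev=${1:-/dev/fuse}
    fslist=${2:-/proc/filesystems}
    if [ ! -c "$dev" ]; then
        echo "missing character device: $dev" >&2
        return 1
    fi
    # Entries look like "nodev<TAB>fuse"; compare the last field exactly so
    # that "fuseblk" or "fusectl" on their own do not count.
    if ! awk '$NF == "fuse" { found = 1 } END { exit !found }' "$fslist"; then
        echo "fuse not listed in $fslist" >&2
        return 1
    fi
    return 0
}

# Example: load the module by hand if the check fails (run as root).
#   fuse_ready || modprobe fuse
```

Run at the moment of a failed boot-time mount, a check like this would fail in the same way the "cannot open /dev/fuse" line reports.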
There are no logs with usable timestamps, but the init scripts in /etc/rcS.d show that networking is started before fuse. When the network comes up, networking runs every script in /etc/network/if-up.d. Of these, the inaptly named mountnfs mounts all the fstab entries that have _netdev set, using the command
mount -a -O_netdev
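To see which fstab entries that command will pick up on a given machine, the filter can be approximated in a few lines. This is a rough sketch, not mount's actual implementation: the hypothetical netdev_entries prints the device and mount point of every non-comment fstab line whose option list (the fourth field) contains _netdev. It ignores the "no"-prefixed negation that mount's -O option also accepts.

```sh
#!/bin/sh
# netdev_entries [FSTAB]
# Hypothetical helper: approximates the selection made by "mount -a -O _netdev"
# by printing "DEVICE MOUNTPOINT" for each fstab entry whose options include _netdev.
netdev_entries() {
    awk '
        NF < 4 || substr($1, 1, 1) == "#" { next }   # skip comments and short lines
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "_netdev") { print $1, $2; break }
        }
    ' "${1:-/etc/fstab}"
}

# Example: netdev_entries /etc/fstab
```

With the fstab entry used here, the gluster volume is the entry that gets selected, which is why it is mounted from the network hook rather than with the local filesystems.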
The fuse init script was written with the expectation that all remote filesystems are already mounted when it runs (to cover the case of an NFS-mounted /usr), so it requires $remote_fs. This means it is scheduled after networking, to allow those remote mounts to happen first.
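That requirement lives in the LSB comment block at the top of the init script; on Debian systems using dependency-based boot ordering, insserv computes the boot order from these headers. A small sketch for reading a header field, assuming the standard "### BEGIN INIT INFO" / "### END INIT INFO" delimiters; lsb_field is a hypothetical name.

```sh
#!/bin/sh
# lsb_field SCRIPT FIELD
# Hypothetical helper: print the value of an LSB header field (for example
# Required-Start or Default-Start) from an init script's INIT INFO block.
lsb_field() {
    awk -v field="$2" '
        index($0, "### BEGIN INIT INFO") == 1 { inblock = 1; next }
        index($0, "### END INIT INFO") == 1   { inblock = 0 }
        inblock && index($0, "# " field ":") == 1 {
            value = substr($0, length("# " field ":") + 1)
            sub(/^[ \t]+/, "", value)          # trim the alignment padding
            print value
        }
    ' "$1"
}

# Example: lsb_field /etc/init.d/fuse Required-Start
```

Before the change described next, the Required-Start value for fuse should include $remote_fs.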
Since I don’t really care whether remote filesystems are mounted before the fuse module is loaded, I worked around this by editing /etc/init.d/fuse and replacing $remote_fs with $local_fs in the Required-Start line:
# Required-Start: $local_fs
Then reorder the init scripts:
update-rc.d fuse start 34 S . stop 41 0 6 .
People often ask us to document troubleshooting steps. Because things aren’t supposed to fail, there are seldom fixed troubleshooting steps; if there were, we’d file bug reports and get the underlying problems fixed.
Here’s the process I used:
Check the client log. That step, at least, is documented everywhere: if something goes wrong, check the log.
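For a native (FUSE) mount, the client log is normally named after the mount point. A sketch assuming the default log directory /var/log/glusterfs and the usual naming scheme, in which the leading slash is dropped and the remaining slashes become dashes (/mnt/testvol becomes mnt-testvol.log); client_log_path is a hypothetical helper. Error lines carry an "E" severity marker, as in the log excerpt earlier.

```sh
#!/bin/sh
# client_log_path MOUNTPOINT [LOGDIR]
# Hypothetical helper: derive the GlusterFS client log file name from a mount
# point, assuming the default naming scheme (/mnt/testvol -> mnt-testvol.log).
client_log_path() {
    p=${1#/}        # drop the leading slash
    p=${p%/}        # and any trailing slash
    printf '%s/%s.log\n' "${2:-/var/log/glusterfs}" "$(printf '%s' "$p" | tr '/' '-')"
}

# Example: show only error-level lines for the test volume.
#   grep ' E \[' "$(client_log_path /mnt/testvol)"
```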
Fuse isn’t loaded. Where is it supposed to get loaded from? I’m out of my depth with Debian, so I ran grep fuse /etc/init.d/* to see what might have an effect. It looked like /etc/init.d/fuse was the one.
fuse’s Default-Start is “S”, so I looked in /etc/rcS.d to see the boot order. Thinking that mountnfs.sh (S17mountnfs.sh) was the script most likely to be mounting the gluster volume, I manually moved fuse earlier in the start order (mv S19fuse S16fuse). After a reboot, the volume still didn’t mount.
To find out for sure where the mount was being triggered, I added “ps axf >>/tmp/mounttimeps” to /sbin/mount.glusterfs and rebooted.
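Where ps isn’t available that early in boot, or to keep the output limited to the relevant chain, the same “who started me?” question can be answered from /proc. This is a Linux-only sketch offered as an alternative to the ps axf line, not the method used here: print_ancestry (a hypothetical name) follows the PPid: field of /proc/PID/status from the given process up to PID 1.

```sh
#!/bin/sh
# print_ancestry [PID]
# Hypothetical helper: print "PID COMMAND LINE" for PID (default: this shell)
# and each of its ancestors, stopping at PID 1. Assumes a Linux /proc.
print_ancestry() {
    pid=${1:-$$}
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # /proc/PID/cmdline is NUL-separated; turn the NULs into spaces.
        cmd=$(tr '\000' ' ' < "/proc/$pid/cmdline" 2>/dev/null)
        printf '%s %s\n' "$pid" "$cmd"
        [ "$pid" -eq 1 ] && break
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
    done
}

# Example, near the top of a script under investigation:
#   print_ancestry >> /tmp/mounttimeps
```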
Looking in my new file I saw:
  103 hvc0     Ss+    0:00 init boot
  104 hvc0     S+     0:00  \_ /bin/sh /etc/init.d/rc S
  107 hvc0     S+     0:00      \_ startpar -p 4 -t 20 -T 3 -M boot -P N -R S
  399 hvc0     S      0:00          \_ startpar -p 4 -t 20 -T 3 -M boot -P N -R S
  400 hvc0     S      0:00              \_ /bin/sh -e /etc/init.d/networking start
  402 hvc0     S      0:00                  \_ ifup -a
  490 hvc0     S      0:00                      \_ /bin/sh -c run-parts /etc/network/if-up.d
  491 hvc0     S      0:00                          \_ run-parts /etc/network/if-up.d
  492 hvc0     S      0:00                              \_ /bin/sh /etc/network/if-up.d/mountnfs
  502 hvc0     S      0:00                                  \_ mount -a -O _netdev
  503 hvc0     S      0:00                                      \_ /bin/sh /sbin/mount.glusterfs server1:testvol /mnt/testvol -o rw,_netdev
This clearly showed that “networking” was responsible for the mount attempt: mount.glusterfs was called by mount -a -O _netdev, which was run from /etc/network/if-up.d/mountnfs. Since networking necessarily happens before $remote_fs is satisfied, I changed fuse’s requirements and reordered the scripts. The new order in /etc/rcS.d showed fuse starting before networking, and subsequent reboots confirmed that the volume mounted correctly.
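Checking the resulting order can be scripted as a quick sanity check. A sketch assuming the usual S&lt;NN&gt;&lt;name&gt; link names in /etc/rcS.d (for example S16fuse); starts_before is a hypothetical helper that succeeds only when the first script’s sequence number is strictly lower than the second’s.

```sh
#!/bin/sh
# link_seq DIR NAME: print the two-digit sequence number of DIR/S??NAME.
link_seq() {
    for link in "$1"/S[0-9][0-9]"$2"; do
        # An unmatched glob stays literal, so check that the link really exists.
        { [ -e "$link" ] || [ -L "$link" ]; } || return 1
        base=${link##*/}
        printf '%s\n' "$base" | cut -c2-3
        return 0
    done
    return 1
}

# starts_before DIR FIRST SECOND: succeed if FIRST has a lower S number than
# SECOND. Returns 2 if either link is missing.
starts_before() {
    a=$(link_seq "$1" "$2") || return 2
    b=$(link_seq "$1" "$3") || return 2
    [ "$a" -lt "$b" ]
}

# Example:
#   starts_before /etc/rcS.d fuse networking && echo "fuse starts first"
```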
I’ll be working with the package maintainer for gluster-client to see if a proper solution can be implemented.