Fixing split-brain with GlusterFS 3.3
Published in: GlusterFS, Howtos
With the addition of automated self-heal in GlusterFS 3.3, a new hidden directory structure, “.glusterfs”, was added to each brick. This complicates split-brain resolution: you now have to remove not only the “bad” file from the brick but also its counterpart in .glusterfs.
Identify that you have a split-brain file:
VOLUME=testvol
gluster volume heal $VOLUME info split-brain
Heal operation on volume testvol has been successful
Brick server1:/data/testvol/brick1
Number of entries: 1
at path on brick
----------------------------------
2012-06-13 04:02:05 /foo/bar
Brick server2:/data/testvol/brick1
Number of entries: 1
at path on brick
----------------------------------
2012-06-13 04:02:05 /foo/bar
This says that one file is marked split-brain: “bar” in the “/foo” directory of volume “testvol”. After looking at /data/testvol/brick1/foo/bar on both servers, I decided that the copy on server1 is the good one, so I log into server2 to remove the bad copy.
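If several files are listed, the paths can be pulled out of the heal output with a small filter. This is a minimal sketch, assuming each entry line has the form shown above (a date, a time, then the path). The name `list_split_brain_paths` is a hypothetical helper, not part of the gluster CLI.

```shell
# Hypothetical helper: reads "gluster volume heal VOLUME info split-brain"
# output on stdin and prints each distinct path once.
# Assumes entry lines look like: "2012-06-13 04:02:05 /foo/bar".
list_split_brain_paths() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ' \
        | cut -d' ' -f3- \
        | sort -u
}

# Usage (on a server in the trusted pool):
#   gluster volume heal "$VOLUME" info split-brain | list_split_brain_paths
```

Because both bricks report the same path, `sort -u` collapses the duplicates so each file appears once.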
In my examples I like to set shell variables for anything specific to my setup. If you’re lazy like me, simply set your own values for these variables and you should be able to copy/paste the rest.
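# Run on the server holding the bad copy (server2 in this example)
BRICK=/data/testvol/brick1
SBFILE=/foo/bar
# Read the file's GFID as 32 hex digits, without the leading 0x
GFID=$(getfattr -n trusted.gfid --absolute-names -e hex "${BRICK}${SBFILE}" | grep 0x | cut -d'x' -f2)
# Remove the bad copy from the brick
rm -f "${BRICK}${SBFILE}"
# Remove its GFID-named counterpart under .glusterfs
rm -f "${BRICK}/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID:0:8}-${GFID:8:4}-${GFID:12:4}-${GFID:16:4}-${GFID:20:12}"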
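The second `rm` relies on how the `.glusterfs` path is derived from the GFID: the 32 hex digits are written in UUID form (8-4-4-4-12) and filed under two directory levels named after the first two and the next two digits. The sketch below wraps that mapping in a hypothetical helper, `gfid_to_path`, so the path can be printed and checked before anything is removed. It assumes the GFID is passed as 32 hex digits without the `0x` prefix.

```shell
# Hypothetical helper: print the .glusterfs path for a GFID on a brick.
# $1 = brick root, $2 = GFID as 32 hex digits (no "0x" prefix).
gfid_to_path() {
    case "$2" in
        *[!0-9a-fA-F]*) echo "gfid_to_path: not a hex GFID: $2" >&2; return 1 ;;
    esac
    if [ "${#2}" -ne 32 ]; then
        echo "gfid_to_path: expected 32 hex digits, got ${#2}" >&2
        return 1
    fi
    # cut -c uses 1-based, inclusive character ranges.
    printf '%s/.glusterfs/%s/%s/%s-%s-%s-%s-%s\n' "$1" \
        "$(printf '%s' "$2" | cut -c1-2)" \
        "$(printf '%s' "$2" | cut -c3-4)" \
        "$(printf '%s' "$2" | cut -c1-8)" \
        "$(printf '%s' "$2" | cut -c9-12)" \
        "$(printf '%s' "$2" | cut -c13-16)" \
        "$(printf '%s' "$2" | cut -c17-20)" \
        "$(printf '%s' "$2" | cut -c21-32)"
}

# Usage: inspect the path first, then remove it.
#   ls -l "$(gfid_to_path "$BRICK" "$GFID")"
```

Validating the length and character set up front means a failed `getfattr` (an empty `GFID`) produces an error instead of a malformed path.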
At this point I have always gone back to the old method of calling stat on the file through the client mount to trigger the heal. I don’t know whether it would heal automatically without that.
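For reference, the stat call looks roughly like this. The mount point `/mnt/testvol` is an assumption (it does not appear anywhere above), so substitute wherever the volume is mounted on a client. `stat_through_mount` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: look the file up through the client mount,
# which is the older way of prompting a self-heal.
# $1 = client mount point (assumed, e.g. /mnt/testvol)
# $2 = path within the volume (e.g. /foo/bar)
stat_through_mount() {
    stat -- "${1}${2}" > /dev/null
}

# Usage, run on a client (not against the brick directory):
#   stat_through_mount /mnt/testvol "$SBFILE"
```

The important detail is that the path goes through the mounted volume rather than the brick, since only access through the client involves GlusterFS itself.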
If you have any questions, come see us in #gluster.