With the addition of automated self-heal in GlusterFS 3.3, a new hidden directory structure, “.glusterfs”, was added to each brick. This complicates split-brain resolution: you now have to remove not only the “bad” file from the brick, but also its counterpart under .glusterfs.

Identify that you have a split-brain file:

VOLUME=testvol
gluster volume heal $VOLUME info split-brain


Heal operation on volume testvol has been successful

Brick server1:/data/testvol/brick1
Number of entries: 1
at                   path on brick
----------------------------------
2012-06-13 04:02:05  /foo/bar

Brick server2:/data/testvol/brick1
Number of entries: 1
at                   path on brick
----------------------------------
2012-06-13 04:02:05  /foo/bar

Ok, this says that I have one file that’s marked split-brain, “bar” in the “/foo” directory in volume “testvol”. After looking at that file in /data/testvol/brick1/foo/bar on both servers, I decided that the one on server1 is the good one, so I log into server2.
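If eyeballing the file isn’t conclusive, the AFR changelog xattrs can help you pick the winner: each copy carries trusted.afr.&lt;volume&gt;-client-&lt;N&gt; attributes whose hex value is three 32-bit counters (data, metadata, entry), and a non-zero counter means that brick is accusing the other of missing operations. You can dump them on each server with `getfattr -d -m trusted.afr -e hex`. Decoding a made-up value (the VAL below is hypothetical, not from a real brick) looks like this:

```shell
# Hypothetical changelog value, as printed by something like
#   getfattr -d -m trusted.afr -e hex /data/testvol/brick1/foo/bar
VAL=0x000000020000000000000000

# Strip the leading "0x"; the 24 hex digits are three 32-bit counters
HEX=${VAL#0x}
echo "data=$((16#${HEX:0:8})) metadata=$((16#${HEX:8:8})) entry=$((16#${HEX:16:8}))"
# prints: data=2 metadata=0 entry=0
```

Here the copy claims two pending data operations against its peer, which is the kind of asymmetry that tells you which side thinks it has the newer data.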

For my examples I like to set shell variables to represent things that are unique to my example. If you’re lazy like me, simply set your own shell variables and you should be able to just copy/paste the rest.

# Brick root and the split-brain file, relative to that root
BRICK=/data/testvol/brick1
SBFILE=/foo/bar

# Pull the file's GFID from its trusted.gfid xattr and strip the leading "0x"
GFID=$(getfattr -n trusted.gfid --absolute-names -e hex ${BRICK}${SBFILE} | grep 0x | cut -d'x' -f2)

# Remove the bad copy, then its hardlink under .glusterfs
rm -f ${BRICK}${SBFILE}
rm -f ${BRICK}/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID:0:8}-${GFID:8:4}-${GFID:12:4}-${GFID:16:4}-${GFID:20:12}
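For the curious, those substring expansions just rebuild the path GlusterFS uses under .glusterfs: a directory from the first two hex digits of the GFID, a subdirectory from the next two, then the GFID in its dashed UUID form. With a made-up GFID (this value is for illustration only):

```shell
# Hypothetical GFID; yours comes from the trusted.gfid xattr
GFID=9bdf29a4f52b4a1db8534fb0c35084f5

echo ".glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID:0:8}-${GFID:8:4}-${GFID:12:4}-${GFID:16:4}-${GFID:20:12}"
# prints: .glusterfs/9b/df/9bdf29a4-f52b-4a1d-b853-4fb0c35084f5
```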

At this point, I’ve always gone back to the old method of triggering the heal myself by calling stat on the file through the client mount. I don’t know whether the self-heal daemon would repair it automatically without that nudge.
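That step is just a lookup through the native client; a minimal sketch (the /mnt/testvol mount point is my assumption, adjust to wherever your client mounts the volume):

```shell
# Mount the volume with the native client, if it isn't already mounted
mount -t glusterfs server1:/testvol /mnt/testvol

# A plain stat (lookup) on the file prompts the client to notice the
# missing copy and re-replicate it from the remaining good brick
stat /mnt/testvol/foo/bar
```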

If you have any questions, come see us in #gluster.