GlusterFS Split-Brain Recovery Made Easy

Posted by Joe Julian 3 years ago

Split brain. It sounds like something from a B movie about zombies, but it's probably more terrifying to data storage people than the flesh-eating undead would be.

Split-brain occurs when two or more replicated copies of a file diverge independently of each other. This can happen during a network partition, where some clients write to one server while other clients write to another, or through partitions over time: server1 is taken out of service and writes happen on server2; server1 is then returned to service and server2 is removed before the files have been healed. Writes occur on server1, and when server2 is returned to service, each server holds writes the other has never seen.

Prior to this post, fixing split-brain files in clustered systems required finding the file that needed healing on whichever brick it happened to be on, reading its extended attributes, extrapolating path and file locations, and removing them on one (or more, depending on the replica count) of the bricks.
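To give a sense of the path arithmetic involved, here is a minimal sketch of one step of that manual procedure. It assumes the usual brick layout, in which every file has a hard link under the brick's .glusterfs directory named after its trusted.gfid extended attribute (readable on the brick with getfattr -n trusted.gfid -e hex). The function name gfid_to_path is made up for this illustration.

```shell
# Hypothetical helper: turn a hex trusted.gfid value, as printed by
#   getfattr -n trusted.gfid -e hex <file-on-brick>
# into the brick-relative path of its hard link under .glusterfs.
# Assumes the layout .glusterfs/<hex 1-2>/<hex 3-4>/<gfid as a UUID>.
gfid_to_path() {
    hex=${1#0x}                                # drop the leading 0x, if any
    case "$hex" in
        *[!0-9a-fA-F]*) return 1 ;;            # reject non-hex input
    esac
    [ "${#hex}" -eq 32 ] || return 1           # a gfid is 16 bytes
    hex=$(printf '%s' "$hex" | tr 'A-F' 'a-f') # normalise to lower case
    # Insert dashes in the 8-4-4-4-12 UUID pattern.
    uuid=$(printf '%s' "$hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    printf '.glusterfs/%s/%s/%s\n' \
        "$(printf '%s' "$hex" | cut -c1-2)" \
        "$(printf '%s' "$hex" | cut -c3-4)" \
        "$uuid"
}
```

Both the file itself and this hard link had to be removed from the brick holding the bad copy, which is why the manual route was so easy to get wrong.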

Recently, however, I tried splitting the volume definition so that the translator graph produces a separate mount for each replica. This maintains the distribute properties and gives you the same single namespace you would have with a normally mounted volume. Thus was born splitmount.

[Figure: Split Brain Graph Remap]

What can I do with it?

Take a file, /life/lessons/chocolate/gump.txt on volume myvol1, that is listed as split-brain in the output of "gluster volume heal myvol1 info split-brain". We simply mount the volume with splitmount, check both versions of the file, pick the good one and delete the other.

# splitmount server1 myvol1 /tmp/sbfix
Your split replicas are mounted under /tmp/sbfix in directories r1 through r2

If you have more than two replicas, the directories will run from r1 through however many replicas you have.

Compare your files using stat, diff, or whatever tool works for the file you're checking. In this demonstration case, it turns out the two copies differ only in their permissions. We'll keep the one on the second replica.
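A comparison along those lines could be sketched as follows. This is only an illustration: compare_replicas is a made-up helper name, and it assumes GNU coreutils (stat -c and md5sum) and the r1, r2, ... directory layout that splitmount reports.

```shell
# Hypothetical helper: compare_replicas ROOT RELPATH
# For every replica directory under ROOT, print the mode, owner, size and
# mtime of RELPATH, followed by its checksum, so differences stand out.
compare_replicas() {
    root=$1
    rel=${2#/}                          # accept paths with a leading slash
    for copy in "$root"/r*/"$rel"; do
        [ -e "$copy" ] || continue      # skip replicas that lack the file
        stat -c '%A %U:%G %s %y %n' "$copy"
        md5sum "$copy"
    done
}

# Example (paths from the walkthrough):
#   compare_replicas /tmp/sbfix /life/lessons/chocolate/gump.txt
```

Identical checksums with different mode strings would point to a metadata-only difference like the one in this demonstration. With the copy to discard identified, it can be removed from its replica directory.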

# rm /tmp/sbfix/r1/life/lessons/chocolate/gump.txt

Then just heal the file again:

# gluster volume heal myvol1

If that's all you have to heal, just unmount and clean up:

# umount /tmp/sbfix/r*
# rm -rf /tmp/sbfix

That's all there is to it.
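One caution about the cleanup step: if an unmount fails (a shell still sitting inside r1, say), rm -rf would descend into the live mount and delete files from the volume. A more defensive variant, sketched here under the assumption that the mount root contains nothing but the r* mount points, uses rmdir, which refuses to remove a non-empty directory. The function name cleanup_splitmount and the UMOUNT override variable are made up for this example.

```shell
# Hypothetical helper: cleanup_splitmount ROOT
# Unmount every replica directory under ROOT, then remove the (now empty)
# directories with rmdir so that nothing on a still-mounted volume is touched.
cleanup_splitmount() {
    root=$1
    umount_cmd=${UMOUNT:-umount}        # overridable, e.g. for a dry run
    for mnt in "$root"/r*; do
        [ -d "$mnt" ] || continue
        "$umount_cmd" "$mnt" || return 1
        rmdir "$mnt" || return 1        # fails if anything is still there
    done
    rmdir "$root"
}

# Example:
#   cleanup_splitmount /tmp/sbfix
```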

Where is it?

You can grab this from https://github.com/joejulian/glusterfs-splitbrain

Building and Installing splitmount

Download the source:

git clone https://github.com/joejulian/glusterfs-splitbrain.git splitmount
cd splitmount

To install splitmount in your home directory:

python setup.py install --user

To install splitmount system wide:

python setup.py install