I was curious how much of an effect GlusterFS' self-heal check has on lstat, so, in a very unscientific test, I wrote probably the first C program I've written in 20 years to find out.

I looped lstat calls for 60 seconds against three targets: my local disk (which is not the same type or speed as my bricks, though that shouldn't matter since this should all be served from cache anyway), a raw image from within a KVM instance, and a file on a FUSE-mounted Gluster volume. This was the result:

| Iterations | Calculated Latency | Store        |
|------------|--------------------|--------------|
| 90330916   | 0.66 microseconds  | Local        |
| 56497255   | 1.06 microseconds  | Raw VM Image |
| 32860989   | 1.83 microseconds  | GlusterFS    |
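
The "Calculated Latency" column is presumably just the 60-second window divided by the iteration count, e.g. 60 s / 90330916 ≈ 0.66 microseconds per lstat call.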

Again, this is probably the worst test I could do: it's not at all scientific, has way too many differences between the tests, was performed on a replica 3 volume with one replica down, was run on 3.1.7 (for which afr should perform the same as in 3.2.6), and is overall just a waste of blog space, imho. But who knows, someone else might at least get inspired to do a real test.

Result

As you can see, it's pretty significant: an almost 64% drop in iterations for this dumb test compared to local. That much should really be expected, considering we're adding network latency on top of everything, but the 41% drop from the raw VM image to the GlusterFS mount probably represents the latency hit of the self-heal checks a smidgeon more accurately.
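
For the curious, those percentages come straight from the iteration counts in the table: (90330916 − 32860989) / 90330916 ≈ 63.6%, and (56497255 − 32860989) / 56497255 ≈ 41.8%. By the same logic, the per-call difference between the raw VM image and the GlusterFS mount, 1.83 − 1.06 ≈ 0.77 microseconds, is a rough figure for what the self-heal check adds to each lstat in this setup.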

Here’s the C source:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <time.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[]) {
    struct stat sb;
    time_t seconds;
    uint64_t count;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <pathname>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Sanity check: make sure the path can be lstat'ed at all. */
    if (lstat(argv[1], &sb) == -1) {
        perror("lstat");
        exit(EXIT_FAILURE);
    }

    seconds = time(NULL);
    count = 0;

    /* Hammer lstat() on the same path for roughly 60 seconds. */
    while ( seconds + 60 > time(NULL) ) {
        lstat(argv[1], &sb);
        count++;
    }

    fprintf(stdout, "Performed %" PRIu64 " lstat() calls in 60 seconds.\n", count);

    return EXIT_SUCCESS;
}
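
If you want to try it yourself, something like the following should build and run it (the file and binary names here are just my placeholders):

cc -o lstat-loop lstat-loop.c
./lstat-loop /path/to/some/file

Run it once against each storage target you care about and compare the iteration counts it prints.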