NFS mount for GlusterFS gives better read performance for small files?
Published: GlusterFS · Estimated reading time: ~3 minutes
This concept is thrown around a lot. People frequently say that “GlusterFS is slow with small files”, or “how can I increase small file performance” without really understanding what they mean by “small files” or even “slow”.
“Small files” is something of a misconception on its own. Every initial file operation carries a small amount of overhead. On a lookup, the filename is hashed, a dht subvolume is selected based on that hash, and the request is sent to that subvolume. If the volume is replicated, the request goes to every replica in that subvolume set (usually 2), and all of the replicas have to respond. If one or more replicas has pending flags, or there’s an attribute mismatch, either some self-heal action has to take place or a split-brain is determined. If the file doesn’t exist on the dht subvolume the hash predicted (usually because the file was renamed), the same lookup has to be sent to every dht subvolume. If the file is found, a link file is created on the predicted dht subvolume pointing to where the file actually lives, which makes the next lookup fast again. Only once the file is found and determined to be clean can the filesystem move on to the next file operation.
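The lookup flow above can be sketched as a toy model. This is a simplification for illustration only: real GlusterFS DHT uses per-directory hash-range layouts (not a simple hash-mod-N), and the subvolume names, `files_on` map, and `lookup` helper here are all hypothetical.

```python
import hashlib

def pick_subvolume(filename, subvolumes):
    # Toy stand-in for DHT's hash: map the filename to one subvolume.
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return subvolumes[h % len(subvolumes)]

def lookup(filename, subvolumes, files_on):
    """Return (subvolume_holding_file_or_None, number_of_queries_sent)."""
    # 1. Query the subvolume the hash predicts.
    predicted = pick_subvolume(filename, subvolumes)
    queries = 1
    if filename in files_on[predicted]:
        return predicted, queries          # clean hit: one query
    # 2. Miss (e.g. the file was renamed): broadcast to every other subvolume.
    for sv in subvolumes:
        if sv == predicted:
            continue
        queries += 1
        if filename in files_on[sv]:
            # Real DHT would now create a link file on `predicted`
            # so the next lookup is fast again.
            return sv, queries
    # 3. Negative lookup: we paid for a query to every subvolume.
    return None, queries
```

Note how a file sitting where the hash predicts costs one query, while a file that doesn’t exist anywhere costs one query per dht subvolume.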
PHP applications in particular open a lot of small files for every page request, so that per-operation overhead adds up quickly. PHP also queries a lot of files that simply don’t exist: a single page might probe for 200 files that aren’t there because they sit in a different portion of the include path, belong to a plugin that isn’t used, and so on. These negative lookups get more expensive the more dht subvolumes you have, since every subvolume has to be queried to confirm the file doesn’t exist anywhere, not just where the hash predicts.
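To put rough numbers on that scaling (the per-brick round-trip time here is an assumed figure for illustration, not a measurement):

```python
# Back-of-envelope cost of negative lookups per page request.
rtt_ms = 0.5          # assumed round trip to one brick (illustrative)
missing_files = 200   # nonexistent files probed per page, as above

for subvols in (2, 4, 8):
    # Each negative lookup must query every dht subvolume.
    cost_ms = missing_files * subvols * rtt_ms
    print(f"{subvols} dht subvolumes -> ~{cost_ms:.0f} ms of lookup overhead")
```

The point isn’t the exact milliseconds; it’s that negative-lookup overhead grows linearly with the number of dht subvolumes.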
NFS mitigates that effect by using FS-Cache in the kernel. It caches directory entries and stat information, avoiding the round trip to the actual filesystem. This also means, of course, that an image just uploaded through a different server won’t be visible on this one until the cache times out. Stale data is simply to be expected when clients cache in a multi-client system.
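As a sketch of the trade-off, the standard NFS client mount options let you tune how long attributes are cached (the server name, volume name, and mount point below are hypothetical; Gluster’s built-in NFS server speaks NFSv3):

```shell
# Mount a Gluster volume over NFS with a 30-second attribute cache.
# Longer actimeo = faster repeated lookups, but staler metadata
# when another client changes files.
mount -t nfs -o vers=3,actimeo=30 server1:/myvolume /mnt/myvolume
```

Setting `actimeo` higher favors lookup speed; setting it lower favors freshness across clients.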
Jeff Darcy created a test translator that caches negative lookups which he said also mitigated the PHP problem pretty nicely.
If you have control over your application, things like using absolute paths for PHP includes, or leaving file descriptors open, can also avoid that overhead. Reducing how many times you open a file, and how many files you open per request, helps as well.
So “small files” really describes the proportion of total file-operation time that is spent on overhead versus actual data retrieval.