On Sunday, March 18th, Fan Yong committed a patch against ext4 to “return 32/64-bit dir name hash according to usage type”. Prior to that, ext2/3/4 would return a 32-bit hash value from telldir()/seekdir(), as NFSv2 wasn’t designed to accommodate anything larger. This broke the distribute translator, as the dirent structure was suddenly returning 64-bit d_off values. When DHT (Distributed Hash Translator) applied dht_itransform() to those values, it would overflow. Since the dictionary entry did not have a cached offset, it would try to create one again and would end up in an endless loop.
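To make the overflow concrete, here is a minimal sketch of the arithmetic. It assumes the transform packs the brick index into the offset by multiplying by the brick count and adding the index; the function names, the 4-brick count, and the sample offsets are illustrative, and the real dht_itransform() may differ in detail. Python integers don't overflow, so a mask emulates an unsigned 64-bit field.

```python
MASK64 = (1 << 64) - 1  # emulate wraparound of an unsigned 64-bit integer


def itransform(d_off, brick_index, brick_count):
    """Hypothetical encoding: fold the brick index into the directory offset."""
    return (d_off * brick_count + brick_index) & MASK64


def deitransform(encoded, brick_count):
    """Reverse of the hypothetical encoding: recover (d_off, brick_index)."""
    return encoded // brick_count, encoded % brick_count


def round_trips(d_off, brick_index, brick_count):
    """True if the offset survives an encode/decode cycle unchanged."""
    encoded = itransform(d_off, brick_index, brick_count)
    return deitransform(encoded, brick_count) == (d_off, brick_index)


BRICKS = 4  # assumed brick count, for illustration only

old_style_offset = 0x7FFFFFFF          # fits in 32 bits, leaves headroom
new_style_offset = 0x7FFFFFFFFFFFFFFF  # uses nearly all 64 bits

print(round_trips(old_style_offset, 1, BRICKS))  # True
print(round_trips(new_style_offset, 1, BRICKS))  # False
```

With a 32-bit hash there are plenty of spare high bits for the multiplication. Once the filesystem hands back an offset that already fills the field, the product wraps and the decoded offset no longer matches the one that went in.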

That patch was for kernel v3.3-rc2. To make things more fun, Jarod Wilson merged that patch into 2.6.32-268.el6 (from “rpm -q --changelog kernel | less”). My personal feeling on this is that structure changes shouldn’t have been backported into Enterprise kernels. This has caused a lot of frustrated users on the IRC channel. Most have just reformatted with xfs, which is a valid solution and falls in line with the officially recommended configuration. For some, however, that’s just not possible.

Distributions known to be affected by this change are:

  • Fedora >= 17
  • Red Hat Enterprise Linux (RHEL) 6.3
  • CentOS 6.3
  • Debian Sid
  • Debian Wheezy

The workaround is to either reformat your bricks with xfs or downgrade your kernel: to 2.6.32-267 on RHEL/CentOS, or to 3.2.9 for everybody else.
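For RHEL/CentOS 6 users who want a quick check before deciding, the following is a rough sketch that compares a kernel release string against the 2.6.32-268 build mentioned above. The helper names and the regular expression are my own, and it only covers el6 kernels. It deliberately says nothing about other distributions, since a version number alone doesn't tell you whether the patch was backported.

```python
import platform
import re

FIRST_EL6_BUILD_WITH_PATCH = 268  # from the el6 kernel changelog cited above


def el6_build_number(release):
    """Return the build number of an el6 kernel release string such as
    '2.6.32-279.el6.x86_64', or None if it is not a 2.6.32 el6 kernel."""
    match = re.match(r"^2\.6\.32-(\d+)(?:\.\d+)*\.el6", release)
    return int(match.group(1)) if match else None


def el6_kernel_has_patch(release):
    """True/False for el6 kernels, None when the release is not el6."""
    build = el6_build_number(release)
    if build is None:
        return None
    return build >= FIRST_EL6_BUILD_WITH_PATCH


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, el6_kernel_has_patch(running))
```

A result of None just means the check doesn't apply to that kernel, not that the kernel is safe.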

The patches that are related to this issue can be tracked at http://review.gluster.com/

UPDATE 2012-08-17 04:02 GMT

Spoke briefly with Vijay ‘hagarth’ Bellur, one of the lead developers, who said, “there are some problems getting NFS and ext3/4 to work with this patch .. hence it is sitting in the queue.”

It is still being actively worked on, though, and is a high priority.