The NFS 16 groups limit issue

Last Friday I ran into a curious situation while trying to set up an NFS server. The NFS export was mounted on a UNIX server whose user accounts were assigned to many groups, and those users worked with files and directories stored on the NFS server.

As a brief description of the situation that prompted this post: the problem occurs when UNIX users are members of more than 16 UNIX groups. In this scenario, if you are using NFS (whatever version) with UNIX system authentication (AUTH_SYS), which is still quite common nowadays in spite of the security recommendations, you will get permission denied errors when accessing certain, seemingly arbitrary, files and directories. The reason is that the list of secondary groups assigned to the user is truncated by the AUTH_SYS implementation. That is simply amazing!

Well, to be honest, this is not an unknown NFS problem. The limitation has been around since the early days of modern computing. After a quick search on the Internet, I found the reason why this happens: it is not an NFS limitation but a limit specified in AUTH_SYS:

   The client may wish to identify itself, for example, as it is
   identified on a UNIX(tm) system.  The flavor of the client credential
   is "AUTH_SYS".  The opaque data constituting the credential encodes
   the following structure:

         struct authsys_parms {
            unsigned int stamp;
            string machinename<255>;
            unsigned int uid;
            unsigned int gid;
            unsigned int gids<16>;
         };
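To make the consequence of gids<16> concrete, here is a minimal shell sketch that splits a group list the way a 16-slot credential would. The function name authsys_split is made up for this illustration, and keeping the first 16 entries is an assumption: which groups actually survive depends on the client implementation.

```shell
#!/bin/sh
# Maximum number of supplementary gids that fit in an AUTH_SYS credential
# (the gids<16> member of struct authsys_parms).
AUTH_SYS_MAX_GIDS=16

# authsys_split GID...
# Hypothetical helper: prints the gids that fit in the credential and the
# ones that do not. Assumes the first 16 entries are the ones kept.
authsys_split() {
    kept=""
    dropped=""
    n=0
    for gid in "$@"; do
        if [ "$n" -lt "$AUTH_SYS_MAX_GIDS" ]; then
            kept="$kept $gid"
        else
            dropped="$dropped $gid"
        fi
        n=$((n + 1))
    done
    echo "kept:$kept"
    echo "dropped:$dropped"
}
```

Calling it as authsys_split $(id -G) gives a rough idea of which groups of the current user would not fit. Any file that is reachable only through one of the dropped groups is a candidate for the permission denied error.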

The root cause

AUTH_SYS is the historical authentication method used by client programs that contact an RPC server. It gives the server the information it needs to decide what the client may access and which functions it may call. Without authentication, any client on the network that can send packets to the RPC server could access any function.

AUTH_SYS has been in use for years on many systems simply because it was the first authentication method available, but it is not a secure authentication method nowadays. With AUTH_SYS, the RPC client sends the UNIX UID and GIDs of the user, and the server implicitly trusts that the user is who they claim to be. All this information travels over the network without any kind of encryption or verification, so it is highly vulnerable.

As a consequence, AUTH_SYS is an insecure security mode: the proverbial open lock on a door. Most technical articles on the subject strongly recommend alternatives such as NFSv4 (or even NFSv3) combined with Kerberos, yet AUTH_SYS is still commonly used within companies, so we must still deal with it.

Note: This article does not focus on security issues. Its main purpose is to describe a specific situation and to show the possible alternatives identified while troubleshooting the issue.

Taking up the thread …

I was analysing a situation whose root cause was the truncation of a UNIX user's secondary group list. Before continuing, a short summary of the context: a UNIX user has a primary group, defined in the passwd database, but can also be a member of many other groups, defined in the group database. Traditional UNIX systems hardcoded a limit of 16 groups that a user can be a member of (source). This means that a user can effectively exercise the rights of only 16 groups. Quite poor when you deal with dozens and dozens of groups.

As we already know, the problem lies in NFS complying with the AUTH_SYS specification, and in the in-kernel data structure that implements it, where the groups a user belongs to are hardcoded as an array of 16 identifiers (gids). Even though Linux now supports 65536 groups per user, no more than 16 of them can be sent in an AUTH_SYS credential.
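A quick way to find out whether an account is affected is to count its groups on the NFS client. The sketch below relies only on the standard id command; the helper names (over_limit, group_count, check_user) are made up for this post. Note that id -G includes the primary group in its output, so treat the count as an approximation of what ends up in the credential.

```shell
#!/bin/sh
# over_limit COUNT [LIMIT]
# Succeeds (exit status 0) when COUNT exceeds LIMIT (default: 16).
over_limit() {
    [ "$1" -gt "${2:-16}" ]
}

# group_count [USER]
# Number of groups reported by `id -G` (primary group included) for USER,
# or for the current user when no argument is given.
group_count() {
    id -G "$@" | wc -w | tr -d ' '
}

# check_user [USER]
# Hypothetical report helper built on the two functions above.
check_user() {
    count=$(group_count "$@")
    if over_limit "$count"; then
        echo "${1:-current user}: $count groups, AUTH_SYS will truncate the list"
    else
        echo "${1:-current user}: $count groups, fits in AUTH_SYS"
    fi
}
```

Running check_user with a user name (or without arguments for the current user) prints a one-line verdict. On Linux, getconf NGROUPS_MAX reports the local limit, which is far above 16 on current kernels.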

My scenario …

At this point, I had identified this same situation in my case. I had users assigned to more than 16 secondary groups and a service using NFS for its data storage but, in addition, I had some extra furniture in the room:

  • Users of the service are actual UNIX accounts. Authorization for file access is delegated to the UNIX system itself
  • I did not have a common LDAP server sharing the uids and gids
  • The NFS service wasn’t under my control

This last point made my case a little more miserable, as we will see later.

Getting information from the Internet …

First of all, a brief analysis of the situation is always welcome:

– What is the actual problem? It occurs when a user who is a member of more than 16 groups tries to access a file or directory on an NFS mount, and that access depends on their group rights. Right?
– Yes!
– So, whatever you do, start by asking Google. If the issue has been around for all these years, a solution should be around as well.
– Good idea! – I said, concluding the dialog with myself.

After a couple of minutes I had a complete list of articles, mail archives, forums and blog posts offering all kinds of information about the problem. All of them covered most of the points introduced so far in this article. Each one was more or less interesting, but one stood out from the rest: the solving-the-nfs-16-group-limit-problem article from the xkyle.com blog.

The solving-the-nfs-16-group-limit-problem article describes a similar situation and offers its own conclusions. I must admit that I largely agree with these conclusions, and I recommend reading that post in depth.

The silver bullet

This solution is the best case, provided that you control the NFS server and it runs Linux kernel 2.6.21 or newer. Such kernels support an NFS feature that lets the server ignore the gids sent in the RPC requests and use instead the gids that the server itself has assigned to the uid. It is enabled with an option of rpc.mountd:

-g or --manage-gids
Accept requests from the kernel to map user id numbers into lists of group id numbers for use in access control. An NFS request will normally (except when using Kerberos or other cryptographic authentication) contains a user-id and a list of group-ids. Due to a limitation in the NFS protocol, at most 16 groups ids can be listed. If you use the -g flag, then the list of group ids received from the client will be replaced by a list of group ids determined by an appropriate lookup on the server. Note that the 'primary' group id is not affected so a newgroup command on the client will still be effective. This function requires a Linux Kernel with version at least 2.6.21.
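How the flag reaches rpc.mountd depends on the distribution. The fragment below is only a sketch for a CentOS-style server, assuming the NFS startup scripts read /etc/sysconfig/nfs and honour the RPCMOUNTDOPTS variable; check your distribution's documentation for the equivalent file and variable.

```shell
# /etc/sysconfig/nfs (assumed location on CentOS/RHEL-style systems)
# Extra options handed to rpc.mountd when the NFS server starts.
RPCMOUNTDOPTS="--manage-gids"
```

The NFS server has to be restarted afterwards for the option to take effect.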

The key to this solution is keeping the ids synchronized between the client and the server. A common way to meet this requirement is a shared Name Service Switch (NSS) backend. With it, the --manage-gids option allows the NFS server to ignore the group list sent by the client and look up the groups directly in LDAP, or in whatever backend NSS uses. For this to work, the NFS server and the NFS client must share the same UIDs and GIDs.
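Before relying on --manage-gids it is worth checking that both machines really agree on the group ids. A minimal sketch, assuming you can run getent group > FILE on the client and on the server and copy both dumps to one place; compare_groups is a made-up helper name.

```shell
#!/bin/sh
# compare_groups CLIENT_DUMP SERVER_DUMP
# Hypothetical helper. Both arguments are files created with
# `getent group > FILE`, one on the NFS client and one on the NFS server.
# Prints one line per client group that is missing on the server or that
# has a different gid there; prints nothing when every client group matches.
compare_groups() {
    awk -F: '
        FILENAME == ARGV[1] { server_gid[$1] = $3; next }   # server dump
        !($1 in server_gid) { print $1 ": missing on server"; next }
        server_gid[$1] != $3 {
            print $1 ": client gid " $3 ", server gid " server_gid[$1]
        }
    ' "$2" "$1"
}
```

Any line printed by compare_groups client.txt server.txt points at a group for which the server-side lookup would give a different answer than the client expects. The same comparison can be done for users with getent passwd.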

That is the approach suggested in solving-the-nfs-16-group-limit-problem. Unfortunately, it did not fit my case :-(.

But not in my case

In my case, I had no way to synchronize the ids of the client with the ids of the NFS server. The ids on the client machine were obtained from a Postgres database configured as one of the NSS backends, and there was no chance of using that backend on the NFS server.

The solution

But this was not the end. Fortunately, the nfs-ngroups patches developed by frankvm@frankvm.com work around the AUTH_SYS limit of 16 supplemental group identifiers on the client side. As he says in the README file:

This patch is useful when users are member of more than 16 groups on a Linux NFS client. The patch bypasses this protocol imposed limit in a compatible manner (i.e. no server patching).

That was perfect! It was exactly what I was looking for. So I had to build a custom kernel with the right patch applied on the server under my control (the NFS client) and voilà!:

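# Fetch the kernel sources and the matching nfs-ngroups patch
wget https://cdn.kernel.org/pub/linux/kernel/v3.x/linux-3.10.101.tar.xz
wget http://www.frankvm.com/nfs-ngroups/3.10-nfs-ngroups-4.60.patch
tar -xf linux-3.10.101.tar.xz
cd linux-3.10.101/
# Apply the patch (-p1 is the usual strip level for kernel patches)
patch -p1 < ../3.10-nfs-ngroups-4.60.patch
# Configure, then build and install the kernel as an RPM
make oldconfig
make menuconfig
make rpm
rpm -i /root/rpmbuild/RPMS/x86_64/kernel-3.10.101-4.x86_64.rpm
# Generate the initramfs and refresh the GRUB configuration
dracut "initramfs-3.10.101.img" 3.10.101
grub2-mkconfig > /boot/grub2/grub.cfg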

Steps for CentOS, based on these three documents: [1] [2] [3]
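After rebooting, it is worth confirming that the machine actually booted the freshly built kernel before testing NFS access again. A minimal sketch, assuming the kernel release string starts with the version built above; running_kernel_is is a made-up helper name, and its optional second argument exists only to make it easy to try out.

```shell
#!/bin/sh
# running_kernel_is VERSION [RELEASE]
# Succeeds when RELEASE (default: the output of `uname -r`) is VERSION or
# starts with VERSION followed by a separator such as "-" or ".".
running_kernel_is() {
    release=${2:-$(uname -r)}
    case "$release" in
        "$1"|"$1"[-.+_]*) return 0 ;;
        *) return 1 ;;
    esac
}

if running_kernel_is 3.10.101; then
    echo "patched kernel is running"
else
    echo "still on $(uname -r): check the default GRUB entry and reboot"
fi
```

Once the check passes, repeat the access that used to fail, as a user who belongs to more than 16 groups.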

Conclusions

As I said, this post does not focus on security. AUTH_SYS was designed for the times before the Internet. Nowadays, the total interconnection of computer networks discourages the use of trusting methods like AUTH_SYS. It is far too naive an authentication method for the present.

Anyway, NFS services are still quite common and many of them are still deployed with AUTH_SYS rather than Kerberos or other intermediate solutions. This post is about a specific situation in one of these deployments. Even if these services should be progressively replaced by more secure solutions, a sysadmin still needs practical knowledge about the particularities of these legacy systems.

Knowledge about the NFS limit of 16 secondary groups and the known workarounds is still valuable know-how. This post shows two solutions, or three if you count the Kerberos option, to fix this issue … but only one of them met the requirements of my particular case.
