A Critique of Kerberos and AFS

James F. Carter

2002-05-06

At UCLA-Mathnet we use Sun's NIS for user authentication, so everyone's encrypted password is exposed to every user on our local systems, no root exploit required. For distributed file service we use Sun's NFS, whose access control is host-based and depends on user authentication on the client; thus it is insecure to export filesystems to machines not under our direct control, and spoofing exploits are not hard. Neither protocol works well through a firewall. We are therefore investigating alternative systems for authentication and distributed files. Our conclusion is that Kerberos and AFS are the leading contenders.

1 User Authentication

What possibilities are there for authentication? I consider only those that have PAM modules, since PAM is essential in Solaris and much preferred in Linux.

NIS: Sun's Network Information Service distributes (among other things) the traditional password file from a central server to all authenticating clients. Any user can retrieve all the encrypted passwords with one simple command. NIS is what we use now, and what we want to replace.
NIS+: It uses Secure RPC (Diffie-Hellman DES credentials) for the authentication component, but there is a lot of baggage that comes with it. We tried to install NIS+; it worked, but we feared that the excess baggage would break and we would not be able to fix it, so we reverted to NIS.
Miscellaneous alternatives: Besides putting a traditional UNIX passwd file on every client, you can store name-password pairs in a SQL database or an LDAP server, or you can use a Windows NT PDC via SMB, or an IMAP server. None of these is useful to us.
Kerberos: It has been around for a long time and is well understood. Beyond just matching up a user ID and password, it can be used for access control to Kerberos-aware services, and there is a way to propagate tickets to remotely spawned sessions. Windows 2000 and later members of the Windows NT family use Kerberos for their authentication component.
Radius (RFC 2865): Widely used by embedded systems like routers, and by ISPs to authenticate dialin clients. A Radius server is available for Linux. On the wire, the password is hidden by XORing it with an MD5 hash of a shared secret and a per-request random value, which is not a particularly strong cipher. The password is typically memorized, but there is some indication of smart card implementations too. Radius would be a contender if it meets our other needs.
TACACS (RFC 1492): Similar to Radius but without the encryption. Cisco offers a product with proprietary extensions. Not considered further.
One-time passwords: S/KEY or OPIE (RFC 1760 or 2289) transform a memorized password by repeatedly hashing it (plus a seed) with MD4 or MD5, yielding a sequence of values, each used only once, that represent the password and can be sent over the wire without any benefit to an eavesdropper. But the sequence has finite length, and once it is used up it can be safely re-initialized only over a network free of hackers. This is not worth the bother.
Physical objects: The SecurID card by RSA shows a number which is a function of time, used as a password; the server can reproduce the number. The iButton has a matching reader and contains a readonly serial number plus user-defined information. Other smart cards exist, but we know of no PAM modules for them. None of these is likely to be used in our environment.
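To make the Radius weakness concrete, here is a minimal Python sketch of the User-Password hiding scheme in RFC 2865, section 5.2: the password is padded to 16-octet blocks, and each block is XORed with MD5(shared secret + previous block), starting from the 16-octet Request Authenticator. The function names, the secret and the password are hypothetical, and this is an illustration, not a Radius implementation. It shows that there is no real cipher, only an MD5-derived keystream whose strength rests entirely on the shared secret.

```python
import hashlib
import os


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Hide a User-Password value following RFC 2865, section 5.2."""
    # Pad with NUL octets to a multiple of 16.
    if len(password) % 16:
        password += b"\x00" * (16 - len(password) % 16)
    hidden = b""
    prev = authenticator  # the first block chains from the Request Authenticator
    for i in range(0, len(password), 16):
        keystream = hashlib.md5(secret + prev).digest()
        block = bytes(p ^ k for p, k in zip(password[i:i + 16], keystream))
        hidden += block
        prev = block  # later blocks chain from the previous hidden block
    return hidden


def reveal_password(hidden: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Invert hide_password; anyone who knows the shared secret can do this."""
    plain = b""
    prev = authenticator
    for i in range(0, len(hidden), 16):
        block = hidden[i:i + 16]
        keystream = hashlib.md5(secret + prev).digest()
        plain += bytes(c ^ k for c, k in zip(block, keystream))
        prev = block
    return plain.rstrip(b"\x00")  # drop the NUL padding


# Usage with a hypothetical router secret and password.
secret = b"router-shared-secret"
authenticator = os.urandom(16)  # random 16-octet Request Authenticator
wire = hide_password(b"my password", secret, authenticator)
assert len(wire) == 16
assert reveal_password(wire, secret, authenticator) == b"my password"
```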
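The hash-chain idea behind S/KEY and OPIE can be sketched as follows. This is a simplified model, not RFC 2289 itself: it keeps the full MD5 digest instead of folding it to 64 bits, and it omits the six-word encoding. The class and function names, the seed and the pass phrase are hypothetical. The server stores only the last value it accepted, and each login must present the value one step earlier in the chain, which an eavesdropper cannot compute from what he has seen. When the count runs out, the chain must be re-initialized, which is the resync problem noted in the one-time password item.

```python
import hashlib


def chain_value(seed: str, passphrase: str, n: int) -> bytes:
    """Hash seed+passphrase n times (simplified: full MD5 digest, no folding)."""
    value = (seed + passphrase).encode()
    for _ in range(n):
        value = hashlib.md5(value).digest()
    return value


class OtpVerifier:
    """Hypothetical server side: stores only the last accepted chain value."""

    def __init__(self, seed: str, last_value: bytes, count: int):
        # At (re)initialization the server receives h^count(seed + passphrase),
        # never the passphrase itself; this step needs a trusted channel.
        self.seed = seed
        self.last = last_value
        self.count = count

    def challenge(self) -> int:
        """Sequence number the client must answer: one step back in the chain."""
        return self.count - 1

    def verify(self, response: bytes) -> bool:
        # Refuse to go down to h^0, which is the seed and passphrase in clear.
        if self.count <= 1:
            return False
        if hashlib.md5(response).digest() != self.last:
            return False
        self.last = response  # a replay of this value will now fail
        self.count -= 1
        return True


# Usage with a hypothetical seed and pass phrase.
seed, phrase = "mn0417", "correct horse battery"
server = OtpVerifier(seed, chain_value(seed, phrase, 100), 100)
reply = chain_value(seed, phrase, server.challenge())  # h^99
assert server.verify(reply)
assert not server.verify(reply)  # the same value cannot be used twice
```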

Of these authentication services, Radius and Kerberos are the major contenders. For Kerberos the tie-in with both Windows NT and Solaris is a big attraction, and also, AFS (if we choose to install it) requires Kerberos for access control. So let's look more closely at the advantages of Kerberos.

Users are authenticated with strong cryptographic keys, so brute-force attacks are unlikely to work. If an intruder gains root on a machine, he finds no usable Kerberos credentials for users who are not logged in, and any tickets he steals from logged-in users are valid only for a limited time. Of course, a Trojan horse can still be installed to capture users' pass phrases.
Authentication information is never passed on the net in clear text, nor is it available to ordinary users in bulk. With NIS, you can just do ``ypcat passwd'' and have all the encrypted passwords, cracking them at your leisure.
Kerberos includes an access control component, for applications programmed to use it.
Users can authenticate manually at a remote site, if needed, while remaining authenticated locally. Blanket cross-realm trust can also be set up.
Kerberos authentication can be honored on remote sites for access control, provided the relevant servers and clients are aware of Kerberos. For telnet and ftp, kerberized versions are part of the standard distribution. rsh, rcp, rlogin are also available but are deprecated because the tickets are transferred in clear text. OpenSSH can be compiled with Kerberos v4 support (I don't know about Ylonen ssh). Besides ssh, daemons are standardly provided that can forward generic tickets and X-windows connections securely. AFS file sharing includes access control through Kerberos as an integral part.
Kerberos has remote administration based on Kerberos tickets and an access control list; you don't have to be root, nor on the server machine, to administer it.
Slave servers are a standard feature. Incremental updating of the database is an ``advanced feature'' which appears to work; bulk updates (as done for NIS) are actually the normal mode of operation.
It is possible to dump the database in ASCII form, and restore it.
Windows NT (and successors) standardly use Kerberos for authentication. The Windows machine can be configured to use a UNIX Kerberos server. I don't know if Windows knows about slave servers; probably it does. If this were done, UNIX and Windows accounts would be controlled as a unit, and password changes on one would be immediately seen on the other. (I haven't seen discussion of UNIX authenticating to a Windows Kerberos server; but there is a PAM module for SMB authentication which ultimately uses Kerberos.)
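The claim that stolen tickets are valid only for a limited time amounts to a time-window check on the server. The sketch below models only that check; real tickets are encrypted and carry much more. The field names and principals are hypothetical, and the 5-minute clock-skew allowance is an assumption modeled on MIT Kerberos's default.

```python
from dataclasses import dataclass

CLOCK_SKEW = 300  # seconds; assumed, modeled on MIT Kerberos's 5-minute default


@dataclass(frozen=True)
class Ticket:
    client: str   # hypothetical principal, e.g. "alice@MATH.EXAMPLE"
    service: str  # hypothetical service principal
    start: int    # start of validity, seconds since the epoch
    end: int      # expiry time, seconds since the epoch


def accepted(ticket: Ticket, now: int, skew: int = CLOCK_SKEW) -> bool:
    """True if a server would still honor the ticket at time `now`."""
    return ticket.start - skew <= now <= ticket.end + skew


# A ticket issued at t=0 with a 10-hour lifetime, stolen by an intruder.
stolen = Ticket("alice@MATH.EXAMPLE", "host/server@MATH.EXAMPLE", 0, 10 * 3600)
assert accepted(stolen, now=2 * 3600)        # still usable two hours later
assert not accepted(stolen, now=11 * 3600)   # worthless after it expires
```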

So what is wrong with Kerberos that might prevent us from using it?

I have seen criticism of the Solaris and Linux PAM modules for Kerberos; however, it may be out of date, since a recent Debian distribution listed no bug reports for the PAM module.
Recent source code for both AFS and ssh uses the deprecated Kerberos version 4, not version 5. It appears that the developers of these subsystems are not aggressively keeping up with Kerberos. However, a Kerberos 5 server can serve Kerberos 4 clients.
Ticket forwarding can be a problem: if you login to machine A and get a set of tickets, then from there you use ssh to execute a command relevant to Kerberos on machine B, you will need tickets on B, and most likely you won't have them. The same problem happens if you queue a batch job. Tickets are saved in /tmp of the local machine, in a file readable only by the user (and root), and have to be propagated using the provided client and daemon, manually or as part of a script. Ssh can propagate AFS-related tickets by itself, but the documentation says that other tickets are not propagated.
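Before propagating tickets, or starting a job that needs them, a script can at least confirm that the local ticket file exists and is private. This sketch assumes the MIT Kerberos 5 convention of a file cache named by the KRB5CCNAME environment variable, defaulting to /tmp/krb5cc_<uid>; Kerberos 4 used a different file name, and the helper names are hypothetical. It checks ownership and permissions only; it does not read or validate the tickets.

```python
import os
import stat
from typing import Optional


def default_cache_path() -> Optional[str]:
    """Locate the file ticket cache, assuming MIT Kerberos 5 conventions."""
    name = os.environ.get("KRB5CCNAME", f"FILE:/tmp/krb5cc_{os.getuid()}")
    if name.startswith("FILE:"):
        return name[len("FILE:"):]
    if ":" not in name:
        return name  # simplified: a bare path is taken to be a file cache
    return None  # other cache types (e.g. MEMORY:) are not files


def cache_is_private(path: str) -> bool:
    """True if the cache file belongs to us and no group/other bits are set."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False  # no tickets here: the user must authenticate first
    owned = st.st_uid == os.getuid()
    private = (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return owned and private


# Usage at the top of a (hypothetical) batch-job wrapper.
path = default_cache_path()
if path is None or not cache_is_private(path):
    print("No private ticket cache found; run kinit on this machine first.")
```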

I would conclude that Kerberos is likely to work; the transition to Kerberos is likely to be transparent for most but not all of our users; Kerberos will be much more secure than what we have; and we will get additional capabilities which may be very useful.

2 Distributed Filesystems

What are we looking for in a distributed filesystem? We want its access control to be hard to circumvent, and we want it to work through an aggressive firewall. It must be moderately scalable in the sense that we have 32 fileservers with 168 exported filesystems among them. Encrypting file content in transit would be nice but is not essential. Replicated readonly volumes for software (with transparent failover) would be a helpful addition. Also helpful would be sufficiently robust security that off-campus machines beyond our control could mount our filesystems with low risk to us. We must either have a dedicated Windows client or the data must be exportable via Samba.

In a web search I found a number of cluster filesystems. I don't think they're relevant here; their intended use is for high-performance parallel I/O, as in a Beowulf cluster, or for Storage Area Networks (i.e. fileserver appliances), or for enormous worldwide databases. I'm listing them here so that in future searches they can be recognized and ignored.

GASS from the Globus Toolkit. For Beowulf.
PVFS, Parallel Virtual File System, from Clemson. For Beowulf.
Veritas Distributed Filesystem. Proprietary, for a commercial SAN product.
GFS Global Filesystem, a SAN block-level cluster design.
APGrid Datafarm, targeted for petabytes of data spread over thousands of nodes.
OIF, Oracle Internet Filesystem. Stores files in an Oracle database.

There are, however, a few filesystems targeted at our kind of application.

Sun's NFS is the distributed filesystem that Mathnet currently uses. Its disadvantages are that users are identified on the insecure client, not the server; data is transferred by UDP which is fine for the interior LAN but not if a serious firewall has to be traversed; it has no encryption (though supposedly Sun's implementation can be made to encrypt); and it is not particularly fast, having no local cache.
AFS is the most well-established distributed filesystem. It has been around since 1984, and OpenAFS is under active development. A number of organizations, big and small, use it and are happy with it. Its features, advantages and disadvantages are discussed below, but in summary: It has excellent security, encryption and access control. It has client side caching, server replication, and tolerance of server failures (if replicated). Apparently it is reasonably scalable. It works directly on the raw device, needing special information in the inodes. It is not much slower than NFS (faster when the cache can be utilized). It is available for Solaris and Linux; and there is a Windows client (not server).
Coda filesystem. It has client side persistent caching and server replication. It has good security for authentication, encryption, and access control. It is tolerant of network and server failures, and is designed so clients can be disconnected for mobile operation. It has good scalability. It uses native filesystem formats such as UFS or ext2, but the files are organized idiosyncratically, not really accessible except via the Coda client. It is available for Linux and Solaris. It is under active development and is a fairly mature product. It is not particularly fast when writing, which requires a synchronous round-trip to the server.
InterMezzo filesystem. The leader of the Coda team did a ground-up redesign to make it faster. It uses an existing disc filesystem format and driver; it is known to work on ext2, ext3 and tmpfs. Its kernel module is included in the Linux kernel starting with 2.4.15. I believe there is no Solaris kernel module. I'm afraid it is still alpha quality, but it could be a contender in a few years.
DCE/DFS, an OSF standard set of services and interfaces for client-server applications. Uses Kerberos-5 for authentication, and includes a distributed file system with local caching and AFS-style access control and write synchronization. DCE includes a lot more good stuff, including portable RPC, a thread library, naming service (for files on remote machines), communication with alien cells, and time synchronization. It isn't clear whether you have to take the whole package to get any part of it. SGI may be in charge of development.

Of these filesystems, NFS is the devil that we know, which all others have to measure up to. AFS is a strong contender, used at a number of sites to do what we're trying to do. Coda and DCE/DFS may or may not be contenders also, but our lack of familiarity with them puts risk in their columns. InterMezzo is for the future.

So if we replace NFS at all, we'll almost certainly replace it with AFS. What are its advantages?

AFS uses Kerberos version 4 tickets, issued by a daemon on or adjacent to the fileserver, to determine if users have permission to read particular files. ``Honor system'' user identification is a major weakness of Sun's NFS. By ``honor system'' I mean that if a machine is not under our administrative control, because it is foreign, or is a personal laptop, or is infested by a hacker, our choices are to trust the user identities it passes out, or to not export files to it.
File content is encrypted in transit. Eavesdropping is not a major threat for Mathnet, but is important for other organizations.
AFS protocols are organized so users can connect to the fileserver through a firewall. It is feasible, and common practice, for users on foreign systems to mount AFS volumes: world-readable ones freely, and protected ones if the users have authenticated themselves to the fileserver.
AFS has more flexible access control than standard UNIX: the directory's owner can specify an arbitrary access control list, if desired.
For readonly (software residence) volumes that are replicated, AFS has automatic and transparent failover when a server dies.
For writeable (home directory) volumes, there is a local write-back cache on the client. A standard feature is a backup copy of all files with copy-on-write semantics: the version as of the last backup is kept automatically and unobtrusively, and if most files are not written to, it takes little actual disc space.
A client is available to mount AFS volumes on Microsoft Windows NT, but there is no Windows server, so Windows directories cannot be exported via AFS.
A PAM module is available that converts Kerberos credentials to AFS tokens at login time.
We won't be alone in using AFS. Here are a few other users, mostly hype from the OpenAFS web site (selected for positive results):
- Duke University, serving student home directories in campus labs. Solaris 2.6 through 8 on servers; Windows and MacOS-X clients. Has cross-realm trust between Kerberos-5 and Win2K-XP Active Directory.
- KTH EE Department (Sweden; newly expanded installation), 1 Tbyte of data on several HP Alpha machines with Tru64 UNIX. 400 PC or Mac clients, 15 Solaris clients. No server crashes in the 6 months they have used it.
- A high school in Germany: 1.6 Tbyte on 4 servers, 100 to 150 PC-type clients running Linux and Win2K. 1500 student users.
- CMU Computing Services Division: Solaris 2.6 servers; Solaris 7 and 8 clients; also Linux and WinNT. Doesn't say how many.
- Stanford University: All student home directories, accessible from the dorms, public labs and departmental installations. Solaris servers, Windows and Mac clients (and presumably Linux). They have used it at least since 1996. Trouble-free from the user's point of view.
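AFS access control lists attach to directories and grant some combination of seven rights: r (read), l (lookup), i (insert), d (delete), w (write), k (lock) and a (administer). The sketch below is a simplified model of how a user's effective rights combine: the union of all positive entries that match the user or one of the user's groups, minus any negative entries. The user and group names are hypothetical, and the caller is assumed to include system:authuser in the groups only when the user holds a valid token.

```python
from typing import Dict, Set

RIGHTS = set("rlidwka")  # read, lookup, insert, delete, write, lock, administer


def effective_rights(acl: Dict[str, str], negative: Dict[str, str],
                     user: str, groups: Set[str]) -> Set[str]:
    """Simplified AFS rule: union of matching positive entries minus negative ones.

    `groups` should include "system:authuser" only if the user holds a token;
    "system:anyuser" matches everyone and is added automatically.
    """
    who = {user, "system:anyuser"} | groups
    granted: Set[str] = set()
    for entry, rights in acl.items():
        if entry in who:
            granted |= set(rights)
    for entry, rights in negative.items():
        if entry in who:
            granted -= set(rights)
    return granted & RIGHTS


# Hypothetical ACL on a home directory.
acl = {"alice": "rlidwka", "mathnet:staff": "rlidwk", "system:anyuser": "l"}
assert effective_rights(acl, {}, "guest", set()) == {"l"}
assert effective_rights(acl, {}, "bob", {"mathnet:staff", "system:authuser"}) == set("rlidwk")
assert effective_rights(acl, {"bob": "wd"}, "bob", {"mathnet:staff"}) == set("rlik")
```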
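The backup volume's low cost comes from cloning: a clone copies the volume's references to files, not the file contents, so in this model a file's content is stored twice only after it is rewritten. The toy model below works at whole-file granularity; the class and function names are hypothetical, and real AFS clones work on the fileserver's on-disc structures rather than on Python objects.

```python
from typing import Dict, Optional


class Volume:
    """Toy volume: maps file names to immutable file contents."""

    def __init__(self, files: Optional[Dict[str, bytes]] = None):
        self.files: Dict[str, bytes] = dict(files or {})

    def clone(self) -> "Volume":
        # Copy only the references; both volumes share the same content objects.
        return Volume(self.files)

    def write(self, name: str, data: bytes) -> None:
        # Rebind the name in this volume only; a clone still sees the old content.
        self.files[name] = data


def disc_used(*volumes: Volume) -> int:
    """Bytes needed if each distinct content object is stored once."""
    stored = {}
    for vol in volumes:
        for data in vol.files.values():
            stored[id(data)] = len(data)
    return sum(stored.values())


home = Volume({"thesis.tex": b"a" * 5000, "notes.txt": b"b" * 1000})
backup = home.clone()                      # nightly backup clone (hypothetical)
assert disc_used(home, backup) == 6000     # the clone costs no extra data
home.write("notes.txt", b"c" * 1200)       # the user edits one file
assert disc_used(home, backup) == 7200     # only the changed file is duplicated
assert backup.files["notes.txt"] == b"b" * 1000
```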

AFS sounds like a really wonderful replacement for NFS, so why don't we, and everyone else, embrace it wholeheartedly?

Kerberos version 4 is the old version, and is potentially vulnerable to replay attacks, though such attacks are not prominent in the security mailing lists. However, it would appear that Kerberos-5 can support AFS in a compatibility mode.
NFS operates ``on top of'' local filesystems; in other words it moves the data between machines but has nothing to do with how it is stored on the fileserver. On the other hand, AFS volumes are specially formatted for AFS, and are accessible only through AFS, even locally. Thus an organization deploying AFS has to trust it from the beginning, and each volume being converted must switch at once from non-AFS access to AFS-only access.
AFS file permissions are not the same as UNIX. While AFS access control may be ``better'', scripts and procedures that assume a UNIX background may find surprises if run on AFS. Only the ``owner'' UNIX permissions have any effect.
Remote authentication and forwarding of tickets is not necessarily transparent to the user. This is unimportant if the user works on one client (workstation) at a time, but where multiple machines are involved, new procedures may have to be developed and learned by the users. This is particularly a problem for asynchronous jobs, i.e. submitted to a batch queuing system, and starting or finishing after the user has logged out, invalidating his tickets. This can be dealt with, but the procedure has to be researched (by us) and learned (by the user).
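The mode-bit surprise can be modeled in a few lines. In AFS, a user's access to a file is governed by the ACL of its directory, and the file's owner mode bits then restrict everyone: if the owner r bit is off, nobody can read the file, whatever the ACL says, while the group and other bits are ignored. The sketch below is a simplified model with hypothetical names, covering only read and write.

```python
import stat
from typing import Set

OWNER_BIT = {"r": stat.S_IRUSR, "w": stat.S_IWUSR}


def afs_can(acl_rights: Set[str], mode: int, want: str) -> bool:
    """Simplified model: the directory ACL must grant the right AND the file's
    owner mode bit must be set. Group and other bits play no part."""
    return want in acl_rights and bool(mode & OWNER_BIT[want])


# A colleague with "rl" on the directory ACL (hypothetical).
colleague = {"r", "l"}
assert afs_can(colleague, 0o644, "r")
assert afs_can(colleague, 0o600, "r")      # "chmod go-r" does not lock them out
assert not afs_can(colleague, 0o244, "r")  # clearing the owner r bit locks out everyone
assert not afs_can(colleague, 0o644, "w")  # no "w" on the ACL, so no writing
```

A script that relies on chmod go-r to hide a file from colleagues would therefore silently fail on AFS; the ACL is what must be changed.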

3 Actions to Take

So what should we do now?

Set up a Kerberos realm and have the MCG staff use it for UNIX authentication. Make sure we can make it function.
Have the MCG staff use the UNIX Kerberos for Windows authentication.
Assuming we're going to commit to Kerberos, redesign our root access paradigm to take advantage of Kerberos security.
Think very hard whether the advantages of AFS -- and they are real -- are enough to justify abandoning NFS. An initial step would be to put the MCG home directories on an AFS filesystem, and install just the clients globally. A prerequisite would be to have Kerberos operating reliably.
In particular, have Windows home directories of MCG staff served from AFS. At all the AFS sites with testimonials, this was the biggest part of the use of AFS.
A Radius to Kerberos proxy server might be useful for controlling administrative access to our routers and, possibly, printers.