Disentangling Kerberos from LDAP

James F. Carter <jimc@math.ucla.edu>, 2014-11-03

Former versions of MIT Kerberos could not do incremental propagation, so we set up various substitute schemes. The most recent was to use the LDAP database backend and to rely on LDAP incremental propagation.

See here for how to set up the LDAP database backend.
LDAP replication has worked well so far, but Kerberos puts an incredible load on LDAP: lots of queries with lots of result rows. This is particularly annoying during password-guessing attacks, which are incessant.
Also, during system upgrades, the sysadmin may forget that Kerberos depends on LDAP and waste time debugging Kerberos issues before LDAP has been re-activated.

In OpenSuSE 13.1 we have krb5-1.11.3 dated about 2013-11-xx. It would appear that MIT Kerberos is now able to do incremental replication. So we want to revert from the LDAP backend to native files, turning on native incremental replication.

About examples in this writeup: the hostnames and realm refer to jimc's home network. The reader will need to substitute his own realm and KDC hostnames. Also, some steps refer to local features such as /m1/custom/restarter.conf, which are absent on a stock OS. This SuSE system uses systemd (systemctl) for booting and system process control. If you have LSB startup scripts (e.g. /etc/init.d/kerberos) or upstart, substitute the appropriate commands for your process control framework.

Research on Features

kpropd propagation daemon: the documentation I read is for Kerberos version 1.14 (2014), but kpropd has not changed a lot since 1.11. kpropd runs on the slave and is normally socket-activated. In conventional propagation, a cron job on the master runs kprop occasionally, sending a full dump to kpropd.

When using kprop I hit this error:

    kprop: Server rejected authentication (during sendauth exchange)
    while authenticating to server
    Generic remote error: Wrong principal in request

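The master authenticates as host/$MASTER@$REALM, and the slave needs this principal's key in its default keytab -- with the same key version number (KVNO) as in the database. I failed to use the -norandkey option of ktadd (in kadmin) when extracting the key to the slave's keytab. Without -norandkey, ktadd generates a new random key with a new KVNO, which invalidates every copy of the old key already sitting in other hosts' keytabs.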
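A quick way to spot this kind of mismatch is to compare the KVNO in the slave's keytab with the KVNO in the database. The sketch below is a pair of hypothetical helpers (keytab_kvno and db_kvno are my names, not MIT tools). It assumes the plain `klist -k` layout (KVNO column, then principal) and the `Key: vno N, ...` lines that getprinc prints in kadmin.

```shell
#!/bin/sh
# Hypothetical helpers for comparing key version numbers (KVNOs).

# Reads plain `klist -k` output on stdin (KVNO column, then principal)
# and prints the highest KVNO listed for the principal named in $1.
keytab_kvno() {
    awk -v p="$1" '$2 == p { print $1 }' | sort -n | tail -n 1
}

# Reads getprinc output (from kadmin or kadmin.local) on stdin and
# prints the highest key version from its "Key: vno N, ..." lines.
db_kvno() {
    sed -n 's/^Key: vno \([0-9][0-9]*\),.*/\1/p' | sort -n | tail -n 1
}

# Example, on the slave, checking the master's host key:
#   p=host/jacinth.cft.ca.us@CFT.CA.US
#   k=$(klist -k | keytab_kvno "$p")
#   d=$(kadmin -q "getprinc $p" | db_kvno)    # prompts for a password
#   [ "$k" = "$d" ] || echo "KVNO mismatch for $p: keytab $k, database $d"
```

If the two numbers differ, re-extract the key with -norandkey rather than plain ktadd.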

If incremental propagation is enabled, kpropd runs continually on the slave and periodically polls kadmind on the master. The kdc.conf parameter iprop_enable turns it on; iprop_slave_poll is the poll interval (default 2 minutes). The slave's keytab must include kiprop/$SLAVE@$REALM, and kpropd.acl (on SuSE, /var/lib/kerberos/krb5kdc/kpropd.acl) needs the principal as which the master sends the updates.
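Before enabling iprop, it is worth confirming that the slave's keytab actually holds the principals mentioned above. This is a hypothetical helper (missing_principals is my name for it), again assuming the plain `klist -k` layout; it prints each required principal that is absent.

```shell
#!/bin/sh
# Hypothetical helper: reads plain `klist -k` output on stdin and prints
# each principal named in the arguments that the keytab does not contain.
missing_principals() {
    # Column 2 holds the principal; header lines never match a real name.
    listing=$(awk '{ print $2 }')
    for p in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -- "$p" || printf '%s\n' "$p"
    done
}

# Example, on the slave Xena (no output means nothing is missing):
#   klist -k | missing_principals host/jacinth.cft.ca.us@CFT.CA.US \
#       kiprop/xena.cft.ca.us@CFT.CA.US
```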

If incremental propagation is enabled, kprop can no longer be used with an ordinary dump. The symptom is this error message: kprop: Software caused connection abort while reading response from server. It means that kpropd received the dump that kprop sent and then exec'd kdb5_util load -i $kdir/from_master, which returned a nonzero exit code. Various problems could make kdb5_util fail, but in this case the -i option tells it to look in the dump for the serial number of the last update-log entry that the dump includes, so kpropd knows where to start polling from; an ordinary dump lacks that serial number.

To debug propagation, the program kproplog is handy. In case you don't have convenient symbolic links, its full path (for SuSE) is /usr/lib/mit/sbin/kproplog. kproplog -h gives just the serial numbers and timestamps of the first and last updates. On the master only, kproplog -e N -v (e.g. -e 5) gives more details of the last N updates.
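To see at a glance whether a slave has caught up, the "Last serial #" reported by kproplog -h on the master and on the slave can be compared. A sketch with hypothetical helpers (last_serial, serial_lag); it assumes the `Last serial # : N` line format that kproplog -h prints in the versions I have seen, and that root can ssh to the master.

```shell
#!/bin/sh
# Hypothetical helpers for comparing update-log positions.

# Reads `kproplog -h` output on stdin and prints the number on the
# "Last serial # : N" line (nothing if the log reports "None").
last_serial() {
    sed -n 's/^[[:space:]]*Last serial #[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Prints how many updates the slave is behind: MASTER_SERIAL - SLAVE_SERIAL.
serial_lag() {
    echo $(( $1 - $2 ))
}

# Example, on a slave:
#   kproplog=/usr/lib/mit/sbin/kproplog
#   m=$(ssh jacinth.cft.ca.us "$kproplog -h" | last_serial)
#   s=$($kproplog -h | last_serial)
#   echo "slave is $(serial_lag "$m" "$s") updates behind"
```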

From the table of parameters for incremental propagation: the default poll interval is 2 minutes, and there is a lockout of 10 seconds; if the database was modified that recently, the slave (kpropd) will pause and retry. If a principal's policy is altered, the change cannot be propagated incrementally, and a full update will be done.

Tidbit: For cross-realm trust, MIT recommends the password be 26 bytes of random ASCII text. Assuming this means an alphabet of 96 choices, that would be about 171 bits (26 × log2(96) ≈ 26 × 6.58).
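The arithmetic is entropy = length × log2(alphabet size). The hypothetical helpers below compute that figure and show one way to produce such a password. Drawing from the 94 printable non-space ASCII characters (`[:graph:]` in the C locale) via /dev/urandom is my assumption, not MIT's recipe.

```shell
#!/bin/sh
# Bits of entropy in LEN characters drawn uniformly from SIZE symbols:
# LEN * log2(SIZE).  Usage: entropy_bits LEN SIZE
entropy_bits() {
    awk -v n="$1" -v a="$2" 'BEGIN { printf "%.1f\n", n * log(a) / log(2) }'
}

# Prints 26 random characters from the 94 printable non-space ASCII
# characters, read from /dev/urandom.
gen_xrealm_password() {
    LC_ALL=C tr -dc '[:graph:]' < /dev/urandom | head -c 26
    echo
}
```

`entropy_bits 26 96` prints 171.2 and `entropy_bits 26 94` prints 170.4, so the exact alphabet makes little difference.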

Tidbit: How to destroy a realm in LDAP. You have to do this while the krb5.conf file still has the LDAP parameters; otherwise the error message will be "LDAP container not specified" or something like that. Do it on any server and rely on LDAP replication to delete the entries on the other servers. You will need to type in the password for ldaproot, generally stored in /etc/ldap.secret and/or /etc/openldap/ldap.secret. (Unlike ldapsearch, kdb5_ldap_util has no -y option to read the password from a file.) For a big realm it may run as long as a minute.

    kdb5_ldap_util -D $rootdn -H ldaps://hostname destroy -r REALM

To test whether it's gone (run this before and after; it prints nothing if the realm is gone):

    ldapsearch -x -H ldaps://jacinth.cft.ca.us \
    -D uid=ldaproot,dc=cft,dc=ca,dc=us -y /etc/ldap.secret \
    -b cn=CFT.CA.US,cn=Kerberos,dc=cft,dc=ca,dc=us -LLL \
    '(krbPrincipalName=jimc@CFT.CA.US)' krbLastSuccessfulAuth

Outcome: it says kdb5_ldap_util: Realm Delete FAILED: Operation not allowed on non-leaf deleting database of 'CFT.CA.US'. But the stuffing is gone even so, on all 3 servers due to LDAP incremental propagation. I think it tried to remove the realm container before it was empty. It left these objects in place, all of which I formerly created by hand:

    dn: cn=Kerberos,dc=cft,dc=ca,dc=us
    dn: cn=CFT.CA.US,cn=Kerberos,dc=cft,dc=ca,dc=us
    dn: krbPrincipalName=krbtgt/CFT.CA.US@KRB.MATH.UCLA.EDU,cn=CFT.CA.US,cn=Kerberos,dc=cft,dc=ca,dc=us
    dn: cn=kadmind,cn=Kerberos,dc=cft,dc=ca,dc=us
    dn: cn=kdc,cn=Kerberos,dc=cft,dc=ca,dc=us
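To see exactly what is left under the Kerberos container on each server, it helps to ask for DNs only. A sketch reusing the connection parameters of the ldapsearch above; list_krb_dns and count_entries are hypothetical names, the FQDNs of the other two servers are assumed, and the attribute name `1.1` is the standard LDAP way to request no attributes.

```shell
#!/bin/sh
# Hypothetical wrapper: list every DN under cn=Kerberos on one server.
# "1.1" requests no attributes, so only the dn: lines come back.
list_krb_dns() {
    ldapsearch -x -H "ldaps://${1:-jacinth.cft.ca.us}" \
        -D uid=ldaproot,dc=cft,dc=ca,dc=us -y /etc/ldap.secret \
        -b cn=Kerberos,dc=cft,dc=ca,dc=us -LLL '(objectClass=*)' 1.1
}

# Reads LDIF on stdin and prints the number of entries (dn: lines).
count_entries() {
    grep -c '^dn:'
}

# Example, comparing the three servers:
#   for h in jacinth diamond xena; do
#       echo "$h: $(list_krb_dns "$h.cft.ca.us" | count_entries)"
#   done
```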

Procedure

These web pages describe the procedure to set up a new KDC:

I created new KDCs with a native file backend, copied the payload to them, and then removed the LDAP instances.

On a SuSE system, most relevant files are in /var/lib/kerberos/krb5kdc; call this $kdir.
On all hosts, add kerberos to /m1/custom/restarter.conf and disable the kerberos service, since you don't want it to be restarted while the database is absent:
```
systemctl disable kerberos
```

Converting the Master KDC

Turn off KDC on Jacinth (the master site).
```
systemctl stop kerberos
```
Dump the database on Jacinth, or use a recent dump, which is in /home/kerberos.bku/krb.db.1.gpg (also krb.ov.1.gpg). Actually, an up-to-the-minute dump is recommended, particularly for propagating to the slaves once you have converted them. To make one:
```
(umask 077; kdb5_util dump krb.db)
```
If you encrypted the dump, test whether you can decrypt it; see the procedure below for restoration. Check (later) whether we still need the "ov" crap. For CFT this table is empty.

Save the master key's stash file. Its name is given in $kdir/kdc.conf:

    key_stash_file = $kdir/.k5.$REALM

Get rid of any leftover non-LDAP database on Jacinth. For us, in $kdir: principal, principal.ok, db.dump*, datatrans*, from_master. Old LDAP materials were moved into a jail directory. Also get rid of the key for krb-incr (jimc's incremental propagation script).
Update /etc/krb5/krb5.conf.m4 on Jacinth only. Changes made: commented out or deleted all the LDAP stuff, specifically database_module = openldap_ldapconf. $kdir/kdc.conf did not need any changes.
Do not create a new empty database because it will overwrite the stash file. However, the unencrypted master password is in $kdir/.files-key.CFT.CA.US if you lose the stash file. It's binary. (Probably not a good idea.)
Restore the dumped payload. If you are restoring from the saved (encrypted) backup, decrypt it first as follows. Execute this as the recipient for which the dump is encrypted (jimc or bugs), on a host where your secret key is available, and give your passphrase when asked.
```
umask 077
cd "$j"		# $j: a temporary directory
gpg -o krb.tgz -d /home/kerberos.bku/krb.tgz.1.gpg	# Config files
gpg -o krb.db -d /home/kerberos.bku/krb.db.1.gpg	# "principal" table
gpg -o krb.ov -d /home/kerberos.bku/krb.ov.1.gpg	# For us it's empty
```
Now execute this as root on the master site:
```
kdb5_util load $j/krb.db
```
This operation will wipe and overwrite any existing database. Remember to shred the decrypted dump when finished.
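A sketch of that cleanup, assuming the decrypted files still sit in the temporary directory $j (shred_restore_files is a hypothetical name). GNU shred's own documentation warns that it cannot guarantee erasure on journaling, snapshotting, or similar filesystems, so a tmpfs directory for $j is the safer choice.

```shell
#!/bin/sh
# Hypothetical helper: overwrite, then unlink, the decrypted restore files
# (krb.tgz, krb.db, krb.ov) in directory $1.  Missing files are skipped.
shred_restore_files() {
    for f in krb.tgz krb.db krb.ov; do
        if [ -e "$1/$f" ]; then
            shred -u "$1/$f" || return 1
        fi
    done
}

# Usage: shred_restore_files "$j"
```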
Start the KDC on Jacinth. It started with no complaints. Amazing.
Test whether authentication is working using the KDC on Jacinth (use passwd-check). Yes, it works on Jacinth. It's also OK to test the slaves, which are still using LDAP.

Converting the Slave KDCs

A recommended strategy is to configure the master and slaves identically. Then, if the master dies, you can edit /etc/krb5.conf to specify a different machine as the master, and Kerberos will pick right up until you can order a replacement. This means, for example, that kadm5.acl must grant all slaves permission to suck updates, and the master should be included too, in case of a future role switch. There are several places where permissions and credentials are included that are not strictly necessary, just so the configuration can be the same on all KDCs.

Turn off KDC on slaves (Diamond, Xena).
```
systemctl stop kerberos
```
On each master+slave, the default keytab should have the up-to-date host key for each master+slave, as conventional propagation authenticates with the host key. The key version (KVNO) has to be the same in all hosts' keytabs.
- On each dirsvr, execute kadmin.local and use the command: ktadd -norandkey host/xena.cft.ca.us. (Repeat for each KDC other than the local host, which should already have its own host key.) If the local database is not readable (e.g. nonexistent), use kadmin to get the keys from the master (authentication required).
- krb-maint has been modified to extract the needed host keys and to use -norandkey. Command line, execute on the master:
```
krb-maint keytab -i -v $SLAVE
```
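For the kadmin.local route, here is a hypothetical loop (add_peer_host_keys is my name; it is not krb-maint) that adds every other KDC's host key without re-randomizing it. The FQDN diamond.cft.ca.us is an assumption. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: on the KDC named in $1, add every *other* KDC's
# host key to the default keytab.  -norandkey leaves the key and KVNO in
# the database unchanged, so copies in other keytabs stay valid.
# With RUN=echo the kadmin.local commands are printed, not executed.
KDCS="jacinth.cft.ca.us diamond.cft.ca.us xena.cft.ca.us"

add_peer_host_keys() {
    for h in $KDCS; do
        [ "$h" = "$1" ] && continue
        $RUN kadmin.local -q "ktadd -norandkey host/$h"
    done
}

# Usage, as root on each KDC:  add_peer_host_keys "$(hostname -f)"
```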
Get rid of any leftover non-LDAP database on the slaves. For us: db.dump*, principal, principal.ok, from_master. LDAP stuff was moved to a jail directory.
As on the master, do not create a new empty database, because it will overwrite the stash file. (As there, the unencrypted master password in $kdir/.files-key.CFT.CA.US is the fallback if you lose the stash file.)
Install on the slaves all non-database items. For us:
- /etc/krb5/krb5.conf (OK to use Jacinth's temporarily on the slaves)
- /etc/krb5/krb5.conf.m4
- /var/lib/kerberos/krb5kdc/kadm5.acl (added Kerberator from Jacinth)
- /var/lib/kerberos/krb5kdc/kdc.conf (only changed comments)
- /var/lib/kerberos/krb5kdc/kpropd.acl (authorizes each master+slave)
- Stash file is already uniform on the slaves.
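Because every KDC gets the same kpropd.acl, it can be generated from the list of KDCs. A sketch (make_kpropd_acl is a hypothetical helper; diamond.cft.ca.us is an assumed FQDN). The file format is simply one authorized principal per line.

```shell
#!/bin/sh
# Hypothetical helper: print a kpropd.acl authorizing the host principal
# of every KDC, so the same file can be installed on master and slaves.
# Usage: make_kpropd_acl REALM HOST...
make_kpropd_acl() {
    realm=$1
    shift
    for h in "$@"; do
        printf 'host/%s@%s\n' "$h" "$realm"
    done
}

# Example:
#   make_kpropd_acl CFT.CA.US jacinth.cft.ca.us diamond.cft.ca.us \
#       xena.cft.ca.us > /var/lib/kerberos/krb5kdc/kpropd.acl
```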
Start kpropd on the slave. Don't enable Kerberos yet, just do:
```
systemctl enable kpropd
systemctl start kpropd
```
Oops, firewall problems. The slaves need to accept port 754 for conventional propagation, and the master needs port 750 (kerberos-iv), which I've stolen for incremental propagation. Open both ports on master+slaves.
Use kprop on the master to propagate the database to each slave.
```
(umask 077 ; kdb5_util dump $j/krb.dump)
kprop -f $j/krb.dump xena
kprop -f $j/krb.dump diamond
```
My initial outcome was: kprop: Server rejected authentication (during sendauth exchange) while authenticating to server; Generic remote error: Wrong principal in request. The problem turned out to be that the slave's keytab had the wrong KVNO for the master, because I missed -norandkey. Fixed. Now the propagation happens.
Start the KDC on the slaves:
```
systemctl start kerberos
```
Use passwd-check to check operation of KDC on the slaves. Yes, all the slaves are delivering jimc's password.
Edit /m1/custom/scripts.dat to enable kpropd on dirsvr-pmaster.

Cleanup Steps

On all hosts, remove kerberos and kpropd from /m1/custom/restarter.conf (allowing them to be restarted).
Run audit-scripts -v -c (to re-enable the KDC and kpropd).
On Diamond, update post_jump files (configuration version control).

Setting Up Incremental Propagation

See: Incremental Propagation Parameters

Create a principal kiprop/$HOST for each master+slave. Execute kadmin (authentication required) or kadmin.local (on master only) and use the command
```
add_principal -randkey kiprop/$HOST
```
You need to propagate the database to the slaves now, before turning on incremental propagation, so they have the principals they are going to use to ask for incremental updates.
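On the master, add kiprop/$HOST to $kdir/kadm5.acl with the "p" privilege. Here $HOST is each slave: kpropd on a slave authenticates as its own kiprop principal (the kpropd debug output under Issues During Testing shows kiprop/xena.cft.ca.us@CFT.CA.US). To easily switch to a different master, include master+slaves and install this file on all of them.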
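The two kiprop steps can be scripted from one host list. A sketch with hypothetical helpers (add_kiprop_principals, kiprop_acl_lines; the FQDN diamond.cft.ca.us is assumed). It uses kadmin.local, so it runs on the master; RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helpers for the kiprop steps; host list and FQDNs assumed.
REALM=CFT.CA.US
KDCS="jacinth.cft.ca.us diamond.cft.ca.us xena.cft.ca.us"

# On the master: create kiprop/HOST for every KDC with a random key
# (nobody ever types these passwords).  RUN=echo prints instead.
add_kiprop_principals() {
    for h in $KDCS; do
        $RUN kadmin.local -q "add_principal -randkey kiprop/$h@$REALM"
    done
}

# Print kadm5.acl lines granting the "p" (propagation) privilege to each
# kiprop principal, master included, for a future role switch.
kiprop_acl_lines() {
    for h in $KDCS; do
        printf 'kiprop/%s@%s p\n' "$h" "$REALM"
    done
}

# Example:  kiprop_acl_lines >> /var/lib/kerberos/krb5kdc/kadm5.acl
```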
Hack /usr/diklo/sbin/krb-maint so that kiprop/$HOST is included in the dirsvr keytabs (each host gets only its own kiprop principal), and reinstall the keytabs on all dirsvrs.
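In $kdir/kdc.conf, in the [realms] stanza(s) for our realm(s), insert these parameters. The comments go on lines of their own: the krb5 profile parser only recognizes # or ; at the start of a line, so a trailing comment would become part of the value.
```
# Turns on incremental propagation
iprop_enable = true
# How often the slave polls the master (default shown)
iprop_slave_poll = 2m
# Stolen from kerberos-iv, itself stolen
iprop_port = 750
```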
Leave the other iprop parameters at their defaults (iprop_slave_poll is set explicitly, but only to its default value). Install kdc.conf on master+slaves.
Restart the KDC and kpropd on each master+slave, starting with the master. Once they see iprop_enable, they will start doing incremental propagation.

Issues During Testing

If, on the master, you do
```
kdb5_util dump $j/krb5.dump
kprop -f $j/krb5.dump xena
```
It says: kprop: Software caused connection abort while reading response from server. The reason is that kpropd execs kdb5_util load -i $kdir/from_master, but the dump does not include the incremental serial number, so kdb5_util exits with code 1, saying: kdb5_util.exe: dump header bad in $kdir/from_master. If you have already done this, you'll have to run that command yourself without -i (suggestion: with the KDC and kpropd shut down, then restart them).
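The recovery suggested above can be wrapped in a small helper. This is a hypothetical sketch (reload_from_master is my name), assuming the SuSE $kdir and the kerberos and kpropd unit names used elsewhere in this writeup; RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical recovery helper: with the KDC and kpropd stopped, load the
# dump kpropd already received, without -i, then restart both.
# With RUN=echo the commands are printed instead of executed.
kdir=/var/lib/kerberos/krb5kdc

reload_from_master() {
    $RUN systemctl stop kerberos kpropd || return 1
    $RUN kdb5_util load "$kdir/from_master"
    rc=$?
    $RUN systemctl start kerberos kpropd
    return $rc
}
```

The load's exit status is returned even though the services are restarted either way, so a caller can still tell whether the load itself worked.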
Initially, incremental propagation did not happen. To debug, I stopped kpropd and then ran it by hand with the debug option:
```
/usr/lib/mit/sbin/kpropd -S -d  
```
This reveals that it tries to do:
```
Initializing kadm5 as client kiprop/xena.cft.ca.us@CFT.CA.US
kadm5 initialization failed!
/usr/lib/mit/sbin/kpropd: Communication failure with server while attempting to connect to master KDC ... retrying
Sleeping 4 seconds to re-initialize kadm5 (RPC ERROR)
```
What do you want to bet that port 750 is blocked by the firewall? Yes, the firewall log shows that it's using IPv4 and getting blocked. Fixed (on Jacinth, and now it uses IPv6). Another similar screwup was a misspelled realm in kadm5.acl producing "permission denied".
- Now kpropd is connected to kadm5 on Jacinth.
- It will do a full propagation if appropriate (actually saw it do this).
- It will do incremental propagation if appropriate (actually seen).
- Polls every 2 mins, but SIGUSR1 will make it poll instantly.
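While testing, a tiny helper saves waiting for the next poll. This is a hypothetical sketch (poll_now is my name), assuming pgrep from procps is available; it simply sends SIGUSR1, relying on the behavior observed above.

```shell
#!/bin/sh
# Hypothetical helper: make kpropd poll the master now (SIGUSR1, as noted
# above) instead of waiting up to iprop_slave_poll.  Takes explicit PIDs,
# or finds running kpropd processes with pgrep.
poll_now() {
    pids=${*:-$(pgrep -x kpropd)}
    if [ -z "$pids" ]; then
        echo "poll_now: kpropd is not running" >&2
        return 1
    fi
    kill -USR1 $pids
}
```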
The only anomaly: when testacct changes its password, passwd-check gets the new one right on the slaves even though the slave has not yet reported checking for updates, and the test krb5.conf has neither admin_server nor master_kdc, just the one KDC being tested. So where is it getting the updated password? The master's /var/log/krb5/kdc.log shows that it issued 3 tickets where 1 is expected, i.e. kinit somehow found the master and got a ticket from it.
Weasel! It did a DNS query for _kerberos-master._udp.cft.ca.us (jacinth). In the test krb5.conf that passwd-check creates, I added dns_lookup_kdc = false, and now it really gets the credential from the slave, i.e. it fails until incremental propagation happens 2 minutes later.
The issue here is an otherwise nice feature: if a client (e.g. kinit) uses a slave and the password is wrong, it will retry on the master, assuming that the user has recently changed his password and the database has not yet been propagated to the slave, which in this case is not a lie.
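For testing one KDC in isolation, the key is a krb5.conf that names only that KDC and turns DNS lookups off. This is a hypothetical stand-in for that step (make_test_krb5conf is my name; it is not jimc's passwd-check), using the standard KRB5_CONFIG environment variable to point libkrb5 at the test file.

```shell
#!/bin/sh
# Hypothetical stand-in for the config step of a tool like passwd-check:
# print a krb5.conf that names exactly one KDC and disables DNS lookups,
# so libkrb5 cannot find the master through _kerberos-master SRV records
# and fall back to it.  Usage: make_test_krb5conf REALM KDC_HOST
make_test_krb5conf() {
    cat <<EOF
[libdefaults]
    default_realm = $1
    dns_lookup_kdc = false
    dns_lookup_realm = false

[realms]
    $1 = {
        kdc = $2
    }
EOF
}

# Example (testacct is the test principal mentioned above):
#   make_test_krb5conf CFT.CA.US xena.cft.ca.us > /tmp/krb5-xena.conf
#   KRB5_CONFIG=/tmp/krb5-xena.conf kinit testacct
```

Leaving out master_kdc and admin_server matters as much as the DNS switch, since either would give the library a master to retry against.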
My laptop has a slave KDC, so I can log in even when the laptop is on a foreign net that the home net's firewall is blocking. Security professionals would be horrified, since if the laptop is stolen the thief can boot from a rescue disc, dump the Kerberos database, and then do dictionary attacks, not that the thief will make any progress that way. Nonetheless, in this case usability (being able to use the laptop) wins over security. I wondered if there would be a problem when I put the laptop to sleep, but on losing its connection to the master, kpropd promptly reopened it on awakening.