Former versions of MIT Kerberos could not do incremental propagation, so we set up various substitute schemes. The most recent was to use the LDAP database backend and to rely on LDAP incremental propagation.
See here for how to set up the LDAP database backend.
LDAP replication has worked well so far.
But Kerberos puts incredible load on LDAP, lots of queries with lots of result rows. This is particularly annoying during password guessing attacks, which are incessant.
During system upgrades, the sysadmin may forget that Kerberos depends on LDAP, and waste time debugging Kerberos issues before LDAP has been re-activated.
In OpenSuSE 13.1 we have krb5-1.11.3 dated about 2013-11-xx. It would appear that MIT Kerberos is now able to do incremental replication. So we want to revert from the LDAP backend to native files, turning on native incremental replication.
About examples in this writeup: the hostnames and realm refer to jimc's home network. The reader will need to substitute his own realm and KDC hostnames. Also some steps refer to local features such as /m1/custom/restarter.conf , which are absent on a stock OS. This SuSE system uses systemd (systemctl) for booting and system process control. If you have LSB startup scripts (e.g. /etc/init.d/kerberos) or upstart, substitute the appropriate commands for your process control framework.
kpropd propagation daemon: This is for Kerberos version 1.14 (2014) but has not changed a lot since 1.11. kpropd runs on the slave and is normally socket activated. In conventional propagation a cron job on the master executes kprop occasionally, sending a full dump to kpropd.
When using kprop I hit this error:
kprop: Server rejected authentication (during sendauth exchange) while authenticating to server Generic remote error: Wrong principal in request
The master authenticates as host/$MASTER@$REALM, and the slave needs this principal's key in its default keytab -- with the same key version (KVNO). I failed to use the -norandkey option of ktadd (in kadmin) when extracting the key to the slave's keytab.
If incremental propagation is enabled, kpropd runs continually on the slave and periodically polls kadmind on the master. The kdc.conf parameter iprop_enable turns it on; iprop_slave_poll is the poll interval, 2 minutes by default. The slave's keytab must include kiprop/$SLAVE@$REALM, and /var/krb5kdc/kpropd.acl must list the principal that the master uses when it sends updates.
If incremental propagation is enabled, kprop can no longer be used. The symptom is this error message:
kprop: Software caused connection abort while reading response from server
This message means that kpropd received the dump that kprop sent, and then execed
kdb5_util load -i $kdir/from_master
which returned a nonzero exit code. Various issues could make kdb5_util fail, but in this case the -i option means to look in the dump for the serial number of the last entry in the update log, so kpropd knows where to start polling from, and the dump lacks that serial number.
To debug propagation the program kproplog is handy. In case you don't have convenient symbolic links, its full path (for SuSE) is /usr/lib/mit/sbin/kproplog.
kproplog -h
gives just the serial numbers and timestamps of the first and last updates. On the master only,
kproplog -e 5 -v
gives more details of the last N updates (here N = 5).
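One quick way to see whether a slave has caught up is to compare the last serial number on each side. The following is a minimal sketch, assuming the kproplog -h summary contains a line of the form "Last serial # : N". The spacing around the colon seems to vary between releases, so the pattern is deliberately loose. last_serial is a hypothetical helper name, not part of Kerberos.

```shell
# last_serial: read `kproplog -h` output on stdin and print the last serial number.
# Assumes a header line like "Last serial # : 13"; spacing around ':' may differ.
last_serial() {
    sed -n 's/^[[:space:]]*Last serial #[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Run "/usr/lib/mit/sbin/kproplog -h | last_serial" on the master and on a slave; equal numbers suggest the slave is current.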
Table of parameters for incremental propagation. For incremental propagation, the default poll interval is 2 mins, and there is a lockout of 10 secs; if the DB was modified this recently the slave (kpropd) will pause and retry. If a principal's policy is altered, it cannot be propagated incrementally and a full update will be done.
Tidbit: For cross-realm trust, MIT recommends the password be 26 bytes of random ASCII text. Assuming this means an alphabet of 96 choices, that would be about 171 bits (26 × log2 96).
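The bit count follows from length × log2(alphabet size). Here is a small sketch for checking such estimates, assuming awk is available; password_bits is an illustrative name.

```shell
# password_bits LENGTH ALPHABET: entropy in bits of a uniformly random password,
# computed as LENGTH * log2(ALPHABET).
password_bits() {
    awk -v n="$1" -v a="$2" 'BEGIN { printf "%.1f\n", n * log(a) / log(2) }'
}

# 26 characters drawn from 96 symbols, per the assumption in the text.
password_bits 26 96
```

For 26 characters over 96 symbols this prints 171.2; a 94-symbol alphabet (printable ASCII without the space) gives only slightly less.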
Tidbit: How to destroy a realm in LDAP: You have to do this while the krb5.conf file has the LDAP parameters; otherwise the error message will be "LDAP container not specified" or something like that. Do it on any server and rely on LDAP replication to delete entries on the other servers. You will need to type in the password for ldaproot, generally stored in /etc/ldap.secret and/or /etc/openldap/ldap.secret. (It doesn't know about -y.) For a big realm it may run as long as a minute.
kdb5_ldap_util -D $rootdn -H ldaps://hostname destroy -r REALM
To test if it's gone (do before and after; it will print nothing if the realm is gone):
ldapsearch -x -H ldaps://jacinth.cft.ca.us \
    -D uid=ldaproot,dc=cft,dc=ca,dc=us -y /etc/ldap.secret \
    -b cn=CFT.CA.US,cn=Kerberos,dc=cft,dc=ca,dc=us -LLL \
    '(krbPrincipalName=jimc@CFT.CA.US)' krbLastSuccessfulAuth
Outcome: it says
kdb5_ldap_util: Realm Delete FAILED: Operation not allowed on non-leaf deleting database of 'CFT.CA.US'
But the stuffing is gone even so, on all 3 servers due to LDAP incremental propagation. I think it tried to remove the realm container before it was empty.
It left these objects in place, all of which I formerly created by hand:
These web pages describe the procedure to set up a new KDC:
I re-created new KDCs with a native file backend, copied the payload to them, and then removed the LDAP instances.
On a SuSE system, most relevant files are in /var/lib/kerberos/krb5kdc. Call this $kdir.
On all hosts, add kerberos to /m1/custom/restarter.conf and disable it. You don't want it to be restarted when the database is absent.
systemctl disable kerberos
Turn off KDC on Jacinth (the master site).
systemctl stop kerberos
Dump the database on Jacinth, or use a recent dump, which is in /home/kerberos.bku/krb.db.1.gpg (also ov). Actually an up-to-the-minute dump is recommended particularly for propagating to the slaves once you get them converted. To make one:
(umask 077; kdb5_util dump krb.db )
If you encrypted the dump, test if you can decrypt it; see the procedure below for restoration. Check (later) if we no longer need the "ov" crap. For CFT this table is empty.
Save the master key's stash file. Find the name in $kdir/kdc.conf: key_stash_file = $kdir/.k5.$REALM
Get rid of any leftover non-LDAP database on Jacinth. For us, in $kdir: principal, principal.ok, db.dump*, datatrans*, from_master. Old LDAP materials were moved into a jail directory. Also the key for krb-incr (jimc's incremental propagation script).
Update /etc/krb5/krb5.conf.m4 on Jacinth only. Changes made: commented out or deleted all the LDAP stuff, specifically
database_module = openldap_ldapconf
$kdir/kdc.conf did not need any changes.
Do not create a new empty database because it will overwrite the stash file. However, the unencrypted master password is in $kdir/.files-key.CFT.CA.US if you lose the stash file. It's binary. (Probably not a good idea.)
Restore the dumped payload. If you are decrypting the saved backup, this is the procedure to do that. Execute as the recipient for which the dump is encrypted (jimc or bugs). Do it on a host on which your secret key is available and give your password when asked.
umask 077
cd $j    # A temporary directory
gpg -o krb.tgz /home/kerberos.bku/krb.tgz.1.gpg    # Config files
gpg -o krb.db /home/kerberos.bku/krb.db.1.gpg      # "principal" table
gpg -o krb.ov /home/kerberos.bku/krb.ov.1.gpg      # For us it's empty
Now execute this as root on the master site:
kdb5_util load $j/krb.db
This operation will wipe and overwrite any existing database. Remember to shred the decrypted dump when finished.
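The cleanup of the decrypted files could look like the following sketch. It assumes GNU coreutils' shred is installed and that the files carry the names used in the gpg step; shred_dumps is a hypothetical helper. Note that shred gives weaker guarantees on journaling or copy-on-write filesystems.

```shell
# shred_dumps DIR: overwrite and delete the decrypted backup files in DIR.
# File names follow the gpg step in the text; missing files are skipped.
# Returns nonzero if any shred call fails.
shred_dumps() {
    rc=0
    for f in "$1"/krb.db "$1"/krb.ov "$1"/krb.tgz; do
        if [ -f "$f" ]; then
            shred -u -- "$f" || rc=1
        fi
    done
    return "$rc"
}
```

After the load has succeeded, "shred_dumps $j" removes whatever decrypted copies are still in the temporary directory.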
Start the KDC on Jacinth. It started with no complaints. Amazing.
Test if authentication is working using the KDC on Jacinth (use passwd-check). Yes it works on Jacinth. OK to test the slaves which are still using LDAP.
A recommended strategy is to configure the master and slaves identically. This means that if the master dies you can edit /etc/krb5.conf specifying a different machine as the master, and Kerberos will pick right up until you can order a replacement. But this means, for example, that kadm5.acl needs permission granted to all slaves to suck updates, but the master should be included in case of a future role switch. There are several places where permissions and credentials are included that are not strictly necessary, just so the configuration can be the same on all KDCs.
Turn off KDC on slaves (Diamond, Xena).
systemctl stop kerberos
On each master+slave, the default keytab should have the up-to-date host key for each master+slave, as conventional propagation authenticates with the host key. The key version (KVNO) has to be the same in all hosts' keytabs.
On each dirsvr, execute kadmin.local and use the command:
ktadd -norandkey host/xena.cft.ca.us
(Repeat for each KDC, except that the local host should already have its own host key.) If the local database is not readable (e.g. nonexistent), use kadmin to get the keys from the master (authentication required).
krb-maint has been modified to extract the needed host keys and to use -norandkey. Command line, execute on the master:
krb-maint keytab -i -v $SLAVE
Get rid of any leftover non-LDAP database on slaves. For us: db.dump* principal principal.ok from_master. LDAP stuff was moved to a jail directory.
Do not create a new empty database because it will overwrite the stash file. However, the unencrypted master password is in $kdir/.files-key.CFT.CA.US if you lose the stash file. It's binary. (Probably not a good idea.)
Install on the slaves all non-database items. For us:
Start kpropd on the slave. Don't enable Kerberos yet, just do:
systemctl enable kpropd
systemctl start kpropd
Oops, firewall problems. The slaves need to accept port 754 for conventional propagation, and the master needs port 750 (kerberos-iv) which I've stolen for incremental propagation. Open both ports on master+slave.
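Before blaming Kerberos, it can save time to confirm that the ports are reachable at all. This is a rough sketch, assuming bash (for its /dev/tcp pseudo-device) and coreutils' timeout are installed; port_open is a hypothetical helper, and the 3-second limit is arbitrary.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# The connection is opened by a child bash via /dev/tcp and closed when it exits.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

From the master, "port_open xena 754" checks the conventional propagation port on a slave; from a slave, "port_open jacinth 750" checks the port chosen here for incremental propagation. The daemon has to be listening for the check to succeed, so a failure means either a firewall block or a stopped service.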
Use kprop on the master to propagate the database to each slave.
(umask 077 ; kdb5_util dump $j/krb.dump)
kprop -f $j/krb.dump xena
kprop -f $j/krb.dump diamond
My initial outcome was:
kprop: Server rejected authentication (during sendauth exchange) while authenticating to server; Generic remote error: Wrong principal in request
The problem turned out to be that the slave's keytab had the wrong KVNO for the master, because I missed -norandkey. Fixed.
Now the propagation happens.
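A KVNO mismatch like the one above is easy to spot by listing the keytab on each KDC. Here is a small sketch, assuming the usual "klist -k" layout with the KVNO in the first column and the principal in the second; kvno_of is a hypothetical helper name.

```shell
# kvno_of PRINCIPAL: read `klist -k` output on stdin and print the highest KVNO
# listed for PRINCIPAL. Prints nothing if the principal is not in the keytab.
kvno_of() {
    awk -v p="$1" '$2 == p && $1 + 0 > max { max = $1 + 0 }
                   END { if (max) print max }'
}
```

Run "klist -k | kvno_of host/jacinth.cft.ca.us@CFT.CA.US" on the master and on each slave; the numbers should agree.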
Start the KDC on the slaves:
systemctl start kerberos
Use passwd-check to check operation of KDC on the slaves. Yes, all the slaves are delivering jimc's password.
Edit /m1/custom/scripts.dat to enable kpropd on dirsvr-pmaster.
On all hosts, remove kerberos and kpropd in /m1/custom/restarter.conf (allowing them to be restarted.)
Run audit-scripts -v -c (to re-enable KDC and kpropd)
On Diamond, update post_jump files (configuration version control).
See: Incremental Propagation Parameters
Create a principal kiprop/$HOST for each master+slave. Execute kadmin (authentication required) or kadmin.local (on master only) and use the command
add_principal -randkey kiprop/$HOST
You need to propagate the database now to the slaves, before turning on incremental propagation, so they have the principal they are going to use to ask for incremental updates.
On the master, add kiprop/$HOST to $kdir/kadm5.acl with the "p" privilege. Not clear what $HOST is -- probably each of the slaves. To easily switch to a different master, include master+slave and install this file on all of them.
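With several KDCs, the principal commands and the ACL lines are repetitive enough to generate. The sketch below only prints text for review and runs nothing. It assumes every KDC (master and slaves) gets a kiprop principal and an ACL entry, as suggested above; both function names are hypothetical.

```shell
# kiprop_principal_cmds HOST...: print one kadmin.local command per KDC.
kiprop_principal_cmds() {
    for h in "$@"; do
        printf 'kadmin.local -q "add_principal -randkey kiprop/%s"\n' "$h"
    done
}

# kiprop_acl_lines REALM HOST...: print kadm5.acl lines granting the "p" privilege.
kiprop_acl_lines() {
    realm=$1
    shift
    for h in "$@"; do
        printf 'kiprop/%s@%s p\n' "$h" "$realm"
    done
}
```

For example, "kiprop_acl_lines CFT.CA.US jacinth.cft.ca.us xena.cft.ca.us" prints two lines that can be pasted into $kdir/kadm5.acl after checking them.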
Hack /usr/diklo/sbin/krb-maint so kiprop/$HOST is included in dirsvr keytabs (just its own principal), and reinstall keytabs on all dirsvrs.
In $kdir/kdc.conf in the [realms] stanza(s) for our realm(s), insert these parameters:
# Turns on incremental propagation
iprop_enable = true
# How often slave polls master, default shown
iprop_slave_poll = 2m
# Stolen from kerberos-iv, itself stolen
iprop_port = 750
The comments are on their own lines because a trailing # on a value line is read as part of the value. Leave the others at their defaults (and slave_poll too). Install kdc.conf on master+slaves.
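After installing kdc.conf everywhere, a quick check that each copy really carries the settings can catch a stale file. This is a rough sketch and not a real profile parser: it ignores stanza nesting and simply reports the last assignment of a key. conf_value is a hypothetical helper.

```shell
# conf_value KEY FILE: print the value of the last "KEY = value" line in FILE.
# Whole-line comments (starting with # or ;) are skipped.
conf_value() {
    awk -v k="$1" '
        /^[ \t]*[#;]/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == k) found = val
        }
        END { if (found != "") print found }
    ' "$2"
}
```

On each KDC, "conf_value iprop_enable $kdir/kdc.conf" should print true and "conf_value iprop_port $kdir/kdc.conf" should print 750.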
Restart KDC and kpropd on each master+slave, starting with the master. Seeing iprop_enable it will start doing incremental propagation.
If, on the master, you do
kdb5_util dump $j/krb5.dump
kprop -f $j/krb5.dump xena
it says:
kprop: Software caused connection abort while reading response from server
The reason is, kpropd execs
kdb5_util load -i $kdir/from_master
but the dump does not include the incremental serial number, and kdb5_util exits with code 1 saying:
kdb5_util.exe: dump header bad in $kdir/from_master
If you screwed up, you'll have to do the command without -i (suggestion: with kdc and kpropd shut down, then restart them).
Initially, incremental propagation did not happen. To debug, I stopped kpropd and then ran it by hand with the debug option:
/usr/lib/mit/sbin/kpropd -S -d
This reveals that it tries to do:
Initializing kadm5 as client kiprop/xena.cft.ca.us@CFT.CA.US
kadm5 initialization failed!
/usr/lib/mit/sbin/kpropd: Communication failure with server while attempting to connect to master KDC ... retrying
Sleeping 4 seconds to re-initialize kadm5 (RPC ERROR)
What do you want to bet that port 750 is blocked by the firewall? Yes, the firewall log shows that it's using IPv4 and getting blocked. Fixed (on Jacinth, and now it uses IPv6). Another similar screwup was a misspelled realm in kadm5.acl producing "permission denied".
The only anomaly is, when testacct changes its password, passwd-check gets it right on the slaves even though the slave did not report checking for updates yet, and the test krb5.conf has no admin_server nor master_kdc, just the one KDC that is being tested. So where is it getting the updated password? The master's /var/log/krb5/kdc.log shows that it issued 3 tickets where 1 is expected, i.e. kinit somehow found the master and got a ticket from it.
Weasel! It did a DNS query for _kerberos-master._udp.cft.ca.us (jacinth). In the test krb5.conf that passwd-check creates, I added
dns_lookup_kdc = false
and now it gets the credential actually on the slave, i.e. fails until after incremental propagation happens 2 minutes later.
The issue here is an otherwise nice feature: if a client (e.g. kinit) uses a slave and the password is wrong, it will retry on the master, assuming that the user has recently changed his password and the database has not yet been propagated to the slave, which in this case is not a lie.
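To pin a test client to one KDC, the test configuration has to name that KDC and switch off DNS discovery. The sketch below shows what such a file could look like; it is not the actual passwd-check code, and test_krb5_conf is a hypothetical helper.

```shell
# test_krb5_conf REALM KDC: print a minimal krb5.conf that names exactly one KDC
# and disables DNS lookup of KDCs, so the client cannot find the master by itself.
test_krb5_conf() {
    cat <<EOF
[libdefaults]
    default_realm = $1
    dns_lookup_kdc = false

[realms]
    $1 = {
        kdc = $2
    }
EOF
}
```

Write it to a scratch file, e.g. "test_krb5_conf CFT.CA.US xena.cft.ca.us > /tmp/krb5.test.conf", then run kinit with KRB5_CONFIG pointing at that file so the system-wide krb5.conf is not consulted.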
My laptop has a slave KDC, so I can log in even though the laptop is on a foreign net that the home net's firewall is blocking. Security professionals would be horrified, since if the laptop is stolen the thief can boot from a rescue disc, dump the Kerberos database, and then do dictionary attacks, not that the thief will make any progress that way. Nonetheless, in this case usability (being able to use the laptop) wins over security. I wondered if there would be a problem when I put the laptop to sleep, but on losing its connection to the master, kpropd promptly reopened it on awakening.