Strategic Plan for 2006-2007

James F. Carter, UCLA-Mathnet, 2006-09-30

With the start of the new academic year, we need to decide which projects to pursue. I've listed my items below in approximate order of priority, as I see it.

SuSE 10.1 Deployment

To get the benefits of the new OS release, we need to get it deployed promptly.

All but 3 servers (Arachne, Testudo, Ulanda) have been upgraded, and Arachne will get its turn very soon. The two backup machines are not in urgent need, but it's best to get the upgrades over with.
A few Bamboos have been upgraded. This needs to be pushed because of version skew. C++ programs compiled on v10.1 use libstdc++-4.1.0, while v9.2 and v8.2 have libstdc++-5.0.x (older despite the higher number). A program compiled on v9.2 will still run on v10.1, which provides compat-libstdc++-5.0.7, but a program compiled on v10.1 will not run on the older releases. We need to make this problem go away by upgrading all the Bamboos promptly. Non-Bamboo compute nodes have already been upgraded.
We need to get v10.1 onto workstations, aggressively. Reasons:
- Our major benefit, besides the opportunity to clean house, is bug fixes. We need to get these deployed.
- Our handouts are going to describe the new release, not the old one.
- We plan some user interface improvements (see next section) which may or may not be compatible with the old version.
- We can't let this drag on until the next OS changeover, as we did with v8.2.

SuSE 10.1 on Workstations

The KDE and Gnome menus are full of items we don't need, and they lack items that are useful for Mathnet. The FVWM menu also needs to be reviewed again. We have a list of functions that we want on the menus; now we have to create the menus themselves and deploy them. At the same time, the login scripts should be reviewed thoroughly.

Mathnet Help System and Handouts

Almost all of our handouts and writeups date from the last handout campaigns, around 1992 and 1996, and are ridiculously out of date. We currently have a feature-rich "handout" command that displays them on a VT100. We should completely rewrite and revise all our handouts and present them as web documents.

Uniformity of System Files

When a Mathnet program or script is installed, invariably some machines are down and miss the update. We have occasionally had bugs from this cause. We need a program that automatically reviews the installed files and reports when a machine has an old version. The same program can serve a tripwire function, detecting programs that have been altered, either accidentally or as a hostile act.
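Here is a minimal sketch of such a checker in Python. It assumes a reference manifest that maps each path to its SHA-256 digest, built on a master machine at install time and copied to each host. The manifest format and the names `build_manifest` and `audit` are hypothetical; this is not an existing Mathnet tool. A stale copy and a tampered copy both show up as a digest mismatch, so one comparison covers both the missed-update case and the tripwire case.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths: List[str], root: str = "/") -> Dict[str, str]:
    """Record the current digest of each path on the reference (master) machine."""
    return {p: file_digest(os.path.join(root, p.lstrip("/"))) for p in paths}


def audit(manifest: Dict[str, str], root: str = "/") -> List[Tuple[str, str]]:
    """Compare installed files under `root` against the reference manifest.

    Returns (path, problem) pairs sorted by path; an empty list means this
    machine matches the reference.
    """
    problems: List[Tuple[str, str]] = []
    for rel_path, expected in sorted(manifest.items()):
        full = os.path.join(root, rel_path.lstrip("/"))
        if not os.path.isfile(full):
            problems.append((rel_path, "missing"))
        elif file_digest(full) != expected:
            # Either an old version (missed update) or an altered file (tripwire case).
            problems.append((rel_path, "differs"))
    return problems
```

A real deployment would also have to protect the manifest itself. Anyone who can rewrite a program can also rewrite a local manifest, so each host should be compared against a copy fetched from a trusted machine.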

Tagging Wires in Machine Room

Every time someone has to mess with Ethernet cables in the machine room, it's a nightmare of tracing cables to make sure we have the right ones. At least they don't drape over the tops of the racks as they used to. Charlie had a good idea: mark each end of each wire with a serial number. Let's make this happen before the next network upheaval. It would also help to document which port each wire is supposed to be plugged into. How about adding a field to equip for the Cisco box and port number? The Cisco config entry for each port is supposed to have a comment saying where it goes. A script could compare these two tables and report missing or inconsistent entries so we can fix them.
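Here is a sketch of that comparison script. It makes several assumptions that are not current practice:

- equip can be exported as a mapping from (switch, port) to host name.
- The Cisco configs use IOS-style `interface` blocks, and the per-port comment is the `description` line.
- A description counts as consistent if it mentions the host name.

The export format and the matching rule are my own guesses, not anything equip or the switches do today.

```python
from typing import Dict, List, Optional, Tuple

Port = Tuple[str, str]  # (switch name, interface name), e.g. ("sw1", "FastEthernet0/1")


def parse_ios_descriptions(switch: str, config_text: str) -> Dict[Port, str]:
    """Map each interface in an IOS-style config to its description ("" if none)."""
    result: Dict[Port, str] = {}
    current: Optional[Port] = None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            current = (switch, line.split(None, 1)[1].strip())
            result[current] = ""
        elif current is not None and line.startswith(" description "):
            result[current] = line[len(" description "):].strip()
        elif not line.startswith(" "):
            current = None  # any unindented line ("!", next section) ends the block
    return result


def compare(equip: Dict[Port, str], switch_ports: Dict[Port, str]) -> List[str]:
    """Report ports where equip and the switch config disagree or one side is missing."""
    report: List[str] = []
    for (sw, port), host in sorted(equip.items()):
        desc = switch_ports.get((sw, port))
        if desc is None:
            report.append(f"{sw} {port}: equip says {host}, but no such interface in config")
        elif desc == "":
            report.append(f"{sw} {port}: no description; equip says {host}")
        elif host.lower() not in desc.lower():
            report.append(f"{sw} {port}: description {desc!r} does not mention {host}")
    for (sw, port), desc in sorted(switch_ports.items()):
        if desc and (sw, port) not in equip:
            report.append(f"{sw} {port}: described as {desc!r}, but not in equip")
    return report
```

Undescribed ports that equip doesn't list are deliberately not reported, since most switches have plenty of unused ports.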

The Horde Invades

Presently our webmail is set up as a minimal installation, with no memory features such as user preferences, and only the IMP web mail application. I think we now have the infrastructure to save user information (sqlite, or possibly Microsoft SQL Server via Trifox Vortex), which means we can install additional Horde applications: calendar, address book and task list. At a minimum, we should also upgrade to the current version.

We need to learn whether PDAs can be synced with a Horde calendar or address book, and if so, we need to document the procedure.

Mitigating Cisco Spanning Tree Delay

Whenever we do O.S. maintenance across the network on a machine that is connected directly to a Cisco port, the installer invariably gets stuck. The Cisco box won't open its end of the link until 30 seconds after the host brings up its interface. This happens both when booting from the CD and with what we hope will be a hands-off procedure. Isn't it true that the spanning tree delay can be dispensed with on all ports except trunk ports? If so, let's do that.

LDAP to replace NIS

NIS was developed at the dawn of Sun Microsystems' successful foray into network computing. It has worked well, but conditions have changed. We now have more CPU power and network bandwidth to spend on directory services, and we face more dangerous and aggressive security threats. For the most part, the NIS maps are reasonable; that is, they give us information that we need and that we can keep up to date. LDAP is an alternative with better security, reliability and behavior under stress. There is an LDAP schema (a set of object class and attribute definitions) that is essentially a drop-in replacement for NIS. We have a long-range plan to replace NIS with LDAP, and perhaps now is the time to get serious about it.
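To give a feel for how close the NIS-to-LDAP mapping is, here is a sketch that converts one passwd map entry into an LDIF record. It assumes the schema in question is the RFC 2307 one (object class `posixAccount`). The base DN in the example is a placeholder, and the helper name is hypothetical. The password field is deliberately dropped, since how passwords move to LDAP is a separate decision.

```python
def passwd_to_ldif(line: str, base_dn: str) -> str:
    """Convert one passwd map entry (name:pw:uid:gid:gecos:home:shell) to LDIF.

    Assumes plain ASCII values; LDIF would need base64 for anything else.
    """
    name, _password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    # cn is required by posixAccount; use the full-name part of gecos, else the login.
    cn = gecos.split(",")[0] or name
    entry = [
        f"dn: uid={name},{base_dn}",
        "objectClass: account",
        "objectClass: posixAccount",
        f"cn: {cn}",
        f"uid: {name}",
        f"uidNumber: {int(uid)}",
        f"gidNumber: {int(gid)}",
        f"homeDirectory: {home}",
    ]
    if shell:
        entry.append(f"loginShell: {shell}")
    if gecos:
        entry.append(f"gecos: {gecos}")
    # The password field (_password) is intentionally not copied.
    return "\n".join(entry) + "\n"
```

Every field of the passwd map lands in a named attribute, which is what makes the "drop-in replacement" claim plausible for this map at least.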

Spam Suppression

The amount of spam we receive is ridiculous. We want to reduce its effect on our system and on our users. Several alternatives have been discussed at various times. Here are my thoughts about what we ought to do about spam.

Research

First we need to assess the magnitude of the problem. Anecdotal counts, or counts from just one person, aren't going to be helpful; we need to survey the entire department. I propose to review recent history:

- the last 7 days of maillogs on the department MX;
- for every user, all messages sitting in their mailboxes (including the spam mailbox(es)) that were received both within the last 7 days and since messages were last deleted from that mailbox. The last-deletion cutoff is usually the more recent of the two.

I propose to collect these statistics (avoiding personally identifiable information):

Total number (rate) of messages received.
Fraction that were rejected by various Postfix tests on the MX. The remainder will have been delivered.
Number of messages surveyed in mailboxes.
Fraction in system vs. spam mailboxes.
Fraction that are virusgrams.
Fraction that have spam level tags. The remainder were not handled by spamassassin.
Distribution of spam levels, separately for mail in spam mailboxes and system mailboxes.
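Part of this survey could be automated along the lines below. This sketch uses Python's standard `mailbox` module to walk an mbox file. It counts how many messages carry a SpamAssassin score in `X-Spam-Status`, plus a histogram of scores rounded down to whole points. It records only counts, never addresses or subjects. It assumes mbox-format mailboxes. It omits the 7-day / last-deletion window and the virusgram count, which would need extra rules.

```python
import mailbox
import math
import re
from collections import Counter
from email.message import Message
from typing import Dict, Iterable, Optional

# SpamAssassin 3.x writes "score=N"; older releases wrote "hits=N".
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(msg: Message) -> Optional[float]:
    """Return the SpamAssassin score from X-Spam-Status, or None if untagged."""
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    m = SCORE_RE.search(str(status))
    return float(m.group(1)) if m else None


def summarize(messages: Iterable[Message]) -> Dict[str, object]:
    """Count messages, tagged messages, and scores per whole point (no PII kept)."""
    total = 0
    tagged = 0
    histogram: Counter = Counter()
    for msg in messages:
        total += 1
        score = spam_score(msg)
        if score is not None:
            tagged += 1
            histogram[math.floor(score)] += 1
    return {"total": total, "tagged": tagged, "histogram": dict(histogram)}


def summarize_mbox(path: str) -> Dict[str, object]:
    """Summarize one mbox-format mailbox (a system mailbox or a spam folder)."""
    return summarize(mailbox.mbox(path, create=False))
```

Running `summarize_mbox` separately over system mailboxes and spam mailboxes gives the two score distributions asked for above. The difference `total - tagged` is the count of messages not handled by SpamAssassin.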

System-level Spam Suppression in General

Our governing standards and culture require that we act as a common carrier, not making judgments about content. We can (and do) reject messages that are clearly defective, e.g. where the sender address cannot be replied to. We also reject messages that pose an actual hazard, such as those with executable content. But we can't refuse to deliver anything else. This means we have no way to reduce our cost of handling spam at the system level. However, we can help our users in their own efforts to dispose of spam.

Blacklists

It's been pointed out that certain commercial blacklist organizations catch a lot of messages. Corporate culture is a lot different from University culture, and there is a question of political correctness. Before embracing one or more of these services, we should check very carefully on two points. Are they too zealous? Can hostile competitors cause a denial of service by falsely accusing someone of sending spam? If the answers are favorable, Postfix on the MX can tag messages accordingly, and our standard user's procmailrc can (on the user's responsibility) send such messages to the spam mailbox. Edson posted a review in which the sysop used three such services, and a message was tagged/rejected if any one of them triggered. The messages that the services let through but SpamAssassin tagged numbered only about 5% of those the services caught. (There was no data on how many messages the services caught that SpamAssassin missed; that of course depends on the SpamAssassin cutoff level.)
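Blacklist services of this kind are commonly queried over DNS (DNSBL). The steps are:

1. Reverse the octets of the connecting IPv4 address.
2. Append the service's zone.
3. Look up an A record; an answer means "listed".

The sketch below shows that lookup and the any-one-triggers policy from Edson's review. The zone names are placeholders rather than real services. In production this check would be done by Postfix or a filter on the MX, not by a standalone script.

```python
import socket
from typing import Callable, List


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: IPv4 octets reversed, then the list's zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def dns_listed(name: str) -> bool:
    """True if the name has an A record (listed); any lookup failure means not listed."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False


def blacklist_hits(ip: str, zones: List[str],
                   lookup: Callable[[str], bool] = dns_listed) -> List[str]:
    """Return the zones that list `ip`; tag the message if the result is non-empty."""
    return [z for z in zones if lookup(dnsbl_query_name(ip, z))]
```

Treating a timeout or server failure as "not listed" makes the check fail open. That seems the right bias for a common carrier, and it limits how much damage a misbehaving list can do.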

System-Level Spam Recognition

It's been proposed that we run Sophos PureMessage on the MX. Let's compare PureMessage with SpamAssassin:

PureMessage may or may not be better than SpamAssassin at catching spam. With PureMessage we get frequent corporate updates of signatures; with SpamAssassin some of us get Bayesian adaptiveness. Any difference in effectiveness has to be tested empirically.
How much CPU power is actually going to spam detection? My anecdotal feeling is that it's not very much; this should be validated with real data. In any case, with PureMessage the load would be concentrated on the MX, which could be a problem in fault situations. With SpamAssassin the CPU load is distributed to the various homedir servers.
Presently we run SpamAssassin in the user's account just before delivery. However, I believe it's possible to do system-level tagging with SpamAssassin, and even to allow some degree of user customization if file permissions are right, if we choose to go that way (which I'm not advocating).
How does PureMessage handle user customization, particularly cut levels, blacklists and whitelists? (Not that whitelists are very helpful in any case.)
Does PureMessage have a tag-only option? In the docs I read several years ago the emphasis was on spam diversion, but I'm fairly sure it can tag only.
How bad is the system administration burden going to be? I worried then that PureMessage would be a time sink, and I still do.

Unless PureMessage performs significantly better than SpamAssassin in empirical tests, I'm not enthusiastic about system-level spam tagging.

System Level Spam Diversion

If we tag spam at the system level, e.g. using blacklists, then we could consider diverting it to a different mailbox. Let's call it /m1/spam/$user.$date. In other words, it would be a public directory like the one for the system mailboxes, with a separate mailbox per date, purged after N days by us, not by the user. One of our major problems is inactive users acting as spam traps and filling up the filesystem; system-level diversion and (delayed) deletion would take care of that issue. It would also take care of clueless users who don't set up spam filtering themselves: we're still delivering the mail, just in a different place.
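Here is a sketch of the naming and purge logic. It assumes `$date` is written as YYYYMMDD and that a nightly job run by us removes files older than N days. The helper names and the date format are hypothetical. The `dry_run` default lets the job report what it would delete before we trust it to delete anything.

```python
import datetime as dt
import os
import re
from typing import List

SPAM_DIR = "/m1/spam"  # directory named in the proposal
# "<user>.<YYYYMMDD>"; the date format is an assumption.
NAME_RE = re.compile(r"^(?P<user>[^/]+)\.(?P<date>\d{8})$")


def spam_mailbox(user: str, day: dt.date, spam_dir: str = SPAM_DIR) -> str:
    """Path of the spam mailbox for `user` on `day`."""
    return os.path.join(spam_dir, f"{user}.{day:%Y%m%d}")


def expired(names: List[str], today: dt.date, keep_days: int) -> List[str]:
    """Names dated more than `keep_days` before `today`; everything else is left alone."""
    cutoff = today - dt.timedelta(days=keep_days)
    old = []
    for name in names:
        m = NAME_RE.match(name)
        if not m:
            continue  # not one of our dated mailboxes: never touch it
        try:
            day = dt.datetime.strptime(m.group("date"), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits but not a real date
        if day < cutoff:
            old.append(name)
    return sorted(old)


def purge(today: dt.date, keep_days: int, spam_dir: str = SPAM_DIR,
          dry_run: bool = True) -> List[str]:
    """Delete (or, with dry_run, just list) expired spam mailboxes."""
    victims = expired(os.listdir(spam_dir), today, keep_days)
    if not dry_run:
        for name in victims:
            os.remove(os.path.join(spam_dir, name))
    return victims
```

Because each day gets its own file, purging never has to rewrite a mailbox that might be in the middle of a delivery; it only removes whole files whose date is past the cutoff.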

I'm thinking of delivering to a directory belonging to Mathnet so we clearly have the right to rotate and delete the contents. An alternative is to deliver to, and rotate, ~user/Mail/spam. The advantage is that IMAP clients such as Pine and Horde/IMP can display it with no need to tell them where the spam mailbox is. The disadvantage is that we aren't supposed to be monkeying with files in a user's home directory.

PureMessage has a fairly elaborate mechanism for spam diversion; I wonder if it's too elaborate. I think the best approach is to deliver to a file on the homedir server in parallel with the regular system mailbox, and to read it with the same tools used for the regular mailbox.