In the world of personal computing, each machine typically has a separate home directory for each relevant user, and software is installed separately on each host. This is how I have operated so far on my home network. However, with the new tablet I expect to face several issues involving my home directory:
I intend to keep my old laptop operational during a fairly long transition period while I learn to use the tablet effectively. Both machines have to see identical files, and manual synchronization is not going to be practical. [Update: 3 weeks later my laptop died, and I need access to my home directory quickly.]
It is inconvenient that my complete home directory is not accessible on the other machines on my network, e.g. my wife's machine, the central server, and the media playback nodes. So far I have lived with this lack, but improvement would be welcome.
At times I need to work with my home directory files from off-site. Presently this is feasible by downloading copies (possibly not the most recent version) from the backup server, but improvement would be welcome.
Both the tablet and the old laptop need to be fully functional even if they have no network connection, e.g. inside an airplane.
In around 1985 Sun Microsystems introduced their Sun-2 and Sun-3 line of workstations based on the Motorola MC68010 and MC68020. Their slogan was "the network is the computer". Their then-new Network File System (NFS) was the heart of the strategy. A user's home directory, plus much of the shared software, resided on a file server and was sent via NFS to any of the workstations, so whichever one the user was at, the same content was available. UCLA-Mathnet adopted this strategy at that time and continues to use it to this day (2013).
A network filesystem is the obvious solution to all of these requirements except the last one: a network filesystem cannot help when there is no network, so offline operation has to be handled separately. My requirements for a network filesystem are:
Both Linux and Android must be able to mount it. Windows compatibility is not required. I am able to compile kernel modules that do not come with the standard Android (CyanogenMod) distribution.
It must be well supported, mature and reliable. No experimental solutions.
I have a more aggressive firewall than most people. Sanitary traversal of the firewall is required.
It's used from the wild side, and both credentials and data need to be encrypted. The ideal is intrinsic encryption, but an encrypted tunnel (e.g. IPSec) is also a possibility.
It must be transparent on the client side: anything requiring modification to the client software is disqualified.
Backups presumably will be on the server. If there are permission or format problems that preclude backups, it will be disqualified.
This is not a high performance application, but sluglike performance is not appreciated.
Simultaneous writing by multiple hosts will be rare, but it is valuable to avoid failures from this cause. The filesystem should include intrinsic locking or some similar mitigation strategy.
An important distinction is between block level and file level protocols. Although most of my use of the network filesystem will be to transfer entire files at once, a protocol oriented to this mode is less flexible than one which is equally good at transferring individual blocks, e.g. for a database.
On the server side, another distinction is between an over-layer that publishes any local filesystem on the net, versus a filesystem with its own on-disc format, versus accessing the server's disc as a raw block device.
Credible network filesystems include:
NFS, Sun's Network File System. This is the devil we know. It is very mature and is well understood. NFSv4 can operate over a single TCP connection. Encryption and Kerberos authentication are available; however, Android can't do Kerberos. Content is stored in any local filesystem format and NFS operates as an over-layer.
AFS, Andrew File System. This is mature (maybe archaic is the better term). Sites that have deployed it love it: Carnegie-Mellon, Stanford. The wire protocol is secure and is suitable for use from the wild side, e.g. from home or from student dormitories. Disadvantages are that it relies on Kerberos V4 authentication (obsolete), and it has its own disc format which nothing else can read.
CIFS, Common Internet File System. This is like NFS for Windows. It formerly was known as SMB and is served by the Samba suite for UNIX. The UNIX extensions allow symbolic and hard links and provide realistic UNIX functions, for when a UNIX client mounts from a Samba server on UNIX. Authentication is by a loginID and password or by Kerberos; however, if Samba is going to use PAM for host-integrated authentication it needs to receive the password in plain text over the unencrypted connection. (It could also accept hashed passwords which it compares against its own table.) Since the channel is not intrinsically encrypted, CIFS could only be used over a secure tunnel.
Coda and Intermezzo. Their special features are caching on the client, and aggressive concurrency control. Coda has its own on-disc format (if I remember correctly) while Intermezzo was re-written to be more like NFS, publishing any local filesystem. You rarely hear of them actually being used.
SSHFS. The client uses FUSE and the userspace daemon opens a SSH/SFTP connection to the server, which is intrinsically encrypted. This protocol is mainly intended for transferring entire files, so random access (databases) can be a problem. SSHFS takes the least system setup of all the network filesystems.
iSCSI or AoE. These provide a remote block device, SCSI or ATA respectively. While they are favorites in SAN storage appliances, I have only one disc in my server, and I don't intend to partition it like we used to do for Sun's ND protocol. Also the ND model assumes only one writer, not sharing between multiple clients.
GFS or GFS2. Here you have a network block device as with iSCSI, and a lock daemon so multiple clients can access it without conflict and can instantly see changes made by other clients.
Dropbox and friends. On the client it acts as an over-layer over one of your directories. Any files created or altered there are copied to the cloud server, and are retrieved (or pushed?) by other devices mounting your directory. I object to Dropbox politically: I require that my data be stored on a server under my administrative control. Also, the closed-source binary driver looks very ominous. You can get a few GB of space for free, or you can pay for more.
I would really like the Dropbox model: the network mounted directory is available on each client locally, and is synced automatically to and from each participating client. When there is no network connection the syncing has to wait until the network returns. If two clients change the same file before syncing, this should be detected but automatic resolution is too much work to be promised.
It looks like the surviving contenders are NFS, CIFS and SSHFS. AFS is too obsolete and its authentication cannot be supported. Intermezzo is not widely deployed and has no Android client. iSCSI is for a SAN, not for multiple clients sharing the same filesystem. GFS is intriguing but there is no Android client (though the kernel modules could be built). Dropbox is politically unacceptable.
There is a thing called CacheFS. It is a Janus over-layer: the client mounts it over a local directory, where it stores the cached content in its own special format, and also over another filesystem providing the content. This is typically NFS or SSHFS, but any filesystem can be used, and CacheFS can be useful for slow physical media like CDs.
When read the first time, a file is retrieved from the NFS server and saved locally; subsequently the local copy is read, which is faster. Writing goes to the remote filesystem, giving no speed advantage but providing instant sharing; CacheFS can be configured to either write also to the local copy or to invalidate it.
Clients review inodes periodically and invalidate their local copies if the mod date has changed. (Push-type notification would seem useful but is not available from NFS.)
The client does not have to provide space for the entire remote filesystem; when local occupancy exceeds a configured limit, CacheFS will invalidate least recently used items.
There is no option to enforce caching the entire remote filesystem, which is the model I am looking for. However, in principle I could write a script that scans the two underlying filesystems periodically and forces retrieval of missing items, possibly even using push technology.
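A minimal sketch of such a pre-warming pass, assuming the cached share is mounted with the fsc option at the hypothetical path /mnt/home: reading every file once makes FS-Cache store a local copy.

    # Hypothetical pre-warming pass over an fsc-mounted share (path assumed)
    find /mnt/home -type f -exec cat {} + >/dev/null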
CacheFS is documented by Oracle (Solaris), IBM, and Cray (Unicos).
In 2010 some people did a project at Google called cachefs which is different from this. They are using RAM, a solid state disc, and a rotating disc as a three-level cache to give blindingly fast file delivery (on cache hits only).
The main benefit of CacheFS is speed on the client for repeatedly read files. While this may be useful to me when I run over a slow network link, I think I should defer CacheFS until the remote filesystem is nailed down.
Where to find CacheFS:
The package name for the userspace daemon is cachefilesd. It is available on the SuSE Build Service but is not in the main distro.
The kernel modules are fscache.ko and cachefiles.ko. They are documented in src/linux/Documentation/filesystems/caching/fscache.txt and src/linux/Documentation/filesystems/caching/cachefiles.txt.
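Putting the pieces together, a minimal sketch of the client-side setup, with the server name and export path as placeholders; the percentages are the culling limits discussed above and would be tuned to the client's disc:

    # /etc/cachefilesd.conf -- where the cache lives and when to cull
    dir /var/cache/fscache
    tag mycache
    brun  10%    # culling turns off when free space rises above this
    bcull  7%    # culling starts when free space falls below this
    bstop  3%    # no new cache entries when free space falls below this

    # Start the daemon, then mount the share with the fsc option:
    /etc/init.d/cachefilesd start
    mount -t nfs -o fsc server:/home/jimc /mnt/home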
If I use NFS, there is an ugly fly in the ointment: authentication and authorization, specifically whether the client is permitted to read or write the files on the server. Here are some of the issues for NFS:
In the original SunOS-3 you were expected to authenticate users with Sun's Yellow Pages, later renamed Network Information Service or NIS due to trademark issues. In this way a particular numeric UID referred to the same user on all participating hosts. The server trusted the client to honestly report the numeric UID of the executing user, so the server could use normal UNIX facilities to make the access decision. This was before the Morris Worm and its various devilish descendants.
With NFSv4 the client normally maps the numeric UID and group ID to an alphabetic string, and the server maps it back to numeric, so the numeric UIDs don't have to be in sync on the client and server, only the alphabetic loginIDs must be consistent. The server can map between alphabetic and numeric IDs using NIS, or LDAP, or a static map in /etc/idmapd.conf. Or mapping can be omitted in the style of NFSv3 and earlier.
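For reference, a minimal sketch of the core of /etc/idmapd.conf; the domain name is a placeholder, and the static map mentioned above lives in a [Static] section of the same file (see idmapd.conf(5)):

    # /etc/idmapd.conf -- NFSv4 name to numeric ID mapping (sketch)
    [General]
    Domain = example.com       # must agree between client and server

    [Mapping]
    Nobody-User = nobody       # used when a name cannot be mapped
    Nobody-Group = nogroup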
All reasonable UNIX distros can handle user identification via NIS or LDAP. However, Android is not a reasonable UNIX distro.
In Android user identification has been turned inside out: each app has a user ID and group ID which governs its permissions to read or write various local filesystems. There is normally only one executing user, ever, and he has no identity.
The Android client for CIFS is called CIFSManager. It records in its database the loginID and password on the remote Windows server; CIFS accepts this style of authentication. NFS authentication is handled on the honor system: the client uses an arbitrary (configurable) UID and the server presumably exports to the Android client only directories suitable to be used by this user, e.g. publicly readable material.
There is another app that does CIFS and NFS: Mount Manager. But a lot of reviewers complain that it doesn't work for them; some say that direct NFS mounts or CIFSManager does work, showing that the problem is not lack of the needed kernel modules.
This post on Android Forums (OP jimsmith80, 2013-02-19) suggests this style of mount command:
busybox mount -o nolock,ro,hard,intr,vers=3 -t nfs 192.168.1.128:/home/jim /mnt/sdcard/Network
The options would bypass the need for Android to run a lock daemon, and would downgrade to NFS version 3 which never uses alphabetic user IDs, only numeric.
From the changelog of CIFSManager posted on XDA-Developers, starting in v1.1 (2010-08-31): "Specify a NFS share path as host:/path. Username and password are ignored." This kind of authentication is not going to be satisfactory.
CyanogenMod-10.1 for TF700T has the CIFS and NFS (client) modules hardwired into the kernel. Many (all?) stock OS kernels lack them, and there are many forum postings about where to find modules that work with particular kernels.
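For comparison with the NFS command above, a hypothetical CIFS mount from an Android shell, assuming the same server address, a share named jimc, and that cifs.ko is loaded (older kernels may also want an explicit unc= option):

    # uid/gid present the files as a chosen local numeric ID; the password
    # on the command line illustrates the stealable-credential problem
    # noted in the table below.
    busybox mount -t cifs -o username=jimc,password=SECRET,uid=1000,gid=1000 \
        //192.168.1.128/jimc /mnt/sdcard/Network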
Here is a summary of the characteristics of the three network filesystems.
Feature | NFS | CIFS | SSHFS | SSHFS/Debian |
---|---|---|---|---|
Style | Block overlayer | Block overlayer | File overlayer | File overlayer |
Authentication | Honor system | LoginID+PW | SSH RSA key or password | Normal SSH |
Android Client | CIFSManager | CIFSManager | SSHFSDroid | DebianKit |
Maturity | Well liked | Well liked | Alpha level, €2 | Mature |
Complaints | No authentication: unacceptable. | Stored password: stealable. | File level; alpha client is the kiss of death. | Entire OS just for mounting? |
NFS is not going to fly, but I'm going to seriously investigate CIFS and SSHFS in Debian.
Follow the link for details of my experience with CIFS. To summarize:
I will be very happy if I can use SSHFS and junk CIFS.
I already decided that SSHFSDroid was unacceptable, but in a forum post I spotted a suggestion to install Debian using DebianKit by Sven-Ola Tuecke, and use Debian's sshfs. Follow the link for the results of the Debian experiment including working command lines and procedures. To summarize:
Users get the access rights of the remote user that is providing the files.
So far, with limited experience, it looks like I will be able to make SSHFS on Debian my normal mode of mounting my home directory on Android.
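The general shape of the command is roughly this (a generic sketch with placeholder names; the actual working command lines are on the linked page):

    # Generic sshfs invocation; reconnect and idmap=user are the options
    # that matter for this use.
    sshfs -o reconnect,idmap=user,follow_symlinks jimc@server:/home/jimc /mnt/home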
It's a solved problem to export my home directory via NFS to real UNIX clients. There are, however, a number of issues that need to be dealt with.
Various desktop software creates a ridiculous collection of dot files and directories, of which .mozilla and .cache/mozilla contain more garbage than the entire rest of the homedir. There are 172 dot files and 121 non-dot files (or directories). If I could avoid seeing the dot files on Android, it would greatly ease the long delay opening the homedir, and would be helpful as well for the real UNIX clients.
Only a few directories are really active. I need to have two parallel homedir structures, one with active content and one archival. I should plan on moving directories between them as projects and interests change.
My ~/public_html directory has evolved like fungus, and particularly needs to be neatened up. There are 110 items at the top level. It relies on a lot of symbolic links to directories outside webspace, requiring that Apache be configured to follow symlinks. To be easily editable on Android, it needs to be reorganized with a few intermediate directories. (And useless cruft tossed.)
Similarly the ~/misc directory has too many toplevel items and needs more intermediate directories. And "misc" is such a vague semantic tag.
I require that root should be able to log in and work normally even if the machine cannot mount NFS filesystems. (It is not really necessary that jimc be functional.) This has two implications: the traditional 30 second NFS timeout has to be shortened drastically, like to 2 or 3 seconds, and /usr/diklo/default/path.sh should only add ~$LOGNAME/bin to the path if it exists, requiring only one NFS timeout to discover that it's missing.
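Hedged sketches of both pieces, with the server name and paths as placeholders; note that a soft mount trades write integrity for the fast failure wanted here:

    # fstab entry: timeo is in tenths of a second, so a dead server fails
    # in a few seconds instead of retrying forever.
    server:/home  /home  nfs4  soft,timeo=30,retrans=2,bg  0 0

    # In path.sh: add ~$LOGNAME/bin (via $HOME, which is the same thing for
    # the user logging in) only if it exists, costing at most one NFS
    # timeout when the mount is dead.
    if [ -d "$HOME/bin" ] ; then
        PATH="$HOME/bin:$PATH"
    fi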
I'm planning to have at the top level only dot files and a few major directories, of which one will be public_html. The Android client can either mount the desired major directory or the whole homedir, but in the latter case it should be able to jump over the toplevel directory without statting all the dot files.
NFS exports directories, not filesystems. I'm going to reorganize /home to contain only home directories. So where do the rest go? There are quite a lot of directories, some large, like ADT and images.
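A sketch of what the reorganized exports might look like; the subnet and the destination for the non-homedir material are assumptions:

    # /etc/exports (sketch): /home holds only home directories, and the
    # bulky non-homedir directories move to a separately exported tree.
    /home          192.168.1.0/24(rw,sync,root_squash)
    /srv/archive   192.168.1.0/24(ro,sync,root_squash)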
Do I want one copy of /usr/diklo (local software) mounted by NFS? Or an individual copy per machine like I have now? The latter makes it a lot easier to work on a machine with a flaky network connection. As now, I need to back up only the master copy.