In the world of personal computing, each machine typically has a separate home directory for each relevant user, and software is installed separately on each host. This is how I have operated so far on my home network. However, with the new tablet I expect to face several issues involving my home directory:
I intend to keep my old laptop operational during a fairly long transition period while I learn to use the tablet effectively. Both machines have to see identical files, and manual synchronization is not going to be practical. [Update: 3 weeks later my laptop died, and I need access to my home directory quickly.]
It is inconvenient that my complete home directory is not accessible on the other machines on my network, e.g. my wife's machine, the central server, and the media playback nodes. So far I have lived with this lack, but improvement would be welcome.
At times I need to work with my home directory files from off-site. Presently this is feasible by downloading copies (possibly not the most recent version) from the backup server, but improvement would be welcome.
Both the tablet and the old laptop need to be fully functional even if they have no network connection, e.g. inside an airplane.
In around 1985 Sun Microsystems introduced their Sun-2 and Sun-3 line of workstations based on the Motorola MC68010 and MC68020. Their slogan was "the network is the computer". Their then-new Network File System (NFS) was the heart of the strategy. A user's home directory, plus much of the shared software, resided on a file server and was sent via NFS to any of the workstations, so whichever one the user was at, the same content was available. UCLA-Mathnet adopted this strategy at that time and continues to use it to this day (2013).
A network filesystem is the obvious solution to all of these requirements except the last one: a network filesystem cannot help when there is no network, so offline operation has to be handled separately. My requirements for a network filesystem are:
Both Linux and Android must be able to mount it. Windows compatibility is not required. I am able to compile kernel modules that do not come with the standard Android (CyanogenMod) distribution.
It must be well supported, mature and reliable. No experimental solutions.
I have a more aggressive firewall than most people. Sanitary traversal of the firewall is required.
It's used from the wild side, and both credentials and data need to be encrypted. The ideal is intrinsic encryption, but an encrypted tunnel (e.g. IPSec) is also a possibility.
It must be transparent on the client side: anything requiring modification to the client software is disqualified.
Backups presumably will be on the server. If there are permission or format problems that preclude backups, it will be disqualified.
This is not a high performance application, but sluglike performance is not appreciated.
Simultaneous writing by multiple hosts will be rare, but it is valuable to avoid failures from this cause. The filesystem should include intrinsic locking or some similar mitigation strategy.
An important distinction is between block level and file level protocols. Although most of my use of the network filesystem will be to transfer entire files at once, a protocol oriented to this mode is less flexible than one which is equally good at transferring individual blocks, e.g. for a database.
On the server side, another distinction is between an over-layer that publishes any local filesystem on the net, versus a filesystem with its own on-disc format, versus accessing the server's disc as a raw block device.
Credible network filesystems include:
NFS, Sun's Network File System. This is the devil we know. It is very mature and is well understood. NFSv4 can operate over a single TCP connection. Encryption and Kerberos authentication are available; however, Android can't do Kerberos. Content is stored in any local filesystem format and NFS operates as an over-layer.
AFS, Andrew File System. This is mature (maybe archaic is the better term). Sites that have deployed it love it: Carnegie-Mellon, Stanford. The wire protocol is secure and is suitable for use from the wild side, e.g. from home or from student dormitories. Disadvantages are that it relies on Kerberos V4 authentication (obsolete), and it has its own disc format which nothing else can read.
CIFS, Common Internet File System. This is like NFS for Windows. It formerly was known as SMB and is served by the Samba suite for UNIX. The UNIX extensions allow symbolic and hard links and provide realistic UNIX functions, for when a UNIX client mounts from a Samba server on UNIX. Authentication is by a loginID and password or by Kerberos; however, if Samba is going to use PAM for host-integrated authentication it needs to receive the password in plain text over the unencrypted connection. (It could also accept hashed passwords which it compares against its own table.) Since the channel is not intrinsically encrypted, CIFS could only be used over a secure tunnel.
Coda and Intermezzo. Their special features are caching on the client, and aggressive concurrency control. Coda has its own on-disc format (if I remember correctly) while Intermezzo was re-written to be more like NFS, publishing any local filesystem. You rarely hear of them actually being used.
SSHFS. The client uses FUSE and the userspace daemon opens a SSH/SFTP connection to the server, which is intrinsically encrypted. This protocol is mainly intended for transferring entire files, so random access (databases) can be a problem. SSHFS takes the least system setup of all the network filesystems.
iSCSI or AoE. These provide a remote block device, SCSI or ATA respectively. While they are favorites in SAN storage appliances, I have only one disc in my server, and I don't intend to partition it like we used to do for Sun's ND protocol. Also the ND model assumes only one writer, not sharing between multiple clients.
GFS or GFS2. Here you have a network block device as with iSCSI, and a lock daemon so multiple clients can access it without conflict and can instantly see changes made by other clients.
Dropbox and friends. On the client it acts as an over-layer over one of your directories. Any files created or altered there are copied to the cloud server, and are retrieved (or pushed?) by other devices mounting your directory. I object to Dropbox politically: I require that my data be stored on a server under my administrative control. Also, the closed-source binary driver looks very ominous. You can get a few GB of space for free, or you can pay for more.
I would really like the Dropbox model: the network mounted directory is available on each client locally, and is synced automatically to and from each participating client. When there is no network connection the syncing has to wait until the network returns. If two clients change the same file before syncing, this should be detected but automatic resolution is too much work to be promised.
It looks like the surviving contenders are NFS, CIFS and SSHFS. AFS is too obsolete and its authentication cannot be supported. Intermezzo is not widely deployed and has no Android client. iSCSI is for a SAN, not for multiple clients sharing the same filesystem. GFS is intriguing but there is no Android client (though the kernel modules could be built). Dropbox is politically unacceptable.
There is a thing called CacheFS. It is a Janus over-layer: the client mounts it over a local directory, where it stores the cached content in its own special format, and also over another filesystem providing the content. This is typically NFS or SSHFS, but any filesystem can be used, and CacheFS can be useful for slow physical media like CDs.
When read the first time, a file is retrieved from the NFS server and saved locally; subsequently the local copy is read, which is faster. Writing goes to the remote filesystem, giving no speed advantage but providing instant sharing; CacheFS can be configured to either write also to the local copy or to invalidate it.
Clients review inodes periodically and invalidate their local copies if the mod date has changed. (Push-type notification would seem useful but is not available from NFS.)
The client does not have to provide space for the entire remote filesystem; when local occupancy exceeds a configured limit, CacheFS will invalidate least recently used items.
There is no option to enforce caching the entire remote filesystem, which is the model I am looking for. However, in principle I could write a script that scans the two underlying filesystems periodically and forces retrieval of missing items, possibly even using push technology.
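A minimal sketch of such a pre-warming pass, assuming the cached share is mounted with the fsc option at the hypothetical path /mnt/home: reading every file once makes FS-Cache store a local copy.

    # Hypothetical pre-warming pass over an fsc-mounted share (path assumed)
    find /mnt/home -type f -exec cat {} + >/dev/null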
CacheFS is documented by Oracle (Solaris), IBM, and Cray (Unicos).
In 2010 some people did a project at Google called cachefs which is different from this. They are using RAM, a solid state disc, and a rotating disc as a three-level cache to give blindingly fast file delivery (on cache hits only).
The main benefit of CacheFS is speed on the client for repeatedly read files. While this may be useful to me when I run over a slow network link, I think I should defer CacheFS until the remote filesystem is nailed down.
Where to find CacheFS:
The package name for the userspace daemon is cachefilesd. It is available on the SuSE Build Service but is not in the main distro.
The kernel modules are fscache.ko and cachefiles.ko. They are documented in src/linux/Documentation/filesystems/caching/fscache.txt and src/linux/Documentation/filesystems/caching/cachefiles.txt.
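Putting the pieces together, a minimal sketch of the client-side setup, with the server name and export path as placeholders; the percentages are the culling limits discussed above and would be tuned to the client's disc:

    # /etc/cachefilesd.conf -- where the cache lives and when to cull
    dir /var/cache/fscache
    tag mycache
    brun  10%    # culling turns off when free space rises above this
    bcull  7%    # culling starts when free space falls below this
    bstop  3%    # no new cache entries when free space falls below this

    # Start the daemon, then mount the share with the fsc option:
    /etc/init.d/cachefilesd start
    mount -t nfs -o fsc server:/home/jimc /mnt/home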
If I use NFS, there is an ugly fly in the ointment: authentication and authorization, specifically whether the client is permitted to read or write the files on the server. Here are some of the issues for NFS:
In the original SunOS-3 you were expected to authenticate users with Sun's Yellow Pages, later renamed Network Information Service or NIS due to trademark issues. In this way a particular numeric UID referred to the same user on all participating hosts. The server trusted the client to honestly report the numeric UID of the executing user, so the server could use normal UNIX facilities to make the access decision. This was before the Morris Worm and its various devilish descendants.
With NFSv4 the client normally maps the numeric UID and group ID to an alphabetic string, and the server maps it back to numeric, so the numeric UIDs don't have to be in sync on the client and server, only the alphabetic loginIDs must be consistent. The server can map between alphabetic and numeric IDs using NIS, or LDAP, or a static map in /etc/idmapd.conf. Or mapping can be omitted in the style of NFSv3 and earlier.
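For reference, a minimal sketch of the core of /etc/idmapd.conf; the domain name is a placeholder, and the static map mentioned above lives in a [Static] section of the same file (see idmapd.conf(5)):

    # /etc/idmapd.conf -- NFSv4 name to numeric ID mapping (sketch)
    [General]
    Domain = example.com       # must agree between client and server

    [Mapping]
    Nobody-User = nobody       # used when a name cannot be mapped
    Nobody-Group = nogroup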
All reasonable UNIX distros can handle user identification via NIS or LDAP. However, Android is not a reasonable UNIX distro.
In Android user identification has been turned inside out: each app has a user ID and group ID which governs its permissions to read or write various local filesystems. There is normally only one executing user, ever, and he has no identity.
The Android client for CIFS is called CIFSManager. It records in its database the loginID and password on the remote Windows server; CIFS accepts this style of authentication. NFS authentication is handled on the honor system: the client uses an arbitrary (configurable) UID and the server presumably exports to the Android client only directories suitable to be used by this user, e.g. publicly readable material.
There is another app that does CIFS and NFS: Mount Manager. But a lot of reviewers complain that it doesn't work for them; some say that direct NFS mounts or CIFSManager does work, showing that the problem is not lack of the needed kernel modules.
This post on Android Forums (OP jimsmith80, 2013-02-19) suggests this style of mount command:
busybox mount -o nolock,ro,hard,intr,vers=3 -t nfs 192.168.1.128:/home/jim /mnt/sdcard/Network
The options would bypass the need for Android to run a lock daemon, and would downgrade to NFS version 3 which never uses alphabetic user IDs, only numeric.
From the changelog of CIFSManager posted on XDA-Developers, starting in v1.1 (2010-08-31): "Specify a NFS share path as host:/path. Username and password are ignored." This kind of authentication is not going to be satisfactory.
CyanogenMod-10.1 for TF700T has the CIFS and NFS (client) modules hardwired into the kernel. Many (all?) stock OS kernels lack them, and there are many forum postings about where to find modules that work with particular kernels.
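For comparison with the NFS command above, a hypothetical CIFS mount from an Android shell, assuming the same server address, a share named jimc, and that cifs.ko is loaded (older kernels may also want an explicit unc= option):

    # uid/gid present the files as a chosen local numeric ID; the password
    # on the command line illustrates the stealable-credential problem
    # noted in the table below.
    busybox mount -t cifs -o username=jimc,password=SECRET,uid=1000,gid=1000 \
        //192.168.1.128/jimc /mnt/sdcard/Network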
Here is a summary of the characteristics of the three network filesystems.
Feature | NFS | CIFS | SSHFS | SSHFS/Debian |
---|---|---|---|---|
Style | Block overlayer | Block overlayer | File overlayer | File overlayer |
Authentication | Honor system | LoginID+PW | SSH RSA key or password | Normal SSH |
Android Client | CIFSManager | CIFSManager | SSHFSDroid | DebianKit |
Maturity | Well liked | Well liked | Alpha level, €2 | Mature |
Complaints | No authentication: unacceptable. | Stored password: stealable. | File level; alpha client is the kiss of death. | Entire OS just for mounting? |
NFS is not going to fly, but I'm going to seriously investigate CIFS and SSHFS in Debian.
Follow the link for details of my experience with CIFS. To summarize:
I will be very happy if I can use SSHFS and junk CIFS.
I already decided that SSHFSDroid was unacceptable, but in a forum post I spotted a suggestion to install Debian using DebianKit by Sven-Ola Tuecke, and use Debian's sshfs. Follow the link for the results of the Debian experiment including working command lines and procedures. To summarize:
Users get the access rights of the remote user that is providing the files.
So far, with limited experience, it looks like I will be able to make SSHFS on Debian my normal mode of mounting my home directory on Android.
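The general shape of the command is roughly this (a generic sketch with placeholder names; the actual working command lines are on the linked page):

    # Generic sshfs invocation; reconnect and idmap=user are the options
    # that matter for this use.
    sshfs -o reconnect,idmap=user,follow_symlinks jimc@server:/home/jimc /mnt/home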
It's a solved problem to export my home directory via NFS to real UNIX clients. There are, however, a number of issues that need to be dealt with.
Various desktop software creates a ridiculous collection of dot files and directories, of which .mozilla and .cache/mozilla contain more garbage than the entire rest of the homedir. There are 172 dot files and 121 non-dot files (or directories). If I could avoid seeing the dot files on Android, it would greatly ease the long delay opening the homedir, and would be helpful as well for the real UNIX clients.
Only a few directories are really active. I need to have two parallel homedir structures, one with active content and one archival. I should plan on moving directories between them as projects and interests change.
My ~/public_html directory has evolved like fungus, and particularly needs to be neatened up. There are 110 items at the top level. It relies on a lot of symbolic links to directories outside webspace, requiring that Apache be configured to follow symlinks. To be easily editable on Android, it needs to be reorganized with a few intermediate directories. (And useless cruft tossed.)
Similarly the ~/misc directory has too many toplevel items and needs more intermediate directories. And "misc" is such a vague semantic tag.
I require that root should be able to log in and work normally even if the machine cannot mount NFS filesystems. (It is not really necessary that jimc be functional.) This has two implications: the traditional 30 second NFS timeout has to be shortened drastically, like to 2 or 3 seconds, and /usr/diklo/default/path.sh should only add ~$LOGNAME/bin to the path if it exists, requiring only one NFS timeout to discover that it's missing.
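Hedged sketches of both pieces, with the server name and paths as placeholders; note that a soft mount trades write integrity for the fast failure wanted here:

    # fstab entry: timeo is in tenths of a second, so a dead server fails
    # in a few seconds instead of retrying forever.
    server:/home  /home  nfs4  soft,timeo=30,retrans=2,bg  0 0

    # In path.sh: add ~$LOGNAME/bin (via $HOME, which is the same thing for
    # the user logging in) only if it exists, costing at most one NFS
    # timeout when the mount is dead.
    if [ -d "$HOME/bin" ] ; then
        PATH="$HOME/bin:$PATH"
    fi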
I'm planning to have at the top level only dot files and a few major directories, of which one will be public_html. The Android client can either mount the desired major directory or the whole homedir, but in the latter case it should be able to jump over the toplevel directory without statting all the dot files.
NFS exports directories, not filesystems. I'm going to reorganize /home to contain only home directories. So where do the rest go? There are quite a lot of directories, some large, like ADT and images.
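A sketch of what the reorganized exports might look like; the subnet and the destination for the non-homedir material are assumptions:

    # /etc/exports (sketch): /home holds only home directories, and the
    # bulky non-homedir directories move to a separately exported tree.
    /home          192.168.1.0/24(rw,sync,root_squash)
    /srv/archive   192.168.1.0/24(ro,sync,root_squash)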
Do I want one copy of /usr/diklo (local software) mounted by NFS? Or an individual copy per machine like I have now? The latter makes it a lot easier to work on a machine with a flaky network connection. As now, I need to back up only the master copy.