UCLA Mathematics Department has about 100 machines running OpenSuSE 13.1 Linux with kernel 3.11.6. After two servers were upgraded to this distro version (Cedar and Walnut), they repeatedly had alleged filesystem corruption leading to the filesystem being mounted readonly. (It is not practical to clear this on a live system, so we rebooted each time; the filesystem was always repairable automatically, so far.) We badly need to solve this issue. So what are the symptoms, as close as can be described?
Problems started right after upgrading from OpenSuSE 11.4 with kernel 2.6.37, to OpenSuSE 13.1 with kernel 3.11.10. Reverting to 3.11.6 may have made the horkage less frequent but did not put an end to it. However, with 3.11.10 the tracebacks involved NFS; now they don't.
We made two interventions to filesystems: reverting from NFSv4 to NFSv3, and switching to soft NFS mounts.
Today on Cedar, one filesystem (/dev/sdb2, ext3) was remounted
readonly three four times (and Cedar was rebooted
afterward). It cannot be proven that the same filesystem is always
involved, on a historical time scale.
Walnut has had two or three similar incidents. Julia, Tupelo and Sunset are the other servers with similar hardware and usage patterns, and they have not had readonly remounts (yet). All these servers use ext3 on the filesystems. (New installations get ext4.) All of them use RAID-5 with Megaraid cards, for the homedir filesystems.
So would could be the cause, and the cure?
Kernel 3.11.10 has an ongoing bug involving NFSv4 (find link to bugzilla). Today's readonlifications did not implicate NFS in the call trace, so I'm inclined to give NFS a pass now. Perhaps 3.11.6 also has a different bug. But then, why do the other homedir servers escape it? I doubt that reverting to an older kernel would help us.
Cedar and Walnut have the most different users on them and the most varied activity. Could there be some user-dependent capacity limit or weakness? In one incident on Cedar that was closely monitored, it was rebooted (with fsck -p) and the affected filesystem was read-write, but about 150 waiting Apache threads on the webserver then accessed different files (not all on the affected filesystem, in fact, probably the minority there), likely but not proven as the various users owning UserDirs, and in less than 2 minutes the filesystem went readonly again. A causal connection is not proven but is suggestive.
Could there be a hardware problem, in the discs, in RAID, in SCSI, or whatever? There is no sign of it in syslog, and specifically, not a peep today when or before the filesystem was mounted readonly.
Could certain filesystems be corrupt in a way that e2fsck cannot detect or repair? That would explain why that one filesystem is involved over and over.
It's known that some filenames contain blanks, control characters, Unicode, and likely worse garbage. If the filesystem code were vulnerable to garbage in filenames, that would explain a lot.
Here is what we're doing on Cedar:
A USB external disc (and enclosure) has been attached to Cedar (sdc1, ext4) and the contents of the horked filesystem is being copied to it. It will then be substituted for /dev/sdb2 on /h2. With luck the ghost will remain on /dev/sdb2; in other words, poisonous user files are not the issue (we hope). Copy speed is 2.3Mbyte/sec compared to 2*56Mb/sec for the internal RAID discs.
Then, /dev/sdb2 will be wiped and reformatted (ext4 recommended), and the content copied back. This will not be done immediately, but when users are not likely to be affected as much.
For investigation, jimc will write a test program that will abuse a filesystem, either locally, by NFSv3, or by NFSv4. It will overwrite and replace files, simulating various editing styles. It will be run on workstations (test machines). The objective is to get a procedure that will reliably set off one or more of our insectile nemeses.
Jimc will search cedar:/h2 for filenames that are not alphanumeric.
find /h2 -print | grep -v '^[ -~]*$' > /tmp/h2-bad-names.ls
Outcome: 59 files in /h2 have garbage in their names. One egregious example is /h2/fa/dearricott/public_html/\037 . But in /h1, 85 files have garbage, including /h1/fa/jbg/Papers/\037 , exactly the same situation as dearricott's file. It's a longshot, to think that particular filenames (or data) are poisonous.
Apr 7 13:32:16 cedar kernel: [846079.180677] JBD2 unexpected failure: jbd2_jour nal_revoke: !buffer_revoked(bh); Apr 7 13:32:16 cedar kernel: [846079.180682] inconsistent data on disk Apr 7 13:32:16 cedar kernel: [846079.180685] EXT4-fs: ext4_alloc_branch:395: ab orting transaction: IO failure in __ext4_forget Apr 7 13:32:16 cedar kernel: [846079.195435] EXT4-fs error (device sdb2): ext4_ alloc_branch:395: error -5 when attempting revoke Apr 7 13:32:16 cedar kernel: [846079.195442] EXT4-fs (sdb2): Remounting filesystem read-only Apr 7 13:32:16 cedar kernel: [846079.195445] Aborting journal on device sdb2-8. Apr 7 13:32:16 cedar kernel: [846079.219374] EXT4-fs error (device sdb2) in ext4_free_blocks:4818: Journal has aborted Apr 7 13:32:16 cedar kernel: [846079.225946] EXT4-fs error (device sdb2) in ext4_orphan_add:2616: Journal has aborted Apr 7 13:32:16 cedar kernel: [846079.231938] EXT4-fs error (device sdb2) in ext4_write_begin:999: IO failure (wait 23 secs) Apr 7 13:32:39 cedar kernel: [846102.151759] ------------[ cut here ]------------ Apr 7 13:32:39 cedar kernel: [846102.151768] WARNING: CPU: 9 PID: 340 at /home/abuild/rpmbuild/BUILD/kernel-default-3.11.6/linux-3.11/fs/buffer.c:1194 __brelse+0x2f/0x40() Apr 7 13:32:39 cedar kernel: [846102.151769] VFS: brelse: Trying to free free buffer Apr 7 13:32:39 cedar kernel: [846102.151770] Modules linked in: md5 sha1_generic cbc cts nfsv3 nfs fscache rpcsec_gss_krb5 nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry quota_v2 quota_tree coretemp binfmt_misc kvm iTCO_wdt joydev gpio_ich iTCO_vendor_support dcdbas crc32c_intel pcspkr serio_raw sr_mod cdrom lpc_ich ata_generic ses enclosure i7core_edac edac_core bnx2 shpchp button mperf sg ipmi_devintf ipmi_si ipmi_msghandler dm_mod autofs4 hid_generic usbhid hid mgag200 ttm drm_kms_helper uhci_hcd drm ehci_pci i2c_algo_bit ehci_hcd sysimgblt sysfillrect i2c_core syscopyarea usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh fan ata_piix libata megaraid_sas thermal processor thermal_sys hwmon Apr 7 13:32:39 cedar kernel: [846102.151812] CPU: 9 PID: 340 Comm: systemd-journal Not tainted 3.11.6-4-default #1 Apr 7 13:32:39 cedar kernel: [846102.151813] Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 1.1.4 05/08/2009 Apr 7 13:32:39 cedar kernel: [846102.151815] 00000009 c06abefe f3579978 c02422 7e c0832590 f3579990 00000154 c0832508 Apr 7 13:32:39 cedar kernel: [846102.151820] 000004aa c036ad8f c036ad8f 00000008 00000008 c2d4de48 e32c3e48 c02422d3 Apr 7 13:32:39 cedar kernel: [846102.151824] 00000009 f3579978 c0832590 f3579990 c036ad8f c0832508 000004aa c0832590 Apr 7 13:32:39 cedar kernel: [846102.151828] Call Trace: Apr 7 13:32:39 cedar kernel: [846102.151836] [] try_stack_unwind+0x179/0x190 Apr 7 13:32:39 cedar kernel: [846102.151841] [ ] dump_trace+0x47/0xf0 Apr 7 13:32:39 cedar kernel: [846102.151845] [ ] show_trace_log_lvl+0x3f/0x50 Apr 7 13:32:39 cedar kernel: [846102.151847] [ ] show_stack_log_lvl+0x50/0xd0 Apr 7 13:32:39 cedar kernel: [846102.151850] [ ] show_stack+0x1f/0x40 Apr 7 13:32:39 cedar kernel: [846102.151855] [ ] dump_stack+0x3e/0x4e Apr 7 13:32:39 cedar kernel: [846102.151859] [ ] warn_slowpath_common+0x7e/0xa0 Apr 7 13:32:39 cedar kernel: [846102.151862] [ ] warn_slowpath_fmt+0x33/0x40 Apr 7 13:32:39 cedar kernel: [846102.151865] [ ] __brelse+0x2f/0x40 Apr 7 13:32:39 cedar kernel: [846102.151869] [ ] __find_get_block+0x1bf/0x1f0 Apr 7 13:32:39 cedar kernel: [846102.151872] [ ] __getblk+0x2c/0x310 Apr 7 13:32:39 cedar kernel: [846102.151877] [ ] ext4_read_block_bitmap_nowait+0x46/0x300 Apr 7 13:32:39 cedar kernel: [846102.151881] [ ] ext4_read_block_bitmap+0xc/0x40 Apr 7 13:32:39 cedar kernel: [846102.151885] [ ] ext4_mb_mark_diskspace_used+0x3d/0x510 Apr 7 13:32:39 cedar kernel: [846102.151888] [ ] ext4_mb_new_blocks+0x296/0x580 Apr 7 13:32:39 cedar kernel: [846102.151894] [ ] ext4_alloc_branch+0x2ca/0x310 Apr 7 13:32:39 cedar kernel: [846102.151898] [ ] ext4_ind_map_blocks+0x1ab/0x740 Apr 7 13:32:39 cedar kernel: [846102.151900] [ ] ext4_map_blocks+0x14c/0x570 Apr 7 13:32:39 cedar kernel: [846102.151903] [ ] _ext4_get_block+0x62/0x160 Apr 7 13:32:39 cedar kernel: [846102.151906] [ ] __block_write_begin+0x140/0x380 Apr 7 13:32:39 cedar kernel: [846102.151909] [ ] ext4_write_begin+0x177/0x410 Apr 7 13:32:39 cedar kernel: [846102.151913] [ ] generic_file_buffered_write+0x105/0x260 Apr 7 13:32:39 cedar kernel: [846102.151916] [ ] __generic_file_aio_write+0x26c/0x4b0 Apr 7 13:32:39 cedar kernel: [846102.151919] [ ] generic_file_aio_write+0x59/0xc0 Apr 7 13:32:39 cedar kernel: [846102.151922] [ ] ext4_file_write+0xd4/0x4e0 Apr 7 13:32:39 cedar kernel: [846102.151926] [ ] do_sync_write+0x6b/0xa0 Apr 7 13:32:39 cedar kernel: [846102.151930] [ ] vfs_write+0x95/0x1b0 Apr 7 13:32:39 cedar kernel: [846102.151933] [ ] SyS_pwrite64+0x6a/0xa0 Apr 7 13:32:39 cedar kernel: [846102.151937] [ ] sysenter_do_call+0x12/0x28 Apr 7 13:32:39 cedar kernel: [846102.151942] [ ] 0xb77a142f Apr 7 13:32:39 cedar kernel: [846102.151943] ---[ end trace 2141fd7e68c3f778 ]--- (wait 46 secs) Apr 7 13:33:15 cedar kernel: [846137.980264] ------------[ cut here ]------------ Apr 7 13:33:15 cedar kernel: [846137.980273] WARNING: CPU: 15 PID: 2054 at /home/abuild/rpmbuild/BUILD/kernel-default-3.11.6/linux-3.11/fs/buffer.c:1194 __brelse+0x2f/0x40() Apr 7 13:33:15 cedar kernel: [846137.980274] VFS: brelse: Trying to free free buffer Apr 7 13:33:15 cedar kernel: [846137.980275] Modules linked in: md5 sha1_generic cbc cts nfsv3 nfs fscache rpcsec_gss_krb5 nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry quota_v2 quota_tree coretemp binfmt_misc kvm iTCO_wdt joydev gpio_ich iTCO_vendor_support dcdbas crc32c_intel pcspkr serio_raw sr_mod cdrom lpc_ich ata_generic ses enclosure i7core_edac edac_core bnx2 shpchp button mperf sg ipmi_devintf ipmi_si ipmi_msghandler dm_mod autofs4 hid_generic usbhid hid mgag200 ttm drm_kms_helper uhci_hcd drm ehci_pci i2c_algo_bit ehci_hcd sysimgblt sysfillrect i2c_core syscopyarea usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh fan ata_piix libata megaraid_sas thermal processor thermal_sys hwmon Apr 7 13:33:15 cedar kernel: [846137.980317] CPU: 15 PID: 2054 Comm: qmgr Tainted: G W 3.11.6-4-default #1 Apr 7 13:33:15 cedar kernel: [846137.980318] Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 1.1.4 05/08/2009 Apr 7 13:33:15 cedar kernel: [846137.980320] 00000009 c06abefe d6de1bc8 c024227e c0832590 d6de1be0 00000806 c0832508 Apr 7 13:33:15 cedar kernel: [846137.980324] 000004aa c036ad8f c036ad8f 00000008 00000008 ec9f5cc0 e32c3e48 c02422d3 Apr 7 13:33:15 cedar kernel: [846137.980328] 00000009 d6de1bc8 c0832590 d6de1be0 c036ad8f c0832508 000004aa c0832590 Apr 7 13:33:15 cedar kernel: [846137.980333] Call Trace: Apr 7 13:33:15 cedar kernel: [846137.980341] [ ] try_stack_unwind+0x179/0x190 Apr 7 13:33:15 cedar kernel: [846137.980346] [ ] dump_trace+0x47/0xf0 Apr 7 13:33:15 cedar kernel: [846137.980349] [ ] show_trace_log_lvl+0x3f/0x50 Apr 7 13:33:15 cedar kernel: [846137.980352] [ ] show_stack_log_lvl+0x50/0xd0 Apr 7 13:33:15 cedar kernel: [846137.980355] [ ] show_stack+0x1f/0x40 Apr 7 13:33:15 cedar kernel: [846137.980360] [ ] dump_stack+0x3e/0x4e Apr 7 13:33:15 cedar kernel: [846137.980365] [ ] warn_slowpath_common+0x7e/0xa0 Apr 7 13:33:15 cedar kernel: [846137.980368] [ ] warn_slowpath_fmt+0x33/0x40 Apr 7 13:33:15 cedar kernel: [846137.980371] [ ] __brelse+0x2f/0x40 Apr 7 13:33:15 cedar kernel: [846137.980374] [ ] __find_get_block+0x1bf/0x1f0 Apr 7 13:33:15 cedar kernel: [846137.980378] [ ] __getblk+0x2c/0x310 Apr 7 13:33:15 cedar kernel: [846137.980382] [ ] __ext4_get_inode_loc +0xf7/0x490 Apr 7 13:33:15 cedar kernel: [846137.980386] [ ] ext4_reserve_inode_write+0x16/0x70 Apr 7 13:33:15 cedar kernel: [846137.980389] [ ] ext4_mark_inode_dirty+0x34/0x1b0 Apr 7 13:33:15 cedar kernel: [846137.980392] [ ] add_dirent_to_buf+0x106/0x190 Apr 7 13:33:15 cedar kernel: [846137.980395] [ ] ext4_add_entry+0x685/0xb90 Apr 7 13:33:15 cedar kernel: [846137.980399] [ ] ext4_rename+0x22b/0x950 Apr 7 13:33:15 cedar kernel: [846137.980404] [ ] vfs_rename+0x113/0x4d0 Apr 7 13:33:15 cedar kernel: [846137.980408] [ ] SYSC_renameat+0x2ac/0x2c0 Apr 7 13:33:15 cedar kernel: [846137.980412] [ ] SyS_renameat+0x1b/0x20 Apr 7 13:33:15 cedar kernel: [846137.980415] [ ] SyS_rename+0x27/0x30 Apr 7 13:33:15 cedar kernel: [846137.980418] [ ] sysenter_do_call+0x12/0x28 Apr 7 13:33:15 cedar kernel: [846137.980423] [ ] 0xb774a42f Apr 7 13:33:15 cedar kernel: [846137.980425] ---[ end trace 2141fd7e68c3f779 ]- (wait 1 hour almost exactly) (irrelevancies snipped) Apr 7 14:33:55 cedar systemd[1]: Stopping Session 17899 of user root. Apr 7 14:34:13 cedar systemd[482]: Reached target Shutdown. Apr 7 14:34:14 cedar rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="993" x-info="http://www.rsyslog.com"] exiting on signal 15.