An update about the ext4 corruption report that has spread an unfortunate amount of panic:
I have a theory and a two-patch series which might make a difference, but until I get someone to confirm they are effective, I don't want to push patches to Linus just for the sake of sending a placebo to satisfy the panicked masses. The problem is that no one other than one person who is using non-standard mount options (which I should have disabled by default, because they are known to be problematic) has been able to reliably reproduce the problem. Eric and I haven't been able to reproduce the problem at all using the default options, and Eric's reproduction using the scary, non-standard mount options may be caused by a dm-snapshot bug; I'm not sure yet.
Even when I use the problematic mount options, which I know in theory could lead to serious file system corruption if the journal has been corrupted, and I run fsstress under KVM and then kill -9 the kvm process, the journal replays just fine afterwards and I don't see the corruption that Nix has reported. (Eric has reported that when he does this using dm-snapshot, he can repro the problem. I don't know why he and I are seeing different results. In any case, Nix has claimed that the message saying that the journal has been corrupted isn't present, although we haven't been able to get a full dmesg from one of his reproduction cases yet, so I can't say this with 100% certainty; for now I have to believe what Nix has told us.)
I've asked Nix to do a series of tests; in particular, whether the patch series I posted on Thursday makes a difference, and whether he can reproduce the problem at all when he removes the highly dangerous combination of mount options: nobarrier,journal_checksum,journal_async_commit. (Basically, you shouldn't use any of these mount options unless you really know what you are doing, and the last two should only be used by developers. I will likely #ifdef them out in the next version, since more development is necessary before they are safe to use.) Nix has said that he will have time to do the requested tests over the weekend, so hopefully I'll be able to send patches that are known to actually make a difference to Linus by early next week.
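For anyone who wants to check whether an ext4 file system is mounted with any of these options, here is a minimal Python sketch. It assumes the usual /proc/mounts line layout (device, mount point, file system type, comma-separated options). The function name risky_ext4_options is made up for illustration. It also treats barrier=0 as an equivalent spelling of nobarrier; how a given kernel actually prints that option in /proc/mounts is an assumption, so take a clean result as a hint rather than proof.

```python
# Options called out above as a dangerous combination. "barrier=0" is
# included as an alternate spelling of "nobarrier" (an assumption about
# how the kernel may report it).
RISKY_OPTIONS = {
    "nobarrier",
    "barrier=0",
    "journal_checksum",
    "journal_async_commit",
}


def risky_ext4_options(mounts_text):
    """Return {mount_point: [risky options]} for ext4 entries only.

    mounts_text is expected to look like the contents of /proc/mounts:
    device, mount point, fs type, comma-separated options, then two
    numeric fields.
    """
    findings = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = set(fields[3].split(","))
        hits = sorted(options & RISKY_OPTIONS)
        if hits:
            findings[fields[1]] = hits
    return findings
```

Passing it the text read from /proc/mounts returns an empty dictionary when no ext4 mount uses any of the listed options.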
In general, you should use ext4 with the default mount options unless you really are an expert and know what you are doing. We've had plans to create warnings for the more experimental bits of ext4, but that hasn't happened yet due to everyone being busy. Guess what will probably be getting a higher priority in the very near future? :-/
The evidence so far points to the bug requiring a combination of factors: perhaps certain hardware, certain mount options, and perhaps needing to win (lose) a race where you crash just as you are trying to unmount the file system.
Unfortunately, a complex, nuanced story like this doesn't drive huge numbers of web hits, so I don't know if all of the web sites that eagerly picked up on this story last week will bother trying to explain what is going on. :-(