Phoronix, alas, has perpetrated another example of irresponsible journalism.   I won't dignify said article with a web link, since I don't want to reward them with more ad hits.  So I'll link to the original Ubuntu Launchpad report, and include the comment I just made there:

Those specific fsck corrections --- fixing the number of free blocks and the number of free inodes --- are completely normal and purely cosmetic. There is nothing to worry about here.

What is going on is that ext4 no longer updates the superblock after every block and inode allocation; doing so caused a wasteful write cycle to the superblock at every single journal commit, and it was also an SMP scalability bottleneck for larger servers (i.e., with 32 or 64 CPUs). To fix this, we no longer update these values in the superblock every time we allocate a block or an inode. Instead, we only update these values when we unmount the file system, mainly for cosmetic purposes so that dumpe2fs shows the correct number of free inodes and blocks, and at mount time we calculate the total number of free blocks and inodes in the file system by summing the free blocks/inodes statistics for each block group. So in fact, ext4 does not depend on the correctness of the values in the superblock, but it does try to update them on a clean unmount.
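
As a rough illustration of the bookkeeping described above, here is a toy Python model. It is not ext4 code: the class names, fields, and numbers are all invented for the example. It only shows why the superblock totals can go stale after an unclean shutdown while the per-group counts, which are what actually get summed at mount time, stay correct.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockGroup:
    """Hypothetical stand-in for one block group's free-space statistics."""
    free_blocks: int
    free_inodes: int


@dataclass
class Superblock:
    """Cached totals; in this model they are rewritten only on clean unmount."""
    free_blocks: int
    free_inodes: int


class ToyFilesystem:
    def __init__(self, groups: List[BlockGroup], superblock: Superblock) -> None:
        self.groups = groups
        self.superblock = superblock
        self.free_blocks = 0  # in-memory totals, valid while mounted
        self.free_inodes = 0

    def mount(self) -> None:
        # The superblock totals are ignored; the per-group counts are summed.
        self.free_blocks = sum(g.free_blocks for g in self.groups)
        self.free_inodes = sum(g.free_inodes for g in self.groups)

    def allocate_block(self, group_index: int) -> None:
        group = self.groups[group_index]
        if group.free_blocks == 0:
            raise RuntimeError("no free blocks in this group")
        group.free_blocks -= 1
        self.free_blocks -= 1
        # Note: the superblock is deliberately not touched here.

    def unmount_cleanly(self) -> None:
        # Only now are the cached totals written back to the superblock.
        self.superblock.free_blocks = self.free_blocks
        self.superblock.free_inodes = self.free_inodes


groups = [BlockGroup(100, 50), BlockGroup(80, 40)]
fs = ToyFilesystem(groups, Superblock(free_blocks=180, free_inodes=90))
fs.mount()
for _ in range(3):
    fs.allocate_block(0)

# Simulate an unclean shutdown: unmount_cleanly() is never called.
stale_total = fs.superblock.free_blocks
true_total = sum(g.free_blocks for g in groups)
print("superblock says:", stale_total, "groups say:", true_total)

# The next mount recomputes from the groups, so the stale value is harmless.
fs.mount()
print("after remount:", fs.free_blocks)
```

In this model the mismatch between the two totals is the only thing a checker would have to "fix", and nothing in the allocation path ever reads the stale value.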

In e2fsprogs commit id 2788cc879bbe6, which is in e2fsprogs 1.42.3 and newer, we changed things so that e2fsck -n would not display this as something "wrong". However, we still do show this as something that we "fix" when running e2fsck -y or -p, since in fact it is a change to the file system. See: http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commit;h=2788cc879bbe667d28277e1d660b7e56514e5b30

No one else has complained or noticed up until now, because other distros apparently are capable of doing a clean shutdown, allowing the file system to be unmounted cleanly. Ubuntu, unfortunately, is incapable of reliably doing a clean shutdown even when users request it, which is why Ubuntu users are seeing this behavior much more frequently, and apparently some people have panicked as a result. Sigh....

----

I will say that it is extremely irresponsible of Phoronix to make a big deal about this before giving anyone knowledgeable (which unfortunately does not include any Ubuntu kernel engineers, since as far as I know they don't have any file system specialists on staff) a chance to comment on the bug.  No one from Phoronix even bothered to contact me to tell me they were posting this story, or to ask me for a comment.  I had to find out about it when someone asked me to comment on Google+.

However, from the perspective of trying to send as many ad clicks as possible to their web site, they are doing a heckuva job....
 
Yeah, when I read the article I thought it was pulling the trigger too soon, before verification.  Too bad...
 
Not the first time Phoronix comes out with bullshit...
 
"Trolling for dollars" is how these sites hope to "make money fast"...but in the long run, it would seem to damage their revenues...
 
Nice to hear the real story from an expert :)
 
I usually like their news and stories. But apparently there is sometimes the tendency to overdraw with irrelevant articles.
 
OT: Ubuntu isn't the only distro incapable of correctly shutting down: I'm stuck at the Arch Linux LTS kernel (3.0.x) because kernels above 3.3.x can't shutdown/reboot my Linux laptop (LTS is actively maintained but I'd like new kernels like zombies need brains).
 
Phoronix does more harm to the community than good, and Larabel seems to make a lot of money doing it too.
 
Have you tried asking anyone for help on the forums +Nicola Orritos? Unless you have strange hardware there shouldn't be a problem. Are you using Systemd or SysVinit?
 
So Theodore, you, your team or anyone else working on ext4 SCREWED UP with e2fsck, reporting and fixing problems that don't exist or simply failing, and now you blame distributions and people for reporting the issue without asking your "permission" first??!!

What an arrogant attitude!!
 
Hi +Brandon Golway and thanks for your reply. I'm trying to debug the issue on my own as it doesn't seem to be a known one. The problem is I'm not sure whether it's caused by systemd, the kernel or a combination of the two. Also I'm not ruling out some kernel modules and/or filesystem issues yet.
Once debugged a bit more I'll discuss it on the forums and I may even open a bug if that's the case.
 
+Andrew Wyatt , I really thought they were a MS support site occasionally writing something about Linux, at least that was my impression of some of their articles.
 
+Brandon Golway One thing is for sure: LTS kernel 3.0.x works like a charm, using the same version of Systemd.
 
+Vagelis Giannadakis I don't think we screwed up.   E2fsck has a policy of always reporting when it changes anything with the file system.   When it updated the number of free blocks and free inodes in the superblock, it was making a change; therefore, it has to announce the change when it is run with -y, and it has to ask permission before it makes the change if e2fsck is run without -y.
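
The reporting policy described above can be sketched as a small Python toy. This is not e2fsck's code or its real output: the function name, the mode strings, and the message text are all made up for illustration. The "n" branch follows the behavior described for e2fsprogs 1.42.3 and newer in the original post.

```python
from typing import Callable, List, Optional, Tuple


def reconcile_free_blocks(
    superblock_count: int,
    group_counts: List[int],
    mode: str,
    ask: Optional[Callable[[str], bool]] = None,
) -> Tuple[int, List[str]]:
    """Return (new superblock count, messages shown to the user).

    mode is one of the hypothetical labels "n", "y", "p", or "ask".
    """
    counted = sum(group_counts)
    if counted == superblock_count:
        return superblock_count, []  # nothing to change, nothing to report

    message = f"free blocks count updated: {superblock_count} -> {counted}"

    if mode == "n":
        # Read-only run: no change is made, so nothing is reported as wrong.
        return superblock_count, []
    if mode in ("y", "p"):
        # A change is made, so the policy says it must be announced.
        return counted, [message]
    if mode == "ask":
        if ask is None:
            raise ValueError("mode 'ask' needs an ask callback")
        # Without -y, permission is requested before changing anything.
        if ask(f"Update free blocks count to {counted}?"):
            return counted, [message]
        return superblock_count, []
    raise ValueError(f"unknown mode: {mode}")


print(reconcile_free_blocks(180, [97, 80], "n"))
print(reconcile_free_blocks(180, [97, 80], "y"))
```

The point of the sketch is that the message in "y" or "p" mode is an announcement of a write, not a diagnosis of corruption.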

Users misinterpreted this as an indication that the file system was "corrupted".  It wasn't helped by the fact that at least initially, the user who reported it didn't bother to submit the e2fsck logs.

People of good will can disagree about whether or not e2fsck should make changes behind the user's back, and/or whether it should fix up the free inode/blocks counts in the superblock at all.  I still think e2fsck is doing the right thing, and I don't plan to change how e2fsck behaves in this fashion.

However, even if you disagree with my design choices, it was still an example of extraordinarily bad journalism (if Michael et al. consider themselves "journalists" instead of gossips that troll for ad hits).
 
FWIW, I saw this during the ext4 corruption panic a few weeks ago. It was clearly harmless so I didn't report it (the worst possible consequence was a misreporting of free space, and lots of filesystems get that wrong routinely in any case -- and after a few minutes of looking at ext4 code it was clear it wouldn't even do that).

I wonder if Michael will complain about the unlinked inode dtime message next? close-after-unlink, oh noes, clearly corruption :)
 
been using Mint 14 with same devastating effect. ext4 checksums seem to be updating wrong before going into init 6 final stages or probably in between so finally when shutting down it seems skipping a group descriptors close call so a final checksums update would be wasted. it would be «a delayed application with low fs access to multiple sparse files creation», aka mozilla firefox? devfs?. i've been extending a forensic analysis without clues so far so turned into using e2fsck «straight from the git repo». perhaps having better clues after sorting out what e2fsck fix subs are vain looping also working out other distros who would be already patched aka Redhat's hopefully?. seems to be a «high risk medium priority» double faced «bug» to me. merry christmas for those who've been through this.
 
It's not high risk unless you turn on journal checksums. Solution: don't turn on journal checksums. They're not on by default (though more because it's not clear what to do when the checksums don't match than because of bugs like this).
 
Ubuntu 12.04.2 i386 seems to be polite but still won't fix alternative superblocks so far it just loops. Ubuntu server fixed superblocks and also trashed my /home so i assume it's the in-between. not blaming e2fsck itself, it's something asynchronous as said...
 
+Diego Cadogan If you're not using non-standard mount options, you have nothing to worry about.   If you're having problems, I suggest that you send e-mail to linux-ext4@vger.kernel.org with the precise set of mount options you are using, what version of e2fsck you have, what version of the kernel you have, the output from e2fsck, and whatever other detail you can give.

Ext4 does not enable metadata or journal checksums by default, since they are a feature still under development and testing.
 
+Diego Cadogan, you probably don't want to use a four-year-old article as an indication of the locus of a bug fixed a few months ago, really.
 
i'm on it right away. not sure how journaling turned on if any. i'm doing the forensics with standard kernel 3.7.1 and latest packages. i believe the issue has extended to the point some data will be irrecoverable but superblock data seems to be mostly fine except for checksums which might be related to defective journaling. still unsure. i'm also exploring why all superblocks respond in a similar way than 0 which might be effect of a test-fix over superblock 0 or an extended sb rewriting over several days. i assume if journaling was turned on it lasted for at least 15 days which might be main reason why all superblocks fail the same. fsck still not fixing any of "multiple claimed" blocks which might point to some kind of bug on the original fsck used to fix sb=0 differently from the actual ubuntu 12.04.2 i386 which does no fixing at all. i'll report to kernel list asap once i find out what really was committing defective superblock/journaling/checksum data. i hope this be helpful for anyone as well for my trashed data after 2 consecutive fscks one of them in manual which points to precise defective checksums.
 
If this is 3.7.1, it certainly cannot be the bug I ran into: that was fixed some time before 3.7 came out. If you're using metadata checksumming, that is a quite different, much newer feature than journal checksums: it is, of course, possible that this has bugs too -- but, again, it's not turned on by default yet, so unless you've explicitly turned it on, it's going to be off. (Just like journal checksumming.)

So far, I see no reason to believe that this has anything to do with checksumming of any sort, nor any reason to believe that it's in any way related to the bug I ran into.

(Oh, and this is very definitely the wrong place to report this bug. If you used email instead, you might have room to actually paste the error messages so we don't need to rely on a vague precis and premature guesswork as to the locus of the bug.

btw, when reporting bugs of any kind, "latest packages" is also not much use compared to, say, a version number of the relevant packages, in this case, e2fsprogs...)
 
indeed. as Theo said e2fsck was not properly correcting block checksums. i'm new to ext4 so i assume it's metadata checksums as you said. but i noticed some overweight for the past days and the issued problem of automated fsck after wrong shutdowns. overweight it's likely to be journaling on, as well as the described bug it's metadata checksum looping instead of fixing anything. BUT a different ubuntu version did the work and assumed checksums to zero skipping metadata checksums probably. unsure what role plays journaling at this point but it was likely turned on recently because of the kernel load probably adding some sort of extra signal to the actual defective metadata i had at fixing group blocks from sb=0. i'm not a kernel developer but i detect when something is ruling out as this case which led to data loss for sb=0
on the other hand i found that free blocks are not deleted as fsck said so the data remains untouched and located at alternative superblocks who are being NOT fixed by actual fsck 1.42 (29 nov 2011) but ignored at the point of multiple claimed inodes (sbin elfs and nautilus open inodes like /home/user)
i think using latest fsck would ignore metadata (when using correct tuning options) giving a chance to fix one of the alternatives so mounting to be polite for checksumming several files by several methods. i actually never been to active kernel dev lists so all my points are probably misleading. anyway we know we're talking sort of the same bug because fsck is hitting a wall over the originally damaging code. hoping to be helpful to anyone. later.
 
Again, please take it to the linux-ext4@vger.kernel.org mailing list, and send us an e-mail telling us how the file system is being mounted (i.e., the contents of /etc/fstab, or the mount command if you are mounting it by hand, or the contents of /proc/mounts if you're not sure), the output of dumpe2fs, and the output of e2fsck.

I need solid data, and not your interpretations of the data.  Feel free to give me what you think is going on, but also please give me hard data which you used to form your impressions.  Thanks!!
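
For anyone who wants to gather most of the requested details in one go, a small Python helper along these lines might work. It is only a sketch: the function names and the `/dev/sdXN` device path are placeholders invented for this example, dumpe2fs typically needs root to read the device, and the saved e2fsck output from the failing run still has to be attached by hand.

```python
import subprocess
from typing import Dict, List


def report_commands(device: str) -> Dict[str, List[str]]:
    """Commands whose output is worth pasting into a bug report e-mail."""
    return {
        "kernel version": ["uname", "-r"],
        "e2fsck version": ["e2fsck", "-V"],
        "mounts": ["cat", "/proc/mounts"],
        "dumpe2fs": ["dumpe2fs", device],
    }


def collect_report(device: str) -> str:
    """Run each command and join the raw, uninterpreted output into one text."""
    sections = []
    for title, argv in report_commands(device).items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, check=False
            )
            # Some tools print to stderr, so both streams are kept.
            output = result.stdout + result.stderr
        except FileNotFoundError:
            output = f"(command not found: {argv[0]})\n"
        sections.append(f"== {title}: {' '.join(argv)} ==\n{output}")
    return "\n".join(sections)
```

Calling `collect_report("/dev/sdXN")` with the real device name in place of the placeholder returns text that can be pasted into the e-mail as is, alongside your own description of what you think is going on.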
 
+Diego Cadogan, +Theodore Ts'o is, naturally, quite right. I think we can safely presume for the time being that if you're new to ext4 and not mkfsing or mounting filesystems with a bunch of cool-sounding options you plucked out of the manpage, that it is not metadata checksums at fault, nor journal checksums, since neither of these are on by default. Distros are not using metadata checksums yet (except perhaps bleeding-edge mad distros like Arch): you will not see effects relating to metadata checksums with a normal distro unless you take special measures.

ext4 is incredibly reliable, widely used software. Like all such software, nearly all bugs in it are the result of a long chain of improbable events. While much software is so buggy that you can pick a random feature and probably find a bug in it, ext4 is the opposite: virtually all features, in isolation, work fine. You need to do whole chains of things wrong (or have faulty hardware) to see problems. (e.g., in my case, it took journal checksums and no-barrier mode and asynchronous journal commit and a really weird non-distro shutdown script that routinely rebooted in the middle of umount to see corruption -- and even then I didn't see corruption every time until I intentionally hacked things to amplify the effect. This wasn't an "oops, using one option causes corruption every time". I have never seen such a thing with production-quality ext4 options, and never expect to.

You also cannot say 'ooh, similar symptoms, must be the same bug'. Lots of things can almost certainly cause the symptoms that I saw. Note also that your symptoms are clearly different from mine -- I was seeing silent corruption where fsck claimed the filesystem was clean unless you did a force-fsck.)