Linode Forum
Linode Community Forums
 Post subject: ext3 journal modes
Posted: Tue Sep 27, 2011 1:54 am
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
I've been reading about ext3's various journal modes this fine evening, and have some questions. Well, really only one question: Have I gone insane and misunderstood everything, or have Linode and the Linux fs folks gone completely insane?

What I'm hoping I'm misunderstanding is writeback mode, which my Linodes (Ubuntu Hardy and Lucid) apparently use by default. Aside from the fact that it seems rather likely to corrupt files in the face of a crash -- which is no small concern itself -- it seems to have another interesting property in *how* it corrupts files: If a file grows right before a crash, its new size may be preserved, but the new contents may be lost. In which case when the fs is recovered, the new part of the file would now contain whatever those blocks happened to contain before. For example, a copy of /etc/shadow that was deleted a month ago. Now, is it just me, or is this a stunningly bad idea security-wise on a system that gives untrusted users access to any of its files? Such as a Linode that gets Fremonted 30 seconds after its owner installs a new WordPress theme*.

So, am I misunderstanding, or is that completely insane?

Now, I have also learned that ordered mode sucks horribly in its own way: if a lot of data is sitting around in the write buffers, an fsync() can freeze I/O for a painfully long time -- a dozen or two seconds. Still, that seems like a limited price, when the only safe way to run writeback is to check every file modified in the last $maximum_time_linux_will_ever_buffer_a_write_ever or, more simply, wipe the fs every time the system crashes...

Recommendations? Risk writeback? Suffer ordered? Seek psychiatric help, because I totally misread things?

* Linode uses battery-backed RAID, of course, but that would only prevent the power outage from seriously scribbling on the disk; it would not magically save data that was sitting in the kernel's write buffers but that it had not bothered to write out to the disk -- meaning the BBU -- yet, right?

Edit: Fix editing error

_________________
Matt Nordhoff (aka Peng on IRC)


Last edited by mnordhoff on Tue Sep 27, 2011 10:31 am, edited 1 time in total.

 Post subject:
Posted: Tue Sep 27, 2011 10:22 am
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
Well, I'd try to cut back a bit on the paranoia, for one thing ;)

I can't tell you if it works like this, but if I were designing it, I'd only update the block pointers after writing the new data, ensuring that the extra space effectively points nowhere until there's data there to point to.
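The ordering suggested above (write the new data first, update the block pointers afterwards) is essentially what ordered mode guarantees, while writeback allows the pointers to reach the disk first. A toy Python model makes the difference concrete. `Disk` and `append_then_crash` are hypothetical names, and this is a deliberately crude simplification; real ext3 journalling is far more involved, and writeback permits the bad ordering rather than always producing it.

```python
from dataclasses import dataclass, field

@dataclass
class Disk:
    blocks: list                 # raw block contents, including stale data
    inode_size: int = 0          # file size as recorded in metadata
    inode_blocks: list = field(default_factory=list)  # block pointers

def append_then_crash(disk, block_no, data, metadata_first, steps_done):
    """Perform the first `steps_done` of the two append steps, then 'crash'.

    metadata_first=True models the pointers reaching disk before the data
    (possible in writeback); False models the order ordered mode enforces.
    """
    def write_data():
        disk.blocks[block_no] = data

    def write_metadata():
        disk.inode_blocks.append(block_no)
        disk.inode_size += len(data)

    steps = [write_metadata, write_data] if metadata_first else [write_data, write_metadata]
    for step in steps[:steps_done]:
        step()
    # After recovery, the file's contents are whatever its pointers reference.
    return [disk.blocks[b] for b in disk.inode_blocks]

# Block 1 still holds the contents of a long-deleted file.
stale = ["old", "deleted /etc/shadow copy"]
print(append_then_crash(Disk(list(stale)), 1, "new data", True, 1))   # ['deleted /etc/shadow copy']
print(append_then_crash(Disk(list(stale)), 1, "new data", False, 1))  # []
```

With the data written first, a crash between the two steps just loses the append; with the metadata written first, the file "grows" into whatever the block held before.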


 Post subject:
Posted: Tue Sep 27, 2011 11:23 am
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
Of course, this presupposes a concept of "after", and I don't believe there's any guarantee that disk writes occur in a linear, time-increasing fashion without intentionally forcing the writes to occur :-)

Anyway, from http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html, which quotes the CHANGES file for ext3:

Quote:
"mount -o data=writeback"
Only journals metadata changes, and data updates are entirely
left to the normal "sync" process. After a crash, files will
may contain stale data blocks from old files: this mode is
exactly equivalent to running ext2 with a very fast fsck on reboot.


So it sounds like it is no worse than ext2, but is no better (safety-wise) than it, either.

"man mount" states that "ordered" is the default, but from a check of /proc/mounts, I think you're on to something:

Code:
Ubuntu 10.04 desktop:
/dev/mapper/witte-root / ext4 rw,relatime,errors=remount-ro,barrier=1,data=ordered 0 0

Ubuntu 10.04 server, upgraded from 8.10 incrementally:
/dev/mapper/hennepin-root / ext3 rw,noatime,errors=remount-ro,data=ordered 0 0

Ubuntu 11.04 netbook:
/dev/disk/by-uuid/9f1f6d4f-ecf1-47f0-ac14-da83dcfbfe0d / ext4 rw,relatime,errors=remount-ro,barrier=1,data=ordered 0 0

Ubuntu 8.04 Linode:
/dev/root / ext3 rw,noatime,errors=remount-ro,barrier=0,data=writeback 0 0

Ubuntu 10.04 Linode (upgraded from 8.04):
/dev/root / ext3 rw,relatime,errors=remount-ro,barrier=0,data=writeback 0 0

Ubuntu 10.04 Rackspace Cloud Server:
/dev/sda1 / ext3 rw,noatime,errors=remount-ro,barrier=0,data=writeback 0 0
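The same check can be scripted across machines. A minimal Python sketch, assuming only the standard /proc/mounts layout (device, mount point, type, options, dump, pass); `data_modes` is a hypothetical helper name. A filesystem with no data= option shown is reported as unknown rather than guessed, since the default may not always be listed.

```python
def data_modes(mounts_text):
    """Return {mount_point: data_mode} for ext3/ext4 lines of /proc/mounts."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("ext3", "ext4"):
            continue
        mode = None  # None means no data= option was listed
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[fields[1]] = mode
    return modes

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except FileNotFoundError:
        text = ""  # not Linux, or /proc is not mounted
    for mount_point, mode in data_modes(text).items():
        print(f"{mount_point}: {mode or 'not shown'}")
```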

_________________
Code:
/* TODO: need to add signature to posts */


 Post subject:
Posted: Tue Sep 27, 2011 12:05 pm
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
HoopyCat, from my reading, the kernel's default was changed to writeback around 2.6.30. Your "man mount" probably predates that.

(This ignores the default that can be set on the fs by tune2fs, and, of course, you can override it with /etc/fstab or mount. [Or the kernel command line, at least for the root fs.])

(There's a .config option to change the default back to ordered, but Linode's kernels do not use it. [And neither do Rackspace's.])
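Put together, those layers give a simple precedence order. The Python below is a simplified model of that order, not the kernel's actual code path; `effective_data_mode` and `remount` are hypothetical names. The `remount` check mirrors the "cannot change data mode on remount" error that shows up later in the thread.

```python
from typing import Optional

def effective_data_mode(kernel_defaults_to_ordered: bool,
                        superblock_default: Optional[str] = None,
                        mount_option: Optional[str] = None) -> str:
    """Simplified model: each later layer overrides the earlier one."""
    # 1. Compiled-in default (CONFIG_EXT3_DEFAULTS_TO_ORDERED set or not).
    mode = "ordered" if kernel_defaults_to_ordered else "writeback"
    # 2. Default recorded in the superblock, e.g. with tune2fs -o.
    if superblock_default is not None:
        mode = superblock_default
    # 3. Explicit data= from /etc/fstab, mount -o, or rootflags= for the root fs.
    if mount_option is not None:
        mode = mount_option
    return mode

def remount(current_mode: str, requested_mode: Optional[str] = None) -> str:
    """ext3 refuses to switch data modes on an already-mounted filesystem."""
    if requested_mode is not None and requested_mode != current_mode:
        raise ValueError(f"cannot change data mode on remount "
                         f"({current_mode} -> {requested_mode})")
    return current_mode

# A kernel without the option, nothing else configured:
print(effective_data_mode(False))                                # writeback
# The same kernel after tune2fs -o journal_data_ordered:
print(effective_data_mode(False, superblock_default="ordered"))  # ordered
```

In this model, the root filesystem is first mounted by the kernel (where rootflags= applies), and /etc/fstab only feeds the later remount, which is why changing the mode there fails.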

_________________
Matt Nordhoff (aka Peng on IRC)


 Post subject:
Posted: Tue Sep 27, 2011 5:37 pm
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
So I whipped out the kernel source. The entire "situation" spans three commits in 2009 and 2010. Here's what we have:

    April 2009 (torvalds): Configuration option CONFIG_EXT3_DEFAULTS_TO_ORDERED added, with no default set. Help text describes EXT3_MOUNT_ORDERED_DATA as an "unfortunate choice" and a "legacy default", and advises that the option not be set (i.e. the default should be writeback), and that users who "really want" ordered mode should set it with tune2fs (bbae8bcc49)
    August 2009 (tytso): "(legacy option)" removed from CONFIG_EXT3_DEFAULTS_TO_ORDERED prompt; help text rewritten to be more neutral, due to concerns about the strong bias in favor of writeback (6d41807614)
    July 2010 (Dave Chinner): default for CONFIG_EXT3_DEFAULTS_TO_ORDERED changes to "y", for data safety reasons, with a rather stern commit message. Interestingly, Chinner claims that "all major distros" are ensuring ext3 filesystems are using ordered mode, though that claim arguably relies on a somewhat nonstandard definition of "major distro" (aa32a79638)

So, the option first appeared in 2.6.30 with a strong recommendation for, and an implicit default to, writeback. The help text was copyedited in 2.6.31 to be more neutral and mention the tradeoffs, but the kernel still implicitly defaulted to writeback until the config option was changed to default to "y" in 2.6.36.
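The timeline can be condensed into a small decision function. This is a sketch under stated assumptions: ordered is taken to be the hard-coded default before 2.6.30, and the Kconfig default is assumed to matter only when the option is absent from a .config (an old config that explicitly leaves the option unset keeps it unset when carried forward). `kconfig_default` and `compiled_default` are hypothetical names.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]

def kconfig_default(version: Version) -> Optional[bool]:
    """Kconfig default for CONFIG_EXT3_DEFAULTS_TO_ORDERED (None: no such option)."""
    if version < (2, 6, 30):
        return None  # the option was added in 2.6.30
    # Implicitly unset (writeback) until 2.6.36, "y" (ordered) from then on.
    return version >= (2, 6, 36)

def compiled_default(version: Version, option_in_config: Optional[bool] = None) -> str:
    """Default data= mode a kernel build uses when nothing else overrides it.

    option_in_config is True/False when the .config sets the option
    explicitly, or None when it is absent and the Kconfig default applies.
    """
    if version < (2, 6, 30):
        return "ordered"  # assumption: hard-coded default before the option existed
    setting = option_in_config
    if setting is None:
        setting = kconfig_default(version)
    return "ordered" if setting else "writeback"
```

Under these assumptions, `compiled_default((2, 6, 39), option_in_config=False)` still returns "writeback", which matches the carried-forward configuration suspected below.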

Linode's paravirt kernel configuration likely traces its pedigree to ~2.6.31 or so. At that time, the default would have been to not set CONFIG_EXT3_DEFAULTS_TO_ORDERED, and this has probably carried forward to today, unbeknownst to anyone.

There does not appear to be a similar configuration option for ext4; its documentation states that ordered is the default.

I would strongly support setting CONFIG_EXT3_DEFAULTS_TO_ORDERED.


 Post subject:
Posted: Sat Oct 01, 2011 1:26 pm
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
I just set up a node to test switching to ordered mode...

1.) As the docs warned, trying to use /etc/fstab to change the root fs's journal mode results in unhappiness:

Code:
EXT3-fs (xvda): error: cannot change data mode on remount. The filesystem is mounted in data=writeback mode and you try to remount it in data=ordered mode.
mount: / not mounted already, or bad option
mountall: mount / [1454] terminated with status 32
mountall: Filesystem could not be mounted: /
mountall: Skipping mounting / since Plymouth is not available
rm: cannot remove `/var/lib/urandom/random-seed': Read-only file system


:mrgreen:

2.) A quick 'tune2fs -o journal_data_ordered /dev/xvda' (typed from memory; could be wrong) worked perfectly. (I did it in Finnix while repairing /etc/fstab; I don't know if you can do it on a live, writable fs.) Edit: Yes, doing it on a live, writable fs works. I don't know if it's supposed to, though. I would hope tune2fs would be smart enough to bail if it was dangerous.
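Since tune2fs -o only records a default in the superblock, it can be worth confirming what is stored there. A minimal Python sketch that reads the "Default mount options:" line of `tune2fs -l` output and maps the journal_data, journal_data_ordered and journal_data_writeback names to a mode; `superblock_data_mode` and `check_device` are hypothetical helpers, and reading the superblock usually needs root.

```python
import subprocess

# tune2fs names for the journal-mode default mount options.
JOURNAL_OPTS = {
    "journal_data": "journal",
    "journal_data_ordered": "ordered",
    "journal_data_writeback": "writeback",
}

def superblock_data_mode(tune2fs_output):
    """Parse `tune2fs -l` text; return 'journal'/'ordered'/'writeback' or None."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Default mount options:"):
            for opt in line.split(":", 1)[1].split():
                if opt in JOURNAL_OPTS:
                    return JOURNAL_OPTS[opt]
    return None  # no journal mode recorded; the kernel default applies

def check_device(device):
    # tune2fs -l only reads and prints the superblock.
    out = subprocess.run(["tune2fs", "-l", device], capture_output=True,
                         text=True, check=True).stdout
    return superblock_data_mode(out)
```

For example, `check_device("/dev/xvda")` after the change above would be expected to return "ordered".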

Edit: The other options are, of course, CONFIG_EXT3_DEFAULTS_TO_ORDERED and changing the kernel command line (rootflags=data=ordered). But those require pv-grub or cooperation from Linode.

_________________
Matt Nordhoff (aka Peng on IRC)


 Post subject: Re: ext3 journal modes
Posted: Sat Oct 01, 2011 5:40 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
mnordhoff wrote:
a Linode that gets Fremonted

Fremont is a verb now? :P

I'm more or less a newbie when it comes to filesystems, but the above discussion seems to suggest that Linode should update its kernels to default to ordered mode, if they haven't already done so.


 Post subject: Re: ext3 journal modes
Posted: Sat Oct 01, 2011 5:52 pm
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
hybinet wrote:
Fremont is a verb now? :P

I'm trying to coin it.

hybinet wrote:
I'm more or less a newbie when it comes to filesystems, but the above discussion seems to suggest that Linode should update its kernels to default to ordered mode, if they haven't already done so.

Well, that's certainly *my* opinion, at least. We'll see if they agree.

They have not already done so, by the way. (I know because I just did the tune2fs thing and rebooted half an hour ago.)

Edit: I filed a ticket about it. If you never see me again, that's why.

_________________
Matt Nordhoff (aka Peng on IRC)


 Post subject:
Posted: Sat Oct 01, 2011 8:11 pm
Senior Member

Joined: Sun Mar 07, 2010 7:47 pm
Posts: 1970
Website: http://www.rwky.net
Location: Earth
This thread piqued my curiosity, so I did some digging and tested the write performance of the various journal modes.

These were all done on the same Linode 512 in London, dumping 500 MB to disk 10 times for each test. Here are the results:

ext3 writeback ~5s
ext3 ordered ~5s

ext4 ordered ~4.4s
ext4 writeback ~4.5s

This should be taken with a pinch of salt; these results are in no way scientific, but they do seem to indicate no performance degradation from using ordered. It also appears ext4 may be quicker. I asked support if they had any plans to support it; they said they would forward the suggestion to the developers but cannot guarantee whether or when they will support it.
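A test along those lines can be scripted. This Python sketch keeps the 500 MB and 10 runs described above; the chunk size, the zero-filled data, and the final fsync() (so the timing includes getting the data to disk, not just into the page cache) are choices made here, not details of the original test. Like the original, it is nowhere near a rigorous benchmark.

```python
import os
import time

def timed_write(path, size_mb=500, chunk_mb=1):
    """Write size_mb of zeros to path, fsync it, delete it; return seconds taken."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the disk
    elapsed = time.monotonic() - start
    os.remove(path)
    return elapsed

def average_write_time(path, runs=10, size_mb=500):
    times = [timed_write(path, size_mb) for _ in range(runs)]
    return sum(times) / len(times)
```

Running `average_write_time(...)` with a path on each filesystem under test (remounted in each journal mode) gives numbers comparable to the ones above.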

So I'd agree that setting the default mode to ordered would be a good idea.

_________________
Paid support
How to ask for help
1. Give details of your problem
2. Post any errors
3. Post relevant logs.
4. Don't hide details i.e. your domain, it just makes things harder
5. Be polite or you'll be eaten by a grue


 Post subject:
Posted: Sat Oct 01, 2011 8:26 pm
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
mnordhoff wrote:
2.) A quick 'tune2fs -o journal_data_ordered /dev/xvda' (typed from memory; could be wrong) worked perfectly. (I did it in Finnix while repairing /etc/fstab; I don't know if you can do it on a live, writable fs.) Edit: Yes, doing it on a live, writable fs works. I don't know if it's supposed to, though. I would hope tune2fs would be smart enough to bail if it was dangerous.

I think in this case, tune2fs is simply setting the default filesystem options in the filesystem metadata, and likely not influencing the currently mounted behavior. That is, the ext3 driver reads and applies those filesystem options at mount-time. So you'd probably have to arrange to re-mount the live filesystem after changing the value with tune2fs to have it take effect.

-- David


 Post subject:
Posted: Sat Oct 01, 2011 8:26 pm
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
obs,

That's not the sort of situation where you run into trouble with ext3's ordered mode. Where it gets ugly is when you're doing a lot of I/O and then something starts fsync()ing, because that blocks all(?) I/O until it finishes writing out all of the buffers.
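The pattern that hurts can be measured directly: leave a lot of unrelated dirty data in the page cache, then time an fsync() of a tiny file. A sketch, assuming the behaviour described above (on ext3 in ordered mode, the fsync may have to wait for the other buffered data to be written out first); `fsync_latency`, the file names and the sizes are arbitrary choices, and results depend entirely on the filesystem, mount options and hardware.

```python
import os
import time

def fsync_latency(directory, dirty_mb=200):
    """Return seconds spent in fsync() of a tiny file after dirtying dirty_mb MB."""
    big = os.path.join(directory, "bulk.bin")
    small = os.path.join(directory, "small.txt")
    try:
        # Write lots of data without syncing it, so it stays dirty in the cache.
        with open(big, "wb") as f:
            chunk = b"\0" * (1024 * 1024)
            for _ in range(dirty_mb):
                f.write(chunk)
        # Now fsync a tiny, unrelated file and time only the fsync call.
        with open(small, "w") as f:
            f.write("hello\n")
            f.flush()
            start = time.monotonic()
            os.fsync(f.fileno())
            return time.monotonic() - start
    finally:
        for path in (big, small):
            if os.path.exists(path):
                os.remove(path)
```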

_________________
Matt Nordhoff (aka Peng on IRC)


 Post subject:
Posted: Sat Oct 01, 2011 8:30 pm
Senior Member

Joined: Sat May 03, 2008 4:01 pm
Posts: 569
Website: http://www.mattnordhoff.com/
db3l wrote:
I think in this case, tune2fs is simply setting the default filesystem options in the filesystem metadata, and likely not influencing the currently mounted behavior. That is, the ext3 driver reads and applies those filesystem options at mount-time. So you'd probably have to arrange to re-mount the live filesystem after changing the value with tune2fs to have it take effect.

Oh, certainly. What I was wondering was if tune2fs would let me modify the default mount options while the fs is mounted writable, and whether things would get horribly corrupted if it did.

tune2fs docs all have scary warnings about doing stuff to a writable fs, but doing *this* is very simple, and wasn't explicitly covered. All I can say is that I tried it and it worked, but that was on a test node doing zero I/O. I did not risk it on anything important.

_________________
Matt Nordhoff (aka Peng on IRC)


 Post subject:
Posted: Sat Oct 01, 2011 9:10 pm
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
I just did it on a couple of busy nodes doing significantly non-zero IO, and then updated the maximum mount counts and intervals while I was in there.
No problems noted. <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>^A^@<FE><FF>^C
^@^@<FF><FF><FF><FF>^F^B^@^@^@^@^@<C0>^@^@^@^@^@^@F'^@^@^@Microsoft Office Word 97-2003 Document^@
^@^@^@MSWordDoc^@^P^@^ @^@Word.Document.8^@<F4>9<B2>q^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

_________________
Code:
/* TODO: need to add signature to posts */


 Post subject:
Posted: Sun Oct 02, 2011 12:06 am
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
hoopycat wrote:
<FF><FF><FF><FF>^@F'^@^@^@Microsoft Office Word 97-2003 Document

RIP hoopycat, he seems to have been assassinated by a team of Microsoft ninjas while testing his new Linux box.

Also, something on this page is missing a "word-wrap: break-word" CSS directive.

