Joined 1 year ago
Cake day: June 10th, 2023





  • We had all the momentum; we were riding the crest of a high and beautiful wave… So now, less than five years later, you can go up on a steep hill in Las Vegas and look West, and with the right kind of eyes you can almost see the high water mark — that place where the wave finally broke, and rolled back.




  • ZFS is still the de facto standard for a reliable filesystem. It’s super stable, and annoyingly strict about what you can do with it. Its Raid5 and Raid6 equivalents (RAID-Z1 and RAID-Z2) are the only software raids at those levels that I would trust to not eat your data. I’ve run a TrueNAS server with RAID-Z2 for years now, with absolutely no issues and tens of terabytes of data.

    But copy-on-write (CoW) filesystems such as ZFS or btrfs are not great for every purpose. For example, running a Postgres server on any CoW filesystem requires a lot of tweaking to get reasonable speeds out of the database. It’s doable, but there are a lot of settings to change.
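To give an idea of what that tweaking looks like on ZFS, here is a rough sketch that only prints the `zfs set` commands for a database dataset. The dataset name `tank/pgdata` is made up, and the values are commonly suggested starting points rather than anything I have benchmarked: `recordsize=8K` matches the 8 kB Postgres page size, while `atime=off` and `compression=lz4` are general-purpose choices. Treat all of it as an assumption to test against your own workload.

```python
# Hypothetical helper: renders `zfs set` commands, it does not run them.
# The property names (recordsize, atime, compression) are real ZFS dataset
# properties; the values are assumed starting points, not tuned settings.

PG_DATASET_PROPERTIES = {
    "recordsize": "8K",    # match the 8 kB Postgres page size
    "atime": "off",        # skip access-time updates on every read
    "compression": "lz4",  # cheap compression
}


def zfs_set_commands(dataset, properties):
    """Return one `zfs set name=value dataset` command per property."""
    return [
        f"zfs set {name}={value} {dataset}"
        for name, value in properties.items()
    ]


if __name__ == "__main__":
    # "tank/pgdata" is a placeholder dataset name.
    for command in zfs_set_commands("tank/pgdata", PG_DATASET_PROPERTIES):
        print(command)
```

Postgres has its own knobs on top of this, so the list above is only the filesystem half of the job.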

    As for the code quality of Linux filesystems, Kent Overstreet, the author of bcachefs, the newest CoW filesystem, has a good write-up of the ups and downs:

    • ext4, which works - mostly - but is showing its age. The codebase terrifies most filesystem developers who have had to work on it, and heavy users still run into terrifying performance and data corruption bugs with frightening regularity. The general opinion of filesystem developers is that it’s a miracle it works as well as it does, and ext4’s best feature is its fsck (which does indeed work miracles).
    • xfs, which is reliable and robust but still fundamentally a classical design - it’s designed around update in place, not copy on write (COW). As someone who’s both read and written quite a bit of filesystem code, the xfs developers (and Dave Chinner in particular) routinely impress me with just how rigorous their code is - the quality of the xfs code is genuinely head and shoulders above any other upstream filesystem. Unfortunately, there is a long list of very desirable features that are not really possible in a non COW filesystem, and it is generally recognized that xfs will not be the vehicle for those features.
    • btrfs, which was supposed to be Linux’s next generation COW filesystem - Linux’s answer to zfs. Unfortunately, too much code was written too quickly without focusing on getting the core design correct first, and now it has too many design mistakes baked into the on disk format and an enormous, messy codebase - bigger than xfs. It’s taken far too long to stabilize as well - poisoning the well for future filesystems because too many people were burned on btrfs, repeatedly (e.g. Fedora’s tried to switch to btrfs multiple times and had to switch at the last minute, and server vendors who years ago hoped to one day roll out btrfs are now quietly migrating to xfs instead).
    • zfs, to which we all owe a debt for showing us what could be done in a COW filesystem, but is never going to be a first class citizen on Linux. Also, they made certain design compromises that I can’t fault them for - but it’s possible to do better. (Primarily, zfs is block based, not extent based, whereas all other modern filesystems have been extent based for years: the reason they did this is that extents plus snapshots are really hard).
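For anyone wondering why snapshots fall out of a COW design so naturally, here is a toy block-based sketch. It is nothing like the real on-disk format of zfs, btrfs or bcachefs, and the class and method names are invented for illustration. The point is only that a write never overwrites an existing block, so a snapshot is just a saved copy of the block map.

```python
class CowVolume:
    """Toy copy-on-write volume: stored blocks are never overwritten."""

    def __init__(self):
        self._blocks = {}    # physical block id -> bytes, written once
        self._next_id = 0
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of the map

    def write(self, lbn, data):
        # Allocate a fresh physical block instead of updating in place,
        # then repoint the live map at it. Old blocks stay untouched.
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(data)
        self.live[lbn] = block_id

    def snapshot(self, name):
        # Only the map is copied, never the data blocks.
        self.snapshots[name] = dict(self.live)

    def read(self, lbn, snapshot=None):
        mapping = self.live if snapshot is None else self.snapshots[snapshot]
        return self._blocks[mapping[lbn]]


if __name__ == "__main__":
    vol = CowVolume()
    vol.write(0, b"v1")
    vol.snapshot("before")
    vol.write(0, b"v2")
    print(vol.read(0))            # current data
    print(vol.read(0, "before"))  # data as of the snapshot
```

An update-in-place design like xfs would have overwritten the old block on the second write, which is why the snapshot-style features are so hard to bolt on afterwards. The sketch tracks single blocks; doing the same bookkeeping for variable-length extents is the part the quote calls really hard.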

    I started evaluating bcachefs on my main workstation when it arrived in the stable kernels. It can do pretty good raid-1 with encryption and compression. That combination, integrated into the filesystem itself, is not really available anywhere else but zfs. And zfs doesn’t work with every kernel version, which prevents updating to the latest and greatest. bcachefs is already a pretty usable system, and in a few years it will probably take the crown as the default filesystem in mainstream distros.