bcachefs

bcachefs activity

Status update

 - Interior btree node updates are now journalled, removing the need for btree writes to be FUA

 - Interior btree node updates are now fully transactional; we no longer have to do any metadata scanning after unclean shutdown

 - Btree key cache code has been merged

Towards snapshots

Just finished a major rework that gets us a step closer to snapshots: the btree code is incrementally being changed to handle extents like regular keys.

Previously, when reading in a btree node we'd have to check for and handle partially overwritten extents, as part of the mergesort we do (...

Status update

There is now a (very work-in-progress) fuse port!

The fuse port isn't intended to ever be for serious use - but I do expect it to be useful for debugging in the future; if someone is hitting a repeatable bug in the bcachefs code, debugging it via the fuse version (with gdb) should be much e...

At long last - reflink is done

For those who aren't familiar with the idea - reflink means using shared, reference counted extents to do "shallow copies" - copies that share data transparently on disk, but are copy on write (unlike hardlinked files).

To use it, just use cp --reflink. It's great for virtual machine images...
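The idea of shared, reference counted extents with copy on write can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the `Extent` and `File` classes and their methods are hypothetical names, each file holds a single extent for simplicity, and none of this reflects the actual bcachefs data structures.

```python
# Toy model of reflink: files share reference-counted extents and
# copy on write. All names here are hypothetical, not bcachefs code.

class Extent:
    def __init__(self, data: bytes):
        self.data = data
        self.refcount = 1


class File:
    def __init__(self, data: bytes = b""):
        self.extent = Extent(data)

    def reflink_copy(self) -> "File":
        """Shallow copy: share the extent and bump its reference count."""
        clone = File.__new__(File)  # skip __init__, no new extent is allocated
        clone.extent = self.extent
        self.extent.refcount += 1
        return clone

    def read(self) -> bytes:
        return self.extent.data

    def write(self, offset: int, buf: bytes) -> None:
        """Copy on write: unshare the extent before modifying it."""
        if self.extent.refcount > 1:
            self.extent.refcount -= 1
            self.extent = Extent(self.extent.data)
        data = bytearray(self.extent.data)
        data[offset:offset + len(buf)] = buf
        self.extent.data = bytes(data)


original = File(b"disk image v1")
clone = original.reflink_copy()
shared_before = clone.extent is original.extent  # the copy moved no data
clone.write(11, b"v2")                           # only now is the data copied
```

After the write, `original` still reads back its old contents while `clone` sees the change, which is the difference from a hardlink: a hardlinked file would show the write through both names.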

Still hacking away at reflink

It's pretty close to done, but working through the last of the xfstests failures has been tedious.

But - I just pushed out a bunch of prep work patches, and something else cool is now done - we're exporting the actual filesystem blocksize to the Linux VFS, instead of pretending the filesyst...

Notes on Phoronix benchmarks

Phoronix posted some bcachefs benchmarks: https://www.phoronix.com/scan.php?page=article&item=bcachefs-linux-2019

The results are actually pretty encouraging, even if they...

Fully persistent allocation info is finally done

Finally! It was a huge effort, but it's done and pushed out.

This means that when mounting a filesystem - even after an unclean shutdown - we don't have to walk all the metadata anymore, because it's always updated in a transactional manner and kept fully consistent in the b-tree.

Th...

Status update

5.0 rebase is up

And, more importantly - fully persistent allocation info is finally just about done! It's passing the tests, not much left before I can push it out...

Status update - persistent alloc info

So, first some background:

Fully persistent allocation info is going to require updating the alloc btree every time we update the extents btree - one key in the alloc btree for every pointer in an extent being inserted or overwritten.

That introduces a bit of a difficulty, in that ext...

More on fully persistent allocation information

So, to recap: bcachefs now persists allocation information on clean shutdown, so mounting after a clean shutdown doesn't require walking any metadata. However, we're not yet keeping allocation information updated as it's modified - that's my current project.

There's two main components to t...

Fast mounts update

Persistent alloc info for clean shutdowns is finally done - this means when mounting after a clean shutdown, we don't have to scan metadata anymore, and mounting should be just as fast or faster than other filesystems.

We do still run fsck by default on every mount, so to see any change yo...

bcachefs at FOSDEM

I'll be at FOSDEM. I'm not planning on giving a talk or anything, but if anyone else is interested and is going to be there, send a message and I'd love to meet up.

Status update - quotas and option handling

Option handling improvements: There's a single master list of options in opts.h, and that list is now used by bcachefs format as well, including for bcachefs format --help. This is a nice usability improvement - it means options are always specified the same way anywhere they can be used, and it m...

Status update - fast mount times, reflink

So for now, I'm leaving off the remaining parts of erasure coding - the important part was getting everything done that impacts both the on disk format, and the rest of the design. There's some commonality between erasure coding and some of the other upcoming features, so getting erasure coding most...

Erasure coding has been pushed

It's not production ready yet - stripe level copygc isn't implemented yet, so disk fragmentation could lead to your filesystem getting filled with partially empty stripes and getting stuck. But, aside from that it should be functional.

To use it, just enable the erasure_code option, either at moun...

Erasure coding is coming!

First off, sorry for the slow progress lately - I've been dealing with some health issues that have been making it incredibly difficult to work. But, the good news is that we may have finally figured out what's going on and *fingers crossed* aforementioned issues seem to finally, slowly be getting b...

Bcachefs extents - compression, checksumming

One topic that was asked about recently was compression in bcachefs, so I thought I'd write a bit about how extents are represented, as a bunch of stuff falls out of that.

In bcachefs, checksumming and compression are done per extent, not per block or per page. This means we store one checksum per ...
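As a rough illustration of per-extent checksumming, here is a minimal sketch. It assumes a CRC32 from Python's `zlib` as a stand-in checksum; the `ChecksummedExtent` class, its fields, and the choice of checksum are all hypothetical and are not taken from bcachefs.

```python
import zlib


class ChecksummedExtent:
    """One checksum covers the whole extent, not each block inside it.

    Hypothetical toy structure; CRC32 is only a stand-in checksum.
    """

    def __init__(self, offset: int, data: bytes):
        self.offset = offset            # starting position within the file
        self.data = data
        self.csum = zlib.crc32(data)    # computed once, over the full extent

    def read(self, start: int, length: int) -> bytes:
        # Because the checksum covers the whole extent, even a partial
        # read has to verify all of the extent's data first.
        if zlib.crc32(self.data) != self.csum:
            raise IOError("extent checksum mismatch")
        rel = start - self.offset
        return self.data[rel:rel + length]


extent = ChecksummedExtent(4096, b"hello world")
tail = extent.read(4102, 5)   # a sub-range read still verifies the full extent
```

In this toy model the cost of one checksum per extent shows up on partial reads: the smaller the range requested relative to the extent, the more extra data has to be checked to trust it.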

Vote for the next deep dive topic!

I've gotten a few comments that people have been enjoying my technical deep dives into things I'm working on.


There's a lot of other things I could write about as well, not just bcachefs but perhaps also other kernel and storage topics. I'd like to hear what people are interested in, th...

Filesystem metadata operations are now all fully atomic

In the last post, I wrote about some new transaction infrastructure I was working on that would make it practical to make all the high level filesystem operations (e.g. create, link, unlink) fully atomic - that work is now finished and merged in.

The main benefit from this work is that now, on unc...

Progress towards faster mount times - new transaction infrastructure

I've talked a bit before about the new transaction infrastructure I've been working on, but to recap:

bcachefs has, for quite some time, had the ability to use multiple btree iterators simultaneously, and to do multiple btree updates atomically - the main btree update function takes a list of (ite...

Btree unit tests

Been spending a surprising amount of time lately on the core btree - in a good way, as in "oh, here's some good and useful improvements I can easily make", not "oh crap, this thing is broken and I have to fix it".

Some of this was motivated by the truncate bug and needing to implement BTREE_INSER...

The bug squashing continues...

Been squashing quite a few bugs lately, but this latest one has been quite a trip down the rabbit hole...

Initial symptom was that on xfstest generic/475, very occasionally we'd see an extent past the end of a file's current i_size (the test runs a filesystem stress test while injecting IO errors and...

Status update

definitely not drunk debugging right now


I know I've been shit at posting updates, so ask your questions now - about what's going on with upstreaming or anything else you can think of

New feature: specify a device's durability

Just pushed a new feature (only lightly tested so far): when formatting, you can specify a "durability" for each device: the effect of this is that data on that device will be counted as being replicated that many times.

So if you've got a filesystem with two SSDs and a big hardware RAID array...
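The counting rule described above can be sketched in a few lines. This is an assumption-laden illustration: the function names, the device names (`dev_a`, `dev_b`, `dev_c`) and the durability values are all made up, and it does not try to complete the example from the post.

```python
def counted_replicas(devices_with_copy, durability):
    """Sum the durability of every device that holds a copy of the data."""
    return sum(durability[dev] for dev in devices_with_copy)


def is_sufficiently_replicated(devices_with_copy, durability, wanted):
    """True if the copies already written count for at least `wanted` replicas."""
    return counted_replicas(devices_with_copy, durability) >= wanted


# Hypothetical per-device durability settings, as chosen at format time.
durability = {"dev_a": 1, "dev_b": 1, "dev_c": 2}

one_copy_on_c = counted_replicas(["dev_c"], durability)  # one copy, counted twice
one_copy_on_a = counted_replicas(["dev_a"], durability)  # one copy, counted once
```

With a target of two replicas, a single copy on `dev_c` is enough in this model, while data on `dev_a` alone would still need a second copy somewhere else.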

Tiering is dead; long live disk groups

The new disk groups-based code for configuring data placement has been merged, and the notion of configuring disks into "tiers" has been removed. If you have an existing filesystem that uses tiering, you'll have to configure the new interfaces.

The reasoning behind the change was that a "disk...

Just pushed support for zstd compression

Please test (and don't assume it won't eat all your data)

ktest

The test framework I use for bcachefs - ktest - has been getting various cleanups and fixes to make it easier for other people to use - in particular, it works on non-Debian distributions now.

For anyone who's been interested in getting started with kernel development or bcachefs development, kte...

Initramfs support for root on encrypted bcachefs

I just pushed initramfs hooks/scripts for handling a bcachefs encrypted root filesystem - after you make install in bcachefs-tools, they'll be picked up next time you generate an initramfs, and if your root filesystem is encrypted you'll be prompted for the passphrase to unlock it when booting up.
...

New rereplicate tool; replication ready for testing

Replication support is finally feature complete; it should have everything implemented that's needed for handling and recovering from device failure.

If replication is enabled on a filesystem, a device can fail and be removed while the filesystem is in use without returning any IO errors to use...

Migrate tool

Just fixed some bugs in the migrate tool; it should be working again.
