Based on all the numbers I've seen, it looks like bcachefs's b-tree might actually be the fastest ordered key-value store around (there are faster persistent hash tables, but hash tables don't keep keys in order).
If anyone knows of anything that might be faster, I'd love to hear it - and I might rig up some head-to-head benchm...
2018-02-04 23:42:01 +0000 UTC
rand_mixed is 3/4 lookups, 1/4 updates.
Lookups, both sequential and random, are beautifully fast.
Random updates ought to scale better than that, but I haven't profiled it yet, so I'm not sure what's going on.
Could probably do better on sequential inserts/deletes too, but I haven't even...
2018-01-28 03:32:51 +0000 UTC
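To make the rand_mixed ratio described above concrete, here is a minimal C sketch of how a benchmark loop might turn a random value into either a lookup or an update so that one operation in four is an update. The names `mixed_op`, `pick_op()` and `count_updates()` are hypothetical; this is not the actual btree_perf_test() code, just an illustration of the 3:1 mix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation types for a mixed benchmark (not bcachefs's names). */
enum mixed_op { OP_LOOKUP, OP_UPDATE };

/*
 * Map a random value to an operation: the low two bits pick one of four
 * equally likely cases, and one of those four is an update. With uniformly
 * random input this yields 3/4 lookups and 1/4 updates.
 */
static enum mixed_op pick_op(uint64_t rand_val)
{
	return (rand_val & 3) == 0 ? OP_UPDATE : OP_LOOKUP;
}

/*
 * Count the updates pick_op() would issue for the inputs 0..nr-1. This walks
 * every residue mod 4 evenly, so it shows the exact 1-in-4 split.
 */
static uint64_t count_updates(uint64_t nr)
{
	uint64_t i, updates = 0;

	for (i = 0; i < nr; i++)
		if (pick_op(i) == OP_UPDATE)
			updates++;
	return updates;
}
```

In a real run the input would come from a random number generator, so the split would be 1/4 only on average.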
I should have done this ages ago...
bcachefs: btree_perf_test() doing 10.0M rand_insert:
bcachefs: btree_perf_test() done in 27 sec, 2587 nsec per iter, 377k per sec
bcachefs: btree_perf_test() doing 10.0M rand_lookup:
bcachefs: btree_perf_test() done in 10 sec, 1001 n...
2018-01-24 20:07:19 +0000 UTC
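The three numbers on each btree_perf_test() "done" line are tied together by simple arithmetic. The C sketch below reproduces them under an assumption inferred from the figures themselves, not stated in the post: that "10.0M" means 10 × 2^20 iterations, and that the per-second rate is printed in units of 1024. Under that reading, 2587 ns/iter gives 10^9 / 2587 / 1024 ≈ 377, and 10 × 2^20 × 2587 ns ≈ 27 s. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Total run time in whole seconds for nr iterations at nsec_per_iter each. */
static uint64_t total_secs(uint64_t nr, uint64_t nsec_per_iter)
{
	return nr * nsec_per_iter / NSEC_PER_SEC;
}

/*
 * Iterations per second, divided by 1024 (assumption: the log's "k" is a
 * binary unit), truncated to an integer.
 */
static uint64_t kiters_per_sec(uint64_t nsec_per_iter)
{
	return NSEC_PER_SEC / nsec_per_iter / 1024;
}
```

The same reading also matches the rand_lookup line: 10 × 2^20 iterations at 1001 ns each comes to about 10 seconds.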
You can now expand a filesystem on a device - shrink isn't implemented just yet. The command is bcachefs device resize, and it takes the same arguments as resize2fs:
bcachefs device resize /dev/sdb 10G
If you don't specify a size, it uses the current size of the device.
Online and offl...
2018-01-02 23:41:21 +0000 UTC
Replication tests are finally all passing! This means that device removal and write error handling (for replicated writes) should finally be fully working.
What those two codepaths have in common is that they need to modify pointers to existing btree nodes - removing the pointer to the device that either ...
2017-12-24 21:21:48 +0000 UTC
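The pointer manipulation that device removal and write error handling share can be sketched in isolation. The following is a minimal in-memory model, assuming a key simply carries a small array of (device, offset) replica pointers. The struct and function names are hypothetical. In bcachefs the updated key also has to be written back through the btree, and doing that safely for pointers to btree nodes is the hard part, which this sketch does not attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified replica pointer: which device, and where on it. */
struct replica_ptr {
	unsigned dev;
	unsigned long long offset;
};

/* Hypothetical key holding up to four replica pointers. */
struct replicated_key {
	struct replica_ptr ptrs[4];
	size_t nr_ptrs;
};

/*
 * Remove every pointer to device dev, compacting the array in place.
 * Returns 0 if at least one replica survives. If every pointer was on dev,
 * nothing is copied, the key is left unchanged, and -EIO is returned,
 * because the only copies of the data were on that device.
 */
static int drop_dev_ptrs(struct replicated_key *k, unsigned dev)
{
	size_t i, j = 0;

	for (i = 0; i < k->nr_ptrs; i++)
		if (k->ptrs[i].dev != dev)
			k->ptrs[j++] = k->ptrs[i];

	if (j == 0)
		return -EIO;
	k->nr_ptrs = j;
	return 0;
}
```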
2017-12-16 16:53:54 +0000 UTC
Just pushed out a patch that lets some options that could previously only be set globally be set per inode. Currently this is just checksum type and compression type, but more will be added in the future. The options are exposed as xattrs, and if you set them on a directory they'll be inherited on...
2017-12-15 16:21:18 +0000 UTC
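The per-inode option lookup described above can be modeled as "the inode's own value if set, otherwise the filesystem-wide default", with new inodes picking up their parent directory's options. This is a minimal C sketch of that idea for compression type only (checksum type would follow the same pattern). It assumes inheritance happens by copying at creation time, which the truncated post doesn't confirm, and all enum, struct and function names are hypothetical rather than bcachefs's own.

```c
#include <assert.h>

/* Hypothetical option values; COMPRESSION_UNSET means "not set on this inode". */
enum compression_opt {
	COMPRESSION_UNSET = 0,
	COMPRESSION_NONE,
	COMPRESSION_LZ4,
	COMPRESSION_GZIP,
};

struct inode_opts {
	enum compression_opt compression;
};

/* Filesystem-wide default, standing in for the global option. */
static const enum compression_opt fs_default_compression = COMPRESSION_NONE;

/* Assumption: a new inode starts with a copy of its parent directory's options. */
static struct inode_opts inherit_opts(const struct inode_opts *parent)
{
	struct inode_opts child = *parent;

	return child;
}

/* The option actually used: the inode's own value, else the global default. */
static enum compression_opt effective_compression(const struct inode_opts *inode)
{
	return inode->compression != COMPRESSION_UNSET
		? inode->compression
		: fs_default_compression;
}
```

Keeping an explicit "unset" value is what lets an inode fall back to the global setting instead of freezing whatever the default was when it was created.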
I've been back to work for several months, but I've been procrastinating and neglecting the updates...
The biggest thing I've been spending my time on lately has been improving the test infrastructure and test suite and chasing bugs. It seems I was putting this off for far too long, relying just on xfstests ...
2017-12-13 16:07:29 +0000 UTC
Took this earlier today in Nashville :)
On the bcachefs front - might be announcing a corporate sponsor in the next few days! Stay tuned.
2017-08-21 23:48:22 +0000 UTC
2017-07-18 00:19:41 +0000 UTC
If you've noticed things have been quieter lately, you haven't been imagining things - I've been busy getting ready for a rather big move, and in the process I'm taking an extended road trip. I'm pretty happy about it - I've been feeling stuck in a rut and it's been hard to make progress writin...
2017-06-13 23:48:51 +0000 UTC
As I think I mentioned a while ago, for replication the last big item left was IO error handling - that is, handling IO errors without just going read-only when we've got another replica to read from (for reads), or when only some of the replicas for a replicated write failed.
The really tricky...
2017-05-15 08:12:50 +0000 UTC
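The two error-handling cases above (reads falling back to another replica, and writes where only some replicas fail) can be sketched with a toy IO model. In this C sketch a bitmask stands in for which devices fail. A read only errors out if every replica fails, and a write counts as successful if at least one replica was written, with the failed pointers dropped. That success policy is an assumption for illustration; the post doesn't say whether the real code requires a minimum number of good replicas or re-replicates afterwards. All names are hypothetical, and real IO would be asynchronous.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical IO model: bit n of failed_devs set means IO to device n fails. */
static int dev_io(unsigned dev, unsigned failed_devs)
{
	return (failed_devs >> dev) & 1 ? -EIO : 0;
}

/* Read: try each replica in turn; fail only if every replica fails. */
static int replicated_read(const unsigned *devs, size_t nr, unsigned failed_devs,
			   unsigned *read_from)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!dev_io(devs[i], failed_devs)) {
			*read_from = devs[i];
			return 0;
		}
	}
	return -EIO;
}

/*
 * Write: issue the write to every replica, then keep only the devices whose
 * writes succeeded. Returns the number of good replicas, or -EIO (leaving the
 * list unchanged) if none succeeded.
 */
static int replicated_write(unsigned *devs, size_t *nr, unsigned failed_devs)
{
	size_t i, good = 0;

	for (i = 0; i < *nr; i++)
		if (!dev_io(devs[i], failed_devs))
			devs[good++] = devs[i];

	if (!good)
		return -EIO;
	*nr = good;
	return (int) good;
}
```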
Recently pushed a patch to add prefetching of btree nodes. It's a rather minor change compared to the stuff I'm still working on for replication, but it does improve both mount and fsck times by around 2x - not too shabby for a relatively simple change.
On larger filesystems, bcachefs's mount...
2017-04-25 14:49:53 +0000 UTC
Debugging, debugging, more debugging...
If you've been wondering about the slow progress, that's where all my time's been going. The unfortunate reality about creating a filesystem is that a filesystem, much more so than most software, isn't all that useful if it's only, say, 90% debugged - you d...
2017-04-11 05:18:01 +0000 UTC
http://bcachefs.org/
Don't have any _new_ content there yet, it's just all the existing stuff in one place. Would love to have people help out on the website.
Also, bcachefs has its own git repositories now - also linked to b...
2017-03-22 09:27:50 +0000 UTC
It's been far too long since the last announcement - lots of stuff has been happening. The biggest milestone has been all the breaking on-disk format changes finally landing, but there's been lots of other stuff going on, too.
On the subject of the breaking on-disk format ch...
2017-03-16 00:04:34 +0000 UTC
First off, some background on where we're at currently, regarding metadata IO:
- A userspace process will never block on IO - i.e., wait for a journal write or a btree node write - unnecessarily. Never ever. The only reason your userspace process will end up blocked waiting for a meta...
2016-12-03 03:59:18 +0000 UTC
Lately, testing is becoming the big bottleneck - I really need more people willing to try out the latest code and make sure it isn't going to eat anyone's data before I push it out for general consumption. I do a lot of testing myself already - honestly, that's where most of my time goes - but ...
2016-11-06 08:27:02 +0000 UTC
These patches haven't landed yet, and the numbers should be higher when I'm done - but the fs_mark numbers are now looking really nice. Delete performance is massively improved, too.
Time to completion for fs_mark -v -n 200000 -s 4096 -k -S 1 -D 1000 -N 1000 -t 10:
bcachefs: ...
2016-10-24 10:48:01 +0000 UTC
Tiering should finally be working with the last big batch of fixes I pushed.
Chris Halse Rogers (RAOF in the #bcache IRC channel) has been testing it. He has been seeing an intermittent deadlock while copying large amounts of data, which may or may not be tiering-related: if anyone else hits it, I...
2016-09-13 02:10:00 +0000 UTC
First off, a word about definitions. In bcachefs, tiering is caching by another name: storage devices can be assigned to different tiers, and we can use a faster tier to cache a slower tier.
In some other storage systems, tiering means a setup where data can be dynamically moved between different ...
2016-09-13 01:40:07 +0000 UTC
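As a rough picture of "tiering is caching by another name", here is a deliberately simplified C model: a tiny direct-mapped fast tier in front of a slow tier that always holds every block. A read is served from the fast tier on a hit; on a miss it is read from the slow tier and a copy is promoted into the fast tier, evicting whatever held that slot. Since the slow tier is authoritative in this model, eviction never loses data. The struct, slot count and function names are hypothetical, and this is not bcachefs's actual promote or writeback logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAST_TIER_SLOTS 4

/* Hypothetical fast tier: a tiny direct-mapped cache of block numbers. */
struct fast_tier {
	bool valid[FAST_TIER_SLOTS];
	unsigned long block[FAST_TIER_SLOTS];
	unsigned long hits, misses;
};

/*
 * Read a block: serve it from the fast tier if it's cached there; otherwise
 * read it from the slow tier and promote a copy into the fast tier.
 */
static void tiered_read(struct fast_tier *t, unsigned long block)
{
	size_t slot = block % FAST_TIER_SLOTS;

	if (t->valid[slot] && t->block[slot] == block) {
		t->hits++;
		return;
	}

	t->misses++;
	/* The slow-tier read would happen here; this model only tracks placement. */
	t->valid[slot] = true;
	t->block[slot] = block;
}
```

For example, reading blocks 7, 7, 11, 7 gives one hit and three misses, because 7 and 11 map to the same slot and keep evicting each other.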
Just saw this really excellent article about disk encryption - it explains the issues with encryption at the block layer better than I could:
http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/ ...
2016-09-06 04:21:00 +0000 UTC
- Encryption's mostly done; got some useful feedback on the design doc.
- Starting to work on multiple devices and replication again. Found some "there's no way this could have possibly worked" bugs with tiering - evidently I've neglected all the multiple device stuff for too long.
2016-09-05 02:40:02 +0000 UTC
Been studying random papers/RFCs/Dan Bernstein's code and figuring out the plan for adding encryption to bcachefs... doing crypto right is hard. In storage land, I'm not sure anyone really gets it right - if you're doing block storage (e.g. dm-crypt), or if you're adding encryption to an existing fi...
2016-08-07 16:10:52 +0000 UTC
I'm kicking myself for not noticing this sooner (most likely I saw it months ago and then forgot about it because I'm terrible about taking notes...). Do not use bcachefs with compression enabled yet - if copygc ever has to run you'll very soon hit a BUG_ON().
The issue is that copygc will often h...
2016-08-06 05:12:49 +0000 UTC
This is my current project, so I thought I'd write something about it and how this area of bcachefs works.
So, for some background: every remotely modern filesystem has some sort of facility for what database people just call transactions - doing complex operations atomically, so that i...
2016-08-02 12:40:59 +0000 UTC
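To illustrate the "doing complex operations atomically" idea in the transactions post above (think of a rename, which has to remove one directory entry and create another), here is a generic C sketch of an all-or-nothing batch commit on a tiny in-memory key-value store: every update is validated before any is applied, so a failed commit never leaves half the batch in place. This shows only the in-memory all-or-nothing property. Crash consistency and concurrent readers aren't modeled, and none of the names come from bcachefs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_KEYS 8

/* Hypothetical tiny store: key i holds vals[i] if present[i] is set. */
struct kv_store {
	bool present[NR_KEYS];
	int vals[NR_KEYS];
};

struct kv_update {
	size_t key;
	int val;
};

/*
 * Apply a batch of updates all-or-nothing: validate every update first, and
 * only if all are valid, apply them. On error the store is left exactly as
 * it was.
 */
static int kv_commit(struct kv_store *s, const struct kv_update *u, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (u[i].key >= NR_KEYS)
			return -EINVAL;

	for (i = 0; i < nr; i++) {
		s->present[u[i].key] = true;
		s->vals[u[i].key] = u[i].val;
	}
	return 0;
}
```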
The last bit - disk space accounting - is finished, so it should actually be useful now.
Please test it out - I'd like to hear how well it's working for people.
Currently lz4 and gzip are supported, and lz4 is the recommended option. I'd like to add more compression algorithms in the...
2016-08-02 10:39:12 +0000 UTC
Thank you so much for the support this month!
2016-07-31 23:59:00 +0000 UTC
Finally figured out how to make compressed disk usage accounting work. It's a surprisingly thorny issue - I'll have to write more about it later.
The TL;DR is - disk usage is only allowed to increase when you're getting a disk reservation (which is also where you'd get -ENOSPC). We have to be ...
2016-07-26 11:34:27 +0000 UTC
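The rule in the accounting post above (disk usage may only increase when taking a disk reservation, which is also where -ENOSPC comes from) can be sketched in C. In this model a writer reserves sectors up front, consumes part of the reservation when the data is actually written (for instance fewer sectors than reserved, if it compressed well), and returns the rest afterwards. Reserving for the uncompressed size is an assumption for illustration, and the struct and function names are hypothetical, loosely echoing the post's terminology rather than bcachefs's actual API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical filesystem-wide accounting, in sectors. */
struct fs_usage {
	unsigned long capacity;
	unsigned long used;     /* includes outstanding reservations */
};

struct disk_reservation {
	unsigned long sectors;  /* reserved but not yet written */
};

/*
 * The only place usage is allowed to grow: take a reservation up front,
 * failing with -ENOSPC if the space isn't available.
 */
static int disk_reservation_get(struct fs_usage *fs, struct disk_reservation *res,
				unsigned long sectors)
{
	if (fs->capacity - fs->used < sectors)
		return -ENOSPC;
	fs->used += sectors;
	res->sectors += sectors;
	return 0;
}

/*
 * Data was written using 'sectors' of the reservation. fs->used doesn't
 * change here: that space was already counted when it was reserved.
 */
static void disk_reservation_consume(struct disk_reservation *res, unsigned long sectors)
{
	assert(sectors <= res->sectors);
	res->sectors -= sectors;
}

/* Return any unused part of the reservation to the free pool. */
static void disk_reservation_put(struct fs_usage *fs, struct disk_reservation *res)
{
	fs->used -= res->sectors;
	res->sectors = 0;
}
```

For example, reserving 64 sectors, writing data that compresses to 16, and then putting the reservation leaves 16 sectors accounted as used. Because the check happens only in disk_reservation_get(), no later step can push usage past capacity.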