Tech

jimn

Hi all.

btrfs is like the very nicest development and experimentation b+tree database that pretends to be a filesystem.

My goal here is to revive and update the various prior patches adding lzma and lz4 to btrfs, since both algorithms already exist in the vanilla kernel. I'd like to undertake my first zen-maintained patch, if that's a thing... more reasons below.

So, over the years btrfs has gained some attention because it picked up where reiserfs left off with tails and compression, and then added some really cool new features like snapshots and COW and ... newer compression options.

In this matter of compression, the official btrfs wiki deems the ZSTD patches to be the end game and has rejected lz4 and lzma options since time immemorial. For whatever good reasons, these decisions appear to have been decreed as law in 2012 and 2014.

The usecases for both lzma and lz4 (highest ratio and faster-than-native decompression respectively) have been exploding in every direction, just like the rest of technology. I have a linear-access bz dump of wikipedia that occupies about 75 gigs but decompresses to 1.7 TB. Having a btrfs filesystem with comparable or better random access seems like a good thing, considering the rarity of fast storage at this capacity, or developer laptops that have old-fashioned hdd spindles, for such enormous raw data dumps.
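To make the linear-access versus random-access point concrete, here is a minimal Python sketch. It is not btrfs code. It assumes a 128 KiB chunk size picked purely for illustration, and the helper names `compress_chunks` and `read_range` are hypothetical. Each chunk is compressed independently with the standard library's `lzma` module, so a read only has to decompress the chunks it touches rather than the whole dump.

```python
import lzma

CHUNK_SIZE = 128 * 1024  # assumed chunk size, chosen only for this sketch


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size chunk on its own so chunks stay independent."""
    return [
        lzma.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_range(chunks: list[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return `length` bytes at `offset`, decompressing only the chunks touched."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    # Only chunks first..last are decompressed; the rest stay compressed.
    buf = b"".join(lzma.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return buf[start:start + length]


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for a large text dump.
    data = b"".join(b"line %d of a repetitive dump\n" % i for i in range(50_000))
    chunks = compress_chunks(data)
    assert read_range(chunks, 1_000_000, 64) == data[1_000_000:1_000_064]
    print("raw bytes:", len(data), "compressed bytes:", sum(len(c) for c in chunks))
```

The trade-off is that smaller independent chunks give cheaper random reads but a worse overall ratio than one long stream, which is part of why the choice of algorithm matters for this kind of workload.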

I also have blockchain apps on AWS where the SSD is faster than the fastest compress/decompress option, zstd, so the realtime requirements drop on the floor, along with the presumed rewards belonging to the software clients for maintaining a low-entropy ledger that grows quickly when uncompressed.

I wanted to add that I have used zen kernels in gentoo since the early days, and I'm using one I build now. Most recently, the likely usecase appears to be Steam's new Vulkan take-over, which performs well enough by assuming there will always be an on-disk texture cache. Again, another potential win for lz4, if we could only get that far with btrfs.

So back to the main point: it would be a nice thing for the community, which includes a number of recent lkml requests for lz4 in btrfs. And while we're at it, why can't we just abstract that compression feature over the existing compression libs already in the vanilla kernel? Whatever they happen to be: snappy, bz, lz4, lzma, you name it (there are also some obscure ones for memory which shouldn't be excluded without a good reason).
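The abstraction idea can be sketched in userspace as a small codec registry. This is only an illustration of the shape of such an interface, not how btrfs or the kernel does it. The `Codec` tuple, the `CODECS` table, and the `pack`/`unpack` helpers are all hypothetical names, and the sketch is limited to the codecs in Python's standard library (`zlib`, `bz2`, `lzma`), which at the time of the thread did not ship lz4 or snappy.

```python
import bz2
import lzma
import zlib
from typing import Callable, NamedTuple


class Codec(NamedTuple):
    """A compress/decompress pair behind one uniform interface."""
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Hypothetical registry: adding an algorithm means adding one entry here.
CODECS: dict[str, Codec] = {
    "zlib": Codec(zlib.compress, zlib.decompress),
    "bz2": Codec(bz2.compress, bz2.decompress),
    "lzma": Codec(lzma.compress, lzma.decompress),
}


def pack(name: str, data: bytes) -> bytes:
    """Compress with the named codec and prefix a length-tagged codec name."""
    tag = name.encode("ascii")
    return bytes([len(tag)]) + tag + CODECS[name].compress(data)


def unpack(blob: bytes) -> bytes:
    """Read the codec name from the prefix and dispatch to its decompressor."""
    n = blob[0]
    name = blob[1:1 + n].decode("ascii")
    return CODECS[name].decompress(blob[1 + n:])


if __name__ == "__main__":
    payload = b"btrfs " * 10_000
    for name in CODECS:
        blob = pack(name, payload)
        assert unpack(blob) == payload
        print(name, len(blob))
```

Because the reader dispatches on the stored tag, callers never need to know which algorithm wrote a given blob, which is the property the question above is asking for.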

What is involved with this journey? I've no problem handling the C programming specifics, but I'm sure there is a cultural stairway to kernel patching to meet all the requirements and expectations for something shared with the community. I'd love to have some guidance and mentoring here. Where to begin?

Thanks
Jim

< Edited by jimn :: Mar 23, 21, 12:17 >

techAdmin

The first logical place to start, besides typing more carefully and using upper/lower case correctly so people will read your thoughts in the first place, is to learn why these compression types have been rejected by the kernel guys.

Look up all the relevant kernel threads and read them completely until you really understand them, not superficially. Avoid ignoring views that don't agree with yours, and try to understand how they are thinking about these issues.

Any time you are dealing with filesystems you are dealing with very critical code, where a single misunderstood thing can lead to full data loss and file system corruption. I have, for example, zero interest in using btrfs for this reason: it's too complicated, does too much, or tries to do too much, and thus has many more ways to fail than clean basic stuff like ext4.

I believe it's somewhat arguable in the first place if btrfs is even truly ready for prime time. I know reiser3 and 4 never were; they were experimental and unstable. I know because I used them, which was funny to watch as they failed.

With all this said, obviously, in a sense, the place to start is to run it on your own kernel, then on other installs you have access to that aren't just toy installs but which do real work in complicated ways, to stress the stuff and see if you can find the weak spots. I doubt anyone will volunteer to run alpha quality kernel code, so you'd have to get really good at testing and debugging, and above all, avoid the 'it works in a vm, and on my laptop' trap, both utterly simplistic situations in terms of making issues and bugs appear.

jimn

Thanks.

I'm aware of the risks; reiser4 and btrfs have chewed a few partitions for me over the years. btrfs is not in the same category as xfs and ext4. It would be a bad choice for the work-horse usecases, but will suffice for random access at scale where you probably have abusive hierarchies and file counts.

I'm also not a full-time kernel developer, and I would like to know that if I prove a btrfs partition can accomplish x, y, and z usecases in, for instance, NLP or blockchain, the same thing won't happen that came of openmosix: an advanced kernel for big compute that was quashed by the experiment ending and the submitter's moving on. Eventually, progress in the vein of supporting apache servers killed a seriously advanced multi-node architecture that was a decade ahead of hadoop.

So I'm not averse to proving the point that there is efficiency to gain by exactly what is in the title: a nudge, not an ambitious effort. A couple of years down the road it should be something I or someone I work with can pick up and resume without going back to point-in-time patches, which for whatever reason pose a bandwidth or mindshare problem for the (1) core maintainer who decides what goes forward, in what is a unilateral decision about btrfs. The original author has moved on btw; we're dealing with a torch-bearer situation. The reasoning is different: coolness factor takes a back seat to whatever issue/bugtracking metrics dominate.

That brings me back to zen kernels: profitable to develop against the leading edge or, in this case, 15-year-old compression algorithms which already exist in VFS implementations in the vanilla kernel, developed against a different set of goals and judgement. I don't disqualify squashfs and f2fs as being viable for lz4 and lzma usecases, but those are not specifically a b+tree disguised as a VFS. I think I mentioned that up front.

As for the maintainer's reasoning against lz4, to paraphrase: "lzo is good enough and we don't need another". I don't think anyone shares that opinion in 2021, or would look up the factors and agree with that as a potential final state.

Thank you for your kind advice. Your wisdom in the area of kernel is appropriate for kernel-oriented priorities as a caution against overlooking hubris, but it's actually the case that I saw the patches were in zen once upon a time and thought nothing of it for about 6 years. Then I noticed someone was asking for it right around the time I was embarking on an application of the value potential.

Cheers!

patterns.com

tech forums