I am an extreme moderate

January 22, 2012

Snappy compression added to BTRFS

Filed under: Uncategorized — niezmierniespokojny @ 2:55 pm

Recently, Intel’s Andy Kleen published a set of patches that add Snappy compression support to BTRFS.
My first reaction was:
What for?
BTRFS already has LZO, which is stronger and compresses faster. Decompression is slower, but in all real cases the bottleneck will be the disk, so higher strength should lead to lower disk use and better performance with reads too.
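To put rough numbers on that argument, here is a minimal sketch of a two-stage read model. It assumes the disk and the decompressor run as a pipeline and ignores everything else (seeks, caching, CPU contention). The function name and all figures are hypothetical and are not measurements of LZO or Snappy.

```python
def effective_read_mbps(disk_mbps, ratio, decompress_mbps):
    """Uncompressed MB/s reaching the application in a simple two-stage model.

    The disk delivers disk_mbps of compressed data, which expands to
    disk_mbps * ratio of uncompressed data; the decompressor caps the result.
    """
    return min(disk_mbps * ratio, decompress_mbps)


# All numbers below are hypothetical, chosen only to illustrate the shape
# of the argument; they are not measurements of LZO or Snappy.
DISK_MBPS = 100.0
stronger = effective_read_mbps(DISK_MBPS, ratio=2.1, decompress_mbps=600.0)
faster = effective_read_mbps(DISK_MBPS, ratio=2.0, decompress_mbps=1200.0)

print(f"stronger codec: {stronger:.0f} MB/s")
print(f"faster codec:   {faster:.0f} MB/s")
```

As long as the disk term is the smaller one, the codec with the higher ratio wins on reads regardless of its decompression speed. The picture only flips when storage is fast enough for the decompressor cap to become the limit.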
So what reason does Andy provide?

“snappy is a faster compression algorithm that provides similar compression as LZO, but generally better performance.”

Hmm.
I benchmarked these algorithms on several machines, and on several more in tests that never got published, and LZO was definitely faster at compression (though slower at decompression).
In an older post, Andy shared some details:

“This has been tested with various test workloads and also in a real distribution. One challenge with the benchmarks is that a lot of of benchmarks only read/write zeroes, which are a uncommon good case for IO compression. However even with modified workloads that use different patterns gains are seen.

In general snappy performs better with a 64bit CPU, but also generally outperforms LZO on a 32bit Atom system on IO workloads.

The disk space usage is similar to LZO (within 0.2%)”

I have a suspicion.
He probably tested on largely incompressible data. In my tests, the size delta was in the range of 0.8-1.5%, so way bigger than his. And it happens that on incompressible data, Snappy is indeed faster than LZO.
But if your data is incompressible why use compression at all?
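The effect behind that suspicion is easy to reproduce in spirit. The sketch below is only an illustration: it uses two zlib levels from the Python standard library as stand-ins for a faster and a stronger compressor (it does not run Snappy or LZO), and the test inputs are made up. It shows how the size gap between two compressors collapses once the data is incompressible.

```python
import os
import random
import zlib


def size_delta_percent(data, fast_level=1, strong_level=9):
    """Gap in compressed size between two settings, as % of the input size."""
    fast = len(zlib.compress(data, fast_level))
    strong = len(zlib.compress(data, strong_level))
    return 100.0 * (fast - strong) / len(data)


# Hypothetical test inputs: seeded random words (compressible) and
# random bytes of the same length (incompressible).
WORDS = [b"btrfs", b"extent", b"inode", b"snapshot", b"subvolume", b"checksum",
         b"block", b"tree", b"leaf", b"node", b"mount", b"balance",
         b"scrub", b"device", b"chunk", b"stripe"]
rng = random.Random(0)
compressible = b" ".join(rng.choice(WORDS) for _ in range(100_000))
incompressible = os.urandom(len(compressible))

delta_text = size_delta_percent(compressible)
delta_random = size_delta_percent(incompressible)
print(f"compressible:   {delta_text:.2f}% of input")
print(f"incompressible: {delta_random:.2f}% of input")
```

A test set dominated by incompressible files would pull the overall delta toward the second figure, which is one way a very small gap could show up in a benchmark.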

It would be nice if Andy clarified what data he used and gave more details about the test setup.

I’ve been thinking about it for a while and I just don’t see a really compelling use case for Snappy in BTRFS. There are some cases, like a mix of highly compressible and incompressible data, where it would be better than LZO, but only slightly.

7 Comments »

  1. Still looking to implement your own compressed filesystem patch?

    Comment by Yann Collet (@Cyan4973) — January 25, 2012 @ 5:32 am

    • Yeah, though there are 2 ‘but’s:
      – Not for BTRFS. That’s not something I ever planned and I don’t think there would be much use for it. LZO covers the ground for fast algorithms very nicely. zlib isn’t bad either; I didn’t look much, but I don’t think there’s any worthwhile open source competitor. With my current state of knowledge, 7-Zip’s implementation of Deflate64 seems like the only thing that might be better. Which is incredible, considering how old it is. And it looks like a lot of work for little benefit. It would be different if BTRFS allowed offline compression, but for now it doesn’t.
      – I wanted to do it for ZFS, which could really use a better fast algorithm. But now I have zero time for it, so it got delayed indefinitely. Also, your LZ4 is in swift development, which makes me want to wait and see. In the latest, unpublished evaluation it looked great because it scaled with multiple threads better than any other algorithm that I tried, but it has changed so much since I made the test that I don’t really know if it still has the advantage. I’d have to recheck it, and while I expect that it would look even better, I don’t know. I don’t have the time to learn it, and even if I found some, the results would likely be irrelevant again in half a year, which makes me unwilling to invest the time. It’s an interesting effect: on one hand, seeing how alive the project is is very encouraging, yet the variability drives evaluation costs much higher. Also, I reckon that there’s no need to hurry. There are still critical bugs appearing in LZ4 from time to time, so I consider it suitable only for experimentation ATM. And, sadly, as long as you keep turning the code upside down, the bugs will keep appearing. So while I think there’s great value in improving LZ4 performance in all the ways that you can find, as long as you do it, LZ4 won’t advance from being a toy to being a tool.

      Comment by niezmierniespokojny — January 26, 2012 @ 6:15 pm

  2. Well, everything changes. Snappy got an update January 4th, but let’s look beyond the tiny scope of compression algorithms : GCC changes, Linux evolves, ZFS updates, etc. You can’t expect everything frozen before doing an evaluation. It will necessarily be a snapshot into a moving timeline, which is nonetheless useful.

    Note : LZ4 should receive a final update with r52, specifically targeting PowerPC architecture, so it should have little impact, if any, for x86.

    Comment by Yann Collet (@Cyan4973) — January 28, 2012 @ 1:17 pm

    • If you compare LZ4 from half a year ago with today’s, it’s entirely different code. All tests from back then are somewhat relevant wrt strength and entirely irrelevant wrt speed. I know that software constantly changes and all evaluations become obsolete at some point, but if I wanted to stay roughly up to date with LZ4 and LZO, the cost of doing so with LZ4 would be way bigger. I find that with my time being very scarce nowadays, I just can’t afford to do good tests of LZ4 often enough. And since I can’t be roughly up to date anyway, I’m not willing to invest in it at all, at least until the pace slows down to the point where I can keep up.

      What do you mean by ‘final update’?
      You ran out of ideas? Not for the first time. And I think not for the last either. 🙂

      Comment by niezmierniespokojny — January 28, 2012 @ 8:55 pm

      • Well, I’m already beyond what I’ve ever planned for LZ4. So yes, pending external contributions, I don’t see much more optimisation left, at least for current x86 CPUs.
        The only field where I see some potential left is ARM optimisation. But I have few tools and little time to do it, so it’s very likely that I will wait for a request before investigating it further.

        For LZ4 HC, it’s a bit different : it can still be used as a testing ground for new parsing ideas. By the way, an interesting update is ongoing, to replace MMC with a methodology based on hash chain, as explained here : http://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html
        It should reduce memory usage and be more suitable for low-end CPUs such as Duron/Celeron and siblings.

        Comment by Yann Collet (@Cyan4973) — January 29, 2012 @ 2:23 pm

  3. Well, no relapse then ….
    http://code.google.com/p/lz4/issues/detail?id=7

    Comment by Yann Collet (@Cyan4973) — February 1, 2012 @ 4:32 am

    • 🙂
      The end is not near.
      And I don’t believe that you have reached the end of x86 optimisations. I see how you keep learning new tricks (like the hardware replacement for De Bruijn) and I’m sure there’s still more left. I wouldn’t be surprised to see algorithmic improvements either.

      Comment by niezmierniespokojny — February 5, 2012 @ 3:05 pm

