Recently, Intel’s Andi Kleen published a set of patches that add Snappy compression support to BTRFS.
My first reaction was:
What for?
BTRFS already has LZO, which is stronger and compresses faster. Decompression is slower, but in practically all real cases the bottleneck will be the disk, so a better compression ratio should mean lower disk use and better read performance too.
So what reason does Andi provide?
“snappy is a faster compression algorithm that provides similar compression as LZO, but generally better performance.”
Hmm.
I benchmarked these algorithms on several machines, and ran unpublished benchmarks on several more. In all of them LZO was definitely faster during compression (but slower during decompression).
In an older post, Andi shared some details:
“This has been tested with various test workloads and also in a real distribution. One challenge with the benchmarks is that a lot of of benchmarks only read/write zeroes, which are a uncommon good case for IO compression. However even with modified workloads that use different patterns gains are seen.
In general snappy performs better with a 64bit CPU, but also generally outperforms LZO on a 32bit Atom system on IO workloads.
The disk space usage is similar to LZO (within 0.2%)”
I have a suspicion.
He probably tested on largely incompressible data. In my tests, the size delta was in the range of 0.8-1.5%, so way bigger than his 0.2%. And it happens that on incompressible data, Snappy is indeed faster than LZO.
But if your data is incompressible, why use compression at all?
It would be nice if Andi clarified what data he used and maybe gave more details about the test setup.
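To show the kind of detail I mean, here is a minimal sketch that reports compression ratio and speed separately for incompressible, compressible and mixed data. It is not Andi's setup and not my benchmark either: zlib at level 1 is only a stand-in, because neither Snappy nor LZO ships with Python's standard library, and the helper names (`measure`, `stand_in_compress`) and the sample data are made up for illustration. To compare the real codecs you would pass their compress functions to `measure` instead.

```python
import os
import time
import zlib


def measure(compress, data):
    """Return (compressed size / original size, compression speed in MB/s)."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)
    speed = len(data) / elapsed / 1e6 if elapsed > 0 else float("inf")
    return ratio, speed


def stand_in_compress(data):
    # Assumption: zlib level 1 is only a placeholder for Snappy or LZO.
    return zlib.compress(data, 1)


SIZE = 1 << 20  # 1 MiB per sample

samples = {
    # Random bytes: effectively incompressible.
    "incompressible": os.urandom(SIZE),
    # One short line repeated: highly compressible.
    "compressible": (b"btrfs snappy lzo benchmark line\n" * (SIZE // 32 + 1))[:SIZE],
}
# Half random, half repetitive: the "mixed" case.
samples["mixed"] = (
    samples["incompressible"][: SIZE // 2] + samples["compressible"][: SIZE // 2]
)

results = {name: measure(stand_in_compress, data) for name, data in samples.items()}

for name, (ratio, speed) in results.items():
    print(f"{name:15s} ratio={ratio:.3f} speed={speed:.0f} MB/s")
```

Reporting the numbers per data type like this makes it obvious whether a "within 0.2%" size difference comes from data that barely compresses in the first place.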
I’ve been thinking about it for a while and I just don’t see a really compelling use case for Snappy in BTRFS. There are some cases, like a mix of highly compressible and incompressible data, where it would be better than LZO, but only slightly.