I am an extreme moderate

August 14, 2011

Synthetic test of filesystem compression, part 1

Filed under: Uncategorized — niezmierniespokojny @ 5:31 pm

This is the first post of a series.
Part 2, single core performance.

There are several file systems that compress your data on the fly. The most notable ones are Windows’ NTFS, Linux’s BTRFS, and ZFS, which is used by several operating systems. It’s a great feature: it saves disk space and in many cases improves performance.
I created a benchmark that allows a fairly accurate evaluation of various compression algorithms that are (or can be) used for this purpose. Thanks to Przemysław Skibiński, whose more generic benchmark I relied on heavily. I replicated the way compression works in ZFS almost exactly. The differences are for the sake of simplicity and shouldn’t have any measurable impact. First, some background:
Hard disks store data in ‘sectors’ of usually 512 or 4096 bytes. A sector is the smallest unit that can be read or written at a time.
The time needed to read or write a 512 byte sector is almost the same as for 8 such sectors, but for larger transfers it starts to grow. Therefore filesystems store data in blocks of usually 4 KB. With compression, data is first split into blocks and then each block is compressed. When a block compresses so poorly that it would not free at least one full sector, it’s left uncompressed. Please note that with a 4k sector and a 4k block you can’t save anything with compression, because each block occupies exactly one sector either way.
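The storage rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not the benchmark's code and not ZFS's implementation. The helper name `stored_size` is made up for the example, and zlib stands in for whichever compressor is being tested.

```python
import os
import zlib

def stored_size(block: bytes, sector: int, level: int = 6) -> int:
    """Bytes one block occupies on disk (hypothetical helper, not ZFS code)."""
    def round_up(n: int) -> int:
        # Disks work in whole sectors, so every size is rounded up.
        return -(-n // sector) * sector

    raw = round_up(len(block))
    packed = round_up(len(zlib.compress(block, level)))
    # Keep the compressed copy only if it frees at least one full sector.
    return packed if packed < raw else raw

compressible = b"a" * 4096
random_like = os.urandom(4096)            # effectively incompressible

print(stored_size(compressible, 512))     # shrinks to a single 512-byte sector
print(stored_size(compressible, 4096))    # 4k sector, 4k block: nothing saved
print(stored_size(random_like, 512))      # stored uncompressed
```

Rounding both sizes up to whole sectors before comparing them is what makes the 4k sector, 4k block case a guaranteed no-op.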

In this post I concentrate purely on the compression strength of different algorithms in different scenarios. Performance will come later.
I tested the following algorithms:

  • LZ4 r11
  • snappy 1.0.3
  • LZJB 2010
  • LZO 2.0.5 1x_1, 1x_999
  • quicklz 1.5.1 -1
  • zlib 1.2.5 -1, -6, -9

ZFS uses LZJB and zlib; BTRFS uses LZO (there are about 20 variants and I don’t know which one) and zlib; NTFS uses a proprietary algorithm.

    I used 2 data sets:

  • Silesia Compression Corpus, which is general purpose and
  • TCUP – my own data meant to represent software distribution

    I tried sector sizes of both 512 bytes and 4 KB and varied block size from 4 KB to 128 KB (the maximum value in ZFS).
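A sweep of this kind can be approximated with a short script. The sketch below is not the benchmark used for this post: it only covers zlib (the one tested codec that ships with Python), the function name `on_disk_percent` is invented for the example, and the input is synthetic stand-in text rather than Silesia or TCUP, so its numbers say nothing about the results reported here.

```python
import zlib

SECTORS = (512, 4096)
BLOCKS = tuple(4096 << i for i in range(6))   # 4 KB, 8 KB, ... 128 KB

def on_disk_percent(data, block_size, sector, compress):
    """Size on disk as a percentage of the original (hypothetical helper)."""
    total = 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        # Both candidates are rounded up to whole sectors.
        raw = -(-len(block) // sector) * sector
        packed = -(-len(compress(block)) // sector) * sector
        # A block is stored compressed only if that frees a full sector.
        total += min(raw, packed)
    return 100.0 * total / len(data)

# Synthetic stand-in data, trimmed to a multiple of the largest block size.
lines = b"".join(b"line %d of a synthetic log file\n" % i for i in range(20000))
data = lines[:4 * 131072]

for level in (1, 6, 9):
    for sector in SECTORS:
        row = [
            on_disk_percent(data, b, sector, lambda blk: zlib.compress(blk, level))
            for b in BLOCKS
        ]
        print(f"zlib -{level}, {sector:>4} B sectors:",
              " ".join(f"{p:5.1f}" for p in row))
```

Passing the compressor in as a callable means other codecs could be plugged in through their own Python bindings, if those are installed.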


    The x-axis shows compressed size as a percentage of the original.

    Conclusions?

    1. One thing that I never saw mentioned in ZFS guides is that larger blocks improve compression by a lot. In some cases it may halve your storage needs. I guess people skip it because block size has a huge impact on performance and you should never change it unless you know what you’re doing. However, if you do know, it’s a trick that’s worth remembering.
    [UPDATE]
    Since I wrote this post I learned that 128k is not only the maximum, it’s also the default block size in ZFS. So you can only lose here. It’s still valid for BTRFS though.
    [/UPDATE]

    2. 4k sectors offer coarser granularity, which hurts compression. In particular, an 8k block has to compress to half its size to give any savings. Not many do, so the compression ratio is dreadful. I think that with 4k drives compression is not worth it for many workloads: you need large blocks to get reasonable savings, and large blocks aren’t general purpose.

    3. LZJB sucks. It’s clearly the weakest of the contenders. Does performance save it? I’ll answer that in a later post.

    4. Some guides recommend using zlib -9 where performance doesn’t matter and LZJB elsewhere. It turns out that -6 is practically just as strong.

    5. Some algorithms scale better with block size than others. In particular:

  • LZJB scales very poorly
  • LZ4 is weaker than snappy on small blocks, but overtakes it on bigger ones
  • LZO 1x_999 scales better than zlib -1, starts weaker, but matches it at 128k

    You can download rough spreadsheets with more detailed results here.
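The granularity argument in conclusion 2 comes down to simple arithmetic: a block only shrinks on disk if compression frees one whole sector, so the smallest useful saving is sector size divided by block size. A minimal sketch, with a function name chosen just for this illustration:

```python
def min_saving_percent(block_size: int, sector: int):
    """Smallest reduction, in % of the block, that frees one whole sector.

    Returns None when the block fits in a single sector, because then
    no amount of compression changes the space used on disk.
    """
    if block_size // sector <= 1:
        return None
    return 100.0 * sector / block_size

for sector in (512, 4096):
    for block_kb in (4, 8, 128):
        need = min_saving_percent(block_kb * 1024, sector)
        print(f"{sector:>4} B sectors, {block_kb:>3} KB blocks:", need)
```

With 4k sectors an 8 KB block needs a 50% reduction before anything is saved, while the same block on 512 byte sectors already benefits from 6.25%.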

    1 Comment »

    1. Great work! 🙂

      Comment by Guest32123 — August 14, 2011 @ 10:07 pm

