I am an extreme moderate

August 20, 2011

Synthetic test of filesystem compression, part 2

Filed under: Uncategorized — niezmierniespokojny @ 7:35 pm

This post is a continuation of a series; the first part is here, please read it before reading on.
This time I’m looking at the performance of the 9 algorithms.
I performed the test on 4 computers:

  • P4:
    Pentium D 805 @ 2.66 GHz
    Windows XP x64
    Gcc 4.5.2
  • Atom:
    Atom Z530 @ 1.6 GHz
    Lubuntu 11.04 386
    Gcc 4.5.2
  • Core2:
    Core2 Duo E4400 @ 2 GHz
    Windows XP
    Gcc 4.5.2
  • Phenom:
    Phenom 2 X4 955 Black Edition @ 3.2 GHz
    PC-BSD 9.0 x64 beta 1.5
    Gcc 4.5.2
    Note: I failed to port the code to BSD in time, so on the Phenom I tested a Windows executable via Wine. It shouldn’t have a measurable impact.

    I tested the same 2 datasets as before, with a 4 KB sector size and block sizes of 8 KB and 128 KB.

    All tests were single-threaded.

    I’d like to start by noting that a benchmark comparing the performance of a whole filesystem with each particular compressor would be much too big a task; that’s not what I’m trying to do. I’m only measuring the performance of the compressors in a setting that closely resembles how filesystems use them, as described in the previous post.

    To give you an impression of the speeds involved: the fastest measured was 367 MB/s (Phenom, lzo 1x_1, TCUP, 128k) and the slowest was 1.1 MB/s (Atom, lzo 1x_999, SCC, 128k). From now on I’ll be talking about more abstract, but also more useful, performance metrics.
    First, MB/s is a bad metric because it varies so much between CPUs. I wanted to average over the different systems to avoid talking about 4 sets of very similar data, so I took zlib -9 as the baseline and measured performance as ‘how many times is it faster than zlib?’. This gives similar, though not identical, figures on all CPUs. Averaging is a simplification; I think it’s good enough for this post, but I encourage readers who want deeper insight to look at the spreadsheet attached at the end of the post.

    So the first chart for you, performance relative to zlib -9, averaged across all machines.
    speed on compressible data

    As you can see, there are 2 groups of codecs: 5 fast ones and 4 slow ones. I didn’t include anything in the middle because preliminary testing of over 50 different codecs / settings didn’t show anything really interesting there, and the charts from the previous post looked much more cramped even with just 1 more codec.

    Now there’s the matter of efficiency. The most popular approach is to chart compression speed vs. compressed size for different codecs. If one is both stronger and faster than another, it’s better. If they are close in one metric but distant in the other, you can draw conclusions about efficiency too. I propose a scheme that’s more accurate and better suited to the decompression measurements in this test.

    I propose to replace raw speed with the speed of reducing the dataset, e.g. ‘codec 1 saves 5 MB per second of work, until the size reaches 40% of the original’. It’s more accurate: when codec 1 gives half the savings of codec 2 in 2/3 of the time, the two aren’t comparable with the first metric, but codec 1 loses with the second one. That’s how it should be, because if you take codec 2, compress half of the data and leave the other half uncompressed, you get something just as strong as codec 1 and faster by 1/6 of codec 2’s time. Also, in cases where it doesn’t show a clear winner, it brings codecs with potentially similar characteristics closer together, making evaluations easier. A small worked example follows below.
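
    To make the arithmetic concrete, here is a minimal sketch (mine, not part of the benchmark) that computes this savings-rate metric for two hypothetical codecs matching the example above; all numbers are invented for illustration.

    #include <stdio.h>

    /* MB removed from the dataset per second of codec work. */
    static double savings_rate(double input_mb, double output_mb, double seconds)
    {
        return (input_mb - output_mb) / seconds;
    }

    int main(void)
    {
        /* Hypothetical codec 2: the full savings in time T = 1.0 s. */
        double codec2 = savings_rate(100.0, 40.0, 1.0);     /* 60 MB saved -> 60.0 MB/s */
        /* Hypothetical codec 1: half the savings in 2/3 of the time. */
        double codec1 = savings_rate(100.0, 70.0, 2.0 / 3); /* 30 MB saved -> 45.0 MB/s */

        printf("codec 2 removes %.1f MB/s\n", codec2);
        printf("codec 1 removes %.1f MB/s\n", codec1);
        /* Raw speed alone can't rank them; the savings rate puts codec 2 ahead. */
        return 0;
    }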

    So we have a great way of charting compression performance, but decompression is problematic.
    Like I said in the previous post, data is split into blocks that are compressed independently. If a block has a good enough compression ratio, the filesystem stores it compressed, otherwise uncompressed. The problem is that for each codec the data it will eventually decompress is different: not only does the amount of decompressed data differ, the particular blocks that get decompressed differ too. Here, raw speed comparisons don’t make sense; a stronger compressor will almost always lose. Saved MB per second of decompression works better. I’m not very happy with this metric, but it’s the best I see. Also, comparing different decoders on different data is bad, but that’s how it works in the real world.

    Now for the most important charts. I divided the data into 2 sets, 8k and 128k blocks, and averaged over the other parameters (CPU, dataset). Performance is relative to zlib -9 again.

    compression efficiency with 8k block

    decompression efficiency with 8k block

    compression efficiency with 128k block

    decompression efficiency with 128k block

    Conclusions?

  • If you thought that LZJB was bad, think again. It’s terrible: much slower and weaker than the competition.
  • LZO is great at compression, the fastest and fairly strong. Decompression is worse, but still not bad.
  • Snappy and LZ4 are incredibly close. In fact, when I averaged over block size too, they performed almost identically, until I added the Phenom to the data, where Snappy performs better. Anyway, Snappy is a clear winner on 8k and a less clear loser on 128k. I’d like to add that the default block size on ZFS is 128k, though with small files there are many smaller blocks too. BTRFS uses smaller blocks (I didn’t check the exact size).
  • zlib -6 is practically as strong as -9 and roughly 30 to 110% faster.
  • But there was one more claim about LZJB being good for the task: it has some smarts that detect incompressible data and give up much quicker than the competition.
    There was a test that confirmed this. My results?
    speed on incompressible data

    Well, different. I verified that what I use is indeed the same code that’s in the Oracle repository. Looking closer at the data, in the test performed by Denis and erdgeist LZJB’s speed increased by ~30%. In mine it decreased by 3-33%. I don’t know the reason.
    Furthermore, their test shows that LZO doesn’t have such a speedy way of dealing with incompressible data. In mine it does. Well, there are some 20 variants of LZO, and in the preliminary tests I chose one that has this feature.
    Also, other codecs have such modes too (though not zlib), and Snappy on the Phenom does a whopping 1.8 GB/s. A rough sketch of this kind of early bail-out follows below.
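
    To illustrate what such a mode amounts to, here is a rough sketch of a generic early bail-out. It is not LZJB’s (or LZO’s) actual code; zlib merely stands in as the codec being wrapped, and the 4096-byte probe size is an arbitrary choice. The idea: compress a small prefix first and, if it doesn’t shrink, store the block uncompressed instead of wasting time on the rest.

    #include <zlib.h>

    /* Returns the compressed size, or 0 to signal "store this block uncompressed". */
    static uLong compress_with_bailout(const Bytef *in, uLong in_len,
                                       Bytef *out, uLong out_cap)
    {
        uLong probe = in_len < 4096 ? in_len : 4096;  /* sample a prefix first */
        uLongf got = out_cap;

        if (compress2(out, &got, in, probe, 1) != Z_OK || got >= probe)
            return 0;                                 /* prefix didn't shrink: bail out early */

        got = out_cap;                                /* looks compressible, do the whole block */
        if (compress2(out, &got, in, in_len, 1) != Z_OK || got >= in_len)
            return 0;
        return got;
    }

    On truly incompressible input the expensive full pass is skipped entirely, which is why a codec with such a mode can get faster, not slower, on incompressible data.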

    I promised to give you a spreadsheet with more data. It contains the raw logs from my benchmark and a ton of statistics. If you want to learn more, it’s highly recommended reading. Click.
    Benchmark sources.

    So do we have enough data to answer the question of which codec is best for filesystem compression? No. There are at least 2 other important questions:
    -How do they work with different compilers? Linux uses GCC. FreeBSD uses Clang. Solaris, I think, SunCC.
    -How do they scale with multiple cores? Filesystem compression is inherently scalable, but are these algorithms?
    TODO.

    Update:
    Yann Collet, the author of LZ4, contacted me and suggested using the other of its 2 decoding functions, which should be faster. I will test it soon.

    Update2 (16.09.2011):
    In the meantime there have been several LZ4 updates, and today Snappy got performance improvements too. I’m glad I didn’t rush to test things, because the results would have been obsolete in 2 weeks!
    Anyway, please note that the contents of this post are outdated and should be taken with a grain of salt.

    August 14, 2011

    Synthetic test of filesystem compression, part 1

    Filed under: Uncategorized — niezmierniespokojny @ 5:31 pm

    This is the first post of a series.
    Part 2, single core performance.

    There are several filesystems that compress your data on the fly. The most notable ones are Windows’ NTFS, Linux’s BTRFS and ZFS, which is used by several operating systems. It’s a great feature: it saves disk space and in many cases improves performance.
    I created a benchmark that allows fairly accurate evaluation of various compression algorithms that are (or can be) used for this purpose. Thanks to Przemysław Skibiński for the more generic benchmark he created, which I heavily relied on. I replicated the way compression works in ZFS almost exactly; the differences are for the sake of simplicity and shouldn’t have any measurable impact. First, some background:
    Hard disks store data in ‘sectors’ of usually 512 or 4096 bytes. This is the smallest unit that can be read or written at a time.
    The time needed to read or write a 512-byte sector is almost the same as to read or write 8 such sectors, but beyond that it starts to grow. Therefore filesystems store data in blocks, usually of 4 KB. With compression, data is first split into blocks and then each block is compressed. When a block compresses poorly and you can’t save a full sector, it’s left uncompressed. Please note that with a 4k sector and a 4k block you can’t save anything with compression. A minimal sketch of this decision follows below.
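
    As a concrete illustration, here is a minimal sketch of that store-or-don’t-store decision as I understand the scheme described above (my own reading, not actual ZFS code), with zlib standing in for the codec and a 4 KB sector assumed.

    #include <zlib.h>

    #define SECTOR 4096UL                     /* assumed sector size in bytes */

    static uLong round_to_sectors(uLong n)    /* round up to a whole number of sectors */
    {
        return (n + SECTOR - 1) / SECTOR * SECTOR;
    }

    /* Returns 1 if the block should be stored compressed, 0 if stored raw. */
    static int try_compress_block(const Bytef *block, uLong block_len,
                                  Bytef *out, uLong out_cap, uLong *stored_len)
    {
        uLongf clen = out_cap;
        if (compress2(out, &clen, block, block_len, 6) != Z_OK)
            return 0;                         /* compression failed: store raw */

        /* Keep the compressed form only if it frees at least one whole sector.
         * With a 4 KB block on 4 KB sectors this can never succeed, which is
         * exactly the case mentioned above. */
        if (round_to_sectors(clen) + SECTOR <= round_to_sectors(block_len)) {
            *stored_len = clen;
            return 1;
        }
        return 0;
    }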

    In this post I concentrate purely on the strength of the different algorithms in different scenarios. Performance will come later.
    I tested the following algorithms:

  • LZ4 r11
  • snappy 1.0.3
  • LZJB 2010
  • LZO 2.0.5 1x_1, 1x_999
  • quicklz 1.5.1 -1
  • zlib 1.2.5 -1, -6, -9
    ZFS uses LZJB and zlib; BTRFS uses LZO (there are like 20 versions, I don’t know which one) and zlib; NTFS uses a proprietary algorithm.

    2 datasets:

  • Silesia Compression Corpus, which is general purpose and
  • TCUP – my own data meant to represent software distribution
    I tried sector sizes of both 512 bytes and 4 KB and varied the block size from 4 KB to 128 KB (the maximum value in ZFS).


    On the x-axis is the size as a percentage of the original.

    Conclusions?

    1. One thing that I never saw mentioned in ZFS guides is that larger blocks improve compression by a lot. In some cases they may halve your storage needs. I guess people skip it because block size has a huge impact on performance and you should never change it unless you know what you’re doing. However, if you do know, it’s a trick worth remembering.
    [UPDATE]
    Since I wrote this post I learned that 128k is not only the maximum, it’s the default block size in ZFS. So you can only lose here. It’s still valid for BTRFS though.
    [/UPDATE]

    2. 4k sectors offer coarser granularity, which hurts compression. In particular, an 8k block has to shrink to half its size to yield any savings. Not many do, so the compression ratio is dreadful. I think that with 4k drives compression is not worth it for many workloads; large blocks aren’t general purpose, and you need them to get reasonable savings.

    3. LZJB sucks. It’s clearly the weakest of the contenders. Does performance save it? I’ll answer that in a later post.

    4. Some guides recommend using zlib -9 where performance doesn’t matter and LZJB elsewhere. As it happens, -6 is practically just as strong.

    5. Some algorithms scale better with block size than others. In particular:

  • LZJB scales very poorly
  • LZ4 is weaker than snappy on small blocks, but overtakes it on bigger ones
  • LZO 1x_999 scales better than zlib -1, starts weaker, but matches it at 128k
    You can download rough spreadsheets with more detailed results here.

    August 9, 2011

    Why?

    Filed under: Uncategorized — niezmierniespokojny @ 12:32 pm

    There are massive riots happening in England right now. There is plenty of coverage, yet I can’t find any that answers the question: why is it happening?
    All point out that it started with a guy killed by the police. But that’s not the reason. It’s the spark that started the flame, but apparently there was a lot of fuel just waiting for it, some huge tensions in society. What tensions? The only thing I know about the British police is that they love Orwell (example source if you want to know more). But is this the reason? Is it part of the reason? Any suggestions where to look?

    One thing is sure. The problem was neither Twitter nor Blackberry. It’s so stupid….I’m out of words.

    UPDATE:
    found some info.

    “This is the uprising of the working class, we’re redistributing the wealth,” said Bryn Phillips, a 28-year-old self-described anarchist, as young people emerged from the store with chocolate bars and ice cream cones.

    Phillips claimed rioters were motivated by distrust of the police, and drew a link between the rage on London’s street and insurgent right-wing politics in the United States. “In America you have the tea party, in England you’ve got this,” he said.

    As the unrest spread, some pointed to rising social tensions in Britain as the government slashes 80 billion pounds ($130 billion) from public spending by 2015 to reduce the huge deficit, swollen after the country spent billions bailing out its foundering banks.

    The past year has seen mass protests against the tripling of student tuition fees and cuts to public sector pensions. In November, December and March, small groups broke away from large marches in London to loot. In the most notorious episode, rioters attacked a Rolls-Royce carrying Prince Charles and his wife Camilla to a charity concert.

    However, the full impact of spending cuts has yet to be felt and the unemployment rate is stable, although it is highest among youth, especially in areas like Tottenham, Hackney and Croydon.

    Some locals insisted that joblessness was not to blame. “We are going to get people blaming the economy and what happened last week, but that’s not the real reason this happened,” said Brixton resident Marilyn Moseley, 49. “It’s just an excuse for the young ones to come and rob shops.”

    Young ones don’t just go and rob shops because they can, and certainly not en masse, so I think Marilyn is wrong.

    More views from Reuters. Sorry, I can’t figure out how to link to particular messages.

    Hackney pensioner Lloyd said the violence had been inevitable as today’s children no longer had any respect for authority.

    “It was just a matter of time. We’ve lost control of the kids.

    “Three things have come together to help them act like this – Facebook, Twitter and YouTube – because they can now communicate with like-minded kids.

    The riots have been in the making for several years. Its the Bush-Blair wars that set the economic down turn on course. This was compunded by the UK plugging itself closer, as a bridge betwixt, to the US and Europe at the expense of the Commonwealth. The tipping point has been the Cameron-Cleg hubris that has given Britain a taste of Indian style insouciant and insensitive governance instead of building a disadvantaged sensitive development programe as the central thrust in a bedraggled economy.

    Violence can not be justified. But it can be explained. There is a clear link between global economic crisis and riots in London. Interpreted to be a “small number of criminals,” blaming “family education” – a hasty conclusion. In the history of the world is much rebellion and unrest being created in this way we see – on the basis of seemingly secondary cause. Right?

    The economic crisis awakens a sense of hopelessness and meaninglessness of life in the working class and the poor. The system ignores the fact that for decades have chronic depression in the population. This uneasy feeling dampen the activity of workers in trade unions. The rich are busy saving their money. The politicians are possessed with power. Young people – there’s nowhere to soothe their painful feelings. Psychological needs of young people has been ignored for dangerous long period of time. Chronic psychological frustration, which found its “trigger” – that is the image we see, let’s be honest.

    Young people, using crash and burn – are guilty. But politicians need to acknowledge their responsibility.

    Also, some data:
    Percentage of population at risk of poverty in Europe
    List of countries by public debt

    More from Reuters:

    The issue is a lack of education and civic pride. Too stupid to realise they are destroying their own communities.

    UPDATE:
    A psychological view from BBC

    Another update:
    It seems that the number of people asking and answering the question keeps increasing.
    Recommended reading:
    http://falkvinge.net/2011/08/10/the-consequence-of-no-consequences/

    The way to fight piracy

    Filed under: Uncategorized — niezmierniespokojny @ 10:06 am

    People have found numerous ways to prevent others from sharing their works.
    Sue companies for trillions.
    Sue small users for millions.
    Lock the stuff down so much that even your customers can’t use it.

    Now mannheim discovered another creative attempt, from Universal Music:
    Make your product bad enough, so pirates will find something better to share.
    In this case, it was audiophile music with purposefully added distortion. Who would pirate it? Nobody. It’s so simple and effective that it makes me wonder how come nobody thought of it before.
    Actually, they killed 2 birds with one stone. It is assumed that different shops received tracks distorted in different ways, which can act as a watermark, so if the scheme fails and somebody is too lazy to find a better copy, they can track the leak more easily.

    Way to go.

    Down.

    August 3, 2011

    Lost emails

    Filed under: Uncategorized — niezmierniespokojny @ 8:11 am

    My thumb drive died yesterday. I had 2 important things on it: Thunderbird and Miranda IM. I backed up Miranda less than 2 weeks ago, so that was not a big deal. However, my latest Thunderbird backup is from April.

    A long time ago I thought about the risk of losing my email database and simply configured it to never delete messages from the server. Nowadays, with many gigabytes of space on an average email account, there’s no reason to delete them.
    But received emails are not everything. There are also sent ones, and I lost those. I could have prevented this loss too: Thunderbird can automatically BCC all sent messages to any address you want, so I just created an account that will be used solely for backups. I recommend that you do the same.
