I am an extreme moderate

November 28, 2011

A web-centric image compression benchmark

Filed under: Uncategorized — niezmierniespokojny @ 6:39 pm

There are quite a few established image compression benchmarks, but none are centred around images that you can find on the web. And as Google recently released their new contender in the space, I thought that such a test would be useful.

Competitors:

I tested all compressed image formats with reasonable popularity, and some less popular ones:

Webp Lossless
– the star of today’s show. Extremely slow, but supposedly strong compression and fast decompression.

Optimized PNG – an old and weak, but hugely popular image format. I optimized the images with the scheme that I use for myself:

    deinterlacing
    pngwolf
    pngout /s0 /f6
    deflopt
    defluff

I find it to be the strongest scheme that’s worth using, though clearly some disagree and make stronger but much slower ones.
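For the curious, this chain is easy to script. Here is a rough sketch of the automation; only pngout's /s0 /f6 flags come from the list above, the other command lines are assumptions, so check each tool's own help before relying on them:

    # Rough automation of the PNG recompression chain above.
    # pngout's /s0 /f6 flags are the ones from the post; the remaining
    # invocations are assumptions and may need adjusting per tool.
    import subprocess
    import sys

    STEPS = [
        ["convert", "{f}", "-interlace", "none", "{f}"],  # deinterlace via ImageMagick (assumed)
        ["pngwolf", "--in={f}", "--out={f}"],             # scanline filter search (assumed flags)
        ["pngout", "/s0", "/f6", "{f}"],                  # IDAT recompression, flags as above
        ["deflopt", "{f}"],                               # deflate stream tweaks (assumed)
        ["defluff", "{f}"],                               # final deflate cleanup (assumed)
    ]

    def optimize(path):
        for step in STEPS:
            cmd = [arg.format(f=path) for arg in step]
            subprocess.run(cmd, check=False)  # several of these tools use odd exit codes

    if __name__ == "__main__":
        for png in sys.argv[1:]:
            optimize(png)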

BCIF
– the state-of-the-art encoder with fast decoding. Nobody uses it, but it works well.

JPEG2000 – the second most popular file format in the comparison, far behind the first and far ahead of the third. I wondered whether to include it, because its decompression speed is bad; it's not really in the same league. Still, I was interested in how it fares on web images.

JPEG-LS – even less popular than JPEG2000, but quite fast, and in some tests it was overall very efficient. I used CharLS as the reference implementation. Again I had to use the splitting trick (described below).

I considered testing FLIC too, but decided against it. Much like JPEG2000 it's in a different league, and unlike JPEG2000, nobody uses it. And it's likely that it wouldn't score great because it's designed to compress photographic images.

I used the following program versions:
webpll version 2011-11-20
ImageMagick 6.7.3-8
pngwolf 2011-04-08
pngout 2011-07-02
deflopt 2.07
defluff 0.3.2
BCIF 1.0 beta for Windows
BCIF 1.0 beta Java (when the previous one crashed)
Kakadu 6.4.1
Loco 1.00

Test Data:
I downloaded all PNG images from the Alexa top 100 sites and everything 1 link away from them. I didn't block ads. This yielded ~350 MB of images with an average size of just over 17 KB. That was too much; I wanted ~100 MB. So I chose 6000 randomly and got a dataset of 105.2 MB.
Next time I would do it better though. The dataset contains a few series of very similar images; throwing away almost 3/4 of the crawl couldn't eliminate them, and I think that spidering the entire top 1000 would be better. Or maybe first-page results of the top 100 Google searches would work even better? Either selection is biased anyway; it's hard to come up with something that isn't.
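For reference, the random cut can be as simple as this (a minimal sketch; the directory names are made up, the 6000-file count is the one I actually used):

    # Trim the crawl to roughly 100 MB: shuffle the PNGs and keep 6000 of them.
    import os
    import random
    import shutil

    SRC, DST, SAMPLE = "crawl_pngs", "dataset", 6000  # hypothetical paths

    files = [os.path.join(SRC, f) for f in os.listdir(SRC) if f.lower().endswith(".png")]
    random.shuffle(files)
    os.makedirs(DST, exist_ok=True)
    picked = files[:SAMPLE]
    for f in picked:
        shutil.copy(f, DST)
    print(sum(os.path.getsize(f) for f in picked) / 1024 ** 2, "MB")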

The files take 105.2 MB as PNGs and 486.3 MB decompressed to PPMs.
The smallest image is 70 bytes, the largest 1.8 MB. The average size is 18 KB and the median 4.3 KB.
47.4% use transparency. Or rather – have transparency information embedded. In quite a few cases the info says just that the image is fully opaque, which takes some space and does nothing.
I wanted to determine how many of them came optimized. But how to tell them from the rest? I guess the best way is to see whether my optimization gets significant savings. What is significant? At first I decided: 2%. Under this definition, 9.2% of images came optimized. But later I decided to sort the files by the compression ratio achieved by optimization and plot that ratio against the percentile of files.
The results are below:

[Figure: optimization savings plotted against the percentile of files]

As you can see, the line is quite smooth, there is no clear divide between files that came optimized already and those that didn’t.
You can also note that the compression ratio goes almost down to 0 (more precisely, to 0.01004). On one file recompression saved almost 99% of the initial size; on 11, at least 95%. Quite a lot, I think.
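The percentile plot itself is trivial to reproduce. A sketch, assuming you have the original and optimized sizes per file (matplotlib is just my assumption here, not what I actually used):

    # Sort per-file optimization ratios and plot them against the percentile of files.
    import matplotlib.pyplot as plt

    def percentile_plot(pairs):
        # pairs: list of (original_size, optimized_size), one tuple per file
        ratios = sorted(opt / orig for orig, opt in pairs)
        percentiles = [100.0 * (i + 1) / len(ratios) for i in range(len(ratios))]
        plt.plot(percentiles, ratios)
        plt.xlabel("percentile of files")
        plt.ylabel("optimized size / original size")
        plt.show()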

Problems with making the test:
It’s sad to say, but the state of image manipulation software, and partly of the ecosystem in general, is just terrible. Problems?
1. There is no standard image format. When it comes to viewing, there is no problem. But with format conversion, there is. Some encoders require PNG input. Some BMP. Some PNM. And these formats are not compatible with each other. The limitations that hurt me with this test were lack of support for transparency and for bit depths greater than 8 bits per colour per pixel.
2. Encoders have limited support for input file formats. Sometimes it’s bugs, sometimes just missing features, but I’ve had problems with, I think, 3 programs because of it.
3. Encoders don’t fully support their target output. For example, nconvert can’t compress to JPEG2000 losslessly and OpenJPEG JPEG2000 encoder doesn’t support transparency.
4. Image conversion tools are limited or buggy. There were 3 image conversion tools that I tried: ImageMagick, Nconvert and ImageWorsener. The last one wasn’t really useful because it supported only BMP output and only for non-transparent files. Both ImageMagick and Nconvert would silently corrupt data from time to time.
5. The worst of all. I found no image comparison program that would work correctly. I tried 2: ImageMagick and Imagediff. Both produced both false positives and false negatives. There are 1000s of others out there, but seemingly nothing that allows batch comparisons. So I worked with terribly buggy programs that frequently corrupted my data, and I couldn’t reliably detect whether they happened to produce correct results or not. I spent a lot of time doing different tests to assess correctness, and whatever bug I found, I worked around. But there may be more. I don’t know. It’s really disheartening. A sketch of the kind of tool I was missing follows below.
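What I actually needed was just a pixel-exact batch comparison. A minimal sketch of that idea in Python with Pillow and numpy (my own stand-in, not one of the tools from the test; note it flattens everything to 8-bit RGBA, so higher bit depths would need separate handling):

    # Pixel-exact comparison of two directories of images, paired by file name.
    # Everything is decoded to 8-bit RGBA, so the container format doesn't matter.
    import os
    import sys
    import numpy as np
    from PIL import Image

    def same_pixels(a, b):
        ia = np.asarray(Image.open(a).convert("RGBA"))
        ib = np.asarray(Image.open(b).convert("RGBA"))
        return ia.shape == ib.shape and np.array_equal(ia, ib)

    def compare_dirs(dir_a, dir_b):
        for name in sorted(os.listdir(dir_a)):
            other = os.path.join(dir_b, name)
            if not os.path.exists(other):
                print("MISSING  ", name)
            elif not same_pixels(os.path.join(dir_a, name), other):
                print("DIFFERENT", name)

    if __name__ == "__main__":
        compare_dirs(sys.argv[1], sys.argv[2])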

To give you a sense of what I went through, a JPEG2000 transparency story:
OpenJPEG doesn’t support transparency at all.
Nconvert doesn’t support lossless mode at all.
ImageMagick corrupts data when used this way.
Jasper accepts only bitmaps. But not transparent ones produced by ImageMagick.
Kakadu accepts only bitmaps. But not transparent ones produced by ImageMagick.
Jasper doesn’t accept bitmaps created by Nconvert either.
But Kakadu does!
Nconvert corrupts data with such translation.

So after a lot of work, I didn’t find a way to make it work. I decided to use (unreliable) verification of Nconvert’s work, and when there were problems, I split each file into 2: one contained only colours and the other only transparency data. I compressed them independently and summed the sizes.

I did the same splitting trick with formats that don’t support transparency, JPEG-LS and BCIF.
Also, no format other than PNG supported 64-bit images. Or rather, JPEG2000 supports them, but Nconvert doesn’t. With such images I added an (unoptimized) PNG size to the listing of each codec (except for optimized PNG, which is the only one that got any savings on these images).
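The split itself is simple. A sketch of the idea with Pillow (my illustration, not the converter chain I actually fought with): the colours go into a plain RGB image, the alpha channel into a separate greyscale image, each is compressed on its own, and the two sizes are summed.

    # The splitting trick: separate an RGBA image into a colour-only image and
    # a greyscale image carrying the alpha channel, for codecs without transparency.
    from PIL import Image

    def split_alpha(path, colour_out, alpha_out):
        img = Image.open(path).convert("RGBA")
        r, g, b, a = img.split()
        Image.merge("RGB", (r, g, b)).save(colour_out)  # colours only
        a.save(alpha_out)                               # transparency only, as greyscale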

Performance:

I didn’t measure performance. The data was quite large, I couldn’t afford to leave the computer exclusively to compression, and using it while compressing skews the results. So the only hard data provided is compressed size. But to give you some idea of what’s going on, webp took over 4 days and nights of CPU time, PNG optimization half as much. All the others did their jobs in minutes.

Results:

First, some numbers:
Optimization compressed the data to 84.1%
Webpll to 65.7%
BCIF to 93.7%
JPEG2000 to 110.8%
JPEG-LS to 115.8%.

Quite interesting. BCIF, JPEG2000 and JPEG-LS all beat PNG by a significant margin in all the tests that I’ve seen before. And now they got beaten by it. I guess there is merit in naming the file format “Portable Network Graphics”.
Looking at fragmentary test results from other people, I expected webpll to be rather pale in comparison, but it leads the pack by a big margin: 21.9% over the runner-up, to be precise.
Let’s look at the details.

One picture is worth 1000 words:

But sometimes it needs explaining. Oh well.
The files are sorted by their original sizes: the small ones are on the left, the big ones on the right. Each line represents a compressor. The y-axis is the compression ratio; the smaller, the better.
Both axes have logarithmic scales. Also, to make it readable I applied a low-pass filter to the data.
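For the record, the smoothing is nothing fancy. A sketch of roughly what the chart does, with a plain moving average standing in for the low-pass filter (matplotlib assumed):

    # Files sorted by original size, per-codec compression ratios smoothed with
    # a moving average, both axes logarithmic.
    import matplotlib.pyplot as plt

    def moving_average(values, window=101):
        half = window // 2
        return [sum(values[max(0, i - half):i + half + 1]) /
                len(values[max(0, i - half):i + half + 1])
                for i in range(len(values))]

    def plot_codecs(original_sizes, codec_sizes):
        # original_sizes: per-file PNG sizes; codec_sizes: {codec name: per-file sizes}
        order = sorted(range(len(original_sizes)), key=lambda i: original_sizes[i])
        xs = [original_sizes[i] for i in order]
        for codec, sizes in codec_sizes.items():
            ratios = [sizes[i] / original_sizes[i] for i in order]
            plt.plot(xs, moving_average(ratios), label=codec)
        plt.xscale("log")
        plt.yscale("log")
        plt.xlabel("original file size (bytes)")
        plt.ylabel("compression ratio")
        plt.legend()
        plt.show()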

As you can see, except for files < 1 KB, the distance between optimized PNG and webpll is quite constant. The other formats are very bad with small images and improve steadily as sizes grow. At roughly 256 KB BCIF is the strongest of the pack, ex aequo with webpll.
Part of the gap between webpll and the rest is file format overhead. Let's look at the smallest image size that I got in each format:
webpll: 6 B
PNG: 68 B
JPEG-LS: 38 B
JPEG2000: 287 B
BCIF: 134 B.
When compressing 1-pixel files, 280 B of overhead is something.
Though it should be noted that the HTTP protocol has some overhead too, so in practice the differences would be much lower (I guess practically none). Still, the JPEG2000 overhead (assuming it's constant over all files) adds up to over 1.6 MB over the entire dataset.
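The 1.6 MB figure is simple arithmetic over the minima above, treating webpll's 6 bytes as the baseline and naively assuming the overhead is constant per file:

    # Naive per-format container overhead over the 6000-file dataset,
    # relative to webpll's 6-byte minimum.
    minima = {"webpll": 6, "PNG": 68, "JPEG-LS": 38, "JPEG2000": 287, "BCIF": 134}
    files = 6000
    for fmt, size in minima.items():
        overhead_mb = (size - minima["webpll"]) * files / 1024 ** 2
        print(f"{fmt}: ~{overhead_mb:.2f} MB")  # JPEG2000 lands at ~1.6 MB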
ADDED: Correction, my guess was wrong. People who know this stuff better than I do said that despite the HTTP overhead, the minor differences still matter.

You can download full results here.

Summary:

There are 2 classes of image codecs: those that are web-centric and those that are not. The web-centric ones care little about speed (you can find some really extreme PNG optimization scripts), but are much stronger on web data. Webp lossless worked well; it’s very slow but significantly stronger than anything else.

November 25, 2011

When I don’t report bugs

Filed under: Uncategorized — niezmierniespokojny @ 2:23 pm

Bugs are everywhere. I’m working on a test of lossless image compressors and in 2 days I found 6 in 5 different programs.
I didn’t report all of them. In 2 cases I just didn’t care; I switched to alternative products and moved on.
But there’s one that I tried to report, but didn’t succeed with the amount of effort that I was willing to put into it.
This was with ImageMagick. I found 2 critical bugs where the conversion tool would silently produce corrupted data. I went to their site to report them.
The first stop: I had to register.

What idiot thought that it was a good idea? It makes absolutely no sense. I came to them to help them and they put artificial barriers before me. What did they think, maybe that after registration I would stick around? No way, I don’t find the project interesting.

Annoyed, I proceeded with the registration. They asked me for an email address. WTH, why do you need my email? I see no use other than to spam me. I did what I always do in such cases: went to 10minutemail to create a one-time email address.

I filled in all the registration fields and they notified me that I would get a letter and should click on a link to be allowed to report the bug. Nonsense again, but expected. I went to the mailbox and refreshed the page; the email wasn’t there. I waited 5 minutes, no mail. I clicked on ‘give me another 10 minutes’, so the email wouldn’t self-destruct too early. I returned to the mailbox 8 minutes later. The confirmation letter was not there. I gave up.

It’s not the first time I’ve gone through this process; it’s usually quite similar. And today it really got me. It makes no sense. If you want me to contribute to your project, be kind enough not to waste my time with useless formalities.
Look at wikipedia. If I want to contribute to it, it’s just 1 click away. That’s how it’s supposed to be done.

ADDED: The numbers are growing. 7 bugs in 6 programs in 2 days.
ADDED: 7 bugs in 5 programs. The last issue turned out to be a bug in ImageMagick’s comparison tool, which claims that 2 identical images are different.

November 21, 2011

Webp gallery oddities

Filed under: Uncategorized — niezmierniespokojny @ 6:22 pm

I looked at the sample webp lossless images provided by Google.
Let’s see how they look after we strip the transparency…
[Images 1–5: the five sample images with transparency stripped]

Notice the oddities on 1 and 4? This is a dirty transparency optimization. You can see the basic scheme explained here. This version looks to be the most advanced one: blacking out is good, but it adds edges that are hard for the codec to predict. Here, instead of black, there’s exactly the thing that the codec would predict.
What codec?
Certainly not PNG.
I tried CryoPNG on the image, which tries to manipulate dirty transparency so it works best with PNG. The best result looked like this:
[Image 1 after CryoPNG optimization, alpha stripped]
Compression improved from 118.5 to 108.4 KB. The result can be seen below.
[Image 1 after CryoPNG optimization]
I suspected that Google chose the transparency pattern to make webp look good, so I compressed the PNG-optimized image with webpll, expecting to see the size increase. Surprise – it dropped too, from 88.1 to 82.2 KB. So I don’t have an explanation for the pattern.
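To make ‘dirty transparency’ concrete: pixels that are fully transparent are invisible, so their colour values can be rewritten to whatever the codec’s predictor likes best. Below is a sketch of the crude ‘black out’ variant plus a simple left-smearing stand-in for ‘whatever the codec would predict’ (an illustration of the general idea only, not what Google or CryoPNG actually do):

    # Dirty-transparency rewriting: pixels with alpha == 0 are invisible, so their
    # RGB values are free. "black" zeroes them; "smear" copies the pixel to the
    # left, a simple stand-in for "whatever the codec would predict".
    import numpy as np
    from PIL import Image

    def rewrite_invisible(path, out_path, mode="smear"):
        px = np.asarray(Image.open(path).convert("RGBA")).copy()
        invisible = px[..., 3] == 0
        if mode == "black":
            px[invisible, :3] = 0
        else:
            for row in range(px.shape[0]):
                for col in range(1, px.shape[1]):
                    if invisible[row, col]:
                        px[row, col, :3] = px[row, col - 1, :3]
        Image.fromarray(px).save(out_path)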

November 20, 2011

Webp lossless – first impressions

Filed under: Uncategorized — niezmierniespokojny @ 7:13 pm

Google released a lossless version of their webp codec recently. Or more likely – a totally new codec under the same brand.
It’s a good development; PNG is very bad and there’s hardly anything that could replace it. Almost all research codecs released in recent years were unsuitable for many uses, because their decompression is slow and memory hungry. The only exception that comes to my mind is BCIF. BCIF compresses fast and usually well, though it’s very poor on some kinds of images; its decompression is very fast.
There are also JPEG 2000 and Microsoft’s JPEG XR. The former is not very good and is quite slow; on weaker machines decompression is slower than transfer. I can’t tell much about the latter, nobody uses it.
So on one hand I think that any good evaluation of webpll should take BCIF as the main competitor. OTOH no large business stands behind it and it has little chance of adoption, so while it’s the state-of-the-art compressor with fast decoding, it’s actually hardly a webpll competitor. JPEG 2000 and JPEG XR both failed and don’t seem to have any future, which leaves poor old PNG as the only format in the field.

I didn’t read the Google performance report; since their original webp announcement turned out to be hugely misleading, I don’t think it’s worth my time. I downloaded the codec (surprise, no sources – yet?) and wanted to test it on a couple of my files. The first one was 27 MB uncompressed, or 10 MB as a PNG. The strongest codec on the file is BMF, which gets it down to 6.2 MB. I’d heard that webpll was very slow, so for a start I decided to try the fastest mode (0).
It took 20 minutes. And compressed to 9.6 MB. Well, I like hugely asymmetric codecs, but after the test I hoped that the stronger modes are much stronger and not much slower. Also, compression needed over 200 MB of RAM. OK, we have a baseline; let’s see the best it can do: the strongest mode, 100.

I waited for 20 minutes.
Then for one hour.
After 5 hours I started to wonder whether it would ever return. Maybe it’s hung? How can I know? I won’t be able to tell unless it returns.
After 6 hours and 7 minutes there was a crash. Out of memory. I don’t know how much was needed, but I’m pretty sure over 1 GB. Damn. It was 6 hours and 7 minutes of CPU time, but closer to 7 hours of real time wasted.

I decided to make another trial on the file, this time with the default mode. The bad thing is that the program doesn’t say what the default is. Maybe it’s 100? The program text suggests that it’s not, saying that the setting makes files denser. Maybe it’s 0? After 20 minutes I knew it wasn’t. So it could be 99 and crash too. And there are no sources to check.

While waiting I decided to make another trial. I didn’t use any of my standard images, they are all too large. I searched the disk for the smallest file that I had. It wasn’t a good one, because it had clearly been compressed in a lossy way before, but well, it’s better to have some data than none.
The file was 24-bit, 300×300, 264 KB uncompressed or 67.4 KB as PNG. I didn’t try many compressors on it, but BMF shrunk it to 53.3 KB.
webpll 0: 0.9 s., 63.2 KB. Better than on the previous file.
webpll 100: 176 s., 60.3 KB. Half way from PNG to BMF.
webpll default: 181 s., 60.5 KB. Weirdly, slightly slower. I guess it’s testing variance; the test was not meant to be accurate anyway.
For comparison, BCIF: 0.2 s., 60.2 KB. Well… testing on a single file, one that was compressed in a lossy way, on a machine loaded with another compression job, certainly isn’t accurate, but webpll doesn’t look great, does it?
And, BTW, webpll used a little over 40 MB of memory at peak.

~3 minutes / quarter MB = ~12 minutes / MB = ~5 hours 24 minutes / 27 MB. Damn, I was likely very close to finishing the first test before it crashed. And since the default mode takes about as long as 100, I terminated the test that I had running in the background.

I also tried decompression. Too bad that webpll can only be decompressed to PNG, which means that it actually first decompresses the file to a bitmap and then compresses it to PNG, which skews the timing, possibly by a lot.

Anyway, decompressing the small file took 0.091 s. BCIF needed 0.031 s. Hard to tell which one is better.

To sum up, webpll seems less than thrilling. I searched a bit and it seems that there are no independent, reasonably well done tests yet. One good tester reported that he was unable to run the test, because there’s only a 32-bit Windows executable and 32 bits are not enough to compress some 100 MB files that he uses.
So overall, the huge memory consumption and huge slowness are certain. I don’t have the data to assess webpll’s strength, but seeing that it took 6+ hours on the first file, I hoped it had something in common with MRP, which should make it about as strong as, if not stronger than, BMF. I guess it’s weaker.
I’m severely disappointed.

UPDATE:
Alexander Ratushnyak did a test of webpll in mode 0. He failed to run tests on his entire test corpus because of the mentioned memory problems, but you can see the results of other compressors here. It should be noted that the corpus covers only photographic images. And that PNGs are not optimized.
PNG size: 1 220 278 081
webpll size: 1 100 509 276
BCIF size: 975 294 955
There’s no good timing comparison, because while he shows the sizes of individual files, which let me calculate the BCIF result, he only shows timings for the entire corpus.
Time is in seconds.
webpll compression: 17 561
webpll decompression: 1 286
BCIF compression: 832.30
BCIF decompression: 167.29
If we assume that BCIF performance is the same on all files and adjust the times to reflect a smaller corpus size, the result would be:
BCIF compression: 705.43
BCIF decompression: 141.79
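For clarity, the adjustment is just a proportional rescale; the subset fraction below is the one implied by the numbers above (705.43 / 832.30 ≈ 0.848):

    # Rescale BCIF's full-corpus timings to the smaller subset that webpll managed,
    # assuming BCIF's time is proportional to the amount of data.
    full = {"compression": 832.30, "decompression": 167.29}  # seconds, full corpus
    subset_fraction = 705.43 / 832.30                        # ≈ 0.848
    for phase, seconds in full.items():
        print(f"BCIF {phase}: {seconds * subset_fraction:.2f} s")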

Still, it doesn’t look good at all, though the fastest modes are frequently inefficient.

ADDED:
Some fragmentary test results:
http://encode.ru/threads/1136-WebP-(lossy-image-compression)?p=27211&viewfull=1#post27211
http://tech.slashdot.org/comments.pl?sid=2533368&cid=38102662
http://encode.ru/threads/1136-WebP-(lossy-image-compression)?p=27195&viewfull=1#post27195
http://www.heise.de/developer/news/foren/S-Re-Ein-typischer-PNG-Anwendungsfall-fehlt-in-den-Beispielen/forum-216147/msg-21079458/read/
https://groups.google.com/a/webmproject.org/group/webp-discuss/msg/bfa4a880ef68f877
https://groups.google.com/a/webmproject.org/group/webp-discuss/msg/f99d74e73bdd3386
http://encode.ru/threads/1136-WebP-(lossy-image-compression)?p=27254&viewfull=1#post27254
http://encode.ru/threads/1136-WebP-(lossy-image-compression)?p=27261&viewfull=1#post27261
I will probably add more as I spot them. If I missed something, please let me know.

The results are mixed. 2 testers report wonderful savings, others quite bad ones, one very bad (1.7% saved over PNG).

ADDED: Actually the webpll source is available, just not next to the binaries. It’s here.
The default compression level is 95.

ADDED: Alexander Ratushnyak posted some more results. He tested c10 and c30 on a much restricted (because of crashes) set of files.
I’d like to add BCIF results again:
49 385 276 bytes – 4.6% less than c30, though a file-by-file comparison shows differences swinging wildly both ways. You can see results on individual files in his post and BCIF restricted to only those files here.

ADDED: I created a quite good benchmark involving WebP Lossless.

November 17, 2011

Coincidence

Filed under: Uncategorized — niezmierniespokojny @ 8:17 pm

Sometimes weird things happen.
I wrote the previous post today because of a discussion on some message board. In that discussion I’d been criticising people who shoot down private servers of a particular game (Metin2) as illegal just like that.
It happens that my nick on that forum is Slurp.
It didn’t take long before somebody found that a person nicknamed S.L.U.R.P runs a private Metin2 server.
And is interested in copyright too.

I wonder how many will believe that it’s not me…

Everything you know about copyright is wrong

Filed under: Uncategorized — niezmierniespokojny @ 3:15 pm

There’s a thing that’s been bothering me for quite some time: private copyright enforcement over the net. I spend a lot of time on discussion boards. From time to time somebody asks a question related to content they got from unofficial sources. Usually there’s somebody – frequently forum staff – shouting that talk about illegal things is prohibited. The topic gets closed.

The problem is that such illegality assessments are always premature. First, there are many misconceptions about copyright. I can speak for how it is in my country – and it’s very bad. There are a lot of stories that don’t hold water and many ideas about the law that are just wrong, but that doesn’t prevent people from shouting at ‘pirates’.
The bigger reason is that the internet is global and laws are very different in different places. Even if doing some things would be illegal for the shouting member, it doesn’t mean it is for the person who did it.

To give you some examples of what is legal in Poland, but not in many other places:

  • to make a copy of Avatar and give it to a friend,
  • to rip Lady Gaga’s Just Dance from internet radio station,
  • to download Sapkowski’s The Witcher pentalogy from Megaupload and enjoy the read.
  • (for the curious ones, English translation of the relevant part of the bill)

OTOH there are things that are legal in some places, but not here, e.g. I can’t take a Bach concerto and claim to be its author.

In other countries there are even weirder rules, e.g. in Iran there’s no copyright for foreign works at all.

The Internet changed many things. Globalisation of communications wasn’t followed by globalisation of law. Copyright is just one example, although one that recurs quite often. I think it’s a big problem, and one that’s growing as individual countries proceed with Internet regulations. Hardly any big site looks the same in all countries now; they just can’t. Small sites usually ignore laws in individual countries, and when threatened with sanctions they often cut a country off. Because of legislators, the Internet is not global any more. One can often read how one government or another is trying to break the Internet, but they already did it. There’s a tiny hope that they recognize what they have done and try to repair it, though. Recently India proposed to make internet regulation a duty of the UN (click). I totally like the proposal. While I don’t like how some countries have more say there than I’d give them, and I expect that laws created by my country would be better than ones from such a body, I see the damage to Internet unity as a bigger problem than not-too-great laws.

As to the post title – well, it doesn’t match the contents too well. But at least it’s catchy.
