I am an extreme moderate

March 16, 2013

Copyright infringing post

Filed under: Uncategorized — niezmierniespokojny @ 9:38 pm

I just read a TorrentFreak article on the censorship of torrentz and it really strikes me how far-fetched the claims of copyright abusers are.

They sent a DMCA notice to Google. This means they claimed that Google was infringing. According to them, Google was infringing because it linked to an infringing site – torrentz.eu.

torrentz was infringing… why? The notice doesn't mention it. But torrentz is a search engine that links to sites containing torrent files. Probably some of these sites hosted a torrent file that they considered infringing.

That file enabled downloading of another file. That other file was copyrighted and may have been distributed w/out permission.

To sum up, the link flow went like this:

  1. google
  2. torrentz
  3. some torrentz search result
  4. some site
  5. torrent file
  6. pirate swarm

So they mean that if you’re 5 links away from a file they hold copyright on, you’re a pirate.

I just linked to torrentz.eu, exactly like google did. I’m a pirate.

Actually, Paramount is a pirate too. On their main page they link to YouTube. You can find plenty of pirated files there, just another 1-2 clicks away. That's just 2-3 links in total.

I wonder how many links away you have to be before they consider you not to be a pirate. Five is too few. Is there any limit at all? Anyway, senary liability is nonsense. Just like secondary liability, only far more extreme.

Somebody should punish those punks for abusing a broken system (DMCA). And fix the system.

November 23, 2012

My way to BSD, part 1

Filed under: Uncategorized — niezmierniespokojny @ 10:37 pm

I finally made it, I made BSD (PC-BSD to be exact) my desktop OS. It’s been a long road…and one that will keep being bumpy for as long as I can foresee. Still, I believe it goes in the right direction.
I’d like to sum up the years I spent preparing….
It all started in January 2007. At the time I was a loyal Windows fan. I had spent huge amounts of time learning Windows, not without successes. I kept investing in my Windows skills and wanted to spend my life using it and my career working with it, mostly coding for it.
Then came Vista…and I was in shock.
What was so bad about it? For me, the list was longer than for most.

  • It was bloated.
    Most agree with this one.

  • It was slow.
    Again, a common point.

  • It was freaking ugly.
    I like simplicity; it was complex. I like good use of screen space – an OS is supposed to stay out of my way; it was wasting pixels everywhere: huge icons, huge titlebars, huge borders around windows. I like readability; it was showing all text on a background that was highly uneven and frequently moving. It reminded me of Apple's glossy screens – engineers had spent a ton of effort to rid us of reflections on screens, then Apple decided it would be cool to do it differently, and people have been buying it.

  • UAC killed usability.
    Another common complaint.

  • It was significantly more closed.
    I was OK with XP. I could do anything with it, and though some things were hard w/out access to the sources, there was enough documentation to make them possible. With Vista, MS tried to keep independent developers (like me) off the kernel. You needed to have all your drivers signed, which was supposed to improve security. I didn't believe it would make computers any more secure (and I was right) and viewed it as a pure obstacle to keep people from messing too deeply with their systems (and I'm not sure if I was right). Soon a hole was discovered that allowed one to use self-signed drivers, but I didn't want to use it. Even if it stayed unpatched, I didn't want to fight my PC, I wanted it to just do what I asked it to.

  • It added privileged processes that users couldn't touch with debuggers.
    A DRM enabler. It was a hot time for DRM; many still believed they could stop piracy with it and kept investing in it. I found it hugely disturbing. It was my PC, and MS gave media companies control over it that it didn't give to me. I still find it weird that it's so rarely talked about: few people need that access (and I didn't need it either), but the principle is just terrible.

What was my reaction?
First, I didn't believe it. It couldn't be that bad.
It was. Then I deluded myself that things would improve with the next Windows.
Then I realised that people loved the look, so if anything, that would get worse. People complained about slowness and bloat, so those could improve. People didn't talk at all about freedom, so it didn't have a chance to get better either.
I was stumped. I didn't see any way forward. My experiences with Linux had been simply terrible and I didn't expect any Unix to make a better desktop, so that was not an option. I knew I had to do something. I had quite a few years: I could keep using XP until all hardware vendors stopped supporting it and then another couple until the last supported PC died. (I guess I was not the only one, as MS does its best to make new software stop working on XP: new versions of .NET don't work, Visual Studio 2012 binaries don't work, and for the latter I see no reason other than the will to kill XP.)
The period after Vista wasn't a great one for me. I had limited time to do something, a loose limit, but still a limit. And no way forward, just darkness. It stayed like that for over 3 years…
To be continued.

    June 10, 2012

    xxhash256 update

    Filed under: Uncategorized — niezmierniespokojny @ 11:41 am

One of the drawbacks of the previous version of the hash was limited avalanching: each bit of the input could affect at most 64 bits of the output.
I decided to fix it by adding slight interactions between the internal state variables. The process is slow and has high latency, but the function was designed for large inputs, so it doesn't matter much. Also, I changed the main loop so that multiplications occur less often, making it something between the strong and the fast hash, with better avalanching.
To keep the cost of the interactions really small, I unrolled the main loop, and the code got faster in the process, clocking at 3024 MB/s (previously the fast version did 3008 MB/s and the strong one 2751 MB/s), so I obsoleted the fast hash. Well, I could unroll that one too, but I don't think it's worthwhile.

    /*
       xxHash256 - A fast checksum algorithm
       Copyright (C) 2012, Yann Collet & Maciej Adamczyk.
       BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
    
       Redistribution and use in source and binary forms, with or without
       modification, are permitted provided that the following conditions are
       met:
    
           * Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
           * Redistributions in binary form must reproduce the above
       copyright notice, this list of conditions and the following disclaimer
       in the documentation and/or other materials provided with the
       distribution.
    
       THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
       "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
       LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
       A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
       OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
       SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
       LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
       DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
       THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
       (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
       OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    
    ************************************************************************
       This file contains a super-fast hash function for checksumming
       purposes, designed for large (1 KB+) inputs.
       Known limitations:
        * it uses 64-bit math and will not be so fast on 32-bit machines
        * on architectures that disallow unaligned memory access, the input
          must be 8-byte aligned. Aligning it on other architectures
          is not needed, but will improve performance.
        * it produces different results on big- and little-endian machines.
      
       Changelog:
        v0: initial release
        v1: the strong hash is faster and stronger, it obsoletes the fast one
    */
    
    //**************************************
    // Includes
    //**************************************
    #include "stddef.h"
    #include "stdint.h"
    #include "string.h"
    
    
    //**************************************
    // Macros
    //**************************************
    #define _rotl(x,r) ((x << r) | (x >> (64 - r)))
    
    //**************************************
    // Constants
    //**************************************
    #define PRIME 11400714819323198393ULL
    
    //******************************
    // Hash functions
    //******************************
    void XXH_256(const void* input, size_t len, uint64_t* out)
    {
        const uint8_t* p = (uint8_t*)input;
        const uint8_t* const bEnd = p + len;
        uint64_t v1 = len * PRIME;
        uint64_t v2 = v1;
        uint64_t v3 = v1;
        uint64_t v4 = v1;
    
        const size_t big_loop_step = 4 * 4 * sizeof(uint64_t);
        const size_t small_loop_step = 4 * sizeof(uint64_t);
        // Set the big loop limit early enough, so the well-mixing small loop can be executed twice after it
        const uint8_t* const big_loop_limit   = bEnd - big_loop_step - 2 * small_loop_step;
        const uint8_t* const small_loop_limit = bEnd - small_loop_step;
    
        while (p < big_loop_limit)
        {
            v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v1 += v2 *= PRIME;
            v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 += v3 *= PRIME;
            v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 += v4 *= PRIME;
            v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 += v1 *= PRIME;
        }
    
        while (p < small_loop_limit)
        {
            v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 += v1 *= PRIME;
            v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 += v2 *= PRIME;
            v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 += v3 *= PRIME;
            v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v1 += v4 *= PRIME;
        }
        memcpy(out, p, bEnd - p);
    
        out[0] += v1;
        out[1] += v2;
        out[2] += v3;
        out[3] += v4;
    }
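
For completeness, a minimal usage sketch (my illustration, not part of the release above): the output is 4 uint64_t words, and it's safest to zero them first, because the final tail copy may cover fewer than 32 bytes.

// Minimal usage sketch for XXH_256 (an illustration, not part of the
// original release; assumes the listing above is compiled into the
// same program).
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void XXH_256(const void* input, size_t len, uint64_t* out);

int main(void)
{
    static uint8_t buf[64 * 1024];              // a large input, as the hash is designed for
    for (size_t i = 0; i < sizeof buf; ++i)
        buf[i] = (uint8_t)(i * 131u + 7u);      // deterministic test data

    uint64_t digest[4];
    memset(digest, 0, sizeof digest);           // zero first: the tail copy may cover fewer than 32 bytes

    XXH_256(buf, sizeof buf, digest);

    printf("%016llx %016llx %016llx %016llx\n",
           (unsigned long long)digest[0], (unsigned long long)digest[1],
           (unsigned long long)digest[2], (unsigned long long)digest[3]);
    return 0;
}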

    June 9, 2012

    A fast checksum

    Filed under: Uncategorized — niezmierniespokojny @ 6:22 am

Inspired by xxhash and fletcher4, I decided to write a checksum meant to be stronger than fletcher4 (not an ambitious goal) and as fast as possible. I started off with xxhash (both the strong and fast versions of it) and turned it into a 256-bit (like fletcher4) checksum optimised for 64-bit machines.
    I was surprised how well it went. There are several algorithms on the edge of my memory bandwidth limits and I didn’t expect to be able to beat them by a noteworthy margin, but I got almost 10%.

    The 2 hashes:
    xxhash-fast256 3008 MB/s
    xxhash-strong256 2751 MB/s

    Competition:
    xxhash-fast 2777 MB/s
    fnv1-tesla 2716 MB/s
    CityHash128 2666 MB/s
    SpookyHash 2280 MB/s
    xxhash-strong 2015 MB/s
    fletcher4 1855 MB/s
    fnv1-meiyan 1533 MB/s
    CrapWow 1409 MB/s
    murmur3_x64_128 1272 MB/s

All benchmarked on a Pentium D 805, compiled with mingw64 4.6.2. CrapWow comes with C and ASM versions; what you see above is the C one. The ASM version is faster, but nothing to brag about.

BTW, I am also surprised by how badly Murmur3 did. I guess it doesn't like my CPU or my compiler or something.
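
The benchmark program itself isn't part of this post. For readers who want to measure it themselves, here is a rough sketch of the kind of throughput loop I mean (an illustration only, with an arbitrary buffer size and repeat count; it calls the XXH_fast256 from the listing below and is not the program that produced the numbers above).

// Rough throughput-measurement sketch (an illustration only, not the
// program that produced the numbers above). It calls the XXH_fast256
// from the listing below; buffer size and repeat count are arbitrary.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void XXH_fast256(const void* input, size_t len, uint64_t* seed_out);

int main(void)
{
    const size_t len = 64 * 1024 * 1024;        // 64 MB test buffer
    const int reps = 20;
    uint8_t* buf = malloc(len);
    if (!buf) return 1;
    for (size_t i = 0; i < len; ++i)
        buf[i] = (uint8_t)rand();               // fill with pseudo-random data

    uint64_t digest[4] = { 0, 0, 0, 0 };
    const clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        XXH_fast256(buf, len, digest);
    const double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("%.0f MB/s (digest word 0: %016llx)\n",
           (double)len * reps / (1024.0 * 1024.0) / secs,
           (unsigned long long)digest[0]);
    free(buf);
    return 0;
}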

    The code:

    /*
       xxHash - Fast Hash algorithm
       Copyright (C) 2012, Yann Collet & Maciej Adamczyk.
       BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
    
       Redistribution and use in source and binary forms, with or without
       modification, are permitted provided that the following conditions are
       met:
    
           * Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
           * Redistributions in binary form must reproduce the above
       copyright notice, this list of conditions and the following disclaimer
       in the documentation and/or other materials provided with the
       distribution.
    
       THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
       "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
       LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
       A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
       OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
       SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
       LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
       DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
       THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
       (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
       OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    
    ************************************************************************
       This file contains a pair of super-fast hashes for checksumming
       purposes, designed for large (1 KB+) inputs.
       Known limitations:
        * they use 64-bit math and will not be so fast on 32-bit machines
        * on architectures that disallow unaligned memory access, the input
          must be 8-byte aligned. Aligning it on other architectures
          is not needed, but will improve performance.
    */
    
    //**************************************
    // Includes
    //**************************************
    #include "stddef.h"
    #include "stdint.h"
    #include "string.h"
    
    
    //**************************************
    // Macros
    //**************************************
    #define _rotl(x,r) ((x << r) | (x >> (64 - r)))
    
    //**************************************
    // Constants
    //**************************************
    #define PRIME 11400714819323198393ULL
    
    //******************************
    // Hash functions
    //******************************
    void XXH_fast256(const void* input, size_t len, uint64_t* seed_out)
    {
        const uint8_t* p = (uint8_t*)input;
        const uint8_t* const bEnd = p + len;
        uint64_t v1 = len * PRIME;
        uint64_t v2 = v1;
        uint64_t v3 = v1;
        uint64_t v4 = v1;
        const uint8_t* const limit = len > 4 * sizeof(uint64_t) ? bEnd - 4 * sizeof(uint64_t) : NULL;
    
        while (p<limit)
        {
            v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        }
        memcpy(seed_out, p, bEnd-p);
    
        seed_out[0] += v1;
        seed_out[1] += v2;
        seed_out[2] += v3;
        seed_out[3] += v4;
    }
    void XXH_strong256(const void* input, size_t len, uint64_t* seed_out)
    {
        const uint8_t* p = (uint8_t*)input;
        const uint8_t* const bEnd = p + len;
        uint64_t v1 = len * PRIME;
        uint64_t v2 = v1;
        uint64_t v3 = v1;
        uint64_t v4 = v1;
        const uint8_t* const limit = len > 4 * sizeof(uint64_t) ? bEnd - 4 * sizeof(uint64_t) : NULL;
    
        while (p<limit)
        {
            v1 += _rotl(v1, 29); v1 *= PRIME; v1 += (*(uint64_t*)p); p+=sizeof(uint64_t);
            v2 += _rotl(v2, 31); v2 *= PRIME; v2 += (*(uint64_t*)p); p+=sizeof(uint64_t);
            v3 += _rotl(v3, 33); v3 *= PRIME; v3 += (*(uint64_t*)p); p+=sizeof(uint64_t);
            v4 += _rotl(v4, 35); v4 *= PRIME; v4 += (*(uint64_t*)p); p+=sizeof(uint64_t);
        }
        memcpy(seed_out, p, bEnd-p);
    
        seed_out[0] += v1;
        seed_out[1] += v2;
        seed_out[2] += v3;
        seed_out[3] += v4;
    }

    Final note:
The code is very young and I'm a checksumming amateur; there might be bugs.

    January 22, 2012

    Snappy compression added to BTRFS

    Filed under: Uncategorized — niezmierniespokojny @ 2:55 pm

    Recently, Intel’s Andy Kleen published a set of patches that add Snappy compression support to BTRFS.
    My first reaction was:
    What for?
    BTRFS already has LZO, which is stronger and compresses faster. Decompression is slower, but in all real cases the bottleneck will be the disk, so higher strength should lead to lower disk use and better performance with reads too.
    So what reason does Andy provide?

    “snappy is a faster compression algorithm that provides similar compression as LZO, but generally better performance.”

    Hmm.
I benchmarked these algorithms on several machines (and ran more benchmarks on several others that never got published), and LZO was definitely faster during compression (but slower during decompression).
    In an older post, Andy shared some details:

    “This has been tested with various test workloads and also in a real distribution. One challenge with the benchmarks is that a lot of of benchmarks only read/write zeroes, which are a uncommon good case for IO compression. However even with modified workloads that use different patterns gains are seen.

    In general snappy performs better with a 64bit CPU, but also generally outperforms LZO on a 32bit Atom system on IO workloads.

    The disk space usage is similar to LZO (within 0.2%)”

    I have a suspicion.
    He probably tested on largely incompressible data. In my tests, the size delta was in the range of 0.8-1.5%, so way bigger than his. And it happens that on incompressible data, Snappy is indeed faster than LZO.
    But if your data is incompressible why use compression at all?

It would be nice if Andy clarified what data he used and maybe gave more details about the test setup.

I've been thinking about it for a while and I just don't see a really compelling use case for Snappy in BTRFS. There are some, like a mix of highly compressible and incompressible data, where it would be better than LZO, but only slightly.

    January 14, 2012

    A quick look at Windows 8 storage virtualisation

    Filed under: Uncategorized — niezmierniespokojny @ 7:15 am

    A couple of days ago, Microsoft’s Steven Sinofsky blogged about some cool new Windows 8 storage features that could be valuable for about anyone.
    For home users, Win 8 offers a nicely implemented data protection scheme where you have multiple drives in the system and in case some of them die, you don’t lose any of your photos, music, emails or whatever you find valuable.
    OK, Windows could do it for a while with RAID. But the new ‘Storage Spaces’ are much more flexible and usable. In fact, it looks like the best implementation that I’ve seen.

    Here’s how it goes:
    First, you have a pool of drives. You can add a new drive at any time and your available capacity (and performance) will increase.
    On top of it you create ‘Spaces’. They are much like partitions, but:

  • They are thinly provisioned. The total size of all spaces can be bigger than your drives can hold. In fact, even a single space can be bigger. Obviously, you won't be able to store that much, but it's useful because you don't have to decide up front how to carve your drives into separate partitions. And when you add drives, you can leave your partitions untouched; the system will just start using the new drives.
  • They provide RAID-like protections. You can use RAID 1, 5 and 6-like schemes.
  • I just love the data protection implementation. It's different because it's not at the drive level. Having just 1 drive pool, you can protect different data in different ways. For example, temp doesn't need any protection. For the system, RAID 1 is probably best because it offers the best performance. And for important data I would use RAID 6 because it's very secure and rather cheap.
I don't think there is anything as good available anywhere, and certainly not for home users. The closest thing is provided by ZFS, but it's less flexible:
You have a similar separation of pools and filesystems.
You can have any RAID-like protection at the pool level.
You can have RAID-1-like protection at the filesystem level.

This means that if you want to use the cheap protections (RAID 5/6) in ZFS, the way to do it is to have them at the pool level and no protection at the filesystem level. So you must have it for all filesystems in the pool or for none. Also, it prevents you from adding drives one by one; you have to add an entire RAID group at a time.

OTOH Microsoft's thin provisioning implementation is quite ugly. For most uses you don't want to set any partition size at all. A space should take just as much space as it needs, w/out any limits. Or, better, there should be limits, but ones that you can increase freely. That's how ZFS implements it.
[ADD] Microsoft explained in their FAQ that you can increase the size of a Space. So it's practically as good as the others in this regard. [/ADD]

    OK, I covered 2 options available for home users. There’s also a 3rd one, BTRFS.
How does it stand against them when it comes to virtualisation features? Not well, but it could be worse. There's only RAID-1, and at a level I'd describe as being between the pool and the filesystems. All files use the same protection, but you can add drives of different sizes one by one. Thin provisioning works like on ZFS.

    There’s one virtualisation feature here that ZFS has and others don’t – hybrid storage pools. You put both HDDs and SSDs in a single pool and the system uses them according to their capabilities. I don’t see any reason for MS not to add it at some point though.

Also, there's one thing that is not virtualisation in itself, but it's an important feature that can only be implemented at the level that provides RAID: protection from silent data corruption. ZFS and BTRFS offer it. Microsoft doesn't talk about it, so I guess they don't. But it's likely just a matter of time; in Steven's post I see nothing standing in the way.

    January 8, 2012

    Thoughts on moderation

    Filed under: Uncategorized — niezmierniespokojny @ 11:47 am

    I’ve been thinking about moderation today. Moderation of internet forums, comments, films and everything that people let others write. Or in short: of communication.
    What purpose does it serve?
Sadly, too often it is used to control what people say or do. Silence criticism. Prevent talk about the competition. Remove any information the moderator finds sensitive. I think this approach is wrong because it goes against the main purpose of moderated mediums: to let people communicate. And the more important the thing you try to stifle, the higher the likelihood that users will communicate it anyway, just in a different place. So it brings little benefit to you and hurts your users, often the most important thing about your site.

    In my opinion, moderation should be viewed differently. As a service provided to users, meant to improve their experiences.
    It’s hard to get right. Different people have different needs. I’d like to mention 2 approaches…

One extreme is Frost. It's a censorship-resistant, anonymous, unmoderated forum. Total freedom. And pure chaos. Some like it. I spent literally minutes on it; I couldn't stand all the threads about child abuse and all the spam.

On the other side of the spectrum are the Ubuntu Forums. I doubt it's the most heavily moderated site on the internet; if I wanted, I could surely find a stricter one. But still, its goal of being safe for children forces it to put a lot of limitations on how it lets its users act.

    For me, both approaches are wrong. In fact, all approaches that I’ve seen are wrong.

Because different people have different needs, and everybody tries to shoehorn all their users into 1 category. I totally see uses for family-friendly sites and I think that being one is good for Ubuntu. Nevertheless, I would like the place better if they didn't remove emotional but unkind posts. The place could be much more lively.

Why does no forum let users choose their own level of moderation?
    A couple of checkboxes, ‘spam’, ‘hate speech’, ‘curses’, etc.

    It doesn’t seem hard to carry out and would work much better.
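
To make the idea concrete, here's a purely hypothetical sketch of how such per-user preferences could be represented; the category names and types are invented for illustration and don't correspond to any existing forum software:

// Hypothetical sketch: per-user moderation preferences as a bitmask.
// The categories and names are invented for illustration.
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum {
    MOD_SPAM        = 1u << 0,
    MOD_HATE_SPEECH = 1u << 1,
    MOD_CURSES      = 1u << 2
};

typedef struct {
    const char* text;
    uint32_t    flags;   /* categories a moderator tagged this post with */
} post_t;

/* A post is shown unless it carries a category the user chose to hide. */
static bool visible_to(const post_t* post, uint32_t user_hides)
{
    return (post->flags & user_hides) == 0;
}

int main(void)
{
    const post_t posts[] = {
        { "Buy cheap watches!!!",                    MOD_SPAM },
        { "I strongly disagree, and here is why...", 0 },
        { "This is %$#@! nonsense",                  MOD_CURSES }
    };
    const uint32_t my_prefs = MOD_SPAM | MOD_HATE_SPEECH;  /* hide spam and hate speech, keep curses */

    for (size_t i = 0; i < sizeof posts / sizeof posts[0]; ++i)
        if (visible_to(&posts[i], my_prefs))
            printf("%s\n", posts[i].text);
    return 0;
}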

    And I have 1 more thought.
The service does not have to be provided by the same person or organisation that administers the medium. A mechanism that lets anybody become a moderator and provide their own view of the medium would be a great choice. Don't like the default moderation? Provide your own. This is surely not an option for those who think they are entitled to control what people talk about. But it's something that would make your site more valuable to your users. There will always be people who don't like the way you moderate, and alternatives could serve them better… and, oh, providing such a feature is only a capital expense (though probably quite a large one); operational costs would be carried by the users themselves.

    December 10, 2011

    A great illustration of BitTorrent’s public status

    Filed under: Uncategorized — niezmierniespokojny @ 9:15 pm

    YouHaveDownloaded is a public database of who downloads what. Click the link and it’s likely to show some of your recent downloads.
They claim that they monitor 20% of global BitTorrent traffic. In my case they showed 1 file of the several that I got recently. They didn't show the dozen *nix distros that I seed, so I guess they don't monitor some kinds of trackers.
    But overall, I think that the service is great. BitTorrent downloads are public and anybody can see them, but awareness of this fact is very low. Various pay-up-or-else shakedown schemes popping up around the world recently raised it significantly, but I don’t regard them as the best way of educating the public…And they aren’t used in places where P2P is legal anyway.

    What if you don’t want your neighbours, kids or whoever know your favourite porn film genres?
    Use more anonymous forms of P2P.
    Wikipedia lists some.

    Myself I recommend BitTorrent anonymised by I2P. It’s not the easiest to set up, but probably the biggest network – and size is important both for anonymity and amount of content available.

    December 8, 2011

    Falkvinge’s speech on free speech

    Filed under: Uncategorized — niezmierniespokojny @ 10:12 am

Yesterday Rick Falkvinge posted a piece advocating free speech. Mostly he talked about the usual stuff: the importance of free speech, dangers to it, chilling effects and such.
But one statement really set me off.
    “the public may be just as harsh a judge (…). In such circumstances, it is the role of government to suppress those people and activities who would make people feel threatened by stating opinions that are out of line with commonly held beliefs.
    In other words, “Government should censor speech that makes people afraid of speaking”.
    Actually I think that he meant something weaker like “Government should censor speech that makes people reasonably afraid of speaking”, but it’s not my main point.
It's the "We need to protect our society from harmful forms of speech" argumentation that's common to all forms of censorship, from China and Pakistan all the way to the EU. In this regard, censorship is the same everywhere.
    And it’s a funny thing to see in a post advocating for freedom.

To clarify one thing, I'm not against censorship in general. But I think it should be recognised as such and spoken about plainly. That's the healthiest way.

    November 28, 2011

    A web-centric image compression benchmark

    Filed under: Uncategorized — niezmierniespokojny @ 6:39 pm

There are quite a few established image compression benchmarks, but none are centred around images that you can find on the web. And as Google recently released their new contender in the space, I thought that such a test would be useful.

    Competitors:

    I tested all compressed image formats with reasonable popularity and some more:

    Webp Lossless
    – the star of today’s show. Extremely slow, but supposedly strong compression and fast decompression.

    Optimized PNG – an old and weak, but hugely popular image format. I optimized the images with the scheme that I use for myself:

      deinterlacing
      pngwolf
      pngout /s0 /f6
      deflopt
      defluff

    I find it to be the strongest scheme that’s worth using, though clearly some disagree and make stronger but much slower ones.

    BCIF
– a state-of-the-art encoder with fast decoding. Nobody uses it, but it works well.

JPEG2000 – the second most popular file format in the comparison, far behind the first and far ahead of the third. I wondered whether to include it, because its decompression speed is bad; it's not really in the same league. Still, I was interested in how it fares on web images.

JPEG-LS – even less popular than JPEG2000, but quite fast, and in some tests it was very efficient overall. I used CharLS as the reference implementation. Again I had to use the splitting trick (described below).

    I considered testing FLIC too, but decided against it. Much like JPEG2000 it’s in a different league and unlike JPEG2000, nobody uses it. And it’s likely that it wouldn’t score great because it is implemented to compress photographic images.

    I used the following program versions:
    webpll version 2011-11-20
    ImageMagick 6.7.3-8
    pngwolf 2011-04-08
    pngout 2011-07-02
    deflopt 2.07
    defluff 0.3.2
    BCIF 1.0 beta for Windows
    BCIF 1.0 beta Java (when the previous one crashed)
    Kakadu 6.4.1
    Loco 1.00

    Test Data:
    I downloaded all PNG images from Alexa top 100 sites and everything 1 link away from them. I didn’t block ads. This yielded ~350 MB of images with average size of just over 17 KB. That was too much, I wanted ~100 MB. So I chose 6000 randomly and got a dataset of 105.2 MB.
Next time I would do it better, though. In the dataset there are a few series of very similar images; throwing away almost 3/4 of the dataset couldn't eliminate them, and I think that spidering the entire top 1000 would be better. Or maybe first-page results of the top 100 Google searches would work even better? Either selection is biased anyway; it's hard to come up with something that isn't.

    The files take 105.2 MB as PNGs and 486.3 MB decompressed to PPMs.
The smallest image is 70 bytes, the largest 1.8 MB. The average size is 18 KB and the median 4.3 KB.
    47.4% use transparency. Or rather – have transparency information embedded. In quite a few cases the info says just that the image is fully opaque, which takes some space and does nothing.
    I wanted to determine how many of them came optimized. But how to tell them from the rest? I guess that the best way is to see whether my optimization gets significant savings. What is significant? At first I decided: 2%. Under this definition, 9.2% of images came optimized. But later I decided to sort files by compression ratio achieved by optimization and plot compression ratio against percentile of files.
    The results are below:

    optimization savings

    As you can see, the line is quite smooth, there is no clear divide between files that came optimized already and those that didn’t.
    You can also note that compression ratio goes almost down to 0 (more precisely – to 0.01004). On one file recompression saved almost 99% of initial size, on 11 – at least 95%. Quite a lot, I think.
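
In case it's not clear how the curve was produced: the idea is just to sort the per-file ratios and read off percentiles. A minimal sketch of that step (not the actual script I used; the values below are made up):

// Sketch of how the savings-vs-percentile curve was derived: sort the
// per-file compression ratios (optimized size / original size) and print
// each ratio against its percentile. Not the actual script used; the
// values below are made up.
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void* a, const void* b)
{
    const double x = *(const double*)a, y = *(const double*)b;
    return (x > y) - (x < y);
}

int main(void)
{
    double ratios[] = { 0.01, 0.55, 0.83, 0.97, 0.99, 1.00, 1.00 };
    const size_t n = sizeof ratios / sizeof ratios[0];

    qsort(ratios, n, sizeof ratios[0], cmp_double);
    for (size_t i = 0; i < n; ++i)
        printf("%6.2f%%  %.5f\n", 100.0 * (double)(i + 1) / (double)n, ratios[i]);
    return 0;
}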

    Problems with making the test:
It's sad to say, but the state of image manipulation software, and partly of the ecosystem in general, is just terrible. The problems?
    1. There is no standard image format. When it comes to viewing, there is no problem. But with format conversion, there is. Some encoders require PNG input. Some BMP. Some PNM. And these formats are not compatible with each other. The limitations that hurt me with this test were lack of support for transparency and for bit depths greater than 8 bits per colour per pixel.
2. Encoders have limited support for input file formats. Sometimes it's bugs, sometimes just missing features, but I've had problems with, I think, 3 programs because of it.
    3. Encoders don’t fully support their target output. For example, nconvert can’t compress to JPEG2000 losslessly and OpenJPEG JPEG2000 encoder doesn’t support transparency.
4. Image conversion tools are limited or buggy. There were 3 image conversion tools that I tried: ImageMagick, Nconvert and ImageWorsener. The last one wasn't really useful because it supported only BMP output, and only for non-transparent files. Both ImageMagick and Nconvert would silently corrupt data from time to time.
5. The worst of all: I found no image comparison program that would work correctly. I tried 2, ImageMagick and Imagediff. Both produced both false positives and false negatives. There are 1000s of others out there, but seemingly nothing that allows batch comparisons. So I worked with terribly buggy programs that frequently corrupted my data and I couldn't reliably detect whether they happened to produce correct results or not. I spent a lot of time doing different tests to assess correctness, and whatever bug I found, I worked around. But there may be more. I don't know. It's really disheartening.

    To give you a sense what I went through, a JPEG2000 transparency story:
    OpenJPEG doesn’t support transparency at all.
    Nconvert doesn’t support lossless mode at all.
    ImageMagick corrupts data when used this way.
    Jasper accepts only bitmaps. But not transparent ones produced by ImageMagick.
    Kakadu accepts only bitmaps. But not transparent ones produced by ImageMagick.
    Jasper doesn’t accept bitmaps created by Nconvert either.
    But Kakadu does!
    Nconvert corrupts data with such translation.

So after a lot of work, I didn't find a way to make it work. I decided to use (unreliable) verification of Nconvert's work, and when there were problems, I split each file into 2: one contained only colours and the other only transparency data. I compressed them independently and summed the sizes.

    I did the same splitting trick with formats that don’t support transparency, JPEG-LS and BCIF.
Also, no format other than PNG supported 64-bit images. Or rather, JPEG2000 supports them, but Nconvert doesn't. For such images I added an (unoptimized) PNG size to the listing of each codec (except for optimized PNG, which is the only one that got any savings on these images).
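
To show what the splitting amounts to conceptually (a sketch only; in practice I did the split with image conversion tools, not with code like this): every RGBA pixel is separated into an RGB sample and an alpha sample, and the two planes are compressed as independent images.

// Sketch of the "splitting trick": separate an RGBA image into an RGB plane
// and an alpha plane, so that formats without transparency support can
// compress each part as an independent image. Illustration only; in practice
// the split was done with image conversion tools.
#include <stddef.h>
#include <stdint.h>

void split_rgba(const uint8_t* rgba, size_t pixel_count,
                uint8_t* rgb_out,    /* 3 * pixel_count bytes */
                uint8_t* alpha_out)  /* pixel_count bytes */
{
    for (size_t i = 0; i < pixel_count; ++i)
    {
        rgb_out[3 * i + 0] = rgba[4 * i + 0];
        rgb_out[3 * i + 1] = rgba[4 * i + 1];
        rgb_out[3 * i + 2] = rgba[4 * i + 2];
        alpha_out[i]       = rgba[4 * i + 3];
    }
}
// The two compressed sizes (RGB plane + alpha plane) are then summed.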

    Performance:

I didn't measure performance. The data was quite large, I couldn't afford to leave the computer exclusively to compression, and using it while compressing skews the results. So the only hard data provided is compressed size. But to give you some idea of what's going on: webp took over 4 days and nights of CPU time, PNG optimization half as much, and all the others did their jobs in minutes.

    Results:

    First, some numbers:
Optimization compressed the data to 84.1% of the original PNG size (the same baseline applies to the numbers below)
    Webpll to 65.7%
    BCIF to 93.7%
    JPEG2000 to 110.8%
    JPEG-LS to 115.8%.

    Quite interesting. BCIF, JPEG2000 and JPEG-LS all beat PNG by a significant margin in all tests that I’ve seen before. And now they got beaten by it. I guess there is a merit in naming the file format “Portable Network Graphics”.
Judging from fragmentary test results of other people, I expected webpll to pale in comparison, but it leads the pack by a big margin: 21.9% smaller than the second place, to be precise.
    Let’s look into more details.

    One picture is worth 1000 words:

But sometimes it needs explaining anyway. Oh well.
The files are sorted by their original sizes: the small ones are on the left, the big ones on the right. Each line represents a compressor. The y-axis is compression ratio; the smaller, the better.
Both axes have logarithmic scales. Also, to make it readable I applied a low-pass filter to the data.

As you can see, except for files < 1 KB, the distance between optimized PNG and webpll is quite constant. The other formats are very bad with small images and improve steadily as sizes grow. At roughly 256 KB, BCIF is the strongest of the pack, ex aequo with webpll.
Part of the gap between webpll and the rest is file format overhead. Let's look at the smallest image size that I got in each format:
    webpll: 6 B
    PNG: 68 B
    JPEG-LS: 38 B
    JPEG2000: 287 B
    BCIF: 134 B.
    When compressing 1-pixel files, 280 B overhead is something.
Though it should be noted that the HTTP protocol has some overhead too, so the differences in practice would be smaller (I guessed practically none). Still, the JPEG2000 overhead (assuming it's constant over all files: roughly 280 B extra on each of the ~6000 files) adds up to over 1.6 MB over the entire dataset.
ADDED: Correction, my guess was wrong. People who know this stuff better than I do said that despite the HTTP overhead, the minor differences still matter.

    You can download full results here.

    Summary:

There are 2 classes of image codecs: those that are web-centric and those that are not. The web-centric ones care little about speed (you can find some really extreme PNG optimization scripts), but are much stronger on web data. WebP lossless worked well; it's very slow but significantly stronger than anything else.
