I am an extreme moderate

November 16, 2015

Fuzz testing compressors

Filed under: Uncategorized — niezmierniespokojny @ 9:13 pm

Introduction

Inspired by Yann Collet’s post I decided to play with fuzz testing.

The idea is tempting, I do nothing and the computer tests code for me. Why not? I took a look at what people use (nearly all use afl). I took a look at how people use it. And saw that the big guys use clusters with thousands of cores fuzzing single project for years.

Oops. I could dedicate 2 old cores at the task. Maybe for a couple of months, but still my effort is puny. Could I find any bugs with what I have at all?

I decided to try anyway. After all, single developers (like Yann) don’t have clusters at disposal and still find fuzzing useful.

OK, I’m in. What should I fuzz?

Some widely used library like zlib is out, too well fuzzed already to be worthwhile for me.

A research codec like crush? I like it. I’m relatively likely to find bugs in it. But who cares? It has no pretensions to be useful in production and nobody uses it for anything but fun.

I decided to concentrate on something in between. Codecs that are

considered to be production quality
overall working good
not version 1 committed last week

Initially, I picked bsc. Later on I extended the test to cover also:

Did I manage to find any bugs?

Yes, I did. In each of them in fact… In all but 2, within seconds from the first run of a proper build.

What did I find?

bsc – A lot of bugs in decompression. One in compression too.
density – After compiling with bug-detecting flags wouldn’t work, there was a division by zero in speed calculation when the file was small and compression time was 0. I fixed it and fuzzed out a lot of bugs.
heatshrink – just a pair of broken assertions (one in compressor, one in decompressor). And the author has fixed them before I reported them.
lzham – After compiling with bug-detecting flags wouldn’t work at all
zpaq – After compiling with bug-detecting flags wouldn’t compress. Fuzzing decompression yielded several bugs.
zstd – Quite resilent. After several hours of fuzzing I found a lone bug in decompression. I fuzzed it for another week, found nothing.

I have 3 thoughts concluding that:

2 programs that resisted my attempts better than others. Both have been afl-fuzz tested before. Both used other automated testing too; heatshrink – property based tests and zstd – a custom fuzzer.
Building a program with instrumentation is a large portion of success. With 3 codecs I found bugs before I started fuzzing.
It’s much harder to write correct code than I thought. All these codecs have been written by good devs. All but zstd exist for years. Not sure about density, but all others are used in production. All were easy to break, though heatshrink only with assertions on.

Finding bugs

You may wonder how to fuzz effectively? I’m still just a noob, but I have learned some lessons already.

There are 2 components for finding a bug by fuzzing.

Your fuzzer has to run the code on data that causes fault condition and
it has to notice the problem

Let’s concentrate on noticing first. Afl has no knowledge of what programs state is correct and what is not. It finds just one kind of errors. Crashes. Therefore the goal is to cause the program to crash in as many erroneous conditions as possible.

The first thing to do is enable assertions. If you’re a developer and you’re not used to writing them – start to do so. Combined with terabytes of fuzzer-generated data, they are a really good way of validating the assumptions that you make when writing the code.

Then, there’s clang with its wonderful sanitizers. The ones useful are:

One could enable them all, but I recommend that in general case it’s not a great idea. The reason is that Memory Sanitizer slows the code where I tried it by a factor of 20-30. It enables you to find one more class of bugs, but greatly reduces chances of finding others. It would make more sense to start without it and after afl stops finding new test cases that improve coverage, recompile with MSan and continue fuzzing for a while.

Also, docs for Undefined Behavior Sanitizer are broken and it’s not trivial to turn it on properly.

The basic usage is to add -fsanitize=undefined to the command line. However if you the read docs of another switch, -fsanitize-recover, you’ll learn that by default majority of checks caused by -fsanitize=undefined will not cause a crash but just print a message. You have to add -fsanitize-trap for all such checks. And there’s a common problem with UBSan and compressors on AMD64. This architecture allows unaligned memory access and codecs tend to exploit it. UBSan detects them and flags as error, so you should add -fno-sanitize=alignment too.

There are further switches that improve the situation:

-fstack-protector-all guards against stack buffer overflows. I’m not sure, with ASan it might be redundant.

-DFORTIFY_SOURCE=2 with some stdlibs makes functions like memcpy more aware of what buffers they operate on, so they can detect overflows too. Again it, may be redundant.

To sum up, the combinations of switches that I arrived at is:

-DFORTIFY_SOURCE=2 -fstack-protector-all -fsanitize=undefined,address -fsanitize-trap=array-bounds,bool,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound -fno-sanitize=alignment

With density and zpaq, compiling like that would take several hours. At least in case of density the reason was code bloat that I managed to fix by changing inlining settings. I didn’t investigate zpaq.

I should note however that this is not bulletproof. I’m not sure if it’s a clang bug, but I noticed cases when there would be an error printed, but no crash. I don’t know what to do about it. OK, I should ask llvm people, but so far I’m busy with other stuff.

Now, many types of errors will cause a crash. But there’s one more thing that can be done and of the tested programs only lzham does (and I’m not sure if it does in a fuzzer-detectable way). Add a switch for the compressor that, after it’s done, decompresses the data and verifies that it matches the original. After all, a compressor that destroys data w/out crashes is not a good one.

Initial testset

Generation of data is another important topic. First, if your codec has a checksum verification on decompression, disable checking it. It’s pointless to fuzz with files that get discarded before reaching your algorithm.

After that you need to create an initial test set. Several rules to follow:

Try hard not to use large testsets. It really kills performance, both because it makes each run of your exe slower and because it slows down deterministic tests of afl itself. Afl even has a limit of 1 MB, though normally it’s way to large. The recommended size is up to 1 KB. I once got several days of fuzzing accumulated that I decided to discard. Almost all of the test files were ~64 KB in size and it would take c.a. one year to finish the first round. I created a new test set with the largest file being 5 KB. Initially it was much poorer, but overtook the original in several hours.
Start with a diverse set of files. Get as many as you can and then run afl-cmin to trim the set down to cover only interesting cases. For zstd decompression, I started with:
- several standard benchmark files cut into small pieces (2-16 KB) and compressed; then filtered to include only the ones under 1 KB,
- 1 MB of zeros compressed. zstd splits data into 512 kB blocks, I wanted to have a case with more than one,
- Every file that I got while testing compression; again filtered to keep only smaller files,
- 2 highly compressible (but not as simple as a stream of zeros) files of 129 KB, so they are larger than window size. The larger was 5 KB compressed,A block of random data, compressed as well.

In all, that was 120 000 files that afl-cmin trimmed to under 500.

3. Sort the files by size, ascending. Afl fuzzes files in order determined by file name. When interesting file is found, it gets added to its fifo. When you can find the same interesting feature with 2 files, you want to keep the smaller one. Afl doesn’t do it, it keeps the first one. So you want it to find the smaller one first and sorting is a hack to do just that. The proper way would be to use a priority queue instead of a fifo in afl itself. I suggested it to the author but got no reply.

Running afl

Afl employs a number of strategies to find common errors. For example, it injects special integers like 0xFFFF, increases or decreases words or flips bits. Not all strategies work equally well.

For testing compression, there were 2 strategies that ruled:

havoc, which does a lot of random changes,
splice, which takes a front from one file, the back from another, joins these together and runs havoc on the result.

With splice finding 2-4 times as many test cases as havoc.

You can instruct afl to run only these strategies by running it with a -S switch. You can’t instruct it to use only slicing. A source modification for doing so would be welcome (doing it the dirty way is easy, though I believe command line switches with strategy selection would be much better).

Different compression levels result in different code paths. Therefore it’s important to fuzz all distinct ones. What I noticed is that a file that’s interesting for one level is often interesting for another too. Therefore it’s useful to run a number of afl-fuzz instances (even more than cores available) and make them share results. Afl does a sharing itself as long as all instances use -M or -S switch (the latter I mentioned already) and use the same output directory.

For decompression, havoc and splice are again the efficiency leaders (superiority of the latter surprises me), though bit flips are remarkably effective too. I recommend running a single master (-M) instance and a number of slaves (-S). Maybe even run the master with lowered priority, so the more effective finders get more CPU.

Summary

If you’re a codec developer, do your users a favour. Fuzz your code. And do it carefully. And every time you find a bug, sing along with me:

Fuzzing's great, fuzzing's good, 
fuzzing does what fuzzing should.
Fuzzing's just, fuzzing's right,
fuzzing is our guiding light.
Always right, never wrong,
we praise fuzzing in this song.

Also, share your test data. A readily available high-coverage test set makes it easier for others to do QA for your project on their own.

Leave a Comment

March 16, 2013

Copyright infringing post

Filed under: Uncategorized — niezmierniespokojny @ 9:38 pm

Just read a torrentfreak article on a torrentz censorship and it really strikes me how far-fetched are claims of copyright abusers.

They send a DMCA notice to google. This means that they meant that google was infringing. According to them, Google was infringing because it linked to an infringing site – torrentz.eu.

torrentz was infringing…why? The notice doesn’t mention it. But torrentz is a search engine that links to sites containing torrent files. Probably some of this sites hosted a torrent file that they considered infringing.

That file enabled downloading of another file. That other file was copyrighted and may have been distributed w/out permission.

To sum up, the link flow went like this:

google
torrentz
some torrentz search result
some site
torrent file
pirate swarm

So they mean that if you’re 5 links away from a file they hold copyright on, you’re a pirate.

I just linked to torrentz.eu, exactly like google did. I’m a pirate.

Actually, Paramount is a pirate too. On their main page they link to youtube. You can find plenty of pirated files in there just another 1-2 clicks farther. That’s just 2-3 links in total.

I wonder what limit of the number of links do they consider satisfactory to not be a pirate. 5 is too little. Is there any limit? Anyway, senary liability is nonsense. Just like secondary liability, just far more extreme.

Somebody should punish those punks for abusing a broken system (DMCA). And fix the system.

Leave a Comment

November 23, 2012

My way to BSD, part 1

Filed under: Uncategorized — niezmierniespokojny @ 10:37 pm

I finally made it, I made BSD (PC-BSD to be exact) my desktop OS. It’s been a long road…and one that will keep being bumpy for as long as I can foresee. Still, I believe it goes in the right direction.
I’d like to sum up the years I spent preparing….
It all started in 2007, in January. At the time I have been a loyal Windows fan. I had spent huge amounts of time to learn Windows, not without successes. I kept investing in my Windows skills and wanted to spend my life using it and career – working with it, mostly coding for it.
Then came Vista…and I was in shock.
What was so bad about it? For me, the list was longer than for most.

It was bloated.

Most agree with this one.

It was slow.

Again, a common point.

It was freaking ugly.

I like simplicity. It was complex. I like good use of screen space – OS is supposed to stay out of my way. It was wasting pixels everywhere, huge icons, huge titlebars, huge borders around windows. I like readability. I was showing all text on a background that was highly uneven and frequently moving. It reminded of Apple’s glossy screens…engineers had spent a ton of effort to rid us from reflections on screens, Apple decided it would be cool to do it different and people have been buying it.

UAC killed usability.

Another common complaint.

It was significantly more closed.

I was OK with XP. I could do anything with it and though w/out access to the sources some things were hard, there was enough documentation to make them possible. With Vista, MS tried to keep independent developers (like me) off the kernel. You needed to have all your drivers signed, which was supposed to improve security. I didn’t believe it would make computers any more secure (and I was right) and viewed as a pure obstacle to keep people from messing too deeply with their systems (and I’m not sure if I was right). Soon there was a hole discovered that allowed one to use self-signed drivers, but I didn’t want to use it. Even if it stayed unpatched, I didn’t want to fight my PC, I wanted it to just do what I ask it to.

It added privileged processes that users couldn’t touch with debuggers.

A DRM enabler. It was a hot time for DRM, many still believed they could stop piracy with it and kept investing in it. I found it hugely disturbing. It was my PC and MS gave media companies some control over it that it didn’t give to me. I still find it weird that it’s so rarely talked about, while few need it (and I didn’t need it either), the principle is just terrible.

What was my reaction?
First, I didn’t believe. It couldn’t be that bad.
It was. Then, deluded myself that with the next Windows things would improve.
Then realised that people loved the look => if anything, it would get worse. People complained about slowness and bloat. It could improve. People didn’t talk at all about freedom. So it too didn’t have a chance to get better either.
I was stumped. I didn’t see any way forward. My experiences with Linux have been simply terrible and I didn’t expect any Unix to make a better desktop, so this was not an option. I knew I had to do something. I had quite a few years, I could keep using XP until all hardware vendors would stop supporting it and then another couple before the last supported PC would die. (I guess I was not the only as MS does its best to make new software stop working on XP (new .NETs don’t work, Visual Studio 2012 binaries don’t work and for the latter I see no reason other than the will to kill XP))
After Vista, I didn’t have a great period… I had limited time to do something, with a limit being loose but still. And no way forward, just darkness. It stayed like that for over 3 years…
To be continued.

Comments (2)

June 10, 2012

xxhash256 update

Filed under: Uncategorized — niezmierniespokojny @ 11:41 am

One of the drawbacks of the previous version of the hash was limited avalanching, each bit of the input could affect at most 64 bits of the output.
I decided to fix it by adding slight interactions between internal state variables. The process is slow and has high latency, but the function was designed for large inputs, so it doesn’t matter much. Also, I changed the main loop so that multiplications occur less often, so it’s something between the strong and the fast hash with a better avalanching.
To make the interactions really small, I unrolled the main loop and the code got faster in the process, clocking at 3024 MB/s (previously – the fast version did 3008 and the fast one – 2751 MB/s), so I obsoleted the fast hash. Well, I could unroll it too, but I don’t think it’s worthwhile.

/*
   xxHash256 - A fast checksum algorithm
   Copyright (C) 2012, Yann Collet & Maciej Adamczyk.
   BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:

       * Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
       * Redistributions in binary form must reproduce the above
   copyright notice, this list of conditions and the following disclaimer
   in the documentation and/or other materials provided with the
   distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

************************************************************************
   This file contains a super-fast hash function for checksumming
   purposes, designed for large (1 KB++) inputs.
   Known limitations:
    * they use 64-bit math and will not be so fast on 32-bit machines
    * on architectures that disallow unaligned memory access, the input
      must be 8-byte aligned. Aligning it on other architectures
      is not needed, but will improve performance.
    * it produces different results on big and small endian.
  
   Changelog:
    v0: initial release
    v1: the strong hash is faster and stronger, it obsoletes the fast one
*/

//**************************************
// Includes
//**************************************
#include "stddef.h"
#include "stdint.h"
#include "string.h"


//**************************************
// Macros
//**************************************
#define _rotl(x,r) ((x << r) | (x >> (64 - r)))

//**************************************
// Constants
//**************************************
#define PRIME 11400714819323198393ULL

//******************************
// Hash functions
//******************************
void XXH_256(const void* input, size_t len, uint64_t* out)
{
    const uint8_t* p = (uint8_t*)input;
    const uint8_t* const bEnd = p + len;
    uint64_t v1 = len * PRIME;
    uint64_t v2 = v1;
    uint64_t v3 = v1;
    uint64_t v4 = v1;

    const size_t big_loop_step = 4 * 4 * sizeof(uint64_t);
    const size_t small_loop_step = 4 * sizeof(uint64_t);
    // Set the big loop limit early enough, so the well-mixing small loop can be executed twice after it
    const uint8_t* const big_loop_limit   = bEnd - big_loop_step - 2 * small_loop_step;
    const uint8_t* const small_loop_limit = bEnd - small_loop_step;

    while (p < big_loop_limit)
    {
        v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v1 += v2 *= PRIME;
        v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 += v3 *= PRIME;
        v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 += v4 *= PRIME;
        v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 += v1 *= PRIME;
    }

    while (p < small_loop_limit)
    {
        v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 += v1 *= PRIME;
        v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 += v2 *= PRIME;
        v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 += v3 *= PRIME;
        v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v1 += v4 *= PRIME;
    }
    memcpy(out, p, bEnd - p);

    out[0] += v1;
    out[1] += v2;
    out[2] += v3;
    out[3] += v4;
}

Comments (5)

June 9, 2012

A fast checksum

Filed under: Uncategorized — niezmierniespokojny @ 6:22 am

Inspired by xxhash and fletcher4 I decided to write a checksum meant to be stronger than fletcher4 (not an ambitious goal) and as fast as possible. I started off with xxhash (both strong and fast versions of it) and turned it to a 256-bit (like fletcher4) checksum optimised for 64-bit machines.
I was surprised how well it went. There are several algorithms on the edge of my memory bandwidth limits and I didn’t expect to be able to beat them by a noteworthy margin, but I got almost 10%.

The 2 hashes:
xxhash-fast256 3008 MB/s
xxhash-strong256 2751 MB/s

Competition:
xxhash-fast 2777 MB/s
fnv1-tesla 2716 MB/s
CityHash128 2666 MB/s
SpookyHash 2280 MB/s
xxhash-strong 2015 MB/s
fletcher4 1855 MB/s
fnv1-meiyan 1533 MB/s
CrapWow 1409 MB/s
murmur3_x64_128 1272 MB/s

All benchmarked on Pentium D 805, compiled with mingw64 4.6.2. CrapWow comes with C and ASM versions, what you see above is C. ASM is faster, but nothing to brag about.

BTW, I am surprised also by how badly Murmur3 worked. I guess it doesn’t like my CPU or compiler or something.

The code:

/*
   xxHash - Fast Hash algorithm
   Copyright (C) 2012, Yann Collet & Maciej Adamczyk.
   BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:

       * Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
       * Redistributions in binary form must reproduce the above
   copyright notice, this list of conditions and the following disclaimer
   in the documentation and/or other materials provided with the
   distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

************************************************************************
   This file contains a pair of super-fast hashes for checksumming
   purposes, designed for large (1 KB+) inputs.
   Known limitations:
    * they use 64-bit math and will not be so fast on 32-bit machines
    * on architectures that disallow unaligned memory access, the input
      must be 8-byte aligned. Aligning it on other architectures
      is not needed, but will improve performance.
*/

//**************************************
// Includes
//**************************************
#include "stddef.h"
#include "stdint.h"
#include "string.h"


//**************************************
// Macros
//**************************************
#define _rotl(x,r) ((x << r) | (x >> (64 - r)))

//**************************************
// Constants
//**************************************
#define PRIME 11400714819323198393ULL

//******************************
// Hash functions
//******************************
void XXH_fast256(const void* input, size_t len, uint64_t* seed_out)
{
    const uint8_t* p = (uint8_t*)input;
    const uint8_t* const bEnd = p + len;
    uint64_t v1 = len * PRIME;
    uint64_t v2 = v1;
    uint64_t v3 = v1;
    uint64_t v4 = v1;
    const uint8_t* const limit = len > 4 * sizeof(uint64_t) ? bEnd - 4 * sizeof(uint64_t) : NULL;

    while (p<limit)
    {
        v1 = _rotl(v1, 29) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 = _rotl(v2, 31) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 = _rotl(v3, 33) + (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 = _rotl(v4, 35) + (*(uint64_t*)p); p+=sizeof(uint64_t);
    }
    memcpy(seed_out, p, bEnd-p);

    seed_out[0] += v1;
    seed_out[1] += v2;
    seed_out[2] += v3;
    seed_out[3] += v4;
}
void XXH_strong256(const void* input, size_t len, uint64_t* seed_out)
{
    const uint8_t* p = (uint8_t*)input;
    const uint8_t* const bEnd = p + len;
    uint64_t v1 = len * PRIME;
    uint64_t v2 = v1;
    uint64_t v3 = v1;
    uint64_t v4 = v1;
    const uint8_t* const limit = len > 4 * sizeof(uint64_t) ? bEnd - 4 * sizeof(uint64_t) : NULL;

    while (p<limit)
    {
        v1 += _rotl(v1, 29); v1 *= PRIME; v1 += (*(uint64_t*)p); p+=sizeof(uint64_t);
        v2 += _rotl(v2, 31); v2 *= PRIME; v2 += (*(uint64_t*)p); p+=sizeof(uint64_t);
        v3 += _rotl(v3, 33); v3 *= PRIME; v3 += (*(uint64_t*)p); p+=sizeof(uint64_t);
        v4 += _rotl(v4, 35); v4 *= PRIME; v4 += (*(uint64_t*)p); p+=sizeof(uint64_t);
    }
    memcpy(seed_out, p, bEnd-p);

    seed_out[0] += v1;
    seed_out[1] += v2;
    seed_out[2] += v3;
    seed_out[3] += v4;
}

Final note:
The code is very young and I’m a checksumming amateur, there might be bugs.

Leave a Comment

January 22, 2012

Snappy compression added to BTRFS

Filed under: Uncategorized — niezmierniespokojny @ 2:55 pm

Recently, Intel’s Andy Kleen published a set of patches that add Snappy compression support to BTRFS.
My first reaction was:
What for?
BTRFS already has LZO, which is stronger and compresses faster. Decompression is slower, but in all real cases the bottleneck will be the disk, so higher strength should lead to lower disk use and better performance with reads too.
So what reason does Andy provide?

“snappy is a faster compression algorithm that provides similar compression as LZO, but generally better performance.”

Hmm.
I benchmarked these algorithms on several machines and did some benchmarks that didn’t end up published on several more and LZO was definitely faster during compression (but slower during decompression).
In an older post, Andy shared some details:

“This has been tested with various test workloads and also in a real distribution. One challenge with the benchmarks is that a lot of of benchmarks only read/write zeroes, which are a uncommon good case for IO compression. However even with modified workloads that use different patterns gains are seen.

In general snappy performs better with a 64bit CPU, but also generally outperforms LZO on a 32bit Atom system on IO workloads.

The disk space usage is similar to LZO (within 0.2%)”

I have a suspicion.
He probably tested on largely incompressible data. In my tests, the size delta was in the range of 0.8-1.5%, so way bigger than his. And it happens that on incompressible data, Snappy is indeed faster than LZO.
But if your data is incompressible why use compression at all?

It would be nice if Andy clarified what data did he use and maybe gave more details about the test setup.

I’ve been thinking about it for a while and I just don’t see a really compelling use case for Snappy in BTRFS. There are some like a mix of highly compressible and incompressible where it would be better than LZO, but just slightly.

Comments (7)

January 14, 2012

A quick look at Windows 8 storage virtualisation

Filed under: Uncategorized — niezmierniespokojny @ 7:15 am

A couple of days ago, Microsoft’s Steven Sinofsky blogged about some cool new Windows 8 storage features that could be valuable for about anyone.
For home users, Win 8 offers a nicely implemented data protection scheme where you have multiple drives in the system and in case some of them die, you don’t lose any of your photos, music, emails or whatever you find valuable.
OK, Windows could do it for a while with RAID. But the new ‘Storage Spaces’ are much more flexible and usable. In fact, it looks like the best implementation that I’ve seen.

Here’s how it goes:
First, you have a pool of drives. You can add a new drive at any time and your available capacity (and performance) will increase.
On top of it you create ‘Spaces’. They are much like partitions, but:

They are thinly provisioned. The total size of all spaces can be bigger than your drives can hold. In fact even a single space can be bigger. Obviously, you won’t be able to store thus much, but it’s useful, because you don’t have to decide up front how to carve your drives to separate partitions. And when you add drives, you can leave your partitions untouched, the system will just start using them.

They provide RAID-like protections. You can use RAID 1, 5 and 6-like schemes.

I just love the data protection implementation. It’s different because it’s not at the drive level. Having just 1 drive pool, you can protect different data in different ways. In example, temp doesn’t need any protection. For system, RAID-1 is probably the best because it offers the best performance. And for important data I would use RAID 6 because it’s very secure and rather cheap.
I don’t think there is anything as good available anywhere and I’m sure – not for home users. The closest thing is provided by ZFS, but it’s less flexible:
You have a similar separation of pools and filesystems.
You can have any RAID-like protections at the pool level
You can have RAID-1-like protection at filesystem level

This means that if you want to use the cheap protections (RAID5-6) in ZFS, the way to do it is by having it on the pool level and no protection at the filesystem level. So you must have it for all filesystems in the pool or for none. Also, it prevents you from adding drives one by one. You have to add an entire RAID group at the time.

OTOH the Microsoft’s thin provisioning implementation is quite ugly. For most uses you don’t want to have any partition size at all. It should take just as much as space as it needs, w/out any limits. Or – better – there should be limits, but ones that you can increase freely. That’s how ZFS implements it.
[ADD] Microsoft explained in their FAQ that you can increase size of a Space. So it’s practically as good as others in this regard. [/ADD]

OK, I covered 2 options available for home users. There’s also a 3rd one, BTRFS.
How does it stand against them when it comes to virtualisation features? Not well, but could be worse. There’s only RAID-1. And at the level I’d describe as being between the pool and filesystems. All files use the same protection, but you can add drives with different sizes one by one. Thin provisioning works like on ZFS.

There’s one virtualisation feature here that ZFS has and others don’t – hybrid storage pools. You put both HDDs and SSDs in a single pool and the system uses them according to their capabilities. I don’t see any reason for MS not to add it at some point though.

Also there’s one thing that is not virtualisation in itself, but it’s an important feature that can be implemented only at the level that provides RAID. Protection from silent data corruption. ZFS and BTRFS offer it. Microsoft doesn’t talk about it, so I guess they don’t. But it’s likely just a matter of time. In the Steven’s blog I see nothing standing on the way.

Comments (2)

January 8, 2012

Thoughts on moderation

Filed under: Uncategorized — niezmierniespokojny @ 11:47 am

I’ve been thinking about moderation today. Moderation of internet forums, comments, films and everything that people let others write. Or in short: of communication.
What purpose does it serve?
Sadly, too often it is used to control what people say or do. Silence criticism. Prevent talks about competition. Remove all information that moderator finds sensitive. I think this approach is wrong because it goes against the main purpose of moderated mediums – to let people communicate. And the more important the thing that you try to stifle is, the higher likeness that users will communicate it anyway, just in a different place. So it brings little benefit for you and hurts your users, often the most important thing about your site.

In my opinion, moderation should be viewed differently. As a service provided to users, meant to improve their experiences.
It’s hard to get right. Different people have different needs. I’d like to mention 2 approaches…

One extreme is Frost. It’s a censorship-resistant, anonymous, unmoderated forum. Total freedom. And pure chaos. Some like it. I spent literally minutes on it, couldn’t stand all the threads about child abuses and all the spam.

On the other side of the spectrum are Ubuntu Forums. I certainly doubt it’s the most heavily moderated site on the internet and if I wanted, I would find one. But still, its goal of being safe for children forces them to put a lot of limitations on how they let their users act.

For me, both approaches are wrong. In fact, all approaches that I’ve seen are wrong.

Because different people have different needs and everybody tries to shoehorn all their users into 1 category. I totally see uses for family friendly sites and I think that being one is good for Ubuntu. Nevertheless I would like the place better if they didn’t remove emotional but unkind actions. The place could be much more lively.

Why no forums lets users choose their level of moderation?
A couple of checkboxes, ‘spam’, ‘hate speech’, ‘curses’, etc.

It doesn’t seem hard to carry out and would work much better.

And I have 1 more thought.
The service does not have to be provided by the same person / organisation that administers the medium. A mechanism that lets anybody become a moderator and provide their own view of the medium would be a great choice. Don’t like the default moderation? Provide your own. This is surely not an option for those who think they are entitled to control what people talk. But it’s something that would make your site more valuable to your users. There will always be ones who don’t like the way you moderate and alternatives could serve them better…and, oh, providing such feature is only some capital expense (though probably quite large), operational costs would be carried by users themselves.

Leave a Comment

December 10, 2011

A great illustration of BitTorrent’s public status

Filed under: Uncategorized — niezmierniespokojny @ 9:15 pm

YouHaveDownloaded is a public database of who downloads what. Click the link and it’s likely to show some of your recent downloads.
They claim that they monitor 20% of global BitTorrent transfer. In my case they showed 1 file of several that I got recently. They didn’t show the dozen of *nix distros that I seed, so I guess that they don’t monitor some kinds of trackers.
But overall, I think that the service is great. BitTorrent downloads are public and anybody can see them, but awareness of this fact is very low. Various pay-up-or-else shakedown schemes popping up around the world recently raised it significantly, but I don’t regard them as the best way of educating the public…And they aren’t used in places where P2P is legal anyway.

What if you don’t want your neighbours, kids or whoever know your favourite ~~porn~~ film genres?
Use more anonymous forms of P2P.
Wikipedia lists some.

Myself I recommend BitTorrent anonymised by I2P. It’s not the easiest to set up, but probably the biggest network – and size is important both for anonymity and amount of content available.

Leave a Comment

December 8, 2011

Falkvinge’s speech on free speech

Filed under: Uncategorized — niezmierniespokojny @ 10:12 am

Yesterday Rick Falkvinge posted a piece advocating free speech. Mostly, he talked the usual stuff, importance of free speech, dangers for it, chilling effects and such.
But one statement turned me on.
“the public may be just as harsh a judge (…). In such circumstances, it is the role of government to suppress those people and activities who would make people feel threatened by stating opinions that are out of line with commonly held beliefs.”
In other words, “Government should censor speech that makes people afraid of speaking”.
Actually I think that he meant something weaker like “Government should censor speech that makes people reasonably afraid of speaking”, but it’s not my main point.
It’s the “We need to protect our society from harmful forms of speech” argumentation that’s common for all forms of censorship, from China and Pakistan all the way to EU. In this regard, censorship is the same everywhere.
And it’s a funny thing to see in a post advocating for freedom.

To clarify one thing, I’m not against censorship in general. But I think it should be recognised as such and spoken clearly about. It’s the most healthy way.

Comments (3)

Older Posts »

I am an extreme moderate

November 16, 2015

Fuzz testing compressors

Introduction

Finding bugs

Initial testset

Running afl

Summary

March 16, 2013

Copyright infringing post

November 23, 2012

My way to BSD, part 1

June 10, 2012

xxhash256 update

June 9, 2012

A fast checksum

January 22, 2012

Snappy compression added to BTRFS

January 14, 2012

A quick look at Windows 8 storage virtualisation

January 8, 2012

Thoughts on moderation

December 10, 2011

A great illustration of BitTorrent’s public status

December 8, 2011

Falkvinge’s speech on free speech

Email Subscription

RSS Subscription

Tags

Archives

November 16, 2015

Introduction

Finding bugs

Initial testset

Running afl

Summary

March 16, 2013

Share this:

November 23, 2012

Share this:

June 10, 2012

Share this:

June 9, 2012

Share this:

January 22, 2012

Share this:

January 14, 2012

Share this:

January 8, 2012

Share this:

December 10, 2011

Share this:

December 8, 2011

Share this:

Email Subscription

RSS Subscription

Tags

Archives