Palettizing Images with K-Means Clustering

Hey folks!

If you're like me and have always wondered how the color palette is decided when you're converting an image to GIF (or another format with limited a color palette), I've got something for you. Thanks to all the machine learning hype on Reddit and Hacker News, I'd heard about k-means clustering quite a while ago, but recently I had one of those sudden realizations: You can use it to find color palettes, too.

So, on monday evening, while mildly intoxicated, I hacked together a little console application that reads in an image, calculates a color palette, and writes the palettized image out to a BMP file. Since then, I spent almost all my free time on making it better;

Optimal k-means clustering is NP-hard, so I use Lloyd's randomized approximation algorithm, which gave decent results from the get-go, but to improve the quality of the output image, I started by generating multiple color palettes, and measuring the Peak Signal to Noise Ratio, to determine which one gives the best output image.
This turned out to not work all that well, because the measured PSNR values are usually within one dB of one another, meaning you hit diminishing returns rather quickly as you increase the number of generated palettes.

Next I tried to be a little more correct with my color science, starting with linearizing the RGB values, then converting them to YCbCr. This should be better, since Lloyd's algorithm is essentially just alternating between measuring euclidian distances (to determine which data points go in which cluster) and taking averages (to determine new cluster mid-points).
This actually decreased the quality of the output, however, because internally I stored the color components packed into the low 24 bits of a uint32_t, i.e. with 8 bits of precision per channel. Linearization effectively destroyed information, so I switched to 10 bits per channel, packed into the low 30 bits. Surprisingly, this merely brought me back up to the level of quality I had before doing any of this.

My first, naive implementation iterated over every single pixel for every step of Lloyd's algorithm. Obviously, this was abysmally slow, taking several seconds to generate a single palette, even for small images. After carefully looking at my code for a while, I determined that I could instead use a list of unique colors, with associated pixel counts, i.e. a histogram.
The easiest way to get one (without just allocating a giant array to use as a lookup table) that I could come up with is to simply make a copy of the image data, qsort it (it's just an array of uint32_t, after all), and then determine the lengths of runs of identical values. This optimization worked way better than I expected, cutting down the time taken per generated palette by about 80%, with little upfront cost.

I still wasn't going fast enough to process large images in a reasonable amount of time, though. My ad-hoc profiling code told me that more than 99% of my time was spent finding the closest color in the palette for each of the unique colors in the histogram. While I didn't see a way to get any algorithmic improvements here, this seemed easy to do with SIMD intrinsics.
Again, this worked better than expected. I was optimistic I'd get more than a 2x speedup, probably more like 3x. It couldn't possibly be more than 4x, since I'd only be using SSE2 instructions with 4 32-bit lanes, or so I thought. Imagine my expression when I got another 5x speedup, presumably because the SIMD code had to be mostly branch-free, while my first implementation wasn't.

The final optimization I put in was to ease the exit criterium of Lloyd's algorithm. My naive implementation iterated until the color palette had fully converged, i.e. didn't change anymore between iterations. This was fine initially, taking between 20 and 60 iterations, depending on the image and the RNG, but after increasing the internal precision to 10 bits, it was more like 50 to 150 iterations.
The solution to this problem is obvious: Using the same color distance metric (i.e. euclidian distance), we exit when no color in the updated palette is further from the previous value than some threshold value. Even with a conservative choice, this allows us to terminate after 5 to 15 iterations with no perceived loss of image quality.

That concludes my summary of the journey up to this point. There's still a couple of things I'd like to do with this thing (adding dither, downscaling the image, replacing PSNR with SSIM, finding more ways to improve palette quality, make a decent CLI to turn it into something that's actually useful, the list goes on...), but for now I need a breather.

If I'm honest with myself and take a look at how I usually treat little projects like this, I'm unlikely to come back and continue working on it within the next couple of weeks, but unlike most of my projects, this is in a somewhat presentable state, and I figured others might want to learn from it as well, so here I am. The source code can be found here, by the way.

Now, let's take a look at some of the generated images:

Original photo by Tony Cherbas

Original photo, apparently by Reddit user Christopher1295 (?)

(Both images are 32 colors, to show how well the tool can cope with a limited color palette. At 256 colors, there's barely any noticable difference, depending on the image, of course.)