
Compile time string hashing with C++ constexpr VS. your own preprocessor

Yesterday I was working on adding compile-time hashing of strings to my game engine. I use string IDs for referencing assets, and so far my asset system's API looks like this:

Asset* font_asset = assetm->get("mono_font");


Pretty straightforward and working quite well. I have caching of baked assets and hot asset reloading working and all. Now I wanted to add compile-time hashing, so that I have neither the runtime overhead of hashing hundreds of string IDs each frame nor the additional complexity of hashing the strings once and using the computed hashes everywhere.

I wanted it to look something like this:

Asset* font_asset = assetm->get(C_HASH(mono_font));


Where C_HASH(string) is magically replaced by the correct hash at compile time.

Of course I quickly concluded that a simple C preprocessor à la Handmade Hero is just what I need, but (alas!) I thought, why not have a quick google around to see what other people are doing. On the pages of the all-beloved (especially around here) Stack Overflow I stumbled on a comment that said something about using C++11 constant expressions (constexpr) to compute hashes at compile time. So let's have a look, I thought to myself.

Using just the C++11 version of constexpr for hashing flew right out the window, as it doesn't support loops in constexpr functions. So to compute the hash you basically have to do it in a pure functional style, i.e. recursively. I had enough common sense left to know that the C++ compiler interpreting (probably) thousands of function calls, each of which recurses potentially dozens of levels deep, can only _suck_. Maybe it doesn't, I haven't tried, but I strongly doubt it.

So I kept looking and found that in C++14 constexpr supposedly got a big power-up, as it now supports loops and stuff. Cautiously hopeful, I decided to try it out.

And what do you know, the C++14 constexpr version of my string hashing function looks exactly the same as the runtime version:

// NOTE: MK_FNV32_OFFSET_BASIS is 0x811c9dc5 and MK_FNV32_PRIME is 16777619
// (the standard 32-bit FNV-1a constants); u32 is presumably the engine's
// typedef for uint32_t.
constexpr static u32
_mk_const_hash_fnv1a32_null_terminated(const char* string)
{
    u32 hash = MK_FNV32_OFFSET_BASIS;
    while (*string)
    {
        hash = hash ^ (u32)(*string++);
        hash = hash * MK_FNV32_PRIME;
    }

    return hash;
}


Great, now all we need to do is write our little macro and we should be good:

#define C_HASH(name) _mk_const_hash_fnv1a32_null_terminated( #name )


I make up a little test like this:

mk_log("%u\n", C_HASH(test));


It compiles just fine. I jump into trusty gdb just to confirm everything worked all right, but of course (probably to nobody's surprise around here)... it did not.

The hash function got executed at runtime. I double-checked that I called the right version and did everything else correctly. I did. So I asked Google:

I found this on isocpp.org (emphasis by me):

Bjarne Stroustrup said on Jan 14, 2013 08:58 PM:

[...] according to the standard a constexpr function may be evaluated at compiler time or run time unless it is used as a constant expression, in which case it must be evaluated at compile-time. To guarantee compile-time evaluation, we must either use it where a constant expression is required (e.g., as an array bound or as a case label) or use it to initialize a constexpr.

WHAT? So compilers are just free to do whatever even if the constexpr could be perfectly computed at compile time because the _STANDARD_ says they *may be* evaluated at compile time... This is beyond stupid!

So to ensure that everything runs at compile time we have to jump through more C++ standard committee hoops. We either have to do this everywhere we use a hash:

constexpr u32 h = C_HASH(test);
mk_log("%u\n", h);


Which is unacceptable to me as I want to be able to call C_HASH wherever I could just type in the hash manually. It's also incredibly easy to forget. Or you have to create this little C++ abomination:

template <u32 hash>
static inline constexpr u32 __attribute__((always_inline))
_mk_compile_hash()
{
    return hash;
}

#define C_HASH(string) (_mk_compile_hash<_mk_const_hash_fnv1a32_null_terminated(string)>())


This forces the compiler to evaluate the hash function, because its return value is used as a template argument. I had absolutely given up on the constexpr idea by this point, but I still wanted to see just how horrible it was going to be.

By the way, you also still have to *force* the compiler to inline the _mk_compile_hash function, or else you will be stuck with one function call per hash at runtime. That is something which, as far as I know, you can't reliably do using MSVC.
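As an aside, the closest MSVC equivalent to the always_inline attribute is __forceinline, which is still only a strong hint that the compiler may refuse. A portable wrapper could look like this sketch (MK_ALWAYS_INLINE is a made-up name):

// Hypothetical portability macro; picks the compiler-specific inline hint.
#if defined(_MSC_VER)
    #define MK_ALWAYS_INLINE __forceinline
#else
    #define MK_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

// The wrapper from above would then be declared as:
template <u32 hash>
static MK_ALWAYS_INLINE constexpr u32
_mk_compile_hash()
{
    return hash;
}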

Okay, so I assembled a little test suite that consists of (supposedly) every English word there is (brought to you by Google). All in all, that's about 354984 words, all lowercase.

Here's what it looks like:

    // 354979 more lines like the following go here...

    mk_log("%u\n",C_HASH(zymurgy));
    mk_log("%u\n",C_HASH(zythem));
    mk_log("%u\n",C_HASH(zythum));
    mk_log("%u\n",C_HASH(zyzzyva));
    mk_log("%u\n",C_HASH(zyzzyvas));


Baseline:
With C_HASH defined as a constant (so no hash computation at all) I tested the compile time and size of the executable using clang on Linux. This is just a quick test, btw. No scientific methods at work here.

C_HASH defined as constant:

Compile time: ~ 10.4s
Executable size: ~ 8.6MB

constexpr
Here are the values using C++14 constexpr as defined above:

Compile time: ~ 289.8s (4m 49.8s)
Executable size: ~ 53MB

Notice the super bloated executable. I assume this has to do with the fact that the compiler has to instantiate a template function per C_HASH call (the _mk_compile_hash function from above).

My own C preprocessor:
I then spent about 2 hours producing a simple C preprocessor (very similar to what Casey showed on HMH) that goes through all my source files, looks for the C_HASH identifier, takes the ID name, and produces a header file that looks like this:

// NOTE(pp): Auto generated. Do not edit manually!
#pragma once
#include <stdint.h>     // uintN_t, INTN_MAX

#define C_HASH(name) (_PP_UID_ ## name)

enum _PP_UID_ENUM : uint32_t
{
    // about 355000 lines omitted

    _PP_UID_zymurgy = 44216644,
    _PP_UID_zythem = 1492189414,
    _PP_UID_zythum = 2026410198,
    _PP_UID_zyzzyva = 704117786,
    _PP_UID_zyzzyvas = 2053265739,
};


As you can see, C_HASH now expands to an enum value. This generates optimal code (it just inserts the constant value), but it puts some pressure on the compiler's preprocessor, which shows in the slightly increased compile time compared to the baseline. Also: my preprocessor does collision checking for all the string IDs used in my codebase, something I wouldn't have dared to try to implement in C++14 constexpr (would that even be possible?).

Compile time: ~ 16.9s
Executable size: ~ 19MB

That compile time includes the ~0.5s my preprocessor takes to process the whole codebase (362 KLOC). It also had to process 13 fewer hashes than the constexpr version, because those produced collisions and we don't want none of that.
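For reference, a minimal sketch of such a pre-build tool could look like the following. This is not the preprocessor described above, just a hypothetical illustration: it only handles the literal C_HASH(identifier) form, uses std::regex instead of real tokenizing, and leaves out incremental processing and proper error handling.

// Hypothetical minimal pre-build step: scan source files for C_HASH(identifier),
// hash each name with 32-bit FNV-1a, fail on collisions, and emit the header.
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <map>
#include <regex>
#include <sstream>
#include <string>

static uint32_t fnv1a32(const std::string& s)
{
    uint32_t hash = 0x811c9dc5u;        // FNV-1a 32-bit offset basis
    for (unsigned char c : s)
        hash = (hash ^ c) * 16777619u;  // FNV-1a 32-bit prime
    return hash;
}

int main(int argc, char** argv)
{
    if (argc < 3)
    {
        std::fprintf(stderr, "usage: %s <source files...> <output header>\n", argv[0]);
        return 1;
    }

    std::map<uint32_t, std::string> ids;  // hash -> name, also used for collision checks
    std::regex pattern(R"(C_HASH\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\))");

    // argv[1 .. argc-2] are source files, argv[argc-1] is the output header.
    for (int i = 1; i + 1 < argc; ++i)
    {
        std::ifstream file(argv[i]);
        std::stringstream buffer;
        buffer << file.rdbuf();
        std::string source = buffer.str();

        for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
             it != end; ++it)
        {
            std::string name = (*it)[1].str();
            uint32_t hash = fnv1a32(name);
            auto found = ids.find(hash);
            if (found != ids.end() && found->second != name)
            {
                std::fprintf(stderr, "hash collision: %s vs %s\n",
                             name.c_str(), found->second.c_str());
                return 1;
            }
            ids[hash] = name;
        }
    }

    std::ofstream out(argv[argc - 1]);
    out << "// NOTE(pp): Auto generated. Do not edit manually!\n"
           "#pragma once\n"
           "#include <stdint.h>\n\n"
           "#define C_HASH(name) (_PP_UID_ ## name)\n\n"
           "enum _PP_UID_ENUM : uint32_t\n{\n";
    for (const auto& entry : ids)
        out << "    _PP_UID_" << entry.second << " = " << entry.first << ",\n";
    out << "};\n";
    return 0;
}

Usage would be something like running this over the source tree as a pre-build step, with the generated header path as the last argument.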

Now the results aren't as dramatic if you take, for example, about 3000 string IDs (a more realistic number) rather than 350000. Some quick testing gave me these compile times:

Baseline: ~ 1.1s
My prepro: ~ 1.3s
constexpr: ~ 1.5s

But keep in mind that my preprocessor does collision checking for all produced hashes. It will also be super easy to add additional functionality (like generating code to look up the original names for each hash).
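To illustrate that last point, the generated reverse-lookup code could be as simple as a switch over the same enum values. This is a hypothetical sketch of what the generator might emit (the _pp_uid_name function is made up), not something it currently produces:

// Hypothetical addition to the generated header: map a hash back to its
// original name for logging/debugging, one case per generated ID.
static inline const char*
_pp_uid_name(uint32_t hash)
{
    switch (hash)
    {
        // about 355000 cases omitted

        case _PP_UID_zymurgy:  return "zymurgy";
        case _PP_UID_zythem:   return "zythem";
        case _PP_UID_zythum:   return "zythum";
        case _PP_UID_zyzzyva:  return "zyzzyva";
        case _PP_UID_zyzzyvas: return "zyzzyvas";
        default:               return "<unknown>";
    }
}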

Still, I was quite shocked at how badly the constexpr stuff scales. I always thought people who wrote compilers would be able to make stuff go _fast_. The most shocking thing to me is that the standard would say something as stupid as 'constexpr functions might be executed at runtime'. I guess in the programs these people write, nobody is gonna notice if they take a second more or less at startup time...
Was your 289.8s test compiled with or without optimizations?
My understanding is that the compiler runs in a kind of interpreter mode over the AST to evaluate constexpr. So that's why it is not much better than doing recursive template instantiations to calculate the hash. Same thousands of function calls.

How about using template specializations on string literals to directly evaluate the hash function? Something like this, which would actually work on pre-C++11: https://blog.molecular-matters.com/2011/06/24/hashed-strings/ A bit ugly code, but just one expression to evaluate for the compiler in each case. Could you measure the time for such an approach?

Edited by Mārtiņš Možeiko on
mmozeiko
Was your 289.8s test compiled with or without optimizations?


Everything was compiled without optimizations.

mmozeiko
How about using template specializations on string literals to directly evaluate the hash function? Something like this, which would actually work on pre-C++11: https://blog.molecular-matters.com/2011/06/24/hashed-strings/ A bit ugly code, but just one expression to evaluate for the compiler in each case. Could you measure the time for such an approach?


My understanding from reading your link is that the way it's done there would not allow the values to be used as cases in a switch statement without making it a constexpr again. At that point, as far as I can tell, you basically have the tail-recursive C++11 version (vs. the iterative C++14 version I used) of the hash function that I was too lazy to implement. I suppose I should test it though, even though I doubt that it is going to be superior.
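(For context, the switch usage being discussed looks something like the sketch below. Everything except C_HASH and mono_font is made up; the point is that case labels require constant expressions, which both the enum-based and the template-wrapped constexpr versions of C_HASH satisfy.)

// Sketch: string-ID hashes used directly as case labels.
static void handle_font(u32 asset_id_hash)
{
    switch (asset_id_hash)
    {
        case C_HASH(mono_font): /* use the mono font */   break;
        case C_HASH(icon_font): /* use some other font */ break;
        default:                                           break;
    }
}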
Yeah, that approach won't work with switch statements. But how often do you need an asset-name constant in a switch expression? In your example, you are passing it to some get function which probably uses it as an index into an array. At least it's not recursive, so it should be faster.
mmozeiko
At least it's not recursive, so it should be faster.


I see, you're right. I assumed it was recursive. Sorry, I'm not very good at reading "advanced" C++. Yeah, then it would indeed be interesting to see the result. But only really out of curiosity, as I don't see the benefit of trying to make it work in C++ compared to a pre-build step.

mmozeiko
But how often do you need an asset-name constant in a switch expression?


Well, with global hash collision detection and reverse lookup there is nothing stopping me from using the hashes basically as GUIDs (as is the practice at, for example, Naughty Dog) for all sorts of stuff.

Btw, I just timed the tail recursive C++11 constexpr version of fnv1a32 hashing:

// NOTE(mk): C++11 version
static constexpr u32
_mk_cpp11_hash_rec(char head, const char* tail, u32 hash)
{
    return head ?   _mk_cpp11_hash_rec(tail[0],
                                       tail + 1,
                                       (hash ^ (u32)head) * MK_FNV32_PRIME)
                :   hash;
}

static constexpr u32
_mk_cpp11_hash(const char* string)
{
    return _mk_cpp11_hash_rec(string[0], string + 1, MK_FNV32_OFFSET_BASIS);
}

template <u32 hash>
static inline constexpr u32 __attribute__((always_inline))
_mk_compile_hash()
{
    return hash;
}

#define C_HASH(string) _mk_compile_hash<_mk_cpp11_hash( # string )>()


(I again double-checked whether I really need the _mk_compile_hash wrapper; the compiler would indeed not evaluate the hash at compile time without it.)

It performed only slightly worse than the C++14 version, which could very well be down to slightly different conditions on my PC compared to yesterday. At least my learning some Haskell was useful now ;-)

Compile time: ~ 299.7s (4m 59.7s)
Executable size: ~ 53MB
Here is a C-preprocessor-based compile-time hash from lolengine.

#include <string.h>
#include <stdint.h>
#include <stdio.h>

// H1 mixes in one character of s (indices past the end of the string read the
// NUL terminator); H4/H16/H64/H256 unroll this up to 256 characters, and HASH
// xor-folds the upper 16 bits of the result into the lower bits.
#define H1(s,i,x)   (x*65599u+(uint8_t)s[(i)<strlen(s)?strlen(s)-1-(i):strlen(s)])
#define H4(s,i,x)   H1(s,i,H1(s,i+1,H1(s,i+2,H1(s,i+3,x))))
#define H16(s,i,x)  H4(s,i,H4(s,i+4,H4(s,i+8,H4(s,i+12,x))))
#define H64(s,i,x)  H16(s,i,H16(s,i+16,H16(s,i+32,H16(s,i+48,x))))
#define H256(s,i,x) H64(s,i,H64(s,i+64,H64(s,i+128,H64(s,i+192,x))))

#define HASH(s)    ((uint32_t)(H256(s,0,0)^(H256(s,0,0)>>16)))

// Usage:
int main(void)
{
    printf("%08x\n", HASH("funny_bone"));
    printf("%08x\n", HASH("incredibly_large_string_that_gcc_groks_easily"));
}

Compiled code:
  ...
  movl    $-238617217, %esi
  movl    $.LC0, %edi
  xorl    %eax, %eax
  call    printf
  movl    $-453669173, %esi
  movl    $.LC0, %edi
  xorl    %eax, %eax
  call    printf
  ...
graeme
Here is a C-preprocessor-based compile-time hash from lolengine.


So, using clang++ 3.8.1 on Linux, I had to add the build parameter -fbracket-depth=512 to get it to compile.

Then I thought I'd time it compiling all 350000 hashes. After about 10 minutes my PC froze.
It turns out that compiling just 63 hashes (63, not 63000 or something) takes clang 13.8s on my machine.

I also had to compile with optimizations to make it actually generate constant hashes (which took 18.68s for 63 hashes).

All in all I wouldn't say I'm very fond of this method...


Edit:
gcc 6.1.1 doesn't do any better on my machine. It also doesn't generate constant hashes with -O0, and it takes even longer to compile.

Edited by mathk on
Interesting! I wonder if that person's code base has such long compile times that taking this long is not worth remarking upon. I guess it goes to show there's no substitute for keeping your metaprogramming out of the way of C++ compilers.
Do you have the hashes in a header file, so that they get instantiated multiple times?
Marthog
Do you have the hashes in a header file, so that they get instantiated multiple times?


No.
Is the result of strlen(s) memoized, or does it get called 1024 times?
It's possible to do compile-time string hashing in all the major compilers without using any of the following:
- C++11 and constexpr
- a preprocessor
- a C_HASH() function - you can have the hash happen implicitly.

I discovered this and implemented it when I worked at Valve. It's used extensively for many different purposes, and you can see applications of it in some Valve programmers' GDC presentations.

All of the following patterns work with it and generate code that loads simple integer constants, with the constant strings not even stored in the binary. The hash can even be case-insensitive if desired.

int a = properties.GetAttr( "hitPoints" );

void f( stringhash_t x )
{
   if ( x == "option1" )
   {
   }
   else if ( x == "option2" )
   ...
}

What I do is take advantage of the compiler's robust detection of expressions which can be evaluated at compile time:

- A simple Perl script generates an inline constructor function for each possible string length. The need for N functions is because (at least in Visual C++) the compiler's detection of constant-valued expressions does not extend to for() loops, even when these loops have a known number of iterations and only do computations involving values known at compile time :-(.

So, stringhash_t has a bunch of constructors that look like this. For the sake of clarity in this example, I use a hash function which is just the sum of the characters. The actual hash function can be any set of mathematical operations on the characters, including CRC or other hash functions of arbitrary complexity.

INLINE stringhash_t( const char ( &str )[1] )
{
    m_nStringHash = 0;
}

INLINE stringhash_t( const char ( &str )[2] )
{
    uint32 h = str[0];
    m_nStringHash = h;
}

INLINE stringhash_t( const char ( &str )[3] )
{
    uint32 h = str[0];
    h += str[1];
    m_nStringHash = h;
}


gotchas:
- Because C++ can't properly distinguish between char pointers and arrays, to dynamically hash from a char* you do need to call a function.
- There's no way to change the hash back into the string without a dictionary, and so also no way to detect collisions. This can be helped somewhat either by scanning the source code for constant strings to build a dictionary, or by a debug-build option that maintains such a dictionary. A smart guy at Valve used this to implement a VS debug visualizer for them.
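For comparison, a sketch (not Valve's actual code) of the same per-length constructor pattern using the 32-bit FNV-1a hash from earlier in the thread, here for a literal of type const char[3]:

INLINE stringhash_t( const char ( &str )[3] )
{
    // FNV-1a 32-bit: start from the offset basis and do one xor/multiply step
    // per character; the terminating NUL in str[2] is not hashed.
    uint32 h = 0x811c9dc5u;
    h = ( h ^ (uint32)(unsigned char)str[0] ) * 16777619u;
    h = ( h ^ (uint32)(unsigned char)str[1] ) * 16777619u;
    m_nStringHash = h;
}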


Edited by Chris Green on
I guess this is one reason I'm increasingly liking the D programming language. I can just write this and it all just works at compile time, without doing anything special.

import std.stdio;

enum MK_FNV32_OFFSET_BASIS = 0x811c9dc5;
enum MK_FNV32_PRIME = 16777619;

uint hash(string str)
{

    uint hash = MK_FNV32_OFFSET_BASIS;
    
    foreach(ch ; str)
    {
        hash = hash ^ ch;
        hash = hash * MK_FNV32_PRIME;
    }

    return hash;
}

enum test1 = hash("zymurgy");
enum test2 = hash("zythem");
void main()
{
    writeln(test1);
    writeln(test2);
}
Chris, that's a neat idea for compile-time hashes. Is runtime performance an issue when running in debug? It looks like it will generate quite a bit of bloat code until you compile with -O2.
constexpr (and CTFE in D) are required to be evaluated and pruned at compile time even without optimizations active.