Post

Compile Time 'Hashing' in C++

This post is an amalgamation of a couple of previous posts that have been deleted and merged together for cleanliness and conciseness.

  • Hashing C++ types as compile time identifiers (2023-08-09)
  • Improving constexpr hashing and iteration (2023-08-31)

A few months ago, I started looking into compile time hashing in C++. The main reason for this, was wanting a way to identify components in a system at based on their types. This is a pattern that had been used in a previous role, and I found the code that we ended up with being very easy to read and understand. Consequently, I thought about how this could be done in C++, which has no standard reflection capabilities (yet). Whilst it’s not the prettiest approach, I thought hashing any types I needed to use as identifiers would allow everything that was needed for this system.

Due to the rich constexpr semantics of C++, I thought it might also be desirable to do this at compile time. The main reason for this, is that one use of these type identifiers has some impact on the flow of control of components in the system. If we could make it possible (but certainly not required) to do this at compile time, we may be able to leverage some performance/static reasoning benefits later. As this was an experiment, I was happy enough to see if it would be possible purely for the sake of it!

Hashing algorithm

For the sake of this experiment, I used the 64-bit FNV-1a hash function. This is because there are no cryptographically secure requirements for this use case and it is easy to implement.

The pseudocode implementation of this (ripped from the Wikipedia link above) is:

1
2
3
4
5
6
7
8
algorithm fnv-1a is
    hash := FNV_offset_basis

    for each byte_of_data to be hashed do
        hash := hash XOR byte_of_data
        hash := hash × FNV_prime

    return hash 

If you are very memory limited and/or don’t need a large hash range, you can use a 32-bit hash output instead.

What is hash-able?

Thus far, I have mentioned hashing types, but types are not hash-able in and of themselves. In order to make the types hashable, I have used the nameof library. This library is authored by Daniil Goncharov, who has authored another cool library in the area of static reflection: magic_enum. This will convert our type names to std::string_view at compile time. There are some considerations when using this library, which might no make it appropriate for your use case:

  • C++17 is required.
  • The compiler compatibility is limited to GCC, MSVC and Clang.
  • Compiler specific functionality (e.g. __PRETTY_FUNCTION__) is used to generate the type names.

Whilst I only want to support hashing types through my interface for now, I thought it would be a good idea to try and generalise what could work with this approach. Beware that if we support other types of data (such as strings) at an API level with this method, we could easily end up with collisions. For example, the hash value of "X" and struct X could collide.

Referring to the FNV1a implementation above, any data we could pass in to hash will need to support XOR and multiply operations. We will also want to support ranges of these types so we can have types of arbitrary length.

If we have C++20 ranges available, then we can use std::ranges::for_each for the hash function implementation. Consequently, we can accept any range we can iterate at compile time.

1
2
3
4
5
6
7
template <typename T>
concept HashableRange = requires
{
    requires std::ranges::range<T>;
    { std::ranges::range_value_t<T>{} ^ std::uint64_t{} };
    { std::ranges::range_value_t<T>{} * std::uint64_t{} };
};

If we do not have ranges available, either through the standard library or ranges library, then we can fall back to using compile time iteration as described in this article Approximating ‘constexpr for’. Regardless of what tools you have available, this article is worth a read as it is very informative and could prove useful on codebases where you don’t have access to the latest standards! If we must take this approach, will naturally change our generalisation concept. Primarily, if we don’t have C++20, we probably won’t have concepts to begin with! If we had to use std::enable_if with another compile time boolean expression, we would need to constrain ourselves to random access ranges to use the implementation the linked article. We will also have to consider the other limitation: the “trip count” of the loop cannot exceed our compiler’s maximum template instantiation depth! Using GCC, this can be adjusted with -ftemplate-depth=N.

Hash function implementation

Assuming that we can use C++20 and thus our HashableRange concept, we can now implement the hash function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
template <HashableRange T>
[[nodiscard]] constexpr static std::uint64_t Fnv1a64(T const& range) noexcept
{
    std::uint64_t constexpr offsetBasis{14695981039346656037ull};
    std::uint64_t constexpr prime{1099511628211ull};
    std::uint64_t hash{offsetBasis};

    std::ranges::for_each(range,
                        [&hash](auto const& value)
                        {
                            hash ^= value;
                            hash *= prime;
                        });

    return hash;
}

Of course, there are a few adjustments we could make here e.g. with value categories, lambda noexcept-ness etc., but this implementation gets us most of the way there.

API Level

At an API level, if we want to use identifiers within a system, it is probably a good idea to pass them around as their own type. This type will have the following requirements:

  • Constructible from a template type parameter
  • Equality comparable

So the basic implementation of our type (without good initialisation) looks like this:

1
2
3
4
5
6
7
8
9
class Id
{
    std::uint64_t _id;

    public:
        [[nodiscard]] constexpr bool operator==(Id const&) const noexcept = default;

        explicit constexpr Id(std::uint64_t const id) noexcept : _id{id} {}
};

I’ve made the constructor explicit to avoid accidental conversions, but this is not required. You could also add a getter for the underlying hash value if you need that – I’ve omitted it for the sake of code brevity.

Now we can initialise this type from a template type parameter. We will also add a quick check that we can use nameof, so we get a nice compiler error if not. Another implementation note is that I am not distinguishing const/volatile qualifiers or references types for my hashes. That is to say the hash values for T, T&, T const& etc. will all be the same.

1
2
3
4
5
6
7
template <typename T>
[[nodiscard]] constexpr static Id FromType() noexcept
    requires(nameof::is_nameof_type_supported)
{
    using U = std::remove_cvref_t<T>;
    return Id{Fnv1a64(nameof::nameof_type<U>())};
}

I chose to use the pattern of named, static creation methods just because I like them and think it can enhance readability by clearly communicating intent. It’s also worth mentioning that I added this to the class as a static member function and made the constructor private to enforce creation through this function. You could of course just template the Id constructor if you prefer!

Most compilers seem to cope with this pretty well at higher optimisation levels, with the exception of MSVC. MSVC seems to require the results of these calls to be assigned to a constexpr assignment to optimise everything whereas GCC and Clang manage just fine with auto const.

Conclusion

I’m not sure how useful this will prove to be, but it was a fun experiment to play around with!

There are definitely a few pitfalls to consider: such as collisions arising from allowing IDs to be created from other inputs and the compiler intricacies of the nameof library.

Full Code Sample

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#include <nameof.hpp>
#include <algorithm>
#include <concepts>
#include <cstdint>
#include <ranges>

template <typename T>
concept HashableRange = requires
{
    requires std::ranges::range<T>;
    { std::ranges::range_value_t<T>{} ^ std::uint64_t{} };
    { std::ranges::range_value_t<T>{} * std::uint64_t{} };
};

class Id
{
    std::uint64_t _id;

    public:
        [[nodiscard]] constexpr bool operator==(Id const&) const noexcept = default;

        template <typename T>
        [[nodiscard]] constexpr static Id FromType() noexcept
            requires(nameof::is_nameof_type_supported)
        {
            using U = std::remove_cvref_t<T>;
            return Id{Fnv1a64(nameof::nameof_type<U>())};
        }

    private:
        explicit constexpr Id(std::uint64_t const id) noexcept : _id{id} {}

        template <HashableRange T>
        [[nodiscard]] constexpr static std::uint64_t Fnv1a64(T const& range) noexcept
        {
            std::uint64_t constexpr offsetBasis{14695981039346656037ull};
            std::uint64_t constexpr prime{1099511628211ull};
            std::uint64_t hash{offsetBasis};

            std::ranges::for_each(range,
                                [&hash](auto const& value)
                                {
                                    hash ^= value;
                                    hash *= prime;
                                });

            return hash;
        }
};

Whilst this was unit tested in a separate project, I added the following to play around in Godbolt:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#include <iostream>

int main()
{
    auto constexpr idOne = Id::FromType<int>();
    auto constexpr idTwo = Id::FromType<float>();

    auto constexpr sameIdsEqual = idOne == idOne;
    auto constexpr differentIdsNotEqual = idOne != idTwo;

    std::cout << std::boolalpha << sameIdsEqual << "\n";
    std::cout << std::boolalpha << differentIdsNotEqual << "\n";

    return 0;
}
This post is licensed under CC BY 4.0 by the author.