Enable Ray Tracing: Ray Traversal

Last time I covered the design of acceleration structure (AS) for my game engine’s ray tracing system. This time, I will go into depth on how ray traversal is done with such an AS. As I go on to describe how ray traversal is done, it should become much clearer on why the AS is designed this way.

In a traditional ray tracer, all you have to do is to build an AS over some static geometry during startup, and for the rest of the program, you just keep tracing some rays through the AS. That part is well-covered in literatures such as Physically Based Rendering, but it becomes messier when we bring ray tracing to real-time games.

Rigid Body Transformation

The first obstacle is to track rigid body transformations. Building BVH over a mesh is an expensive operation, so we only do it on startup time. Transforming the mesh then would invalidate the BVH we’ve built. It is not uncommon for a mesh to be continuously transformed throughout multiple frames, and we must have a way to handle that.

The simplest solution is to transform all the vertices and refit the BVH every frame. However, this can cause problems. First of all, BVH is composed of a hierarchy of axis-aligned bounding boxes (AABB), subdividing the space that the mesh occupies as the level goes deeper. The subdivision is carefully chosen at build-time to have optimal ray traversal performance. Expanding/shrinking the AABBs to fit the transformed mesh will no longer guarantee the same traversal characteristics. Especially if the mesh is rotated, the AABBs will change drastically and lead to almost crippled BVHs in terms of performance. Another obvious thing is we need to transform all the vertices first, which of course takes time.

A much better solution is to transform the ray to object space instead of transforming mesh to world space. This will preserve the same intersection, and it removes the need to transform vertices. And of course, the biggest win here is the BVH is no longer updated; we still get the optimal subdivision that we had from the first time we built it.

This is why each BLAS in the AS stores an inverse transformation matrix. Rays will be transformed to local space using this inverse transform before the actual traversal starts.


One thing I’d like to be supported is instancing. In the game, we inevitably have multiple entities that share the same static mesh. Storing duplicate BVHs and mesh data for these entities is a waste. Therefore, the AS is pulled into two levels. Top level entities who share the same mesh will point to the same bottom level AS.

Deformable Mesh

This is the trickiest case to handle. Vertices actually change, so we need to apply skinning every frame. This also means extra storage for these deformed vertices _and_ the BVHs that are refitted specifically for these deformed vertices. Here we just have to eat the cost of refitting BVH. BVH’s quality will suffer, but refitting is quite fast using GPU compute (covered in last post).

Two-Level Ray Traversal

The traversal itself is quite simple, but the fact that AS is two-level might seem daunting at first, so I will cover that briefly.

The algorithm used to traverse a single BVH is the same as the one in Physically Based Rendering. I don’t think I can do a better job explaining it, so here’s the link.

The Top-level AS is a BVH of instances, each instance node storing pointers to bottom-level AS and storage for the inverse transform. You can also say that it’s a BVH of BVHs. Applying the typical BVH traversal to top-level AS, we eventually hit one of the leaf nodes (instance nodes in this case). Now we stop traversal, remember this location in the top-level AS, then context switch to the bottom-level AS that it’s pointing to. This is also where we apply the inverse transform to our rays (though we should still preserve rays in global space for traversal of other instances).

After we context switched to the bottom-level AS, loading up the BVH address and transforming our ray, we again apply the same BVH traversal algorithm. Except this time, the leaf nodes are triangle nodes. A hit with the leaf node will become a candidate to ray intersection.

After the bottom-level traversal, we pop back to top-level. We continue this loop until we are done with all top-level nodes.

Now we are done, that’s all there is to it. The real work here is maintaining the two-level AS. The actual ray traversal is light work in comparison.
You seem to have converged on almost exactly the same structure as Vulkan:
The design is very much inspired by DXR, which seems to be same as Vulkan's!