Numerical Robustness and New Approach

Thanks to the debugging system from last post, I was able to pinpoint the bugs fairly quickly. However, as I tweak the algorithm to improve its numerical robustness, it becomes obvious that the problem is not as simple as just making the algorithm robust. In order to elaborate my concern and problem, here’s a review of the main algorithm.

Recap of Faurby’s
As I have stated in my previous posts, the current collision system is based on Faurby’s paper, which is really just a simple swept sphere vs triangle mesh collision detection & response routine. Here’s a pseudo-code version:

vec3 move_sphere(vec3 sphere_p, float sphere_r, vec3 vel) {

   Triangle_Array mesh = get_scene_static_mesh();

   // NOTE(chen):contact_p is initial contact point of swept sphere against mesh
   //            t is the normalized time of impact used to modulate vel
   vec3 contact_p, float t = try_move_sphere(sphere_p, sphere_r, vel, mesh);

   vec3 legal_vel = vel * t; 
   vec3 remain_vel = vel - legal_vel; 
   vec3 new_sphere_center = sphere_p + legal_vel;

   if (magnitude(remain_vel) < 0.00001f)  {
      return new_sphere_center;
   }

   //project remain_vel onto tangent plane of contact point
   vec3 collision_normal = normalize(new_sphere_center - contact_p);
   vec3 next_vel = remain_vel - dot(collision_normal, remain_vel) * collision_normal;

   move_sphere(new_sphere_center, sphere_r, next_vel);

}

The algorithm strives to ensure a lack of overlap between the sphere and triangles while trying its best to preserve the intended motion. The mathematics checks out, but the paper fails to address some crucial numerical robustness issues.

Numerical robustness readdressed

I use 32-bit floating point throughout my codebase, including the collision system. One important characteristic of floating point numbers is that it is fundamentally discretized due to its 32-bit limitation. As a result, there are real numbers that a floating point cannot represent exactly.

This characteristic potentially introduces small errors which can lead to terrible consequences, such as player getting stuck or falling through the ground. The nature of such errors is also unpredictable, making it all the harder to deal with. That is the numerical robustness problem. When an algorithm compensates for these small errors and prevents them from affecting the result, it is called being numerically robust. I have alluded to this in previous posts, but since I am really going to talk about it in this one, I thought it deserves a proper reintroduction.

Faurby’s problems

The algorithm is mostly consist of vector math, but since the vectors are composed of floating point numbers, it suffers from the same issues. By repeated running the code on the bad inputs captured by the debug system from the previous post, I have figured out the weaknesses of Faurby’s algorithm and made a few adjustments to strengthen its numerical robustness. As you will see, all the issues lie within the collision resolution part of the code (that makes sense since the collision detection method used is already a well-established one).

First problem: an inexact arrival point/direction

As illustrated above, one of the steps in the algorithm is to stop the sphere exactly at the initial contact point. However, this “exact” value often cannot be represented by floating point values. The position where the sphere is stopped at is either too shallow or too deep. And when it’s too deep, that means the sphere has clipped through the triangle and overlaps with the triangle mesh. This error results in player getting stuck or falling through the triangle.

In his paper, Faurby made an effort to resolve this issue. His proposed solution is to pull the sphere back along its velocity vector by a tiny bit. This way, sphere always remains hovering over the collision plane (ostensibly), which prevents any possible overlap between sphere and triangle mesh.

Unfortunately, the solution above solves the inexactness of the magnitude of sphere’s velocity but ignores that the velocity’s direction is also inexact. When sphere’s velocity is tangential to the collision plane, it forms a swept sphere parallel to the triangle’s surface:

However, due to floating point’s inexactness, the actual direction stored in computer often is not exactly parallel to the surface. Therefore, due to the error of margin, the swept sphere could potentially enclose a larger volume, possibly overlapping with the collision plane.

The red swept spheres are the same swept sphere, but offseted by the floating point errors. As you can see, the inexactness of velocity’s direction causes precision issues. It cannot be fixed by simply retracting the sphere along the velocity vector itself, because the velocity vector itself is misdirected.

Fix to the first problem

By simple observation, it is clear that when the sphere clips through the plane, only a small part of the sphere sinks through the triangle. Therefore, by pushing the sphere out along the collision normal by a tiny amount, it can be ensured that the sphere never intersects with the triangle regardless of its travel direction and travel distance. However, this solution has its own drawbacks, which will be discussed later.

Second problem: infinite recursion due to inexact velocity projection

As you can see from the pseudo-code, the algorithm recursively resolves sphere’s movement until the movement is exhausted. In each recursion, the remained velocity is projected onto the collision normal, then negated. The result of this projection is a reflecting vector, which is then used to project the remaining velocity into a new direction that is parallel to the collision plane. Again, for the same reason, the distance of the reflection might not be exact, and therefore it will fail to reflect the remaining velocity vector out of the collision plane.

The problem is not as severe as the first problem, as the collision detection will catch that and defer the collision resolution onto next recursion. However, in some cases, the work will be deferred far too many times, which results in recursion exceeding the program’s stack memory.

Fix to the second problem

Again, the solution is applying some form of bias to the calculation. In this case, the reflecting vector is increased in length by a tiny amount to ensure the reflected velocity will always not intersect with the collision plane.

Problems beyond numerical robustness

The above solutions improve the numerical robustness quite a bit. Before these fixes, the debug system will report around 500 errors for 10000 movement inputs, but now it consistently gives 0 errors for 20000 movement inputs.

But they introduce new problems. These biases that I apply make the algorithm fundamentally incorrect, and therefore errors might manifest itself as weird movements in the game. As I feared, these fixes result in jerky movements.

I think it’s time to say goodbye to this algorithm. Even if I can come up with someway to fix it, it will just be the same dirty work and loses its elegance. The reason I experimented with a collision system that works with triangle mesh is because I want the collision system to be based on a single method. Be it uneven terrain, convex or concave geometries, I want this one method that can handle all of these problems. The fact that I have to slap patches onto this broken algorithm just makes me sick.

Instead, here’s what I am going to try implement:

1. Do a top-down depth pass using the GPU, then download it into a heightmap texture (could be done just initially or per-frame, we will see).
2. Sample terrain height using entity’s coordinates and the heightmap, then ensure character will always be above that height (that solves the collision detection, but not resolution).
3. For other entities, manually composite collision primitives to fit their geometric shape and use them for collision purposes.

This method obviously has its own drawbacks, but they aren’t unsolvable. The heightmap approach makes it harder to implement multi-layered terrain, but I don’t think there will be any in Monter. Collision resolution will also likely to be a problem, but I can take the gradient of the heightmap and use that to estimate terrain slope.

I can’t really tell if that’s going to work unless I start experimenting, but at least it seems more promising than what I have currently. I will have to tear down the old work and rewrite it again, but hey, that’s just what happens when you are in the process of exploring.

Numerical Robustness and New Approach

Comments