I'm working on audio for a game, and I'm trying to get frequency (and other parameters) interpolation for playing notes in a simple "tracker". I have several interpolation types, and one of them is to draw a curve in the editor that will drive the interpolation.
A way to do it is just to say that we pass the "time value" (normalized between 0 and 1) to the bezier curve, and use the resulting value as the output.
/* p0 is the start value; p1 is the end value; cp0 is the first control point; cp1 is the second control point; All are single f32. */ f32 frequency_lerp_factor = bezier_cubic(p0, cp0, cp1, p1, t);
I can graph the curve by using t for x, and frequency_lerp_factor for y. This is (what I call) 1D. I can also graph the points and control points, but it gets confusing because their value would only be the y component, there isn't really a x here (except that p0 is at t = 0 and p1 is at t = 1. The control points, if I graph them, can be at any x in the graph, it doesn't matter where because their x value doesn't affect the curve.
Am I correct about it so far ?
But I'd like more control over the curve, so I want to "draw" a 2d curve using tangents (with some constraints to make sure the curve doesn't go backward). I've used 2d curves with tangents in the past while working on 3D/2D animations, so that was what I wanted when I started to work on the curve for interpolation.
In this case, the curve is controlled by 2D points, that I can graph normally, and I can get the curve by using bezier_cubic on x and y. But now, I think that to get the frequency lerp factor, I need to "sample" the resulting curve.
If I want to get the lerp factor (y
) at x
on the graph, I need to figure out which t
will give me x
, and use that t
in bezier_cubic
to get the y
.
/* I don't know how to do bezier_cubic_inverse. */ f32 t = bezier_cubic_inverse( p0.x, cp0.x, cp1.x, p1.x, x ); f32 frequency_lerp_factor = bezier_cubic( p0.y, cp0.y, cp1.y, p1.y, t );
I tried to do the math and got to a point where it looks like I need to solve a cubic polynomial, which I looked a bit how to do, but it doesn't look like there is an easy way to do that on a computer.
This might be wrong. Only using x components of the points. t³(-p0+3cp0-3cp1+p1) + t²(3p0-6cp0+3cp1) + t(-3p0+3cp0) + p0 = x Since p0, p1, cp0, cp1, x are known, we could compute the inside of the parenthesis and it becomes: t³a + t²b + tc + p0 = x And we need to solve for t.
If all that is correct, is solving the equation the way to do this or is there another way ? Or have I misunderstood something ?
On Discord, people suggested to use the Newton–Raphson method to estimate the value we are searching. They also pointed me to A Primer on Bézier Curves which contains a lot of information about Bézier curves.
So I was correct that we need to solve the cubic equation to get what we need. I wanted to write a small summary in case it's useful to somebody.
To re-state the problem: we use a cubic Bézier interpolation with an interpolation factor t
, a start point p0
, a end point p1
, a first control point cp0
(that with p0
creates a tangent at the start point), a second control point cp1
(that with p1
creates a tangent at the end point), and we interpolate on both x and y to "get" the curve. The equation for the curve is: f32 result = (u*u*u * p0) + (3*u*u*t * cp0) + (3*u*t*t * cp1) + (t*t*t * p1);
Where u = (1-t)
.
There are constraints on this curve, since we want to use it for interpolation, we don't want the curve to "go back" on x, in other words: x(n+1) > x(n)
. To make sure of that we have: p0x < p1x
and p0x <= cp0x <= p1x
and p0x <= cp1x <= p1x
.
vec2 point = { 0 }; point.x = bezier_cubic( p0.x, cp0.x, cp1.x, p1.x, t); point.y = bezier_cubic( p0.y, cp0.y, cp1.y, p1.y, t); /* Point is a point on the curve at interpolation factor t. */
This allows us to draw a curve, but how can we use it to interpolate something. If we consider only the graph, we consider the x axis as being the interpolation factor, and y being the result of the interpolation. So we want to say result = cubic(..., x );
but we don't know x, as it is the result of the interpolation (point.x above).
So the question is what interpolation factor t
gives us a x
value on the graph that correspond to the interpolation factor we want from the curve.
In a math sense (which I'm not good at), we can see it as:
x = (u*u*u * p0) + (3*u*u*t * cp0) + (3*u*t*t * cp1) + (t*t*t * p1)
where we know x
, p0
, p1
, cp0
and cp1
, and we search t
, so we want an equation that looks like t = ...
. But this is a cubic equation so there is no simple way to do that. Cubic equation can have several solutions, including solutions with complex numbers. But with the constraints we have, we know that there will be only a single value of t
that gives us the value of x
we search. We want a single real (as in real number) solution.
We can work out the equation to get it in the form of standard cubic equations at³+bt²+ct+d=0
:
x = (u*u*u * p0) + (3*u*u*t * cp0) + (3*u*t*t * cp1) + (t*t*t * p1) x = u³p0 + 3u²tcp0 + 3ut²cp1 + t³p1 /* replace u by 1-t */ x = (1-t)³p0 + 3(1-t)²tcp0 + 3(1-t)t²cp1 + t³p1 /* (a - b)³ = a³ - 3a²b + 3ab² - b³ */ /* (1-t)³ = 1 - 3t + 3t² - t³*/ /* (1-t)² = 1 - 2t + t² */ x = (1-3t+3t²-t³)p0 + 3(1-2t+t²)tcp0 + 3(1-t)t²cp1 + t³p1 /* Distributions */ x = (1-3t+3t²-t³)p0 + (3-6t+3t²)tcp0 + (3-3t)t²cp1 + t³p1 x = (1-3t+3t²-t³)p0 + (3t-6t²+3t³)cp0 + (3t²-3t³)cp1 + t³p1 x = (p0-3p0t+3p0t²-p0t³) + (3cp0t-6cp0t²+3cp0t³) + (3cp1t²-3cp1t³) + t³p1 x = p0-3p0t+3p0t²-p0t³ + 3cp0t-6cp0t²+3cp0t³ + 3cp1t²-3cp1t³ + p1t³ /* Group by power */ x = -p0t³ + 3cp0t³ - 3cp1t³ + p1t³ + 3p0t² - 6cp0t² + 3cp1t² - 3p0t + 3cp0t + p0 /* I don't know what is the term in English, but just put the common term in front. */ x = t³(-p0 + 3cp0 - 3cp1 + p1) + t²(3p0 - 6cp0 + 3cp1) + t(-3p0 + 3cp0) + p0 0 = t³(-p0 + 3cp0 - 3cp1 + p1) + t²(3p0 - 6cp0 + 3cp1) + t(-3p0 + 3cp0) + p0 - x /* We can replace things that are constant */ a = -p0 + 3cp0 - 3cp1 + p1 b = 3p0 - 6cp0 + 3cp1 c = -3p0 + 3cp0 d = p0 - x 0 = t³a + t²b + tc + d
We can see that there is no easy way to get t=...
.
If we graph that equation, we have the t
value on the x axis, and the x
value (point.x from above) on the y axis. https://www.desmos.com/calculator/3ruowk5mvb
Since we know there is only one t
value that gives us a given x
value, that means that the function described by the equation, should cross the x axis at only one place (the root of the function), and that value is what we search: t
. On the graph if you change the slider for p (the name are all weird, because I don't think I can name things better in desmos), you change which value we are searching: if it's set at zero, we search which t
gives us 0; if it's set to 0.3 we search which t
gives us 0.3 and the answer is always "where the graph crosses the x axis".
That still doesn't help us find the solution in code. The Bézier Primer linked above has some function that can compute all the roots of cubic, but I can't say I understand everything and it seems relatively expensive (I haven't measure it). That's when people on Discord suggested to use the Newton–Raphson method.
The method uses the derivative of the function to iteratively approach the value searched. It's a method that converges fast toward the solution, but isn't guaranteed to converge.
One of the interpretation of the derivative of a function at a point, is that it's the slope of the tangent at that point. The algorithm uses the tangent as an approximation of the function at a point.
We start with an initial guess, a coordinate on the x axis, which in this is the interpolation factor t
, which in my test I always used the x value we a searching for. We compute the derivative at that point to get the tangent and compute where the it intersects the x axis. That is our new guess (new t
) that we hope is better than the previous one. We use that value in the equation to compute a y
value. If that y
value is close enough to 0, than we accept t
. Otherwise we do more iterations until we get a result or determine that the function doesn't converge.
/* function */ 0 = t³a + t²b + tc + d /* Derivative */ 0 = 3t²a + 2tb + c /* The slope of the tangent equals */ s = 3t²a + 2tb + c /* The slope of a line defined by 2 points is */ s = (y1 - y0) / (x1-x0) /* x0 is the initial guess. y0 is the value of the function if we pass it x0 We search x1. We are searching for a point that crosses the x axis, so y1 = 0 */ s = -y0 / (x1-x0) s * (x1-x0) = -y0 x1-x0 = -y0 / s x1 = -y0/s + x0 x1 = x0 - (y0/s) /* Which means */ x1 = x0 - (function(x0)/derivative(x0))
In the following graph, you can view the tangent (in purple) line at a guess value (the q
slider). You can still set the p
value for the point we are searching for.
Below the q
slider is the x value at which the tangent crosses the x axis. You can use this value in for q
to have the next iteration and repeat to see the function converging.
Below that is the distance of the current guess to the x axis (if you pass the current guess q
through the function, we get that value (distance on the y axis) and we want that value to be as close to zero as possible). This is also visualized as a blue line.
https://www.desmos.com/calculator/8bofj334lp
Some notes:
t
value can go outside the 0 to 1 range at first, and then converge.Here is some code that implement the Newton-Raphson method. Note that this is just an example, it doesn't try to properly handle the case where it doesn't converge.
#include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <math.h> typedef float f32; typedef uintptr_t umm; typedef int32_t s32; static f32 interpolation_linear( f32 start, f32 end, f32 t ) { f32 result = start * ( 1 - t ) + end * t; return result; } static f32 interpolation_cubic( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 t ) { f32 u = 1 - t; f32 result = ( u * u * u * start ) + ( 3 * u * u * t * cp_1 ) + ( 3 * u * t * t * cp_2 ) + ( t * t * t * end ); return result; } static f32 interpolation_cubic_t_from_x_NR( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 epsilon, s32* iteration_count ) { ( *iteration_count ) = 0; f32 a = ( -start + 3 * cp_1 - 3 * cp_2 + end ); f32 b = ( 3 * start - 6 * cp_1 + 3 * cp_2 ); f32 c = ( -3 * start + 3 * cp_1 ); f32 d = ( start - x ); f32 t = x; for ( umm i = 0; i < 8; i++ ) { f32 f = a * t * t * t + b * t * t + c * t + d; if ( fabsf( f ) < epsilon ) { break; } f32 derivative = 3 * a * t * t + 2 * b * t + c; if ( derivative == 0 ) { /* The function did not converve, you need to use bisection instead. */ break; } t = t - ( f / derivative ); ( *iteration_count )++; } return t; } static f32 interpolation_cubic_t_from_x_bisection( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 epsilon, s32* iteration_count ) { ( *iteration_count ) = 0; f32 l = 0; f32 r = 1; f32 t = x; while ( l < r ) { f32 x2 = interpolation_cubic( start, cp_1, cp_2, end, t ); if ( fabs( x - x2 ) < epsilon ) { break; } if ( x2 > x ) { r = t; } else { l = t; } t = interpolation_linear( l, r, 0.5f ); ( *iteration_count )++; } return t; } int main( int argc, char** argv ) { f32 start = 0; f32 cp_1 = 1; f32 cp_2 = 0; f32 end = 1; f32 x = 0; for ( umm i = 0; i < 11; i++ ) { s32 it_1 = 0, it_2 = 0; f32 t1 = interpolation_cubic_t_from_x_NR( start, cp_1, cp_2, end, x, 0.001f, &it_1 ); f32 t2 = interpolation_cubic_t_from_x_bisection( start, cp_1, cp_2, end, x, 0.001f, &it_2 ); f32 x1 = interpolation_cubic( start, cp_1, cp_2, end, t1 ); f32 x2 = interpolation_cubic( start, cp_1, cp_2, end, t2 ); printf( "NR: x: %f -> t: %f -> x: %f | iteration: %d\n", x, t1, x1, it_1 ); printf( "BI: x: %f -> t: %f -> x: %f | iteration: %d\n", x, t2, x2, it_2 ); x += 0.1f; } return 0; }
I wanted to do some performance tests, to see how much faster or slower some methods were. I implemented Cardano's algorithm using the French Wikipedia page as it was most straight forward than the English page (and I speak French). I'm not sure I implemented right because sometime it doesn't produce correct value, but I can't say I understand how it works, I just wanted some idea of how much times it would take.
In the course of doing all that, I also checked the results to see when Newton Raphson and Cardano didn't work. I also did a different implementation of the bisection method which also failed under some conditions. Here are some of the things I got from those test.
The tests were to use the different methods with different parameters:
This started as a performance test but in the end it just gives a broad idea of what's going on, I wouldn't count the numbers as really representing the performance. I didn't try to do any optimization, I use very straight forward version of the methods. I measured the performance by taking the difference between calls of __rdtsc(). The code was compiles with MSVC (19.43.34809 x64) with "/O2". I tested 6 functions: 2 bisection methods, Newton Raphson using floats and doubles, and Cardano using floats and doubles.
The two bisection methods are:
ax³+bx²+cx+d=0
(as describe in the previous post) (method B).For the performances, in general Newton Raphson is faster, then Cardano, then bisection. Newton Raphson and bisection takes more time with more precision requested, but Newton Raphson stay's faster with a precision of 0.000001
(in my use case I think a precision of 0.001
is enough, but I wanted to test it). Bisection at low precision is faster then Cardano. In all my tests I had weird "peaks" were for no apparent reason a method would take a lot of time, for example an average of 615 cycles but some peaks at 23244629 (about 0.001% of the time) for bisection.
This is just to get an idea (as I think my test isn't good) and keep in mind that only the bisection methods got the correct values on every calls:
Precision: 0.00001. Bisection method A cycles: 64065365548 | average: 615 | min: 21 | max: 23244629 iteration min: 1 | iteration max: 20 | iteration average: 16 Bisection method B cycles: 52389432600 | average: 503 | min: 21 | max: 1433929 iteration min: 1 | iteration max: 20 | iteration average: 16 Newton Raphson f32 cycles: 14889205012 | average: 143 | min: 18 | max: 4251809 iteration min: 1 | iteration max: 61 | iteration average: 5 Newton Raphson f64 cycles: 16517484053 | average: 158 | min: 24 | max: 1164929 iteration min: 1 | iteration max: 61 | iteration average: 5 Cardano f32 cycles: 287757556 | average: 279 | min: 122 | max: 326471 Cardano f64 cycles: 400542181 | average: 388 | min: 148 | max: 606724
What ends up more interesting, is why the functions failed. I won't talk about the Cardano's functions as, as I said, I don't understand them, and the only improvement I was able to do is to use the f64 version to solve some cases.
The bisection method A never failed. As we are just computing the result of the Bézier curve for x until we find a value close enough to what we want, it can't fail (unless we ask for a precision too high).
The bisection method B failed because when we compute a value that as the correct precision for ax³+bx²+cx+d=0
, when we use that result in the Bézier equation x=u³p0 + 3u²tcp0 + 3ut²cp1 + t³p1
(with u=1-t
), we can end up with a value slightly outside the desired precision. A simple way so solve that was to ask for a slightly higher precision in the first equation. This work up to some precision as we run at some point in more precision than floating point number offer. This method is also slightly faster than method A.
In the previous post I was wondering if due to the constraints of our curve, Newton Raphson was guaranteed to converge unless the derivative was 0. The answer is no. I've seen 3 types of case where NR doesn't work.
The first one is what we knew, when the derivative is zero. This can happen often, for example when start = 0 and cp1 = 0 and the initial guess is 0, or end = 1 and cp2 = 1 and the initial guess is 1. Those are not (at all) the only cases and I don't know if it is a common setup.
The second one is when the method oscillate between two values. You do the computation with one value, it gives you a second value. When you do the computation with the second value, you're back to the first one. The more precision you want, the more likely you're going to run into that problem. This is due to the floating point precision, not an issue with the algorithm.
The third one is when the function converges, but to a root that is outside the [0;1] range. The parameter we use for our curve guarantee that there is only one root in [0;1], but there might be other root outside that range. See this graph for example: https://www.desmos.com/calculator/se3bomd3el there are roots at around -0.6 and 50. If you use an initial guess that is near where the tangent is horizontal (0.00075 in the graph linked), you can end up converging toward a root outside the desired range.
While looking at the first and third issue, it became clear that choosing a good initial guess can help the function converge, and converge to the correct value. And one observation is that since I want to use those curve to interpolate, I'll most likely try to search for value close to the last one found, so a good guess might be to use the previous result. A side note is not to forget to reset it when we reach the end of the curve and want to start at the beginning (at the end the result is 1 since the end point is at 1, and 1 is a bad guess for the first value of the curve. I failed to do that at some point which can lead to the method not converging).
There is also a similar issue, where the function converges to the correct root, but as with bisection method B, this value when plugged into the Bézier equation can produce a value that is slightly off the requested precision.
Here is the test code. Please don't use it, it's bad, there are left overs from different tests...
#include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <float.h> #define _USE_MATH_DEFINES #include <math.h> typedef float f32; typedef double f64; typedef uintptr_t umm; typedef uint8_t u8; typedef uint32_t u32; typedef uint64_t u64; typedef int32_t s32; #define cast( type, expr ) ( type ) ( expr ) #define array_count( array ) ( ( umm ) ( sizeof( array ) / sizeof( ( array )[ 0 ] ) ) ) #define _assert( expr ) if ( !( expr ) ) { __debugbreak( ); } static f32 interpolation_linear( f32 start, f32 end, f32 t ) { f32 result = start * ( 1 - t ) + end * t; return result; } static f32 interpolation_cubic( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 t ) { f32 u = 1 - t; f32 result = ( u * u * u * start ) + ( 3 * u * u * t * cp_1 ) + ( 3 * u * t * t * cp_2 ) + ( t * t * t * end ); return result; } typedef f32 cubic_func_t( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ); static f32 interpolation_cubic_t_from_x_bisection( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ) { ( *iteration_count ) = 0; f32 l = 0; f32 r = 1; f32 t = guess; while ( l < r ) { ( *iteration_count )++; f32 x2 = interpolation_cubic( start, cp_1, cp_2, end, t ); if ( fabsf( x - x2 ) < epsilon ) { break; } if ( x2 > x ) { r = t; } else { l = t; } t = interpolation_linear( l, r, 0.5f ); } return t; } static f32 interpolation_cubic_t_from_x_bisection_2( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ) { ( *iteration_count ) = 0; f32 a = ( -start + 3 * cp_1 - 3 * cp_2 + end ); f32 b = ( 3 * start - 6 * cp_1 + 3 * cp_2 ); f32 c = ( -3 * start + 3 * cp_1 ); f32 d = ( start - x ); f32 l = 0; f32 r = 1; f32 t = guess; while ( l < r ) { ( *iteration_count )++; f32 f = a * t * t * t + b * t * t + c * t + d; if ( fabsf( f ) < epsilon ) { break; } if ( f - d > x ) { r = t; } else { l = t; } t = interpolation_linear( l, r, 0.5f ); } return t; } static f32 interpolation_cubic_t_from_x_NR_f32( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ) { /* NOTE simon (18/04/25 16:44:31): Choosing a good starting point is important. I tried to optimize by remembering the last result and using it as a starting point, since I want to use that for solving for near values. Which is ok, except when we end a curve and start another, and the we have a guess of 1 for the starting point which is as far as we can be that leads to the function not converging or taking a long time to converge. */ /* NOTE simon (18/04/25 16:46:54): I ran into cases where the guess was oscillating between two values and never converging. */ ( *iteration_count ) = 0; f32 a = ( -start + 3 * cp_1 - 3 * cp_2 + end ); f32 b = ( 3 * start - 6 * cp_1 + 3 * cp_2 ); f32 c = ( -3 * start + 3 * cp_1 ); f32 d = ( start - x ); f32 previous[ 2 ] = { FLT_MAX, FLT_MAX }; f32 t = guess; if ( c == 0 ) { if ( t == 0 ) { t += epsilon; } else if ( t == 1 ) { t -= epsilon; } } for ( umm i = 0; i < 100; i++ ) { ( *iteration_count )++; f32 f = a * t * t * t + b * t * t + c * t + d; if ( fabsf( f ) < epsilon ) { break; } f32 derivative = 3 * a * t * t + 2 * b * t + c; if ( derivative == 0 ) { /* The function can't converve, use bisection instead. */ t = -1; break; } t = t - ( f / derivative ); if ( t == previous[ 0 ] || t == previous[ 1 ] ) { t = -2; break; } previous[ 0 ] = previous[ 1 ]; previous[ 1 ] = t; } return t; } static f32 interpolation_cubic_t_from_x_NR_f64( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ) { /* NOTE simon (18/04/25 16:44:31): Choosing a good starting point is important. I tried to optimize by remembering the last result and using it as a starting point, since I want to use that for solving for near values. Which is ok, except when we end a curve and start another, and the we have a guess of 1 for the starting point which is as far as we can be that leads to the function not converging or taking a long time to converge. */ /* NOTE simon (18/04/25 16:46:54): I ran into cases where the guess was oscillating between two values and never converging. */ ( *iteration_count ) = 0; f64 a = ( -start + 3.0 * cp_1 - 3.0 * cp_2 + end ); f64 b = ( 3.0 * start - 6.0 * cp_1 + 3.0 * cp_2 ); f64 c = ( -3.0 * start + 3.0 * cp_1 ); f64 d = ( start - cast( f64, x ) ); f64 previous[ 2 ] = { guess, guess }; f64 t = guess; if ( t == 0 && c == 0 ) { t += epsilon; } for ( umm i = 0; i < 100; i++ ) { ( *iteration_count )++; f64 f = a * t * t * t + b * t * t + c * t + d; if ( fabs( f ) < epsilon ) { break; } f64 derivative = 3 * a * t * t + 2 * b * t + c; if ( derivative == 0 ) { /* The function can't converve, use bisection instead. */ t = -1; break; } t = t - ( f / derivative ); if ( t == previous[ 0 ] || t == previous[ 1 ] ) { t = -2; break; } previous[ 0 ] = previous[ 1 ]; previous[ 1 ] = t; } return cast( f32, t ); } static f32 interpolation_cubic_t_from_x_cardano_f32( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ) { ( *iteration_count ) = 1; f32 a = ( -start + 3 * cp_1 - 3 * cp_2 + end ); f32 b = ( 3 * start - 6 * cp_1 + 3 * cp_2 ); f32 c = ( -3 * start + 3 * cp_1 ); f32 d = ( start - x ); f32 result = 0; if ( a == 0 ) { if ( b == 0 ) { if ( c == 0 ) { result = 0; /* NOTE simon (22/04/25 15:58:42): No solution. */ } else { result = -d / c; } } else { f32 delta = ( b*b ) - ( 4*a*c ); if ( delta >= 0 ) { f32 result_1 = ( -b + sqrtf( delta ) ) / ( 2 * a ); f32 result_2 = ( -b - sqrtf( delta ) ) / ( 2 * a ); result = result_1; if ( result_1 < 0 || result_1 > 1 ) { result = result_2; } } else { result = 0; /* NOTE simon (22/04/25 15:59:25): No solution. */ } } } else { f32 p = ((3*a*c)-(b*b))/((3*a)*(3*a)); f32 q = ((-9*a*b*c) + (27*a*a*d) + (2*b*b*b)) / (27*a*a*a); f32 discriminant = (-4*p*p*p ) -(27*q*q); if ( discriminant < 0 ) { f32 u1 = ( -q + sqrtf( -discriminant / 27 ) ) / 2; f32 u = cbrtf( u1 ); f32 v1 = ( -q - sqrtf( -discriminant / 27 ) ) / 2; f32 v = cbrtf( v1 ); result = ( u + v ) - ( b/(3*a) ); } else if ( discriminant == 0 ) { if ( p == 0 && q == 0 ) { result = 0 - ( b/(3*a) ); } else { f32 result_1 = ( 3*q ) / p; result = result_1 - ( b/(3*b ) ); f32 result_2 = (-3*q)/(2*p); if ( result < 0 || result > 1 ) { result = result_2 - ( b/(3*b) ); } } } else { f32 result_1 = 2.0f * sqrtf( -p / 3.0f ) * cosf( (1.0f/3.0f) * acosf( ((3*q)/(2*p)) * sqrtf(3/-p)) + ((2.0f*0*( f32 ) M_PI ) / 3 ) ); f32 result_2 = 2.0f * sqrtf( -p / 3.0f ) * cosf( (1.0f/3.0f) * acosf( ((3*q)/(2*p)) * sqrtf(3/-p)) + ((2.0f*1*( f32 ) M_PI ) / 3 ) ); f32 result_3 = 2.0f * sqrtf( -p / 3.0f ) * cosf( (1.0f/3.0f) * acosf( ((3*q)/(2*p)) * sqrtf(3/-p)) + ((2.0f*2*( f32 ) M_PI ) / 3 ) ); result = result_1 - ( b/(3*b ) ); if ( result < 0 || result > 1 ) { result = result_2 - ( b/(3*b ) ); } if ( result < 0 || result > 1 ) { result = result_3 - ( b/(3*b ) ); } } } return result; } static f32 interpolation_cubic_t_from_x_cardano_f64( f32 start, f32 cp_1, f32 cp_2, f32 end, f32 x, f32 guess, f32 epsilon, u32* iteration_count ) { ( *iteration_count ) = 1; f64 a = ( -start + 3.0 * cp_1 - 3.0 * cp_2 + end ); f64 b = ( 3.0 * start - 6.0 * cp_1 + 3.0 * cp_2 ); f64 c = ( -3.0 * start + 3.0 * cp_1 ); f64 d = ( start - cast( f64, x ) ); f32 result = 0; if ( a == 0 ) { if ( b == 0 ) { if ( c == 0 ) { result = 0; /* NOTE simon (22/04/25 15:58:42): No solution. */ } else { result = cast( f32, -d / c ); } } else { f64 delta = ( b*b ) - ( 4*a*c ); if ( delta >= 0 ) { f64 result_1 = ( -b + sqrt( delta ) ) / ( 2 * a ); f64 result_2 = ( -b - sqrt( delta ) ) / ( 2 * a ); result = cast( f32, result_1 ); if ( result_1 < 0 || result_1 > 1 ) { result = cast( f32, result_2 ); } } else { result = 0; /* NOTE simon (22/04/25 15:59:25): No solution. */ } } } else { f64 p = ((3*a*c)-(b*b))/((3*a)*(3*a)); f64 q = ((-9*a*b*c) + (27*a*a*d) + (2*b*b*b)) / (27*a*a*a); f64 discriminant = (-4*p*p*p ) -(27*q*q); if ( discriminant < 0 ) { f64 u1 = ( -q + sqrt( -discriminant / 27 ) ) / 2; f64 u = cbrt( u1 ); f64 v1 = ( -q - sqrt( -discriminant / 27 ) ) / 2; f64 v = cbrt( v1 ); result = cast( f32, ( u + v ) - ( b/(3*a) ) ); } else if ( discriminant == 0 ) { if ( p == 0 && q == 0 ) { result = cast( f32, 0 - ( b/(3*a) ) ); } else { f64 result_1 = ( 3*q ) / p; result = cast( f32, result_1 - ( b/(3*b ) ) ); f64 result_2 = (-3*q)/(2*p); if ( result < 0 || result > 1 ) { result = cast( f32, result_2 - ( b/(3*b) ) ); } } } else { f64 result_1 = 2.0 * sqrt( -p / 3 ) * cos( (1.0/3) * acos( ((3*q)/(2*p)) * sqrt(3/-p)) + (( 2.0*0*M_PI ) / 3 ) ); f64 result_2 = 2.0 * sqrt( -p / 3 ) * cos( (1.0/3) * acos( ((3*q)/(2*p)) * sqrt(3/-p)) + (( 2.0*1*M_PI ) / 3 ) ); f64 result_3 = 2.0 * sqrt( -p / 3 ) * cos( (1.0/3) * acos( ((3*q)/(2*p)) * sqrt(3/-p)) + (( 2.0*2*M_PI ) / 3 ) ); result = cast( f32, result_1 - ( b/(3*b ) ) ); if ( result < 0 || result > 1 ) { result = cast( f32, result_2 - ( b/(3*b ) ) ); } if ( result < 0 || result > 1 ) { result = cast( f32, result_3 - ( b/(3*b ) ) ); } } } return result; } typedef struct stats_t { cubic_func_t* func; char* name; u8 is_iterative; u64 hits; u64 cycles_total; u64 cycle_ranges[ 101 ]; u64 cycles_min; u64 cycles_max; u64 iterations[ 100 ]; f32 error_min; f32 error_max; u64 above_epsilon; } stats_t; #define print(...) pos += snprintf( out + pos, reserved - pos, __VA_ARGS__ ) int main( int argc, char** argv ) { s32 reserved = 1000 * 1024 * 1024; char* out = malloc( reserved ); s32 pos = 0; stats_t all_stats[ ] = { { .func = interpolation_cubic_t_from_x_bisection, .name = "bisection 1", .is_iterative = 1, .cycles_min = 0xffffffffffffffff, .error_min = FLT_MAX, }, { .func = interpolation_cubic_t_from_x_bisection_2, .name = "bisection 2", .is_iterative = 1, .cycles_min = 0xffffffffffffffff, .error_min = FLT_MAX, }, { .func = interpolation_cubic_t_from_x_NR_f32, .name = "newton raphson f32", .is_iterative = 1, .cycles_min = 0xffffffffffffffff, .error_min = FLT_MAX, }, { .func = interpolation_cubic_t_from_x_NR_f64, .name = "newton raphson f64", .is_iterative = 1, .cycles_min = 0xffffffffffffffff, .error_min = FLT_MAX, }, { .func = interpolation_cubic_t_from_x_cardano_f32, .name = "cardano f32", .is_iterative = 0, .cycles_min = 0xffffffffffffffff, .error_min = FLT_MAX, }, { .func = interpolation_cubic_t_from_x_cardano_f64, .name = "cardano f64", .is_iterative = 0, .cycles_min = 0xffffffffffffffff, .error_min = FLT_MAX, }, }; u32 cp_iteration_count = 101; f32 cp_step = 0.01f; u32 guess_iteration_count = 101; f32 guess_step = 0.01f; u32 iteration_count = 101; f32 step = 0.01f; f32 epsilon = 0.00001f; f32 start = 0; f32 end = 1; for ( u32 stats_index = 0; stats_index < array_count( all_stats ); stats_index++ ) { stats_t* stats = all_stats + stats_index; u32 guess_count = guess_iteration_count; if ( !stats->is_iterative ) { guess_count = 1; } print( "# %s\n", stats->name ); for ( u32 cp_1_it = 0; cp_1_it < cp_iteration_count; cp_1_it++ ) { for ( u32 cp_2_it = 0; cp_2_it < cp_iteration_count; cp_2_it++ ) { f32 cp_1 = cp_1_it * cp_step; f32 cp_2 = cp_2_it * cp_step; for ( u32 guess_i = 0; guess_i < guess_count; guess_i++ ) { f32 initial_guess = guess_i * guess_step; for ( u32 i = 0; i < iteration_count; i++ ) { f32 x = step * i; u32 it = 0; u64 cycles = 0; u32 count = 0; f32 t = 0; u64 cycle_start = __rdtsc( ); t = stats->func( start, cp_1, cp_2, end, x, initial_guess, epsilon * 0.75f, &it ); u64 cycle_end = __rdtsc( ); cycles = cycle_end - cycle_start; f32 x_check = interpolation_cubic( start, cp_1, cp_2, end, t ); f32 error = fabsf( x - x_check ); stats->hits++; stats->cycles_total += cycles; u32 found_range = 0; for ( u32 r = 0; r < 101; r++ ) { u32 range_start = r * 100; u32 range_end = ( r + 1 ) * 100; if ( cycles >= range_start && cycles < range_end ) { stats->cycle_ranges[ r ]++; found_range = 1; break; } } if ( !found_range ) { stats->cycle_ranges[ 100 ]++; } if ( cycles < stats->cycles_min ) { stats->cycles_min = cycles; } if ( cycles > stats->cycles_max ) { stats->cycles_max = cycles; } stats->iterations[ it ]++; if ( error < stats->error_min ) { stats->error_min = error; } if ( error > stats->error_max ) { stats->error_max = error; } if ( error > epsilon ) { stats->above_epsilon++; } if ( it >= 100 ) { print( "failed: t: %f | cp_1: %f | cp_2: %f | x: %f | guess: %f | it: %d | iteration >= 100\n", t, cp_1, cp_2, x, initial_guess, it ); } else if ( error > epsilon ) { print( "failed: t: %f | cp_1: %f | cp_2: %f | x: %f | guess: %f | it: %d | ", t, cp_1, cp_2, x, initial_guess, it ); if ( t == -1 ) { print( "f' = 0\n" ); } else if ( t == -2 ) { print( "oscillate between 2 values\n" ); } else { print( "error > epsilon : %f > %f : %e > %e\n", error, epsilon, error, epsilon ); } } } } } } for ( u32 i = 0; i < 100; i++ ) { if ( !stats->cycle_ranges[ i ] ) { continue; } print( " cycles %d -> %d: %lld\n", i * 100, ( i + 1 ) * 100, stats->cycle_ranges[ i ] ); } print( " cycles > 10000: %lld\n", stats->cycle_ranges[ 100 ] ); print( " cycles: %lld | average: %lld | min: %lld | max: %lld\n", stats->cycles_total, stats->cycles_total / stats->hits, stats->cycles_min, stats->cycles_max ); u64 iteration = 0; u64 iteration_min = 0xffffffffffffffff; u64 iteration_max = 0; for ( u32 i = 0; i < 100; i++ ) { if ( stats->iterations[ i ] == 0 ) { continue; } if ( i < iteration_min ) { iteration_min = i; } if ( i > iteration_max ) { iteration_max = i; } iteration += ( stats->iterations[ i ] * i ); print( " iterations[ %d ]: %lld\n", i, stats->iterations[ i ] ); } u32 iteration_average = cast( u32, round( cast( f64, iteration ) / cast( f64, stats->hits ) ) ); print( " iteration min: %lld | iteration max: %lld | iteration average: %d\n", iteration_min, iteration_max, iteration_average ); print( " error min: %f | max: %f | above epsilon: %lld\n\n", stats->error_min, stats->error_max, stats->above_epsilon ); } FILE* file = fopen( "stats.txt", "wb" ); fwrite( out, pos, 1, file ); fclose( file ); return 0; }