Several years ago I wrote a ray-casting dungeon crawler completely from scratch in Odin. It does everything, including rendering (software texturing with bilinear filtering, mip-mapping and sampling), manually on the CPU (a VERY old Ryzen 1.2) on a single thread with straightforward code (no SIMD).
I recently brought it back to life with a more recent version of Odin, and it runs about the same as it did: a little over 30fps at less than 640x480.
More recently I decided to try back-porting it to C++ using my SlimEngine base project, with the aim of later putting the same rendering code on CUDA, like I did with SlimTracin.
After some simplification and optimization, at the same resolution and on the same CPU, still single-threaded and unvectorized, it now runs at over 200fps (and over 300fps on my work laptop).
I then also got it working on CUDA (RTX 4070) on the same computer, where it now runs at over 600fps (and at 5K it runs at over 60fps). The same code can be toggled between CPU and GPU at runtime (just like SlimTracin XPU).
I'm still doing the actual 2D ray-casting and generating the vertical and horizontal rendering data on the CPU, and that part is resolution-dependent.
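
For anyone wondering what the 2D ray-casting step looks like in principle, here is a minimal C++ sketch of a grid walk (DDA), which is a common way to do it. It is an illustration only and not the project's actual code: `GridMap`, `RayHit` and `castRay` are made-up names, and the map layout (row-major bytes, non-zero means wall) is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical map layout: row-major grid of bytes, non-zero = wall.
struct GridMap {
    int width;
    int height;
    const std::uint8_t* cells;

    bool isWall(int x, int y) const {
        // Anything outside the map counts as solid, so every ray terminates.
        if (x < 0 || y < 0 || x >= width || y >= height) return true;
        return cells[y * width + x] != 0;
    }
};

struct RayHit {
    float distance;        // distance along the ray, in units of the direction's length
    int cellX, cellY;      // the wall cell that was hit
    bool hitVerticalSide;  // true if the ray entered the cell across an x grid line
};

// Walks the grid one cell boundary at a time from (ox, oy) along (dx, dy).
// The direction is assumed to be non-zero.
RayHit castRay(const GridMap& map, float ox, float oy, float dx, float dy) {
    const float inf = std::numeric_limits<float>::infinity();
    int cx = static_cast<int>(std::floor(ox));
    int cy = static_cast<int>(std::floor(oy));

    // Ray distance between two consecutive x (or y) grid lines.
    const float deltaX = (dx != 0.0f) ? std::fabs(1.0f / dx) : inf;
    const float deltaY = (dy != 0.0f) ? std::fabs(1.0f / dy) : inf;
    const int stepX = (dx < 0.0f) ? -1 : 1;
    const int stepY = (dy < 0.0f) ? -1 : 1;

    // Ray distance from the origin to the first x (or y) grid line.
    float sideX = inf;
    float sideY = inf;
    if (dx != 0.0f) sideX = ((dx < 0.0f) ? (ox - cx) : (cx + 1.0f - ox)) * deltaX;
    if (dy != 0.0f) sideY = ((dy < 0.0f) ? (oy - cy) : (cy + 1.0f - oy)) * deltaY;

    RayHit hit{0.0f, cx, cy, false};
    if (map.isWall(cx, cy)) return hit;  // the ray starts inside a wall

    for (;;) {
        if (sideX < sideY) {
            hit.distance = sideX;  // crossing the next x grid line
            sideX += deltaX;
            cx += stepX;
            hit.hitVerticalSide = true;
        } else {
            hit.distance = sideY;  // crossing the next y grid line
            sideY += deltaY;
            cy += stepY;
            hit.hitVerticalSide = false;
        }
        if (map.isWall(cx, cy)) {
            hit.cellX = cx;
            hit.cellY = cy;
            return hit;
        }
    }
}
```

In a setup like this you would typically call `castRay` once per screen column, which is one reason the CPU-side work grows with resolution.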
The GPU (if/when used) is mostly just doing the software texture sampling and shading, so there's still room to improve, but pretty happy with the result so far 🙂
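
As a rough idea of what the software texture sampling involves, here is a minimal C++ sketch of a bilinear texture lookup. Again, this is an illustration under my own assumptions and not the code from the project: the `Texture` struct, the single float channel, the clamp-to-edge addressing and the half-texel offset convention are all hypothetical choices.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical single-channel texture, stored row-major.
struct Texture {
    int width;
    int height;
    std::vector<float> texels;

    // Clamp-to-edge addressing (an assumption; wrapping is equally common).
    float at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[y * width + x];
    }
};

// Bilinear lookup at normalized coordinates (u, v) in [0, 1].
// Texel centers are assumed to sit at half-integer positions.
float sampleBilinear(const Texture& tex, float u, float v) {
    const float x = u * tex.width - 0.5f;
    const float y = v * tex.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;  // horizontal blend weight in [0, 1)
    const float fy = y - y0;  // vertical blend weight in [0, 1)

    // Blend the two texels of each row, then blend the two rows.
    const float top    = tex.at(x0, y0)     * (1.0f - fx) + tex.at(x0 + 1, y0)     * fx;
    const float bottom = tex.at(x0, y0 + 1) * (1.0f - fx) + tex.at(x0 + 1, y0 + 1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

Mip-mapping would add a step before this: picking a smaller, pre-filtered copy of the texture based on how far away the surface is, then sampling that copy the same way.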