I am making an old-school RPG while learning software rendering. Performance is not the problem: I get over 100 FPS on a single core, yet the game still feels laggy. I assume the reason is that the game's refresh rate is not synced with the monitor's 60 Hz refresh rate.
Is there any way to do this with Win32 and GDI? I have zero experience with OpenGL/DirectX.
You can try calling DwmFlush after presenting your image to get a rough "wait for vsync". I have never used it myself, so I can't promise it will solve your problem. Note that on Windows 7, I believe this function does not wait if desktop composition is turned off.
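Roughly, the present step could look like the sketch below. This assumes a GDI back buffer; `present_frame` is a made-up name, and the BitBlt is left as a comment because the device contexts depend on your setup. The Win32-only parts are guarded so the sketch compiles elsewhere as a no-op.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <dwmapi.h>            // DwmFlush; link against dwmapi.lib
#pragma comment(lib, "dwmapi.lib")
#endif

// Hypothetical present step: blit the software back buffer to the window,
// then block until the next composition pass (roughly the next vsync).
void present_frame(/* HDC windowDC, HDC backbufferDC, int w, int h */) {
#ifdef _WIN32
    // BitBlt(windowDC, 0, 0, w, h, backbufferDC, 0, 0, SRCCOPY);

    // DwmFlush waits for the compositor. It can fail (e.g. composition
    // disabled on Windows 7), so check the HRESULT and fall back to
    // timer-based pacing in that case.
    if (FAILED(DwmFlush())) {
        // Fallback: Sleep()-based pacing at ~16 ms per frame.
    }
#endif
}
```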
Not V-syncing usually makes games feel *less* laggy, because partial frames respond faster; the cost is visible tearing. Checking the following can quickly show whether something else is going on:
* Peak delays that affect the maximum frame time more than the average. (Record the maximum frame time over each second to detect this.) Windows can cause major lag, freezing the whole system for seconds, if heap memory is fragmented instead of reused. Windows 10 might also borrow all your CPU cores to run its cloud services or to scan your files for applications to remove.
* Software rendering has less fixed overhead than GPU rendering, so some frames finish much faster than others. This causes jerky motion, and tearing becomes more visible during sudden jumps; a frame shown for almost no time might as well not exist. (You can run a scheduler that fills spare time with non-urgent AI tasks until a minimum frame delay is reached, which gives smoother animation.)
* Drawing other things in the background. (Windows usually has a bug/feature where borderless full-screen is slower than exclusive mode, even though the two should be equivalent.)
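To detect the peak-delay case from the first point, a minimal per-second frame-time tracker might look like this (the struct and method names are illustrative):

```cpp
#include <algorithm>
#include <cstdio>

// Tracks the worst frame time over a one-second window. If the maximum is
// far above the average, the lag comes from occasional spikes rather than
// from overall throughput.
struct FrameStats {
    double window_elapsed = 0.0;  // seconds accumulated in current window
    double max_frame_time = 0.0;  // worst frame time seen in the window
    int    frames         = 0;

    // Feed the duration of each frame (seconds). Returns the worst frame
    // time when a one-second window closes, or a negative value otherwise.
    double add_frame(double dt) {
        window_elapsed += dt;
        max_frame_time = std::max(max_frame_time, dt);
        ++frames;
        if (window_elapsed < 1.0)
            return -1.0;
        double worst = max_frame_time;
        std::printf("avg %.1f ms, worst %.1f ms\n",
                    1000.0 * window_elapsed / frames, 1000.0 * worst);
        window_elapsed = 0.0;
        max_frame_time = 0.0;
        frames = 0;
        return worst;
    }
};
```

Call `add_frame` once per frame with the measured delta time; whenever it returns a non-negative value, compare it against the average it printed.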
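The minimum-frame-delay idea from the second point can be sketched with a simple pacer. `pace_frame` is a made-up name; in a real game the spare time would go to non-urgent work (AI, asset streaming) instead of sleeping:

```cpp
#include <chrono>
#include <thread>

// Blocks until at least `min_frame_seconds` have passed since `frame_start`,
// so no frame is displayed for almost no time. Returns the actual frame time.
double pace_frame(std::chrono::steady_clock::time_point frame_start,
                  double min_frame_seconds) {
    using clock = std::chrono::steady_clock;
    for (;;) {
        std::chrono::duration<double> elapsed = clock::now() - frame_start;
        if (elapsed.count() >= min_frame_seconds)
            return elapsed.count();
        // A scheduler would run a non-urgent task here instead of sleeping.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```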
You might also want to render on a thread separate from the canvas upload, using double buffering (render into one frame while uploading the other). With that one reusable optimization you can get around 120 FPS, because the canvas upload becomes essentially free (to the point that there's little reason to switch to Direct3D). It works because most applications cannot use all cores at the beginning of a frame anyway: that part is mostly serial dependencies waiting on the instruction cache.
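One way to structure that handoff is sketched below, using std::thread and a two-slot buffer. All the names are made up, and the "upload" is a stand-in for the real BitBlt/SetDIBitsToDevice of the finished frame:

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Two frame buffers: the render thread fills one while the main thread
// uploads the other, so the upload cost overlaps with rendering.
struct DoubleBuffer {
    std::vector<uint32_t> buf[2];
    int  ready = -1;          // index of a frame ready to upload, or -1
    bool done  = false;       // renderer has produced its last frame
    std::mutex m;
    std::condition_variable cv;

    explicit DoubleBuffer(size_t pixels) {
        buf[0].resize(pixels);
        buf[1].resize(pixels);
    }
};

// Render thread: alternate between the two buffers, publishing each frame.
void render_thread(DoubleBuffer& db, int frame_count) {
    int write = 0;
    for (int f = 0; f < frame_count; ++f) {
        // "Render": fill the buffer with a frame-dependent color.
        for (uint32_t& px : db.buf[write]) px = 0xFF000000u | (uint32_t)f;
        {
            std::unique_lock<std::mutex> lock(db.m);
            // Wait until the previously published frame was consumed.
            db.cv.wait(lock, [&] { return db.ready == -1; });
            db.ready = write;
        }
        db.cv.notify_one();
        write = 1 - write;   // render the next frame into the other buffer
    }
    {
        std::lock_guard<std::mutex> lock(db.m);
        db.done = true;
    }
    db.cv.notify_one();
}

// Main thread: consume published frames; returns how many were uploaded.
int upload_loop(DoubleBuffer& db) {
    int uploaded = 0;
    for (;;) {
        int idx;
        {
            std::unique_lock<std::mutex> lock(db.m);
            db.cv.wait(lock, [&] { return db.ready != -1 || db.done; });
            if (db.ready == -1 && db.done) return uploaded;
            idx = db.ready;
        }
        // "Upload": in the real game, BitBlt/SetDIBitsToDevice of buf[idx].
        volatile uint32_t sink = db.buf[idx][0]; (void)sink;
        ++uploaded;
        {
            std::lock_guard<std::mutex> lock(db.m);
            db.ready = -1;   // hand the buffer back to the renderer
        }
        db.cv.notify_one();
    }
}
```

The renderer never writes a buffer the uploader is still reading, because it must observe `ready == -1` (upload finished) before it can publish the other buffer and move on.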