Worked more on optimizing my 2D renderer (which renders polygon outlines end-to-end, based on the ravg.pdf paper). Noticed that benchmarking is hard. Interestingly, while I am recording with ShareX, it takes only about half the "normal" time to generate a frame. Here it takes around 400 usec to generate the geometry and specialize it to tiles of 64x64 pixels, and about 80 usec to push the tiles to OpenGL. When syncing with glFinish() at the end, the OpenGL part takes around 300 usec. I am probably not measuring anything interesting here, since the GPU work is dominated by various latencies. Something similar could be true for the CPU part (geometry), since those times are probably subject to CPU frequency scaling, for example.
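One way to make noisy CPU timings like these easier to interpret is to repeat the workload many times after a short warm-up and look at the minimum and median instead of a single sample. Below is a minimal sketch in Python, not the renderer's actual code: `fake_geometry_pass` is a hypothetical stand-in for the geometry and tiling step, and the warm-up and run counts are arbitrary assumptions. It only covers CPU-side timing; GPU work would still need explicit synchronization such as glFinish() to be measured at all.

```python
import statistics
import time


def time_workload(workload, warmup=5, runs=50):
    """Return per-run durations in microseconds for a zero-argument callable."""
    for _ in range(warmup):
        workload()  # unmeasured runs so caches and clock speed can settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        workload()
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


def summarize(samples):
    """Min and median are less sensitive to occasional slow outliers than the mean."""
    return {
        "min_us": min(samples),
        "median_us": statistics.median(samples),
        "max_us": max(samples),
    }


def fake_geometry_pass():
    # Hypothetical placeholder for "generate geometry and specialize to tiles".
    total = 0
    for i in range(2000):
        total += i * i
    return total


if __name__ == "__main__":
    print(summarize(time_workload(fake_geometry_pass)))
```

A large gap between the minimum and the median would hint that something external (frequency scaling, scheduling, a screen recorder) is influencing the numbers more than the code itself.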
If I run both workloads 20 times per frame, both measurements increase by 10x-15x, and it still runs comfortably at 60 FPS. More complicated geometry is definitely needed now to make any serious evaluation.
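A rough sanity check of the "comfortably at 60 FPS" claim, as a small Python sketch. It assumes the two measurements are the 400 usec geometry step and the 300 usec OpenGL part (taken to already include the 80 usec push), and that they run back to back without overlapping. Both are assumptions, not something measured.

```python
FRAME_BUDGET_US = 1_000_000 / 60  # about 16667 usec per frame at 60 FPS

GEOMETRY_US = 400  # geometry + tiling, from the measurement above
OPENGL_US = 300    # OpenGL part with glFinish(); assumed to include the push


def budget_fraction(scale):
    """Fraction of a 60 FPS frame used if both measurements grow by `scale`."""
    return scale * (GEOMETRY_US + OPENGL_US) / FRAME_BUDGET_US


if __name__ == "__main__":
    for scale in (1, 10, 15):
        print(f"{scale:>2}x: {budget_fraction(scale):.0%} of the frame budget")
```

Under these assumptions the 10x to 15x range works out to about 42% to 63% of the frame budget, which is consistent with still hitting 60 FPS.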