I have been looking into batching in OpenGL because I constantly hear that batching draw calls is a big win, but I don't quite understand how it helps. What is the actual overhead per draw call, given that the GPU ultimately renders the same number of vertices either way?
For example, assuming no state changes between draw calls, why would I want to spend CPU time transforming vertices and updating a dynamic VBO instead of just issuing multiple draw calls? I hear that for small meshes, transforming the vertices on the CPU is worth it for the savings, but what exactly produces the CPU overhead in a draw call?
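To make the comparison concrete, here is a minimal sketch of what I understand the batched path to be. The types (`Vec2`, `Transform2D`, `Sprite`) and the function `buildBatch` are hypothetical names I made up for illustration, and I'm assuming 2D affine transforms to keep it short. The OpenGL calls appear only in comments, since they need a live context and a bound buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; not from any library.
struct Vec2 { float x, y; };

// 2D affine transform, laid out as the matrix [a b tx; c d ty].
struct Transform2D {
    float a, b, c, d, tx, ty;
    Vec2 apply(Vec2 v) const {
        return { a * v.x + b * v.y + tx, c * v.x + d * v.y + ty };
    }
};

struct Sprite {
    std::vector<Vec2> localVertices; // small mesh in object space
    Transform2D world;               // object-to-world transform
};

// Unbatched path (comments only): one uniform update and one draw per sprite.
//   for each sprite:
//       glUniformMatrix3fv(location, 1, GL_FALSE, matrixData);
//       glDrawArrays(GL_TRIANGLES, first, count);

// Batched path: transform every vertex on the CPU and append the results
// into one contiguous array that can be uploaded and drawn in a single call.
std::vector<Vec2> buildBatch(const std::vector<Sprite>& sprites) {
    std::size_t total = 0;
    for (const Sprite& s : sprites) total += s.localVertices.size();

    std::vector<Vec2> batch;
    batch.reserve(total);
    for (const Sprite& s : sprites)
        for (Vec2 v : s.localVertices)
            batch.push_back(s.world.apply(v));

    assert(batch.size() == total);
    return batch;
    // Afterwards, with the dynamic VBO bound:
    //   glBufferSubData(GL_ARRAY_BUFFER, 0, batch.size() * sizeof(Vec2), batch.data());
    //   glDrawArrays(GL_TRIANGLES, 0, static_cast<GLsizei>(batch.size()));
}
```

So the batched version pays one multiply-add per vertex on the CPU plus one buffer upload, in exchange for replacing N draw calls with one. My question is what makes that trade worthwhile.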
I also read that the GPU renders so fast that if the CPU can't supply data quickly enough, we end up completely CPU-bottlenecked. Once again, I don't see how batching solves this: sure, each submission gives the GPU more data to churn through, but we also take longer to submit it. So the only thing I can assume is that some sort of driver overhead is involved. My current guess is that many draw calls fill up the user-mode command push buffer faster, so the driver has to incur context switches, maybe to copy it over to the kernel-mode command buffer, multiple times per frame? I honestly have no clue, even after reading multiple online resources.
Thanks. I would really love it if someone could help elucidate this for me!