@nickav: I would not rely much on the "Batch, Batch, Batch" presentation. It is close to two decades old, and GPU drivers were quite different back then.
Here's a somewhat more modern talk: https://developer.nvidia.com/cont...adically-reduce-driver-overhead-0
(still over 5y old, so...)
And obviously AZDO:
@Bronxolon: what you say is generally true for the Vulkan and D3D12 APIs. There the driver overhead is minimal, and most of the time the driver just pushes commands into a buffer. GL/D3D11-style APIs, on the other hand, have much larger overhead because of how the API is abstracted from the hardware. Every time the GL driver encounters a draw command, it needs to figure out how to build the command stream for the GPU. There is a lot of state to take into account: not only vertex attribute pointers, but also blend state, rasterizer state, the bound shader (which may need to be recompiled), the uniforms and attributes in use, etc. A ton of validation work happens before anything reaches the actual command buffer. That is why modern GPU drivers have a separate thread for doing these things. Most of the time, raw GL API calls only talk to this thread, where all the submission work happens.
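
To make the difference a bit more concrete, here's a toy Python model of the two styles. This is only a sketch: it is not how any real driver is implemented, and every name in it (`ToyGLDriver`, `ToyExplicitDriver`, `run_gl`, `run_explicit`, the "validation" counters) is made up for illustration. It just counts how much state-resolving work lands at draw time when state is set piecemeal (GL/D3D11 style) versus when it is baked into a pipeline object up front (Vulkan/D3D12 style).

```python
class ToyGLDriver:
    """Toy GL/D3D11-style driver: state is set piecemeal and resolved at draw time."""

    def __init__(self):
        self.state = {}
        self.dirty = set()
        self.command_buffer = []
        self.draw_time_validations = 0

    def set_state(self, key, value):
        # Setting state is cheap here: just remember what changed.
        if self.state.get(key) != value:
            self.state[key] = value
            self.dirty.add(key)

    def draw(self, vertex_count):
        # Every draw first has to resolve whatever changed since the last draw.
        for key in sorted(self.dirty):
            self.draw_time_validations += 1
            self.command_buffer.append(("set", key, self.state[key]))
        self.dirty.clear()
        self.command_buffer.append(("draw", vertex_count))


class ToyExplicitDriver:
    """Toy Vulkan/D3D12-style recorder: state is validated once, at pipeline creation."""

    def __init__(self):
        self.command_buffer = []
        self.creation_validations = 0
        self.draw_time_validations = 0  # never incremented in this model

    def create_pipeline(self, **settings):
        # All state checking is paid here, once per pipeline.
        self.creation_validations += len(settings)
        return tuple(sorted(settings.items()))

    def bind_pipeline(self, pipeline):
        self.command_buffer.append(("bind", pipeline))

    def draw(self, vertex_count):
        # Recording a draw just pushes a command into the buffer.
        self.command_buffer.append(("draw", vertex_count))


def run_gl(frames):
    gl = ToyGLDriver()
    gl.set_state("shader", "lit")
    for _ in range(frames):
        gl.set_state("blend", "opaque")
        gl.draw(36)
        gl.set_state("blend", "alpha")
        gl.draw(36)
    return gl


def run_explicit(frames):
    ex = ToyExplicitDriver()
    opaque = ex.create_pipeline(shader="lit", blend="opaque")
    alpha = ex.create_pipeline(shader="lit", blend="alpha")
    for _ in range(frames):
        ex.bind_pipeline(opaque)
        ex.draw(36)
        ex.bind_pipeline(alpha)
        ex.draw(36)
    return ex
```

In this model the GL-style draw-time counter keeps growing with the number of frames, while the explicit-style one stays at zero because the cost was paid once in `create_pipeline`. Real drivers are far more sophisticated than this, so treat it as an illustration of where the work happens, not as a measurement.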