glBufferData causing time spikes

I'm working on an OpenGL application and noticed a huge frame time (~160ms) that always happens at frame index 745.
After doing simple profiling, I know that it happens on a precise call of glBufferData. The application is doing the exact same thing during all the frames: rendering 2 rectangles, rendering 4 strings, and outputting sound. The time spike happens on the glBufferData call that sends vertex and uvs data for the 4th string (all strings go through the same code, but I store the profiler timing values for each one separated).

Does anyone has any idea of what could be causing this ?

Here is an overview of how the code works:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
while ( running ) {
    initCommandBuffer( );
    handleInput( );
    simulation( );
    outputAudio( );
    renderCommandBuffer( );
    swapBuffer( );
}

void initCommandBuffer( ){
    glDeleteBuffers( ); // with buffers used in the previous frame.
    // Set the number of commands and commands data to 0.
}

void renderCommandBuffer( ){
    for ( commands ) {
        glGenBuffers( );
        glBindBuffer( vertex );
        shaderSetup( );
        glBufferData( vertex, GL_STATIC_DRAW );
        glBindBuffer( index );
        glBufferData( index, GL_STATIC_DRAW );
        glEnable( GL_PRIMITIVE_RESTART );
        glDrawElements( GL_TRIANGLE_STRIP );
    }
}
I tried to use GL_DYNAMIC_DRAW, but since I'm deleting and recreating the buffers every frame, I don't think it matters (and it didn't change anything). Is it a problem to delete and recreate buffers every frames ?

Here are the profiling values. All the values bellow are from an unoptimized build launched in visual studio 2012 to clearly show the problem. If I run outside of visual studio or build with optimizations, the spike is still there but smaller (missing 1 frame instead of 10) and always on the same frame. If I remove the rendering of the 4th string, the spike happens at a latter frame.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// fi = frame index
// cyc = total cycles (__rdtsc difference) for the frame
// sim = cycles for the simulation
// ren = cycles for the rendering
// syn = cycles for vsync

// fi: 740 cyc: 46428064       sim: 21400025       ren: 1094274        syn: 23933106
// fi: 741 cyc: 46627764       sim: 22217575       ren: 1048855        syn: 23360148
// fi: 742 cyc: 46564577       sim: 22059713       ren: 958719         syn: 23545474
// fi: 743 cyc: 46048741       sim: 22540901       ren: 900595         syn: 22606731
// fi: 744 cyc: 46916673       sim: 22182977       ren: 949184         syn: 23783882
// fi: 745 cyc: 511899266      sim: 21164771       ren: 458690424      syn: 32043306

// Partial frame details from the rendering part
// sf : total cycles
// ss : cycles before sv
// sv : cycles for glBufferData for the vertex
// sb : cycles between sv and si
// si : cycles for glBufferData for the indices
// se : cycles after si
//
// string1
// fi: 744 sf: 32629          ss: 27189          sv: 714            sb: 2220           si: 226            se: 2063
// fi: 745 sf: 33385          ss: 27735          sv: 714            sb: 2210           si: 231            se: 2273
// string2
// fi: 744 sf: 325274         ss: 16548          sv: 17488          sb: 3869           si: 225356         se: 61793
// fi: 745 sf: 120346         ss: 16422          sv: 16091          sb: 2368           si: 26607          se: 58632
// string3
// fi: 744 sf: 112718         ss: 34508          sv: 24266          sb: 3376           si: 29558          se: 20716
// fi: 745 sf: 105267         ss: 32271          sv: 19798          sb: 16238          si: 16207          se: 20491
// string4
// fi: 744 sf: 98207          ss: 27195          sv: 29694          sb: 5975           si: 18443          se: 16648
// fi: 745 sf: 458054436      ss: 25746          sv: 457858831      sb: 11205          si: 15747          se: 142392

You should not delete buffer when you need to put new data in. DYNAMIC_DRAW probably is not helping because you are deleting buffers every frame, and GL driver every time sees new buffer and cannot make any intelligent decisions about it.
Simply call glBufferData again, and leave GL_DYNAMIC_DRAW. Basically take out glDeleteBuffers and glGenBuffers calls.
Don't create many buffers, try to create one or just a few big buffers and upload data to various parts of it with glBufferSubData. You can preallocate buffer memory by passing NULL for glBufferData. Then loop in renderCommandBuffer function won't need to bind new buffers or set up new vertex attributes for shader. To upload data to subbuffer more efficiently you'll need to use glMapBufferRange​+GL_MAP_INVALIDATE_BUFFER_BIT or glInvalidateBufferData functions.

Also look into various extensions that allow to buffer data to GPU more efficiently - ARB_buffer_storage.

Here is good information on streaming data to buffers:
https://www.opengl.org/wiki/Buffer_Object_Streaming
https://hacksoflife.blogspot.com/2012/04/beyond-glmapbuffer.html (be careful with this information, its 3y old)
https://hacksoflife.blogspot.com/...6/glmapbuffer-no-longer-cool.html

Edited by Mārtiņš Možeiko on
To add to the list of recommended reading on the subject:

http://www.seas.upenn.edu/~pcozzi...s-AsynchronousBufferTransfers.pdf
Thanks for the answers. I expected that the way I was creating/deleting buffers was not optimal, but didn't expected such huge gap in timing. More than 500 times slower seems like a lot. I will try to be smarter with how I use buffers.

I did a "not real test that doesn't mean anything": I created lots of buffers, bound them and uploaded data to them (never delete) and measure the time spent for the uploads. I did it with 1000 uploads of 256Ko and 1000 uploads of 1024Ko.
This test is not accurate or anything but there seems that there is a pattern of spikes every 16 Mo.

I then added a second loop that would write to the same buffers, but with glBufferSubData instead of glBufferData (writing the same amount of byte in the buffers). Most write were faster (like 10 times faster) but every 16 Mo the timing was not improved (and since other were faster, it a huge gap). But it is faster because it's the second time I'm writing in it. If I write a second time with glBufferData, the time is the same as with glBufferSubData. So next I let every test run several times.

And then I did 1000 glBufferSubData in 1 buffer (initialized with a glBufferData passing the total size and 0 as the data pointer): the same result as the test above, with huge spike every 16 Mo.
glBufferData_256k.txt The 3 tests are one after the other.

If I didn't do something stupid in those tests, it seems that even if I handle buffer differently, at some point I would get one long frame. I guess I have to try and see.

PS: I didn't read the pdf Jesse posted yet.
Welcome to figuring out opengl perf problems. The constant headache of game devs.
And the fun will start when you'll give your game/application to other people. You'll fix data buffering issue on your machine, but on different machines with different GPU's or driver version it might be not the most optimal way and will still produce lag... fun times!
I can't wait.
Recent discussions about OpenGL made me thing that I should mention here that I implemented buffer re-specification (buffer orphaning) a few months ago and it solved the problem.