Default assignment operator of hmm_vec4 crashes?!

Hi!

I have a backbuffer rendering approach that fills a buffer with rendering primitives for the platform to render after a frame.
The "problem" is that I don't fully understand why I have to call memcpy for the following code to not crash.

  RenderCommandRect *rect = RenderBufferAlloc(renderCommandsBuffer, RenderCommandRect);
  rect->pos = pos * M2P_FACTOR;
  rect->size = size * M2P_FACTOR;
  #if 1
    // TODO(Hakan): why is this necessary?
    memcpy(&rect->color, &color, sizeof(hmm_vec4));
  #else
    // This throws 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
    rect->color = color;
  #endif
  rect->thickness = thickness;


I assume this is a problem because hmm_vec4 is stored using an SSE intrinsic type?
hmm_vec4 uses the default assignment operator, which to my understanding copies each member(?)

// Note this is somewhat simplified
typedef union hmm_vec4
{
  float Elements[4];
  __m128 InternalElementsSSE;
} hmm_vec4;



This is the generated assembly; it crashes on the last instruction.
The value of xmm0 is as expected, and the pointers in rax and rcx look valid as well.
So where does the alleged read from 0xFFFFFFFFFFFFFFFF occur? Could the debugger be reporting some sort of false positive?

    rect->color = color;
00007FF905448243  mov         rax,qword ptr [rect]  
00007FF905448248  mov         rcx,qword ptr [&color]  
00007FF905448250  movups      xmm0,xmmword ptr [rcx]  
00007FF905448253  movdqa      xmmword ptr [rax+10h],xmm0  




Edited by AntonHakansson on
Not sure about 0xFFFFFFFFFFFFFFFF (is your rect pointer valid?). Are you sure it happens at this place, and not inside some exception handler?

But most likely what happens is that the compiler sees that hmm_vec4 has a member of an SSE type, so it assumes it can load and store the value with aligned SSE instructions. The movdqa store in your disassembly is one of those: it faults if the destination address (rax+10h here) is not a multiple of 16. You need to guarantee that the memory is aligned. Basically, your RenderBufferAlloc should return a 16-byte aligned pointer.
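
A minimal sketch of what a 16-byte aligned allocation could look like, assuming the render buffer is a simple bump allocator. RenderBuffer, PushAligned and PushStruct are hypothetical names for illustration, not the actual RenderBufferAlloc from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator (assumption: the real render buffer may differ).
struct RenderBuffer
{
    std::uint8_t *base; // start of the backing memory
    std::size_t   size; // capacity in bytes
    std::size_t   used; // bytes consumed so far, including padding
};

// Returns a pointer aligned to `alignment` (must be a power of two),
// or nullptr if the buffer does not have enough room left.
inline void *PushAligned(RenderBuffer *buffer, std::size_t bytes, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    std::uintptr_t current = reinterpret_cast<std::uintptr_t>(buffer->base) + buffer->used;
    // Round the current address up to the next multiple of `alignment`.
    std::uintptr_t mask    = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (current + mask) & ~mask;
    std::size_t    padding = static_cast<std::size_t>(aligned - current);

    if (buffer->used + padding + bytes > buffer->size)
    {
        return nullptr;
    }

    buffer->used += padding + bytes;
    return reinterpret_cast<void *>(aligned);
}

// Hypothetical convenience macro: aligns to whatever the type itself requires,
// which is 16 for any struct that contains an __m128 member.
#define PushStruct(buffer, type) \
    static_cast<type *>(PushAligned((buffer), sizeof(type), alignof(type)))
```

Using alignof(type) instead of a hard-coded 16 means the allocator keeps working if a command struct ever needs a different alignment.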

Edited by Mārtiņš Možeiko on
mmozeiko
You need to guarantee that memory is aligned.


mmozeiko thank you! The alignment turned out to be the problem, as you suggested. I've never worked with SSE instructions before, so I was kinda set up to fail haha.

For anyone interested: I made RenderBufferAlloc return 16-byte aligned pointers and also made the color field occupy the first 16 bytes of the RenderCommandRect structure, so that the hmm_vec4 is guaranteed to be 16-byte aligned. Of course this is only a temporary solution, and I should probably continue watching Handmade Hero past the first 50 days haha. I noticed at a quick glance that Casey seems to be using a similar approach with a push buffer renderer, but probably a lot better.
You don't need to put color as the first member. The compiler knows the alignment of each member in the structure, so it will insert padding between members to maintain alignment.

Example:
struct Foo
{
  char     A;
  int      B;
  hmm_vec4 C;
};

A will be at offset 0.
B will be at offset 4, with 3 bytes of padding before it.
C will start at offset 16, with 8 bytes of padding before it (B ends at offset 8).
Total structure size will be 32 bytes.

Here's a godbolt link that illustrates this: https://godbolt.org/z/hRD8wu
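
The layout described above can also be checked at compile time. This is only a sketch: Vec4 is a hypothetical stand-in for hmm_vec4 that uses alignas(16) instead of a real __m128 member, so it builds without the SSE headers, and it assumes a 4-byte int with 4-byte alignment, as on mainstream x64 compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for hmm_vec4 (assumption): same size and alignment as a union
// containing an __m128, but without needing <xmmintrin.h>.
struct alignas(16) Vec4
{
    float Elements[4];
};

struct Foo
{
    char A;
    int  B;
    Vec4 C;
};

// The compiler inserts the padding; these only verify where things ended up.
static_assert(offsetof(Foo, A) == 0,  "A starts the struct");
static_assert(offsetof(Foo, B) == 4,  "3 bytes of padding before B");
static_assert(offsetof(Foo, C) == 16, "8 bytes of padding before C");
static_assert(sizeof(Foo) == 32,      "total size");
static_assert(alignof(Foo) == 16,     "struct inherits the strictest member alignment");

// Runtime helper: true if p is a multiple of `alignment` (a power of two).
inline bool IsAligned(const void *p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// A Foo placed by the compiler (here on the stack) always has an aligned C.
inline void CheckFooAtRuntime()
{
    Foo foo = {};
    assert(IsAligned(&foo.C, alignof(Vec4)));
}
```

Note that these guarantees only hold when the Foo itself starts at a 16-byte aligned address, which is why the allocator still has to return aligned pointers.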