251 posts
How to add the high 64-bit integer to the low 64-bit integer of a __m128 register?

I'm pretty new to Intel intrinsics. Is there an intrinsic for doing operations between the lanes of a single SIMD register?

Mārtiņš Možeiko
2583 posts / 2 projects

Shuffle the top 64 bits down into the low 64 bits, then add the result to the original register.

251 posts
Replying to mmozeiko (#29657)

So does the shuffle happen on another register, or the same register? What instruction can do that?

Mārtiņš Možeiko
2583 posts / 2 projects

Use the _mm_shuffle_epi32 intrinsic, which maps to the pshufd instruction.

If you use intrinsics, you don't need to worry about exactly which registers are used for the calculation; the compiler allocates them for you.

See example code here: https://godbolt.org/z/T4YrMbGKf
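
For readers who cannot open the link, here is a minimal sketch of the shuffle-then-add approach described above. It is not necessarily the code behind the link; the helper names add_high_to_low and add_high_to_low_example are made up for illustration, and SSE2 support is assumed.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_shuffle_epi32, _mm_add_epi64 */
#include <stdint.h>

/* Hypothetical helper: returns (high 64-bit lane + low 64-bit lane) of v. */
int64_t add_high_to_low(__m128i v)
{
    /* Copy 32-bit lanes 2 and 3 (the high 64 bits) into lanes 0 and 1. */
    __m128i high = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
    /* Lane-wise 64-bit add: the low lane of sum is now low + high. */
    __m128i sum = _mm_add_epi64(v, high);
    /* Store only the low 64 bits of sum. */
    int64_t result;
    _mm_storel_epi64((__m128i *)&result, sum);
    return result;
}

/* Usage: high lane 40, low lane 2. */
void add_high_to_low_example(void)
{
    __m128i v = _mm_set_epi64x(40, 2); /* arguments are (high, low) */
    assert(add_high_to_low(v) == 42);
}
```

The shuffled value lives in its own variable here, but which physical registers end up holding v, high and sum is up to the compiler.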

251 posts
Replying to mmozeiko (#29659)

What does _MM_SHUFFLE do? The Intel Intrinsics Guide says it provides the control bits, but what does (3,2,3,2) mean?

When I asked about register usage, I meant: is there any way to use only the original register, without any additional ones (presumably with a single instruction, something like _mm_merge_epi64(myRegister))? In this case, you still need an additional shuffle whose result has to be stored in another register first.

Mārtiņš Možeiko
2583 posts / 2 projects
Replying to longtran2904 (#29660)

_MM_SHUFFLE(a,b,c,d) is a macro that takes four arguments, each a 2-bit value, and computes (a<<6)|(b<<4)|(c<<2)|d.
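
To make the arithmetic concrete, here is a small sketch that mirrors the formula above in a plain function. The name shuffle_imm is hypothetical; the real _MM_SHUFFLE is a macro, so it can produce the compile-time constant that the shuffle intrinsics need.

```c
#include <assert.h>

/* Hypothetical stand-in for _MM_SHUFFLE: packs four 2-bit fields into one byte. */
unsigned shuffle_imm(unsigned a, unsigned b, unsigned c, unsigned d)
{
    /* Each argument must fit in 2 bits. */
    assert(a < 4 && b < 4 && c < 4 && d < 4);
    return (a << 6) | (b << 4) | (c << 2) | d;
}
```

For (3,2,3,2) this gives 192 | 32 | 12 | 2 = 238, which is 0xEE, or 11 10 11 10 in binary: one 2-bit field per argument, with a in the top bits and d in the bottom bits.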

What these bits mean depends on the instruction you use them with; the Intel Intrinsics Guide describes it for each one. For _mm_shuffle_epi32 the result is:

dst[0] = src[d]
dst[1] = src[c]
dst[2] = src[b]
dst[3] = src[a]

Here src and dst are the input and output 128-bit registers, and the array syntax indexes their 32-bit lanes. In your case you want the top 64 bits (src[2] and src[3]) moved to the low 64 bits (dst[0] and dst[1]), so d=2 and c=3. The values of a and b don't matter; they can be anything from 0 to 3.
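
The lane selection above can be sketched as a scalar loop and compared against the real intrinsic. The names shuffle_epi32_model and shuffle_model_example are hypothetical, and the model only illustrates the selection rule; it says nothing about how the hardware implements it.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_shuffle_epi32 */
#include <stdint.h>

/* Scalar model: the 2-bit field at bit position 2*i of imm picks the
   source lane that is copied into destination lane i. */
void shuffle_epi32_model(const uint32_t src[4], unsigned imm, uint32_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Usage: compare the model with the intrinsic for _MM_SHUFFLE(3,2,3,2). */
void shuffle_model_example(void)
{
    uint32_t src[4] = {10, 11, 12, 13};
    uint32_t model[4], real[4];

    shuffle_epi32_model(src, _MM_SHUFFLE(3, 2, 3, 2), model);

    __m128i v = _mm_loadu_si128((const __m128i *)src);
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
    _mm_storeu_si128((__m128i *)real, r);

    /* Both arrays hold {12, 13, 12, 13}: the high half copied into the low half. */
    for (int i = 0; i < 4; i++)
        assert(model[i] == real[i]);
}
```

With d=2 and c=3 the low two lanes receive src[2] and src[3]. In this sketch a and b are 3 and 2, so the upper lanes happen to keep their original values.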

There is no way to add the top 64 bits to the low 64 bits of an __m128i with a single instruction.