248 posts
How to add the high 64-bit integer to the low 64-bit integer of a __m128 register?

I'm pretty new to the Intel intrinsics, so is there any intrinsic for doing ops between lanes of a SIMD register?

Mārtiņš Možeiko
2583 posts / 2 projects

Shuffle the top 64 bits down into the low 64 bits, then add the result to the original register.

248 posts
Replying to mmozeiko (#29657)

So does the shuffle happen on another register, or the same register? What instruction can do that?

Mārtiņš Možeiko
2583 posts / 2 projects

_mm_shuffle_epi32 intrinsic, pshufd instruction.

If you use intrinsics, you don't need to worry about exactly which registers are used for the calculation - the compiler allocates them for you.

See example code here: https://godbolt.org/z/T4YrMbGKf
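For readers who don't follow the link, here is a minimal sketch of the shuffle-then-add idea. It is not necessarily identical to the godbolt code. It assumes SSE2 is available, and the function names add_high_to_low_epi64, hsum_epi64 and example are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  // SSE2: _mm_shuffle_epi32, _mm_add_epi64
#include <stdint.h>

// Returns a vector whose low 64-bit lane holds (low + high) of v.
// The high lane ends up as (high + high) and is normally ignored.
static __m128i add_high_to_low_epi64(__m128i v)
{
    // Copy 32-bit lanes 2 and 3 (the high 64 bits) into lanes 0 and 1.
    __m128i hi = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
    return _mm_add_epi64(v, hi);
}

// Hypothetical convenience wrapper: extracts the low 64-bit lane as a scalar.
static int64_t hsum_epi64(__m128i v)
{
    int64_t out;
    _mm_storel_epi64((__m128i *)&out, add_high_to_low_epi64(v));
    return out;
}

// Usage: _mm_set_epi64x takes the high lane first, so v = {high = 40, low = 2}.
static void example(void)
{
    __m128i v = _mm_set_epi64x(40, 2);
    assert(hsum_epi64(v) == 42);
}
```

The shuffle result lives in a temporary (hi), but which physical register that becomes is up to the compiler.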

248 posts
Replying to mmozeiko (#29659)

What is the _MM_SHUFFLE doing? The intel intrinsic guide says it is the control bit, but what does (3,2,3,2) mean?

When I asked about register usage, I meant is there any way to only use the original register without any additional ones (presumably with only a single instruction, something like: _mm_merge_epi64(myRegister))? In this case, you still need an additional shuffle and store that in another register first.

Mārtiņš Možeiko
2583 posts / 2 projects
Replying to longtran2904 (#29660)

_MM_SHUFFLE(a,b,c,d) is a macro that takes 4 arguments, each a 2-bit value, and calculates (a<<6)|(b<<4)|(c<<2)|d.
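As a quick sanity check of that formula, the sketch below compares the macro against a hand-written version. shuffle_imm and check_shuffle_macro are hypothetical helpers, not part of any header.

```c
#include <assert.h>
#include <xmmintrin.h>  // defines _MM_SHUFFLE

// Hand-written equivalent: four 2-bit fields, packed from most to least significant.
static int shuffle_imm(int a, int b, int c, int d)
{
    return (a << 6) | (b << 4) | (c << 2) | d;
}

static void check_shuffle_macro(void)
{
    assert(_MM_SHUFFLE(3, 2, 3, 2) == shuffle_imm(3, 2, 3, 2));
    assert(_MM_SHUFFLE(3, 2, 3, 2) == 0xEE);  // binary 11 10 11 10
    assert(_MM_SHUFFLE(0, 1, 2, 3) == 0x1B);  // binary 00 01 10 11
}
```

So (3,2,3,2) is just a readable way of writing the immediate byte 0xEE.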

What these bits mean depends on the instruction you use them with; the Intel Intrinsics Guide describes it for each one. For _mm_shuffle_epi32 it calculates the following:

dst[0] = src[d]
dst[1] = src[c]
dst[2] = src[b]
dst[3] = src[a]

Here src and dst are the input and output 128-bit registers, and the array syntax indexes their 32-bit lanes. In your case you want the top 64 bits (src[2] and src[3]) moved to the low 64 bits (dst[0] and dst[1]), so d=2 and c=3. The values of a and b don't matter here, because only the low 64 bits of the result are used; they can be anything: 0, 1, 2 or 3.
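The lane selection can be checked against a scalar model. This is only a sketch: shuffle_epi32_model and check_model are hypothetical names, and the model simply reads the 2-bit field for each destination lane out of the immediate.

```c
#include <assert.h>
#include <emmintrin.h>  // SSE2
#include <stdint.h>
#include <string.h>

// Scalar model of _mm_shuffle_epi32: bits [2*i+1 : 2*i] of imm pick the source lane for dst[i].
static void shuffle_epi32_model(const int32_t src[4], int imm, int32_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

static void check_model(void)
{
    int32_t src[4] = {10, 11, 12, 13};
    int32_t expected[4], actual[4];

    shuffle_epi32_model(src, _MM_SHUFFLE(3, 2, 3, 2), expected);

    __m128i v = _mm_loadu_si128((const __m128i *)src);
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
    _mm_storeu_si128((__m128i *)actual, r);

    // The intrinsic and the model agree, and the low two lanes now hold src[2], src[3].
    assert(memcmp(expected, actual, sizeof actual) == 0);
    assert(actual[0] == 12 && actual[1] == 13);
}
```

Changing a and b, for example to _MM_SHUFFLE(0, 0, 3, 2), only changes dst[2] and dst[3]; the low 64 bits stay the same.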

There is no way to add the top 64 bits to the low 64 bits of an __m128i with a single instruction.
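The "move the high half down" step can be written with other SSE2 intrinsics, but it is still one instruction to move plus one to add. A sketch of two such variants follows; the helper names (low_lane, hsum_unpack, hsum_shift, check_variants) are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  // SSE2
#include <stdint.h>

// Reads the low 64-bit lane of v as a scalar.
static int64_t low_lane(__m128i v)
{
    int64_t out;
    _mm_storel_epi64((__m128i *)&out, v);
    return out;
}

// Variant 1: unpack the high halves of (v, v); both lanes of the temporary hold high.
static int64_t hsum_unpack(__m128i v)
{
    return low_lane(_mm_add_epi64(v, _mm_unpackhi_epi64(v, v)));
}

// Variant 2: shift the whole register right by 8 bytes; low lane = high, high lane = 0.
static int64_t hsum_shift(__m128i v)
{
    return low_lane(_mm_add_epi64(v, _mm_srli_si128(v, 8)));
}

static void check_variants(void)
{
    __m128i v = _mm_set_epi64x(40, 2);  // high = 40, low = 2
    assert(hsum_unpack(v) == 42);
    assert(hsum_shift(v) == 42);
}
```

Which variant the compiler turns into the best code depends on the target, so treat these as equivalent spellings rather than optimizations.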