f = f + 0x1p23f - 0x1p23f;
For some reason, this will convert 2.6f to 3f.
float fun(float f) {
    float x = !(f < 0) ? f : -f;                  /* |f| */
    float y = !(x < 0x1p31f) ? x : (float)(int)x; /* truncate via int cast */
    return !(f < 0) ? y : -y;                     /* restore sign */
}
Why do you check for 0x1p31 rather than 0x1p23?
Regarding pointer arithmetic: if I cast all the pointers to integers first and then do the math, will it work? For some reason, I thought that casting to a character pointer should always work; that's why I said "for non-char pointers" in the OP.
For some reason, this will convert 2.6f to 3f.
Yes, you're right. It will round; I wrote it wrong. Truncating would require an extra conditional to compare and subtract 1 in case it rounded up.
Why do you check for 0x1p31 rather than 0x1p23?
Anything between 0x1p23 and 0x1p31 will work there. Values below 1<<31 still fit into the int type, so the cast is safe. The minimum value to compare against must be at least 0x1p23, because the cast is only needed for floats smaller than 1<<23; anything at or above that is already an integer.
if I cast all the pointers to integers first, and then do the math, will it work?
Work for what? It will work in the sense that it won't be UB. But the resulting difference may not be a meaningful value, depending on the target architecture. On some architectures pointers can point into completely different memory spaces - like global vs. local memory on a GPU. The difference between such pointers is useless.
Once you have UB in the code, you cannot rely on any calculations. Comment out the "This is probably UB" section and the printf at the end will work as expected.
Huh, interesting. So UB affects not only that one line but also the code that comes after it. But why does commenting out the "probably UB" line solve it? Shouldn't the original "obviously UB" line also have an effect? When I move the last calculation to before the second calculation, -fsanitize=undefined works in both cases. Moving it to the top, before the first "obviously UB" line, only makes that calculation itself work, while "probably UB" doesn't.
It affects everything: code that comes after, code that comes before, related variables, unrelated variables. You cannot reliably predict what the compiler will do when you have UB in your code, unless you start analyzing all the internals of the compiler.
UB means the compiler is allowed to assume the calculation, or operation, or statement never happens during the actual lifetime of the program. Thus it is free to remove it or change it however it wants. That means UB affects everything that leads up to it, and anything that comes after it.
So how did you know commenting out the "probably UB" line would make it generate the correct code? Was it a heuristic guess, or did you just play with it until the output was correct?
It was either that part or the other UB. I'm not saying that leaving the other UB in is fine. It just happened to work with only one UB removed. Ideally you should not have any UB, so the compiler has no opportunity to miscompile your code.
I've been using SSE2 for my function based on your gist. How can I do the same thing for 64-bit integers? There aren't any _mm_cvtepi64_pd or _mm_cvttpd_epi64 instructions. Your gist also defines a HAS_SSE4 macro. Afaik, MSVC doesn't have a compile-time way to check for SSE4; it seems like you just check whether the compiler is MSVC and then use SSE4.
For doubles you can do it the SSE1 way, with repeated add/subs. Or you can unpack the exponent and mantissa and do it "manually", in a more old-school way; that's probably what the C runtime does.
Or, depending on your target audience, just use SSE4 and not worry about it. In the Steam Hardware Survey you can see that almost everybody has SSE4.
MSVC does not care about intrinsic availability; it just compiles the function with whatever intrinsics you put inside. gcc/clang care that the proper target options are enabled - that's why I was checking for the SSE4_1 define. But there is a simpler way: I've updated the gist to forcefully enable SSE4.1 for the sse4_floor function, so no extra compiler arguments are needed. Of course it is still up to you to verify that SSE4 is available before running it (or just require it unconditionally as minspec).