Beta 0.7.3.0

Finalspace

Here is another release for you all. This release contains mostly minor and major bugfixes, plus a couple of new functions.
I also added LP (low precision) and HP (high precision) variants of the fplGetTimeIn* functions.
One of the major issues I finally solved is the hiding/showing of the cursor in Win32 ;)

I highly recommend updating to this release as soon as possible. The release is tagged and the documentation is updated as well.

One last thing: I added a develop branch, so the master branch contains only the current release. That way I can work and break things, but still have a working release. I have used git for so many years, but it seems I never used it properly...

Changelog of v0.7.3.0 beta:
- Changed: fplConsoleWaitForCharInput returns char instead of const char
- Changed: Added isDecorated field to fplWindowSettings
- Changed: Added isFloating field to fplWindowSettings
- Changed: Renamed fplSetWindowTitle() -> fplSetWindowAnsiTitle()
- Changed: Copy Ansi/Wide String pushes error for buffer range error
- Fixed: Fixed api name mismatch CloseFile() -> fplCloseFile()
- Fixed: Corrected wrong doxygen defines
- Fixed: Corrected most clang compile warnings
- New: Added fplIsWindowDecorated() / fplSetWindowDecorated()
- New: Added fplIsWindowFloating() / fplSetWindowFloating()
- New: Added fplSetWindowWideTitle()
- New: Added fplGetTimeInMillisecondsHP()
- New: Added fplGetTimeInMillisecondsLP()
- New: Added fplGetTimeInSecondsLP()
- New: Added fplGetTimeInSecondsHP()
- New: Added FPL_NO_ENTRYPOINT

- Changed: [Win32] fplAtomicExchangeS64() / fplAtomicAddS64() / fplAtomicStoreS64() use _Interlocked* operations directly for x86
- Fixed: [Win32] Corrected wrong case-sensitivity in includes
- Fixed: [Win32] Cursor visibility could not be changed properly
- Fixed: [Win32] Function prototype macros were not properly named
- New: [Win32] Implemented fplIsWindowDecorated() / fplSetWindowDecorated()
- New: [Win32] Implemented fplIsWindowFloating() / fplSetWindowFloating()
- New: [Win32] Implemented fplSetWindowWideTitle()
- New: [Win32] Implemented fplGetTimeInMillisecondsLP()
- New: [Win32] Implemented fplGetTimeInMillisecondsHP()
- New: [Win32] Implemented fplGetTimeInSecondsLP()
- New: [Win32] Implemented fplGetTimeInSecondsHP()
- New: [POSIX] Implemented fplGetTimeInMillisecondsLP()
- New: [POSIX] Implemented fplGetTimeInMillisecondsHP()
- New: [POSIX] Implemented fplGetTimeInSecondsLP()
- New: [POSIX] Implemented fplGetTimeInSecondsHP()
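
The post does not show how the LP and HP timers are implemented. As a rough illustration only, here is a minimal sketch of what such a pair might look like on POSIX, assuming LP maps to the whole-second time() call and HP to clock_gettime(CLOCK_MONOTONIC). The names sketch_seconds_lp and sketch_seconds_hp are hypothetical and are not FPL functions; the real library may use different clocks.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical stand-in, not FPL code: wall-clock time with
   whole-second resolution (cheap, but coarse). */
static double sketch_seconds_lp(void) {
    return (double)time(NULL);
}

/* Hypothetical stand-in, not FPL code: monotonic clock with
   nanosecond resolution, suitable for measuring short intervals. */
static double sketch_seconds_hp(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Note that the two clocks in this sketch have different origins, so only differences between two readings of the same function are meaningful.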

Comments

Answering your "// @NOTE(final): Why does MSVC have no _InterlockedExchange64 on x86???" comment in the source code: that's because the InterlockedXYZ intrinsics exist only when there is a simple CPU instruction that does what you need. 32-bit x86 cannot do a 64-bit atomic operation with a plain LOCK + mov/add/.. opcode. You can do that only with cmpxchg8b in a loop. That is what the actual "InterlockedExchange64" function does, as opposed to the intrinsic.

And the cmpxchg8b instruction did not exist in the original x86 instruction set. It is available only since the Pentium.

Btw, I think your fplAtomicAddS64 function for 32-bit Windows is wrong. First it tries to cmpxchg the value with 0? What if the original value is not 0?
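
To make the "cmpxchg8b and loop" idea concrete, here is a minimal sketch of a 64-bit exchange built only from compare-and-swap. It uses portable C11 atomics as a stand-in for the MSVC intrinsics, and the helper name exchange_s64_via_cas is hypothetical; it is not FPL code and not the actual Windows implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not part of FPL: emulates a 64-bit atomic exchange
   using only compare-and-swap, similar in spirit to a cmpxchg8b loop. */
static int64_t exchange_s64_via_cas(_Atomic int64_t *target, int64_t desired) {
    int64_t observed = atomic_load(target);
    /* On failure the call stores the current value into 'observed',
       so every retry compares against fresh data. */
    while (!atomic_compare_exchange_weak(target, &observed, desired)) {
        /* Another thread changed *target in between; try again. */
    }
    return observed; /* the value that was replaced */
}
```

The loop only ends once the swap succeeds against the value that was actually observed, which is what makes the whole exchange appear atomic.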

That explains why there is no exchange for 64-bit. Thanks!

Also, it seems your question magically solved it: I removed the #if defined(FPL_ARCH_X64) and now always use _InterlockedExchange64 or _InterlockedExchangeAdd64. It seems it got fixed after I corrected the type cast to LONG64.

But regarding your question:

InterlockedCompareExchange always returns the initial value, regardless of whether the exchange happened.
If I compare and exchange a value with 0 and 0, I always get the old value back. Replacing 0 with 0 does not change the actual result; technically it may write a zero over a zero, so it just costs a few extra cycles.
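
The compare-with-0-and-0 trick described above can be sketched as an atomic 64-bit read. This is a minimal illustration using C11 atomics instead of InterlockedCompareExchange, and load_s64_via_cas is a hypothetical name, not an FPL function.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not part of FPL: reads a 64-bit value using
   only compare-and-swap. */
static int64_t load_s64_via_cas(_Atomic int64_t *target) {
    int64_t expected = 0;
    /* If *target is 0, it is overwritten with 0 (no visible change) and
       'expected' stays 0. Otherwise the exchange fails and 'expected'
       receives the current value. Either way 'expected' ends up holding
       the value that was stored in *target. */
    atomic_compare_exchange_strong(target, &expected, 0);
    return expected;
}
```

This gives a consistent read, but on its own it says nothing about what happens between that read and a later write.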
Yeah, but what happens next? Let's say you are running two threads, A and B.

A: reads oldValue
B: reads oldValue
B: calculates oldValue + addendB
A: calculates oldValue + addendA
A: writes oldValue + addendA
B: tries to write oldValue + addendB, but fails, because _InterlockedCompareExchange64 compares the current value with the original one (oldValue), and A has already changed it.

As a result, only one addend (addendA) has been added to "value". But the code should have added both, addendA and addendB.

So it is wrong.
And that's why CompareAndExchange is nearly always done in a loop, so you can retry the operation if it fails.
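
The A/B interleaving above can be replayed deterministically on a single thread. The following is a minimal sketch using C11 atomics as a stand-in for _InterlockedCompareExchange64; all function names are hypothetical and none of this is FPL code. It contrasts a single compare-exchange attempt with the retry loop.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Single attempt: fails if *target no longer equals the caller's snapshot. */
static bool add_once(_Atomic int64_t *target, int64_t snapshot, int64_t addend) {
    return atomic_compare_exchange_strong(target, &snapshot, snapshot + addend);
}

/* Retry loop: starts from a possibly stale snapshot and keeps trying.
   On failure 'observed' is refreshed with the current value, so the
   sum is recomputed before the next attempt. */
static void add_with_retry(_Atomic int64_t *target, int64_t observed, int64_t addend) {
    while (!atomic_compare_exchange_weak(target, &observed, observed + addend)) {
        /* retry with the refreshed 'observed' */
    }
}

/* Replays the interleaving with single attempts: B's update is lost. */
static int64_t replay_single_attempt(void) {
    _Atomic int64_t value = 10;
    int64_t seenByA = atomic_load(&value);
    int64_t seenByB = atomic_load(&value);
    add_once(&value, seenByA, 1); /* A succeeds, value becomes 11 */
    add_once(&value, seenByB, 2); /* B fails, value is no longer 10 */
    return atomic_load(&value);
}

/* Same interleaving, but B retries until its addend is applied. */
static int64_t replay_with_retry(void) {
    _Atomic int64_t value = 10;
    int64_t seenByA = atomic_load(&value);
    int64_t seenByB = atomic_load(&value);
    add_with_retry(&value, seenByA, 1); /* A succeeds, value becomes 11 */
    add_with_retry(&value, seenByB, 2); /* B fails once, then adds to 11 */
    return atomic_load(&value);
}
```

With a starting value of 10 and addends 1 and 2, the single-attempt version ends at 11, while the retry version ends at 13.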

Yeah, my workaround for x86 was naive, so the value could be wrong.
But now it uses just one intrinsic for each atomic function, so it's fine.