The 2024 Wheel Reinvention Jam is in 3 days. September 23-29, 2024. More info

Trouble with Multithreading

Finalspace
I never thought pthread would be so different from win32, but something is wrong with my pthread implementation.

My simple testbed which creates 3 threads - one master thread, two slave threads does not work at all. Both slave threads waits to get a signal each, the master thread sets both signals - but only one thread gets signaled properly, the other threads waits forever.

So i need some clarifications if my assumptions are correct

1.) Locking a mutex

1
2
3
4
int lockRes;
do {
  lockRes = pthreadApi->pthread_mutex_lock(handle);
} while(lockRes == EAGAIN);


It cannot be used on multiple threads, only one thread can lock/unlock the mutex, right?

2.) pthread_cond_wait() and pthread_cond_timedwait()

Both requires a locked mutex to get signaled properly. Without a lock you are screwed.

If multiple threads uses the same mutex for waiting with each having its own condition you are screwed because when one condition gots signaled and the mutex gots unlocked, the other threads will wait forever.

3.) I translated the win32 to pthread like this:

Creating a signal:

1
HANDLE handle = CreateEventA(NULL, FALSE, FALSE, NULL);


->

1
2
3
4
5
6
7
8
fpl_internal int fpl__PosixConditionCreate(const fpl__PThreadApi *pthreadApi, pthread_cond_t *handle) {
	*handle = PTHREAD_COND_INITIALIZER;
	int condRes;
	do {
		condRes = pthreadApi->pthread_cond_init(handle, fpl_null);
	} while(condRes == EAGAIN);
	return(condRes);
}


Set a signal + Send broadcast:

1
2
HANDLE handle ...
SetEvent(handle);


->

1
2
3
4
pthread_cond_t *condition = ...
int condRes = pthreadApi->pthread_cond_signal(condition);
pthreadApi->pthread_cond_broadcast(condition);
result = (condRes == 0);


Wait for a signal until it gets a broadcast/signaled:

1
2
HANDLE handle ...
WaitForSingleObject(handle, INFINITE);


->

1
2
3
pthread_cond_t *condition = ...
pthread_mutex_t *mutex = ...
pthreadApi->pthread_cond_wait(condition, mutex);

Comments

1) no. This can be used safely on multiple threads. That's the point of mutex - to synchronize threads. If multiple threads will call lock at the same time, only one of them will succeed, and rest of them will wait until first one calls unlock. Same on Windows with critical sections or mutex objects.

2) yes, you should use dedicated mutex per condition object. There is alternative way to build "condition variables" on Linux - using semaphores. Or evenfd. eventfd is very similar to CreateEvent functionality on windows. eventfd also allows to wait on multiple event handles at the same time (with select/poll/epoll) - same as on Windows with WaitForMultipleObjects.

3) events and conditional variables is very different thing. Windows event is equivalent to eventfd on Linux. And pthread_cond_t is equivalent to CONDITION_VARIABLE on Windows.

Your code seems ok, but remember to lock/unlock mutex around signal/broadcast/wait functions. Also don't call signal/broadcast at the same time. Use only one of them depending on how many threads you want to wake up. Signal wakes up only one of the threads that waits. Broadcast wakes up all of them.
mmozeiko
1) no. This can be used safely on multiple threads. That's the point of mutex - to synchronize threads. If multiple threads will call lock at the same time, only one of them will succeed, and rest of them will wait until first one calls unlock. Same on Windows with critical sections or mutex objects.

2) yes, you should use dedicated mutex per condition object. There is alternative way to build "condition variables" on Linux - using semaphores. Or evenfd. eventfd is very similar to CreateEvent functionality on windows. eventfd also allows to wait on multiple event handles at the same time (with select/poll/epoll) - same as on Windows with WaitForMultipleObjects.

3) events and conditional variables is very different thing. Windows event is equivalent to eventfd on Linux. And pthread_cond_t is equivalent to CONDITION_VARIABLE on Windows.

Your code seems ok, but remember to lock/unlock mutex around signal/broadcast/wait functions. Also don't call signal/broadcast at the same time. Use only one of them depending on how many threads you want to wake up. Signal wakes up only one of the threads that waits. Broadcast wakes up all of them.


Okay seems i totally misunderstood condition variables, i never used condition variables in win32 before. I just use auto-reset events and critical sections. As a matter of fact, the FFMPEG demo and the audio system relies heavy on that concept, because it is very lightweight on win32.

But i found a resource which simulates events using pthread:
https://github.com/neosmart/pevents/blob/master/pevents.cpp

Thats exactly what i was looking for.

Need to change a lot to the threading api, so FPL have events, conditions and semaphores in addition as well.
Actually its other way around - condition variables are much more lightweight than events. Its better to design your synchronization directly to use semaphores or condition variables instead of events (or use conditional varibles to simulate events).

As I said - if you want Windows event like synchronization, then you can simply use eventfd. No need to simulate them. Here's example how to use it (WARNING - no error checking): https://gist.github.com/mmozeiko/b2a72ab9fe52e6c2b88fff5ec8eaeba7
eventfd actually is more powerful than Windows event, it can handle more complex synchronization situations.

Btw, I hate when platform abstraction libraries create their own types for events / files / sockets / etc. Because that prevents you to use a lot of good synchronization situations. All modern OS'es allow to wait simultaneously on event and file/socket and other handles (WaitForMulipleObjects/epoll). On Windows both file and event has HANDLE type. On Posix - int type. Sure, you can drop down to actual OS primitives if library allows to access struct members, but then its kind of pointless to use abstraction library at all. Basically library at this point makes you to write more inefficient code, but using multithreading/events/etc should be about writing more efficient code.
mmozeiko
Actually its other way around - condition variables are much more lightweight than events. Its better to design your synchronization directly to use semaphores or condition variables instead of events (or use conditional varibles to simulate events).

As I said - if you want Windows event like synchronization, then you can simply use eventfd. No need to simulate them. Here's example how to use it (WARNING - no error checking): https://gist.github.com/mmozeiko/b2a72ab9fe52e6c2b88fff5ec8eaeba7
eventfd actually is more powerful than Windows event, it can handle more complex synchronization situations.

Btw, I hate when platform abstraction libraries create their own types for events / files / sockets / etc. Because that prevents you to use a lot of good synchronization situations. All modern OS'es allow to wait simultaneously on event and file/socket and other handles (WaitForMulipleObjects/epoll). On Windows both file and event has HANDLE type. On Posix - int type. Sure, you can drop down to actual OS primitives if library allows to access struct members, but then its kind of pointless to use abstraction library at all. Basically library at this point makes you to write more inefficient code, but using multithreading/events/etc should be about writing more efficient code.


Thanks for that gist, i will look into that.

Regarding your last statement, i hate it too when libraries introduces more stuff than what is actually needed.
For FPL i just want it to mirror every handle / file / socket to match the OS specific thing directly, without any overhead if possible.

If you check out the all the fpl*Handle structs, you will see that Signals are the only thing which totally breaks this. So its time to clean this up. Also the term "Signal" is wrong, it should be "Event" instead.

Oh and i see that fplThreadHandle contains stuff which i added for testing, but forgot to remove it (Mutex, Condition).
So signals are the only thing which really needs a cleanup.

But why is it ineffecient to have access to the underlying OS primitives? I find it very ineffectient to introduce a allocation just to hide away the internal OS specific handles. Thats what most libraries do, like for example SDL. They will never show the internal stuff.

Style-A (Typical style for most libraries):
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// Header
typedef void *fplEventHandle;

// Impl
typedef struct fplInternalEventHandle {
#if defined(FPL_PLATFORM_WIN32)
  HANDLE win32EventHandle;
#elif defined(FPL_PLATFORM_LINUX)
  int linuxEventHandle;
#else
  int dummy;
#endif
} fplInternalEventHandle;

fplEventHandle fplEventCreate() {
  fplInternalEventHandle *internalHandle = (fplInternalEventHandle *)malloc(sizeof(fplInternalEventHandle ));
  // ...
  fplEventHandle result = (void *)internalHandle;
}

void fplEventDestroy(fplEventHandle *eventHandle) {
  fplInternalEventHandle *internalHandle = (fplInternalEventHandle *)eventHandle;
#if defined(FPL_PLATFORM_WIN32)
  internalHandle->win32EventHandle ...
#else if defined(FPL_PLATFORM_LINUX)
  internalHandle->linuxEventHandle = ...
#endif
  free(internalHandle);
}


Style-B (Which FPL targets):
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Header
typedef struct fplEventHandle {
#if defined(FPL_PLATFORM_WIN32)
  HANDLE win32EventHandle;
#elif defined(FPL_PLATFORM_LINUX)
  int linuxEventHandle;
#else
  int dummy;
#endif
} fplEventHandle;

// Impl

bool fplEventInit(fplEventHandle *handle) {
#if defined(FPL_PLATFORM_WIN32)
  handle->internalHandle.win32EventHandle ...
#else if defined(FPL_PLATFORM_LINUX)
  handle->internalHandle.linuxEventHandle = ...
#endif
}

void fplEventDestroy(fplEventHandle *handle) {
  // ...
}


Really i dont see any reason why Style-B is ineffecient. It is not ineffecient or slower, but it is more unsafe - because you can break the internal structures more easiely.
Finalspace
But why is it ineffecient to have access to the underlying OS primitives? I find it very ineffectient to introduce a allocation just to hide away the internal OS specific handles. Thats what most libraries do, like for example SDL. They will never show the internal stuff.


It's not about efficiency but about encapsulation and robustness, having direct access to the OS handles means the usercode could destroy them or do other nefarious things with them that would invalidate some of your internal invariants and break code in strange ways that can be very hard to debug.

ratchetfreak
Finalspace
But why is it ineffecient to have access to the underlying OS primitives? I find it very ineffectient to introduce a allocation just to hide away the internal OS specific handles. Thats what most libraries do, like for example SDL. They will never show the internal stuff.


It's not about efficiency but about encapsulation and robustness, having direct access to the OS handles means the usercode could destroy them or do other nefarious things with them that would invalidate some of your internal invariants and break code in strange ways that can be very hard to debug.



Well if you just leave the "internal" fields untouched, you wont break anything.
But its true, in general this approach is more unsafe - especially for inexperienced programmers. But this is the same discussion as private vs public fields in classes -> Hide everything and expose bare minimum. I write OOP shit every fucking day and its insane how much data is hidden in all the thirdparty libraries, resulting in no control, you cannot fix any bugs whatsoever.

You could do something like this, but i wont do it - because my philosophy is: Do not hide anything at all:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Header
typedef struct fplInternalWhateverHandle {
  ... tons of internal stuff
} fplInternalWhateverHandle;

#if defined(FPL_HIDE_IMPLEMENTATION)
typedef void *fplWhateverHandle;
fplWhateverHandle fplCreateWhatever();
#else
typedef fplInternalWhateverHandle fplWhateverHandle;
bool fplInitWhatever(fplWhateverHandle *handle);
#endif

// Implementation

bool fplInitWhatever(fplWhateverHandle *handle) {
  fplInternalWhateverHandle *internalHandle = (fplInternalWhateverHandle *)handle;
  ...
}

fplWhateverHandle fplCreateWhatever() {
  fplInternalWhateverHandle *handle = (fplInternalWhateverHandle *)malloc(sizeof(fplInternalWhateverHandle)));
  ...  
}