How to wait for threads to finish without a busy loop?

I have a thread queue similar to the one Casey builds in Handmade Hero, and I would like to be able to pause the main thread while the other threads are working. More precisely, I want the main thread to wake up when all the other threads are paused in WaitForSingleObjectEx on the queue semaphore. At the moment the code looks like this:

```c
uint32_t thread_queue_work( thread_queue_t* queue, ... ) {
    uint32_t result = 0;
    ...
    ReleaseSemaphore( queue->semaphore, 1, 0 );  // let one sleeping worker through
    ...
    return result;
}

// WINAPI: CreateThread expects a DWORD WINAPI ( LPVOID ) start routine.
DWORD WINAPI thread_main( void* thread_data ) {

    thread_queue_t* queue = ( thread_queue_t* ) thread_data;

    while ( !queue->terminate ) {

        if ( !thread_do_work( queue ) ) {
            // No work available: sleep until the semaphore is released.
            WaitForSingleObjectEx( queue->semaphore, INFINITE, 0 );
        }
    }

    return 0;
}

void thread_complete_queue( thread_queue_t* queue ) {
    // Busy loop: once every task has been claimed, this spins until
    // the other threads finish the tasks they are still running.
    while ( queue->completion_count != queue->completion_goal ) {
        thread_do_work( queue );
    }
}

int main( int argc, char** argv ) {
    thread_queue_t queue = thread_queue_init( );

    thread_queue_work( &queue, ... );
    thread_queue_work( &queue, ... );
    thread_complete_queue( &queue );

    thread_queue_work( &queue, ... );
    thread_queue_work( &queue, ... );
    thread_complete_queue( &queue );

    return 0;
}
```


But if there are only two tasks to do and they take some time, thread_complete_queue is mostly a busy loop, which I would like to avoid. Also, not every program will use thread_complete_queue, so the worker threads must not do anything that would fail if thread_complete_queue is never called.

Is there a good way to do this? Is it possible to wait for the semaphore to "be 0"? Should I use CreateEvent to create an event for each thread and use WaitForMultipleObjects? Would that work if thread_complete_queue isn't called? Would it work if I have several hundred threads?

Edited by Simon Anciaux. Reason: added details.
WaitForMultipleObjects has a limit on how many objects it can wait on: MAXIMUM_WAIT_OBJECTS, which is 64.

One obvious way is to implement it with an Event or a condition variable plus a count of currently working threads. Every time a worker finishes its work, it does two things: it decrements the count of working threads and signals the event/condition variable. All threads share the same event/condition variable. The main thread loops: it waits on the event/condition variable and checks the thread count, and once the count reaches 0 it exits the loop. Yes, the main thread won't wait only "once": it will wake up multiple times, like your current while loop (though it sleeps in between instead of spinning). But this solution is easy and, in most cases, "good enough".
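A minimal sketch of this "count + condition variable" approach. It assumes POSIX threads rather than Win32 so it builds outside Windows; on Windows, an SRWLOCK with a CONDITION_VARIABLE, SleepConditionVariableSRW and WakeConditionVariable has the same shape. The names work_tracker_t, tracker_init, tracker_worker_done, tracker_wait_all, demo_worker and run_demo are hypothetical, made up for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical tracker: a count of busy workers guarded by a mutex,
   plus one condition variable shared by every thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;  /* signalled each time a worker finishes */
    int             working;  /* workers that have not finished yet */
} work_tracker_t;

static void tracker_init( work_tracker_t* t, int working ) {
    pthread_mutex_init( &t->lock, NULL );
    pthread_cond_init( &t->changed, NULL );
    t->working = working;
}

/* Worker side: decrement the count and signal, every time. */
static void tracker_worker_done( work_tracker_t* t ) {
    pthread_mutex_lock( &t->lock );
    t->working -= 1;
    pthread_cond_signal( &t->changed );
    pthread_mutex_unlock( &t->lock );
}

/* Main-thread side: sleeps until the count reaches 0. It may wake once
   per finished worker (or spuriously) and just re-checks the count. */
static void tracker_wait_all( work_tracker_t* t ) {
    pthread_mutex_lock( &t->lock );
    while ( t->working != 0 ) {
        pthread_cond_wait( &t->changed, &t->lock );
    }
    pthread_mutex_unlock( &t->lock );
}

/* Demo only: each worker runs a trivial stand-in task. */
typedef struct {
    work_tracker_t* tracker;
    atomic_int*     tasks_done;
} demo_arg_t;

static void* demo_worker( void* p ) {
    demo_arg_t* arg = ( demo_arg_t* ) p;
    atomic_fetch_add( arg->tasks_done, 1 );
    tracker_worker_done( arg->tracker );
    return NULL;
}

/* Starts n workers, waits without spinning, returns how many tasks ran. */
static int run_demo( int n ) {
    pthread_t threads[ 64 ];
    work_tracker_t tracker;
    atomic_int tasks_done = 0;
    demo_arg_t arg = { &tracker, &tasks_done };
    assert( n >= 0 && n <= 64 );

    tracker_init( &tracker, n );
    for ( int i = 0; i < n; ++i ) {
        pthread_create( &threads[ i ], NULL, demo_worker, &arg );
    }
    tracker_wait_all( &tracker );
    int done = atomic_load( &tasks_done );
    for ( int i = 0; i < n; ++i ) {
        pthread_join( threads[ i ], NULL );  /* join before destroying the mutex */
    }
    pthread_mutex_destroy( &tracker.lock );
    pthread_cond_destroy( &tracker.changed );
    return done;
}
```

Because the count is only changed under the mutex that tracker_wait_all holds while checking it, a worker cannot finish between that check and the wait and have its signal lost.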
With an atomic counter you can do:
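```c
// Run by a worker after it finishes a task. atomic_decrement returns
// the new value, so only the worker that finishes the last task signals.
if ( atomic_decrement( &countTask ) == 0 ) {
    signal( &cond_var );
}
```

Then the main thread is woken only once.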
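A sketch of the atomic-counter version, again assuming POSIX threads plus C11 `<stdatomic.h>` rather than Win32. C11 has no atomic_decrement that returns the new value, so atomic_fetch_sub is used and its result (the previous value) is compared against 1; on Win32, InterlockedDecrement returns the new value directly. One detail the snippet above leaves out: if the last worker signals a bare condition variable before the main thread has started waiting, that signal is lost and the main thread can sleep forever. Here the last worker also sets a done flag under the mutex, which makes the wake-up "sticky", much like a Win32 event stays signalled after SetEvent. The names latch_t, latch_init, latch_count_down, latch_wait, latch_worker and run_latch_demo are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical countdown latch: an atomic task counter, plus a flag,
   mutex and condition variable used only for the final wake-up. */
typedef struct {
    atomic_int      remaining;  /* tasks not finished yet */
    pthread_mutex_t lock;
    pthread_cond_t  all_done;
    bool            done;       /* guarded by lock; stays set once set */
} latch_t;

static void latch_init( latch_t* l, int count ) {
    atomic_init( &l->remaining, count );
    pthread_mutex_init( &l->lock, NULL );
    pthread_cond_init( &l->all_done, NULL );
    l->done = ( count == 0 );
}

/* Worker side: only the thread whose decrement reaches 0 takes the lock. */
static void latch_count_down( latch_t* l ) {
    int previous = atomic_fetch_sub( &l->remaining, 1 );  /* returns the old value */
    assert( previous > 0 );
    if ( previous == 1 ) {  /* i.e. the new value is 0 */
        pthread_mutex_lock( &l->lock );
        l->done = true;
        pthread_cond_signal( &l->all_done );
        pthread_mutex_unlock( &l->lock );
    }
}

/* Main-thread side: normally wakes once; the loop only guards
   against spurious wake-ups. */
static void latch_wait( latch_t* l ) {
    pthread_mutex_lock( &l->lock );
    while ( !l->done ) {
        pthread_cond_wait( &l->all_done, &l->lock );
    }
    pthread_mutex_unlock( &l->lock );
}

/* Demo only. */
typedef struct {
    latch_t*    latch;
    atomic_int* tasks_done;
} latch_arg_t;

static void* latch_worker( void* p ) {
    latch_arg_t* arg = ( latch_arg_t* ) p;
    atomic_fetch_add( arg->tasks_done, 1 );  /* stand-in for a real task */
    latch_count_down( arg->latch );
    return NULL;
}

static int run_latch_demo( int n ) {
    pthread_t threads[ 64 ];
    latch_t latch;
    atomic_int tasks_done = 0;
    latch_arg_t arg = { &latch, &tasks_done };
    assert( n >= 0 && n <= 64 );

    latch_init( &latch, n );
    for ( int i = 0; i < n; ++i ) {
        pthread_create( &threads[ i ], NULL, latch_worker, &arg );
    }
    latch_wait( &latch );
    int done = atomic_load( &tasks_done );
    for ( int i = 0; i < n; ++i ) {
        pthread_join( threads[ i ], NULL );
    }
    pthread_mutex_destroy( &latch.lock );
    pthread_cond_destroy( &latch.all_done );
    return done;
}
```

Workers that are not last never touch the mutex, which is the point of the atomic counter: the lock is taken once per batch instead of once per task.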
Thanks to you both. Here is what I ended up with:
```c
uint32_t thread_do_work( thread_queue_t* queue ) {
    ...
    // The thread that completes the last queued task wakes the main thread.
    if ( InterlockedIncrement( &queue->completion_count ) == queue->completion_goal ) {
        SetEvent( queue->complete_event );
    }
    ...
}

DWORD WINAPI thread_main( void* thread_data ) {

    // Cast to the queue type (the original cast to thread_info_t* did not match).
    thread_queue_t* queue = ( thread_queue_t* ) thread_data;

    while ( !queue->terminate ) {
        if ( !thread_do_work( queue ) ) {
            // No work available: sleep until thread_queue_work releases the semaphore.
            WaitForSingleObjectEx( queue->semaphore, INFINITE, 0 );
        }
    }

    // The last thread to exit signals thread_terminate.
    if ( ( uint32_t ) InterlockedIncrement( &queue->terminated_count ) == queue->thread_count ) {
        SetEvent( queue->terminate_event );
    }
    ...
}

uint32_t thread_queue_work( thread_queue_t* queue, thread_callback_t callback, void* callback_data ) {
    ...
    ReleaseSemaphore( queue->semaphore, 1, 0 );
    ...
}

void thread_complete_queue( thread_queue_t* queue ) {

    while ( queue->completion_count != queue->completion_goal ) {
        // Help with the remaining work; when nothing is left to claim,
        // sleep until the last task completes instead of spinning.
        if ( !thread_do_work( queue ) ) {
            WaitForSingleObjectEx( queue->complete_event, INFINITE, 0 );
        }
    }
}

void thread_terminate( thread_queue_t* queue ) {

    queue->terminated_count = 0;
    InterlockedIncrement( &queue->terminate );
    // Wake every worker so it can observe the terminate flag.
    ReleaseSemaphore( queue->semaphore, queue->thread_count, 0 );

    while ( ( uint32_t ) queue->terminated_count != queue->thread_count ) {
        WaitForSingleObjectEx( queue->terminate_event, INFINITE, 0 );
    }

    CloseHandle( queue->semaphore );
    CloseHandle( queue->complete_event );
    CloseHandle( queue->terminate_event );
}
```
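For comparison, a sketch of the same overall design (workers sleep on a semaphore, the thread that finishes the last task wakes the main thread, the main thread helps before sleeping) written against POSIX threads, C11 atomics and POSIX unnamed semaphores (sem_init/sem_wait/sem_post, as available on Linux). It assumes a single producer (only the main thread queues work) and a fixed-capacity array whose slots are never reused. A mutex/condition-variable pair stands in for complete_event, and pthread_join replaces the terminated_count/terminate_event handshake. This is not the original thread_queue_t implementation; queue_t, queue_init, queue_work, queue_do_work, queue_complete, queue_terminate and add_one are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 256  /* hypothetical: slots are never reused */
#define MAX_THREADS 16

typedef void ( *task_fn_t )( void* data );

typedef struct {
    task_fn_t       fn[ QUEUE_CAPACITY ];
    void*           data[ QUEUE_CAPACITY ];
    atomic_int      next_read;   /* next slot a thread may claim */
    atomic_int      next_write;  /* slots published by the main thread */
    atomic_int      completion_count, completion_goal;
    atomic_bool     terminate;
    sem_t           semaphore;   /* one post per queued task */
    pthread_mutex_t lock;        /* lock + complete stand in for complete_event */
    pthread_cond_t  complete;
    pthread_t       threads[ MAX_THREADS ];
    int             thread_count;
} queue_t;

/* Claims and runs one task; returns false if none was left to claim. */
static bool queue_do_work( queue_t* q ) {
    int slot = atomic_load( &q->next_read );
    while ( slot < atomic_load( &q->next_write ) ) {
        /* on failure, slot is reloaded with the current next_read */
        if ( !atomic_compare_exchange_weak( &q->next_read, &slot, slot + 1 ) ) continue;
        q->fn[ slot ]( q->data[ slot ] );
        if ( atomic_fetch_add( &q->completion_count, 1 ) + 1 == atomic_load( &q->completion_goal ) ) {
            pthread_mutex_lock( &q->lock );  /* taken so the wake-up can't be lost */
            pthread_cond_broadcast( &q->complete );
            pthread_mutex_unlock( &q->lock );
        }
        return true;
    }
    return false;
}

static void* queue_thread_main( void* p ) {
    queue_t* q = ( queue_t* ) p;
    while ( !atomic_load( &q->terminate ) ) {
        if ( !queue_do_work( q ) ) sem_wait( &q->semaphore );  /* sleep until work or terminate */
    }
    return NULL;
}

static void queue_init( queue_t* q, int thread_count ) {
    assert( thread_count >= 0 && thread_count <= MAX_THREADS );
    atomic_init( &q->next_read, 0 );
    atomic_init( &q->next_write, 0 );
    atomic_init( &q->completion_count, 0 );
    atomic_init( &q->completion_goal, 0 );
    atomic_init( &q->terminate, false );
    sem_init( &q->semaphore, 0, 0 );
    pthread_mutex_init( &q->lock, NULL );
    pthread_cond_init( &q->complete, NULL );
    q->thread_count = thread_count;
    for ( int i = 0; i < thread_count; ++i )
        pthread_create( &q->threads[ i ], NULL, queue_thread_main, q );
}

/* Main thread only: fill a slot, then publish it and wake one worker. */
static void queue_work( queue_t* q, task_fn_t fn, void* data ) {
    int slot = atomic_load( &q->next_write );
    assert( slot < QUEUE_CAPACITY );
    q->fn[ slot ] = fn;
    q->data[ slot ] = data;
    atomic_fetch_add( &q->completion_goal, 1 );
    atomic_store( &q->next_write, slot + 1 );
    sem_post( &q->semaphore );
}

/* Helps while tasks are left to claim, then sleeps until the last one ends. */
static void queue_complete( queue_t* q ) {
    while ( queue_do_work( q ) ) {}
    pthread_mutex_lock( &q->lock );
    while ( atomic_load( &q->completion_count ) != atomic_load( &q->completion_goal ) )
        pthread_cond_wait( &q->complete, &q->lock );
    pthread_mutex_unlock( &q->lock );
}

/* Wakes every worker, then joins them instead of counting exits. */
static void queue_terminate( queue_t* q ) {
    atomic_store( &q->terminate, true );
    for ( int i = 0; i < q->thread_count; ++i ) sem_post( &q->semaphore );
    for ( int i = 0; i < q->thread_count; ++i ) pthread_join( q->threads[ i ], NULL );
    sem_destroy( &q->semaphore );
    pthread_mutex_destroy( &q->lock );
    pthread_cond_destroy( &q->complete );
}

/* Example task: increments the atomic_int it is given. */
static void add_one( void* data ) { atomic_fetch_add( ( atomic_int* ) data, 1 ); }
```

Usage mirrors the original main: queue_init( &q, 4 ), a few queue_work calls, queue_complete( &q ), more queue_work calls, another queue_complete( &q ), then queue_terminate( &q ). Since only the thread that finishes the last task broadcasts, a program that never calls queue_complete is unaffected: the broadcast simply finds no waiter.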