Deadlock

Deadlock is a problem that arises from the misuse of synchronization objects, causing one or more threads to lock up forever.

A proper understanding of deadlock is imperative if you wish to create any non-trivial multithreaded programs.

Understanding Mutual Exclusion

Operating systems generally provide a mutual exclusion object, or mutex, that can be acquired and released. It might be defined in the kernel like this:

struct Mutex {
    Thread *volatile owner;
    LinkedList<Thread> blockedThreads;
};

The owner field stores the thread that currently owns the mutex. The blockedThreads field stores a list of threads that are waiting for the mutex to be released, so that they can attempt to acquire it.

When a thread tries to acquire a mutex, the kernel checks to see if owner is null. If it is, the kernel stores a pointer to the current thread in the field. If another thread is already stored there, the kernel inserts the current thread into the blockedThreads list and blocks it.

When a thread tries to release a mutex, the kernel first ensures that the owner field matches the current thread. If it does, the owner field is reset to null, and one thread (if there is one) is removed from the blockedThreads list and unblocked, so that it can attempt to acquire the mutex again.
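
For concreteness, here is a rough sketch of what the kernel-side acquire and release might look like in this simplified model. GetCurrentThread, BlockCurrentThread, UnblockThread, Assert and the LinkedList methods are made-up kernel helpers, and the sketch ignores how the kernel protects its own data structures.

void KernelAcquireMutex(Mutex *mutex) {
    Thread *current = GetCurrentThread();

    while (mutex->owner != nullptr) {
        // Someone else owns the mutex; sleep until a release wakes us up,
        // then try to take it again.
        mutex->blockedThreads.Append(current);
        BlockCurrentThread();
    }

    mutex->owner = current;
}

void KernelReleaseMutex(Mutex *mutex) {
    // Releasing a mutex you don't own is a bug in the calling code.
    Assert(mutex->owner == GetCurrentThread());

    mutex->owner = nullptr;

    if (!mutex->blockedThreads.IsEmpty()) {
        // Wake one waiting thread; it will retry the acquire when it runs.
        UnblockThread(mutex->blockedThreads.RemoveFirst());
    }
}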

We can use the mutex object in situations like this:

OSHandle mutex = OSCreateMutex();
volatile int counter = 0;

int SomeFunction(int increment) {
    int newValue;

    OSAcquireMutex(mutex);

    // Only one thread can execute this code at a time.
    counter += increment;
    newValue = counter;

    OSReleaseMutex(mutex);

    return newValue;
}

The mutex ensures that only one thread can increment the counter at a time, because the mutex can only be acquired when no thread owns it. If it is already acquired, then the thread attempting to acquire it will wait until the thread that owns it releases it.
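
For example, if two threads call SomeFunction at the same time, the final value of counter will always be 2, regardless of how the threads interleave. (OSCreateThread and OSWaitForThread are made-up thread APIs for the sake of this sketch.)

void ThreadEntry(void *argument) {
    (void) argument;
    SomeFunction(1); // Safely adds 1, even if the other thread is doing the same.
}

void Test() {
    // Hypothetical thread APIs, not part of the earlier example.
    OSHandle thread1 = OSCreateThread(ThreadEntry, nullptr);
    OSHandle thread2 = OSCreateThread(ThreadEntry, nullptr);

    OSWaitForThread(thread1);
    OSWaitForThread(thread2);

    // counter is now exactly 2: the mutex prevented the two increments from
    // overlapping and losing an update.
}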

Simplest Deadlock

Consider what would happen if we did this with a mutex.

1    OSAcquireMutex(mutex);
2    OSAcquireMutex(mutex);

On line 1, the current thread acquires the mutex, and the mutex is now owned by the current thread. In the kernel, the owner field is set to point to the current thread.

On line 2, the current thread attempts to acquire the mutex again, but since the mutex is already owned, the thread is inserted into the blockedThreads list.

But since the only thread that could release the mutex is now blocked waiting for it, the mutex can never be released. The thread will therefore spend the rest of its life blocking, until it is terminated by a different thread.

This is the simplest case of deadlock.

(Actually, since you would never want this behaviour to occur, some operating systems supply recursive mutexes - if the current thread tries to acquire a mutex it already owns, the acquire just increments an internal counter. For the sake of simplicity, though, we'll stick with our simplified model.)
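
To make that concrete, a recursive mutex might look roughly like this, using the same made-up kernel helpers as in the earlier sketch; recursionCount is the internal counter mentioned above.

struct RecursiveMutex {
    Thread *volatile owner;
    int recursionCount; // How many times the owner has acquired the mutex.
    LinkedList<Thread> blockedThreads;
};

void KernelAcquireRecursiveMutex(RecursiveMutex *mutex) {
    Thread *current = GetCurrentThread();

    if (mutex->owner == current) {
        // We already own the mutex: count the nested acquire instead of blocking.
        mutex->recursionCount++;
        return;
    }

    while (mutex->owner != nullptr) {
        mutex->blockedThreads.Append(current);
        BlockCurrentThread();
    }

    mutex->owner = current;
    mutex->recursionCount = 1;
}

void KernelReleaseRecursiveMutex(RecursiveMutex *mutex) {
    // The mutex is only really released when the outermost acquire is undone.
    if (--mutex->recursionCount == 0) {
        mutex->owner = nullptr;

        if (!mutex->blockedThreads.IsEmpty()) {
            UnblockThread(mutex->blockedThreads.RemoveFirst());
        }
    }
}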

A More Complicated Example

The example above shouldn't ever be an actual problem, since deadlock is guaranteed to occur if the problematic code path is ever taken, allowing the programmer to immediately notice and fix the issue.

Real deadlock - involving more than one mutex - is perhaps more difficult to wrap one's head around.

Consider the following code.

OSHandle mutex1 = OSCreateMutex();
volatile unsigned int counter1 = 0; // Only read/write when mutex1 is acquired.

OSHandle mutex2 = OSCreateMutex();
volatile unsigned int counter2 = 0; // Only read/write when mutex2 is acquired.

void IncrementCounter1() {
    OSAcquireMutex(mutex1);
    counter1++;
    OSReleaseMutex(mutex1);
}

void IncrementCounter2() {
    OSAcquireMutex(mutex2);
    counter2++;
    OSReleaseMutex(mutex2);
}

void IncrementBothCounters() {
    OSAcquireMutex(mutex1);
    counter1++;

    OSAcquireMutex(mutex2);
    counter2++;

    OSReleaseMutex(mutex2);
    OSReleaseMutex(mutex1);
}

bool IsCounter2LargerThanCounter1() {
    OSAcquireMutex(mutex2);

    if (counter2 == 0) {
        // Early out: if counter2 is 0, it can't be larger than counter1.
        OSReleaseMutex(mutex2);
        return false;
    }

    OSAcquireMutex(mutex1);
    bool result = counter2 > counter1;

    OSReleaseMutex(mutex1);
    OSReleaseMutex(mutex2);

    return result;
}

In this example there are 2 mutexes, mutex1 and mutex2, and 2 counters, counter1 and counter2.

Before the program tries to read or write one of the counters, it needs to acquire the corresponding mutex. (BTW - there's no particular reason that they don't share a mutex, other than that I need 2 mutexes to demonstrate the concept of deadlock in this example.)

Now, if you're not familiar with deadlock, you probably won't have spotted the problem with this code. (I'm assuming you have the same level of intuition with deadlock that I did when I first tried writing multithreaded code.) But this code can cause deadlock when used by multiple threads.

Let's consider what might happen if there are 2 threads running this code. Suppose counter2 has already been incremented at some point, say by an earlier call to IncrementCounter2, so the early out in IsCounter2LargerThanCounter1 won't be taken. The first thread calls IncrementBothCounters, and the second thread calls IsCounter2LargerThanCounter1.

1: Calls IncrementBothCounters
1: Acquires mutex1.
1: Increments counter1.

2: Calls IsCounter2LargerThanCounter1
2: Acquires mutex2.
2: Checks if counter2 is equal to 0. (It isn't.)
2: Attempts to acquire mutex1, but it's already owned by thread 1, so it gets added to mutex1's blockedThreads list.
2: <blocking until mutex1 is released>

1: Attempts to acquire mutex2, but it's already owned by thread 2, so it gets added to mutex2's blockedThreads list.
1: <blocking until mutex2 is released>

Deadlock.

If the events occur in the order described above, then we get deadlock.

Thread 1 is waiting for thread 2 to do something, but thread 2 is waiting for thread 1 to do something. So neither of them can continue. This is a circular wait.

So, how do we fix it?

We simply have to make sure that in all code paths the mutexes are acquired in a consistent order.

Let's say the rule is that mutex2 must be acquired before mutex1. This doesn't mean we have to acquire mutex2 in order to acquire mutex1; it means we can't acquire mutex2 while mutex1 is already held.

To fix the code, we can simply update IncrementBothCounters:

void IncrementBothCounters() {
    OSAcquireMutex(mutex2);
    counter2++;

    OSAcquireMutex(mutex1);
    counter1++;

    OSReleaseMutex(mutex1);
    OSReleaseMutex(mutex2);
}

And the code is now deadlock-free (I think: multithreaded code is hard).

A Different Solution

In the previous section we discussed acquiring mutexes in a consistent order, but there's another way deadlock can be avoided.

Some OSes allow processes to acquire multiple mutexes atomically.

For example, let's say we have a graphics library which provides drawing surfaces, each of which has a mutex. Consider a subroutine that copies the contents of one surface to another:

void CopySurface(Surface *destination, Surface *source, Rectangle destinationRegion, Point sourceOffset) {
    ...
}

The first thing the subroutine has to do is acquire the mutexes for both the surfaces.

Written naively, that would look like this:

OSAcquireMutex(destination->mutex);
OSAcquireMutex(source->mutex);

But as we have learnt, this could cause deadlock.

There isn't really a way to determine which mutex should be acquired first - the caller could draw from one surface to another, and do the same in reverse. So we can't just swap the lines around this time.

It might be possible to make a rule like "the mutex of the surface with the lowest memory address is always acquired first", but that's not a particularly nice solution.
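
For comparison, a sketch of that address-ordering rule might look like this; it works, but every code path that locks two surfaces has to remember to go through the helper.

// Acquire both surface mutexes in a globally consistent order, using the
// surfaces' addresses as the tie-breaker.
void AcquireBothSurfaceMutexes(Surface *a, Surface *b) {
    Surface *first  = (uintptr_t) a < (uintptr_t) b ? a : b;
    Surface *second = (uintptr_t) a < (uintptr_t) b ? b : a;

    OSAcquireMutex(first->mutex);

    if (first != second) {
        // Don't acquire the same mutex twice if both pointers refer to the
        // same surface.
        OSAcquireMutex(second->mutex);
    }
}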

Instead, we can tell the operating system to acquire both mutexes atomically.

OSHandle mutexes[2] = { destination->mutex, source->mutex };
OSAcquireMultipleMutexes(mutexes, 2);

The OS can ensure there is never a moment at which this thread owns only one of these mutexes - it only ever owns neither or both.

This solves the problem of deadlock.
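
One way to imagine how the kernel could provide this guarantee: check and take all the mutexes in the set under a single internal lock, and if any of them is owned, block without taking any. This is only a sketch - KernelLockScheduler, KernelUnlockScheduler and KernelWaitForAnyMutexRelease are made-up helpers, and a real kernel will be cleverer than a retry loop.

void KernelAcquireMultipleMutexes(Mutex **mutexes, size_t count) {
    while (true) {
        // The scheduler lock makes the check-and-take below atomic with
        // respect to every other thread.
        KernelLockScheduler();

        bool allFree = true;

        for (size_t i = 0; i < count; i++) {
            if (mutexes[i]->owner != nullptr) {
                allFree = false;
                break;
            }
        }

        if (allFree) {
            // Take every mutex in one step; no other thread can ever observe
            // this thread owning only some of them.
            for (size_t i = 0; i < count; i++) {
                mutexes[i]->owner = GetCurrentThread();
            }

            KernelUnlockScheduler();
            return;
        }

        // At least one mutex is owned: block while owning none of them, and
        // try again once something is released.
        KernelUnlockScheduler();
        KernelWaitForAnyMutexRelease(mutexes, count);
    }
}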

Minimal Requirements

There are 4 minimal requirements for deadlock to be possible:

  1. Mutual exclusion: only a single thread can hold a resource at a time. If another thread wants to use it, it needs to wait until the resource is released.

  2. Hold and wait: a thread holds one resource while waiting for another.

  3. No preemption: a resource can only be released by the thread holding it.

  4. Circular wait: the directed graph representing which threads are waiting for which resources contains a cycle.

If you eliminate any of these conditions then deadlock will never occur.
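
For example, the lock-ordering fix earlier removed the circular wait, and atomic multi-acquire removed hold and wait. Hold and wait can also be removed in ordinary user code by never blocking on a second mutex while holding the first: back out and retry instead. Here's a sketch of IsCounter2LargerThanCounter1 written that way, assuming a hypothetical OSTryAcquireMutex that returns false instead of blocking when the mutex is owned.

bool IsCounter2LargerThanCounter1() {
    while (true) {
        OSAcquireMutex(mutex2);

        if (counter2 == 0) {
            OSReleaseMutex(mutex2);
            return false;
        }

        if (OSTryAcquireMutex(mutex1)) {
            bool result = counter2 > counter1;
            OSReleaseMutex(mutex1);
            OSReleaseMutex(mutex2);
            return result;
        }

        // mutex1 was busy: release mutex2 so we never hold one mutex while
        // waiting for another, then retry from the top.
        OSReleaseMutex(mutex2);
    }
}

This version can't deadlock, though it trades the problem for a retry loop that may spin for a while if mutex1 stays busy.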