Determining Thread Safety in Unit Tests

I frequently write unit tests to prove that some piece of code is thread safe. Usually I write these tests in response to a bug found in production. In that case, the test first demonstrates that the bug is reproduced (the test fails), then shows that the new code fixes the threading problem (the test passes), and from then on acts as a regression test for future releases.

Most of the thread safety tests I've written test for race conditions, but some also test for deadlocks.
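
For the deadlock case, one common way to turn a hang into a failing test is to run both sides on background threads and treat a Join timeout as a suspected deadlock. This is a minimal sketch, not taken from the post: `DeadlockCheck`, `BothFinishWithin` and the two lock-order demos are hypothetical names, and the timeout is left to the caller. Background threads are used so that a deadlocked pair cannot keep the test process alive.

```csharp
using System;
using System.Threading;

public static class DeadlockCheck
{
    // Runs both actions on background threads; false means at least one of them
    // did not finish within the timeout, which the test treats as a deadlock.
    public static bool BothFinishWithin(Action first, Action second, TimeSpan timeout)
    {
        var thread1 = new Thread(() => first()) { IsBackground = true };
        var thread2 = new Thread(() => second()) { IsBackground = true };
        DateTime deadline = DateTime.UtcNow + timeout;
        thread1.Start();
        thread2.Start();

        bool firstDone = thread1.Join(timeout);
        // The second thread only gets whatever is left of the overall timeout.
        TimeSpan remaining = deadline - DateTime.UtcNow;
        bool secondDone = thread2.Join(remaining > TimeSpan.Zero ? remaining : TimeSpan.Zero);
        return firstDone && secondDone;
    }

    // Hypothetical operations that take the same two locks in the same order: no deadlock.
    public static bool SameLockOrderFinishes(TimeSpan timeout)
    {
        object lockA = new object(), lockB = new object();
        return BothFinishWithin(
            () => { lock (lockA) { lock (lockB) { } } },
            () => { lock (lockA) { lock (lockB) { } } },
            timeout);
    }

    // Opposite lock order. The Barrier makes each thread hold its first lock before
    // either asks for the second, so this deadlocks on every run.
    public static bool OppositeLockOrderFinishes(TimeSpan timeout)
    {
        object lockA = new object(), lockB = new object();
        var bothHoldOneLock = new Barrier(2);
        return BothFinishWithin(
            () => { lock (lockA) { bothHoldOneLock.SignalAndWait(); lock (lockB) { } } },
            () => { lock (lockB) { bothHoldOneLock.SignalAndWait(); lock (lockA) { } } },
            timeout);
    }
}
```

A timeout only shows that the threads did not finish in time; a slow but correct operation would fail the same way, so the timeout should be generous compared to the normal run time.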

Proactively unit testing that code is thread safe is a little trickier. Not because the unit test is more difficult to write, but because you have to do solid analysis to determine (guess, really) what might be thread unsafe. If your analysis is correct, you should be able to write a test that fails until you make the code thread safe.

When testing for a race condition, my tests almost always follow the same pattern (this is pseudocode):

bool failed = false;
int iterations = 100;

// both threads interact with the same shared object
ThreadStart worker = () => {
   for (int i = 0; i < iterations; i++) {
     doSomething(); // call the suspected thread-unsafe code
     // check that the object was not left out of sync by the other thread
     if (bad()) {
       failed = true;
     }
   }
};
Thread thread1 = new Thread(worker);
Thread thread2 = new Thread(worker);

thread1.Start();
thread2.Start();
thread1.Join();
thread2.Join();
// the message is what gets reported when the assertion fails
Assert.IsFalse(failed, "race condition detected: code is not thread safe");
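
To make the pattern concrete, here is a runnable sketch with a hypothetical class under test, `MirroredValue`, whose two fields must always hold the same value: `doSomething()` becomes `Set(i)` and `bad()` becomes `IsTorn()`. The `useLock` switch and the `RaceHarness.DetectsRace` helper are illustrative assumptions, not part of the original.

```csharp
using System;
using System.Threading;

// Hypothetical class under test: both fields must always hold the same value.
public class MirroredValue
{
    private readonly object _sync = new object();
    private readonly bool _useLock;
    private int _first, _second;

    public MirroredValue(bool useLock) => _useLock = useLock;

    public void Set(int value)
    {
        if (!_useLock) { _first = value; _second = value; return; }
        lock (_sync) { _first = value; _second = value; }
    }

    // True if the fields disagree, i.e. another thread's Set was seen half-done.
    public bool IsTorn()
    {
        if (!_useLock) return _first != _second;
        lock (_sync) { return _first != _second; }
    }
}

public static class RaceHarness
{
    // The pattern above: two threads each call doSomething, then bad, in a loop.
    // Returns true if either thread ever observed an inconsistent state.
    public static bool DetectsRace(Action<int> doSomething, Func<bool> bad, int iterations)
    {
        int failed = 0; // set with Interlocked so both threads can write it safely
        ThreadStart worker = () =>
        {
            for (int i = 0; i < iterations; i++)
            {
                doSomething(i);
                if (bad()) Interlocked.Exchange(ref failed, 1);
            }
        };
        var thread1 = new Thread(worker);
        var thread2 = new Thread(worker);
        thread1.Start();
        thread2.Start();
        thread1.Join();
        thread2.Join();
        return Volatile.Read(ref failed) == 1;
    }
}
```

With `useLock: false` the harness may report a race on some runs and not on others; with `useLock: true` it never can, which is what the test should assert once the fix is in.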

Proving that something is thread safe is tricky - probably halting-problem hard. You can show that a race condition is easy to produce, or that it is hard to produce. But not producing a race condition doesn't mean it isn't there.

But: my usual approach here (if I have reason to think that a bit of code that should be thread-safe isn't) is to spin up a lot of threads waiting behind a single ManualResetEvent. The last thread to reach the gate (counting arrivals with Interlocked) is responsible for opening it, so that all the threads, which already exist, hit the system at the same time. Then they do the work and check for sane exit conditions. I repeat this process a large number of times. This is usually sufficient to reproduce a suspected thread race, and to show that the code moves from "obviously broken" to "not broken in an obvious way" (which is crucially different from "not broken").
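
A minimal C# sketch of that gate, assuming the caller supplies the `work` to run concurrently and a `check` for the sane exit condition; `GatedRace`, `RunRound` and `RunRounds` are hypothetical names, and the thread and round counts are left to the caller.

```csharp
using System;
using System.Threading;

public static class GatedRace
{
    // One round: threadCount threads park behind a ManualResetEvent, the last one
    // to arrive (counted with Interlocked) opens it, and all then call work() at
    // roughly the same moment. Returns true if every thread's check() passed.
    public static bool RunRound(int threadCount, Action work, Func<bool> check)
    {
        using var gate = new ManualResetEvent(false);
        int arrived = 0;
        int failures = 0;
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                if (Interlocked.Increment(ref arrived) == threadCount)
                    gate.Set();     // last one in opens the gate for everyone
                else
                    gate.WaitOne(); // everyone else waits at the gate
                try
                {
                    work();
                    if (!check()) Interlocked.Increment(ref failures);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref failures); // an exception is a failure too
                }
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return failures == 0;
    }

    // Repeats the round many times; a single failing round means the race reproduced.
    public static bool RunRounds(int rounds, int threadCount, Action work, Func<bool> check)
    {
        for (int r = 0; r < rounds; r++)
            if (!RunRound(threadCount, work, check)) return false;
        return true;
    }
}
```

Exceptions thrown by `work` are counted as failures rather than left to crash the test runner, since an exception such as an invalid collection state is a common symptom of a race.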

Also note: most code does not have to be thread-safe.


I had a similar problem where we found thread safety bugs. To fix them, we first had to prove they existed. That quest brought me to this page, but I could not find a definitive answer, and many of the answers above explain why. Nevertheless, I found an approach that might help others:

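// needs: using System; using System.Threading; using System.Threading.Tasks;
public static async Task<(bool IsSuccess, Exception Error)> RunTaskInParallel(Func<Task> task, int numberOfParallelExecutions = 2)
{
    Exception error = null;
    int tasksCompletedCount = 0;

    // Parallel.For only starts the async lambdas (they run as async void delegates)
    // and returns without awaiting them, so completion is tracked by hand below.
    Parallel.For(0, numberOfParallelExecutions,
                 async index =>
                 {
                     try
                     {
                         await task();
                     }
                     catch (Exception ex)
                     {
                         // record the first error; a plain assignment would itself be a race
                         Interlocked.CompareExchange(ref error, ex, null);
                     }
                     finally
                     {
                         // tasksCompletedCount++ is not atomic and can lose increments
                         Interlocked.Increment(ref tasksCompletedCount);
                     }
                 });

    // Poll every 100 ms until all tasks finish, one fails, or about 10 s have passed.
    // Note: hitting the limit still reports success if no error was recorded.
    int spinWaitCount = 0;
    int maxSpinWaitCount = 100;
    while (Volatile.Read(ref tasksCompletedCount) < numberOfParallelExecutions
           && Volatile.Read(ref error) is null
           && spinWaitCount < maxSpinWaitCount)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(100));
        spinWaitCount++;
    }

    return (error == null, error);
}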

This is neither the cleanest code nor our final version, but the logic is the same. It reproduced our thread safety bug every time.

Here is how we used it:

int numberOfParallelExecutions = 2;
var (isSuccess, error) = await RunTaskInParallel(() => doSomeThingAsync(), numberOfParallelExecutions);
Assert.IsTrue(isSuccess, error?.ToString());
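
If the polling loop is not needed, a similar helper can be written by swapping `Parallel.For` for `Task.Run` plus `Task.WhenAll`, which waits for every task without a shared completion counter. This is a sketch of that alternative, not the code from this answer; `ParallelRunner`, `RunTaskInParallelWhenAll` and its `timeout` parameter are hypothetical, and unlike the polling loop it reports a timeout as a failure.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelRunner
{
    public static async Task<(bool IsSuccess, Exception Error)> RunTaskInParallelWhenAll(
        Func<Task> task, int numberOfParallelExecutions = 2, TimeSpan? timeout = null)
    {
        // Task.Run queues each copy on the thread pool so they overlap in time.
        Task[] running = Enumerable.Range(0, numberOfParallelExecutions)
                                   .Select(_ => Task.Run(task))
                                   .ToArray();
        Task all = Task.WhenAll(running);

        // Same 10 s default budget as the polling loop, but a timeout counts as a failure.
        Task finished = await Task.WhenAny(all, Task.Delay(timeout ?? TimeSpan.FromSeconds(10)));
        if (finished != all)
            return (false, new TimeoutException("parallel executions did not finish in time"));

        try
        {
            await all;
            return (true, null);
        }
        catch (Exception ex)
        {
            // await rethrows only the first failure; the others stay on the individual tasks
            return (false, ex);
        }
    }
}
```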

Hope this helps someone.