Why must one call join() or detach() before thread destruction?

You might want a thread to clean up after itself completely when it's done, leaving no traces. This would mean that you could start a thread and then forget about it.

But you might also want to be able to manage a thread while it is running and get any return value it provides when it is done. In this case, if the thread cleaned up after itself when it was done, your attempt to manage it could cause a crash because you would be accessing a handle that might be invalid. And to check the return value after the thread finishes, that value has to be stored somewhere, which means the thread can't be fully cleaned up: the place where the return value is stored has to be kept around.

In most frameworks, by default, you get the second option. You can manage the thread (by interrupting it, sending signals to it, joining it, or whatever) but it can't clean up after itself. If you prefer the first option, there's a function to get that behavior (detach) but that means that you may not be able to access the thread because it may or may not continue to exist.
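The two options can be sketched with std::thread. This is a minimal illustration, not a recommended pattern; the function names run_and_join and fire_and_forget and the shared counter are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Managed option: keep the handle, wait for the thread, then read its result.
int run_and_join() {
    int result = 0;                      // the "return value" has to live somewhere
    std::thread worker([&result] { result = 42; });
    worker.join();                       // blocks until the thread ends, then releases it
    assert(!worker.joinable());          // the handle no longer refers to a thread
    return result;                       // safe to read: join() synchronizes with completion
}

// Fire-and-forget option: detach, after which there is no handle left to manage.
// The shared state is owned by a shared_ptr so it outlives whichever side finishes first.
void fire_and_forget(std::shared_ptr<std::atomic<int>> counter) {
    std::thread([counter] { counter->fetch_add(1); }).detach();
}
```

After fire_and_forget returns, the caller has no way to wait for that thread through a std::thread object; it can only observe side effects such as the counter.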

Technically, the answer is "because the spec says so", but that is an unhelpful answer. We can't read the designers' minds, but here are some issues that may have contributed:

With POSIX pthreads, a joinable thread that has exited must be joined via pthread_join() (or must have been detached via pthread_detach()), or else it continues to occupy system resources, much as a zombie process keeps its process table entry in the kernel. Windows has a somewhat analogous issue if the process holds a HANDLE to the child thread; although Windows doesn't require a full join, the process must still call CloseHandle() to release its reference to the thread.
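A rough POSIX-only sketch of the two ways to release a pthread follows. The helper names worker, run_joined, and run_detached are hypothetical, and passing a small integer through the void* result is just a convenience for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>

// Thread entry point: doubles the integer passed through the void* argument.
static void* worker(void* arg) {
    std::intptr_t input = reinterpret_cast<std::intptr_t>(arg);
    return reinterpret_cast<void*>(input * 2);
}

// Joinable thread: pthread_join() waits for exit, fetches the result,
// and lets the system release the thread's remaining resources.
long run_joined(long input) {
    pthread_t tid;
    void* arg = reinterpret_cast<void*>(static_cast<std::intptr_t>(input));
    int rc = pthread_create(&tid, nullptr, worker, arg);
    assert(rc == 0);
    void* ret = nullptr;
    rc = pthread_join(tid, &ret);
    assert(rc == 0);
    (void)rc;                            // silence unused warnings when NDEBUG is set
    return static_cast<long>(reinterpret_cast<std::intptr_t>(ret));
}

// Detached thread: resources are released automatically on exit,
// but the result can no longer be collected.
int run_detached() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, worker, nullptr);
    if (rc != 0) return rc;
    return pthread_detach(tid);          // 0 on success
}
```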

Since std::thread is a cross-platform abstraction, it inherits the POSIX requirement that a thread be either joined or detached.

In theory the std::thread destructor could have called pthread_join() instead of calling std::terminate() (which is what it does when the thread is still joinable), but that may (subjectively) increase the risk of deadlock, whereas a properly written program would know when to insert the join at a safe time.
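For programs that do want join-on-destruction, a small RAII wrapper is one way to opt in explicitly. The class below is a hypothetical sketch (JoiningThread and scoped_work are invented names); C++20's std::jthread provides similar behavior in the standard library.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical wrapper: owns a std::thread and joins it on destruction,
// so leaving the scope never destroys a joinable std::thread.
class JoiningThread {
public:
    explicit JoiningThread(std::thread t) : t_(std::move(t)) {}
    ~JoiningThread() {
        if (t_.joinable()) t_.join();    // may block here; that is the trade-off
    }
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread t_;
};

int scoped_work() {
    int value = 0;
    {
        JoiningThread worker{std::thread{[&value] { value = 7; }}};
    }                                    // destructor joins before 'value' is read
    return value;
}
```

The blocking join in the destructor is exactly the behavior the standard chose not to impose by default: if the worker never finishes, leaving the scope hangs.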

See also:

https://en.wikipedia.org/wiki/Zombie_process
https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa
https://docs.microsoft.com/en-us/windows/win32/procthread/terminating-a-process

You're getting confused because you're conflating the std::thread object with the thread of execution it refers to. A std::thread object is a C++ object (a bunch of bytes in memory) that acts as a reference to a thread of execution. When you call std::thread::detach what happens is that the std::thread object is "detached" from the thread of execution -- it no longer refers to (any) thread of execution, and the thread of execution continues running independently. But the std::thread object still exists, until it is destroyed.

When a thread of execution completes, the std::thread object that refers to it, if there is one, stays joinable so that join() can still wait on it and release its resources. (If it was detached, then there isn't one, so the exit info is just thrown away.) Completion has no other effect on the std::thread object: in particular, the std::thread object is not destroyed and continues to exist until someone else destroys it.
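The distinction between the object and the thread of execution can be observed directly. The following is a small sketch; HandleStates and observe_handle are names invented for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct HandleStates {
    bool joinable_after_finish;
    bool joinable_after_join;
    bool joinable_after_detach;
};

HandleStates observe_handle() {
    HandleStates s{};

    // Case 1: the thread function has finished, but the object still refers to it.
    std::atomic<bool> finished{false};
    std::thread joined([&finished] { finished.store(true); });
    while (!finished.load()) std::this_thread::yield();
    s.joinable_after_finish = joined.joinable();
    joined.join();
    s.joinable_after_join = joined.joinable();
    assert(joined.get_id() == std::thread::id{});   // object exists, refers to nothing

    // Case 2: detach() empties the object; the thread of execution runs on its own.
    std::thread detached([] {});
    detached.detach();
    s.joinable_after_detach = detached.joinable();

    return s;   // both std::thread objects are destroyed here, safely, since neither is joinable
}
```

In this sketch the object stays joinable after its function has finished, and stops being joinable only once join() or detach() has been called on it.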
