Performance of pthread_mutex_lock/unlock
Your locks are probably too fine-grained. Of course, the optimal granularity may vary depending on workload.
You could use a single lock for the whole tree, and it may perform better. But, if you do lots of reading and relatively few insertions/deletions, you end up with the whole tree locked often for no good reason. You may want to use a reader-writer lock, which would allow several readers at the same time.
Your question reminded me of another one comparing fine-grained and coarse-grained locking for a linked list. In the coarse-grained version each thread ran in turn (not in parallel), and the total running time was slightly more than the sum of the individual threads' running times. In the fine-grained version the total running time was much less than the sum of the individual running times, but the added overhead of fine-grained locking more than offset that benefit, making the fine-grained version slower than the coarse-grained one.
As far as I can see, your locking strategy is not optimal: most of the locks are taken not to change the data, but just to read it and find the way through the tree.
pthread_rwlock_t could help here. You'd take only read-locks on the way down the tree until you hit a node you want to modify; there you'd take a write-lock. That way, other threads walking down different branches could perform the same task without disturbing each other.
A decent implementation of pthread_rwlock_t would do this with a reader counter that it updates with atomic operations, as long as there is no contention with writers. That should be very fast. Once there is contention, it would be about as costly as a mutex, I think.
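That reader-counter fast path can be sketched with C11 atomics. This is a toy model, not a real pthread_rwlock_t implementation: blocking, waiting, and fairness are all omitted, and the type and function names are mine.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model: low bits count readers, one high bit marks a writer. */
#define WRITER_BIT (1u << 31)

typedef struct { atomic_uint state; } toy_rwlock;

/* Reader fast path: a single atomic increment when no writer holds
 * the lock; undo and fail (instead of blocking) if one does. */
bool toy_try_rdlock(toy_rwlock *l)
{
    unsigned s = atomic_fetch_add(&l->state, 1);
    if (s & WRITER_BIT) {
        atomic_fetch_sub(&l->state, 1); /* writer active: back out */
        return false;
    }
    return true;
}

void toy_rdunlock(toy_rwlock *l) { atomic_fetch_sub(&l->state, 1); }

/* Writer: succeeds only when there are no readers and no writer. */
bool toy_try_wrlock(toy_rwlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, WRITER_BIT);
}

void toy_wrunlock(toy_rwlock *l) { atomic_fetch_and(&l->state, ~WRITER_BIT); }
```

The point of the sketch: in the uncontended read case the whole "lock" is one atomic add, which is why concurrent readers are cheap.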
pthread_mutex_lock and pthread_mutex_unlock vary in cost depending on contention:
- Single thread use - either only one thread exists, or only one thread is using the mutex and the resource it protects: locking is virtually free, perhaps 80-100 cycles at most.
- Multiple threads using the resource, but locks are held for very short intervals and contention is rare: locking has some cost, and it's hard to measure; the cost consists mostly of invalidating other cores'/cpus' cache lines.
- Significant lock contention: nearly every lock and unlock operation will require assistance from the kernel, and the cost is easily several thousand (possibly even tens of thousands of) cycles per lock/unlock.
Still, mutexes should be the least expensive locking primitive in most situations and on most implementations. Occasionally spinlocks may perform better. I would never expect semaphores to perform better.
Instead of worrying about the blades of grass, step back and observe the whole forest.
Any algorithm which depends on two threads potentially closely stepping on each other's toes is inherently inefficient. Try to find a way to drastically reduce the need for interaction.
For example, if one thread produces data and the other consumes it, one can easily think up an inefficient algorithm where the producer publishes the data in shared memory and then waits for the consumer to take it, while the consumer waits for the producer to finish, and so on. All of this is much simplified by having the producer write into a file or pipe and the consumer read from it.
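A minimal sketch of the pipe version, assuming a POSIX system (the function names and item strings are mine). The pipe's internal buffer decouples the two sides: neither thread touches a lock, and the producer only blocks if the pipe fills up.

```c
#include <pthread.h>
#include <unistd.h>
#include <string.h>

/* Producer thread: write items into the pipe's write end, then
 * close it so the consumer sees EOF. */
static void *producer(void *arg)
{
    int fd = *(int *)arg;
    const char *items[] = { "item1\n", "item2\n", "item3\n" };
    for (int i = 0; i < 3; i++)
        write(fd, items[i], strlen(items[i]));
    close(fd);
    return NULL;
}

/* Consumer: drain the read end until EOF; returns bytes read. */
ssize_t consume_all(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(fd, buf + total, cap - total)) > 0)
        total += (size_t)n;
    return (ssize_t)total;
}
```

The kernel does the buffering and wakeups that the shared-memory version would have to reimplement with mutexes and condition variables.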