How to profile OpenMP bottlenecks

Scalasca is a nice tool for profiling OpenMP (and MPI) codes and analyzing the results. TAU is also very nice but much harder to use. The Intel tools, such as VTune, are also good but very expensive.


Arm MAP profiles OpenMP and pthreads code, and it works without needing to instrument or modify your source code. You can see synchronization issues and where threads are spending time, down to the source-line level. The OpenMP profiling blog entry is worth reading.

MAP is widely used in high-performance computing, as it also profiles multiprocess applications such as MPI programs.


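OpenMP includes the functions omp_get_wtime() and omp_get_wtick() for measuring timing performance (docs here). omp_get_wtime() returns the elapsed wall-clock time in seconds, and omp_get_wtick() returns the resolution of that timer. I would recommend using these.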

Otherwise, try a profiler. I prefer the Google CPU profiler, which can be found here.

There is also the manual way described in this answer.