Is it possible to make thread join to 'parallel for' region after its job?
What about something like this?
#pragma omp parallel
{
// note the nowait here so that other threads jump directly to the for loop
#pragma omp single nowait
{
job2();
}
#pragma omp for schedule(dynamic, 32)
for (int i = 0 ; i < 10000000; ++i) {
job1();
}
}
I did not test this but the single will be executed by only one threads while all others will jump directly to the for loop thanks to the nowait. Also I think it is easier to read than with sections.
Another way (and potentially the better way) to express this would be to use OpenMP tasks:
#pragma omp parallel master
{
#pragma omp task // job(2)
{ // 'printf' is not real job. It is just used for simplicity.
printf("i'm single: %d\n", omp_get_thread_num());
}
#pragma omp taskloop // job(1)
for (int i = 0 ; i < 10000000; ++i) {
// 'printf' is not real job. It is just used for simplicity.
printf("%d\n", omp_get_thread_num());
}
}
If you have a compiler that does not understand OpenMP version 5.0, then you have to split the parallel
and master
:
#pragma omp parallel
#pragma omp master
{
#pragma omp task // job(2)
{ // 'printf' is not real job. It is just used for simplicity.
printf("i'm single: %d\n", omp_get_thread_num());
}
#pragma omp taskloop ]
for (int i = 0 ; i < 10000000; ++i) {
// 'printf' is not real job. It is just used for simplicity.
printf("%d\n", omp_get_thread_num());
}
}
The problem comes from synchronization. At the end of the section
, omp waits for the termination of all threads and cannot release the thread on job 2 until its completion has been checked.
The solution requires to suppress the synchronization with a nowait
.
I did not succeed to suppress synchronization with sections
and nested parallelism. I rarely use nested parallel regions, but I think that, while sections can be nowaited, there is a problem when spawning the new nested parallel region inside a section. There is a mandatory synchronization at the end of a parallel section that cannot be suppressed and it probably prevents new threads to join the pool.
What I did is to use a single
thread, without synchronization. This way, omp start the single
thread and does not wait for its completion to start the parallel for
. When the thread finishes its single
work, it joins the thread pool to finish processing the for
.
#include <omp.h>
#include <stdio.h>
int main() {
int singlethreadid=-1;
// omp_set_nested(1);
#pragma omp parallel
{
#pragma omp single nowait // job(2)
{ // 'printf' is not real job. It is just used for simplicity.
printf("i'm single: %d\n", omp_get_thread_num());
singlethreadid=omp_get_thread_num();
}
#pragma omp for schedule(dynamic, 32)
for (int i = 0 ; i < 100000; ++i) {
// 'printf' is not real job. It is just used for simplicity.
printf("%d\n", omp_get_thread_num());
if (omp_get_thread_num() == singlethreadid)
printf("Hello, I\'m back\n");
}
}
}