How can I run 4 threads each on a different core (parallelism)?
There is no standard way to set the affinity of a given thread. Under the hood, std::thread is implemented with POSIX threads on Linux and other Unix systems, and with Windows threads on Windows. The solution is to use the native APIs. For example, on Windows the following code fully utilizes all 8 cores of my i7 CPU:
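// Requires <windows.h>, <cassert>, <thread>, <vector>; assumes a toolchain (e.g. MSVC) where native_handle() is a Win32 HANDLE.
auto fn = []() { while (true); };  // busy loop that keeps one core at 100%
std::vector<std::thread> at;
const int num_of_cores = 8;
for (int n = 0; n < num_of_cores; n++) {
    at.emplace_back(fn);
    // for POSIX: use pthread_setaffinity_np
    // Bit n of the mask selects logical processor n; the call returns the previous mask, or 0 on failure.
    DWORD_PTR prev = SetThreadAffinityMask(at.back().native_handle(), DWORD_PTR(1) << n);
    assert(prev != 0);
}
for (auto& t : at) t.join();  // never returns, because the workers loop forever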
However, after commenting out the SetThreadAffinityMask call I still get the same result: all the cores are fully utilized, so the Windows scheduler does a good job on its own.
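For the POSIX side mentioned in the code comment, here is a minimal sketch of the same idea on Linux with glibc. It assumes that native_handle() returns a pthread_t (true for libstdc++ and libc++ on Linux), and the helper names pin_to_cpu, allowed_cpus and pin_and_verify are made up for illustration. Note that pthread_setaffinity_np is a non-portable extension, so this will not build on every Unix.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for CPU_COUNT and pthread_setaffinity_np (g++ defines it by default)
#endif
#include <pthread.h>
#include <sched.h>

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Restrict thread `t` to the single logical CPU `cpu`. Returns true on success.
bool pin_to_cpu(std::thread& t, int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
}

// Logical CPUs the calling thread is currently allowed to run on.
// In a container or under taskset this can be a subset of the machine's CPUs.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return cpus;
    for (int c = 0; c < CPU_SETSIZE; ++c) {
        if (CPU_ISSET(c, &set)) cpus.push_back(c);
    }
    return cpus;
}

// Start a worker, pin it to `cpu`, and read the mask back to confirm
// that it now contains exactly that one CPU.
bool pin_and_verify(int cpu) {
    std::atomic<bool> stop{false};
    // Keep the worker alive while its affinity is changed and inspected.
    std::thread worker([&stop] {
        while (!stop.load()) std::this_thread::yield();
    });

    bool ok = pin_to_cpu(worker, cpu);
    cpu_set_t now;
    CPU_ZERO(&now);
    ok = ok &&
         pthread_getaffinity_np(worker.native_handle(), sizeof(now), &now) == 0 &&
         CPU_COUNT(&now) == 1 && CPU_ISSET(cpu, &now);

    stop = true;
    worker.join();
    return ok;
}
```

To mirror the Windows loop, one would create one thread per entry of allowed_cpus() and call pin_to_cpu on each. Picking CPUs from allowed_cpus() rather than counting up from 0 avoids failures when the process is itself restricted to a subset of cores.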
If you want finer control over the system's cores, look into libraries like OpenMP, TBB (Threading Building Blocks), and PPL, in that order.
You're done, no need to schedule anything. As long as there are multiple processors available, your threads will run simultaneously on available cores.
If there are fewer than 4 processors available, say 2, your threads will run in an interleaved manner, with up to 2 running at any given time.
P.S. It's also easy to see this for yourself: just write 4 infinite loops and run them in 4 different threads. You will see 4 CPUs being used.
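A minimal sketch of that experiment, using only the standard library. It uses a bounded busy loop instead of an infinite one so that the program terminates. The function name spin_threads and its parameters are made up for illustration.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Run `num_threads` busy-looping workers for `duration` each and return
// how many loop iterations every worker completed.
std::vector<std::uint64_t> spin_threads(unsigned num_threads,
                                        std::chrono::milliseconds duration) {
    std::vector<std::uint64_t> iterations(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned i = 0; i < num_threads; ++i) {
        workers.emplace_back([&iterations, i, duration] {
            const auto deadline = std::chrono::steady_clock::now() + duration;
            std::uint64_t count = 0;
            do {
                ++count;  // pure CPU work, no sleeping or blocking
            } while (std::chrono::steady_clock::now() < deadline);
            iterations[i] = count;  // each worker writes only its own slot
        });
    }

    for (auto& t : workers) t.join();
    return iterations;
}
```

Calling spin_threads(4, std::chrono::seconds(10)) while watching a system monitor should show up to four cores busy, with no affinity calls involved. If fewer cores are available, the per-thread iteration counts will typically drop because the workers share cores. std::thread::hardware_concurrency() gives a hint of how many threads can run at once, though it is allowed to return 0 when the value is unknown.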
DISCLAIMER: Of course, "under the hood", scheduling is being done for you by the OS. So you depend on the quality of the scheduler built into the OS for concurrency. The fairness of the scheduler built into the OS on which a C++ application runs is outside the C++ standard, and so is not guaranteed. In reality though, especially when learning to write concurrent applications, most modern OSes will provide adequate fairness in the scheduling of threads.