What is the displs argument in MPI_Scatterv?
Yes, the displacements give the root the information as to which items to send to a particular task - the offset, in elements, of that task's starting item in the send buffer. So in the most common simple case (e.g., you'd use MPI_Scatter, but the counts don't evenly divide the data across tasks), the displacements can be calculated immediately from the counts:
displs[0] = 0;   // offsets into the global array
for (int i=1; i<comsize; i++)
    displs[i] = displs[i-1] + counts[i-1];
But it doesn't need to be that way; the only restriction is that the regions you send can't overlap - no location in the send buffer may be read more than once. You could count from the back just as well:
displs[0] = globalsize - counts[0];
for (int i=1; i<comsize; i++)
    displs[i] = displs[i-1] - counts[i];
or any arbitrary order would work as well.
And in general the calculations can be more complicated, because the types of the send and receive buffers have to be consistent (the same type signature) but not necessarily the same - you often run into this when scattering slices of multidimensional arrays, for instance.
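As a minimal sketch of that situation (the matrix shape, the variable names, and the use of a resized MPI_Type_vector column type are my own assumptions for illustration, not part of the question): the root scatters one column of a row-major nrows-by-comsize matrix to each rank, so the send type is a strided column type while the receive type is plain MPI_DOUBLE, and the displacements count in units of the resized type's extent (one double), so displs[i] = i means "start at column i".

#include <iostream>
#include <vector>
#include "mpi.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int comrank, comsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &comrank);
    MPI_Comm_size(MPI_COMM_WORLD, &comsize);

    const int root = 0, nrows = 3;
    std::vector<double> matrix;                  // root only: nrows x comsize, row-major
    if (comrank == root)
        for (int i=0; i<nrows*comsize; i++)
            matrix.push_back(i);

    // one column: nrows doubles, comsize doubles apart in memory
    MPI_Datatype coltype, col1;
    MPI_Type_vector(nrows, 1, comsize, MPI_DOUBLE, &coltype);
    // shrink the extent to one double so that displs[i] = i points at column i
    MPI_Type_create_resized(coltype, 0, sizeof(double), &col1);
    MPI_Type_commit(&col1);

    std::vector<int> counts(comsize, 1), displs(comsize);   // one column per rank
    for (int i=0; i<comsize; i++)
        displs[i] = i;

    std::vector<double> mycol(nrows);
    MPI_Scatterv(matrix.data(), counts.data(), displs.data(), col1,   // send: one strided column per rank
                 mycol.data(), nrows, MPI_DOUBLE,                     // receive: nrows contiguous doubles
                 root, MPI_COMM_WORLD);

    std::cout << comrank << ":";
    for (double x : mycol)
        std::cout << " " << x;
    std::cout << std::endl;

    MPI_Type_free(&col1);
    MPI_Type_free(&coltype);
    MPI_Finalize();
}

Each rank sends one col1 (nrows doubles) and receives nrows MPI_DOUBLEs, so the type signatures match even though the types differ; resizing the column type is what lets the displacements be simple column indices.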
Returning to the simple case, the program below does both the forward and the backward orderings:
#include <iostream>
#include <vector>
#include "mpi.h"
int main(int argc, char **argv) {
    const int root = 0;           // the processor with the initial global data

    size_t globalsize = 0;        // only meaningful on root
    std::vector<char> global;     // only root has this

    const size_t localsize = 2;   // most ranks will have 2 items; one will have localsize+1
    char local[localsize+2];      // everyone has this
    int mynum;                    // how many items I have

    MPI_Init(&argc, &argv);
    int comrank, comsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &comrank);
    MPI_Comm_size(MPI_COMM_WORLD, &comsize);

    // initialize global vector
    if (comrank == root) {
        globalsize = comsize*localsize + 1;
        for (size_t i=0; i<globalsize; i++)
            global.push_back('a'+i);
    }

    // initialize local
    for (size_t i=0; i<localsize+1; i++)
        local[i] = '-';
    local[localsize+1] = '\0';

    std::vector<int> counts(comsize);   // how many pieces of data everyone has
    for (int i=0; i<comsize; i++)
        counts[i] = localsize;
    counts[comsize-1]++;                // the last rank takes the leftover item
    mynum = counts[comrank];

    std::vector<int> displs(comsize);   // only significant at root

    if (comrank == 0)
        std::cout << "In forward order" << std::endl;

    displs[0] = 0;                      // offsets into the global array
    for (int i=1; i<comsize; i++)
        displs[i] = displs[i-1] + counts[i-1];

    MPI_Scatterv(global.data(), counts.data(), displs.data(), MPI_CHAR, // root: proc i gets counts[i] MPI_CHARs starting at displs[i]
                 local, mynum, MPI_CHAR,                                // everyone: I receive mynum MPI_CHARs into local
                 root, MPI_COMM_WORLD);                                 // the root of the scatter is (root, MPI_COMM_WORLD)

    local[mynum] = '\0';
    std::cout << comrank << " " << local << std::endl;
    std::cout.flush();

    if (comrank == 0)
        std::cout << "In reverse order" << std::endl;

    displs[0] = globalsize - counts[0];
    for (int i=1; i<comsize; i++)
        displs[i] = displs[i-1] - counts[i];

    MPI_Scatterv(global.data(), counts.data(), displs.data(), MPI_CHAR, // root: proc i gets counts[i] MPI_CHARs starting at displs[i]
                 local, mynum, MPI_CHAR,                                // everyone: I receive mynum MPI_CHARs into local
                 root, MPI_COMM_WORLD);                                 // the root of the scatter is (root, MPI_COMM_WORLD)

    local[mynum] = '\0';
    std::cout << comrank << " " << local << std::endl;

    MPI_Finalize();
}
Running with 4 processes gives:
In forward order
0 ab
1 cd
2 ef
3 ghi
In reverse order
0 hi
1 fg
2 de
3 abc
Yes, your reasoning is correct - for contiguous data. The point of the displacements parameter in MPI_Scatterv is to also allow strided data, meaning that there are unused gaps of memory in the sendbuf between the chunks.
Here is an example for contiguous data. The official documentation actually contains good examples for strided data.
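A minimal sketch of the contiguous case follows (the block size, the int data, and the variable names are assumptions for illustration). As the comment on the displacement loop notes, using a stride larger than the block size (with a correspondingly larger sendbuf) would leave exactly the unused gaps described above:

#include <vector>
#include "mpi.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int comrank, comsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &comrank);
    MPI_Comm_size(MPI_COMM_WORLD, &comsize);

    const int root = 0, blocksize = 4;
    std::vector<int> sendbuf;                    // root only
    if (comrank == root)
        for (int i=0; i<comsize*blocksize; i++)
            sendbuf.push_back(i);

    std::vector<int> counts(comsize, blocksize), displs(comsize);
    for (int i=0; i<comsize; i++)
        displs[i] = i*blocksize;   // contiguous; i*stride with stride > blocksize would leave gaps

    std::vector<int> recvbuf(blocksize);
    MPI_Scatterv(sendbuf.data(), counts.data(), displs.data(), MPI_INT,
                 recvbuf.data(), blocksize, MPI_INT,
                 root, MPI_COMM_WORLD);

    MPI_Finalize();
}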