How does CUDA assign device IDs to GPUs?
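By default, CUDA uses a simple heuristic to guess which device is fastest and makes that one device 0; the order of the remaining devices is unspecified. So when you swap GPUs in and out, the ordering might change completely. It might be better to pick GPUs based on their PCI bus ID using: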
cudaError_t cudaDeviceGetByPCIBusId ( int* device, const char* pciBusId )
Returns a handle to a compute device.
cudaError_t cudaDeviceGetPCIBusId ( char* pciBusId, int len, int device )
Returns a PCI Bus Id string for the device.
or the CUDA Driver API equivalents, cuDeviceGetByPCIBusId and cuDeviceGetPCIBusId.
But IMO the most reliable way to know which device is which would be to use NVML or nvidia-smi to get each device's unique identifier (UUID) using nvmlDeviceGetUUID, and then match it to the CUDA device by PCI bus ID using nvmlDeviceGetPciInfo.
Set the environment variable CUDA_DEVICE_ORDER as:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
Then the GPU IDs will be ordered by PCI bus ID.