How can I get the number of cores in a CUDA device?
The number of cores per multiprocessor is the only "missing" piece of data. It is not provided directly in the cudaDeviceProp structure, but it can be inferred by combining NVIDIA's published per-architecture data with the devProp.major and devProp.minor entries, which together make up the CUDA compute capability of the device. The number of multiprocessors itself is available directly as devProp.multiProcessorCount.
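The total is therefore just a product of two numbers. As a worked example, take a hypothetical device of compute capability 8.6 (128 FP32 cores per SM) that reports multiProcessorCount = 82:

```latex
\text{cores} = \text{multiProcessorCount} \times \text{coresPerSM}(\text{major}, \text{minor})
\qquad\Rightarrow\qquad
82 \times 128 = 10496
```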
Something like this should work:
#include <stdio.h>            // for printf
#include "cuda_runtime_api.h"

// You must first call cudaGetDeviceProperties(), then pass
// the devProp structure it fills in to this function:
int getSPcores(cudaDeviceProp devProp)
{
    int cores = 0;
    int mp = devProp.multiProcessorCount;
    switch (devProp.major){
     case 2: // Fermi
      if (devProp.minor == 1) cores = mp * 48;
      else cores = mp * 32;
      break;
     case 3: // Kepler
      cores = mp * 192;
      break;
     case 5: // Maxwell
      cores = mp * 128;
      break;
     case 6: // Pascal
      if ((devProp.minor == 1) || (devProp.minor == 2)) cores = mp * 128;
      else if (devProp.minor == 0) cores = mp * 64;
      else printf("Unknown device type\n");
      break;
     case 7: // Volta (7.0, 7.2) and Turing (7.5)
      if ((devProp.minor == 0) || (devProp.minor == 2) || (devProp.minor == 5)) cores = mp * 64;
      else printf("Unknown device type\n");
      break;
     case 8: // Ampere (8.0, 8.6, 8.7) and Ada Lovelace (8.9)
      if (devProp.minor == 0) cores = mp * 64;
      else if ((devProp.minor == 6) || (devProp.minor == 7) || (devProp.minor == 9)) cores = mp * 128;
      else printf("Unknown device type\n");
      break;
     case 9: // Hopper
      if (devProp.minor == 0) cores = mp * 128;
      else printf("Unknown device type\n");
      break;
     default:
      printf("Unknown device type\n");
      break;
    }
    return cores;
}
(coded in browser)
"Cores" is a bit of a marketing term. In my opinion, the most common connotation is to equate it with the SP (single-precision, FP32) units in the SM, and that is the meaning used here. I've also omitted cc 1.x devices, as those device types are no longer supported by CUDA 7.0 and CUDA 7.5.
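The same mapping can also be written as a lookup table keyed on compute capability, which turns supporting a new architecture into a one-line change. The sketch below is plain C with no CUDA headers, so the table can be compiled and tested anywhere; the names sp_cores_per_sm and sp_cores_total are hypothetical, and it returns -1 (instead of printing) for compute capabilities it does not know. In a real program you would pass devProp.major, devProp.minor and devProp.multiProcessorCount.

```c
#include <stddef.h>

/* Hypothetical table: FP32 ("SP") cores per SM, per compute capability. */
typedef struct {
    int major;
    int minor;
    int cores_per_sm;
} ArchCores;

static const ArchCores kArchCores[] = {
    {2, 0, 32},  {2, 1, 48},                            /* Fermi */
    {3, 0, 192}, {3, 2, 192}, {3, 5, 192}, {3, 7, 192}, /* Kepler */
    {5, 0, 128}, {5, 2, 128}, {5, 3, 128},              /* Maxwell */
    {6, 0, 64},  {6, 1, 128}, {6, 2, 128},              /* Pascal */
    {7, 0, 64},  {7, 2, 64},                            /* Volta */
    {7, 5, 64},                                         /* Turing */
    {8, 0, 64},  {8, 6, 128}, {8, 7, 128},              /* Ampere */
    {8, 9, 128},                                        /* Ada Lovelace */
    {9, 0, 128},                                        /* Hopper */
};

/* Returns the cores per SM, or -1 if the compute capability is not listed. */
int sp_cores_per_sm(int major, int minor)
{
    size_t n = sizeof kArchCores / sizeof kArchCores[0];
    for (size_t i = 0; i < n; ++i) {
        if (kArchCores[i].major == major && kArchCores[i].minor == minor)
            return kArchCores[i].cores_per_sm;
    }
    return -1;
}

/* Total cores = cores per SM * number of SMs; -1 if the arch is unknown. */
int sp_cores_total(int major, int minor, int multiprocessor_count)
{
    int per_sm = sp_cores_per_sm(major, minor);
    return per_sm < 0 ? -1 : per_sm * multiprocessor_count;
}
```

Returning a sentinel rather than printing leaves it to the caller to decide how to report an unrecognized device.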
A pythonic version is here
On Linux, you can run the following command to get the number of CUDA cores:
nvidia-settings -q CUDACores -t
To get the output of this command in C, use the POSIX popen() function.
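A minimal sketch of the popen() approach, assuming nvidia-settings is installed and can reach the driver (it typically needs a running X session) and that there is a single GPU; with several GPUs only the first value is read. The function names query_cuda_cores and parse_core_count are hypothetical, and the parsing is split out so it can be checked without a GPU.

```c
#define _POSIX_C_SOURCE 200809L /* expose popen()/pclose() under strict C modes */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse one line of `nvidia-settings -q CUDACores -t` output, e.g. "3584\n".
 * Returns the core count, or -1 if the line is not a non-negative integer. */
int parse_core_count(const char *line)
{
    char *end;
    errno = 0;
    long value = strtol(line, &end, 10); /* skips leading whitespace */
    if (end == line || errno != 0 || value < 0 || value > INT_MAX)
        return -1;
    /* Only trailing whitespace (e.g. the newline) may follow the number. */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        ++end;
    return *end == '\0' ? (int)value : -1;
}

/* Run the command and parse the first line of its output.
 * Returns -1 if the command cannot be started or its output is unusable. */
int query_cuda_cores(void)
{
    char line[64];
    FILE *pipe = popen("nvidia-settings -q CUDACores -t", "r");
    if (pipe == NULL)
        return -1;
    int cores = -1;
    if (fgets(line, sizeof line, pipe) != NULL)
        cores = parse_core_count(line);
    pclose(pipe);
    return cores;
}
```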
As Vraj Pandya already said, there is a function (_ConvertSMVer2Cores) in the Common/helper_cuda.h file of NVIDIA's cuda-samples GitHub repository that provides this functionality. You just need to multiply its result by the GPU's multiprocessor count.
Just wanted to provide a current link.
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>
#include <helper_cuda.h> // This header comes from cuda-samples' Common/ folder; that
                         // folder must be on the compiler's include path (e.g. via -I).
                         // It also requires `helper_string.h`, which lives in the
                         // same folder as `helper_cuda.h`.

int main()
{
    int deviceID;
    cudaDeviceProp props;
    cudaGetDevice(&deviceID);
    cudaGetDeviceProperties(&props, deviceID);
    // Cores per SM (looked up from the compute capability) times the number of SMs.
    int CUDACores = _ConvertSMVer2Cores(props.major, props.minor) * props.multiProcessorCount;
    printf("CUDA cores: %d\n", CUDACores);
    return 0;
}