Use of cudaMalloc(). Why the double pointer?
All CUDA API functions return an error code (or cudaSuccess if no error occurred). All other results are passed back by reference. However, plain C has no references, so you have to pass the address of the variable in which the returned information should be stored. Since you are returning a pointer, you need to pass a double pointer.
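A minimal sketch of this convention for a plain int result, using a hypothetical divide function and calc_status type (neither is part of CUDA). The return value carries only the status, and the result goes out through a pointer parameter. If the result were itself a pointer, that parameter would become a pointer to a pointer.

```c
/* Hypothetical status codes for this sketch (not part of CUDA). */
typedef enum { CALC_OK = 0, CALC_DIV_BY_ZERO = 1 } calc_status;

/* The return value carries only the status; the quotient is written
   into the caller's variable through the pointer parameter. */
calc_status divide(int a, int b, int *quotient)
{
    if (b == 0)
        return CALC_DIV_BY_ZERO;   /* *quotient is left untouched */
    *quotient = a / b;             /* store the result at the caller's address */
    return CALC_OK;
}
```

A caller writes `int q; if (divide(7, 2, &q) == CALC_OK) { /* use q */ }`, which has the same shape as a cudaMalloc call.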
Another well-known function that operates on addresses for the same reason is scanf. How many times have you forgotten to write the & before the variable you want to store the value in? ;)
int i;
scanf("%d",&i);
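The same mechanism with sscanf, which reads from a string instead of standard input so the sketch runs without user input. parse_int is a hypothetical helper; because its out parameter is already an address, it is handed to sscanf without another &.

```c
#include <stdio.h>

/* Parses a decimal int from text into *out.
   Returns the number of items converted (1 on success, 0 if text is not a number). */
int parse_int(const char *text, int *out)
{
    return sscanf(text, "%d", out);  /* out already holds an address, so no & here */
}
```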
In C/C++, you can allocate a block of memory dynamically at runtime by calling the malloc function.
int * h_array;
h_array = malloc(sizeof(int));
The malloc function returns the address of the allocated memory block, which can be stored in a pointer variable.
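A slightly fuller sketch of the malloc pattern, with the NULL check and free() that real code needs. make_range is a hypothetical helper. In C, the void * returned by malloc converts to int * implicitly; C++ requires an explicit cast.

```c
#include <stdlib.h>

/* Allocates n ints and fills them with 0, 1, ..., n-1.
   Returns NULL if the allocation fails; the caller must free() the result. */
int *make_range(size_t n)
{
    int *h_array = malloc(n * sizeof *h_array);  /* malloc returns the block's address */
    if (h_array == NULL)
        return NULL;
    for (size_t k = 0; k < n; k++)
        h_array[k] = (int)k;
    return h_array;
}
```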
Memory allocation in CUDA differs in two ways:
- cudaMalloc returns an error code (of type cudaError_t) instead of a pointer to the memory block.
- In addition to the size in bytes to be allocated, cudaMalloc requires a double void pointer (void **) as its first parameter:
int * d_array;
cudaMalloc((void **) &d_array, sizeof(int));
The reason for the first difference is that all CUDA API functions follow the convention of returning an error code. To keep things consistent, cudaMalloc also returns one.
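Because every call reports failure the same way, a single checking helper can wrap any of them. This is a sketch with a hypothetical api_status type where 0 means success; with CUDA itself one would compare against cudaSuccess and could print cudaGetErrorString(status) instead of the raw number.

```c
#include <stdio.h>

/* Hypothetical status type: 0 means success, as cudaSuccess does in CUDA. */
typedef int api_status;
#define API_SUCCESS 0

/* Reports a failed call (with the source text of the call) and passes the
   status through, so the caller can still act on it. */
static api_status report_status(api_status s, const char *call_text)
{
    if (s != API_SUCCESS)
        fprintf(stderr, "%s failed with status %d\n", call_text, s);
    return s;
}

/* Because every call returns a status, one macro can wrap any of them. */
#define CHECK(call) report_status((call), #call)
```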
The requirement for a double pointer as the function's first argument can be understood in two steps.
Firstly, since cudaMalloc already uses its return value for the error code, it can no longer use it to return the address of the allocated memory. In C, the only other way for a function to communicate a result is through a pointer (an address) passed to it. The function can change the value stored at that address, and the caller can retrieve the change outside the function's scope through the same address.
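A minimal illustration of that idea (store_seven is a hypothetical function): the callee writes through the address it was given, and the caller sees the new value after the call returns.

```c
/* The function receives a copy of the address, but both copies point at
   the same int, so the write is visible to the caller afterwards. */
void store_seven(int *p)
{
    *p = 7;
}
```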
How the double pointer works
The following code illustrates how it works with the double pointer (simplified: plain malloc stands in for the device allocation).
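/* Simplified illustration, not the real implementation */
int cudamalloc(void **d_array, size_t type_size) {
    *d_array = malloc(type_size);         /* write into the caller's pointer */
    return (*d_array != NULL) ? 0 : 1;    /* 0 plays the role of cudaSuccess */
}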
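The same pattern as a self-contained fragment that runs on the host. mock_cuda_malloc is a hypothetical stand-in that uses malloc instead of GPU memory and returns 0 for success, mirroring cudaMalloc's shape. Passing &p, where p is a void *, gives exactly the void ** the function expects.

```c
#include <stdlib.h>

/* Hypothetical host-side stand-in for cudaMalloc (malloc instead of GPU memory).
   Stores the new address in *ptr and returns 0 on success, 1 on failure. */
int mock_cuda_malloc(void **ptr, size_t size)
{
    *ptr = malloc(size);              /* write into the caller's pointer variable */
    return (*ptr != NULL) ? 0 : 1;
}
```

The caller then converts the void * to the pointer type it actually needs, for example `int *d_array = p;`.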
Why do we need the double pointer? Why doesn't a single pointer work?
I normally live in the Python world, so I also struggled to understand why this does not work:
int cudamalloc(void *d_array, size_t type_size) {
    d_array = malloc(type_size);   /* only changes the local copy */
    /* ... */
    return error_status;
}
So why doesn't it work? Because C passes arguments by value: when cudamalloc is called, a local variable named d_array is created and initialized with the value of the first argument. Assigning to it changes only that local copy, and there is no way to retrieve its value outside the function's scope. That is why we need a pointer to a pointer here:
int cudamalloc(void **d_array, size_t type_size) {
    *d_array = malloc(type_size);   /* writes into the caller's pointer */
    /* ... */
    return return_code;
}
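The difference between the two versions can be checked without allocating anything. In this sketch (all names hypothetical), a static int stands in for freshly allocated memory: reassigning a pointer parameter changes only the local copy, while writing through a pointer to the pointer changes the caller's variable.

```c
static int target = 42;  /* stands in for freshly allocated memory */

/* Broken: p is a local copy of the caller's pointer, so reassigning it
   has no effect outside this function. */
void point_broken(int *p)
{
    p = &target;
    (void)p;  /* silence "set but not used" warnings */
}

/* Working: pp holds the address of the caller's pointer, so *pp = ...
   changes the caller's variable itself. */
void point_working(int **pp)
{
    *pp = &target;
}
```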
This is simply a horrible, horrible API design. The problem with passing double pointers to an allocation function that obtains abstract (void *) memory is that you have to make a temporary variable of type void * to hold the result, then assign it to a pointer of the type you actually want to use. Casting, as in (void**)&device_array, is not valid C: the function then writes a void * into an object of a different pointer type, which is undefined behavior. You should simply write a wrapper function that behaves like normal malloc and returns a pointer, as in:
#include <stddef.h>
#include <cuda_runtime.h>

void *fixed_cudaMalloc(size_t len)
{
    void *p;
    if (cudaMalloc(&p, len) == cudaSuccess) return p;
    return NULL;
}
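Since cudaMalloc needs a GPU, here is the same wrapper idea applied to a hypothetical host-side mock so it can run anywhere. The void * temporary receives the result, and the caller assigns the returned pointer to an int * without any (void **) cast.

```c
#include <stdlib.h>

/* Hypothetical host-side stand-in for cudaMalloc (malloc instead of GPU memory). */
static int mock_cuda_malloc(void **ptr, size_t size)
{
    *ptr = malloc(size);
    return (*ptr != NULL) ? 0 : 1;   /* 0 plays the role of cudaSuccess */
}

/* malloc-style wrapper: the void * temporary receives the result, so the
   caller never needs a (void **) cast and gets NULL on failure. */
void *fixed_mock_malloc(size_t len)
{
    void *p;
    if (mock_cuda_malloc(&p, len) == 0)
        return p;
    return NULL;
}
```

Usage then looks like ordinary malloc: `int *d = fixed_mock_malloc(3 * sizeof *d);`.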