CUDA: How to assert in kernel code?
On devices of compute capability 2.x or above, you can call assert, declared as void assert(int expression), inside a kernel. Threads for which expression == 0 send a message to stderr once a host synchronization function is called.
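A minimal sketch of a device-side assert, assuming a toolkit and device that support it. The kernel name check_non_negative and the helper run_check are made up for illustration. To my knowledge a failed device assert surfaces on the host as cudaErrorAssert at the next synchronization and leaves the context unusable, so the demo in main only exercises the passing case.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every element is expected to be non-negative.
__global__ void check_non_negative(const int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // A thread whose expression evaluates to 0 reports to stderr.
        assert(data[i] >= 0);
    }
}

// Hypothetical helper: copies `n` values to the device, runs the check and
// returns the status of the synchronization that follows the launch.
cudaError_t run_check(const int *host, int n)
{
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, n * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        check_non_negative<<<(n + 127) / 128, 128>>>(dev, n);
        // The assert message, if any, is emitted at this synchronization.
        err = cudaDeviceSynchronize();
    }
    cudaFree(dev);
    return err;
}

int main()
{
    const int values[4] = {1, 2, 3, 4};  // put a negative value here to trigger the assert
    cudaError_t err = run_check(values, 4);
    printf("status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Compiling with -DNDEBUG disables the assert, as it does for host code.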
In other cases, or when assertions cannot be used (e.g. on macOS), you won't be able to return an error message or error code directly to the host from a kernel.
Instead, I would set an error state and check it from the host. Use device global memory or (better) mapped host memory to store the error state, and pass it as a parameter to each kernel call. Use if statements in the kernel, and if a check fails, set the error code and return. You can then read the error code from the host after the kernel call, but keep in mind that you have to synchronize the host and device after the kernel launch before checking it. I guess this will work fine for development, but not so much for production.
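The error-state approach could look roughly like this when the flag lives in device global memory. The names ErrorCode, sqrt_kernel and run_sqrt are invented for the sketch, and runtime API return codes are ignored to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error codes for this sketch.
enum ErrorCode { ERR_NONE = 0, ERR_NEGATIVE_INPUT = 1 };

__global__ void sqrt_kernel(const float *in, float *out, int n, int *error)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (in[i] < 0.0f) {
        // Several threads may store the same code; that race is harmless here.
        *error = ERR_NEGATIVE_INPUT;
        return;
    }
    out[i] = sqrtf(in[i]);
}

// Hypothetical helper: runs the kernel and returns the error code it left behind.
// API return codes are not checked, for brevity.
int run_sqrt(const float *host_in, float *host_out, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    int *d_error = nullptr;
    int h_error = ERR_NONE;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_error, sizeof(int));
    cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_error, &h_error, sizeof(int), cudaMemcpyHostToDevice);

    sqrt_kernel<<<(n + 127) / 128, 128>>>(d_in, d_out, n, d_error);
    cudaDeviceSynchronize();  // the flag is only meaningful after the kernel has finished

    cudaMemcpy(&h_error, d_error, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(host_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_error);
    return h_error;
}

int main()
{
    const float in[4] = {4.0f, 9.0f, -1.0f, 16.0f};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int code = run_sqrt(in, out, 4);
    printf("error code: %d\n", code);
    return 0;
}
```

The flag has to be reset before each launch, otherwise an error from an earlier kernel would be attributed to a later one.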
As to printing an error message straight from the device:
- With CUDA 1.x, 2.x, and 3.0, you can use device emulation mode to print an error message.
- From CUDA 3.1 onward (on Fermi), you can use printf in kernels to print the error message. It appears that it doesn't always work right away, e.g. http://forums.nvidia.com/index.php?showtopic=182448
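For the printf route, a small sketch follows, assuming a device of compute capability 2.x or higher. The kernel report_negative and the helper run_report are illustrative names. Device-side printf output is buffered and flushed at synchronization points such as cudaDeviceSynchronize(), which is one possible reason for messages not showing up right away.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: prints a diagnostic for every negative element.
__global__ void report_negative(const int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0) {
        printf("block %d, thread %d: negative value %d at index %d\n",
               blockIdx.x, threadIdx.x, data[i], i);
    }
}

// Hypothetical helper: uploads the data, runs the kernel and synchronizes
// so that the device-side printf buffer gets flushed.
cudaError_t run_report(const int *host, int n)
{
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, n * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        report_negative<<<(n + 127) / 128, 128>>>(dev, n);
        err = cudaDeviceSynchronize();  // output appears no earlier than this point
    }
    cudaFree(dev);
    return err;
}

int main()
{
    const int values[5] = {3, -7, 0, 12, -1};
    cudaError_t err = run_report(values, 5);
    return err == cudaSuccess ? 0 : 1;
}
```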
You may find this helpful:
Using assert within kernel invocation
Alternatively, you can catch the cudaError returned by cudaThreadSynchronize() (deprecated in later toolkits in favor of cudaDeviceSynchronize()), which gives you one of about 40 different reasons for the kernel returning an error. Mostly, though, you can check for those conditions yourself using if/else statements in the kernel.
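Checking the runtime's own error codes might look like the sketch below. It uses cudaDeviceSynchronize() rather than the older cudaThreadSynchronize() named above, and adds cudaGetLastError() to pick up launch-time failures; noop_kernel and launch_and_check are hypothetical names. The second call in main deliberately requests far more threads per block than any device allows, to show a launch being rejected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing; only the launch itself is of interest.
__global__ void noop_kernel() {}

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t launch_and_check(int threads_per_block)
{
    noop_kernel<<<1, threads_per_block>>>();

    // Errors detected at launch time (e.g. an invalid configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Errors that occur while the kernel is running show up here.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t ok = launch_and_check(64);
    printf("64 threads per block: %s\n", cudaGetErrorString(ok));

    cudaError_t bad = launch_and_check(1 << 20);  // exceeds the per-block thread limit
    printf("1<<20 threads per block: %s\n", cudaGetErrorString(bad));
    return 0;
}
```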
I would like to point out that an assert may fail in one thread only, but if you decide to terminate that thread early, its absence may cause other bugs (and probably other asserts) later on, possibly leading to a complete kernel crash and the loss of all information on the GPU.
Also, the answer given at "Using assert within kernel invocation" will work only if the assert is used directly in the __global__ function and not deeper, somewhere inside a __device__ function.
My suggestion is that, even if an assert fails, you proceed normally with your code but leave an error message. You can use mapped, pinned memory (host RAM mapped into the GPU address space) to store error codes or messages. That way, even if your kernel crashes and the GPU is reset, you are likely to find valuable information in that mapped memory. If I am not mistaken, mapped, pinned memory is supported by almost all devices of Compute Capability 1.1 and higher.
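A rough sketch of this idea, assuming the device can map host memory. The kernel safe_reciprocal, the helper run_reciprocal, the error code and the fallback value are all invented for illustration. The kernel records the problem in mapped, pinned host memory and then carries on instead of returning early. Runtime API return codes are ignored for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error codes for this sketch.
enum ErrorCode { ERR_NONE = 0, ERR_DIVIDE_BY_ZERO = 1 };

__global__ void safe_reciprocal(float *data, int n, volatile int *error)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = data[i];
    if (x == 0.0f) {
        *error = ERR_DIVIDE_BY_ZERO;  // written straight into host RAM
        x = 1.0f;                     // hypothetical fallback so the thread keeps going
    }
    data[i] = 1.0f / x;
}

// Hypothetical helper: processes `n` values in place, returns the recorded error code.
int run_reciprocal(float *host_data, int n)
{
    // Ask for host-memory mapping; the result is ignored here because newer
    // toolkits enable mapping by default.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *h_error = nullptr;  // host view of the flag
    int *d_error = nullptr;  // device view of the same memory
    cudaHostAlloc((void **)&h_error, sizeof(int), cudaHostAllocMapped);
    *h_error = ERR_NONE;
    cudaHostGetDevicePointer((void **)&d_error, h_error, 0);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);

    safe_reciprocal<<<(n + 127) / 128, 128>>>(d_data, n, d_error);
    cudaDeviceSynchronize();  // wait before reading the flag on the host

    cudaMemcpy(host_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    int code = *h_error;  // no copy needed, the flag already lives in host memory

    cudaFree(d_data);
    cudaFreeHost(h_error);
    return code;
}

int main()
{
    float values[4] = {2.0f, 0.0f, 4.0f, 8.0f};
    int code = run_reciprocal(values, 4);
    printf("error code: %d\n", code);
    return 0;
}
```

Because the flag is ordinary host memory, the host can still inspect it if a later API call fails, which is the point of using mapped memory here.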