Printing from CUDA kernels
To enable plain printf() on devices of Compute Capability (CC) 2.0 or higher, you need to compile for at least CC 2.0 and disable the default setting, which includes a build for CC 1.0.
Right-click the .cu file in your project, select Properties, then select Configuration Properties | CUDA C/C++ | Device. Click on the Code Generation line, click the triangle, and select Edit. In the Code Generation dialog box, uncheck "Inherit from parent or project defaults", type compute_20,sm_20 in the top window, and click OK.
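Before changing the build settings, it can help to confirm that the GPU itself reports a compute capability of 2.0 or higher. The following is only a minimal sketch using the CUDA runtime call cudaGetDeviceProperties; the choice of device 0 is an assumption, so adjust it if your machine has several GPUs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int device = 0;  // assumption: the first visible GPU is the one of interest
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // prop.major and prop.minor together form the compute capability, e.g. 2.0
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    if (prop.major < 2) {
        printf("Device-side printf() is not available on this GPU.\n");
    }
    return 0;
}
```

If the reported major version is 2 or higher, the remaining requirement is the code generation setting described above.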
I am using a GTX 1650 and a GTX 1050 with C++11. For users of recent hardware, this is my suggestion:
In the host function:
#include <iostream>
using namespace std;
cout << ... (anything you want) << endl;
In the kernel:
if (threadIdx.x == 0) {
    printf("ss=%4.2f \n", ss);
}
Note that this "if" is quite important, and I notice nobody mentioned it: you may be launching a lot of threads, and you definitely do not want every thread to print. Also, %4.2f means a minimum field width of 4 characters with 2 digits after the decimal point, which avoids printing long runs of trailing zeros. Do not forget the \n to end the line.
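To put the two fragments together, here is a minimal complete sketch. The kernel name showValue, the launch configuration, and the value passed in are all made up for illustration. One detail the fragments leave implicit: device-side printf output is buffered and only shows up on the host after a synchronization point such as cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <iostream>
#include <cuda_runtime.h>

// Hypothetical kernel: prints the value it receives, once per block.
__global__ void showValue(float ss)
{
    // threadIdx.x == 0 is true for one thread in EACH block,
    // so a launch with N blocks prints N lines.
    if (threadIdx.x == 0) {
        printf("block %d: ss=%4.2f\n", blockIdx.x, ss);
    }
}

int main()
{
    std::cout << "launching kernel" << std::endl;  // host-side output

    showValue<<<2, 128>>>(3.14159f);  // 2 blocks of 128 threads (arbitrary)

    // Wait for the kernel and flush the device printf buffer.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    return 0;
}
```

If you want exactly one line for the whole grid, one option is to also test blockIdx.x == 0 in the condition.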
You can also use something like this to print the contents of shared memory:
if (threadIdx.x == 0) {
    for (int i = 0; i < 64; i++) {
        for (int j = 0; j < 8; j++) {
            printf("%4.2f ", ashare[i*8+j]);
        }
        printf("\n");
    }
    printf("\n");
}
This prints the shared memory array as a neat 64 x 8 table. Note that, again, the printing needs to be restricted to threadIdx.x == 0.
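For completeness, a self-contained sketch of the same idea follows. The kernel name fillAndPrint, the fill pattern, and the block size are assumptions chosen only so the example runs on its own. The one addition worth pointing out is the __syncthreads() call: thread 0 should not start printing until every thread has finished writing its part of the array.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROWS 64
#define COLS 8

// Hypothetical kernel: fills a shared array cooperatively, then thread 0 prints it.
__global__ void fillAndPrint()
{
    __shared__ float ashare[ROWS * COLS];

    // Each thread writes a strided subset of the array (arbitrary test values).
    for (int k = threadIdx.x; k < ROWS * COLS; k += blockDim.x) {
        ashare[k] = 0.5f * k;
    }

    // Make sure all writes are visible before any thread reads the array.
    __syncthreads();

    if (threadIdx.x == 0) {
        for (int i = 0; i < ROWS; i++) {
            for (int j = 0; j < COLS; j++) {
                printf("%6.2f ", ashare[i * COLS + j]);
            }
            printf("\n");
        }
        printf("\n");
    }
}

int main()
{
    fillAndPrint<<<1, 128>>>();   // a single block keeps the output to one table
    cudaDeviceSynchronize();      // flush the device printf buffer
    return 0;
}
```

The width here is %6.2f rather than %4.2f only because the test values go up to 255.50, so the columns stay aligned.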
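You can write this code to print whatever you want from inside a CUDA kernel:
#if __CUDA_ARCH__ >= 200
printf("%d \n", tid);
#endif
and include <stdio.h>. The guard compiles the printf call only when the device code targets compute capability 2.0 or higher.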
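A minimal sketch of how that guard might sit inside a full program is shown below. The kernel name printTid and the way tid is computed are assumptions, since the original snippet does not show where tid comes from. The defined(__CUDA_ARCH__) test is added because the macro only exists during the device compilation pass.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread prints its global index.
__global__ void printTid()
{
    // Assumed definition of tid: the usual 1D global thread index.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
    // Compiled only for device targets of compute capability 2.0 or higher.
    printf("%d \n", tid);
#endif
    (void)tid;  // avoids an unused-variable warning when the printf is compiled out
}

int main()
{
    printTid<<<1, 4>>>();      // small launch so the output stays short
    cudaDeviceSynchronize();   // flush the device printf buffer
    return 0;
}
```

The order in which the threads' lines appear is not something to rely on.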
One way of solving this problem is to use the cuPrintf function, which is capable of printing from kernels. Copy the files cuPrintf.cu and cuPrintf.cuh from the folder
C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\src\simplePrintf
to the project folder. Then add the header file cuPrintf.cuh to your project and add
#include "cuPrintf.cu"
to your code. Your code should then be written in the format shown below:
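#include "cuPrintf.cu"

__global__ void testKernel(int val)
{
    cuPrintf("Value is: %d\n", val);
}

int main()
{
    cudaPrintfInit();                 // set up the device-side print buffer
    testKernel<<< 2, 3 >>>(10);
    cudaPrintfDisplay(stdout, true);  // copy the buffer to the host and print it; true shows thread IDs
    cudaPrintfEnd();                  // release the print buffer
    return 0;
}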
By following the above procedure, you can get output on the console window from the device function.
Though I solved my issue this way, I still don't have a solution for using printf from the device function. If it really is necessary to change my nvcc target from sm_10 to sm_21 to enable printf, it would be very helpful if someone could show me how. Thanks for all your help.