Cuda Hello World printf not working even with -arch=sm_20

In-kernel printf is only supported on hardware with compute capability 2.0 or higher. Because your project is set to build for both compute capability 1.0 and compute capability 2.1, nvcc compiles the code multiple times and builds a multi-architecture fat binary. The error is generated during the compute capability 1.0 compilation pass, because the printf call is unsupported on that architecture.

If you remove the compute capability 1.0 build target from your project, the error will disappear.

Alternatively, you could write the kernel like this:

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World");
#endif
}

The __CUDA_ARCH__ symbol will only be >= 200 when building for compute capability 2.0 or higher targets, so this lets you compile the code for compute capability 1.x devices without encountering a compilation error.
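To make the behaviour of __CUDA_ARCH__ a little more concrete, here is a small sketch that is not part of the original answer. It relies on the fact that __CUDA_ARCH__ is undefined while nvcc compiles the host side of a file and is defined, once per target, during each device compilation pass. The function name report is hypothetical and only there for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: compiled once for the host and once per device target.
__host__ __device__ void report()
{
#if !defined(__CUDA_ARCH__)
    // Host pass: __CUDA_ARCH__ is not defined at all.
    printf("host pass: __CUDA_ARCH__ is undefined\n");
#elif __CUDA_ARCH__ >= 200
    // Device pass for compute capability 2.0 or higher: printf is available.
    printf("device pass: __CUDA_ARCH__ = %d\n", __CUDA_ARCH__);
#endif
    // Device pass for compute capability 1.x: nothing is compiled in here.
}

__global__ void kernel()
{
    report();
}

int main()
{
    report();                 // runs the host branch
    kernel<<<1, 1>>>();       // runs the device branch
    cudaDeviceSynchronize();  // wait for the kernel and flush device printf
    return 0;
}
```

Because the guard is evaluated separately in every compilation pass, the same source file can be built for several targets, and only the passes that support printf ever see the call.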

If you compile for the correct architecture and still get no output, you also need to ensure that the kernel finishes and the driver flushes the output buffer. To do this, add a synchronizing call after the kernel launch in the host code, for example:

int main( int argc, char** argv )
{

    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

[disclaimer: all code written in browser, never compiled, use at own risk]

If you do both things, you should be able to compile, run and see output.
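As a rough sketch of both changes combined, the version below also checks the return codes of the launch and of the synchronization, which the original snippets leave out. The error checks are an addition for illustration, not something the answer requires. They can help distinguish "the kernel never ran" from "the output was never flushed".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World\n");
#endif
}

int main()
{
    test<<<1, 1>>>();

    // Errors from the launch itself (for example, no kernel image built
    // for the installed device) are reported by cudaGetLastError().
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Block until the kernel has finished; this also flushes device printf.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    return 0;
}
```

If the program prints nothing and also reports no error, the build target is a likely suspect, since the guarded printf is compiled out for compute capability 1.x.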


Just use cudaDeviceSynchronize(). This is a supplement to @Tomasz's answer.

Devices with compute capability 2.x or higher support calls to printf from within a CUDA kernel.

printf output is stored in a circular buffer of a fixed size. This buffer is flushed only on:

  • the start of a kernel launch
  • synchronization (e.g. cudaDeviceSynchronize())
  • blocking memory copies (e.g. cudaMemcpy(...))
  • module load/unload
  • context destruction

So the simplest "Hello world" example is:

#include <stdio.h>

__global__ void hello() {
    printf("Hello from GPU\n");
}

int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();
}
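Since the buffer has a fixed size and is circular, a kernel that prints a lot between flushes can overwrite its own earlier output. The sketch below, which is an addition and not part of the original answer, queries the buffer size with cudaDeviceGetLimit and enlarges it with cudaDeviceSetLimit, using the cudaLimitPrintfFifoSize limit from the CUDA runtime API. The 8 MB value and the launch configuration are arbitrary choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello()
{
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // Query the current size of the device-side printf buffer.
    size_t fifoSize = 0;
    cudaDeviceGetLimit(&fifoSize, cudaLimitPrintfFifoSize);
    printf("printf buffer before: %zu bytes\n", fifoSize);

    // Request a larger buffer. This is done before launching any kernel
    // that calls printf. The 8 MB figure is an arbitrary example value.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8u * 1024u * 1024u);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceGetLimit(&fifoSize, cudaLimitPrintfFifoSize);
    printf("printf buffer after:  %zu bytes\n", fifoSize);

    hello<<<4, 32>>>();
    cudaDeviceSynchronize();  // flush the buffered output to stdout
    return 0;
}
```

For a single-line "Hello world" none of this is needed; it only matters once many threads print between two flush points.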

Reference:

  • cmu15418

  • Nvidia CUDA Toolkit Document


If you're using printf in a kernel, you should call cudaDeviceSynchronize() after the launch:

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test(){
    printf("Hi Cuda World");
}

int main( int argc, char** argv )
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
