CUDA - copy from device global memory to texture memory

The naming of the various cudaMemcpy routines was somewhat convoluted when this question was first asked, but Nvidia has since cleaned it up.

To operate on a 3D array you need cudaMemcpy3D(), which (among other things) can copy 3D data from linear memory into a 3D array.
cudaMemcpyToArray() used to be the function required for copying linear data to a 2D array, but it has been deprecated in favor of the more consistently named cudaMemcpy2DToArray().
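
As a rough illustration of the 3D case, here is a minimal sketch of a device-to-device cudaMemcpy3D() from linear memory into a 3D CUDA array. The function name copy_linear_to_array3d(), the CHECK macro, and the volume size are made up for the example, and the copy back to the host is only there to verify the result.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: leave the enclosing function with 1 if a call fails.
// (Cleanup on the failure path is omitted to keep the sketch short.)
#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Copies a W x H x D float volume from linear device memory into a 3D CUDA
// array, then reads the array back to the host. Returns 0 if the data matches.
int copy_linear_to_array3d()
{
    const size_t W = 8, H = 4, D = 2;  // assumed volume size, in elements
    const size_t n = W * H * D;
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (size_t i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    // Source: plain linear device memory, e.g. the output of an earlier kernel.
    float *d_lin = nullptr;
    CHECK(cudaMalloc(&d_lin, n * sizeof(float)));
    CHECK(cudaMemcpy(d_lin, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // Destination: a 3D CUDA array that a texture could later be bound to.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(W, H, D);  // in elements, since an array is involved
    cudaArray_t arr = nullptr;
    CHECK(cudaMalloc3DArray(&arr, &desc, extent));

    // Linear device memory -> 3D array, without leaving the device.
    cudaMemcpy3DParms toArray = {};
    toArray.srcPtr   = make_cudaPitchedPtr(d_lin, W * sizeof(float), W, H);
    toArray.dstArray = arr;
    toArray.extent   = extent;
    toArray.kind     = cudaMemcpyDeviceToDevice;
    CHECK(cudaMemcpy3D(&toArray));

    // 3D array -> host, purely to check the contents.
    cudaMemcpy3DParms toHost = {};
    toHost.srcArray = arr;
    toHost.dstPtr   = make_cudaPitchedPtr(h_out.data(), W * sizeof(float), W, H);
    toHost.extent   = extent;
    toHost.kind     = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&toHost));

    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_lin));
    return h_in == h_out ? 0 : 1;
}
```

Note that the source is described with a pitched pointer even though it is tightly packed; the pitch is simply the row width in bytes.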

If you are using a device of compute capability 2.0 or higher, however, you don't need any of the cudaMemcpy*() functions. Instead, use a surface, which lets a kernel write directly to the CUDA array behind the texture, so no data has to be copied between the kernels. (You still need to keep reading and writing in two separate kernels, just as you do now, because the texture cache is not coherent with surface writes and is only invalidated on kernel launch.)
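
A minimal sketch of the surface approach might look like the following. It assumes the surface and texture object API (cudaSurfaceObject_t, cudaTextureObject_t), which needs compute capability 3.0 or higher, rather than the older surface references available on 2.0 devices. The kernel names, the function surface_write_then_texture_read(), and the image size are invented for the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// First kernel: writes straight into the CUDA array through a surface.
__global__ void write_kernel(cudaSurfaceObject_t surf, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)  // the surface x coordinate is given in bytes
        surf2Dwrite(static_cast<float>(y * w + x), surf,
                    x * static_cast<int>(sizeof(float)), y);
}

// Second kernel: reads the same array through a texture.
__global__ void read_kernel(cudaTextureObject_t tex, float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Returns 0 if the texture reads see what the surface writes stored.
int surface_write_then_texture_read()
{
    const int w = 32, h = 16;  // assumed image size
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    // The array must be created with the surface load/store flag.
    CHECK(cudaMallocArray(&arr, &desc, w, h, cudaArraySurfaceLoadStore));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaSurfaceObject_t surf = 0;
    CHECK(cudaCreateSurfaceObject(&surf, &res));

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;  // no interpolation, exact texels
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float *d_out = nullptr;
    CHECK(cudaMalloc(&d_out, w * h * sizeof(float)));

    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    write_kernel<<<grid, block>>>(surf, w, h);        // writing kernel
    read_kernel<<<grid, block>>>(tex, d_out, w, h);   // separate reading kernel
    CHECK(cudaGetLastError());

    std::vector<float> h_out(w * h);
    CHECK(cudaMemcpy(h_out.data(), d_out, w * h * sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaDestroySurfaceObject(surf));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_out));

    for (int i = 0; i < w * h; ++i)
        if (h_out[i] != static_cast<float>(i)) return 1;
    return 0;
}
```

The two kernel launches reflect the constraint described above: the write and the texture read go into different kernels, so the read kernel starts with a freshly invalidated texture cache.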


cudaMemcpyToArray() accepts cudaMemcpyDeviceToDevice as its kind parameter, so it should be possible.
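
A minimal sketch of such a device-to-device copy into a 2D array follows. Because cudaMemcpyToArray() is deprecated in recent toolkits, the sketch uses cudaMemcpy2DToArray(), which takes the same kind parameter. The function name copy_linear_to_array2d() and the image size are made up for the example, and the read-back through cudaMemcpy2DFromArray() only serves as a check.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Copies a w x h float image from linear device memory into a 2D CUDA array
// with a device-to-device copy. Returns 0 if the array holds the same data.
int copy_linear_to_array2d()
{
    const size_t w = 16, h = 8;  // assumed image size, in elements
    const size_t rowBytes = w * sizeof(float);
    std::vector<float> h_in(w * h), h_out(w * h, -1.0f);
    for (size_t i = 0; i < h_in.size(); ++i) h_in[i] = static_cast<float>(i);

    // Linear global memory, standing in for the output of a kernel.
    float *d_lin = nullptr;
    CHECK(cudaMalloc(&d_lin, rowBytes * h));
    CHECK(cudaMemcpy(d_lin, h_in.data(), rowBytes * h, cudaMemcpyHostToDevice));

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    CHECK(cudaMallocArray(&arr, &desc, w, h));

    // Global memory -> array; width and source pitch are given in bytes.
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, d_lin, rowBytes, rowBytes, h,
                              cudaMemcpyDeviceToDevice));

    // Array -> host, only to verify the copy.
    CHECK(cudaMemcpy2DFromArray(h_out.data(), rowBytes, arr, 0, 0, rowBytes, h,
                                cudaMemcpyDeviceToHost));

    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_lin));
    return h_in == h_out ? 0 : 1;
}
```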