CUDA image processing
This is an extended comment on code style, rather than on your problem. Unfortunately I don't have a CUDA-capable card, so I can't test your actual issue.
Generally speaking, however, you will want to try to get away from procedural loops when writing Mathematica code:
- Alternatives to procedural loops and iterating over lists in Mathematica
- Why should I avoid the For loop in Mathematica?
For instance, in your code, instead of:
Do[imagevar[count++] = Import[image], {image, images}];
you could use:
imagevar = Import/@images;
Note that this makes imagevar a list, rather than an indexed variable, but that will actually be helpful in this case, since you can carry out "vector" operations on lists quickly and cleanly. See e.g. Map, which is used above in its "shortcut" (infix) form /@.
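As a quick illustration with a toy symbolic function f (not from your code), the infix and full forms are equivalent:
f /@ {1, 2, 3}
Map[f, {1, 2, 3}]
(* both evaluate to {f[1], f[2], f[3]} *)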
Similarly, now that imagevar is a list, your code
For[i = 1, i < Length[images], i++, CUDAImageConvolve[imagevar[i], {{-1, -2, 3}}]]
can be converted to the following:
CUDAImageConvolve[#, {{-1, -2, 3}}]& /@ imagevar;
If you want more information on using pure functions (i.e. the # and & expressions), take a look at How to | Work with Pure Functions.
Timing might also be best done using AbsoluteTiming; see this section of the Date and Time Functions Tutorial. For instance, the following will directly return the time it took to carry out one instance of that operation:
AbsoluteTiming[CUDAImageConvolve[#, {{-1, -2, 3}}]& /@ imagevar;]
In this context, you might also be interested in RepeatedTiming for more accurate timing results.
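For instance, to time the same CUDA operation averaged over several runs:
RepeatedTiming[CUDAImageConvolve[#, {{-1, -2, 3}}] & /@ imagevar;]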
Similarly with the last bit, to give you the timing of the non-CUDA operation:
AbsoluteTiming[ImageConvolve[#, {{-1, -2, 3}}]& /@ imagevar;]
CUDAImageConvolve is slow due to memory transfers between the different memories (CPU and GPU). One can decompose the action of CUDAImageConvolve into six phases and get the timing of each phase.
Here are the results, compared with the CPU ImageConvolve and the straight CUDAImageConvolve:
One can see that the convolution alone takes only 16 ms.
This information can be useful if you have a sequence of operations to perform without leaving GPU memory; a sketch of such a chain is given after the code below.
Here is the code:
Needs["CUDALink`"]
CUDAQ[]
imagevar = ExampleData /@ {{"TestImage", "House2"}, {"TestImage", "Lena"}, {"TestImage", "Mandrill"}}
AbsoluteTiming[ImageConvolve[#, {{-1, -2, 3}}]& /@ imagevar]
AbsoluteTiming[CUDAImageConvolve[#, {{-1, -2, 3}}]& /@ imagevar]
Initialization:
AbsoluteTiming[memList = CUDAMemoryLoad[ImageData[#]] & /@ imagevar;]  (* register the pixel data with CUDALink *)
AbsoluteTiming[CUDAMemoryCopyToDevice /@ memList;]  (* copy the input buffers to the GPU *)
(*CUDAMemoryInformation[memList[[1]]]*)
AbsoluteTiming[outMemList = CUDAMemoryAllocate["Double", Append[ImageDimensions[#], 3]] & /@ imagevar;]  (* allocate output buffers *)
{0.0467458, Null}
{0.0143745, Null}
{0.000671206, Null}
Convolution:
AbsoluteTiming[(CUDAImageConvolve[#[[1]], {{-1, -2, 3}},"OutputMemory"-> #[[2]]])& /@ Transpose[{memList,outMemList}];]
{0.0151647, Null}
Restitution:
AbsoluteTiming[Image[CUDAMemoryGet[#]]& /@ outMemList]
AbsoluteTiming[CUDAMemoryUnload /@ Flatten[{memList,outMemList}];]
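As mentioned above, once the data is on the GPU you can chain operations before copying anything back. The following is only a hedged sketch (the second kernel and the variable names are illustrative, not from the timing code above), reusing the same "OutputMemory" pattern:
Needs["CUDALink`"]
img = ExampleData[{"TestImage", "Lena"}];
mem = CUDAMemoryLoad[ImageData[img]];  (* one upload *)
out1 = CUDAMemoryAllocate["Double", Append[ImageDimensions[img], 3]];
out2 = CUDAMemoryAllocate["Double", Append[ImageDimensions[img], 3]];
CUDAImageConvolve[mem, {{-1, -2, 3}}, "OutputMemory" -> out1];  (* stays on the GPU *)
CUDAImageConvolve[out1, {{0, 1, 0}}, "OutputMemory" -> out2];  (* chained, still on the GPU; illustrative second kernel *)
result = Image[CUDAMemoryGet[out2]];  (* one download at the end *)
CUDAMemoryUnload /@ {mem, out1, out2};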
NOTE: CUDAMemoryLoad[ImageData[#]] & /@ imagevar is used instead of CUDAMemoryLoad /@ imagevar, because CUDAMemoryLoad[image] creates a 2D array of {Byte, Byte, Byte} even if the image is a 2D array of {Float, Float, Float}. The consequence is that the convolution overflows the byte values, and the resulting image is corrupted.
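To see the difference in element types (purely illustrative, using one of the test images):
img = ExampleData[{"TestImage", "Lena"}];
ImageData[img][[1, 1]]  (* real channel values in the range 0 to 1 *)
ImageData[img, "Byte"][[1, 1]]  (* integer channel values in the range 0 to 255 *)
With a kernel like {{-1, -2, 3}}, convolved byte values can fall outside 0..255, and that wraparound is what corrupts the result.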