Does OpenCL always zero-initialize device memory?

No, it doesn't. For instance, I had this small kernel to test atomic_add:

kernel void atomicAdd(volatile global int *result){
    atomic_add(&result[0], 1);
}

Calling it with this host code (pyopencl + unittest):

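def test_atomic_add(self):
    NDRange = (4, 4)  # 16 work-items in a single work-group
    result = np.zeros(1, dtype=np.int32)
    # READ_WRITE: atomic_add reads the old value before writing the new one.
    # Note: the buffer is NOT initialized here, which is the bug described below.
    out_buf = cl.Buffer(self.ctx, self.mf.READ_WRITE, size=result.nbytes)
    self.prog.atomicAdd(self.queue, NDRange, NDRange, out_buf)
    cl.enqueue_copy(self.queue, result, out_buf).wait()
    self.assertEqual(result[0], 16)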

always returned the correct value on my CPU. However, on an ATI HD 5450 the returned value was always junk.

And if I recall correctly, on an NVIDIA card the first run returned the correct value, i.e. 16, but the following runs returned 32, 48, and so on. The buffer was reusing the same memory location, with the old value still stored there.

When I fixed my host code with this line (copying the 0 value into the buffer):

# COPY_HOST_PTR copies the zeroed 'result' array into the new buffer.
out_buf = cl.Buffer(self.ctx, self.mf.READ_WRITE | self.mf.COPY_HOST_PTR, hostbuf=result)

everything worked fine on every device.
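To make the arithmetic concrete, here is a pure-Python model (no OpenCL involved) of the two behaviours above. It assumes, hypothetically, that the driver hands back the same never-cleared memory location on every run, as in the NVIDIA case. With a (4, 4) NDRange there are 16 work-items, each adding 1, so every launch adds 16 to whatever was already stored there. The names FakeDeviceMemory and run_tests are made up for illustration.

```python
# Pure-Python model of the behaviour described above; no OpenCL is used.
# Assumption (hypothetical): the driver returns the same, never-cleared
# memory location for every run of the test.

GLOBAL_SIZE = (4, 4)  # same NDRange as the test: 16 work-items


class FakeDeviceMemory:
    """A single int32 slot that keeps its value between runs."""

    def __init__(self, initial=0):
        self.value = initial

    def run_atomic_add_kernel(self, global_size):
        # Each work-item performs atomic_add(&result[0], 1).
        work_items = global_size[0] * global_size[1]
        for _ in range(work_items):
            self.value += 1
        return self.value


def run_tests(runs, initialize):
    memory = FakeDeviceMemory()  # reused across runs, like the NVIDIA case
    results = []
    for _ in range(runs):
        if initialize:
            # Stands in for creating the buffer with COPY_HOST_PTR from np.zeros.
            memory.value = 0
        results.append(memory.run_atomic_add_kernel(GLOBAL_SIZE))
    return results


print(run_tests(3, initialize=False))  # [16, 32, 48]
print(run_tests(3, initialize=True))   # [16, 16, 16]
```

Starting the slot from a nonzero value, e.g. FakeDeviceMemory(initial=5), gives "junk + 16", which is what the ATI card's output would look like under the same assumption.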

As far as I know, there is no sentence in the standard that guarantees this. Some driver implementations may do it automatically, but you shouldn't rely on it.

I remember once having a case where a buffer was not initialized to 0, but I can't remember the "OS + driver" combination.

Probably what is going on is that a typical OS does not use even 1% of today's device memory. So when you start an OpenCL program, there is a high probability that your buffer lands in an empty region that nothing has written to yet.
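The guess above can be illustrated with a toy allocator. This is purely a model, not how any real driver works: it assumes, hypothetically, that memory nobody has touched yet reads as zero and that freed blocks are handed out again without being cleared. Under those assumptions a fresh allocation looks zero-initialized only by luck, while a recycled block still holds the previous contents. ToyAllocator and its methods are invented for this sketch.

```python
# Toy model of a device-memory allocator; not how any real driver works.
# Assumptions (hypothetical): untouched memory reads as 0, and freed blocks
# are reused as-is, without being cleared.


class ToyAllocator:
    def __init__(self, size):
        self.memory = [0] * size  # assumption: never-written memory reads 0
        self.next_free = 0        # bump pointer into untouched memory
        self.free_list = []       # (offset, length) blocks released earlier

    def alloc(self, length):
        # Prefer recycling a freed block of the same size (stale contents).
        for i, (offset, block_len) in enumerate(self.free_list):
            if block_len == length:
                del self.free_list[i]
                return offset
        if self.next_free + length > len(self.memory):
            raise MemoryError("toy device is full")
        offset = self.next_free
        self.next_free += length
        return offset

    def free(self, offset, length):
        # No clearing here: the old values stay in self.memory.
        self.free_list.append((offset, length))

    def read(self, offset, length):
        return self.memory[offset:offset + length]

    def write(self, offset, values):
        self.memory[offset:offset + len(values)] = values


device = ToyAllocator(size=1024)

a = device.alloc(1)
print(device.read(a, 1))  # [0]: fresh memory, looks "initialized"
device.write(a, [16])     # the kernel leaves 16 behind
device.free(a, 1)

b = device.alloc(1)       # the same block is handed out again
print(device.read(b, 1))  # [16]: stale value from the previous run
```

In this model, most allocations come from the large untouched region and happen to read as zero, which is exactly why the missing initialization can go unnoticed until a block gets recycled.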
