Performance: memset
This is most likely due to lazy allocation in your VM subsystem. Typically when you allocate a large amount of memory only the first N pages are actually allocated and wired to physical memory. When you access beyond these first N pages then page faults are generated and further pages are allocated and wired in on an "on demand" basis.
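One way to see this on-demand behaviour directly is to count page faults around the first write to a fresh buffer. The sketch below assumes a POSIX system that provides getrusage and fills in ru_minflt (Linux does); the helper names minor_faults and faults_when_touching are made up for illustration and are not from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Minor (no disk I/O) page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return ru.ru_minflt;
}

/* Allocate n bytes, write to every byte once, and return how many minor
   page faults those writes caused. Returns -1 if malloc fails. */
static long faults_when_touching(size_t n)
{
    assert(n > 0);
    unsigned char *p = malloc(n);
    if (!p)
        return -1;

    long before = minor_faults();
    memset(p, 1, n);                        /* first touch of every page */
    long after = minor_faults();

    volatile unsigned char sink = p[n - 1]; /* keep the writes observable */
    (void)sink;
    free(p);
    return after - before;
}
```

With lazy allocation you would expect a call such as faults_when_touching(64u << 20) to report a fault count on the order of the number of pages in the buffer (fewer if huge pages are in use), while touching the same memory a second time would add almost none.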
As to the second part of the question, I believe some VM implementations actually track zeroed pages and handle them specially. Try initialising DataSrc to actual (e.g. random) values and repeat the test.
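A minimal sketch of such an initialisation, in case it helps: it fills a buffer with pseudo-random bytes from a small xorshift32 generator instead of zeros. The function name fill_random and the seed parameter are hypothetical, and any other source of non-zero data would do just as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf[0..n-1] with pseudo-random bytes from a xorshift32 generator,
   so that the buffer is not just a run of zeroed pages. */
static void fill_random(unsigned char *buf, size_t n, uint32_t seed)
{
    assert(buf != NULL);
    uint32_t x = seed ? seed : 1u;  /* xorshift state must be non-zero */
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buf[i] = (unsigned char)(x >> 24);
    }
}
```

Calling fill_random(DataSrc, N, 12345u) in place of memset(DataSrc, 0, N) should rule out any special handling of zero pages as the cause of the timing difference.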
As others already pointed out, Linux uses an optimistic memory allocation strategy.
The difference between the first and the following memcpys is the initialization of DataDest.
As you have already seen, when you eliminate memset(DataSrc, 0, N), the first memcpy is even slower, because the pages for the source must be allocated as well. When you initialize both DataSrc and DataDest, e.g.
memset(DataSrc, 0, N);
memset(DataDest, 0, N);
all memcpys will run with roughly the same speed.
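To check this on your own machine, something like the following sketch could be used. It is not the code from the question: the helper names now_sec and time_copies and the prefault flag are assumptions for illustration, and it uses the C11 timespec_get wall clock, which is good enough for a rough comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get; not a monotonic clock). */
static double now_sec(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Allocate two n-byte buffers, optionally touch every page of both with
   memset, then time `runs` memcpy calls and store the durations in
   secs[0..runs-1]. Returns 0 on success, -1 if an allocation fails. */
static int time_copies(size_t n, int runs, int prefault, double *secs)
{
    assert(n > 0 && runs > 0 && secs != NULL);
    unsigned char *DataSrc = malloc(n);
    unsigned char *DataDest = malloc(n);
    if (!DataSrc || !DataDest) {
        free(DataSrc);
        free(DataDest);
        return -1;
    }

    if (prefault) {
        /* Force the pages of both buffers to be allocated up front. */
        memset(DataSrc, 0, n);
        memset(DataDest, 0, n);
    }

    volatile unsigned char sink = 0;
    for (int i = 0; i < runs; i++) {
        double t0 = now_sec();
        memcpy(DataDest, DataSrc, n);
        secs[i] = now_sec() - t0;
        sink ^= DataDest[n - 1];  /* keep the copy from being optimized away */
    }
    (void)sink;

    free(DataSrc);
    free(DataDest);
    return 0;
}
```

Running time_copies with prefault set to 1 and then to 0, for a buffer of a few hundred megabytes, should show roughly equal times in the first case and a noticeably slower first copy in the second.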
For the second question: when you initialize the allocated memory with memset, all pages will be laid out consecutively. On the other hand, when the memory is allocated as you copy, the source and destination pages will be allocated interleaved, which might make the difference.