What is the ideal memory block size to use when copying?
A block between 4096 and 32KB is the typical choice. Using 100MB is counter-productive. You are occupying RAM with the buffer that can be put to much better use as the file system writeback cache.
Copying files is very fast when the file fits completely in the cache, the WriteFile() call is a simple memory-to-memory copy. The cache manager then lazily writes it out to the disk. But when there's no more room in the cache, the copy speed drops off a cliff when WriteFile() has to wait for space to be made available. It now goes at disk write speeds.
I would recommend you to benchmark this, and remember to include much smaller block sizes. In my own tests on this, I got quite counterintuitive results.
When reading and writing from the hard drive, all (power of two) block sizes between 512 byte and 512 kB gave the same speed. Increasing the block size from 512 kB to 1 MB reduced the copying speed to about 60%. Increasing the block size further increased the speed again, but never all the way back to the speed of using small blocks.
When all the copied data was in the cache memory, the (much faster) copying speed improved with increasing block sizes, flattening out around reaching 32 kB blocks, and then suddenly dropped to about half the previous speed when going from 256 kB to 512 kB blocks, never to return to the previous speeds.
After this test, I dropped read/write block sizes in several of my programs from around 1 MB to 32 kB.
There's generally little benefit in using blocks that large.
Suppose your operating system is super-naive and every read or write operation incurs a hard disc seek (in practice you will often find that writes get queued and reads get read-ahead-buffered, reducing the benefit of using large buffers in your application code).
Then every block costs you (say) 2x10ms for two seeks (one to read and one to write) and there's little point increasing your block size once the time for the actual reading and writing is substantially more than that. A really fast HD might read and write at 150MB/s, in which case that 10ms would correspond to 1.5MB of reading/writing, and you'd be gaining little for blocksizes beyond 15MB.
In practice, (1) your seek time will probably be less, (2) your read and write bandwidth will probably be more, and (3) your OS and drive hardware will probably be cacheing and queueing things for you; you'll probably see little or no benefit from blocksizes above about 100KB.
(You should probably benchmark a variety of blocksizes and see what you get on your own system.)