Efficiency of fwrite() for massive numbers of small writes

First of all, fwrite() is a library function, not a system call. Secondly, it already buffers the data.

You might want to experiment with increasing the size of the buffer. This is done by using setvbuf(). On my system this only helps a tiny bit, but YMMV.
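A minimal sketch of what that looks like; the output file name and the 1 MiB buffer size are just arbitrary values to experiment with:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("out.bin", "wb");   /* hypothetical output file */
    if (!fp)
        return 1;

    /* Replace the default stdio buffer with a larger, fully buffered one.
       setvbuf() must be called before the first read or write on the stream. */
    static char buf[1 << 20];            /* 1 MiB, size chosen arbitrarily */
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0)
        return 1;

    short value = 42;
    for (long i = 0; i < 1000000; i++)   /* many tiny writes, now absorbed by the big buffer */
        fwrite(&value, sizeof value, 1, fp);

    fclose(fp);
    return 0;
}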

If setvbuf() does not help, you could do your own buffering and only call fwrite() once you've accumulated enough data. This involves more work, but will almost certainly speed up the writing, as your own buffering can be made much more lightweight than fwrite()'s.
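A rough sketch of that approach, assuming the data arrives as individual 2-byte values; the buffer size and the emit()/flush_buffer() helper names are made up for illustration:

#include <stdio.h>
#include <string.h>

#define BUF_SIZE (1 << 20)          /* 1 MiB staging buffer, size chosen arbitrarily */

static char buffer[BUF_SIZE];
static size_t used = 0;

/* Push the accumulated bytes to the stream with a single fwrite(). */
static void flush_buffer(FILE *fp)
{
    if (used > 0) {
        fwrite(buffer, 1, used, fp);
        used = 0;
    }
}

/* Copy one small item into the staging buffer, flushing first when it would overflow.
   Assumes each item is much smaller than BUF_SIZE. */
static void emit(FILE *fp, const void *data, size_t len)
{
    if (used + len > BUF_SIZE)
        flush_buffer(fp);
    memcpy(buffer + used, data, len);
    used += len;
}

int main(void)
{
    FILE *fp = fopen("out.bin", "wb");
    if (!fp)
        return 1;

    for (long i = 0; i < 1000000; i++) {
        short value = (short)i;
        emit(fp, &value, sizeof value);
    }

    flush_buffer(fp);               /* don't forget the tail */
    fclose(fp);
    return 0;
}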

edit: If anyone tells you that it's the sheer number of fwrite() calls that is the problem, demand to see evidence. Better still, do your own performance tests. On my computer, 500,000,000 two-byte writes using fwrite() take 11 seconds. This equates to a throughput of about 90 MB/s.
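If you want to reproduce that kind of measurement, a sketch of the test I have in mind looks roughly like this (the file name is a placeholder, and the run writes about 1 GB to disk):

#include <stdio.h>
#include <time.h>

int main(void)
{
    FILE *fp = fopen("bench.bin", "wb");   /* placeholder name */
    if (!fp)
        return 1;

    const long n = 500000000L;             /* 500 million two-byte writes */
    short value = 0x1234;

    time_t start = time(NULL);
    for (long i = 0; i < n; i++)
        fwrite(&value, sizeof value, 1, fp);
    fclose(fp);                            /* flush before stopping the clock */
    time_t end = time(NULL);

    double secs = difftime(end, start);
    printf("%.0f s, about %.0f MB/s\n",
           secs, secs > 0 ? (n * 2.0 / 1e6) / secs : 0.0);
    return 0;
}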

Last but not least, the huge discrepancy between the 11 seconds in my test and the one hour mentioned in your question strongly suggests that something else in your code is causing the very poor performance.


Your problem is not the buffering inside fwrite(), but the total overhead of making a library call for each small piece of data. If you write just 1 MB of data this way, you make 250,000 function calls. You are better off collecting your data in memory and then writing it to disk with one single call to fwrite().
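A minimal sketch of that idea, assuming 4-byte items (which matches the 250,000-calls-per-MB figure above); the file name is only an example:

#include <stdio.h>
#include <stdlib.h>

#define N 250000                     /* 250,000 four-byte items, roughly 1 MB */

int main(void)
{
    int *data = malloc(N * sizeof *data);
    if (!data)
        return 1;
    for (int i = 0; i < N; i++)      /* build the whole data set in memory first */
        data[i] = i;

    FILE *fp = fopen("out.bin", "wb");
    if (!fp)
        return 1;

    /* One call moves the whole megabyte instead of 250,000 tiny calls. */
    fwrite(data, sizeof *data, N, fp);

    fclose(fp);
    free(data);
    return 0;
}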

UPDATE: If you need evidence:

$ dd if=/dev/zero of=/dev/null count=50000000 bs=2
50000000+0 records in
50000000+0 records out
100000000 bytes (100 MB) copied, 55.3583 s, 1.8 MB/s
$ dd if=/dev/zero of=/dev/null count=50 bs=2000000
50+0 records in
50+0 records out
100000000 bytes (100 MB) copied, 0.0122651 s, 8.2 GB/s