Serious Memory Leak When Iteratively Parsing XML Files
From the XML package's webpage, it seems that the author, Duncan Temple Lang, has described certain memory management issues quite extensively. See the page "Memory Management in the XML Package".
Honestly, I'm not proficient in the details of what's going on between your code and the package, but I think you'll find the answer either on that page, specifically in the section called "Problems", or in direct communication with Duncan Temple Lang.
Update 1. An idea that might work is to use the multicore and foreach packages (i.e. listResults = foreach(ix = 1:N) %dopar% { your processing; return(listElement) }). I think that on Windows you'll need doSMP, or maybe doRedis; under Linux I use doMC. In any case, by parallelizing the loading you'll get faster throughput. The reason I think you may also see some benefit on memory usage is that forking R could lead to different memory cleanup, since each spawned process gets killed when it completes. This isn't guaranteed to work, but it could address both the memory and the speed issues. A rough sketch is below.
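Here's a minimal, untested sketch of what that could look like with foreach and doMC on Linux. The parseOneFile() helper, the xmlFiles vector, the "data" directory, and the XPath "//record/value" are hypothetical placeholders for your own parsing logic; swap in whatever extraction you actually do.

    # Sketch only: parallel parsing with foreach + doMC (Linux).
    # parseOneFile(), xmlFiles, "data", and the XPath below are placeholders.
    library(XML)
    library(foreach)
    library(doMC)

    registerDoMC(cores = 4)      # register a parallel backend

    xmlFiles <- list.files("data", pattern = "\\.xml$", full.names = TRUE)

    parseOneFile <- function(path) {
      doc <- xmlParse(path)                                 # parse one document
      res <- xpathSApply(doc, "//record/value", xmlValue)   # extract what you need
      free(doc)       # explicitly release the C-level document (XML package)
      res
    }

    # Each iteration runs in a forked worker, so its memory is reclaimed
    # when that worker exits -- the effect described above.
    listResults <- foreach(f = xmlFiles) %dopar% {
      parseOneFile(f)
    }

listResults comes back as an ordinary list, one element per file, which you can then combine however you like (e.g. with do.call(rbind, listResults) if the pieces are data frames).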
Note, though: doSMP has its own idiosyncrasies (i.e. you may still have some memory issues with it). There have been other Q&As on SO that mentioned some issues, but I'd still give it a shot.