Large Arrays, and LOH Fragmentation. What is the accepted convention?

The first thing that comes to mind is to split the array up into smaller ones, so they don't reach the memory needed for the GC to put in it the LOH. You could spit the arrays into smaller ones of say 10,000, and build an object which would know which array to look in based on the indexer you pass.

Now I haven't seen the code, but I would also question why you need an array that large. I would potentially look at refactoring the code so all of that information doesn't need to be stored in memory at once.


Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its prescence. The LOH gets collected when generation 2 gets collected.

The difference is that the LOH does not get compacted, which means that if you have an object in there that has a long lifetime then you will effectively be splitting the LOH into two sections — the area before and the area after this object. If this behaviour continues to happen then you could end up with the situation where the space between long-lived objects is not sufficiently large for subsequent assignments and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.

Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).

Starting from .NET 4.5.1, LOH could be compacted, see GCSettings.LargeObjectHeapCompactionMode property.

Strategies to avoid LOH fragmentation are:

  • Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
  • Watch out for double arrays — the threshold for these going into the LOH is much, much smaller — I can't remember the exact figure but its only a few thousand.
  • If you need a MemoryStream, considering making a chunked version that backs onto a number of smaller arrays rather than one huge array. You could also make custom version of the IList and IDictionary which using chunking to avoid stuff ending up in the LOH in the first place.
  • Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
  • Watch out for string interning — for some reason these are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
  • Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
  • Consider pooling large arrays.

Edit: the LOH threshold for double arrays appears to be 8k.


It's an old question, but I figure it doesn't hurt to update answers with changes introduced in .NET. It is now possible to defragment the Large Object Heap. Clearly the first choice should be to make sure the best design choices were made, but it is nice to have this option now.

https://msdn.microsoft.com/en-us/library/xe0c2357(v=vs.110).aspx

"Starting with the .NET Framework 4.5.1, you can compact the large object heap (LOH) by setting the GCSettings.LargeObjectHeapCompactionMode property to GCLargeObjectHeapCompactionMode.CompactOnce before calling the Collect method, as the following example illustrates."

GCSettings can be found in the System.Runtime namespace

GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();