Quantifying the Performance of Garbage Collection vs. Explicit Memory Management
You seem to be asking two things:
- have GC's improved since that research was performed, and
- can I use the conclusions of the paper as a formula to predict required memory.
The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:
- GC'ed memory management still requires significantly more virtual memory.
- If you try to constrain the heap size the GC performance drops significantly.
- If real memory is restricted, the GC'ed memory management approach results in substantially worse performance due to paging overheads.
However, the conclusions cannot really be used as a formula:
- The original study was done with JikesRVM rather than a Sun JVM.
- The Sun JVM's garbage collectors have improved in the ~5 years since the study.
- The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.
On the last point, I have seen a presentation by someone that talks about Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects; one an Object with 4 word-sized fields and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.
These overheads are not GC-related per se. Rather they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:
- Each Java primitive object header1 reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.
- The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the
substring
operation less memory intensive. - The Java collection types use a lot of memory because collection elements cannot be directly chained. So for example, the overheads of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast an optimal C/C++ linked list (i.e. with each element having a "next" pointer) has an overhead of one word per list element.
1 - In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use & contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header ... which is the real point here.
if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)
Only if the app is bottlenecked on allocating and deallocating memory. Note that the paper talks exclusively about the performance of the garbage collector itself.