Learning garbage collection theory
There is a whole book on garbage collection, and a quite good one, if I may add:
Richard Jones & Rafael Lins, Garbage Collection, Wiley and Sons (1996), ISBN 0471941484
Richard Jones also maintains a nice site collecting garbage collection resources.
Most early garbage collection papers are eminently readable. You could start with Paul Wilson's survey of "Uniprocessor Garbage Collection Techniques" (1992, LNCS vol. 637) and then dive into the original literature on topics that sound interesting.
I want to learn the theory behind garbage collection. How do i go about it?
I am also a dabbler interested in garbage collection (to the point that I wrote my own garbage collected VM called HLVM). I learned by reading as many research papers on garbage collection as I could get my hands on and by playing with the ideas myself, both raw in my virtual machine and also by writing memory-safe high-level simulations.
The obvious answer is - a compiler textbook... The question is, is it necessary to learn lexical analysis, parsing and other stuff that usually precedes garbage collection in a text?
The lexical analysis, parsing and other stuff is not relevant to garbage collection. You might get an outdated cursory overview of garbage collection from a compiler book but you need to read research papers to get an up-to-date view, e.g. with regard to multicore.
In short, what are the prerequisites to learning about Garbage collection theory?
You need to know about basic graph theory, pointers, stacks, threads and (if you're interested in multi-threading) low-level concurrency primitives such as locks.
Garbage collection is all about determining reachability. When a program can no longer obtain a reference to a value because that value has become unreachable then the GC can recycle the memory that value is occupying. Reachability is determined by traversing the heap starting from a set of "global roots" (global variables and pointers on the thread's stacks and in the core's registers)
GC design has many facets but you might begin with the four main garbage collection algorithms:
- Mark-and-sweep (McCarthy, 1960)
- Mark-and-compact (Haddon and Waite, 1967)
- Stop-and-copy (Cheney, 1970)
- Mark-region (McKinley et al., 2007)
Perhaps the most notable evolution of these basic ideas is generational garbage collection, which was the defacto standard design for many years.
My personal feeling is that some of the obscure work on garbage collection conveys just as much useful information so I'd also highly recommend:
- Baker's treadmill (a beautiful real-time GC).
- VCGC (a completely different tri-color marking scheme).
You might also like to study the three kinds of write barrier (Dijkstra's, Steele's and Yuasa's) and look at the card marking and remembered set techniques.
Then you might also like to examine the actual design decisions some implementors chose for language implementations like Java and .NET as well as the SML/NJ compiler for Standard ML, the OCaml compiler, the Glasgow Haskell compiler and others. The differences between the techniques adopted are as big as the similarities between them!
There are also some great tangentially-related papers like Henderson's Accurate Garbage Collection in an Uncooperative Environment. I used that technique to avoid having to write a stack walker for HLVM.
The memorymanagement.org website is an invaluable resource, particularly the glossary of GC-related terms.
Read these papers in order. They are in progressive subject matter/difficulty order (not chronological).
List taken directly from Prof. Kathryn McKinley's Memory Management course page here, where you'll find links to all the articles.
I took the course last semester, so I read all these and I have to say I learned what I set out to learn!
Note that links to freely-available copies of most of the papers below are included in the garbage-collection tag wiki at https://stackoverflow.com/tags/garbage-collection/info.
- List processing in real time on a serial computer, Baker, CACM, 21(4) 280--294, 1978.
- A nonrecursive list compacting algorithm , Cheney, CACM, 13(11): 677--678, 1970.
- A Real-time garbage collector based on the lifetimes of objects, Lieberman & Hewitt, CACM, 26(6): 419--429, 1983.
- Generation scavenging: A non-disruptive high-performance storage reclamation algorithm, Ungar, Proceedings of the first ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, 1984, pages 157--167.
- Simple generational garbage collection and fast allocation, Appel, Software--Practice and Experience 19(2):171-183, February 1989.
- Age-Based Garbage Collection, D. Stefanovic, K. S. McKinley, J. E. B. Moss, ACM Conference on Object-Oriented Programming Systems, Languages and Applications. (OOPSLA), pp. 370--381. Denver CO, November 1999.
- Older-first Garbage Collection in Practice: Evaluation in a Java Virtual Machine, D. Stefanovic, M. Hertz, S. M. Blackburn, K. S. McKinley, and J. E. B. Moss, Memory System Performance, Berlin, Germany, pp. 175--184, June 2002.
- Beltway: Getting Around Garbage Collection Gridlock, S. M. Blackburn, R. Jones, K. S. McKinley, and J. E. B. Moss, ACM Conference on Programming Language Design and Implementation, Berlin, Germany, pp. 153--164, June 2002.
- An efficient machine-independent procedure for garbage collection in various list structures, Schorr & Waite, CACM, 10(8): 501--506, 1967.
- Comparison of compacting algorithms for garbage collection, Cohen & Nicolau, ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 5, Issue 4, pages 532--553, October 1983.
- MC2: High-Performance Garbage Collection for Memory-Constrained Environments, Sachindran, Berger & Moss, ACM Conference on Object-Oriented Programming Systems, Languages and Applications, pp. 81-96, Vancouver, BC, October 2004.
- Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance, Blackburn & McKinley, ACM Conference on Programming Language Design and Implementation, pp.22--32, Tucson, AZ, June 2008.
- An Efficient Incremental Automatic Garbage Collector, Deutsch & Bobrow, CACM, 19(9): 522--526, September 1976.
- Ulterior Reference Counting: Fast Garbage Collection without the Wait, S. M. Blackburn and K. S. McKinley , Proceedings of the ACM 2003 SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, pp. 344-359, Annehiem, CA, October 2003.
- Cycle Tracing: Efficient Concurrent Mark-Sweep Cycle Collection, Frampton & Blackburn, 2009. (In submission to ISMM.)
- Multiprocessing compactifying garbage collection, Guy L. Steele, Jr., CACM 18(9): 495-508, 1975.
- On-the-fly garbage collection: an exercise in cooperation, E. W. Dijkstra, L. Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens, Communications of the ACM, 21(11):966--975, November 1978.
- Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms, Vechev, Yahav, and Bacon, ACM Conference on Programming Language Design and Implementation, Ottawa, Ontario, pp. 341-353, 2006.
- A Real-time Garbage Collector with Low Overhead and Consistent Utilization, Bacon, Cheng, and Rajan, ACM Symposium on Principles of Programming Languages, New Orleans, Louisiana, pp. 285-298, 2003.
- Tax-and-spend: democratic scheduling for real-time garbage collection, Auerbach, Bacon, Cheng, Grove, Biron, Gracie, McCloskey, Micic, and Sciampacone, ACM International Conference On Embedded Software, Atlanta, GA, pp. 245-254, 2008.
- Garbage collection in an uncooperative environment, H. Boehm and M. Weiser, Software Practice and Experience, 18(9):807-820, 1988.
- Hoard: A Scalable Memory Allocator for Multithreaded Applications, E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson, The Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, pp. 117--128, November 2000.
- Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages,Jump & McKinley, In submission to ACM Transactions on Software Practice & Experience, 2009. (Abbreviated version appears in ACM Conference on Programming Languagesm, Nice, France, January 2009.)
- Leak Pruning, Bond & McKinley, ACM Conference on Architecture Support for Programming Languages and Operating Systems, Washington, DC, March 2009. (To appear.)
- Free-me: A Static Analysis for Individual Object Reclamation, Guyer & McKinley, ACM Conference on Programming Language Design and Implementation, Ottawa, Canada, pp. 364-375, June 2006.
- Garbage collection can be faster than stack allocation, Appel, Information Processing Letters 25(4):275-279, 17 June 1987.
- The Garbage Collection Advantage: Improving Program Locality Huang, Blackburn, McKinley, Moss, Wang, & Cheng, ACM Conference on Object-Oriented Programming Systems, Languages, & Applications, Vancouver, BC, pp. 69-80, October 2004.
- Demystifying Magic: High-level Low-level Programming, Daniel Frampton, Stephen M. Blackburn, Perry Cheng, Robin Garner, David P. Grove, J. Eliot B. Moss & Sergey I. Salishev. ACM International Conference on Virtual Execution Environments, Washington DC, March 2009. (To appear.)
- Myths and Realities: The Performance Impact of Garbage Collection, S. M. Blackburn, P. Cheng, and K. S. McKinley, ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, pp. 25--36, New York, NY, June 2004.
- A unified theory of garbage collection, Bacon, Cheng, & Rajan, ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, Vancouver, BC, Canada, pp. 50-68, 2004.