What kind of Garbage Collection does Go use?
This is the implementation of the GC:
https://github.com/golang/go/blob/master/src/runtime/mgc.go
From the docs in the source:
The GC runs concurrently with mutator threads, is type accurate (aka precise), allows multiple GC thread to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is non-generational and non-compacting. Allocation is done using size segregated per P allocation areas to minimize fragmentation while eliminating locks in the common case.
Go 1.8 GC might evolve again, with the proposal "Eliminate STW stack re-scanning"
As of Go 1.7, the one remaining source of unbounded and potentially non-trivial stop-the-world (STW) time is stack re-scanning.
We propose to eliminate the need for stack re-scanning by switching to a hybrid write barrier that combines a Yuasa-style deletion write barrier [Yuasa '90] and a Dijkstra-style insertion write barrier [Dijkstra '78].
Preliminary experiments show that this can reduce worst-case STW time to under 50µs, and this approach may make it practical to eliminate STW mark termination altogether.
The announcement is here and you can see the relevant source commit is d70b0fe and earlier.
Plans for Go 1.4+ garbage collector:
- hybrid stop-the-world/concurrent collector
- stop-the-world part limited by a 10ms deadline
- CPU cores dedicated to running the concurrent collector
- tri-color mark-and-sweep algorithm
- non-generational
- non-compacting
- fully precise
- incurs a small cost if the program is moving pointers around
- lower latency, but most likely also lower throughput, than Go 1.3 GC
Go 1.3 garbage collector updates on top of Go 1.1:
- concurrent sweep (results in smaller pause times)
- fully precise
Go 1.1 garbage collector:
- mark-and-sweep (parallel implementation)
- non-generational
- non-compacting
- mostly precise (except stack frames)
- stop-the-world
- bitmap-based representation
- zero-cost when the program is not allocating memory (that is: shuffling pointers around is as fast as in C, although in practice this runs somewhat slower than C because the Go compiler is not as advanced as C compilers such as GCC)
- supports finalizers on objects
- there is no support for weak references
Go 1.0 garbage collector:
- same as Go 1.1, but instead of being mostly precise the garbage collector is conservative. The conservative GC is able to ignore objects such as []byte.
Replacing the GC with a different one is controversial, for example:
- except for very large heaps, it is unclear whether a generational GC would be faster overall
- package "unsafe" makes it hard to implement fully precise GC and compacting GC
(For Go 1.8 - Q1 2017, see below)
The next Go 1.5 concurrent Garbage Collector involve being able to "pace" said gc.
Here is a proposal presented in this paper which might make it for Go 1.5, but also helps understand the gc in Go.
You can see the state before 1.5 (Stop The World: STW)
Prior to Go 1.5, Go has used a parallel stop-the-world (STW) collector.
While STW collection has many downsides, it does at least have predictable and controllable heap growth behavior.
(Photo from GopherCon 2015 presentation "Go GC: Solving the Latency Problem in Go 1.5")
The sole tuning knob for the STW collector was “GOGC”, the relative heap growth between collections. The default setting, 100%, triggered garbage collection every time the heap size doubled over the live heap size as of the previous collection:
GC timing in the STW collector.
Go 1.5 introduces a concurrent collector.
This has many advantages over STW collection, but it makes heap growth harder to control because the application can allocate memory while the garbage collector is running.
(Photo from GopherCon 2015 presentation "Go GC: Solving the Latency Problem in Go 1.5")
To achieve the same heap growth limit the runtime must start garbage collection earlier, but how much earlier depends on many variables, many of which cannot be predicted.
- Start the collector too early, and the application will perform too many garbage collections, wasting CPU resources.
- Start the collector too late, and the application will exceed the desired maximum heap growth.
Achieving the right balance without sacrificing concurrency requires carefully pacing the garbage collector.
GC pacing aims to optimize along two dimensions: heap growth, and CPU utilized by the garbage collector.
The design of GC pacing consists of four components:
- an estimator for the amount of scanning work a GC cycle will require,
- a mechanism for mutators to perform the estimated amount of scanning work by the time heap allocation reaches the heap goal,
- a scheduler for background scanning when mutator assists underutilize the CPU budget, and
- a proportional controller for the GC trigger.
The design balances two different views of time: CPU time and heap time.
- CPU time is like standard wall clock time, but passes
GOMAXPROCS
times faster.
That is, ifGOMAXPROCS
is 8, then eight CPU seconds pass every wall second and GC gets two seconds of CPU time every wall second.
The CPU scheduler manages CPU time.- The passage of heap time is measured in bytes and moves forward as mutators allocate.
The relationship between heap time and wall time depends on the allocation rate and can change constantly.
Mutator assists manage the passage of heap time, ensuring the estimated scan work has been completed by the time the heap reaches the goal size.
Finally, the trigger controller creates a feedback loop that ties these two views of time together, optimizing for both heap time and CPU time goals.