Go Internals - Garbage Collector

22/06/2026 · 12 min read

Lucas Lemos

Introduction

In Go Internals - Allocator we followed mallocgc as it handed out spans and size-class slots. When objects die, those slots do not go back through free(3). Instead, the garbage collector decides what is still reachable and sweeps the rest back into the allocator's pool of free slots.

That logic lives in src/runtime/mgc.go, mgcmark.go, mgcsweep.go, mbarrier.go, and mgcwork.go. Go 1.23 still uses a non-generational, non-compacting, concurrent mark-sweep collector with tri-color marking. The mutator keeps running during most of mark and all of sweep; short stop-the-world (STW) pauses bracket the parts where global invariants would break if goroutines kept storing pointers freely.

This article walks through that cycle — how roots become a marked graph, why write barriers exist, what the STW slices cost, and how GOGC sets the pace.

What the GC is responsible for

The allocator hands you memory. The GC answers one question: which heap objects are still reachable from roots? Everything else in a swept span becomes reusable slots.

Roots include:

goroutine stacks (scanned with the compiler-emitted stack maps covered in Memory)
global variables
registers and spill slots while a G is scanned at a safe point

The GC walks pointer edges from those roots through objects in spans. It does not refcount — cycles would leak, and atomic inc/dec on every pointer write would fight the scheduler's throughput goals.

flowchart LR
  R["roots: stacks, globals"] --> A["heap object A"]
  A --> B["heap object B"]
  A --> C["heap object C"]
  B --> D["heap object D"]
  X["unreachable object X"] -.->|"never marked"| SW["swept, slot reused"]

Unreachable objects can sit in mapped memory until sweep runs. That is why heap RSS (Resident Set Size) can stay flat after a traffic spike even though your service logic "released" everything — the allocator article's span retention and this article's sweep timing are the same story from two angles.

Tri-color marking

During mark, every heap object is conceptually one of three colors:

Color	Meaning
White	Not reached yet — garbage if still white when mark finishes
Grey	Reached, but its outgoing pointers may not be scanned yet
Black	Reached and scanned — children are shaded

The invariant the collector maintains: no black object holds a pointer to a white object. If that broke, a live object could be mistaken for garbage.

At mark start, everything is white. Roots are shaded grey. Workers pull grey objects, scan their pointer fields, shade referents grey, and turn the scanned object black. When no grey objects remain, anything still white is unreachable.
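The worklist loop described here can be sketched on a toy object graph. This is a minimal single-threaded model, not the runtime's code: the object type, the color constants, and the mark and buildHeap helpers are all hypothetical names, and the graph mirrors the small roots diagram from the previous section.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not reached yet
	grey               // reached, fields not scanned yet
	black              // reached and scanned
)

// object is a toy heap object: a name plus outgoing pointers.
type object struct {
	name string
	refs []*object
	col  color
}

// mark shades the roots grey, then drains the grey worklist until it is
// empty. It returns how many objects ended up black.
func mark(roots []*object) int {
	var work []*object
	shade := func(o *object) {
		if o != nil && o.col == white {
			o.col = grey
			work = append(work, o)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	marked := 0
	for len(work) > 0 {
		o := work[len(work)-1]
		work = work[:len(work)-1]
		for _, child := range o.refs {
			shade(child) // referents turn grey
		}
		o.col = black // all fields scanned
		marked++
	}
	return marked
}

// buildHeap builds A->B, A->C, B->D and an unreachable X.
func buildHeap() (root, x *object) {
	d := &object{name: "D"}
	c := &object{name: "C"}
	b := &object{name: "B", refs: []*object{d}}
	a := &object{name: "A", refs: []*object{b, c}}
	return a, &object{name: "X"}
}

func main() {
	a, x := buildHeap()
	n := mark([]*object{a})
	fmt.Println("marked:", n, "X still white:", x.col == white)
}
```

Anything still white after mark returns is what sweep would reclaim; here that is only X.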

stateDiagram-v2
  [*] --> White: heap at mark start
  White --> Grey: discovered from root or field
  Grey --> Black: pointer fields scanned
  White --> Black: noscan object marked without field scan

Noscan size classes from the allocator article skip field scanning — the mark bit is set and the object goes black immediately. That is a major win for []byte backing stores with no pointers.

Mark bits live in span metadata (the mspan struct, defined in mheap.go), not in object headers. The runtime tracks which spans are being marked and which have been swept.

One GC cycle: phases

A full cycle is easier to read as a timeline than as a single function name. Names below match comments and helpers in mgc.go.

sequenceDiagram
  participant M as mutator goroutines
  participant GC as GC workers
  Note over M,GC: sweep termination (brief STW)
  Note over M,GC: mark setup (same STW), enable barrier, enqueue root jobs
  M->>M: run with write barriers
  GC->>GC: concurrent mark
  M->>GC: mark assist on alloc
  Note over M,GC: mark termination (STW) — drain grey, flush workbufs
  GC->>GC: concurrent sweep spans
  M->>M: allocate into swept spans

Sweep termination (STW). Finish sweeping spans left over from the previous cycle so the heap is in a consistent state before a new mark.

Mark setup (STW). Still inside the same pause as sweep termination, and with mark state already reset for the new cycle, the runtime turns on the write barrier, enables mark assists, and enqueues root mark jobs (stacks, globals). Background mark workers start draining them as soon as the world restarts. This pause is the first STW figure that GODEBUG=gctrace=1 reports.

Concurrent mark. Mutator goroutines run with write barriers enabled. Dedicated workers (gcBgMarkWorker in mgc.go) and mark assist on allocating goroutines drain the grey frontier until the work buffers empty.

Mark termination (STW). Stop the world again, finish any remaining mark work, flush per-P gcWork buffers, and verify accounting. After this point the set of black objects is the live heap.

Concurrent sweep. Unmarked slots in spans become free for allocation again (mgcsweep.go). Allocation proceeds while sweep runs: a span is swept, by the allocating goroutine if necessary, before any slot is handed out from it.

Go does not compact the heap. Live objects stay where they were allocated; sweep only recycles gaps inside spans.

Write barriers: keeping the tri-color invariant

While mark runs concurrently with your code, a goroutine could store a pointer to a white object into a black object after the scanner already passed that field. Without help, that white object would never turn grey and would be reclaimed while still in use. That is a correctness bug (premature reclamation), not a memory leak.

Go uses a hybrid write barrier (combining ideas from Dijkstra and Yuasa), enabled for the whole concurrent mark phase. Compiler-inserted barrier calls sit on pointer writes — not on every field write, only where a pointer is stored into heap memory.

Rough effect on a pointer store *slot = ptr during mark:

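Shade the previous value in *slot (Yuasa side: do not hide the old subgraph).
Shade ptr (Dijkstra side: the new edge must not point at a white object the marker can no longer find). The design in mbarrier.go requires this only while the storing goroutine's stack is still grey; in practice the buffered barrier records both values on every barriered store.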
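The two shading steps can be sketched as a toy model. Everything here is hypothetical naming (node, gcMarking, greyQueue, writePointer), and the sketch shades both the old and the new value unconditionally, which is a conservative simplification rather than the runtime's exact lowering.

```go
package main

import "fmt"

// node is a toy heap object with one pointer slot.
type node struct {
	marked bool  // stand-in for the mark bit (grey or black)
	next   *node // the slot we store through
}

// gcMarking stands in for the runtime's "write barrier enabled" flag.
var gcMarking bool

// greyQueue collects objects shaded by the barrier for later scanning.
var greyQueue []*node

func shade(n *node) {
	if n != nil && !n.marked {
		n.marked = true
		greyQueue = append(greyQueue, n)
	}
}

// writePointer models "*slot = ptr" with a hybrid barrier.
func writePointer(slot **node, ptr *node) {
	if gcMarking {
		shade(*slot) // Yuasa-style: keep the old referent visible
		shade(ptr)   // Dijkstra-style: keep the new referent visible
	}
	*slot = ptr
}

func main() {
	old, fresh := &node{}, &node{}
	holder := &node{marked: true, next: old} // already scanned ("black")
	gcMarking = true
	writePointer(&holder.next, fresh)
	fmt.Println(old.marked, fresh.marked, len(greyQueue)) // true true 2
}
```

With gcMarking false the function degenerates to a plain store, which matches the point that barriers cost nothing outside the mark phase.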

The exact lowering is in mbarrier.go and the compiler's SSA write-barrier pass (cmd/compile/internal/ssa/writebarrier.go). You will not see barriers in normal source; they appear in -S assembly output as calls to the runtime.gcWriteBarrier family of buffered routines.

Barriers have a real CPU cost during mark. They are one reason pointer-heavy benchmarks slow down when the heap is large enough that mark overlaps the hot path and running with GOGC=off is not an option. Outside the mark phase, barriers are off and stores are plain moves.

Mark assist: allocation pays for marking

If goroutines allocated freely while mark workers fell behind, the grey frontier would never catch up and the live heap could grow without bound before mark termination.

Mark assist forces allocating goroutines to do mark work proportional to their allocation rate. mallocgc consults gcAssistAlloc in mgcmark.go — if the GC is behind pace, your make or new path also scans and shades objects before returning.

That ties GC CPU to allocation pressure in a predictable way: bursty allocators stall slightly in the allocator path instead of letting mark debt pile up until a long STW.

Background mark workers run on their own Ps when available (gcBgMarkWorker). Under heavy load you will see both background workers and mark assist in CPU profiles under runtime.gcDrain and friends.
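The bookkeeping behind mark assist can be pictured as a per-goroutine credit counter. The sketch below is a toy model under stated assumptions: goroutineState, alloc, and the fixed assistWorkPerByte ratio are hypothetical, whereas the real pacer recomputes its ratio continuously and lets goroutines steal credit from background workers.

```go
package main

import "fmt"

// goroutineState is a hypothetical stand-in for per-goroutine assist credit.
type goroutineState struct {
	assistBytes int64 // positive: credit, negative: debt
}

// assistWorkPerByte is an assumed pacing ratio: units of scan work owed
// per byte allocated while the goroutine is in debt.
const assistWorkPerByte = 0.5

// alloc charges size bytes to g and returns how much scan work the
// goroutine must perform before the allocation may return.
func alloc(g *goroutineState, size int64) (scanWork int64) {
	g.assistBytes -= size
	if g.assistBytes >= 0 {
		return 0 // still in credit: no assist needed
	}
	debt := -g.assistBytes
	scanWork = int64(float64(debt) * assistWorkPerByte)
	g.assistBytes = 0 // the debt is paid off by doing the scan work
	return scanWork
}

func main() {
	g := &goroutineState{assistBytes: 1024}
	fmt.Println(alloc(g, 512))  // 0: covered by credit
	fmt.Println(alloc(g, 1024)) // 256: 512 bytes of debt at ratio 0.5
}
```

The design point is that the cost lands on the goroutine that allocates, in proportion to how far it overdraws.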

Sweep: reclaiming spans for mallocgc

Mark answers "live or dead." Sweep answers "put dead slots back where mcache can pop them."

Sweep walks spans and turns each span's mark bits into its new allocation bitmap, so unmarked slots read as free to the allocator, then installs cleared mark bits for the next cycle. Fully empty spans go back to the page heap (mheap) the allocator article described. Sweep is concurrent: it does not need STW except when the next cycle needs a clean sweep state.

Until a span is swept, dead objects still look "busy" to the allocator's fast path. High allocation rates after a spike make allocating goroutines sweep spans themselves (proportional sweep, sometimes called sweep assist), so sweeping keeps pace with allocation and finishes before the next cycle starts.

Large-object spans and special cases follow parallel paths in mgcsweep.go, but the mental model is the same: if mark did not reach an object, its memory becomes eligible for reuse.
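As a mental model, sweeping one small-object span is a bitmap operation. The sketch below is a toy with hypothetical names (span, sweep, nextFree) and a 64-slot limit; it assumes the mark bitmap simply becomes the next allocation bitmap, which is the shape of the idea rather than the runtime's actual data layout.

```go
package main

import (
	"fmt"
	"math/bits"
)

// span is a toy span with up to 64 slots tracked by two bitmaps.
type span struct {
	nelems    int
	allocBits uint64 // 1 = slot in use
	markBits  uint64 // 1 = slot reached during the last mark
}

// sweep frees every allocated-but-unmarked slot: the mark bits become the
// new allocation bitmap and fresh (zero) mark bits are installed.
// It returns the number of slots freed.
func (s *span) sweep() int {
	freed := bits.OnesCount64(s.allocBits &^ s.markBits)
	s.allocBits = s.markBits
	s.markBits = 0
	return freed
}

// nextFree returns the index of the first free slot, or -1 if the span is full.
func (s *span) nextFree() int {
	for i := 0; i < s.nelems; i++ {
		if s.allocBits&(1<<uint(i)) == 0 {
			return i
		}
	}
	return -1
}

func main() {
	s := &span{nelems: 8, allocBits: 0b11111111, markBits: 0b10100101}
	fmt.Println(s.nextFree()) // -1: span looks full before sweep
	fmt.Println(s.sweep())    // 4 slots were unmarked
	fmt.Println(s.nextFree()) // 1: first reusable slot
}
```

Before sweep the span looks full even though half of it is dead, which is the "busy until swept" effect described above.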

STW: what actually pauses your goroutines

"Go has a low-latency GC" usually means most mark and all sweep are concurrent, not that STW disappeared.

STW happens when the runtime must observe a consistent global snapshot:

enabling or disabling the write barrier across all Ps
enqueueing root mark jobs at the start of mark (the stacks themselves are scanned during concurrent mark, one briefly paused goroutine at a time)
finalizing mark when work buffers must be drained without new barrier races

Typical STW times are sub-millisecond on many workloads. The pause is dominated by how quickly every P reaches a safe point, so a goroutine that is slow to preempt stretches it. Thousands of goroutines with deep stacks still add root-scanning work, but that cost lands in concurrent mark and mark assist rather than in the STW slices.

For latency-sensitive services, STW spikes often correlate with heap size and GOMAXPROCS, not only allocation rate. Profile with GODEBUG=gctrace=1 and execution traces from the Scheduler article to line up pauses with phase transitions.
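Pause history is also available from inside the process. A minimal sketch, assuming you only want the longest recent pause: it uses debug.ReadGCStats from runtime/debug, and the maxRecentPause helper is a hypothetical name introduced for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// maxRecentPause forces one collection, then returns the number of
// completed cycles and the longest pause in the recorded history.
func maxRecentPause() (cycles int64, longest time.Duration) {
	runtime.GC() // make sure at least one cycle has run
	var st debug.GCStats
	debug.ReadGCStats(&st)
	for _, p := range st.Pause { // most recent pause first
		if p > longest {
			longest = p
		}
	}
	return st.NumGC, longest
}

func main() {
	n, p := maxRecentPause()
	fmt.Printf("cycles=%d longest pause=%v\n", n, p)
}
```

In a real service you would drop the forced runtime.GC() call and sample these numbers periodically next to request latency.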

GOGC and pacing

GOGC is an environment variable (default 100). It controls when the next GC cycle starts, not how aggressively sweep runs inside a cycle.

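After a cycle completes, the runtime records the live heap size; call it L. The heap goal for the next cycle is roughly L + L * (GOGC/100), so GOGC=100 lets the heap grow to about 2× live. GOGC=200 allows about 3×; GOGC=50 targets about 1.5×. Since Go 1.18 the growth term also counts GC roots (goroutine stacks and globals), and the cycle is triggered somewhat below the goal so that mark can finish in time.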
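The arithmetic is small enough to check by hand. The heapGoal helper below is a hypothetical illustration, not a runtime function; it follows the documented shape of the goal, live + (live + roots) * GOGC/100, and passing zero for roots reduces it to the simpler L + L * (GOGC/100) form.

```go
package main

import "fmt"

// heapGoal returns the approximate heap goal in MB for a given live heap,
// GC root size (stacks + globals), and GOGC percentage.
func heapGoal(liveMB, rootsMB float64, gogc int) float64 {
	return liveMB + (liveMB+rootsMB)*float64(gogc)/100
}

func main() {
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d live=32MB goal=%.0fMB\n", gogc, heapGoal(32, 0, gogc))
	}
	// GOGC=50 live=32MB goal=48MB
	// GOGC=100 live=32MB goal=64MB
	// GOGC=200 live=32MB goal=96MB
}
```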

Lower GOGC → more frequent GC → less heap headroom, more CPU on mark/sweep. Higher GOGC → fewer cycles, larger heaps, longer mark work per cycle when it finally runs.

GOGC=off disables automatic cycles (you can still force runtime.GC()). Useful in benchmarks; risky in production unless you know nothing allocates after warm-up.

The pacing controller in mgcpacer.go picks the trigger point so that mark finishes before the heap reaches the goal. When it mis-estimates, mark assist rises: you pay in allocation latency instead of a bigger STW.

Memory limit (Go 1.19+)

debug.SetMemoryLimit and GOMEMLIMIT add a soft cap on total memory (heap + certain runtime structures). When live heap approaches the limit, the pacer makes GC more aggressive — effectively treating memory pressure like a lower GOGC.

This is not a hard malloc failure at the cap: the runtime tries to reclaim memory and return pages to the OS (runtime/debug documents the behavior). When tuning services that used to set GOGC very high, watch the goal and live-heap fields of GODEBUG=gctrace=1 alongside the limit.

Reading gctrace

GODEBUG=gctrace=1 ./myapp

Each line is one GC cycle. A typical row looks like:

gc 42 @12.345s 2%: 0.12+1.5+0.08 ms clock, 0.96+0.4/2.9/0.5+0.64 ms cpu, 64->64->32 MB, 96 MB goal, 0 MB stacks, 0 MB globals, 8 P

Useful fields:

gc 42: cycle number
@12.345s: time since process start
2%: percentage of time spent in GC since program start
0.12+1.5+0.08 ms clock: STW sweep termination + concurrent mark + STW mark termination, in wall-clock time; the cpu group covers the same phases in CPU time (see the runtime package documentation for the exact format)
64->64->32 MB: heap size at GC start → heap size at GC end → live (marked) heap
96 MB goal: heap size goal for this cycle
8 P: number of Ps (GOMAXPROCS) used
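If you collect these lines in logs, the heap fields are easy to pull out mechanically. The sketch below is an assumption-laden helper, not an official parser: heapSample and parseHeap are hypothetical names, and the regular expression only targets the heap sizes and goal, ignoring every other field.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// heapSample holds the heap fields of one gctrace line, in MB.
type heapSample struct {
	StartMB, EndMB, LiveMB, GoalMB int
}

// heapRe matches "64->64->32 MB, 96 MB goal".
var heapRe = regexp.MustCompile(`(\d+)->(\d+)->(\d+) MB, (\d+) MB goal`)

// parseHeap extracts the heap sizes and goal; ok is false if the line
// does not contain them.
func parseHeap(line string) (s heapSample, ok bool) {
	m := heapRe.FindStringSubmatch(line)
	if m == nil {
		return heapSample{}, false
	}
	var v [4]int
	for i := range v {
		v[i], _ = strconv.Atoi(m[i+1]) // groups are digits only
	}
	return heapSample{v[0], v[1], v[2], v[3]}, true
}

func main() {
	line := "gc 42 @12.345s 2%: 0.12+1.5+0.08 ms clock, 64->64->32 MB, 96 MB goal, 8 P"
	s, ok := parseHeap(line)
	fmt.Printf("%+v %v\n", s, ok)
}
```

Plotting LiveMB against GoalMB over time shows quickly whether you are fighting retention (live keeps climbing) or allocation rate (live is flat but cycles are frequent).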

Sudden jumps in the STW components with flat allocation often mean more stacks to scan or mark debt from a heap spike. Compare with alloc_space profiles from the Allocator article to see whether you are fighting allocation rate or retention.

A small retention experiment

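package main

import (
  "os"
  "runtime"
  "strconv"
)

// sink is a global root: everything appended here stays reachable.
var sink [][]byte

func main() {
  n, _ := strconv.Atoi(os.Getenv("N"))
  if n <= 0 {
    n = 100_000 // default when N is unset or invalid
  }
  for i := 0; i < n; i++ {
    b := make([]byte, 1024) // 1 KiB pointer-free (noscan) backing array
    sink = append(sink, b)  // the slice header stored in sink holds the pointer
  }
  runtime.GC() // force a final cycle so gctrace shows the retained heap
}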

Run with N=100000 and GODEBUG=gctrace=1. The live heap stays large because sink keeps every backing array reachable. Drop the sink append and nothing retains the buffers (escape analysis may even move them to the stack), so the same loop produces a completely different live column.
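The same effect can be measured without reading trace output. This is a rough sketch, assuming runtime.MemStats.HeapAlloc after a forced collection is a good enough proxy for the live heap; the retained variable and the liveHeap and measure helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// retained is a global root that keeps the buffers reachable.
var retained [][]byte

// liveHeap forces a full collection and reports HeapAlloc afterwards.
func liveHeap() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapAlloc
}

// measure allocates n 1 KiB buffers and returns the heap size while they
// are held and again after the only reference to them is dropped.
func measure(n int) (held, released uint64) {
	for i := 0; i < n; i++ {
		retained = append(retained, make([]byte, 1024))
	}
	held = liveHeap()
	retained = nil // cut the only path from a root
	released = liveHeap()
	return held, released
}

func main() {
	held, released := measure(20_000)
	fmt.Printf("held=%d KiB released=%d KiB\n", held>>10, released>>10)
}
```

Nothing was freed explicitly; the drop between the two numbers comes entirely from removing one edge in the graph.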

That is the graph the marker walks: not your variable names, but which pointers still exist in roots and object fields.

sync.Pool and GC

sync.Pool entries can be cleared at GC cycle start (sync package docs). Pool is not a way to opt out of marking — pooled objects are still on the heap while referenced. It reduces allocation rate so mark has fewer new objects to scan, at the cost of unpredictable clears when you might have wanted reuse.
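A typical use is recycling scratch buffers. The sketch below is illustrative only: bufPool and render are hypothetical names, and the code has to work whether Get returns a recycled buffer or a freshly built one, since the pool may be emptied at any time.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New runs when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a greeting with a pooled buffer instead of allocating a
// fresh one on every call.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a recycled buffer may still hold old contents
	defer bufPool.Put(buf)
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so returning it is safe
}

func main() {
	fmt.Println(render("gopher"))
	fmt.Println(render("runtime"))
}
```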

Tuning levers

Lever	Effect
Fewer heap pointers / less allocation	Smaller graph, less mark work — first resort
GOGC	Trade heap size vs GC frequency
GOMEMLIMIT / SetMemoryLimit	Soft cap on runtime memory; pacing gets more aggressive near the limit
runtime.GC()	Manual cycle; blocks until complete; rare in app code
GOMAXPROCS	More Ps → more parallel mark workers; also more Ps to bring to a stop at each STW

There is no supported flag to swap in a different collector. GODEBUG=gcstoptheworld=1 exists for debugging (forces more STW); do not use it in production.

GC vs allocator: end-to-end

flowchart LR
  A["escape → mallocgc"] --> B["object live in span"]
  B --> C["mark finds pointer path from root"]
  C --> D["object black / live"]
  B --> E["no path from roots"]
  E --> F["sweep recycles slot"]
  F --> G["mcache alloc reuses slot"]

Escape analysis decides allocation. The allocator places objects. The GC maintains the reachability graph and sweeps dead slots so the allocator can reuse them without manual free.

Conclusion

Go's garbage collector is a concurrent tri-color mark-sweep system: short STW pauses set up and finish mark, write barriers preserve the black-not-to-white invariant while goroutines run, and sweep feeds spans back to mallocgc. GOGC and the memory limit shape when cycles run and how hard the pacer pushes — not the per-allocation path from the Allocator article.

Next in the series is Built-in Types — how slice, string, and map headers look in memory and what the runtime does when you index, append, or range over them.