
Go Internals - Allocator

Lucas Lemos

Introduction

In Go Internals - Memory we saw escape analysis decide when a value must live on the heap. Once that decision is made, the runtime has to hand back a pointer fast enough that goroutine churn stays cheap.

That job belongs to the heap allocator in src/runtime/malloc.go, mheap.go, mcache.go, and mcentral.go. The design is inherited from tcmalloc: size classes, per-thread caches, and a central page heap. Go adapts it for a concurrent runtime with a tracing GC and one mcache per P.

This article walks through Go 1.23 allocator paths: what mallocgc does with a size, where locks appear, and why your profiles so often show mallocgc even when application code looks innocent.

Pages, spans, and the heap arena

The heap is not one giant malloc arena. The runtime carves virtual memory into pages of 8 KiB on typical 64-bit platforms (_PageShift = 13, defined in the generated sizeclasses.go). Pages are grouped into spans (mspan, defined in mheap.go): contiguous page runs that either hold many small objects of one size class or one large object.

flowchart TB
  subgraph arena["Heap arena (virtual memory)"]
    S1["mspan: 8 pages, size class 48 B"]
    S2["mspan: 1 page, tiny objects"]
    S3["mspan: N pages, single large object"]
    FREE["unused pages"]
  end
  S1 --> O1["obj obj obj obj ..."]
  S2 --> O2["16 B slots packed"]
  S3 --> O3["one allocation"]

Each span tracks:

  • how many pages it covers
  • which size class it serves (or 0 for large-object spans)
  • which slots inside the span are free (tracked with an allocation bitmap and a freeindex cursor rather than a linked list)
  • GC metadata (mark bits, sweep state — the Garbage Collector article picks up there)

Spans are the unit the allocator hands out and the GC later reclaims. When you read mheap.alloc in mheap.go, it is ultimately wiring pages into spans.
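
To make the bookkeeping concrete, here is a minimal sketch of a span as a run of pages cut into equal slots. toySpan, newToySpan, and alloc are hypothetical names invented for this illustration, not the runtime's mspan API, and the model never frees slots.

```go
package main

import "fmt"

const pageSize = 8 << 10 // 8 KiB runtime page, as described above

// toySpan is a hypothetical, simplified stand-in for the runtime's mspan.
type toySpan struct {
	npages    int    // pages covered by the span
	elemSize  int    // slot size; one size class per span
	nelems    int    // number of slots that fit in the span
	freeIndex int    // next slot to examine
	allocated []bool // one flag per slot (the runtime uses bitmaps instead)
}

func newToySpan(npages, elemSize int) *toySpan {
	n := npages * pageSize / elemSize
	return &toySpan{npages: npages, elemSize: elemSize, nelems: n, allocated: make([]bool, n)}
}

// alloc returns the byte offset of the next free slot,
// or false when the span has no free slot left.
func (s *toySpan) alloc() (int, bool) {
	for ; s.freeIndex < s.nelems; s.freeIndex++ {
		if !s.allocated[s.freeIndex] {
			i := s.freeIndex
			s.allocated[i] = true
			s.freeIndex++
			return i * s.elemSize, true
		}
	}
	return 0, false
}

func main() {
	s := newToySpan(1, 48) // one page of 48-byte slots
	first, _ := s.alloc()
	second, _ := s.alloc()
	fmt.Println(s.nelems, first, second)                 // 170 0 48
	fmt.Println(s.npages*pageSize - s.nelems*s.elemSize) // 32 bytes of tail waste
}
```

One 8 KiB page holds 170 slots of 48 bytes, with 32 bytes left over at the end of the span.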

Size classes: rounding up is intentional

Not every new(T) gets exactly sizeof(T) bytes. Small and medium objects route through size classes, a fixed table in sizeclasses.go (generated; do not edit by hand). The table starts 8, 16, 24, 32, 48, 64, so a 40-byte struct lands in the 48-byte class. You pay a few bytes of internal fragmentation so the allocator can hand out fixed-size slots instead of running a general-purpose heap.
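
The rounding can be sketched with the first few entries of the table. The excerpt below covers only the smallest classes, and roundToClass is a hypothetical helper written for this illustration; the runtime uses precomputed lookup arrays instead of a linear scan.

```go
package main

import "fmt"

// classSizes holds only the first entries of the runtime's size-class table;
// the full generated table in sizeclasses.go continues up to 32 KiB.
var classSizes = []int{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// roundToClass returns the smallest listed class that fits size (size > 0)
// and the bytes lost to rounding. ok is false if size exceeds this excerpt.
func roundToClass(size int) (class, waste int, ok bool) {
	for _, c := range classSizes {
		if size <= c {
			return c, c - size, true
		}
	}
	return 0, 0, false
}

func main() {
	for _, size := range []int{8, 33, 40, 100} {
		class, waste, _ := roundToClass(size)
		fmt.Printf("request %d B -> class %d B (%d B unused)\n", size, class, waste)
	}
}
```

A 33-byte request wastes 15 bytes in the 48-byte class, which is close to the worst case for the small classes listed here.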

Rough buckets mallocgc uses:

| Route | Size range (typical 64-bit) | What happens |
| --- | --- | --- |
| Tiny | < 16 bytes, no pointers | Packed into 16-byte blocks; several objects can share one block |
| Small | Up to 32 KiB (maxSmallSize), anything not tiny | Size-class lookup → mcache → maybe mcentral → maybe mheap |
| Large | > 32 KiB | Dedicated span(s) from mheap; size rounded to page multiples |

type node struct {
  next *node
  val  int
} // 16 bytes on 64-bit: small class, not tiny (has a pointer field)

Pointer-free objects smaller than 16 bytes can share a tiny block. Add a pointer field and the object leaves the tiny path even if the struct is still small.

Span classes also carry a noscan bit: each size class exists in a scan and a noscan flavor (spanClass in mheap.go), and if the objects contain no pointers the GC skips scanning them during marking. That is a real throughput win for []byte-heavy workloads when objects stay in noscan spans.

mallocgc: the front door

Compiler-generated code and runtime helpers call mallocgc(size, typ, needzero). In CPU profiles of allocation-heavy programs it often ranks above any user function name.

flowchart TD
  A["mallocgc(size, typ, needzero)"] --> B{"size == 0?"}
  B -->|yes| Z["zerobase / empty struct"]
  B -->|no| C{"tiny? (<16 B, noscan)"}
  C -->|yes| T["mcache.tiny allocator"]
  C -->|no| D{"size ≤ maxSmallSize?"}
  D -->|yes| E["size class index"]
  E --> F["mcache.alloc"]
  F -->|miss| G["mcentral.cacheSpan"]
  G -->|miss| H["mheap.alloc"]
  D -->|no| I["mcache.allocLarge → mheap.alloc"]

For small objects the hot path is: compute class → take the next free slot from the current P's mcache → return. No lock on that path if the cache already has a span with free slots.
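
The routing decision can be summarized in a few lines. This is a rough sketch: route is a hypothetical helper, the thresholds are the ones quoted in this article, and it ignores details such as allocation headers that shift the exact small/large boundary for some pointer-containing objects.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // tiny threshold used in this article
	maxSmallSize = 32 << 10 // 32 KiB
)

// route is a hypothetical helper that names the path a request would take.
// It only models the size and pointer checks, nothing else mallocgc does.
func route(size uintptr, hasPointers bool) string {
	switch {
	case size == 0:
		return "zero-size"
	case !hasPointers && size < maxTinySize:
		return "tiny"
	case size <= maxSmallSize:
		return "small"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(route(4, false))      // tiny
	fmt.Println(route(16, true))      // small: has a pointer field
	fmt.Println(route(4096, false))   // small
	fmt.Println(route(64<<10, false)) // large
}
```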

needzero tells the allocator whether memory must be zeroed before return. Fresh heap pages are zeroed; reused slots from a freed object may already be cleared or may need memclr depending on span state — the runtime tracks this so the language zero-value guarantee holds without clearing the whole heap on every allocation.

mcache: one cache per P

Each P (processor) owns an mcache: a structure holding one current span per span class plus the tiny allocator state. Because goroutine code runs only when its M holds a P, most allocations hit P-local cache state without a central lock.

flowchart LR
  subgraph P0["P₀"]
    C0["mcache₀"]
    C0 --> SC0["span for 32 B class"]
    C0 --> SC1["span for 96 B class"]
  end
  subgraph P1["P₁"]
    C1["mcache₁"]
  end
  G["running G on P₀"] -->|"mallocgc small"| C0

When mcache.alloc runs dry for a class, it calls into mcentral to refill. Refill attaches a span with free objects; the G keeps allocating from mcache until that span is exhausted.
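
A toy model shows why refills are rare compared to allocations. Everything below is hypothetical: central and cache are stand-ins, a span is reduced to a count of free slots, and the shared pool uses a plain mutex to keep the sketch short, which is not a claim about the runtime's own synchronization.

```go
package main

import (
	"fmt"
	"sync"
)

// central is a toy shared pool; a "span" is just a number of free slots.
type central struct {
	mu       sync.Mutex // plain mutex, chosen for brevity in this sketch
	slotsPer int
	refills  int
}

// cacheSpan hands out one span's worth of slots and counts the refill.
func (c *central) cacheSpan() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refills++
	return c.slotsPer
}

// cache is a toy per-worker cache. Only one goroutine ever touches it,
// so the fast path needs no synchronization.
type cache struct {
	free    int
	central *central
}

func (c *cache) alloc() {
	if c.free == 0 {
		c.free = c.central.cacheSpan() // slow path: shared state
	}
	c.free-- // fast path: local state only
}

// run starts one goroutine per worker and returns the total refill count.
func run(workers, allocsPerWorker, slotsPerSpan int) int {
	cent := &central{slotsPer: slotsPerSpan}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := &cache{central: cent}
			for i := 0; i < allocsPerWorker; i++ {
				local.alloc()
			}
		}()
	}
	wg.Wait()
	return cent.refills
}

func main() {
	fmt.Println(run(4, 1000, 170)) // 24 refills for 4000 allocations
}
```

With 170 slots per span, each worker touches shared state 6 times in 1000 allocations; every other allocation stays local.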

This is why GOMAXPROCS affects allocator scalability indirectly: each P brings its own mcache, so the lock-free fast path scales with the number of Ps, and shared state is touched only on refills. That holds up to the point where your workload is actually parallel.

mcentral: spans shared across Ps

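Each span class (a size class in its scan or noscan flavor) has one mcentral: shared sets of spans for that class, split into spans with free slots (partial) and spans without (full). mcentral.cacheSpan runs when an mcache needs a fresh span. It pops a span from the partial sets, sweeping unswept spans on the way, or asks mheap for new pages if none has free space. Since Go 1.15 these sets are lock-free spanSet structures rather than mutex-protected lists, so the heavyweight lock on this path is the global mheap lock taken when new pages are needed.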

sequenceDiagram
  participant G as G on P
  participant MC as mcache
  participant CENT as mcentral
  participant MH as mheap
  G->>MC: alloc 48 B
  MC->>MC: pop free slot
  Note over MC: span empty
  MC->>CENT: cacheSpan
  CENT->>CENT: pop span from partial set
  CENT-->>MC: mspan with free slots
  MC-->>G: pointer
  Note over CENT,MH: central lists empty
  CENT->>MH: alloc pages, build span
  MH-->>CENT: new mspan

Contention shows up here when many Ps allocate the same size class at once and the central sets stay empty, so every refill falls through to mheap and its global lock. That is less common than mcache hits in typical services, but microbenchmarks that hammer one struct size from all cores can make the refill path hot, which is worth knowing when a profile blames runtime.(*mcentral).cacheSpan.

mheap: pages and large objects

mheap is the global page heap. It owns free page runs, maps new arena memory from the OS when needed, and builds spans. Large allocations (> 32 KiB) bypass the per-class spans in mcache and mcentral: mcache.allocLarge asks mheap for enough pages, marks the span as large, and returns the object base address.

Large objects pay page rounding. A 42 KiB request on 8 KiB pages needs six pages (48 KiB), because five pages hold only 40 KiB. Internal fragmentation at page granularity is the trade-off for keeping large allocation simple, without a separate large-object free list on the hot path.
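
The page arithmetic is a ceiling division. A small sketch, assuming the 8 KiB page size used throughout this article; pagesFor is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

const pageSize = 8 << 10 // 8 KiB

// pagesFor returns the number of pages a large allocation occupies
// and the bytes lost to rounding up to a page multiple.
func pagesFor(size int) (pages, waste int) {
	pages = (size + pageSize - 1) / pageSize
	return pages, pages*pageSize - size
}

func main() {
	for _, size := range []int{32<<10 + 1, 42 << 10, 48 << 10} {
		p, w := pagesFor(size)
		fmt.Printf("%d bytes -> %d pages, %d bytes unused\n", size, p, w)
	}
}
```

The worst case sits just above a page boundary: a request of 32 KiB plus one byte occupies five pages and leaves 8191 bytes unused, while an exact multiple such as 48 KiB wastes nothing.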

When the process needs more heap, the runtime grows the arena through sysAlloc and the platform mapping code in mem.go. RSS climbs in steps; GODEBUG=gctrace=1 shows heap size but not per-allocation detail. For allocator behavior, pprof's alloc_space view is the better lens.

Tiny allocator: packing sub-16-byte values

The tiny path is easy to miss because it sits below both escape analysis and size classes. For pointer-free objects smaller than 16 bytes, mcache keeps a tiny block (16 bytes) and bumps a cursor through it:

type point struct {
  x, y int16 // 4 bytes total, noscan
}

func manyPoints() []*point {
  out := make([]*point, 1000)
  for i := range out {
    out[i] = &point{x: int16(i), y: int16(i)} // each escapes; tiny-eligible
  }
  return out
}

Several point values can live in one 16-byte tiny block before the runtime opens a new one. Once you add a *T or string field, the object is no longer tiny-eligible even if the struct is still small.

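The tiny path only applies to values that have already escaped to the heap; the win is packing density and noscan marking, not stack placement. The cost is that a tiny block is reclaimed only once every object packed into it is unreachable.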

What the compiler emits

Escape analysis does not call mallocgc itself. The compiler lowers heap allocations to calls like newobject, makemap, makeslice, or explicit mallocgc in the SSA backend (cmd/compile/internal/ssagen/ssa.go and friends). Each carries type information so the allocator can pick the right size class and scan/noscan behavior.

func newBox() *int {
  n := new(int) // lowered to typed heap allocation
  *n = 1
  return n
}

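make([]T, len, cap) allocates the backing array on the heap when the slice escapes (or when its size is not a compile-time constant or is too large for the stack); the three-word slice header may still live on the stack. new(T) follows the same rule: the language does not promise heap placement, and escape analysis decides. In newBox the pointer is returned, so the int lives on the heap.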
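
One way to check where a value ends up is to count allocations with testing.AllocsPerRun. The sketch below assumes the standard gc toolchain; localBox, measure, and sink are names made up for this example, and the noinline directives only keep the two cases easy to reason about.

```go
package main

import (
	"fmt"
	"testing"
)

var sink *int // package-level sink so the returned pointer stays reachable

//go:noinline
func newBox() *int {
	n := new(int)
	*n = 1
	return n // the pointer outlives the call, so the int is heap-allocated
}

//go:noinline
func localBox() int {
	n := new(int) // never leaves the function, so it can stay on the stack
	*n = 1
	return *n + 1
}

// measure reports the average allocations per call for both functions.
func measure() (escaping, local float64) {
	escaping = testing.AllocsPerRun(1000, func() { sink = newBox() })
	local = testing.AllocsPerRun(1000, func() { _ = localBox() })
	return escaping, local
}

func main() {
	escaping, local := measure()
	fmt.Println(escaping, local) // 1 0 with the gc toolchain
}
```

Both functions call new(int), yet only the one whose pointer escapes reaches mallocgc.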

Fragmentation and retention

Internal fragmentation — object smaller than its size class, or large alloc rounded to pages — is usually acceptable compared to lock contention on a single global heap.

External fragmentation (free memory scattered across spans that cannot satisfy a new large request) is handled cooperatively with the GC: when objects die, spans become sweepable and their slots become available again through mcentral and mcache. Until sweep runs, memory stays mapped. The next article covers tri-color marking and sweep; the relevant part here is that freeing is not free(3). Reclaimed slots are reused by the same size class before pages go back to the OS, and pages rarely return to the OS immediately.

Long-lived objects scattered across many spans keep each of those spans alive even if only one object in the span is live. That is span-level retention, not a leak in the Go sense, but it shows up as heap size plateaus in profiles.

sync.Pool and the allocator

sync.Pool is not a separate allocator. It caches already allocated objects per-P to skip mallocgc on hot paths. When the pool is empty, Pool.Get allocates normally; when GC runs, pool entries may be dropped (Pool documentation warns they can be cleared any time).

var bufPool = sync.Pool{
  New: func() interface{} {
    b := make([]byte, 4096) // heap backing array via makeslice
    return &b
  },
}

Pool reduces allocation rate; it does not change escape analysis results or size classes. Measure before sprinkling pools: they add retention and API friction.
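
A typical Get/Put cycle looks like the following sketch. bufferPool and greet are hypothetical names, and the pool holds *bytes.Buffer instead of the *[]byte shown above, purely to keep the example short.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufferPool hands out reusable buffers; New runs only when the pool is empty.
var bufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// greet builds a string in a pooled buffer and returns the buffer afterwards.
func greet(name string) string {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset()               // a pooled buffer may still hold old contents
	defer bufferPool.Put(buf) // hand it back for the next caller
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so the result is safe after Put
}

func main() {
	fmt.Println(greet("gopher")) // hello, gopher
	fmt.Println(greet("again"))  // hello, again
}
```

The Reset call matters: the pool gives no guarantee about the state of what Get returns, only that it was either created by New or previously passed to Put.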

Observing allocator behavior

Memory profiles point at the call sites that reach mallocgc; CPU profiles additionally show mallocgc itself and sometimes the mcache / mcentral refill path:

go test -bench=. -benchmem ./...
go test -memprofile=mem.prof -bench=BenchmarkHandler .
go tool pprof -http=:8080 -alloc_space mem.prof

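-alloc_space shows where bytes were allocated over the life of the program; -alloc_objects counts allocation events. An allocation attributed directly to your function (in a CPU profile, mallocgc one frame below it) means the compiler emitted heap traffic at that call site; pair with -gcflags="-m" from the Memory article to see why. Note that go test accepts -memprofile for a single package only, so profile one package at a time.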

For live heap size (retained, not cumulative alloc), use -inuse_space (or -sample_index=inuse_space) in pprof, or:

GODEBUG=gctrace=1 ./myapp

schedtrace does not break down allocator tiers; if you suspect contention on the refill path (mcentral or the mheap lock behind it), collect a mutex profile alongside a CPU profile:

go test -mutexprofile=mutex.prof -bench=.
go tool pprof -http=:8080 mutex.prof
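
For a quick check without a profiler, runtime.ReadMemStats exposes cumulative counters. The sketch below wraps it in a hypothetical helper, mallocsDuring; the count includes anything else the process allocates in the meantime, so treat it as a lower bound on the workload's own allocations.

```go
package main

import (
	"fmt"
	"runtime"
)

var keep [][]byte // package-level slice so the buffers escape to the heap

// mallocsDuring reports how many heap objects were allocated while f ran.
func mallocsDuring(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

func main() {
	n := mallocsDuring(func() {
		for i := 0; i < 1000; i++ {
			keep = append(keep, make([]byte, 64))
		}
	})
	fmt.Println(n >= 1000) // true: one object per buffer, plus slice growth
}
```

ReadMemStats stops the world briefly, so it suits tests and diagnostics better than hot paths.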

Allocator vs stack: when each path wins

| Situation | Typical path |
| --- | --- |
| Local variable, address does not escape | Stack (no mallocgc) |
| Returning *T, closure capture, value handed to a new goroutine | Heap via mallocgc |
| make(map...), growing slice backing store | Heap |
| Small noscan struct on heap | Tiny or small size class |
| Buffer > 32 KiB | Large span from mheap |

Reducing allocator work means reducing escapes first — different size classes are a second-order tweak. Choosing between value and pointer returns beats picking struct fields to hit a smaller class.

Tuning knobs (and what does not exist)

Go does not expose a glibc-style MALLOC_ARENA_MAX or a user-facing size-class table. Useful levers that touch allocation indirectly:

  • Fewer escapes: compiler -m, API design
  • GOGC: controls GC pacing, not malloc routing; affects how long dead objects linger before a cycle reclaims their slots
  • GOMAXPROCS: more Ps → more mcaches
  • debug.SetMemoryLimit (Go 1.19+): soft cap that makes the runtime return memory to the OS more aggressively under pressure; does not change per-allocation class selection

Do not expect a flag to "turn off" the tiny allocator or swap in system malloc for normal Go heap objects — the GC assumes span layout.

Conclusion

The Go heap allocator routes mallocgc through size classes and spans: hot small allocations come from the current P's mcache without locks; refills hit mcentral; new pages and large objects go through mheap. Escape analysis from the Memory article decides whether you allocate; this layer decides how fast and with how much fragmentation.

Next in the series is Garbage Collector — how mark and sweep reclaim those spans, what write barriers cost, and what GOGC actually controls.