Back to home

Data Structures with Golang - Part 3

7 min read
Cover Image for Data Structures with Golang - Part 3
Lucas LemosLucas Lemos

Introduction

In part 1 we covered Lists and in part 2 we covered Trees. Now we are moving to one of the most used structures in real systems: Hash Tables.

If you have ever implemented caching, deduplication, counters, indexing, joins, session lookups or permission checks, you have probably depended on hash tables.

The core goal of this structure is simple: fast key-based access. In average scenarios, insert, lookup and delete are all O(1).

As in the previous articles, we will keep the same base interface:

type Container interface {
  Empty() bool
  Size() int
  Clear()
  Values() []interface{}
  String() string
}

What is a Hash Table?

A Hash Table stores data as key-value pairs.

Instead of traversing nodes (like linked lists) or comparing through tree levels (like BST/AVL), it computes an index directly from a key using a hash function.

Simple flow:

  1. You provide a key (example: "user:42").
  2. The hash function converts the key into an integer hash.
  3. Hash is mapped to an array index (bucket).
  4. Data is inserted/read in that bucket.

High-Level Visualization

flowchart LR
  K["Key: user:42"] --> H["Hash Function"]
  H --> M["mod bucketCount"]
  M --> B["Bucket #3"]
  B --> V["(user:42, profile)"]

Why collisions happen

Collisions are unavoidable because the key space is much larger than the bucket array.

Different keys can produce:

  • the same hash, or
  • different hashes that map to the same index after modulo.

Example:

hash("ab") = 100
hash("ba") = 116

bucketCount = 8

100 % 8 = 4
116 % 8 = 4

Both go to bucket[4] -> collision

So hash tables are not about avoiding collisions completely. They are about handling collisions efficiently.

Collision resolution strategies

The two most common approaches are:

1) Separate Chaining

Each bucket stores a collection (usually linked list or slice) of entries.

bucket[0] -> (a, 10) -> (k, 99)
bucket[1] -> (x, 2)
bucket[2] -> empty
bucket[3] -> (m, 8) -> (n, 13) -> (p, 21)

Pros:

  • simple to implement
  • deletion is straightforward
  • performance degrades gradually

Cons:

  • pointer/slice overhead
  • cache locality may be worse depending on structure

2) Open Addressing

All entries stay in the array. On collision, probe another slot (linear probing, quadratic probing, double hashing).

Pros:

  • better cache locality in many scenarios
  • no per-bucket extra structure

Cons:

  • deletion is trickier (tombstones)
  • high load factors degrade quickly

In this article we will implement separate chaining, because it is easier to read and reason about.

Load factor and resizing

Load factor is:

alpha = size / bucketCount

As alpha grows, collisions increase, and average bucket length increases.

Typical strategy:

  • choose a threshold (for example 0.75)
  • when alpha exceeds threshold, create a larger bucket array (usually 2x)
  • reinsert all entries (rehash)

This resizing step is O(n), but it does not happen every operation, so amortized insertion still stays close to O(1).

Complexity summary

Average case (good hash + controlled load factor):

  • Insert: O(1)
  • Lookup: O(1)
  • Delete: O(1)

Worst case (all keys collapse into one bucket):

  • Insert: O(n)
  • Lookup: O(n)
  • Delete: O(n)

Practical Go implementation (separate chaining)

Below is a complete implementation with:

  • string keys
  • generic values
  • dynamic resize
  • update on existing keys
  • Container interface methods
type entry[V any] struct {
  key   string
  value V
}

type HashTable[V any] struct {
  buckets       [][]entry[V]
  size          int
  maxLoadFactor float64
}

func NewHashTable[V any](initialBuckets int) *HashTable[V] {
  if initialBuckets < 8 {
    initialBuckets = 8
  }
  return &HashTable[V]{
    buckets:       make([][]entry[V], initialBuckets),
    maxLoadFactor: 0.75,
  }
}

func (h *HashTable[V]) Empty() bool {
  return h.size == 0
}

func (h *HashTable[V]) Size() int {
  return h.size
}

func (h *HashTable[V]) Clear() {
  h.buckets = make([][]entry[V], len(h.buckets))
  h.size = 0
}

func (h *HashTable[V]) Values() []interface{} {
  values := make([]interface{}, 0, h.size)
  for _, bucket := range h.buckets {
    for _, e := range bucket {
      values = append(values, e.value)
    }
  }
  return values
}

func (h *HashTable[V]) String() string {
  pairs := make([]string, 0, h.size)
  for _, bucket := range h.buckets {
    for _, e := range bucket {
      pairs = append(pairs, fmt.Sprintf("%s=%v", e.key, e.value))
    }
  }
  return fmt.Sprintf("{%s}", strings.Join(pairs, ", "))
}

func (h *HashTable[V]) Put(key string, value V) {
  if float64(h.size+1)/float64(len(h.buckets)) > h.maxLoadFactor {
    h.resize(len(h.buckets) * 2)
  }

  index := h.bucketIndex(key)
  bucket := h.buckets[index]

  // Update existing key (size does not change)
  for i := range bucket {
    if bucket[i].key == key {
      bucket[i].value = value
      h.buckets[index] = bucket
      return
    }
  }

  // Insert new key
  h.buckets[index] = append(bucket, entry[V]{key: key, value: value})
  h.size++
}

func (h *HashTable[V]) Get(key string) (V, bool) {
  index := h.bucketIndex(key)
  bucket := h.buckets[index]

  for i := range bucket {
    if bucket[i].key == key {
      return bucket[i].value, true
    }
  }

  var zero V
  return zero, false
}

func (h *HashTable[V]) Remove(key string) bool {
  index := h.bucketIndex(key)
  bucket := h.buckets[index]

  for i := range bucket {
    if bucket[i].key == key {
      // remove by swapping with last (O(1) after lookup)
      last := len(bucket) - 1
      bucket[i] = bucket[last]
      bucket = bucket[:last]
      h.buckets[index] = bucket
      h.size--
      return true
    }
  }

  return false
}

func (h *HashTable[V]) ContainsKey(key string) bool {
  _, ok := h.Get(key)
  return ok
}

func (h *HashTable[V]) Keys() []string {
  keys := make([]string, 0, h.size)
  for _, bucket := range h.buckets {
    for _, e := range bucket {
      keys = append(keys, e.key)
    }
  }
  return keys
}

func (h *HashTable[V]) LoadFactor() float64 {
  if len(h.buckets) == 0 {
    return 0
  }
  return float64(h.size) / float64(len(h.buckets))
}

func (h *HashTable[V]) bucketIndex(key string) int {
  hash := fnv.New64a()
  _, _ = hash.Write([]byte(key))
  return int(hash.Sum64() % uint64(len(h.buckets)))
}

func (h *HashTable[V]) resize(newBucketCount int) {
  oldBuckets := h.buckets
  h.buckets = make([][]entry[V], newBucketCount)
  h.size = 0

  for _, bucket := range oldBuckets {
    for _, e := range bucket {
      h.Put(e.key, e.value)
    }
  }
}

Imports used by this implementation:

import (
  "fmt"
  "hash/fnv"
  "strings"
)

Resize visualization

flowchart TD
  A["size=6, buckets=8, alpha=0.75"] --> B["insert one more key"]
  B --> C["alpha exceeds threshold"]
  C --> D["create new buckets (16)"]
  D --> E["rehash all existing entries"]
  E --> F["resume normal operations"]

Important implementation details

Updating existing keys

Put("k", old) then Put("k", new) should update the value, not duplicate key entries.

Hash function quality matters

Weak hash functions can generate clustered indexes and hurt performance.

In production:

  • prefer stable and well-known hash functions
  • avoid simplistic custom hashes unless really needed

Iteration order is not guaranteed

Like Go's native map, hash table iteration should usually be considered unordered.

If stable ordering is required, keep an extra ordered structure (like slice of keys or tree).

Memory behavior

Hash tables trade memory for speed:

  • bucket array
  • per-entry metadata
  • temporary allocations during resize

This is often acceptable because lookup speed is the main goal.

Real-world use cases

  • Caches: key to computed result
  • Counting/frequency maps: key to occurrences
  • Indexes: external identifier to internal object
  • Sets: key with empty struct / bool value
  • Joins/grouping: hash one side, probe the other

Hash Tables vs Trees

  • Use Hash Table when your primary operation is exact key lookup.
  • Use Balanced Tree when you need sorted order, range queries, predecessor/successor operations, or deterministic traversal.

Both are foundational and often used together in real systems.

Conclusion

Hash tables are one of the most practical data structures in software engineering because they offer fast average operations with a conceptually simple model: hash key, choose bucket, resolve collisions.

In this article we went deep on collisions, load factor, resizing and a complete Go implementation using separate chaining while still following the same Container interface used in the previous posts.

In the next article we can move to Heaps/Priority Queues, which are a great next step after hash-based lookup structures.