Data Structures with Golang - Part 3
Introduction
In part 1 we covered Lists and in part 2 we covered Trees. Now we are moving to one of the most used structures in real systems: Hash Tables.
If you have ever implemented caching, deduplication, counters, indexing, joins, session lookups or permission checks, you have probably depended on hash tables.
The core goal of this structure is simple: fast key-based access. In average scenarios, insert, lookup and delete are all O(1).
As in the previous articles, we will keep the same base interface:
type Container interface {
Empty() bool
Size() int
Clear()
Values() []interface{}
String() string
}
What is a Hash Table?
A Hash Table stores data as key-value pairs.
Instead of traversing nodes (like linked lists) or comparing through tree levels (like BST/AVL), it computes an index directly from a key using a hash function.
Simple flow:
- You provide a key (example:
"user:42"). - The hash function converts the key into an integer hash.
- Hash is mapped to an array index (bucket).
- Data is inserted/read in that bucket.
High-Level Visualization
flowchart LR
K["Key: user:42"] --> H["Hash Function"]
H --> M["mod bucketCount"]
M --> B["Bucket #3"]
B --> V["(user:42, profile)"]
Why collisions happen
Collisions are unavoidable because the key space is much larger than the bucket array.
Different keys can produce:
- the same hash, or
- different hashes that map to the same index after modulo.
Example:
hash("ab") = 100
hash("ba") = 116
bucketCount = 8
100 % 8 = 4
116 % 8 = 4
Both go to bucket[4] -> collision
So hash tables are not about avoiding collisions completely. They are about handling collisions efficiently.
Collision resolution strategies
The two most common approaches are:
1) Separate Chaining
Each bucket stores a collection (usually linked list or slice) of entries.
bucket[0] -> (a, 10) -> (k, 99)
bucket[1] -> (x, 2)
bucket[2] -> empty
bucket[3] -> (m, 8) -> (n, 13) -> (p, 21)
Pros:
- simple to implement
- deletion is straightforward
- performance degrades gradually
Cons:
- pointer/slice overhead
- cache locality may be worse depending on structure
2) Open Addressing
All entries stay in the array. On collision, probe another slot (linear probing, quadratic probing, double hashing).
Pros:
- better cache locality in many scenarios
- no per-bucket extra structure
Cons:
- deletion is trickier (tombstones)
- high load factors degrade quickly
In this article we will implement separate chaining, because it is easier to read and reason about.
Load factor and resizing
Load factor is:
alpha = size / bucketCount
As alpha grows, collisions increase, and average bucket length increases.
Typical strategy:
- choose a threshold (for example
0.75) - when
alphaexceeds threshold, create a larger bucket array (usually 2x) - reinsert all entries (rehash)
This resizing step is O(n), but it does not happen every operation, so amortized insertion still stays close to O(1).
Complexity summary
Average case (good hash + controlled load factor):
- Insert:
O(1) - Lookup:
O(1) - Delete:
O(1)
Worst case (all keys collapse into one bucket):
- Insert:
O(n) - Lookup:
O(n) - Delete:
O(n)
Practical Go implementation (separate chaining)
Below is a complete implementation with:
- string keys
- generic values
- dynamic resize
- update on existing keys
Containerinterface methods
type entry[V any] struct {
key string
value V
}
type HashTable[V any] struct {
buckets [][]entry[V]
size int
maxLoadFactor float64
}
func NewHashTable[V any](initialBuckets int) *HashTable[V] {
if initialBuckets < 8 {
initialBuckets = 8
}
return &HashTable[V]{
buckets: make([][]entry[V], initialBuckets),
maxLoadFactor: 0.75,
}
}
func (h *HashTable[V]) Empty() bool {
return h.size == 0
}
func (h *HashTable[V]) Size() int {
return h.size
}
func (h *HashTable[V]) Clear() {
h.buckets = make([][]entry[V], len(h.buckets))
h.size = 0
}
func (h *HashTable[V]) Values() []interface{} {
values := make([]interface{}, 0, h.size)
for _, bucket := range h.buckets {
for _, e := range bucket {
values = append(values, e.value)
}
}
return values
}
func (h *HashTable[V]) String() string {
pairs := make([]string, 0, h.size)
for _, bucket := range h.buckets {
for _, e := range bucket {
pairs = append(pairs, fmt.Sprintf("%s=%v", e.key, e.value))
}
}
return fmt.Sprintf("{%s}", strings.Join(pairs, ", "))
}
func (h *HashTable[V]) Put(key string, value V) {
if float64(h.size+1)/float64(len(h.buckets)) > h.maxLoadFactor {
h.resize(len(h.buckets) * 2)
}
index := h.bucketIndex(key)
bucket := h.buckets[index]
// Update existing key (size does not change)
for i := range bucket {
if bucket[i].key == key {
bucket[i].value = value
h.buckets[index] = bucket
return
}
}
// Insert new key
h.buckets[index] = append(bucket, entry[V]{key: key, value: value})
h.size++
}
func (h *HashTable[V]) Get(key string) (V, bool) {
index := h.bucketIndex(key)
bucket := h.buckets[index]
for i := range bucket {
if bucket[i].key == key {
return bucket[i].value, true
}
}
var zero V
return zero, false
}
func (h *HashTable[V]) Remove(key string) bool {
index := h.bucketIndex(key)
bucket := h.buckets[index]
for i := range bucket {
if bucket[i].key == key {
// remove by swapping with last (O(1) after lookup)
last := len(bucket) - 1
bucket[i] = bucket[last]
bucket = bucket[:last]
h.buckets[index] = bucket
h.size--
return true
}
}
return false
}
func (h *HashTable[V]) ContainsKey(key string) bool {
_, ok := h.Get(key)
return ok
}
func (h *HashTable[V]) Keys() []string {
keys := make([]string, 0, h.size)
for _, bucket := range h.buckets {
for _, e := range bucket {
keys = append(keys, e.key)
}
}
return keys
}
func (h *HashTable[V]) LoadFactor() float64 {
if len(h.buckets) == 0 {
return 0
}
return float64(h.size) / float64(len(h.buckets))
}
func (h *HashTable[V]) bucketIndex(key string) int {
hash := fnv.New64a()
_, _ = hash.Write([]byte(key))
return int(hash.Sum64() % uint64(len(h.buckets)))
}
func (h *HashTable[V]) resize(newBucketCount int) {
oldBuckets := h.buckets
h.buckets = make([][]entry[V], newBucketCount)
h.size = 0
for _, bucket := range oldBuckets {
for _, e := range bucket {
h.Put(e.key, e.value)
}
}
}
Imports used by this implementation:
import (
"fmt"
"hash/fnv"
"strings"
)
Resize visualization
flowchart TD
A["size=6, buckets=8, alpha=0.75"] --> B["insert one more key"]
B --> C["alpha exceeds threshold"]
C --> D["create new buckets (16)"]
D --> E["rehash all existing entries"]
E --> F["resume normal operations"]
Important implementation details
Updating existing keys
Put("k", old) then Put("k", new) should update the value, not duplicate key entries.
Hash function quality matters
Weak hash functions can generate clustered indexes and hurt performance.
In production:
- prefer stable and well-known hash functions
- avoid simplistic custom hashes unless really needed
Iteration order is not guaranteed
Like Go's native map, hash table iteration should usually be considered unordered.
If stable ordering is required, keep an extra ordered structure (like slice of keys or tree).
Memory behavior
Hash tables trade memory for speed:
- bucket array
- per-entry metadata
- temporary allocations during resize
This is often acceptable because lookup speed is the main goal.
Real-world use cases
- Caches: key to computed result
- Counting/frequency maps: key to occurrences
- Indexes: external identifier to internal object
- Sets: key with empty struct / bool value
- Joins/grouping: hash one side, probe the other
Hash Tables vs Trees
- Use Hash Table when your primary operation is exact key lookup.
- Use Balanced Tree when you need sorted order, range queries, predecessor/successor operations, or deterministic traversal.
Both are foundational and often used together in real systems.
Conclusion
Hash tables are one of the most practical data structures in software engineering because they offer fast average operations with a conceptually simple model: hash key, choose bucket, resolve collisions.
In this article we went deep on collisions, load factor, resizing and a complete Go implementation using separate chaining while still following the same Container interface used in the previous posts.
In the next article we can move to Heaps/Priority Queues, which are a great next step after hash-based lookup structures.