Hashing & Hash Tables

Hash tables are the most frequently used data structure in performance-critical code after arrays. Their speed is not a mystery: it is the result of careful engineering around cache lines, load factors, and probe sequences.

Overview

Technique	Lookup	Insert	Delete	Memory	Cache	Notes
Separate chaining	O(1) exp	O(1) exp	O(1) exp	O(n) + overhead	Poor	Pointers scatter memory; simple
Linear probing	O(1) exp	O(1) exp	O(1) exp	O(n)	Good	Primary clustering; best cache behavior
Robin Hood	O(1) exp	O(1) exp	O(1) exp	O(n)	Good	Bounded probe distance; deletion via tombstones
Cuckoo hashing	O(1) worst	O(1) exp	O(1) worst	O(n)	Moderate	Two tables + rehash on cycle
Swiss Tables (F14)	O(1) exp	O(1) exp	O(1) exp	O(n)	Excellent	SIMD group probing; flat layout

Hash Function Quality

Before table layout matters, the hash function must distribute keys uniformly. A poor hash function can turn any open-addressing table into a linked list.

What makes a hash function fast in C++

Avoid modulo by prime. Prefer power-of-two table sizes and bitmasking (hash & (size - 1)). This is one integer op instead of a division.
Avalanche property. A single-bit change in the key should flip approximately half the bits in the hash output.
Specialization for common types. Integer hashing should mix the value; string hashing should process the string in word-sized chunks.

cpp

// A fast, decent integer hash (MurmurHash3 finalizer)
uint64_t hash_u64(uint64_t x) {
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  return x;
}

For strings, the repository uses a FNV-1a variant or hardware CRC32 where available. See examples/02-memory-cache/ for a benchmark comparing hash functions on the same key distribution.

Linear Probing and Its Limits

Linear probing is the simplest open-addressing scheme: if slot h(key) is occupied, try (h(key) + 1) % M, then +2, and so on.

Why it is fast

Cache-friendly. Probing a contiguous array means prefetchers and cache lines work in your favor.
Simple code. No extra indirections, no pointer chasing.

Why it can fail

Primary clustering. Contiguous runs of occupied slots form, increasing average probe distance.
Sensitive to load factor. Above 0.7–0.75, performance degrades rapidly.

cpp

// Simplified linear-probing lookup
Value* find(const Key& key) {
  size_t idx = hash(key) & mask;
  while (states[idx] != EMPTY) {
    if (states[idx] == OCCUPIED && keys[idx] == key) return &values[idx];
    idx = (idx + 1) & mask;
  }
  return nullptr;
}

Robin Hood Hashing

Robin Hood hashing bounds the maximum probe distance by displacing existing entries when a new entry would have a longer probe sequence.

The Robin Hood principle

When inserting, if the new element's distance-from-ideal-slot is greater than the existing element's distance, swap them and continue probing with the displaced element. This keeps the variance of probe distances low.

Deletion

Standard open-addressing deletion requires tombstones. Robin Hood tables can use backward shifting: on deletion, scan forward and shift elements back to their ideal positions if possible.

SIMD-accelerated Robin Hood

Some implementations (e.g., ska::flat_hash_map) store metadata bytes (hash fragments) in a separate parallel array. Lookup first compares 16 metadata bytes at once with SSE/NEON, then only checks keys on matching fragments. This reduces cache misses and key comparisons.

Swiss Tables and F14

Facebook's F14 and Abseil's Swiss Tables represent the current state of the art for general-purpose hash tables in C++.

Key design ideas

Group probing. The table is divided into groups of 16 slots. A SIMD comparison checks all 16 control bytes in parallel to find candidates.
Flat layout. Keys and values are stored inline in a single array; no separate chaining or pointer indirection.
Two-tier storage. F14 uses a small metadata array (control bytes) and a separate value array, allowing values to be reallocated without moving keys.

cpp

// Conceptual SIMD group probe (SSE2/NEON)
// control_bytes: 16 bytes, each is (hash_fragment | special_states)
// target: 8-bit hash fragment of key
uint16_t match = simd_eq_16x8(control_bytes, target);
while (match) {
  int slot = ctz(match);  // count trailing zeros
  if (keys[slot] == key) return &values[slot];
  match &= match - 1;     // clear lowest set bit
}

When to use Swiss Tables

General-purpose maps where lookup dominates.
When absl::flat_hash_map or folly::F14VectorMap is already available as a dependency.
When the repository's teaching-sized implementation is not enough for production workloads.

Practical Selection Guide

Hash Table Selection

Benchmark Notes

The repository benchmarks hash tables with:

Insertion burst. Insert N elements, measure cycles/insertion.
Lookup mix. 80% successful lookups, 20% misses (typical cache pattern).
Deletion pattern. Delete half, re-insert half, measure rehash rate.
Memory footprint. RSS at peak and after steady state.

All benchmarks run with perf stat -e cycles,cache-misses,branch-misses.

References

Celis, P. (1985). Robin Hood Hashing. PhD thesis, University of WaterlooLink
Leiserson, C. E. et al. (2020). F14: A Memory-Efficient Hash Table. Facebook EngineeringLink
Lemire, D. (2018). Fast Random Integer Generation in an Interval. ACM Trans. Model. Comput. Simul.Link
Abseil Authors (2024). Swiss Tables Design Notes. Abseil C++ DocumentationLink

Hashing & Hash Tables ​

Overview ​

Hash Function Quality ​

What makes a hash function fast in C++ ​

Linear Probing and Its Limits ​

Why it is fast ​

Why it can fail ​

Robin Hood Hashing ​

The Robin Hood principle ​

Deletion ​

SIMD-accelerated Robin Hood ​

Swiss Tables and F14 ​

Key design ideas ​

When to use Swiss Tables ​

Practical Selection Guide ​

Benchmark Notes ​

References ​

References

Hashing & Hash Tables

Overview

Hash Function Quality

What makes a hash function fast in C++

Linear Probing and Its Limits

Why it is fast

Why it can fail

Robin Hood Hashing

The Robin Hood principle

Deletion

SIMD-accelerated Robin Hood

Swiss Tables and F14

Key design ideas

When to use Swiss Tables

Practical Selection Guide

Benchmark Notes

References