Hash Tables — Brief ☧
Deep version → | Lab → | Flashcards →
Q: In Revelation 20:12, the Book of Life is opened, and each person
is judged by whether their name is written in it. Imagine millions of
names in this book. A soul stands before the throne. How long does it
take to check whether their name is recorded?
A: That depends on how the book is organized. If the names are
written in the order people lived — no particular arrangement — the
angel must read from the beginning, name by name, until finding a
match (or reaching the end). That is a scan through a list:
O(N), where N is the number of
names. For millions of names, that is slow.
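The angel's name-by-name reading can be sketched in a few lines. This is an illustrative helper (the name `scan` is ours, not from the text): in the worst case it touches every entry.

```python
def scan(names, target):
    """Read name by name from the beginning: O(N) in the worst case."""
    for i, name in enumerate(names):
        if name == target:
            return i  # found: position of the name
    return -1  # reached the end without a match
```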
Q: Could we do better by sorting the names first?
A: Yes! If the names are sorted — aleph to tav — the angel can
use binary search: open to
the middle, decide if the name comes before or after, discard the
wrong half, repeat. That gives
O(log N) — about 20 steps for a
million names. Much better.
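The halving procedure above can be written out directly. A minimal sketch (function name `binary_search` is ours), assuming the list of names is already sorted:

```python
def binary_search(names, target):
    """Find target in a sorted list by repeated halving: O(log N)."""
    lo, hi = 0, len(names) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if names[mid] == target:
            return mid  # found the name
        elif names[mid] < target:
            lo = mid + 1  # target is in the upper half
        else:
            hi = mid - 1  # target is in the lower half
    return -1  # not recorded
```

Each pass through the loop discards half the remaining names, which is why a million entries need only about 20 steps.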
Q: Can we do even better than that?
A: What if we had a rule — a formula — that converts any name into
a page number? "Judah" always gives page 7. "Dan" always gives page 5.
To check a name, compute its page number and go directly there. No
searching at all. That formula is a
hash function, and the book
organized this way is a hash table.
The lookup is O(1) — it does
not grow with the number of names.
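One way to build such a formula is to turn the letters of a name into numbers and fold them into the page range. This is a toy sketch (the text does not specify the formula behind "Judah" → page 7; `page_number` and the character-sum rule are our illustration):

```python
def page_number(name, num_pages=26):
    """Toy hash function: sum the character codes, fold into the page range."""
    total = sum(ord(c) for c in name)
    return total % num_pages  # always the same page for the same name
```

The essential property is determinism: the same name always yields the same page, so lookup never requires searching.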
Q: But what if the formula sends two different names to the same
page?
A: That is a collision — like
two guests assigned to the same seat at a feast. There are two common
ways to handle it:
- Chaining: keep a short list of names at that page. When you arrive at page 7 and find both "Judah" and "Joel," scan the short list.
- Open addressing: if page 7 is taken, look at the next empty page.
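The chaining strategy can be sketched as a table of buckets, each holding a short list. A minimal illustration (the class `ChainedTable` is ours, using Python's built-in `hash` in place of a specific formula):

```python
class ChainedTable:
    """Minimal hash table with chaining: each bucket holds a short list."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)  # the "page number" for this key

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append(key_value := (key, value))  # collision: extend the chain

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:  # scan the short chain
            if k == key:
                return v
        return None
```

Even if two names land in the same bucket, the chain stays short when the hash function spreads keys evenly, so lookup remains effectively O(1).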
A good hash function spreads
names evenly across pages, making collisions rare. When collisions are
rare, lookup stays effectively O(1).
Q: How does this connect to our FPGA?
A: Hash consing is the heart of our term store. Every term
(like
cons(1, cons(2, nil))) is hashed to a unique integer ID. If two terms are structurally identical, they get the same ID —
guaranteed. So instead of walking two
trees to compare them node by node,
we just compare their integer IDs. Equality check becomes a single
comparison: O(1).
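The idea can be sketched with a plain dictionary. This is a software sketch of hash consing, not the FPGA term store itself (the class `TermStore` and its `intern` method are our names); children are passed by their already-interned IDs, so structurally identical terms always map to the same integer:

```python
class TermStore:
    """Minimal hash-consing sketch: structurally equal terms share one ID."""

    def __init__(self):
        self.ids = {}      # term structure -> integer ID
        self.next_id = 0

    def intern(self, head, *args):
        """Return the unique ID for head(args); args are child IDs or literals."""
        key = (head,) + args
        if key not in self.ids:
            self.ids[key] = self.next_id  # first time: assign a fresh ID
            self.next_id += 1
        return self.ids[key]
```

Building `cons(1, cons(2, nil))` twice yields the same integer, so structural equality reduces to comparing two IDs.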
The Key Operations
| Operation | Hash Table | List | Sorted Array |
|---|---|---|---|
| Lookup | O(1) | O(N) | O(log N) |
| Insert | O(1) | O(1) append | O(N) shift |
| Delete | O(1) | O(N) find | O(N) shift |
| Sorted order | O(N log N) | O(N log N) | O(N) scan |
The key insight from this comparison is that hash tables give you the best of all worlds for single-key lookups: O(1) for insert, lookup, and delete. The price you pay is that the data has no inherent order — if you need to scan entries in sorted sequence, a hash table cannot help you (you would need to sort first, which costs O(N log N)). But for the operations our FPGA cares about — "does this term exist?" and "what is the ID of this term?" — hash tables are the perfect fit.
Hash tables trade order for speed. When you need fast lookup by key
— whether names in the Book of Life or term IDs in a solver — hash
tables are the foundation.
Now that you understand the basic idea — a formula that converts keys to locations — the deep version will show you how hash consing works in practice, how our FPGA uses FNV-1a hashing for O(1) term lookup, and the engineering details of collision resolution.
Learn more in the deep version
Bitmasks | Sorting
Soli Deo Gloria