1.5

Hash Tables

Hash consing, amortized O(1), term identity preservation.

Hash Tables — Brief ☧

Deep version → | Lab → | Flashcards →


Q: In Revelation 20:12, the Book of Life is opened, and each person is judged by whether their name is written in it. Imagine millions of names in this book. A soul stands before the throne. How long does it take to check whether their name is recorded?

A: That depends on how the book is organized. If the names are written in the order people lived — no particular arrangement — the angel must read from the beginning, name by name, until finding a match (or reaching the end). That is a scan through a list: O(N), where N is the number of names. For millions of names, that is slow.

Q: Could we do better by sorting the names first?

A: Yes! If the names are sorted — aleph to tav — the angel can use binary search: open to the middle, decide if the name comes before or after, discard the wrong half, repeat. That gives O(log N) — about 20 steps for a million names. Much better.
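The angel's binary search can be sketched in a few lines; the roster of names here is purely illustrative:

```python
import bisect

# Hypothetical sorted roster (illustration only).
names = sorted(["Asher", "Dan", "Gad", "Joel", "Judah", "Levi", "Naphtali"])

def is_recorded(name: str) -> bool:
    """Binary search: O(log N) comparisons on a sorted list."""
    i = bisect.bisect_left(names, name)   # index where name would be inserted
    return i < len(names) and names[i] == name
```

Doubling the roster adds only one more comparison, which is why a million names take about 20 steps.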

Q: Can we do even better than that?

A: What if we had a rule — a formula — that converts any name into a page number? "Judah" always gives page 7. "Dan" always gives page 5. To check a name, compute its page number and go directly there. No searching at all. That formula is a hash function, and the book organized this way is a hash table. The lookup is O(1) — it does not grow with the number of names.
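A toy version of such a formula might fold a name's characters into a page number. This sketch is illustrative only — real tables use stronger hash functions — and the page count of 16 is an arbitrary choice:

```python
def page_of(name: str, num_pages: int = 16) -> int:
    """Toy hash function: fold character codes into a page number."""
    h = 0
    for ch in name:
        # simple polynomial rolling hash, kept to 32 bits
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h % num_pages
```

The same name always yields the same page, so a lookup is one computation regardless of how many names the book holds.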

Q: But what if the formula sends two different names to the same page?

A: That is a collision — like two guests assigned to the same seat at a feast. There are two common ways to handle it:

  • Chaining: keep a short list of names at that page. When you arrive at page 7 and find both "Judah" and "Joel," scan the short list.
  • Open addressing: if page 7 is taken, look at the next empty page.

A good hash function spreads names evenly across pages, making collisions rare. When collisions are rare, lookup stays effectively O(1).
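The chaining strategy can be sketched in a few lines (a minimal illustration, not production code — in real Python you would simply use the built-in dict, which handles all of this internally):

```python
class ChainedTable:
    """Minimal separate-chaining hash table (sketch only)."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Hash the key down to a bucket index ("page number").
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))    # collision or empty: extend the chain

    def lookup(self, key):
        for k, v in self._bucket(key):  # scan only the short chain
            if k == key:
                return v
        return None
```

With a good hash function the chains stay short, so the scan inside a bucket is a constant-time detail, not an O(N) search.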

Q: How does this connect to our FPGA?

A: Hash consing is the heart of our term store. Every term (like cons(1, cons(2, nil))) is hashed to a unique integer ID. If two terms are structurally identical, they get the same ID — guaranteed. So instead of walking two trees to compare them node by node, we just compare their integer IDs. Equality check becomes a single comparison: O(1).
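The idea can be mirrored in plain Python (a sketch only — the FPGA term store is hardware, and the helper names here are invented for illustration):

```python
# Hash consing sketch: structurally identical terms receive the same ID,
# so equality becomes a single integer comparison.
_table: dict = {}   # structural key -> integer ID
_next_id = 0

def intern(op, *args):
    """Return the unique ID for the term op(*args)."""
    global _next_id
    key = (op, args)          # args are themselves IDs, so the key is flat
    if key not in _table:
        _table[key] = _next_id
        _next_id += 1
    return _table[key]

nil = intern("nil")
a = intern("cons", intern("1"), intern("cons", intern("2"), nil))
b = intern("cons", intern("1"), intern("cons", intern("2"), nil))
# a == b holds: one integer comparison, no tree walk.
```

Because subterms are interned bottom-up, each dictionary key is a flat tuple of IDs, so hashing a term never requires traversing its whole tree again.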

The Key Operations

| Operation    | Hash Table | List        | Sorted Array |
|--------------|------------|-------------|--------------|
| Lookup       | O(1)       | O(N)        | O(log N)     |
| Insert       | O(1)       | O(1) append | O(N) shift   |
| Delete       | O(1)       | O(N) find   | O(N) shift   |
| Sorted order | O(N log N) | O(N log N)  | O(N) scan    |

The key insight from this comparison is that hash tables give you the best of all worlds for single-key lookups: O(1) for lookup and delete, and amortized O(1) for insert (occasional resizes are spread across many operations). The price you pay is that the data has no inherent order — if you need to scan entries in sorted sequence, a hash table cannot help you (you would need to sort first, which costs O(N log N)). But for the operations our FPGA cares about — "does this term exist?" and "what is the ID of this term?" — hash tables are the perfect fit.

Hash tables trade order for speed. When you need fast lookup by key — whether names in the Book of Life or term IDs in a solver — hash tables are the foundation.

Now that you understand the basic idea — a formula that converts keys to locations — the deep version will show you how hash consing works in practice, how our FPGA uses FNV-1a hashing for O(1) term lookup, and the engineering details of collision resolution.
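As a preview of the deep version, the 32-bit FNV-1a algorithm itself is short: XOR each byte into the running hash, then multiply by the FNV prime.

```python
def fnv1a_32(data: bytes) -> int:
    """FNV-1a hash, 32-bit variant."""
    h = 2166136261                       # FNV offset basis (32-bit)
    for byte in data:
        h ^= byte                        # fold the byte in first (the "1a" order)
        h = (h * 16777619) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h
```

XOR-then-multiply (rather than multiply-then-XOR, which is plain FNV-1) gives better dispersion for short keys, and the whole loop is just an XOR and a multiply per byte — easy to pipeline in hardware.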

Learn more in the deep version

Related: Trees | Big-O

Soli Deo Gloria
