Memory Architecture — Brief ☧
Q: "Lay not up for yourselves treasures upon earth, where moth and
rust doth corrupt... but lay up for yourselves treasures in heaven"
(Matthew 6:19-20). In FPGA design, where do we lay up our data?
A: It depends on two things: how fast we need the data and how
much of it we have. Think about your own life — you keep your
phone in your pocket (tiny, instant access), important papers in a
desk drawer (more room, a few seconds to fetch), and old files in a
storage unit across town (huge capacity, takes a trip to retrieve).
Memory in an FPGA works the same way: it forms a hierarchy from
tiny-and-fast at the top to huge-and-slow at the bottom.
Q: What do the layers look like concretely?
A: Like the Temple with its nested courts — the Holy of Holies at
the center (most precious, smallest) out to the surrounding camps:
| Memory     | Capacity        | Latency            | Analogy                          |
|------------|-----------------|--------------------|----------------------------------|
| Flip-flops | Individual bits | 0 cycles (instant) | Your pocket — always right there |
| BRAM       | 36 Kbit blocks  | 1-2 cycles         | Your desk drawer                 |
| URAM       | 288 Kbit blocks | 1-2 cycles         | A filing cabinet nearby          |
| HBM        | 8 GB            | 80-100 cycles      | A warehouse in your building     |
| DDR        | 16-64 GB        | 100-200 cycles     | The storage unit across town     |

Notice the pattern: as capacity grows, so does latency. This is just
like the time-complexity tradeoffs you see in data structures — a tiny
array is instant to scan, while a hash table scales to far more data
at the cost of hashing and indirection on every access.
Q: Which layers does our design actually use?
A: Our domain data (the bitmask hierarchies) lives in
HBM — 8 GB of High Bandwidth Memory with ~460 GB/s bandwidth.
Each domain is a depth-3 hierarchy of increasingly detailed bitmasks:
L0 (8 bytes) + L1 (512 bytes) + L2 (32,768 bytes) = roughly 33 KB
per domain. HBM stores thousands of domains; BRAM caches the ones
currently being processed by each lane — like moving a file from the
warehouse to your desk while you work on it.
Our HBM Layout
HBM Address Space (8 GB)
├── Slot 0: Domain A (stride = 2,105,376 bytes)
├── Slot 1: Domain B
├── ...
└── Slot 255: Domain 255
Each slot reserves a fixed stride so that address calculation is simple
— the hardware computes base + slot * stride without needing a lookup
table, which keeps the algorithm running in
constant time per domain access.
Soli Deo Gloria