4.4 Memory: BRAM, HBM, DDR

Embedded RAM, HBM2 (8 GB, ~460 GB/s), dual-port BRAM.

Memory Architecture — Brief ☧

Deep version → | Next: AXI Protocol → | Back: Timing Closure →


Q: "Lay not up for yourselves treasures upon earth, where moth and rust doth corrupt... but lay up for yourselves treasures in heaven" (Matthew 6:19-20). In FPGA design, where do we lay up our data?

A: It depends on two things: how fast we need the data and how much of it we have. Think about your own life — you keep your phone in your pocket (tiny, instant access), important papers in a desk drawer (more room, a few seconds to fetch), and old files in a storage unit across town (huge capacity, takes a trip to retrieve). Memory in an FPGA works the same way: it forms a hierarchy from tiny-and-fast at the top to huge-and-slow at the bottom.

Q: What do the layers look like concretely?

A: Like the Temple with its nested courts — the Holy of Holies at the center (most precious, smallest) out to the surrounding camps:

| Memory     | Capacity        | Latency             | Analogy                          |
|------------|-----------------|---------------------|----------------------------------|
| Flip-flops | Individual bits | 0 cycles (instant)  | Your pocket — always right there |
| BRAM       | 36 Kbit blocks  | 1-2 cycles          | Your desk drawer                 |
| URAM       | 288 Kbit blocks | 1-2 cycles          | A filing cabinet nearby          |
| HBM        | 8 GB            | 80-100 cycles       | A warehouse in your building     |
| DDR        | 16-64 GB        | 100-200 cycles      | The storage unit across town     |

Notice the pattern: as capacity grows, so does latency. This is just like the time-complexity tradeoffs you see in data structures — a tiny array is fast to scan, but a hash table lets you store and find much more data (at the cost of more overhead per access).
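The "where do we lay up our data" decision can be made concrete with a toy model. This Python sketch is illustrative only — the capacities and latencies are rough figures taken from the table above (one BRAM block is 36 Kbit, one URAM block 288 Kbit), and the `place` helper is a name invented for this example:

```python
# Toy model of the memory hierarchy from the table above.
# Capacities in bytes, latencies in clock cycles (rough, illustrative figures).
HIERARCHY = [
    ("flip-flops", 128,             0),    # a handful of registers
    ("BRAM",       36 * 1024 // 8,  2),    # one 36 Kbit block = 4.5 KB
    ("URAM",       288 * 1024 // 8, 2),    # one 288 Kbit block = 36 KB
    ("HBM",        8 * 2**30,       100),  # 8 GB
    ("DDR",        64 * 2**30,      200),  # up to 64 GB
]

def place(working_set_bytes):
    """Return the smallest (and therefore fastest) layer that fits the data."""
    for name, capacity, latency in HIERARCHY:
        if working_set_bytes <= capacity:
            return name, latency
    raise ValueError("working set too large for any layer")

print(place(33_288))     # one ~33 KB domain fits in a single URAM block
print(place(8 * 2**30))  # 8 GB of domains only fits in HBM (or DDR)
```

The same monotonic rule the table shows falls out of the model: asking for more capacity always moves you down to a slower layer.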

Q: Which layers does our design actually use?

A: Our domain data (the bitmask hierarchies) lives in HBM — 8 GB of High Bandwidth Memory with ~460 GB/s bandwidth. Each domain is a depth-3 hierarchy of increasingly detailed bitmasks: L0 (8 bytes) + L1 (512 bytes) + L2 (32,768 bytes) = roughly 33 KB per domain. HBM stores thousands of domains; BRAM caches the ones currently being processed by each lane — like moving a file from the warehouse to your desk while you work on it.
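As a back-of-the-envelope check, the per-domain footprint and the payoff of caching a domain in BRAM can be sketched in Python. The level sizes are the ones quoted above; the cycle counts, access count, and 64-byte copy beats are illustrative assumptions, not measured values:

```python
# Per-domain footprint: a depth-3 hierarchy of bitmasks, sizes from the text.
L0, L1, L2 = 8, 512, 32_768   # bytes per level
domain_bytes = L0 + L1 + L2   # 33,288 bytes, i.e. roughly 33 KB
print(domain_bytes)           # 33288

# Rough cost of N random accesses to one domain (latencies from the table):
HBM_LATENCY, BRAM_LATENCY = 100, 2   # cycles per access (illustrative)
N = 10_000

direct = N * HBM_LATENCY             # every access pays the trip to HBM
cached = (domain_bytes // 64) * HBM_LATENCY + N * BRAM_LATENCY
# one streaming copy into BRAM (assuming 64-byte beats), then N fast accesses
print(direct, cached)                # caching wins once the domain is reused
```

Under these assumptions the one-time copy is amortized quickly — the "move the file to your desk" strategy pays for itself after a few hundred accesses.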

Our HBM Layout

HBM Address Space (8 GB)
├── Slot 0: Domain A  (stride = 2,105,376 bytes)
├── Slot 1: Domain B
├── ...
└── Slot 255: Domain 255

Each slot reserves a fixed stride so that address calculation is simple — the hardware computes base + slot * stride without needing a lookup table, which keeps the algorithm running in constant time per domain access.
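A minimal Python sketch of that fixed-stride addressing (the stride and slot count are the ones quoted above; the base address and the `slot_addr` helper name are assumptions made for this example):

```python
# Fixed-stride slot addressing: base + slot * stride, no lookup table needed.
STRIDE = 2_105_376        # bytes reserved per domain slot (from the layout)
HBM_BASE = 0x0000_0000    # illustrative base address
NUM_SLOTS = 256

def slot_addr(slot):
    """Constant-time address of a domain slot: one multiply and one add."""
    assert 0 <= slot < NUM_SLOTS
    return HBM_BASE + slot * STRIDE

print(hex(slot_addr(0)))   # 0x0
print(hex(slot_addr(1)))   # 0x202020

# All 256 slots together use well under the 8 GB HBM address space:
assert NUM_SLOTS * STRIDE <= 8 * 2**30
```

Note that the 256 fixed slots consume only about 0.5 GB of the 8 GB space, so the layout trades some capacity for the simplicity of a single multiply-add per access.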

Learn more in the deep version

Related: Our Architecture | AXI Protocol


Soli Deo Gloria

Self-Check 1/1

HBM stands for: