Datalog — Brief ☧
Deep version → | Back to Prolog vs miniKanren →
Q: Genesis records a chain of fathers and sons: Abraham begat Isaac, Isaac begat Jacob, Jacob begat Judah. Suppose I want to ask: "Is Abraham an ancestor of Judah?" — not just a father, but a grandfather, great-grandfather, or further back. How would I express that?
A: You need two rules. The first says a parent is an ancestor (the direct case). The second says that if someone is a parent of a person who is themselves an ancestor, then they are an ancestor too (the chain case):
ancestor_chirho(X, Y) :- parent_chirho(X, Y). ancestor_chirho(X, Y) :- parent_chirho(X, Z), ancestor_chirho(Z, Y).The second rule uses recursion — the definition of
ancestorrefers to itself. This lets the logic follow chains of any length: parent, grandparent, great-grandparent, and so on.Q: That looks like Prolog. Is this just Prolog?
A: It looks similar, but this is Datalog — a deliberately restricted version of Prolog. Datalog forbids certain things that Prolog allows.
Q: Why would you want restrictions? That sounds like you are giving up power.
A: Because restrictions buy you guarantees. Think of it this way: a road with guardrails is more limited than an open field, but those guardrails guarantee you will not drive off a cliff. Datalog's restrictions guarantee three things that Prolog cannot:
- Every query terminates. No infinite loops, ever.
- Every query can be evaluated bottom-up — start from the facts and build toward the answer, instead of searching a tree top-down.
- Every query can be parallelized — and maps naturally to hardware.
The Three Restrictions
| What Datalog forbids | Why it helps |
|---|---|
Function symbols (cons, f(g(x)), etc.) | The universe of values stays finite — no inventing new terms during computation |
| Negation inside recursion | Computation is monotonic: facts are only added, never removed, so the process converges |
| Unsafe variables (variables not grounded in a positive fact) | Every variable is tied to actual data, preventing meaningless or infinite answers |
Biblical Genealogy in Datalog
% Facts (the database)
parent_chirho(abraham, isaac).
parent_chirho(isaac, jacob).
parent_chirho(jacob, judah).
parent_chirho(jacob, joseph).
% Rules (recursive)
ancestor_chirho(X, Y) :- parent_chirho(X, Y).
ancestor_chirho(X, Y) :- parent_chirho(X, Z), ancestor_chirho(Z, Y).
% Query — works in ANY direction
?- ancestor_chirho(abraham, judah). % YES
?- ancestor_chirho(abraham, X). % X = isaac, jacob, judah, joseph
?- ancestor_chirho(X, judah). % X = jacob, isaac, abraham
Every query terminates. Every query works in any direction. This is the subset of logic programming that maps most cleanly to hardware.
The example above illustrates all three of Datalog's strengths in action. The query ancestor_chirho(abraham, X) works because the rules are recursive — they follow chains of parent links to any depth. The query ancestor_chirho(X, judah) works backward because there are no function symbols to create new values; the system simply traces the parent chain in reverse. And both queries terminate because the universe of people is finite: there are only so many names in the database, so the recursion must eventually bottom out.
The Fixed-Point Connection
Datalog evaluation works by repeatedly applying the rules until no new facts appear — a fixed point. This process is equivalent to repeated matrix multiplication using boolean arithmetic:
ancestor = parent + parent * ancestor
Keep multiplying until the result stops changing. On an FPGA, this becomes Boolean matrix multiply in hardware — the same operation, executed in parallel across all entries at once.
To understand why this works, think of the parent relation as a matrix where row i, column j is 1 if person i is the parent of person j. Multiplying this matrix by itself (using OR for addition and AND for multiplication) gives you the "grandparent" relation. Multiplying again gives you "great-grandparent." Taking the union (OR) of all these products gives you the full ancestor relation. The fixed point is reached when another multiplication adds no new 1-bits to the matrix — meaning you have discovered all ancestor relationships that exist in the data.
This is why Datalog is the "sweet spot" between the expressiveness of full logic programming and the predictability hardware needs. It is expressive enough to compute transitive closures, recursive queries, and multi-hop joins. But it is restricted enough that every computation terminates, every intermediate result is finite, and the entire evaluation can be expressed as a sequence of matrix operations that map directly to FPGA hardware.
Soli Deo Gloria