5.6

Generative Models

VAE, diffusion models, score matching, generative flow.

Generative Models — Brief ☧

Deep version → | Related: Neural Nets → | Embeddings →


"The hearing ear, and the seeing eye, the LORD hath made even both of them."

— Proverbs 20:12 (KJV)

Q: A classifier looks at a photo and says "cat." But what if you wanted to go the other direction — start from nothing and create a realistic photo of a cat that never existed? How would that work?

A: You would need a model that has studied thousands of real cat photos and learned the underlying patterns — the distribution of shapes, colors, poses. Then it can sample from those patterns to create new images that look like they belong. This is what generative models do: they learn patterns from data and then produce new instances that match those patterns.

Q: So a classifier asks "what is this?" and a generator asks "what could exist?" — one reads, the other writes?

A: Exactly. And there are three main families of generators, each with a different strategy:

  • VAE (Variational Autoencoder) — compress the input down to a small array of numbers (the "latent space"), then decode that back into a full output. Like summarizing a book to its key themes, then writing a new book from those themes.
  • GAN (Generative Adversarial Network) — a generator and a discriminator compete. The generator tries to create fakes; the discriminator tries to catch them. Both improve through competition.
  • Diffusion — start with a clean image, add noise step by step until it is pure static, then train the model to reverse that process. At generation time, start from noise and denoise step by step.
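
As a toy illustration of the VAE bullet above, here is a minimal sketch in plain Python. Every weight here is a made-up constant chosen only for illustration; a real VAE learns the encoder and decoder as neural networks:

```python
import math
import random

random.seed(0)

def encode(x):
    # Toy "encoder": maps a 2-D input to a 1-D latent mean and log-variance.
    # A real VAE predicts both from x with a neural network.
    mean = 0.5 * x[0] + 0.3 * x[1]
    log_var = -1.0  # fixed toy value
    return mean, log_var

def sample_latent(mean, log_var):
    # Reparameterization trick: z = mean + sigma * eps, with eps ~ N(0, 1),
    # keeps the sampling step differentiable with respect to mean and log_var.
    eps = random.gauss(0.0, 1.0)
    return mean + math.exp(0.5 * log_var) * eps

def decode(z):
    # Toy "decoder": expands the 1-D latent back into a 2-D output.
    return [1.2 * z, -0.4 * z]

x = [1.0, 2.0]
mean, log_var = encode(x)
z = sample_latent(mean, log_var)
out = decode(z)
print(out)  # a new 2-D point shaped by the latent code
```

Running the pipeline twice with different noise gives different outputs from the same input — that is the "sample from the patterns" step in miniature.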

Q: In Genesis 1, God spoke and creation came into being — light, seas, creatures, each "after its kind." Is a generative model creating "after its kind"?

A: In a limited sense. It generates new images, text, or music "after the kind" of its training data — new cats that look like cats, new sentences that read like sentences. It discovers and reproduces the patterns of creation, though the patterns themselves are the Creator's.

The Three Families

Model      Core Idea                               How It Generates
VAE        Compress to essence, decode to output   Encode to a latent array, sample, decode
GAN        Two adversaries sharpen each other      Generator creates, discriminator critiques, both improve
Diffusion  Reverse a noise process step by step    Denoise from pure randomness to coherent output
VAE:     input -> [encoder] -> latent z -> [decoder] -> output
GAN:     noise -> [generator] -> fake  vs  real -> [discriminator] -> real/fake?
Diffusion: image -> add noise x1000 -> pure noise -> denoise x1000 -> image
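
The GAN line of the diagram can be caricatured with two scalar "networks": a one-parameter generator and a discriminator that simply tracks where real data lives. This is a deliberately simplified sketch of the adversarial dynamic (the score function and learning rates are invented here), not a real GAN loss:

```python
import random

random.seed(1)

REAL_MEAN = 5.0  # "real data": samples near 5.0

def generator(g):
    # Toy generator: a single parameter g plus a little noise.
    return g + random.gauss(0.0, 0.1)

g, d = 0.0, 0.0  # generator parameter and discriminator estimate, both ignorant
for _ in range(300):
    real = REAL_MEAN + random.gauss(0.0, 0.1)
    fake = generator(g)
    # Discriminator step: move its estimate of "real" toward the real sample.
    d += 0.2 * (real - d)
    # Generator step: ascend the toy discriminator score s(x) = -(x - d)^2,
    # whose gradient at the fake is -2 * (fake - d).
    g += 0.1 * (-2.0 * (fake - d))

print(abs(g - REAL_MEAN) < 0.5)  # the generator has drifted toward real data
```

Each player improves only because the other keeps moving — the competition is the training signal.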

Each family uses a different algorithm for training, but all share the same goal: learn the data's distribution so well that you can sample new examples from it.

The table above captures the essential character of each approach. VAEs are methodical -- they compress, then reconstruct, learning a smooth latent space you can explore. GANs are adversarial -- two networks sharpen each other through competition, often producing the sharpest results but at the cost of training instability. Diffusion models are patient -- they learn to undo destruction one small step at a time, producing the highest-quality results at the cost of slower generation. Which family to reach for depends on your priorities: do you need fast generation (VAE), sharp outputs (GAN), or maximum quality (diffusion)?

Connection to our project: Our differentiable_chirho.py uses a latent space over logic domains — soft bitmasks that can be sampled to generate valid constraint solutions, bridging generative and symbolic reasoning.

Think of it this way: a constraint satisfaction problem has many possible solutions, and we want to generate valid ones. Our soft domain representation works like a VAE's latent space -- it encodes a compressed summary of which solutions are possible. By sampling from this soft representation using Gumbel-softmax, we can generate concrete assignments that satisfy the constraints. The FPGA then validates and refines these assignments at hardware speed, ensuring the generated solutions are not just plausible but provably correct.

Learn more in the deep version

Related: Embeddings | Neurosymbolic


Soli Deo Gloria

Self-Check 1/1

VAE stands for: