Generative Models — Brief ☧
"The hearing ear, and the seeing eye, the LORD hath made even both of them."
— Proverbs 20:12 (KJV)
Q: A classifier looks at a photo and says "cat." But what if you
wanted to go the other direction — start from nothing and create a
realistic photo of a cat that never existed? How would that work?
A: You would need a model that has studied thousands of real cat
photos and learned the underlying patterns — the distribution of shapes,
colors, poses. Then it can sample from those patterns to create new
images that look like they belong. This is what generative models
do: they learn patterns from data and then produce new instances that
match those patterns.
Q: So a classifier asks "what is this?" and a generator asks
"what could exist?" — one reads, the other writes?
A: Exactly. And there are three main families of generators, each
with a different strategy:
- VAE (Variational Autoencoder) -- compress the input down to a small array of numbers (a point in the "latent space"), then decode it back into a full output. Like summarizing a book to its key themes, then writing a new book from those themes. A minimal code sketch follows this list.
- GAN (Generative Adversarial Network) -- a generator and a discriminator compete. The generator tries to create fakes; the discriminator tries to catch them. Both improve through the competition.
- Diffusion -- start with a clean image, add noise step by step until it is pure static, and train the model to reverse that process. At generation time, start from pure noise and denoise step by step.
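To make the VAE idea concrete, here is a minimal sketch in PyTorch. Everything in it is illustrative (the class name `TinyVAE`, the layer sizes, the 16-dimensional latent) rather than taken from any particular system; the point is the encode-sample-decode round trip, and the fact that generation needs only the decoder:

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE: encode to a small latent vector, decode back to the input space."""
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)      # mean of the latent Gaussian
        self.to_logvar = nn.Linear(256, latent_dim)  # log-variance of the latent Gaussian
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

# Generation needs no encoder: sample z from the prior and decode it.
model = TinyVAE()
z = torch.randn(1, 16)
new_sample = model.decoder(z)  # a new instance "after the kind" of the training data
```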
Q: In Genesis 1, God spoke and creation came into being — light,
seas, creatures, each "after its kind." Is a generative model creating
"after its kind"?
A: In a limited sense. It generates new images, text, or music
"after the kind" of its training data — new cats that look like cats,
new sentences that read like sentences. It discovers and reproduces the
patterns of creation, though the patterns themselves are the Creator's.
The Three Families
| Model | Core Idea | How It Generates |
|---|---|---|
| VAE | Compress to essence, decode to output | Sample a latent vector from the prior, decode it |
| GAN | Two adversaries sharpen each other | Feed random noise through the trained generator |
| Diffusion | Reverse a noise process step by step | Denoise from pure randomness to coherent output |
VAE: input -> [encoder] -> latent z -> [decoder] -> output
GAN: noise -> [generator] -> fake vs real -> [discriminator] -> real/fake?
Diffusion: image -> add noise x1000 -> pure noise -> denoise x1000 -> image
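The GAN pipeline above is the training loop. The sketch below (PyTorch again, with illustrative network sizes and hyperparameters, not a production recipe) shows one adversarial step: the discriminator first learns to separate real from fake, then the generator learns to fool it:

```python
import torch
import torch.nn as nn

# Toy networks; real GANs use convolutional architectures and careful tuning.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                     # real: (batch, 784) of real examples
    batch = real.size(0)
    fake = G(torch.randn(batch, 64))      # generator turns noise into fakes

    # 1. Discriminator: label real as 1, fake as 0 (detach so G is not updated here).
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # 2. Generator: try to make the discriminator call the fakes real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
```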
Each family uses a different training algorithm, but all share the same goal: learn the data's distribution so well that you can sample new examples from it. The table above captures the essential character of each approach. VAEs are methodical -- they compress, then reconstruct, learning a smooth latent space you can explore. GANs are adversarial -- two networks sharpen each other through competition, often producing the sharpest results but at the cost of training instability. Diffusion models are patient -- they learn to undo destruction one small step at a time, producing the highest-quality results at the cost of slower generation. Which family to reach for depends on your priorities: fast generation (VAE), sharp outputs (GAN), or maximum quality (diffusion).
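That "slower generation" is visible in the sampling loop itself. Below is a sketch of DDPM-style sampling, assuming a trained noise-prediction network `eps_model` (hypothetical here) and a standard linear noise schedule; each of the many small steps strips out a little of the predicted noise:

```python
import torch

def sample(eps_model, steps=1000, shape=(1, 784)):
    """DDPM-style sampling: walk from pure noise back to a coherent sample."""
    betas = torch.linspace(1e-4, 0.02, steps)   # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                      # start from pure static
    for t in reversed(range(steps)):
        eps = eps_model(x, t)                   # model's guess of the noise in x at step t
        # Posterior mean: remove the predicted noise contribution, then rescale.
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)  # re-inject a little noise
    return x
```

Generating one sample costs a full network forward pass per step, which is why diffusion trades speed for quality.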
Connection to our project: Our differentiable_chirho.py uses a latent
space over logic domains — soft bitmasks
that can be sampled to generate valid constraint solutions, bridging
generative and symbolic reasoning. Think of it this way: a constraint satisfaction problem has many possible solutions, and we want to generate valid ones. Our soft domain representation works like a VAE's latent space -- it encodes a compressed summary of which solutions are possible. By sampling from this soft representation using Gumbel-softmax, we can generate concrete assignments that satisfy the constraints. The FPGA then validates and refines these assignments at hardware speed, ensuring the generated solutions are not just plausible but provably correct.
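The actual differentiable_chirho.py is not reproduced here, but the Gumbel-softmax step it relies on can be sketched in a few lines. The sizes below (3 variables, 4 candidate values each) and the name `domain_logits` are made up for illustration; the real project's representation may differ:

```python
import torch
import torch.nn.functional as F

# Hypothetical soft domains: one row of logits per variable, one logit per
# candidate value. High logits mark values the solver currently favors.
domain_logits = torch.zeros(3, 4, requires_grad=True)

def sample_assignment(logits, tau=0.5):
    # hard=True yields discrete one-hot choices in the forward pass while
    # gradients flow through the soft relaxation in the backward pass.
    return F.gumbel_softmax(logits, tau=tau, hard=True)

assignment = sample_assignment(domain_logits)  # one concrete value per variable
values = assignment.argmax(dim=-1)             # e.g. tensor([2, 0, 3])
# A downstream soft constraint loss would score `assignment`, and its
# gradient would nudge domain_logits toward valid solutions.
```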
Soli Deo Gloria