Markov Chains — Brief ☧
"In whom are hid all the treasures of wisdom and knowledge."
— Colossians 2:3 (KJV)
Q: Imagine a board game where you roll a die and move to a new
square. The key rule: which squares you can land on next depends
only on where you are now — not on how you got there. Your history
does not matter, only your current position. This feels like a
simplifying assumption. Is there a mathematical framework built on it?
A: Yes — a Markov chain. It is a sequence of states (the
squares on the board) where the probability of moving to the next
state depends only on the current state. This is called the **Markov
property** — "the future depends only on the present, not the past."
You can describe the entire system with a transition matrix: a
table where entry P[i, j] gives the probability of moving from state
i to state j. Every row sums to 1 (you must go somewhere).
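As a minimal sketch (the two "weather" states and their probabilities are hypothetical, not from our system), a transition matrix is just a table whose rows each sum to 1:

```python
# A toy 2-state Markov chain: state 0 = "sunny", state 1 = "rainy" (made-up numbers).
# P[i][j] is the probability of moving from state i to state j.
P = [
    [0.9, 0.1],  # from sunny: 90% stay sunny, 10% turn rainy
    [0.5, 0.5],  # from rainy: 50% clear up, 50% stay rainy
]

# Every row must sum to 1: from any state, you must go somewhere.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

def step(dist, P):
    """Advance a probability distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

print(step([1.0, 0.0], P))  # after one step starting sunny: [0.9, 0.1]
```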
Q: If the game keeps going forever with the same rules, what
happens? Does the player just bounce around forever, or does something
settle down?
A: Great question. Under the right conditions (the chain is
ergodic — every state is reachable from every other, and the chain does
not get locked into deterministic cycles), the system converges to a stationary distribution:
a fixed set of probabilities over the states that does not change no
matter how many more steps you take. The player still moves, but the
fraction of time spent in each state stabilizes.
Think of it like shuffling a deck of cards. After enough shuffles,
any further shuffling does not make the deck "more random." It has
already reached its stationary state.
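The convergence can be watched directly. A sketch, reusing the same kind of toy two-state chain as above (the numbers are an assumption for illustration): iterate the distribution until it stops changing.

```python
# Iterate dist -> dist * P until it stops changing: the stationary distribution.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

def step(dist, P):
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0]  # start certain of state 0
for _ in range(1000):
    new = step(dist, P)
    if max(abs(a - b) for a, b in zip(new, dist)) < 1e-12:
        break  # further steps no longer change the distribution
    dist = new

# For this chain, solving pi = pi * P by hand gives pi = (5/6, 1/6).
print(dist)
```

Note that the individual trajectory keeps moving forever; it is the *distribution* over states that freezes.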
Q: How quickly does it converge?
A: The mixing time measures how many steps it takes to get
close to the stationary distribution. A well-connected
graph (many transitions between states) mixes
fast. A poorly connected one (states that are hard to reach) mixes
slowly. In our constraint propagation system, faster mixing means
fewer iterations to converge on a solution.
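The contrast between fast and slow mixing can be sketched numerically (both chains below are hypothetical, chosen so that each has the stationary distribution (0.5, 0.5) by symmetry): count the steps until the total-variation distance to the stationary distribution is small.

```python
def step(dist, P):
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def mixing_steps(P, stationary, eps=1e-3):
    """Steps until total-variation distance to the stationary distribution < eps."""
    dist = [1.0] + [0.0] * (len(P) - 1)  # start in state 0
    steps = 0
    while 0.5 * sum(abs(a - b) for a, b in zip(dist, stationary)) >= eps:
        dist = step(dist, P)
        steps += 1
    return steps

fast = [[0.5, 0.5], [0.5, 0.5]]      # well connected: forgets its state instantly
slow = [[0.99, 0.01], [0.01, 0.99]]  # sticky states: hard to move between them
print(mixing_steps(fast, [0.5, 0.5]))  # mixes in 1 step
print(mixing_steps(slow, [0.5, 0.5]))  # takes hundreds of steps
```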
Key Concepts
| Concept | Meaning | Our Project |
|---|---|---|
| Transition matrix | P[i,j] = prob of i → j | Constraint propagation steps |
| Stationary distribution | π = πP | Fixpoint of arc consistency |
| Ergodicity | Every state reachable | No dead-end search branches |
| Mixing time | Steps to converge | Propagation iterations needed |
The table reveals a beautiful correspondence between probability theory and constraint solving:
- The **transition matrix** describes how the system evolves from one step to the next; in our system, each propagation step refines domains based on current values.
- The **stationary distribution** is the state where no further refinement is possible; in our system, that is the arc consistency fixpoint.
- **Ergodicity** means every state is reachable; in our system, this corresponds to the guarantee that our search strategy can reach every valid solution.
- **Mixing time** determines how many iterations are needed; in our system, this is the number of propagation rounds before domains stabilize.
Connection to our system: Constraint propagation is an iterative process:
at each step, domains are refined based on their current values, and the new domain configuration depends only on the current one, not on the history of how we got there. This is
precisely a Markov chain on domain configurations. The transition function is "apply all constraints once": given the current set of domain bitmasks, intersect each variable's domain with the constraints that mention it. When propagation reaches
a fixpoint (domains stop changing), it has reached the stationary distribution: the unique domain configuration that is consistent with all constraints at the current level of analysis.
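As an illustrative sketch (not our FPGA implementation; the variables, the value range 1..4, and the ordering constraints are hypothetical), "apply all constraints once" and iterate to a fixpoint might look like:

```python
# Domains as bitmasks over values 1..4: bit (v-1) set means value v is possible.
# Domains are assumed nonempty throughout this sketch.
FULL = 0b1111

def lt(dx, dy):
    """One arc-consistency step for the constraint x < y on bitmask domains."""
    hi_y = dy.bit_length()              # largest value still in y's domain
    lo_x = (dx & -dx).bit_length()      # smallest value still in x's domain
    dx &= (1 << (hi_y - 1)) - 1         # x keeps only values < max(y)
    dy &= FULL & ~((1 << lo_x) - 1)     # y keeps only values > min(x)
    return dx, dy

def propagate(domains):
    """Apply all constraints once, repeatedly, until a fixpoint (no change)."""
    while True:
        old = dict(domains)
        # Hypothetical problem: x < y and y < z.
        domains["x"], domains["y"] = lt(domains["x"], domains["y"])
        domains["y"], domains["z"] = lt(domains["y"], domains["z"])
        if domains == old:  # fixpoint: the "stationary" configuration
            return domains

doms = propagate({"x": FULL, "y": FULL, "z": FULL})
print({k: bin(v) for k, v in doms.items()})
# x ends as {1,2}, y as {2,3}, z as {3,4}
```

The outer loop is the chain: each pass maps the current domain configuration to the next, looking at nothing but the current bitmasks.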
The time complexity of propagation is
determined by the mixing time of this underlying chain: tightly coupled constraints (where many variables interact) tend to converge quickly, while loosely coupled constraints may require more rounds. Our FPGA's ability to perform 770,000 propagation steps per second means that even chains with slow mixing converge in practical time.
Soli Deo Gloria