ML in Rust — Brief ☧
"The hearing ear, and the seeing eye, the LORD hath made even both of them."
— Proverbs 20:12 (KJV)
Q: Most machine learning code is written in Python, but Python is
slow — often 50-100x slower than C for raw computation. How does ML
get away with using a slow language?
A: It cheats. The Python code is just the "steering wheel." The
actual heavy computation — matrix multiplications, convolutions,
gradient calculations — runs in C++ and CUDA libraries underneath.
Python tells those libraries what to do, but the real work happens in
compiled code.
Q: So what happens when you want to deploy a model on a server,
a phone, or an embedded device where you cannot afford Python's
overhead?
A: You need a language that is fast from top to bottom. Rust
fills this role: it runs as fast as C++, prevents memory bugs at compile
time (no segfaults, no garbage collector), and lets you write the entire
stack — model definition, training loop, inference server — in one
safe, fast language.
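A tiny illustration of the claim about compile-time memory safety, using only the standard library. This is a sketch of Rust's ownership model, not any ML library's API: memory is freed at a point the compiler knows statically, so there is no garbage collector and no use-after-free.

```rust
// Ownership demo: a value is freed deterministically when its owner
// goes out of scope. No garbage collector, no manual free.
fn total_bytes(buf: Vec<u8>) -> usize {
    // `buf` is moved in; this function now owns the allocation.
    buf.len()
} // `buf` is dropped (freed) here, at a point fixed at compile time.

fn main() {
    let data = vec![0u8; 1024];
    let n = total_bytes(data); // ownership of `data` moves into the call
    // Using `data` after this point would be a compile-time error,
    // not a runtime use-after-free: the borrow checker rejects it.
    println!("{}", n); // prints 1024
}
```

The same mechanism is what lets an inference server hold tensors across requests without GC pauses: deallocation points are decided at compile time, not at runtime.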
Q: But does Rust have ML libraries?
A: Three main ones, each with different strengths:
- tch-rs: Rust bindings to PyTorch's C++ core (LibTorch). Familiar API for anyone who knows PyTorch, full GPU support.
- candle: Built by Hugging Face in pure Rust, minimal dependencies. Great for inference and serverless deployment.
- burn: Backend-agnostic, community-driven. Can target CPU, GPU, and even WebAssembly (WASM) for running models in a browser.
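All three are ordinary crates on crates.io. A minimal `Cargo.toml` sketch for trying them out (crate names are as published; `"*"` stands in for a version, which you should pin from crates.io; in practice you would pick one crate, not all three):

```toml
[dependencies]
# Rust bindings to LibTorch (requires a LibTorch install or download)
tch = "*"
# Pure-Rust tensor library from Hugging Face
candle-core = "*"
# Backend-agnostic deep learning framework
burn = "*"
```

Note that tch-rs pulls in the native LibTorch library at build time, while candle and burn build as ordinary pure-Rust dependencies.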
Q: "There was no smith in all the land of Israel" (1 Samuel 13:19).
The Philistines controlled iron-working. Is Rust like having your own
forge — strategic independence from the C++/Python supply chain?
A: That captures it well. With Rust, you control the full stack
from memory allocation to GPU kernels. No Python GIL bottleneck, no
garbage collector pauses, no C++ undefined behavior. Your own forge.
The Three Forges
| Library | Backing | Approach | Best For |
|---|---|---|---|
| tch-rs | PyTorch C++ (LibTorch) | Rust bindings to PyTorch | Familiar API, GPU support |
| candle | Hugging Face | Pure Rust, minimal deps | Inference, serverless |
| burn | Community | Backend-agnostic | Portability, WASM |
Python/PyTorch           Rust/tch-rs                        Rust/candle
──────────────           ───────────                        ───────────
import torch             use tch::Tensor;                   use candle_core::Tensor;
x = torch.ones(3)        let x = Tensor::ones(&[3], ...);   let x = Tensor::ones((3,), ...)?;
The table above reveals a strategic choice. If you need the full power of PyTorch, including GPU training, tch-rs provides it through C++ bindings. If you want a single self-contained binary with no external dependencies, candle is the pure-Rust answer. And if your model must run on multiple platforms, including web browsers via WebAssembly, burn abstracts over all backends. All three represent data as tensors: multi-dimensional arrays of numbers, the same fundamental data structure used by PyTorch and TensorFlow. The difference is that every algorithm, from the training loop to inference, runs in compiled Rust with zero interpreter overhead. This matters when latency requirements are real: a serverless function that cold-starts in 50 milliseconds instead of 8 seconds is not just faster; it changes which architectures are practical.
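To make "tensor" concrete without pulling in any of the three crates, here is a minimal pure-Rust sketch of the data structure they all share: a flat buffer plus a shape, with a naive 2-D matrix multiply. This `Tensor` type is illustrative only; the real libraries also track strides, dtypes, and device handles.

```rust
/// Minimal tensor: a flat f32 buffer plus a shape.
/// Real libraries (tch-rs, candle, burn) add strides, dtypes, devices.
struct Tensor {
    data: Vec<f32>,
    shape: Vec<usize>,
}

impl Tensor {
    /// A tensor of the given shape, filled with ones.
    fn ones(shape: &[usize]) -> Self {
        let n = shape.iter().product();
        Tensor { data: vec![1.0; n], shape: shape.to_vec() }
    }

    /// Naive 2-D matrix multiply: (m x k) * (k x n) -> (m x n).
    fn matmul(&self, other: &Tensor) -> Tensor {
        let (m, k) = (self.shape[0], self.shape[1]);
        let n = other.shape[1];
        assert_eq!(k, other.shape[0], "inner dimensions must match");
        let mut out = vec![0.0; m * n];
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0;
                for p in 0..k {
                    acc += self.data[i * k + p] * other.data[p * n + j];
                }
                out[i * n + j] = acc;
            }
        }
        Tensor { data: out, shape: vec![m, n] }
    }
}

fn main() {
    let a = Tensor::ones(&[2, 3]);
    let b = Tensor::ones(&[3, 2]);
    let c = a.matmul(&b);
    // Each output element is a sum of three ones.
    println!("{:?} {:?}", c.shape, c.data); // [2, 2] [3.0, 3.0, 3.0, 3.0]
}
```

The triple loop here is exactly the kind of work Python hands off to C++; in Rust it compiles to machine code directly, and the production libraries replace it with BLAS or GPU kernels.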
Connection to our project: Our core engine (rust_chirho/) is already in Rust, which makes this integration natural. Using tch-rs or candle, we can run neural inference alongside the bit-parallel constraint solver in the same process: no Python overhead, no serialization, no inter-process communication. The differentiable_chirho.rs module already implements Gumbel-softmax and semiring gradients in pure Rust, so the entire neurosymbolic pipeline, from neural encoder to Gumbel-softmax bridge to FPGA-accelerated constraint propagation, can live in a single Rust binary. The practical benefit is large: no Python dependency management, no container overhead, no serialization bottleneck between the neural and symbolic components. It is the software equivalent of forging a sword from a single piece of steel rather than welding separate parts together.
Soli Deo Gloria