5.11

ML in Rust

tch-rs, Candle, Burn. Compile-time safety for model inference.



"The hearing ear, and the seeing eye, the LORD hath made even both of them."

— Proverbs 20:12 (KJV)

Q: Most machine learning code is written in Python, but Python is slow — often 50-100x slower than C for raw computation. How does ML get away with using a slow language?

A: It cheats. The Python code is just the "steering wheel." The actual heavy computation — matrix multiplications, convolutions, gradient calculations — runs in C++ and CUDA libraries underneath. Python tells those libraries what to do, but the real work happens in compiled code.

Q: So what happens when you want to deploy a model on a server, a phone, or an embedded device where you cannot afford Python's overhead?

A: You need a language that is fast from top to bottom. Rust fills this role: it runs as fast as C++, prevents memory bugs at compile time (no segfaults, no garbage collector), and lets you write the entire stack — model definition, training loop, inference server — in one safe, fast language.

Q: But does Rust have ML libraries?

A: Three main ones, each with different strengths:

  • tch-rs: Rust bindings to PyTorch's C++ core (LibTorch). Familiar API for anyone who knows PyTorch, full GPU support (a minimal sketch follows this list).
  • candle: Built by Hugging Face in pure Rust, minimal dependencies. Great for inference and serverless deployment.
  • burn: Backend-agnostic, community-driven. Can target CPU, GPU, and even WebAssembly (WASM) for running models in a browser.
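
To give a flavor of the first option, here is a minimal tch-rs sketch. It assumes LibTorch is installed and discoverable at build time, and exact method signatures vary slightly across tch versions:

    use tch::{Device, Kind, Tensor};

    fn main() {
        // A float32 vector of three ones on the CPU, mirroring torch.ones(3).
        let x = Tensor::ones(&[3], (Kind::Float, Device::Cpu));
        // Arithmetic dispatches straight into LibTorch's compiled C++ kernels.
        let y = x * 2.0;
        y.print();
    }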

Q: "There was no smith in all the land of Israel" (1 Samuel 13:19).

The Philistines controlled iron-working. Is Rust like having your own

forge — strategic independence from the C++/Python supply chain?

A: That captures it well. With Rust, you control the full stack from memory allocation to GPU kernels. No Python GIL bottleneck, no garbage collector pauses, no C++ undefined behavior. Your own forge.

The Three Forges

Library    Backing                   Approach                    Best For
tch-rs     PyTorch C++ (LibTorch)    Rust bindings to PyTorch    Familiar API, GPU support
candle     Hugging Face              Pure Rust, minimal deps     Inference, serverless
burn       Community                 Backend-agnostic            Portability, WASM

Python/PyTorch         Rust/tch-rs             Rust/candle
──────────────         ───────────             ───────────
import torch           use tch::Tensor;        use candle_core::Tensor;
x = torch.ones(3)      let x = Tensor::        let x = Tensor::
                         ones(&[3], ...);        ones((3,), ...)?;
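
The candle column elides two arguments; filled in, a runnable version looks roughly like the following (a sketch against candle_core, whose constructors return a Result for error handling):

    use candle_core::{DType, Device, Result, Tensor};

    fn main() -> Result<()> {
        let device = Device::Cpu;
        // The elided arguments are an element type and a device reference.
        let x = Tensor::ones((3,), DType::F32, &device)?;
        println!("{x}");
        Ok(())
    }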

The table above reveals a strategic choice depending on your priorities. If you need the full power of PyTorch (including GPU training), tch-rs gives you that through C++ bindings. If you want a single, self-contained binary with no external dependencies, candle is the pure-Rust answer. And if you need your model to run on multiple platforms, including web browsers via WebAssembly, burn abstracts over all backends. All three represent data as tensors: multi-dimensional arrays of numbers, the same fundamental data structure as PyTorch and TensorFlow. The difference is that every algorithm — from the training loop to inference — runs in compiled Rust with zero interpreter overhead. This matters when time complexity meets real-world latency requirements: a serverless function that cold-starts in 50 milliseconds versus 8 seconds is not just faster; it changes what kinds of architectures are practical.
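
To make the self-contained-binary point concrete, here is a hedged sketch of a tiny inference path using candle_core and candle_nn. The Mlp struct, layer sizes, and randomly initialized weights are illustrative assumptions, not code prescribed by the text:

    use candle_core::{DType, Device, Result, Tensor};
    use candle_nn::{linear, Linear, Module, VarBuilder, VarMap};

    // A two-layer perceptron; every line below compiles to native code,
    // so there is no interpreter between this logic and the math kernels.
    struct Mlp {
        fc1: Linear,
        fc2: Linear,
    }

    impl Mlp {
        fn new(vb: VarBuilder) -> Result<Self> {
            Ok(Self {
                fc1: linear(4, 16, vb.pp("fc1"))?,
                fc2: linear(16, 2, vb.pp("fc2"))?,
            })
        }

        fn forward(&self, x: &Tensor) -> Result<Tensor> {
            self.fc2.forward(&self.fc1.forward(x)?.relu()?)
        }
    }

    fn main() -> Result<()> {
        let device = Device::Cpu;
        // Fresh random weights; a real service would load them from disk.
        let varmap = VarMap::new();
        let vb = VarBuilder::from_varmap(&varmap, DType::F32, &device);
        let model = Mlp::new(vb)?;
        let input = Tensor::ones((1, 4), DType::F32, &device)?;
        let logits = model.forward(&input)?;
        println!("{logits}");
        Ok(())
    }

Compiled with candle as the only ML dependency, this whole program links into one static binary, which is exactly what makes the fast cold-start scenario above plausible.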

Connection to our project: Our core engine (rust_chirho/) is already in Rust, which makes this integration particularly natural. Using tch-rs or candle, we can run neural inference alongside the bit-parallel constraint solver in the same process — no Python overhead, no serialization, no inter-process communication. The differentiable_chirho.rs module already implements Gumbel-softmax and semiring gradients in pure Rust. This means the entire neurosymbolic pipeline, from neural encoder to Gumbel-softmax bridge to FPGA-accelerated constraint propagation, can live in a single Rust binary. The practical benefit is enormous: no Python dependency management, no container overhead, no serialization bottleneck between the neural and symbolic components. It is the software equivalent of forging a sword from a single piece of steel rather than welding separate parts together.
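
For readers unfamiliar with the bridge mentioned above, here is a minimal sketch of the Gumbel-softmax trick in plain Rust. It illustrates the general technique, not the actual differentiable_chirho.rs code; the function names and the rand crate dependency are assumptions:

    use rand::Rng;

    // Sample a standard Gumbel(0, 1) variate: -ln(-ln(U)), U ~ Uniform(0, 1).
    fn sample_gumbel<R: Rng>(rng: &mut R) -> f64 {
        let u: f64 = rng.gen_range(1e-10..1.0);
        -(-u.ln()).ln()
    }

    // Gumbel-softmax: a differentiable relaxation of categorical sampling.
    // `logits` are unnormalized log-probabilities; `tau` is the temperature
    // (lower values give outputs closer to a hard one-hot sample).
    fn gumbel_softmax<R: Rng>(logits: &[f64], tau: f64, rng: &mut R) -> Vec<f64> {
        let perturbed: Vec<f64> = logits
            .iter()
            .map(|&l| (l + sample_gumbel(rng)) / tau)
            .collect();
        // Numerically stable softmax over the perturbed logits.
        let max = perturbed.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let exps: Vec<f64> = perturbed.iter().map(|&p| (p - max).exp()).collect();
        let z: f64 = exps.iter().sum();
        exps.iter().map(|&e| e / z).collect()
    }

    fn main() {
        let mut rng = rand::thread_rng();
        // Three symbolic choices with unequal prior weight.
        let sample = gumbel_softmax(&[2.0, 0.5, -1.0], 0.5, &mut rng);
        println!("{sample:?}"); // a soft one-hot vector summing to 1
    }

As tau approaches zero, the output approaches a hard one-hot sample while remaining differentiable with respect to the logits, which is what lets gradients flow from the symbolic side back into the neural encoder.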

Learn more in the deep version

Related: PyTorch vs TensorFlow | NVIDIA/AMD


Soli Deo Gloria

Self-Check 1/1

The tch-rs crate provides Rust bindings for _____.