5.5 Unsupervised Learning

Clustering, K-means, autoencoders, dimensionality reduction.

Unsupervised Learning — Brief ☧

"The hearing ear, and the seeing eye, the LORD hath made even both of them."

— Proverbs 20:12 (KJV)

Deep version → | Related: Supervised → | Neural Networks →


Q: Imagine you dump a thousand unlabeled photos on a table. Nobody tells you what is in them. But you start naturally sorting: these have blue skies, those are indoors, these have faces. You found groups without anyone telling you the categories. How?

A: By noticing patterns — similarities and differences — in the data itself. This is unsupervised learning: discovering structure in data that has no labels. The algorithm must find groupings, patterns, and compressed representations on its own.

Q: What kinds of structure can an algorithm discover?

A: Three main kinds:

  • Clusters (K-Means, DBSCAN): groups of similar items — "these photos all show beaches, those all show mountains"
  • Dimensions (PCA, t-SNE): the few traits that explain most of the variation — "brightness and color temperature explain 80% of the differences" (see the sketch after this list)
  • Representations (Autoencoders): compressed summaries that preserve the essential information — like shrinking a photo to a thumbnail that still captures the scene
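
To make that second kind concrete, here is a minimal sketch using scikit-learn's PCA on made-up data; the five correlated traits and the two-component choice are assumptions chosen for illustration, not measurements from real photos:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder "photo" data: 200 samples with 5 measured traits each,
# secretly generated from only two underlying factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))                        # two hidden factors
X = np.hstack([base, base @ rng.normal(size=(2, 3))])   # 5 correlated traits

pca = PCA(n_components=2)
pca.fit(X)

# Fraction of the total variation explained by the first two directions.
print(pca.explained_variance_ratio_.sum())              # close to 1.0 for this toy data
```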

Q: How is this different from supervised learning?

A: Supervised learning has a teacher saying "this is a beach, this is a mountain" for every example. Unsupervised learning has no teacher — it finds the groupings by measuring similarity among the data points themselves.

Q: In the Parable of the Wheat and Tares (Matthew 13:24-30), the reapers separate wheat from tares at harvest without anyone labeling each plant. Is that unsupervised learning?

A: Yes. The reapers observe natural differences — wheat has heavy, golden heads; tares are lighter and darker. They find clusters in the data without prior labels, just as K-Means groups data points by proximity.

Key Methods

Method      | What It Finds                              | How It Works
K-Means     | K clusters by proximity                    | Assign each point to the nearest center, then update the centers; repeat until they stop moving
DBSCAN      | Dense clusters of any shape                | Find points with many neighbors; expand outward
PCA         | Principal axes of variation                | Find the directions along which the data varies most
t-SNE       | 2D visualization of high-dimensional data  | Map high-dimensional points to two dimensions, preserving neighborhoods
Autoencoder | Compressed representation                  | Encode input to a small latent space, then decode — like compressing a file and reconstructing it
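
The K-Means row compresses an iterative loop into one phrase. Here is a minimal NumPy sketch of that loop; the two-blob toy data and k=2 are assumptions chosen only to keep the example small:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-Means: assign each point to the nearest center, update centers, repeat."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]  # start from k data points
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (assumes no cluster ends up empty, which holds for this toy data).
        new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # centers stopped moving: converged
            break
        centers = new_centers
    return centers, labels

# Two unlabeled blobs; the algorithm is never told which point belongs where.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
centers, labels = kmeans(pts, k=2)
print(np.round(centers, 2))  # two centers, near (0, 0) and (5, 5)
```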

The key insight from this table is that unsupervised learning is not one technique but a family of approaches, each answering a different question about your data. Clustering asks "what groups naturally exist?" Dimensionality reduction asks "what are the few key factors that explain most of the variation?" And representation learning asks "how can I compress this data while preserving its essence?" In practice, these often work together: you might use PCA to reduce your data to manageable dimensions, then run K-Means on the reduced data to find clusters.
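
As a rough illustration of that combination, here is one way it might look with scikit-learn; the 50 placeholder features, 10 components, and 3 clusters are arbitrary assumptions for the sketch:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(500, 50)     # unlabeled data: 500 samples, 50 features (placeholder)

# Step 1: compress 50 features down to the 10 directions that vary most.
X_reduced = PCA(n_components=10).fit_transform(X)

# Step 2: look for natural groups in the reduced space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)

print(np.bincount(labels))      # how many samples landed in each cluster
```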

Think of clustering as building a tree of groups: broad categories at the top, finer sub-groups below. Dimensionality reduction helps you visualize a graph of relationships among thousands of data points that would otherwise be impossible to see. And autoencoders learn to distill the data down to its essence: the smallest representation from which the original can be approximately recovered.
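
Here is a minimal sketch of that encode-then-decode idea in PyTorch; the 64-dimensional inputs, 8-dimensional latent space, and random training data are assumptions chosen only to keep the example small, not a production model:

```python
import torch
from torch import nn

# The encoder squeezes 64 input features down to an 8-number summary;
# the decoder tries to rebuild the original 64 features from that summary.
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

data = torch.rand(256, 64)               # unlabeled examples (placeholder data)
for epoch in range(200):
    reconstruction = model(data)
    loss = loss_fn(reconstruction, data)  # how far the rebuilt data is from the original
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

latent = encoder(data)                    # the compressed "thumbnail" of each example
print(latent.shape)                       # torch.Size([256, 8])
```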

Connection to our project: Our domain hierarchy is unsupervised compression. 262,144 domain values are compressed to 4,096 summary bits, then to 64 bits. This 4,096:1 compression is like an autoencoder — but exact (lossless via bitmasks), not approximate. Where an autoencoder must learn its compression through training on examples, our hierarchical OR-based compression is derived directly from the data structure itself: no training is needed. A bit at Level 1 is set to 1 if and only if the corresponding 64-bit word at Level 0 contains at least one set bit. This means the summary is perfectly faithful: you never lose information about which values are possible; you just get a faster way to check whether a region of the domain is worth examining in detail.
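
A hypothetical sketch of such an OR-based summary hierarchy is shown below; the sizes mirror the figures above (4,096 words of 64 bits at Level 0), but the function and variable names are invented for illustration and are not the project's actual code:

```python
# Hypothetical sketch of an OR-based summary hierarchy (not the project's actual code).
# Level 0: 4,096 words x 64 bits = 262,144 domain bits.
# Level 1: one summary bit per Level-0 word -> 4,096 bits (64 words).
# Level 2: one summary bit per Level-1 word -> 64 bits (1 word).

def summarize(words):
    """One summary bit per 64-bit word: 1 iff the word has at least one set bit."""
    out = [0] * ((len(words) + 63) // 64)
    for i, w in enumerate(words):
        if w != 0:
            out[i // 64] |= 1 << (i % 64)
    return out

level0 = [0] * 4096          # the full domain, one 64-bit word per entry
level0[1234] = 0b1010        # mark a few values as possible
level1 = summarize(level0)   # 64 words = 4,096 summary bits
level2 = summarize(level1)   # 1 word  = 64 summary bits

# Query: is any value possible in the region covered by Level-0 word 1234?
word = 1234
nonempty = (level1[word // 64] >> (word % 64)) & 1
print(bool(nonempty))        # True; drill down into level0[1234] only if this bit is set
```

Checking one Level-1 bit replaces scanning a whole 64-word stretch of Level 0, which is where the faster region check described above comes from.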

Learn more in the deep version

Related: Supervised Learning | Training


Soli Deo Gloria

Self-Check 1/1

K-means is a _____ algorithm.