5.4 Supervised Learning

Regression, classification, cross-entropy, MSE loss.

Supervised Learning — Brief ☧

"The hearing ear, and the seeing eye, the LORD hath made even both of them."

— Proverbs 20:12 (KJV)



Q: When you were a child, someone pointed at a dog and said "dog,"
pointed at a cat and said "cat." After enough examples, you could
identify animals you had never seen before. What made this work?

A: Two things: the examples (the animals) and the labels (the words
someone told you). Every example came with the correct answer. This is
supervised learning — learning from labeled data. You are given
inputs (features like ear shape, fur length, size) and correct outputs
(labels like "dog" or "cat"), and the model learns the mapping between
them.

Q: What if the answer is not a category but a number — like
predicting a house's price from its square footage?

A: Categories and numbers are the two main tasks:

  • Classification: predict a category ("dog" or "cat," "spam" or "not spam")
  • Regression: predict a continuous number (price, temperature, age)
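The distinction can be made concrete with two hand-written stand-ins for learned models (the feature names, thresholds, and coefficients below are illustrative, not learned from data):

```python
# Classification: map features to a category.
def classify_animal(ear_shape_pointy: bool, fur_length_cm: float) -> str:
    """Hypothetical stand-in for a learned classifier."""
    return "cat" if ear_shape_pointy and fur_length_cm < 5.0 else "dog"

# Regression: map features to a continuous number.
def predict_price(square_footage: float) -> float:
    """Hypothetical stand-in for a learned regressor: price = slope * sqft + intercept."""
    return 150.0 * square_footage + 20_000.0

print(classify_animal(True, 3.0))  # a category, e.g. "cat"
print(predict_price(1_000.0))      # a continuous number, e.g. 170000.0
```

A real supervised learner would fit the threshold and the slope/intercept from labeled examples; the input/output shapes are what distinguish the two tasks.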

Q: How do you know the model actually learned, rather than just
memorizing the training examples?

A: You test it on examples it has never seen. The data is split:

  • Training set: examples the model learns from
  • Validation set: examples for tuning during development
  • Test set: examples held back for final evaluation
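A minimal sketch of such a three-way split, using only the standard library (the 80/10/10 fractions are a common convention, not a requirement; libraries like scikit-learn provide ready-made splitters):

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle once, then carve the data into train / validation / test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

data = list(range(100))
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # 80 10 10
```

Shuffling before splitting matters: if the data is ordered (say, all dogs first, then all cats), an unshuffled split would give the model a test set unlike anything it trained on.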

If the model scores well on training data but poorly on test data,
it memorized rather than learned — that is called overfitting.

Q: Jesus said, "I am the good shepherd, and know my sheep"
(John 10:14). Is the shepherd doing supervised learning?

A: Yes. The shepherd has labeled each sheep — he knows which are
his and which are not. Given features (wool color, size, markings) and
labels (mine vs. not mine), he can classify new animals he has never
encountered.

The Two Tasks

Task           | Output            | Loss Function      | Example
Classification | Category label    | Cross-entropy      | "Is this a sheep or a goat?"
Regression     | Continuous number | Mean Squared Error | "How old is this sheep?"
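Both loss functions from the table above can be written in a few lines of standard-library Python (a sketch for intuition; deep learning frameworks provide optimized versions):

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average squared gap between prediction and target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy for binary labels: penalizes confident wrong predictions heavily."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

print(mse([2.0, 4.0], [2.5, 3.5]))               # 0.25
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ~0.105
```

Note the asymmetry in spirit: MSE measures distance on a number line, while cross-entropy measures how much probability the model placed on the wrong category.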

Key Concepts

Concept          | Meaning
Training set     | Labeled examples the model learns from
Validation set   | Held-out examples for tuning hyperparameters
Test set         | Final unseen examples for honest evaluation
Cross-validation | Splitting data K ways for robust estimation
Overfitting      | Memorizing training data, failing on new inputs
Precision        | Of those predicted positive, how many are correct?
Recall           | Of all true positives, how many did the model find?

What you should take away from these two tables is that supervised learning is conceptually simple: give the model labeled examples, let it learn the pattern, then test whether it can apply that pattern to new data. The two task types -- classification and regression -- cover an enormous range of real-world problems, from email spam detection (classification) to predicting tomorrow's temperature (regression). The key concepts in the second table are all about being honest with yourself: the training/validation/test split exists to prevent you from fooling yourself into thinking the model is smarter than it really is.

Supervised learning relies on an algorithm that iterates over the
labeled dataset, adjusting parameters each pass. The features fed into
the model are typically stored as arrays of numbers, and the model's
time complexity depends on both the number of examples and the number
of features. Precision and recall deserve special attention: in many
real applications, the cost of a false positive (saying "yes" when the
answer is "no") is very different from the cost of a false negative
(missing a true "yes"). A medical screening test, for instance, should
have high recall -- it is better to flag a healthy patient for further
testing than to miss a sick one.
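Counting the four outcome types makes these definitions concrete (the toy labels below describe a deliberately aggressive screener, invented for illustration):

```python
def precision_recall(y_true, y_pred):
    """Precision: of predicted positives, how many are correct?
    Recall: of actual positives, how many were found? (labels: 1 = positive)"""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A screener that flags aggressively: it catches every sick patient
# (recall 1.0) at the cost of two false alarms (precision 0.6).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # 0.6 1.0
```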

Connection to our project: Our SemMedDB analysis uses supervised
classification: given medical concept pairs (features), predict whether
a TREATS relationship holds (label). The FPGA accelerates the domain
intersection step. In concrete terms, each medical concept is
represented as a hierarchical bitmask domain of up to 262,144 possible
values. When the system checks whether concept A can stand in a TREATS
relationship with concept B, it performs a domain intersection -- a
bitwise AND operation -- that acts as a fast binary classifier. If the
intersection is non-empty, the relationship is plausible; if it is
empty, the relationship is ruled out. The FPGA performs 307,000 of
these intersection tests per second, making it possible to screen all
950 million concept pairs in the SemMedDB dataset in a practical
timeframe.



Soli Deo Gloria
