Supervised Learning — Brief ☧
"The hearing ear, and the seeing eye, the LORD hath made even both of them."
— Proverbs 20:12 (KJV)
Q: When you were a child, someone pointed at a dog and said "dog,"
pointed at a cat and said "cat." After enough examples, you could
identify animals you had never seen before. What made this work?
A: Two things: the examples (the animals) and the labels (the words
someone told you). Every example came with the correct answer. This is
supervised learning — learning from labeled data. You are given
inputs (features like ear shape, fur length, size) and correct outputs
(labels like "dog" or "cat"), and the model learns the mapping between
them.
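The idea of learning a mapping from labeled examples can be sketched with a toy 1-nearest-neighbor classifier. This is illustrative only: the feature values below (ear pointiness, fur length, size) are invented numbers, not real measurements.

```python
# Toy supervised classifier: 1-nearest-neighbor on hand-made features.
# Each training example is ((ear_pointiness, fur_length, size), label).
training_data = [
    ((0.2, 0.6, 0.9), "dog"),
    ((0.3, 0.5, 0.8), "dog"),
    ((0.9, 0.4, 0.3), "cat"),
    ((0.8, 0.3, 0.2), "cat"),
]

def classify(features):
    """Predict the label of the closest training example (1-NN)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_data, key=lambda ex: dist(ex[0], features))
    return label

# A new animal the model has never seen, near the cat examples:
print(classify((0.85, 0.35, 0.25)))  # -> "cat"
```

The "learning" here is trivial (memorize the examples), but the structure is the same as in any supervised method: features in, labels out, and generalization to unseen inputs.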
Q: What if the answer is not a category but a number — like
predicting a house's price from its square footage?
A: That distinction defines the two main supervised tasks:
- Classification: predict a category ("dog" or "cat," "spam" or "not spam")
- Regression: predict a continuous number (price, temperature, age)
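A minimal regression sketch: fitting a line price = slope * sqft + intercept by closed-form least squares. The house prices below are invented for illustration.

```python
# Simple linear regression (one feature) via closed-form least squares.
sqft  = [1000, 1500, 2000, 2500]                 # input feature
price = [200_000, 290_000, 410_000, 500_000]     # labels (invented)

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price)) \
        / sum((x - mean_x) ** 2 for x in sqft)
intercept = mean_y - slope * mean_x

# Predict the price of an unseen 1800 sq ft house:
predicted = slope * 1800 + intercept
print(round(predicted))  # -> 360200
```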
Q: How do you know the model actually learned, rather than just
memorizing the training examples?
A: You test it on examples it has never seen. The data is split:
- Training set: examples the model learns from
- Validation set: examples for tuning during development
- Test set: examples held back for final evaluation
If the model scores well on training data but poorly on test data,
it memorized rather than learned — that is called overfitting.
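The three-way split above can be sketched in a few lines. The 60/20/20 ratio is a common convention, not a rule.

```python
import random

def split_dataset(examples, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then carve off the validation and test sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # -> 60 20 20
```

The shuffle matters: if the data is ordered (say, all dogs first, then all cats), an unshuffled split would give the model a biased view of the world.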
Q: Jesus said, "I am the good shepherd, and know my sheep"
(John 10:14). Is the shepherd doing supervised learning?
A: Yes. The shepherd has labeled each sheep — he knows which are
his and which are not. Given features (wool color, size, markings) and
labels (mine vs. not mine), he can classify new animals he has never
encountered.
The Two Tasks
| Task | Output | Loss Function | Example |
|---|---|---|---|
| Classification | Category label | Cross-entropy | "Is this a sheep or a goat?" |
| Regression | Continuous number | Mean Squared Error | "How old is this sheep?" |
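The two loss functions in the table can be written out directly. A plain-Python sketch of binary cross-entropy and mean squared error:

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy: punishes confident wrong predictions hardest."""
    p = min(max(p_pred, eps), 1 - eps)   # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def mse(y_true, y_pred):
    """Mean squared error over paired lists of numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(round(cross_entropy(1, 0.9), 4))  # low loss: confident and correct
print(round(cross_entropy(1, 0.1), 4))  # high loss: confident and wrong
print(mse([2.0, 3.0], [2.5, 2.5]))      # -> 0.25
```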
Key Concepts
| Concept | Meaning |
|---|---|
| Training set | Labeled examples the model learns from |
| Validation set | Held-out examples for tuning hyperparameters |
| Test set | Final unseen examples for honest evaluation |
| Cross-validation | Splitting data K ways for robust estimation |
| Overfitting | Memorizing training data, failing on new inputs |
| Precision | Of those predicted positive, how many are correct? |
| Recall | Of all true positives, how many did the model find? |
What you should take away from these two tables is that supervised learning is conceptually simple: give the model labeled examples, let it learn the pattern, then test whether it can apply that pattern to new data. The two task types -- classification and regression -- cover an enormous range of real-world problems, from email spam detection (classification) to predicting tomorrow's temperature (regression). The key concepts in the second table are all about being honest with yourself: the training/validation/test split exists to prevent you from fooling yourself into thinking the model is smarter than it really is.
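Cross-validation from the table deserves a concrete sketch: instead of one fixed validation set, the data is divided into K folds, and each fold takes a turn as the held-out set.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) index pairs for K-fold cross-validation."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n
        val_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, val_idx

# With 10 examples and 5 folds, each fold holds out 2 examples:
for train_idx, val_idx in k_fold_indices(10, 5):
    print(val_idx)  # [0, 1], then [2, 3], ... then [8, 9]
```

Averaging the model's score across all K folds gives a more robust estimate than any single split, at the cost of training K times.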
Supervised learning relies on a training algorithm that makes repeated passes over the labeled dataset, adjusting the model's parameters on each pass. The features fed into the model are typically stored as arrays of numbers, and training time grows with both the number of examples and the number of features per example.
Precision and recall deserve special attention: in many real applications, the cost of a false positive (saying "yes" when the answer is "no") is very different from the cost of a false negative (missing a true "yes"). A medical screening test, for instance, should favor high recall -- it is better to flag a healthy patient for further testing than to miss a sick one.
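Both metrics fall out of the counts of true positives, false positives, and false negatives. A minimal sketch, with invented labels chosen to show the screening trade-off:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A screening model that flags aggressively: perfect recall, lower precision.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
print(precision_recall(y_true, y_pred))  # -> (0.6, 1.0)
```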
Connection to our project: Our SemMedDB analysis uses supervised classification:
given medical concept pairs (features), predict whether a TREATS relationship
holds (label). The FPGA accelerates the domain intersection step. In concrete terms, each medical concept is represented as a hierarchical bitmask domain of up to 262,144 possible values. When the system checks whether the TREATS relation can hold between concept A and concept B, it performs a domain intersection -- a bitwise AND operation -- that acts as a fast binary classifier. If the intersection is non-empty, the relationship is plausible; if it is empty, the relationship is ruled out. The FPGA performs 307,000 of these intersection tests per second, making it possible to screen all 950 million concept pairs in the SemMedDB dataset in a practical timeframe.
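In code, the intersection classifier described above amounts to a bitwise AND over large bitmasks. A minimal Python sketch using arbitrary-precision integers as bitmasks -- the real system runs this on the FPGA, and the concept encodings below are invented for illustration:

```python
DOMAIN_BITS = 262_144  # 2**18 possible values per concept domain

def make_domain(values):
    """Encode a set of value indices as one large bitmask."""
    mask = 0
    for v in values:
        mask |= 1 << v
    return mask

def plausible_treats(domain_a, domain_b):
    """Binary classifier: TREATS is plausible iff the two domains
    share at least one value (non-empty bitwise-AND intersection)."""
    return (domain_a & domain_b) != 0

# Invented example concepts:
aspirin  = make_domain([10, 42, 1000])
headache = make_domain([42, 7])
fracture = make_domain([3, 99])

print(plausible_treats(aspirin, headache))  # -> True  (share value 42)
print(plausible_treats(aspirin, fracture))  # -> False (no overlap)
```

A single AND over the full domain touches all 262,144 bits at once in hardware, which is what makes the per-test cost low enough to screen hundreds of millions of pairs.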
Soli Deo Gloria