A step-by-step procedure for solving a problem or performing a computation.
Like a recipe: a finite sequence of well-defined instructions. An algorithm takes an input, processes it through defined steps, and produces an output.
Consider a simple example: finding the largest number in a list. The algorithm is: (1) assume the first number is the largest, (2) look at each remaining number, (3) if it is bigger than your current largest, update your answer, (4) when you reach the end, you have the largest number.
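Here is that procedure as a minimal Python sketch (the function name `find_max` is ours, chosen for illustration):

```python
def find_max(numbers):
    """Return the largest number in a non-empty list."""
    largest = numbers[0]          # step 1: assume the first is largest
    for n in numbers[1:]:         # step 2: look at each remaining number
        if n > largest:           # step 3: bigger? update the answer
            largest = n
    return largest                # step 4: end of list, answer is final

print(find_max([3, 41, 7, 12]))  # 41
```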
The study of algorithms focuses on two questions: **correctness** (does it always produce the right answer?) and **efficiency** (how fast does it run, and how much memory does it use?). Two algorithms can solve the same problem but differ enormously in speed — one might finish in a second while the other takes years. That difference is what Big-O notation measures.
A contiguous block of memory holding elements of the same type, accessed by index.
Think of numbered mailboxes in a row at the post office. Mailbox 0, mailbox 1, mailbox 2, and so on. To check mailbox 5, you walk straight to position 5 — you do not need to open mailboxes 0 through 4 first. This direct access by number (the "index") takes O(1) time.
The tradeoff: inserting an element in the middle means shifting everything after it over by one position to make room, which takes O(N) time. And the array has a fixed size — if you fill all the mailboxes, you need to build a whole new, larger row and move everything over.
In Python, the `list` type is a dynamic array (it resizes automatically). In JavaScript, `Array` works similarly. In C and Rust, arrays have a fixed size set at creation. Arrays are the most fundamental data structure — nearly every other structure (hash tables, heaps, stacks) is built on top of them.
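A quick sketch of both costs using a Python `list`:

```python
mailboxes = [10, 20, 30, 40, 50, 60]

print(mailboxes[5])      # O(1): jump straight to index 5 -> 60

# O(N): inserting in the middle shifts every later element over
mailboxes.insert(2, 25)  # [10, 20, 25, 30, 40, 50, 60]
```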
A mathematical notation describing the upper bound of an algorithm's growth rate as input size increases.
Big-O answers the question: "If I double my input, how much longer will this take?"
- **O(1)** — constant: the time does not change. Looking up a value by index in an array always takes the same time whether the array has 10 items or 10 million.
- **O(log N)** — logarithmic: doubling the input adds just one more step. Binary search works this way — searching 1,000 items takes ~10 steps, and 1,000,000 items takes ~20 steps.
- **O(N)** — linear: double the input, double the time. Scanning every item in a list once.
- **O(N²)** — quadratic: double the input, quadruple the time. Comparing every item against every other item (nested loops).
- **O(2^N)** — exponential: adding one more item doubles the time. Trying every possible combination (brute force).
We ignore constant factors because they depend on hardware. An O(N) algorithm on a slow computer will eventually beat an O(N²) algorithm on a fast one — you just have to make N large enough.
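A minimal sketch of the "double the input" question, counting loop iterations rather than seconds (the helper names are ours, and the counts are illustrative, not a benchmark):

```python
def linear_steps(n):
    return sum(1 for _ in range(n))                    # O(N): one pass

def quadratic_steps(n):
    return sum(1 for _ in range(n) for _ in range(n))  # O(N^2): nested loops

for n in (100, 200):  # doubling the input...
    print(n, linear_steps(n), quadratic_steps(n))
# 100 100 10000
# 200 200 40000   <- linear doubled, quadratic quadrupled
```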
A search algorithm that finds an item in a sorted collection by repeatedly halving the search space.
Imagine a dictionary with 1,000 pages. You are looking for the word "mango." You open to the middle (page 500). The words there start with "L" — too early. So you take the second half (pages 500-1000) and open to its middle (page 750). "P" — too far. Take pages 500-750, open to 625. "M" — close! A few more halvings and you find "mango."
Each step eliminates half the remaining pages. After 10 halvings, you have narrowed 1,000 pages down to 1. That is O(log N) — extraordinarily efficient. For a million items, you need only about 20 comparisons.
The critical requirement: **the data must be sorted**. Binary search on an unsorted list gives nonsense. This is why sorting algorithms matter — they enable fast search.
In code, binary search maintains two pointers (low and high) that progressively narrow the search window. Most languages provide a built-in: Python's `bisect`, Java's `Arrays.binarySearch()`, C++'s `std::lower_bound`.
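A minimal two-pointer version in Python (returning -1 for "not found" is just our convention here):

```python
def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid             # found: return its index
        elif sorted_items[mid] < target:
            low = mid + 1          # target is in the upper half
        else:
            high = mid - 1         # target is in the lower half
    return -1                      # not present

print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))  # 4
```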
A pattern of bits used to select, set, or test specific positions within a binary number.
Think of a row of light switches, each one either ON (1) or OFF (0). Each switch represents whether something is present or absent. With 8 switches, you can represent any combination of 8 items using a single number from 0 to 255.
For example, with the twelve tribes of Israel, a 12-bit number like `010000001001` means "Reuben (bit 0), Judah (bit 3), and Joseph (bit 10) are present." To check if Judah is present, you AND the mask with `000000001000` — if the result is not zero, Judah is there. One CPU instruction, regardless of how many tribes exist.
Key operations: AND (intersection — who is in both groups?), OR (union — who is in either group?), XOR (difference — who is in one but not the other?), NOT (complement — who is absent?), popcount (how many bits are set?).
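A sketch of those operations in Python, reusing the tribe bit positions from the example above:

```python
REUBEN, JUDAH, JOSEPH = 1 << 0, 1 << 3, 1 << 10  # bit positions from the example

present = REUBEN | JUDAH | JOSEPH    # OR builds the set: 0b010000001001

print(bool(present & JUDAH))         # AND tests membership -> True
present |= 1 << 1                    # set the bit at position 1
present &= ~JUDAH                    # NOT + AND clears Judah's bit
print(bin(present).count("1"))       # popcount: how many bits set? -> 3
```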
This is the foundation of our entire FPGA constraint solver — every variable's domain is a bitmask, and constraint propagation is just AND operations.
A value that is either true or false. Named after mathematician George Boole.
The simplest possible data type: yes or no, 1 or 0, on or off. Every decision a computer makes — from "is this pixel red?" to "should I branch left or right?" — ultimately reduces to booleans.
George Boole (1815-1864) showed that logical reasoning could be expressed as algebra: AND, OR, and NOT operating on true/false values. This "Boolean algebra" turned out to be exactly what electrical circuits compute — a high voltage is "true," a low voltage is "false," and logic gates perform AND, OR, and NOT.
In programming, booleans control flow: `if (x > 5)` evaluates to true or false. In hardware, every transistor is a boolean switch. The entire digital world — from your phone to the FPGA — is built from billions of boolean operations per second.
When two different inputs produce the same hash value, mapping to the same position in a hash table.
Imagine 30 students assigned to 26 mailboxes (A-Z) based on the first letter of their last name. Two students named "Smith" and "Santos" both map to box "S" — that is a collision.
There are two main strategies to handle this:
**Chaining**: Each mailbox holds a short list. Box "S" contains [Smith, Santos]. To find Santos, go to box S and scan the short list. As long as the lists stay short (good hash function), this is fast.
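A toy sketch of chaining, using the first-letter "hash" from the mailbox analogy (this simplistic hash is for illustration only):

```python
boxes = {}  # letter -> list of names (the "chains")

def put(name):
    boxes.setdefault(name[0], []).append(name)

def find(name):
    return name in boxes.get(name[0], [])  # scan the short chain

put("Smith"); put("Santos")  # both hash to "S" -> a collision
print(boxes["S"])            # ['Smith', 'Santos']
print(find("Santos"))        # True
```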
**Open addressing**: If box S is taken, look at box T, then U, and so on until you find an empty slot. To find Santos later, start at S and probe forward until you find it.
A perfect hash function would give every input a unique position — no collisions at all. In practice, the birthday paradox means collisions are more common than you might expect: with only 23 people, there is a 50% chance two share a birthday (mapping to the same day out of 365).
An operation that takes the same amount of time regardless of input size. Denoted O(1).
The "holy grail" of algorithm efficiency. Whether your dataset has 10 items or 10 billion, the operation takes the same amount of time.
Examples: looking up element `arr[5]` in an array (just compute the memory address), checking if a bit is set in a bitmask (one AND operation), or looking up a key in a well-built hash table.
The "1" in O(1) does NOT mean "one step." It means "a fixed number of steps that does not depend on N." A function that always does exactly 47 operations is still O(1) — the point is that 47 does not grow as the input gets larger.
Our FPGA's core advantage is making domain intersection O(1): a 64-bit AND instruction takes one clock cycle regardless of which values are in the domain.
A data structure consisting of nodes (vertices) connected by edges, representing relationships.
A graph is simply a collection of "things" (nodes) and "connections" between them (edges). This is one of the most versatile structures in computer science because relationships are everywhere:
- **Social networks**: people are nodes, friendships are edges
- **Road maps**: cities are nodes, roads are edges
- **Web pages**: pages are nodes, hyperlinks are edges
- **Dependencies**: tasks are nodes, "must come before" is an edge
Graphs come in flavors: **directed** (edges have a direction, like one-way streets) vs. **undirected** (two-way connections). **Weighted** (edges have a cost, like distance) vs. **unweighted**. **Cyclic** (you can follow edges in a loop back to where you started) vs. **acyclic** (no loops — called a DAG, directed acyclic graph).
Graph algorithms solve problems like: shortest path (GPS navigation), connectivity (can you reach B from A?), topological sort (in what order should you complete these tasks?), and cycle detection (is there a circular dependency?).
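One common representation is the adjacency list: a map from each node to its neighbors. Here is a sketch with made-up node names, plus a simple connectivity check:

```python
# Undirected road map: cities are nodes, roads are edges
roads = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def reachable(graph, start, goal):
    """Can you reach goal from start? A depth-first walk."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])  # visit this node's neighbors next
    return False

print(reachable(roads, "A", "D"))  # True
```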
A function that converts input data of any size into a fixed-size value (the hash), used to index into a hash table.
A hash function takes any input — a name, a file, a sentence — and produces a fixed-size number. Think of it as a fingerprint machine: you feed in a person, and it outputs a 10-digit number. The same person always produces the same number (deterministic), but different people almost always produce different numbers.
Here is a simple (bad) hash for names: add up the ASCII values of each letter. "cat" = 99+97+116 = 312. "act" = 97+99+116 = 312. Oops — "cat" and "act" give the same hash! That is a collision, and it shows why designing good hash functions is an art.
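That letter-sum hash in Python, showing the collision directly:

```python
def letter_sum_hash(word):
    return sum(ord(ch) for ch in word)  # add up the character codes

print(letter_sum_hash("cat"))  # 312
print(letter_sum_hash("act"))  # 312 -- same hash, different words: a collision
```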
Real hash functions (like SHA-256 or MurmurHash) use mixing, shifting, and multiplication to spread values uniformly. A good hash function has three properties:
1. **Deterministic** — same input always gives same output
2. **Uniform distribution** — outputs are spread evenly across the range
3. **Avalanche effect** — changing one bit of input changes ~50% of output bits
Hash functions are used everywhere: hash tables (fast lookup), checksums (detecting corruption), digital signatures (verifying identity), and our FPGA's hash consing (giving every unique term a unique ID).
A data structure that maps keys to values using a hash function, enabling O(1) average-case lookup, insertion, and deletion.
Also called a hash map, dictionary, or associative array. It is one of the most important data structures in all of computer science.
The idea: instead of searching through items to find the one you want, compute exactly where it should be. A hash function converts the key ("Judah") into an index (7), and you go directly to position 7. No scanning, no sorting required.
**How it works step by step:**
1. You want to store the pair ("Judah", "tribe of kings")
2. The hash function computes: hash("Judah") = 7
3. Store the pair at position 7 in the underlying array
4. Later, to find "Judah": compute hash("Judah") = 7, go to position 7, done
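Python's built-in `dict` performs exactly these steps under the hood; the key/value pair here is the one from the example:

```python
tribes = {}
tribes["Judah"] = "tribe of kings"  # hash("Judah") picks the slot

print(tribes["Judah"])              # O(1) average: hash again, jump there
print("Levi" in tribes)             # membership test is also O(1) average
del tribes["Judah"]                 # O(1) average deletion
```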
**Performance**: O(1) average for lookup, insert, and delete. This is why hash tables are everywhere — Python's `dict`, JavaScript's `{}` objects, Java's `HashMap`, Rust's `HashMap`. Nearly every program uses them.
**The catch**: worst case is O(N) if many keys collide at the same position. Good hash functions and proper sizing keep this from happening in practice. When the table gets too full (load factor > ~0.75), it resizes — doubles in size and rehashes everything.
Repeating a set of instructions, typically using a loop (for, while).
The most basic form of repetition in programming. A `for` loop that visits each tribe once, a `while` loop that keeps refining until convergence — these are iteration.
Iteration and recursion can solve the same problems, but they work differently:
- **Iteration** uses a loop variable and modifies it each pass. It uses constant memory (just the loop variable).
- **Recursion** calls itself with a smaller input. Each call adds a frame to the call stack, using O(N) memory for N calls.
For simple tasks like summing a list, iteration is typically preferred — it is faster and uses less memory. For naturally hierarchical problems (tree traversal, divide-and-conquer), recursion is more natural. In our FPGA, constraint propagation is iterative: apply all constraints, check if anything changed, repeat until stable (fixpoint).
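A side-by-side sketch of summing a list both ways (the function names are ours):

```python
def sum_iterative(numbers):
    total = 0
    for n in numbers:  # one loop variable, constant extra memory
        total += n
    return total

def sum_recursive(numbers):
    if not numbers:    # base case: an empty list sums to 0
        return 0
    return numbers[0] + sum_recursive(numbers[1:])  # one stack frame per element

print(sum_iterative([1, 2, 3, 4]), sum_recursive([1, 2, 3, 4]))  # 10 10
```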
An operation whose time grows proportionally to the input size. Denoted O(N).
Double the input, double the time. This is the natural speed for any task that must look at every item at least once.
Finding the maximum in an unsorted list? You must check every element — O(N). Summing an array? Visit each element once — O(N). Printing every name in a phone book? O(N).
Linear time is often the best you can achieve for problems where the answer depends on all the data. You cannot find the largest number without looking at all of them. But many problems *can* be solved faster if the data is organized: binary search finds an item in O(log N) by exploiting sorted order, and hash tables find items in O(1) by exploiting computed positions.
When someone says an algorithm is "linear," it means performance scales gracefully. Processing 1,000 items takes 1 ms? Processing 1,000,000 items takes about 1 second. That is manageable. Compare with O(N²): 1,000 items in 1 ms means 1,000,000 items takes 1,000 seconds (nearly 17 minutes).
A data structure where each element points to the next, forming a chain.
Imagine a treasure hunt: each clue tells you where to find the next clue. To reach clue #5, you must follow clues 1, 2, 3, 4 in order. You cannot jump directly to #5 the way you can jump to mailbox #5 in an array.
Each element (called a "node") contains two things: the data itself, and a pointer to the next node. The last node points to nothing (null).
**Advantages over arrays**: inserting or removing an element at the front is O(1) — just update a pointer. No shifting required. This makes linked lists great for queues, stacks, and situations where elements are frequently added and removed.
**Disadvantages**: finding the Nth element takes O(N) — you must walk the chain. No random access. And each node uses extra memory for the pointer. In modern hardware, arrays are often faster in practice because they are cache-friendly (elements are adjacent in memory), while linked list nodes can be scattered.
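A minimal node-and-pointer sketch in Python:

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next  # pointer to the following node (None at the end)

# Build the chain 1 -> 2 -> 3
head = Node(1, Node(2, Node(3)))

# O(1) insert at the front: just repoint head
head = Node(0, head)

# O(N) to reach the Nth element: walk the chain
node = head
while node:
    print(node.data)  # 0, 1, 2, 3
    node = node.next
```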
The inverse of exponentiation. log₂(N) answers: "How many times must I halve N to reach 1?"
If exponentiation asks "2 raised to what power gives me 1024?" (answer: 10), then the logarithm asks the reverse: "starting from 1, how many times must I multiply by 2 to reach 1024?" (also 10). Equivalently: "How many times must I halve 1024 to reach 1?" — 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1 — that is 10 halvings.
Why programmers care: logarithms grow *incredibly slowly*. Here is a comparison:
| N | log₂(N) |
|---|---------|
| 10 | ~3.3 |
| 1,000 | ~10 |
| 1,000,000 | ~20 |
| 1,000,000,000 | ~30 |
A billion items, and log only gives 30. This is why O(log N) algorithms like binary search are so powerful — they scale effortlessly to enormous datasets. Each doubling of data adds just one more step.
In CS, "log" usually means log base 2 (because computers work in binary). The base does not affect Big-O analysis — log₂(N) and log₁₀(N) differ only by a constant factor — but base 2 is the most natural.
An operation whose time grows with the square of the input size. Denoted O(N²).
Double the input, quadruple the time. This is the typical cost of comparing every item against every other item — the classic "nested loop" pattern.
Real-world example: at a dinner party of N people, everyone shakes hands with everyone else. With 10 people, that is about 45 handshakes. With 20 people: about 190 handshakes. With 100 people: nearly 5,000 handshakes. The formula is N(N-1)/2, which is O(N²).
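The handshake count, written as the classic nested-loop pattern:

```python
def handshakes(n):
    count = 0
    for i in range(n):             # every person...
        for j in range(i + 1, n):  # ...shakes hands with every later person
            count += 1
    return count

print(handshakes(10))   # 45
print(handshakes(20))   # 190
print(handshakes(100))  # 4950 -- N(N-1)/2, which is O(N^2)
```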
Common O(N²) algorithms: bubble sort, selection sort, insertion sort, and naive string matching. These are fine for small inputs (N < 1,000) but become impractical quickly. At N = 1,000,000, an O(N²) algorithm does a trillion operations — compared to about 20 million for O(N log N).
The lesson: when you see nested loops over the same data, think "quadratic" and ask if there is a smarter approach.
A data structure following First-In-First-Out (FIFO) order — like a line at the market.
The first person to join the line is the first to be served. No cutting! This is First-In-First-Out (FIFO) ordering.
Two operations: **enqueue** (add to the back of the line) and **dequeue** (remove from the front). Both are O(1).
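In Python, `collections.deque` gives O(1) operations at both ends:

```python
from collections import deque

line = deque()
line.append("first")   # enqueue at the back
line.append("second")
print(line.popleft())  # dequeue from the front -> 'first' (FIFO)
```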
Queues appear everywhere in computing:
- **Breadth-first search**: explore a graph level by level — process all neighbors before going deeper
- **Print spooling**: documents print in the order they were submitted
- **Message passing**: requests are processed in arrival order
- **Our FPGA**: the command ring buffer is a queue of search states — each state gets processed in order, and new branches from `conde` are enqueued at the back
The opposite of a queue is a stack (LIFO — last in, first out). Choosing between a queue and a stack determines whether you explore breadth-first or depth-first.
A technique where a function calls itself to solve a smaller instance of the same problem.
The idea: if you can solve the problem for N by first solving it for N-1, and you know the answer for the smallest case (the "base case"), then you can solve it for any N.
Classic example — factorial: 5! = 5 × 4! = 5 × 4 × 3! = ... = 5 × 4 × 3 × 2 × 1 = 120. The function calls itself with a smaller number until it reaches 1 (the base case).
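Factorial as a recursive Python function:

```python
def factorial(n):
    if n <= 1:                   # base case: stop recursing
        return 1
    return n * factorial(n - 1)  # recursive case: a smaller instance

print(factorial(5))  # 120
```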
Recursion shines for problems with natural hierarchical or self-similar structure:
- **Tree traversal**: to process a tree, process the root, then recursively process each subtree
- **Merge sort**: split the list, recursively sort each half, merge
- **Fibonacci**: F(n) = F(n-1) + F(n-2), with F(0)=0, F(1)=1
**Every recursive solution needs a base case** — the condition where it stops calling itself. Without it, you get infinite recursion (like looking up "recursion" in a dictionary and finding "see: recursion").
In practice, deep recursion can overflow the call stack. Some languages (especially functional ones like Haskell) optimize "tail recursion" to avoid this. In other languages, you may need to convert deep recursion to iteration.
A data structure following Last-In-First-Out (LIFO) order — like a stack of plates.
You can only add or remove from the top. The last plate you put on the stack is the first one you take off. This is Last-In-First-Out (LIFO).
Two operations: **push** (add to the top) and **pop** (remove from the top). Both are O(1).
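A Python list already behaves as a stack:

```python
plates = []
plates.append("red")   # push
plates.append("blue")  # push
print(plates.pop())    # pop -> 'blue' (LIFO: last in, first out)
print(plates.pop())    # pop -> 'red'
```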
Stacks are fundamental to how computers work:
- **The call stack**: when a function calls another function, the return address is pushed onto the stack. When the inner function finishes, the address is popped and execution returns. This is how your program knows where to go back to.
- **Undo/Redo**: each action is pushed; Ctrl+Z pops the last action
- **Expression evaluation**: to compute `3 + 4 * 2`, a stack tracks operators and operands in the right order
- **Depth-first search**: explore one path as deep as possible, then pop back to try another
The call stack is why deeply recursive functions can cause a "stack overflow" — too many nested calls exhaust the available stack memory.
A measure of how the running time of an algorithm grows as the input size increases.
Time complexity does not measure actual seconds — it measures how the number of operations scales. An O(N) algorithm might take 1 second for N=1000 on your laptop and 0.001 seconds on a supercomputer, but on both machines, doubling N will approximately double the time.
This abstraction is powerful because it lets you compare algorithms independently of hardware. If Algorithm A is O(N log N) and Algorithm B is O(N²), then for large enough inputs, A will always beat B — regardless of which computer you run them on or what constant factors are involved.
The common complexity classes, from fastest to slowest:
O(1) < O(log N) < O(N) < O(N log N) < O(N²) < O(2^N) < O(N!)
Choosing the right algorithm is often the single most impactful optimization you can make. Switching from O(N²) to O(N log N) for sorting a million items cuts the time from ~17 minutes to ~0.3 seconds. No amount of hardware improvement matches that.
A hierarchical data structure with a root node and child nodes, where each node has at most one parent.
Think of a family tree: one ancestor at the top (the root), branching down to children, grandchildren, and so on. In a CS tree, each node has at most one parent — a simplification compared to a real family tree, where everyone has two.
Key vocabulary:
- **Root**: the topmost node (no parent)
- **Leaf**: a node with no children (the bottom)
- **Internal node**: a node with at least one child
- **Depth**: how many edges from the root to this node
- **Height**: the depth of the deepest leaf
**Binary trees** (each node has at most 2 children) are especially important. A **binary search tree (BST)** keeps values sorted: smaller values go left, larger values go right. This enables O(log N) lookup — at each node, you choose left or right, halving the possibilities.
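A minimal (unbalanced) BST sketch showing the left/right choice at each node; note that lookup is O(log N) only if the tree stays roughly balanced:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None   # subtree of smaller values
        self.right = None  # subtree of larger values

def insert(node, value):
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)    # smaller: go left
    else:
        node.right = insert(node.right, value)  # larger: go right
    return node

def contains(node, value):
    if node is None:
        return False  # fell off the tree: not present
    if value == node.value:
        return True
    # each comparison halves the candidates (if the tree is balanced)
    return contains(node.left if value < node.value else node.right, value)

root = None
for v in (8, 3, 10, 1, 6):
    root = insert(root, v)
print(contains(root, 6), contains(root, 7))  # True False
```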
Trees appear everywhere: file systems (folders containing folders), HTML/DOM (nested elements), decision trees (if/else chains), expression trees (parsing math), and our FPGA's cons-cell terms (each cons has a car and a cdr — a binary tree).