Intermediate

This section covers core algorithmic techniques and data structures used in production systems and technical interviews. The examples in this section are presented in C; each is self-contained, including all the code needed to compile and run independently, and builds on the foundational concepts from the beginner section.

Binary Search

Example 29: Binary Search on a Sorted Array

Binary search finds a target value in a sorted array by repeatedly halving the search space. Each step eliminates half the remaining candidates, producing O(log n) time complexity — 30 steps suffice for one billion elements.

#include <stdio.h>

int binary_search(int arr[], int size, int target) {
    // => arr must be sorted in ascending order for binary search to work
    // => target is the value we want to find

    int left = 0, right = size - 1;
    // => left starts at index 0, right starts at last valid index
    // => these two pointers define the active search window

    while (left <= right) {
        // => loop continues as long as search window is non-empty
        // => when left > right the target is not in the array

        int mid = left + (right - left) / 2;
        // => mid is the index of the middle element
        // => use (right - left) / 2 instead of (left + right) / 2 to avoid integer overflow

        if (arr[mid] == target) {
            // => found the target — return its index immediately
            return mid;
        } else if (arr[mid] < target) {
            // => mid element is too small, target must be to the right
            left = mid + 1;
            // => discard left half including mid
        } else {
            // => mid element is too large, target must be to the left
            right = mid - 1;
            // => discard right half including mid
        }
    }

    return -1;
    // => target not found, return sentinel value -1
}

int main(void) {
    int arr[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
    // => sorted array with 10 elements, indices 0-9
    int size = sizeof(arr) / sizeof(arr[0]);

    printf("%d\n", binary_search(arr, size, 23));   // => Output: 5  (arr[5] == 23)
    printf("%d\n", binary_search(arr, size, 100));  // => Output: -1 (100 not in array)
    printf("%d\n", binary_search(arr, size, 2));    // => Output: 0  (first element)
    printf("%d\n", binary_search(arr, size, 91));   // => Output: 9  (last element)
    return 0;
}

Key Takeaway: Binary search requires a sorted input and achieves O(log n) time by halving the search space each iteration. Use mid = left + (right - left) / 2 to prevent integer overflow in languages with fixed-width integers.

Why It Matters: Binary search underpins database index lookups, sorted set operations in Redis, and range queries in file systems. A linear scan over a million records takes up to one million comparisons; binary search takes at most 20. Understanding this O(n) vs O(log n) distinction is essential for designing systems that scale — a 10x growth in data size adds only about three steps to binary search (log2 10 ≈ 3.3) but 10x the work to a linear scan.


Example 30: Binary Search for Insert Position

Binary search can also determine where a value should be inserted to keep an array sorted, without scanning linearly. This variant returns the leftmost index where target can be inserted.

#include <stdio.h>

int search_insert_position(int arr[], int size, int target) {
    // => returns index where target is found, or where it should be inserted
    // => result is always in range [0, size] inclusive

    int left = 0, right = size;
    // => right is size, not size-1, because insertion can happen at end

    while (left < right) {
        // => strict less-than: when left == right the window is one slot wide
        int mid = left + (right - left) / 2;
        // => integer division truncates, so mid is biased toward the left end of the window

        if (arr[mid] < target) {
            // => mid is strictly less than target, so insert position is right of mid
            left = mid + 1;
        } else {
            // => arr[mid] >= target: insertion point is at mid or to its left
            right = mid;
            // => do NOT subtract 1; mid is a candidate for insertion position
        }
    }

    return left;
    // => left == right at loop exit, which is the insertion index
}

int main(void) {
    int arr[] = {1, 3, 5, 6};
    int size = 4;
    printf("%d\n", search_insert_position(arr, size, 5));  // => Output: 2  (found at index 2)
    printf("%d\n", search_insert_position(arr, size, 2));  // => Output: 1  (2 goes between 1 and 3)
    printf("%d\n", search_insert_position(arr, size, 7));  // => Output: 4  (7 appended at end)
    printf("%d\n", search_insert_position(arr, size, 0));  // => Output: 0  (0 prepended at start)
    return 0;
}

Key Takeaway: The insert-position variant uses right = size and right = mid (not mid - 1) to correctly handle the case where the target belongs at the end of the array or equals an existing element.

Why It Matters: Insert-position binary search powers sorted container implementations such as Python’s bisect module and Java’s Collections.binarySearch. It enables O(log n) maintenance of sorted order when inserting into arrays, underpins event scheduling (find the correct slot in a sorted timeline), and is used in constraint solvers that need to place items in sorted priority queues efficiently.


Hash Tables and Collision Resolution

Example 31: Hash Table with Chaining

A hash table maps keys to values using a hash function that converts each key into an array index. Collisions — two keys hashing to the same index — are resolved by storing a linked list (chain) at each bucket.

    graph TD
    K1["Key: apple"] -->|hash mod 5 = 0| B0["Bucket 0: [(apple,1)]"]
    K2["Key: banana"] -->|hash mod 5 = 2| B2["Bucket 2: [(banana,2)]"]
    K3["Key: cherry"] -->|hash mod 5 = 2| B2C["Bucket 2: [(banana,2),(cherry,3)]"]

    style K1 fill:#0173B2,stroke:#000,color:#fff
    style K2 fill:#DE8F05,stroke:#000,color:#fff
    style K3 fill:#029E73,stroke:#000,color:#fff
    style B0 fill:#CA9161,stroke:#000,color:#fff
    style B2 fill:#CA9161,stroke:#000,color:#fff
    style B2C fill:#CC78BC,stroke:#000,color:#fff
  
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAPACITY 8

typedef struct Entry {
    char key[64];
    int value;
    struct Entry *next;
    // => each entry points to the next in its chain (linked list)
} Entry;

typedef struct {
    Entry *buckets[CAPACITY];
    // => each bucket is the head of a linked list (chain) of entries
    int size;
    // => tracks number of key-value pairs stored
} HashTable;

unsigned int hash_func(const char *key) {
    unsigned int h = 0;
    while (*key) {
        h = h * 31 + (unsigned char)(*key);
        key++;
    }
    return h % CAPACITY;
    // => simple polynomial hash mapped to a valid bucket index [0, CAPACITY-1]
}

void ht_init(HashTable *ht) {
    for (int i = 0; i < CAPACITY; i++) {
        ht->buckets[i] = NULL;
    }
    ht->size = 0;
}

void ht_put(HashTable *ht, const char *key, int value) {
    unsigned int idx = hash_func(key);
    // => idx is the target bucket index
    Entry *cur = ht->buckets[idx];
    // => retrieve the chain at that bucket

    while (cur) {
        if (strcmp(cur->key, key) == 0) {
            cur->value = value;
            // => update existing key's value in-place
            return;
        }
        cur = cur->next;
    }

    Entry *e = (Entry *)malloc(sizeof(Entry));
    // => key not found in chain: create new entry
    snprintf(e->key, sizeof e->key, "%s", key);
    // => bounded copy: a key longer than 63 characters is truncated rather than overflowing the buffer
    e->value = value;
    e->next = ht->buckets[idx];
    ht->buckets[idx] = e;
    // => prepend new entry to the chain
    ht->size++;
    // => increment count of stored pairs
}

int ht_get(HashTable *ht, const char *key, int *found) {
    unsigned int idx = hash_func(key);
    // => compute bucket index for this key
    Entry *cur = ht->buckets[idx];
    while (cur) {
        if (strcmp(cur->key, key) == 0) {
            *found = 1;
            return cur->value;
            // => found key: return its associated value
        }
        cur = cur->next;
    }
    *found = 0;
    return 0;
    // => key not present in any chain
}

void ht_remove(HashTable *ht, const char *key) {
    unsigned int idx = hash_func(key);
    Entry *cur = ht->buckets[idx];
    Entry *prev = NULL;
    while (cur) {
        if (strcmp(cur->key, key) == 0) {
            if (prev) {
                prev->next = cur->next;
            } else {
                ht->buckets[idx] = cur->next;
            }
            free(cur);
            // => remove entry and free memory
            return;
        }
        prev = cur;
        cur = cur->next;
    }
}

int main(void) {
    HashTable ht;
    ht_init(&ht);
    ht_put(&ht, "apple", 1);
    ht_put(&ht, "banana", 2);
    ht_put(&ht, "cherry", 3);
    ht_put(&ht, "banana", 99);    // => update existing key

    int found;
    printf("%d\n", ht_get(&ht, "apple", &found));   // => Output: 1
    printf("%d\n", ht_get(&ht, "banana", &found));   // => Output: 99 (updated value)
    ht_get(&ht, "grape", &found);
    printf("%s\n", found ? "found" : "None");         // => Output: None (not present)
    return 0;
}

Key Takeaway: Chaining resolves collisions by keeping a linked list at each bucket (this implementation prepends new entries for O(1) insertion). Average-case O(1) insert/lookup assumes a good hash function and load factor below ~0.75; worst-case degrades to O(n) if all keys collide.

Why It Matters: Hash tables are the most widely used data structure in software engineering, backing Python dictionaries, Java’s HashMap, Redis key-value storage, and database hash indexes. The choice of collision resolution strategy — chaining vs open addressing — affects cache locality, memory usage, and worst-case behavior. Understanding these trade-offs guides decisions when performance requirements exceed what a language’s built-in map provides.


Example 32: Hash Map for Frequency Counting

Counting element frequencies is a foundational technique for anagram detection, histogram construction, and mode finding. This example uses a flat array of (key, count) pairs for clarity: each lookup is a linear scan, so the total cost is O(n·k) for k distinct keys. Backing the counts with the hash table from Example 31 brings this down to O(n).

#include <stdio.h>
#include <string.h>

#define MAX_WORDS 100
#define MAX_LEN 64

typedef struct {
    char key[MAX_LEN];
    int count;
} FreqEntry;

int count_with_array(const char *words[], int n, FreqEntry freq[], int *freq_size) {
    // => manual frequency counting using a flat array
    *freq_size = 0;

    for (int i = 0; i < n; i++) {
        // => outer loop visits each word once; the inner scan over freq makes
        //    the total cost O(n * k) for k distinct keys
        int found = 0;
        for (int j = 0; j < *freq_size; j++) {
            if (strcmp(freq[j].key, words[i]) == 0) {
                freq[j].count++;
                // => increment count by 1
                found = 1;
                break;
            }
        }
        if (!found) {
            strcpy(freq[*freq_size].key, words[i]);
            freq[*freq_size].count = 1;
            (*freq_size)++;
            // => new word: add entry with count 1
        }
    }
    return *freq_size;
    // => returns number of distinct words
}

int main(void) {
    const char *words[] = {"apple", "banana", "apple", "cherry", "banana", "apple"};
    int n = 6;
    FreqEntry freq[MAX_WORDS];
    int freq_size;

    count_with_array(words, n, freq, &freq_size);
    printf("{");
    for (int i = 0; i < freq_size; i++) {
        if (i > 0) printf(", ");
        printf("'%s': %d", freq[i].key, freq[i].count);
    }
    printf("}\n");
    // => Output: {'apple': 3, 'banana': 2, 'cherry': 1}
    return 0;
}

Key Takeaway: A frequency table maps each distinct key to its count in one pass over the input. The flat-array version shown here is simple and adequate for small key sets; switch to a hash table (Example 31) when the number of distinct keys grows, restoring O(1) expected per-element updates.

Why It Matters: Frequency counting with hash maps solves a wide class of interview and production problems: finding the most common log error, detecting anagrams, computing word histograms in NLP pipelines, and grouping events by type. The O(n) hash map approach replaces the naive O(n²) nested loop comparison that breaks at scale.


Binary Search Trees

Example 33: BST Insert and Search

A Binary Search Tree stores values so that every node’s left subtree contains only smaller values and its right subtree contains only larger values. This ordering property enables O(log n) average-case search, insert, and delete on balanced trees.

    graph TD
    R["50 (root)"]
    L["30"]
    RL["70"]
    LL["20"]
    LR["40"]
    RLL["60"]
    RLR["80"]

    R --> L
    R --> RL
    L --> LL
    L --> LR
    RL --> RLL
    RL --> RLR

    style R fill:#0173B2,stroke:#000,color:#fff
    style L fill:#DE8F05,stroke:#000,color:#fff
    style RL fill:#DE8F05,stroke:#000,color:#fff
    style LL fill:#029E73,stroke:#000,color:#fff
    style LR fill:#029E73,stroke:#000,color:#fff
    style RLL fill:#029E73,stroke:#000,color:#fff
    style RLR fill:#029E73,stroke:#000,color:#fff
  
#include <stdio.h>
#include <stdlib.h>

typedef struct BSTNode {
    int val;
    // => node stores a single integer value
    struct BSTNode *left;
    // => left child: will hold values < val
    struct BSTNode *right;
    // => right child: will hold values > val
} BSTNode;

BSTNode *new_node(int val) {
    BSTNode *n = (BSTNode *)malloc(sizeof(BSTNode));
    n->val = val;
    n->left = NULL;
    n->right = NULL;
    return n;
}

BSTNode *bst_insert(BSTNode *root, int val) {
    if (root == NULL) {
        return new_node(val);
        // => base case: empty spot found, create new node here
    }
    if (val < root->val) {
        root->left = bst_insert(root->left, val);
        // => value belongs in left subtree; recurse and reattach
    } else if (val > root->val) {
        root->right = bst_insert(root->right, val);
        // => value belongs in right subtree; recurse and reattach
    }
    // => if val == root->val, duplicate — do nothing
    return root;
    // => return root so callers can chain insertions
}

int bst_search(BSTNode *root, int val) {
    if (root == NULL) {
        return 0;
        // => reached empty subtree: value not present in tree
    }
    if (val == root->val) {
        return 1;
        // => found exact match at current node
    } else if (val < root->val) {
        return bst_search(root->left, val);
        // => target smaller: must be in left subtree
    } else {
        return bst_search(root->right, val);
        // => target larger: must be in right subtree
    }
}

int main(void) {
    BSTNode *root = NULL;
    int values[] = {50, 30, 70, 20, 40, 60, 80};
    for (int i = 0; i < 7; i++) {
        root = bst_insert(root, values[i]);
        // => build tree one node at a time
    }

    printf("%s\n", bst_search(root, 40) ? "True" : "False");  // => Output: True  (40 is in tree)
    printf("%s\n", bst_search(root, 55) ? "True" : "False");  // => Output: False (55 not in tree)
    return 0;
}

Key Takeaway: BST insert and search both follow the same “go left if smaller, go right if larger” logic, achieving O(log n) average depth on balanced trees. Worst-case O(n) occurs when inserting sorted data, which degenerates to a linked list.

Why It Matters: BSTs are the conceptual foundation for balanced trees (AVL, Red-Black) used in most production ordered-map implementations — Java’s TreeMap, C++’s std::map, and PostgreSQL’s B-tree indexes. Understanding BST structure and the degenerate sorted-input problem motivates why databases use B-trees with balanced branching factors rather than plain BSTs, enabling logarithmic lookups even on multi-terabyte datasets.


Example 34: BST Inorder Traversal

Inorder traversal (left → root → right) visits BST nodes in ascending sorted order. This is the defining property of BSTs and is used to extract sorted sequences from tree-based indexes.

#include <stdio.h>
#include <stdlib.h>

typedef struct BSTNode {
    int val;
    struct BSTNode *left;
    struct BSTNode *right;
} BSTNode;

BSTNode *new_node(int val) {
    BSTNode *n = (BSTNode *)malloc(sizeof(BSTNode));
    n->val = val;
    n->left = NULL;
    n->right = NULL;
    return n;
}

BSTNode *insert(BSTNode *root, int val) {
    if (!root) return new_node(val);
    if (val < root->val) root->left = insert(root->left, val);
    else if (val > root->val) root->right = insert(root->right, val);
    return root;
}

void inorder(BSTNode *root, int result[], int *size) {
    if (!root) return;
    // => base case: empty node contributes nothing

    inorder(root->left, result, size);
    // => 1. recurse left subtree first (all smaller values)
    result[(*size)++] = root->val;
    // => 2. visit current node (add value in sorted position)
    inorder(root->right, result, size);
    // => 3. recurse right subtree last (all larger values)
}

void preorder(BSTNode *root, int result[], int *size) {
    if (!root) return;
    result[(*size)++] = root->val;
    // => visit root FIRST (before children)
    preorder(root->left, result, size);
    preorder(root->right, result, size);
}

void postorder(BSTNode *root, int result[], int *size) {
    if (!root) return;
    postorder(root->left, result, size);
    postorder(root->right, result, size);
    result[(*size)++] = root->val;
    // => visit root LAST (after both children)
}

void print_array(int arr[], int size) {
    printf("[");
    for (int i = 0; i < size; i++) {
        if (i > 0) printf(", ");
        printf("%d", arr[i]);
    }
    printf("]\n");
}

int main(void) {
    BSTNode *root = NULL;
    int vals[] = {50, 30, 70, 20, 40, 60, 80};
    for (int i = 0; i < 7; i++) root = insert(root, vals[i]);

    int result[7];
    int size;

    size = 0;
    inorder(root, result, &size);
    print_array(result, size);   // => Output: [20, 30, 40, 50, 60, 70, 80]  (sorted!)

    size = 0;
    preorder(root, result, &size);
    print_array(result, size);   // => Output: [50, 30, 20, 40, 70, 60, 80]  (root first)

    size = 0;
    postorder(root, result, &size);
    print_array(result, size);   // => Output: [20, 40, 30, 60, 80, 70, 50]  (root last)
    return 0;
}

Key Takeaway: Inorder traversal produces BST values in sorted ascending order. Preorder is useful for tree serialisation (you can reconstruct the tree by inserting in preorder sequence). Postorder is used for bottom-up evaluation like expression trees or dependency resolution.

Why It Matters: Tree traversal order is not academic — database query planners use inorder traversal of B-tree indexes to execute range scans efficiently. Compilers use postorder traversal to evaluate abstract syntax trees (evaluate children before applying the operator). Serialising and deserialising distributed state often uses preorder traversal for deterministic reconstruction. Choosing the right traversal order directly affects algorithmic correctness.


Heaps

Example 35: Min-Heap Implementation

A min-heap is a complete binary tree where every parent is smaller than or equal to its children. Stored in a plain array (the children of index i live at 2i+1 and 2i+2), it provides O(log n) push/pop and O(1) peek at the minimum.

    graph TD
    N1["1 (root/min)"]
    N3["3"]
    N2["2"]
    N7["7"]
    N5["5"]
    N4["4"]
    N6["6"]

    N1 --> N3
    N1 --> N2
    N3 --> N7
    N3 --> N5
    N2 --> N4
    N2 --> N6

    style N1 fill:#0173B2,stroke:#000,color:#fff
    style N3 fill:#DE8F05,stroke:#000,color:#fff
    style N2 fill:#DE8F05,stroke:#000,color:#fff
    style N7 fill:#029E73,stroke:#000,color:#fff
    style N5 fill:#029E73,stroke:#000,color:#fff
    style N4 fill:#029E73,stroke:#000,color:#fff
    style N6 fill:#029E73,stroke:#000,color:#fff
  
#include <stdio.h>

#define MAX_SIZE 100

typedef struct {
    int data[MAX_SIZE];
    int size;
} MinHeap;

void heap_init(MinHeap *h) { h->size = 0; }

void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

void heap_push(MinHeap *h, int val) {
    // => insert val and sift up to restore heap property
    // => O(log n) time per push
    h->data[h->size] = val;
    int i = h->size++;
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->data[parent] > h->data[i]) {
            swap(&h->data[parent], &h->data[i]);
            i = parent;
        } else {
            break;
        }
    }
}

int heap_pop(MinHeap *h) {
    // => removes and returns the minimum element (precondition: heap is non-empty)
    // => moves last element to root then sifts down to restore heap property
    // => O(log n) time per pop
    int min = h->data[0];
    h->data[0] = h->data[--h->size];
    int i = 0;
    while (1) {
        int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
        if (left < h->size && h->data[left] < h->data[smallest]) smallest = left;
        if (right < h->size && h->data[right] < h->data[smallest]) smallest = right;
        if (smallest == i) break;
        swap(&h->data[i], &h->data[smallest]);
        i = smallest;
    }
    return min;
}

void heapify(int data[], int n) {
    // => convert an existing array to a heap in O(n) time
    for (int i = n / 2 - 1; i >= 0; i--) {
        int idx = i;
        while (1) {
            int left = 2 * idx + 1, right = 2 * idx + 2, smallest = idx;
            if (left < n && data[left] < data[smallest]) smallest = left;
            if (right < n && data[right] < data[smallest]) smallest = right;
            if (smallest == idx) break;
            int t = data[idx]; data[idx] = data[smallest]; data[smallest] = t;
            idx = smallest;
        }
    }
}

int main(void) {
    MinHeap heap;
    heap_init(&heap);

    int vals[] = {5, 3, 7, 1, 4, 2, 6};
    for (int i = 0; i < 7; i++) heap_push(&heap, vals[i]);

    printf("%d\n", heap.data[0]); // => Output: 1  (minimum always at index 0, O(1) peek)

    int sorted[7];
    for (int i = 0; i < 7; i++) sorted[i] = heap_pop(&heap);
    printf("[");
    for (int i = 0; i < 7; i++) { if (i) printf(", "); printf("%d", sorted[i]); }
    printf("]\n"); // => Output: [1, 2, 3, 4, 5, 6, 7]  (heap sort!)

    // heapify: convert an existing array to a heap in O(n) time
    int data[] = {9, 4, 7, 1, 8, 2, 6, 3, 5};
    heapify(data, 9);
    printf("%d\n", data[0]); // => Output: 1
    return 0;
}

Key Takeaway: A min-heap stores a complete binary tree in a flat array. Use push/pop for O(log n) priority queue operations and heapify for O(n) in-place heap construction; popping every element yields the values in sorted order (heap sort).

Why It Matters: Heaps power Dijkstra’s shortest-path algorithm, operating system process schedulers, and event-driven simulation engines; language runtimes use heaps for timer queues, such as the timer heap in Python’s asyncio event loop. When you need to repeatedly extract the minimum or maximum from a dynamically changing collection, a heap delivers O(log n) per operation where re-sorting on each access would cost O(n log n).


Example 36: Max-Heap and Priority Queues

A max-heap mirrors a min-heap with the comparisons flipped, keeping the largest element at the root. If only a min-heap is available, negating values on the way in (and again on the way out) simulates a max-heap. Pairing each item with a priority key gives priority queue behavior, with ties broken by a secondary key.

#include <stdio.h>

#define MAX_SIZE 100

typedef struct {
    int data[MAX_SIZE];
    int size;
} MaxHeap;

void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

void max_heap_push(MaxHeap *h, int val) {
    h->data[h->size] = val;
    int i = h->size++;
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->data[parent] < h->data[i]) {
            swap(&h->data[parent], &h->data[i]);
            // => parent smaller than child: swap to maintain max-heap
            i = parent;
        } else {
            break;
        }
    }
}

int max_heap_pop(MaxHeap *h) {
    int max = h->data[0];
    // => root holds the largest value
    h->data[0] = h->data[--h->size];
    int i = 0;
    while (1) {
        int left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < h->size && h->data[left] > h->data[largest]) largest = left;
        if (right < h->size && h->data[right] > h->data[largest]) largest = right;
        if (largest == i) break;
        swap(&h->data[i], &h->data[largest]);
        i = largest;
    }
    return max;
}

int main(void) {
    MaxHeap mh = {.size = 0};
    int vals[] = {5, 3, 7, 1, 4};
    for (int i = 0; i < 5; i++) max_heap_push(&mh, vals[i]);

    printf("%d\n", max_heap_pop(&mh)); // => Output: 7  (largest value)
    printf("%d\n", max_heap_pop(&mh)); // => Output: 5
    printf("%d\n", max_heap_pop(&mh)); // => Output: 4

    // Priority queue idea: store (priority, task) pairs in a min-heap keyed
    // on priority, so the lowest number pops first. The pop order would be:
    printf("[P1] deploy fix\n");   // => priority 1 pops first
    printf("[P2] review PR\n");    // => priority 2 next
    printf("[P3] send report\n");  // => priority 3 last
    return 0;
}

Key Takeaway: Negating values turns a min-heap into a max-heap with no new code; flipping the comparisons (as above) builds one directly. For priority queue semantics, key the heap on a (priority, tie-breaker) pair so that equal priorities resolve deterministically.

Why It Matters: Priority queues are the core data structure in scheduling systems: Kubernetes schedules pods by priority class, operating systems schedule processes with priority levels, and network routers implement quality-of-service using priority queues. The compound-key pattern extends cleanly to multi-level priorities (urgency, then arrival time, then task id) and to structs carrying custom priority fields, making a small heap a flexible foundation without external libraries.


Example 37: Heapify and the Heap Property

Heapify converts an unordered array into a valid heap in O(n) time by applying a bottom-up sift-down pass. This is more efficient than pushing n elements one at a time, which costs O(n log n).

#include <stdio.h>

void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

int verify_heap_property(int h[], int n) {
    // => checks that every parent <= both children (min-heap invariant)
    for (int i = 0; i < n; i++) {
        int left = 2 * i + 1;
        // => left child index in 0-based array representation
        int right = 2 * i + 2;
        // => right child index
        if (left < n && h[i] > h[left]) return 0;
        // => parent greater than left child: heap property violated
        if (right < n && h[i] > h[right]) return 0;
        // => parent greater than right child: heap property violated
    }
    return 1;
    // => all parent-child relationships satisfy min-heap invariant
}

void sift_down(int data[], int n, int idx) {
    while (1) {
        int left = 2 * idx + 1, right = 2 * idx + 2, smallest = idx;
        if (left < n && data[left] < data[smallest]) smallest = left;
        if (right < n && data[right] < data[smallest]) smallest = right;
        if (smallest == idx) break;
        swap(&data[idx], &data[smallest]);
        idx = smallest;
    }
}

void heapify(int data[], int n) {
    // => starts from the last internal node and sifts down each node
    for (int i = n / 2 - 1; i >= 0; i--) {
        sift_down(data, n, i);
    }
}

void print_array(int arr[], int n) {
    printf("[");
    for (int i = 0; i < n; i++) { if (i) printf(", "); printf("%d", arr[i]); }
    printf("]");
}

int main(void) {
    int data[] = {9, 4, 7, 1, 8, 2, 6, 3, 5};
    int n = 9;
    printf("Before heapify: ");
    print_array(data, n);
    printf("\n");
    // => Output: Before heapify: [9, 4, 7, 1, 8, 2, 6, 3, 5]  (unordered)

    heapify(data, n);
    printf("After heapify: ");
    print_array(data, n);
    printf("\n");
    // => Output: After heapify: [1, 3, 2, 4, 8, 7, 6, 9, 5]  (heap-ordered)

    printf("Min element: %d\n", data[0]);            // => Output: Min element: 1
    printf("Heap valid: %s\n", verify_heap_property(data, n) ? "True" : "False");
    // => Output: Heap valid: True
    return 0;
}

Key Takeaway: heapify produces a valid min-heap (parent ≤ children) but not a fully sorted array. The heap property only guarantees that the root is the global minimum, not that the rest of the array is ordered.

Why It Matters: The O(n) heapify bound matters in algorithms like heap sort and median-of-medians that need to convert an input array into a heap before processing. In streaming analytics, heapify is used to initialise a fixed-size priority buffer from historical data before processing new events. The distinction between heap order and sort order is also important for debugging: a list that passes heap validation is not necessarily sorted, which catches incorrect assumptions in code reviews.


Merge Sort

Example 38: Merge Sort Implementation

Merge sort divides the array in half recursively until each piece contains one element, then merges the pieces back in sorted order. This divide-and-conquer strategy guarantees O(n log n) time in all cases.

    graph TD
    A["[38,27,43,3]"] -->|split| B["[38,27]"]
    A -->|split| C["[43,3]"]
    B -->|split| D["[38]"]
    B -->|split| E["[27]"]
    C -->|split| F["[43]"]
    C -->|split| G["[3]"]
    D -->|merge| H["[27,38]"]
    E -->|merge| H
    F -->|merge| I["[3,43]"]
    G -->|merge| I
    H -->|merge| J["[3,27,38,43]"]
    I -->|merge| J

    style A fill:#0173B2,stroke:#000,color:#fff
    style B fill:#DE8F05,stroke:#000,color:#fff
    style C fill:#DE8F05,stroke:#000,color:#fff
    style D fill:#029E73,stroke:#000,color:#fff
    style E fill:#029E73,stroke:#000,color:#fff
    style F fill:#029E73,stroke:#000,color:#fff
    style G fill:#029E73,stroke:#000,color:#fff
    style H fill:#CA9161,stroke:#000,color:#fff
    style I fill:#CA9161,stroke:#000,color:#fff
    style J fill:#CC78BC,stroke:#000,color:#fff
  
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int *L = (int *)malloc(n1 * sizeof(int));
    int *R = (int *)malloc(n2 * sizeof(int));
    // => allocate temporary arrays for left and right halves

    memcpy(L, arr + left, n1 * sizeof(int));
    memcpy(R, arr + mid + 1, n2 * sizeof(int));

    int i = 0, j = 0, k = left;
    // => i is pointer into L, j is pointer into R

    while (i < n1 && j < n2) {
        // => advance whichever pointer holds the smaller element
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
            // => left element is smaller or equal: take it
        } else {
            arr[k++] = R[j++];
            // => right element is smaller: take it
        }
    }
    while (i < n1) arr[k++] = L[i++];
    // => append any remaining elements from left
    while (j < n2) arr[k++] = R[j++];
    // => append any remaining elements from right

    free(L);
    free(R);
}

void merge_sort(int arr[], int left, int right) {
    if (left >= right) return;
    // => base case: single element or empty range is already sorted

    int mid = left + (right - left) / 2;
    // => find midpoint to split array into two halves

    merge_sort(arr, left, mid);
    // => recursively sort left half
    merge_sort(arr, mid + 1, right);
    // => recursively sort right half
    merge(arr, left, mid, right);
    // => combine two sorted halves into one sorted array
}

int main(void) {
    int arr[] = {38, 27, 43, 3, 9, 82, 10};
    int n = 7;
    merge_sort(arr, 0, n - 1);
    printf("[");
    for (int i = 0; i < n; i++) { if (i) printf(", "); printf("%d", arr[i]); }
    printf("]\n"); // => Output: [3, 9, 10, 27, 38, 43, 82]
    return 0;
}

Key Takeaway: Merge sort always achieves O(n log n) time because it makes exactly O(log n) recursive splits and each merge level processes n elements in total. It uses O(n) auxiliary space for the temporary arrays created during merging.

Why It Matters: Merge sort is the algorithm behind Python’s sorted() and Java’s Arrays.sort() for objects (TimSort is a merge sort hybrid). Its O(n log n) worst-case guarantee — unlike quicksort’s O(n²) — makes it the standard choice for sorting linked lists and for external sorting of datasets that don’t fit in RAM, where data is read and merged in sequential passes from disk.


Quicksort

Example 39: Quicksort with Lomuto Partition

Quicksort selects a pivot, partitions elements smaller than the pivot to its left and larger to its right, then recursively sorts each partition. It achieves O(n log n) average time with O(log n) stack space.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

int partition(int arr[], int low, int high) {
    // => Lomuto partition scheme: pivot is arr[high]
    int rand_idx = low + rand() % (high - low + 1);
    swap(&arr[rand_idx], &arr[high]);
    // => swap random element to high position to avoid O(n^2) on sorted input

    int pivot = arr[high];
    // => pivot is now at arr[high]
    int i = low - 1;
    // => i tracks the boundary: elements left of i+1 are <= pivot

    for (int j = low; j < high; j++) {
        // => j scans from low to high-1
        if (arr[j] <= pivot) {
            i++;
            swap(&arr[i], &arr[j]);
            // => element <= pivot: move it to the left partition
        }
    }

    swap(&arr[i + 1], &arr[high]);
    // => move pivot from high to its correct sorted position i+1
    return i + 1;
    // => return pivot's final index
}

void quicksort(int arr[], int low, int high) {
    if (low < high) {
        // => base case: single element or empty range needs no sorting
        int pivot_idx = partition(arr, low, high);
        // => partition returns final sorted position of pivot
        quicksort(arr, low, pivot_idx - 1);
        // => recursively sort elements left of pivot
        quicksort(arr, pivot_idx + 1, high);
        // => recursively sort elements right of pivot
    }
}

int main(void) {
    srand((unsigned)time(NULL));
    int arr[] = {64, 34, 25, 12, 22, 11, 90};
    int n = 7;
    quicksort(arr, 0, n - 1);
    printf("[");
    for (int i = 0; i < n; i++) { if (i) printf(", "); printf("%d", arr[i]); }
    printf("]\n"); // => Output: [11, 12, 22, 25, 34, 64, 90]
    return 0;
}

Key Takeaway: Quicksort sorts in-place with O(log n) average stack depth, but worst-case O(n²) occurs when pivots are always the smallest or largest element. Randomising the pivot selection reduces the probability of hitting the worst case to negligible levels in practice.

Why It Matters: C++’s std::sort and NumPy’s default sort use introsort, a hybrid of quicksort and heapsort that falls back to heapsort when recursion depth exceeds O(log n). (Python’s list.sort() uses Timsort, a merge-sort hybrid.) Quicksort’s cache-friendly sequential memory access pattern makes it faster than merge sort in practice despite equal average complexity. Understanding quicksort’s pivot selection and partition scheme is essential for implementing efficient sorting on embedded systems and for low-latency order book matching engines where sort performance is on the critical path.


Counting Sort

Example 40: Counting Sort for Integer Keys

Counting sort achieves O(n + k) time, where k is the range of values. The simple form below counts occurrences of each value and writes the values out in ascending order — no comparisons required. (A stable variant computes cumulative positions to place each element at an exact output index; that is the form radix sort builds on.)

#include <stdio.h>
#include <stdlib.h>

void counting_sort(int arr[], int n, int output[]) {
    if (n == 0) return;
    // => empty input: return immediately

    int max_val = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] > max_val) max_val = arr[i];
    }
    // => find the range maximum; O(n) scan

    int *count = (int *)calloc(max_val + 1, sizeof(int));
    // => count[i] will store how many times value i appears in arr
    // => assumes non-negative values; a negative key would index out of bounds

    for (int i = 0; i < n; i++) {
        count[arr[i]]++;
        // => increment count for each value seen
    }

    int idx = 0;
    for (int val = 0; val <= max_val; val++) {
        for (int f = 0; f < count[val]; f++) {
            output[idx++] = val;
            // => append count[val] copies of val to output
        }
    }
    // => values are added in ascending order because we iterate from 0

    free(count);
}

int main(void) {
    int arr[] = {4, 2, 2, 8, 3, 3, 1};
    int n = 7;
    int output[7];
    counting_sort(arr, n, output);
    printf("[");
    for (int i = 0; i < n; i++) { if (i) printf(", "); printf("%d", output[i]); }
    printf("]\n"); // => Output: [1, 2, 2, 3, 3, 4, 8]
    return 0;
}

Key Takeaway: Counting sort runs in O(n + k) time and O(k) space where k is the value range. It is optimal when k = O(n), but becomes impractical when k is much larger than n (e.g., sorting 10 numbers from 0 to 10 billion).

Why It Matters: Counting sort and its extension radix sort power ultra-fast sorting in domains with bounded integer keys: sorting network packets by port number (0-65535), sorting grades (0-100), and bucket sorting IP addresses for routing table lookups. Database engines use counting sort variants for sorting small integer columns in analytics queries where the value range fits in L1 cache. In competitive programming, counting sort frequently enables solutions that would otherwise time-out with comparison-based O(n log n) algorithms.


Tree Balancing Concepts

Example 41: AVL Tree Height and Balance Factor

An AVL tree maintains a balance factor (height of left subtree minus height of right subtree) of -1, 0, or +1 at every node. When an insertion violates this invariant, a rotation restores balance. This keeps tree height at O(log n), preventing BST degeneration.

#include <stdio.h>
#include <stdlib.h>

typedef struct AVLNode {
    int val;
    struct AVLNode *left;
    struct AVLNode *right;
    int height;
    // => height of a leaf node is 1 (counts the node itself)
} AVLNode;

AVLNode *new_avl(int val) {
    AVLNode *n = (AVLNode *)malloc(sizeof(AVLNode));
    n->val = val;
    n->left = NULL;
    n->right = NULL;
    n->height = 1;
    return n;
}

int get_height(AVLNode *node) {
    return node ? node->height : 0;
    // => empty subtree has height 0
}

int get_balance(AVLNode *node) {
    return node ? get_height(node->left) - get_height(node->right) : 0;
    // => positive balance: left-heavy tree
    // => negative balance: right-heavy tree
}

int max_int(int a, int b) { return a > b ? a : b; }

AVLNode *right_rotate(AVLNode *z) {
    // => z is the unbalanced node (balance factor > 1, left-heavy)
    AVLNode *y = z->left;
    AVLNode *T3 = y->right;

    y->right = z;
    z->left = T3;

    z->height = 1 + max_int(get_height(z->left), get_height(z->right));
    y->height = 1 + max_int(get_height(y->left), get_height(y->right));
    return y;
}

AVLNode *left_rotate(AVLNode *z) {
    // => z is unbalanced (balance factor < -1, right-heavy)
    AVLNode *y = z->right;
    AVLNode *T2 = y->left;

    y->left = z;
    z->right = T2;

    z->height = 1 + max_int(get_height(z->left), get_height(z->right));
    y->height = 1 + max_int(get_height(y->left), get_height(y->right));
    return y;
}

AVLNode *avl_insert(AVLNode *root, int val) {
    if (!root) return new_avl(val);
    if (val < root->val) root->left = avl_insert(root->left, val);
    else if (val > root->val) root->right = avl_insert(root->right, val);
    else return root;

    root->height = 1 + max_int(get_height(root->left), get_height(root->right));
    int balance = get_balance(root);

    // => Case LL
    if (balance > 1 && val < root->left->val) return right_rotate(root);
    // => Case RR
    if (balance < -1 && val > root->right->val) return left_rotate(root);
    // => Case LR
    if (balance > 1 && val > root->left->val) {
        root->left = left_rotate(root->left);
        return right_rotate(root);
    }
    // => Case RL
    if (balance < -1 && val < root->right->val) {
        root->right = right_rotate(root->right);
        return left_rotate(root);
    }

    return root;
}

int main(void) {
    AVLNode *root = NULL;
    int vals[] = {10, 20, 30, 40, 50, 25};
    for (int i = 0; i < 6; i++) root = avl_insert(root, vals[i]);

    printf("Root: %d\n", root->val);            // => Output: Root: 30
    printf("Root height: %d\n", root->height);   // => Output: Root height: 3
    printf("Root balance: %d\n", get_balance(root)); // => Output: Root balance: 0
    return 0;
}

Key Takeaway: AVL trees maintain O(log n) height by rotating nodes when the balance factor exceeds ±1. There are four rotation cases (LL, RR, LR, RL) and each rotation is an O(1) pointer rearrangement.

Why It Matters: AVL trees and their cousin Red-Black trees guarantee O(log n) worst-case search, insert, and delete — unlike plain BSTs that degrade to O(n). Java’s TreeMap and TreeSet use Red-Black trees, Linux’s completely fair scheduler uses a Red-Black tree to track process runtimes, and PostgreSQL uses B-trees (a generalisation of balanced binary trees) for its primary indexes. Understanding balance factors and rotations is prerequisite knowledge for implementing custom ordered containers and for diagnosing slow queries caused by index imbalance.


BFS and DFS on Trees

Example 42: BFS on a Binary Tree (Level-Order Traversal)

Breadth-first search visits nodes level by level using a queue. On a binary tree this produces the level-order sequence; the same traversal finds shortest paths in unweighted graphs.

    graph TD
    R["1"]
    L["2"]
    RL["3"]
    LL["4"]
    LR["5"]

    R --> L
    R --> RL
    L --> LL
    L --> LR

    style R fill:#0173B2,stroke:#000,color:#fff
    style L fill:#DE8F05,stroke:#000,color:#fff
    style RL fill:#DE8F05,stroke:#000,color:#fff
    style LL fill:#029E73,stroke:#000,color:#fff
    style LR fill:#029E73,stroke:#000,color:#fff
  
#include <stdio.h>
#include <stdlib.h>

typedef struct TreeNode {
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

TreeNode *new_tree_node(int val) {
    TreeNode *n = (TreeNode *)malloc(sizeof(TreeNode));
    n->val = val;
    n->left = NULL;
    n->right = NULL;
    return n;
}

void bfs_level_order(TreeNode *root) {
    if (!root) return;

    TreeNode *queue[100];
    // => fixed-capacity array as queue: ample for this demo; size dynamically in production
    int front = 0, back = 0;
    queue[back++] = root;
    // => front index dequeues, back index enqueues

    printf("[");
    int first_level = 1;
    while (front < back) {
        int level_size = back - front;
        // => number of nodes at the current level
        if (!first_level) printf(", ");
        printf("[");
        first_level = 0;

        for (int i = 0; i < level_size; i++) {
            TreeNode *node = queue[front++];
            // => dequeue the next node from the front
            if (i > 0) printf(", ");
            printf("%d", node->val);

            if (node->left) queue[back++] = node->left;
            // => enqueue left child for next level processing
            if (node->right) queue[back++] = node->right;
            // => enqueue right child for next level processing
        }
        printf("]");
    }
    printf("]\n");
}

int main(void) {
    TreeNode *root = new_tree_node(1);
    root->left = new_tree_node(2);
    root->right = new_tree_node(3);
    root->left->left = new_tree_node(4);
    root->left->right = new_tree_node(5);

    bfs_level_order(root);
    // => Output: [[1], [2, 3], [4, 5]]
    return 0;
}

Key Takeaway: BFS on trees uses a FIFO queue — here a simple array with front and back indices. Processing exactly the number of nodes currently queued per outer-loop iteration cleanly separates each level without needing a sentinel marker.

Why It Matters: Level-order traversal powers features in real products: rendering DOM trees level by level for incremental page loading, breadth-first search in social graphs to find connections within N degrees, and pathfinding in grid-based games where each cell is a graph node. The queue implementation is an important production detail: in Python, list.pop(0) shifts all remaining elements in O(n) and turns BFS into O(n²), so collections.deque is used instead; the array-with-indices queue here gives the same O(1) dequeue.


Example 43: DFS on a Binary Tree (Iterative with Stack)

Depth-first search explores as far as possible along each branch before backtracking. An iterative implementation uses an explicit stack instead of recursion, sidestepping recursion-depth limits (Python’s default is about 1000 frames) and native stack overflow on very deep trees.

#include <stdio.h>
#include <stdlib.h>

typedef struct TreeNode {
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

TreeNode *new_node(int val) {
    TreeNode *n = (TreeNode *)malloc(sizeof(TreeNode));
    n->val = val; n->left = NULL; n->right = NULL;
    return n;
}

void dfs_iterative_preorder(TreeNode *root, int result[], int *size) {
    if (!root) return;

    TreeNode *stack[100];
    int top = 0;
    stack[top++] = root;
    // => stack holds nodes yet to be visited

    while (top > 0) {
        TreeNode *node = stack[--top];
        // => pop from top of stack: LIFO order gives DFS behavior
        result[(*size)++] = node->val;
        // => process node (preorder: root before children)

        if (node->right) stack[top++] = node->right;
        // => push right child first so left is processed first
        if (node->left) stack[top++] = node->left;
        // => left child is on top and will be processed next
    }
}

int main(void) {
    TreeNode *root = new_node(1);
    root->left = new_node(2);
    root->right = new_node(3);
    root->left->left = new_node(4);
    root->left->right = new_node(5);

    int result[5];
    int size = 0;
    dfs_iterative_preorder(root, result, &size);
    printf("[");
    for (int i = 0; i < size; i++) { if (i) printf(", "); printf("%d", result[i]); }
    printf("]\n"); // => Output: [1, 2, 4, 5, 3]
    return 0;
}

Key Takeaway: Iterative DFS with an explicit stack avoids recursion-depth limits and deep native call stacks. Push the right child before the left child so that the left subtree is processed first, preserving standard preorder left-before-right semantics.

Why It Matters: Iterative DFS is used in production compilers for abstract syntax tree analysis, in static analysis tools (linters, type checkers) that walk expression trees, and in file system scanners (find equivalent) that traverse directory trees. Path enumeration — a DFS variant that records every root-to-leaf path — is used in network routing to find all paths between two nodes and in game engines to enumerate decision trees for AI planning.


Recursion Patterns

Example 44: Divide and Conquer — Maximum Subarray

Divide and conquer splits a problem into independent subproblems, solves each recursively, then combines results. Finding the maximum subarray sum demonstrates this pattern: the answer is either entirely in the left half, entirely in the right half, or crosses the midpoint.

#include <stdio.h>
#include <limits.h>

int max_crossing_sum(int arr[], int left, int mid, int right) {
    // => find maximum sum of subarray that crosses the midpoint
    int left_sum = INT_MIN;
    int total = 0;
    for (int i = mid; i >= left; i--) {
        total += arr[i];
        if (total > left_sum) left_sum = total;
    }

    int right_sum = INT_MIN;
    total = 0;
    for (int i = mid + 1; i <= right; i++) {
        total += arr[i];
        if (total > right_sum) right_sum = total;
    }

    return left_sum + right_sum;
    // => crossing sum = best left extension + best right extension
}

int max_of3(int a, int b, int c) {
    int m = a > b ? a : b;
    return m > c ? m : c;
}

int max_subarray_dc(int arr[], int left, int right) {
    if (left == right) return arr[left];
    // => base case: single element is its own max subarray

    int mid = left + (right - left) / 2;
    // => overflow-safe midpoint, same idiom as binary search
    int left_max = max_subarray_dc(arr, left, mid);
    int right_max = max_subarray_dc(arr, mid + 1, right);
    int cross_max = max_crossing_sum(arr, left, mid, right);

    return max_of3(left_max, right_max, cross_max);
}

int main(void) {
    int arr[] = {-2, 1, -3, 4, -1, 2, 1, -5, 4};
    int n = 9;
    printf("%d\n", max_subarray_dc(arr, 0, n - 1));
    // => Output: 6  (subarray [4, -1, 2, 1] sums to 6)
    return 0;
}

Key Takeaway: Divide and conquer solves this in O(n log n) by splitting into subproblems and combining with a linear crossing-sum scan. Kadane’s algorithm solves the same problem in O(n) via dynamic programming — the divide-and-conquer version teaches the pattern even if it is not optimal here.

Why It Matters: Divide and conquer underpins merge sort, quicksort, the FFT (Fast Fourier Transform, used in audio processing and polynomial multiplication), and Strassen’s matrix multiplication. Identifying the base case, the split, and the combine step is the mental framework behind most O(n log n) algorithms. Many performance-critical calculations in signal processing, computational geometry, and machine learning decompose via divide and conquer.


Example 45: Backtracking — Generating Permutations

Backtracking builds candidates incrementally and abandons (“backtracks”) a candidate the moment it cannot lead to a valid solution. Generating all permutations is the canonical backtracking example.

    graph TD
    S["Start: []"]
    A["[1]"]
    B["[2]"]
    C["[3]"]
    AB["[1,2]"]
    AC["[1,3]"]
    BA["[2,1]"]
    BC["[2,3]"]
    CA["[3,1]"]
    CB["[3,2]"]
    ABC["[1,2,3]"]
    ACB["[1,3,2]"]
    BAC["[2,1,3]"]
    BCA["[2,3,1]"]
    CAB["[3,1,2]"]
    CBA["[3,2,1]"]

    S --> A
    S --> B
    S --> C
    A --> AB
    A --> AC
    B --> BA
    B --> BC
    C --> CA
    C --> CB
    AB --> ABC
    AC --> ACB
    BA --> BAC
    BC --> BCA
    CA --> CAB
    CB --> CBA

    style S fill:#0173B2,stroke:#000,color:#fff
    style A fill:#DE8F05,stroke:#000,color:#fff
    style B fill:#DE8F05,stroke:#000,color:#fff
    style C fill:#DE8F05,stroke:#000,color:#fff
    style ABC fill:#029E73,stroke:#000,color:#fff
    style ACB fill:#029E73,stroke:#000,color:#fff
    style BAC fill:#029E73,stroke:#000,color:#fff
    style BCA fill:#029E73,stroke:#000,color:#fff
    style CAB fill:#029E73,stroke:#000,color:#fff
    style CBA fill:#029E73,stroke:#000,color:#fff
  
#include <stdio.h>

#define MAX_N 10

int results[720][MAX_N]; // => max 6! = 720 permutations for n<=6
int result_count = 0;

void backtrack(int current[], int cur_size, int remaining[], int rem_size, int n) {
    if (rem_size == 0) {
        for (int i = 0; i < n; i++) results[result_count][i] = current[i];
        result_count++;
        // => base case: all elements placed, record this permutation
        return;
    }

    for (int i = 0; i < rem_size; i++) {
        // => try each remaining element as the next choice
        current[cur_size] = remaining[i];
        // => choose: extend current path

        int next_rem[MAX_N];
        int k = 0;
        for (int j = 0; j < rem_size; j++) {
            if (j != i) next_rem[k++] = remaining[j];
        }
        // => remaining without chosen element

        backtrack(current, cur_size + 1, next_rem, rem_size - 1, n);
        // => explore: recurse with updated state
        // => unchoose is implicit: current[cur_size] will be overwritten
    }
}

int main(void) {
    int nums[] = {1, 2, 3};
    int current[MAX_N];
    result_count = 0;
    backtrack(current, 0, nums, 3, 3);

    printf("[");
    for (int r = 0; r < result_count; r++) {
        if (r) printf(",");
        printf("[%d,%d,%d]", results[r][0], results[r][1], results[r][2]);
    }
    printf("]\n");
    // => Output: [[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
    return 0;
}

Key Takeaway: The backtracking pattern is: choose an option, explore with it (recurse), then unchoose to restore state before trying the next option. In this C version the undo is implicit — current[cur_size] is simply overwritten on the next loop iteration — whereas implementations that append to a dynamic list need an explicit pop after the recursive call as the “undo” step.

Why It Matters: Backtracking solves constraint satisfaction problems that appear across domains: Sudoku solvers, N-queens placement, regex matching engines, and SAT solvers all use this pattern. The Python standard library’s itertools.permutations uses an equivalent algorithm. In compiler design, backtracking is used for parsing ambiguous grammars. The key insight — “try, recurse, undo” — transfers directly to solving problems that would require exponential space to enumerate iteratively.


Two-Pointer Technique

Example 46: Two Pointers — Two Sum in Sorted Array

The two-pointer technique uses two indices moving toward each other (or in the same direction) to avoid a nested O(n²) loop. For a sorted array, pointers from both ends can find a pair summing to a target in O(n) time.

#include <stdio.h>

int two_sum_sorted(int arr[], int size, int target, int *a, int *b) {
    // => arr must be sorted in ascending order
    int left = 0, right = size - 1;
    // => left starts at smallest, right starts at largest

    while (left < right) {
        int sum = arr[left] + arr[right];
        if (sum == target) {
            *a = arr[left]; *b = arr[right];
            return 1;
            // => found pair: return the actual values
        } else if (sum < target) {
            left++;
            // => sum too small: moving left rightward increases sum
        } else {
            right--;
            // => sum too large: moving right leftward decreases sum
        }
    }
    return 0;
    // => no pair found
}

int remove_duplicates_sorted(int arr[], int size) {
    if (size == 0) return 0;
    int slow = 0;
    // => slow pointer marks the last position of the unique-values prefix

    for (int fast = 1; fast < size; fast++) {
        if (arr[fast] != arr[slow]) {
            slow++;
            arr[slow] = arr[fast];
            // => extend unique prefix by one
        }
    }
    return slow + 1;
}

int main(void) {
    int arr[] = {1, 2, 3, 4, 6};
    int a, b;
    if (two_sum_sorted(arr, 5, 6, &a, &b)) printf("(%d, %d)\n", a, b);
    // => Output: (2, 4)
    if (two_sum_sorted(arr, 5, 9, &a, &b)) printf("(%d, %d)\n", a, b);
    // => Output: (3, 6)
    if (!two_sum_sorted(arr, 5, 12, &a, &b)) printf("None\n");
    // => Output: None  (no pair sums to 12)

    int nums[] = {0, 0, 1, 1, 1, 2, 2, 3, 3, 4};
    int len = remove_duplicates_sorted(nums, 10);
    printf("[");
    for (int i = 0; i < len; i++) { if (i) printf(", "); printf("%d", nums[i]); }
    printf("]\n"); // => Output: [0, 1, 2, 3, 4]
    return 0;
}

Key Takeaway: The two-pointer technique on a sorted array reduces O(n²) search to O(n) by exploiting the ordered structure — when the sum is too small, advance the left pointer; when too large, retreat the right pointer. The slow/fast variant handles deduplication without extra space.

Why It Matters: Two-pointer patterns appear in three-sum problems, container with most water, palindrome checking, and merging two sorted arrays. LeetCode’s top interview questions list includes at least five two-pointer problems (Dutch National Flag, trapping rain water, container with most water). In database query execution, sort-merge join uses the same two-pointer principle to join two sorted result sets in O(n + m) without a hash table.


Example 47: Two Pointers — Container With Most Water

Given heights of vertical lines, find two lines that together with the x-axis form a container holding the most water. Two pointers converge from the outside inward, always moving the shorter line.

#include <stdio.h>

int max_water(int height[], int size) {
    int left = 0, right = size - 1;
    int max_area = 0;

    while (left < right) {
        int width = right - left;
        int h = height[left] < height[right] ? height[left] : height[right];
        int area = h * width;
        // => area is limited by the shorter of the two lines

        if (area > max_area) max_area = area;

        if (height[left] < height[right]) {
            left++;
            // => left line is shorter: moving left inward might find a taller line
        } else {
            right--;
            // => right line is shorter or equal: move right inward
        }
    }
    return max_area;
}

int main(void) {
    int heights[] = {1, 8, 6, 2, 5, 4, 8, 3, 7};
    printf("%d\n", max_water(heights, 9));
    // => Output: 49
    // => best pair: heights[1]=8 and heights[8]=7, width=7
    // => area = min(8,7) * 7 = 7 * 7 = 49
    return 0;
}

Key Takeaway: Always move the pointer at the shorter line inward — moving the taller line inward can never increase area (width decreases and height is already limited by the short side), so the greedy choice is to move the shorter pointer seeking a potentially taller replacement.

Why It Matters: This greedy two-pointer strategy reduces an O(n²) brute-force search to O(n) with a proof-by-contradiction argument: moving the taller side cannot improve the result, so we never miss the optimal pair. The same reasoning applies to problems like “minimize maximum difference” and “maximize rectangle in histogram” variants. Recognizing when a sorted or monotonic structure allows pointer convergence rather than nested iteration is a key technique for reducing time complexity in interval and geometric algorithms.


Sliding Window

Example 48: Fixed-Size Sliding Window — Maximum Sum Subarray

The sliding window technique maintains a contiguous subarray of fixed or variable size by advancing both ends of the window together. For fixed-size windows, one element leaves and one enters per step, avoiding O(n·k) recomputation.

    graph LR
    A["[2, 1, 5, 1, 3, 2]"]
    W1["Window 1: [2,1,5] sum=8"]
    W2["Window 2: [1,5,1] sum=7"]
    W3["Window 3: [5,1,3] sum=9"]
    W4["Window 4: [1,3,2] sum=6"]

    A --> W1
    W1 -->|slide right| W2
    W2 -->|slide right| W3
    W3 -->|slide right| W4

    style A fill:#0173B2,stroke:#000,color:#fff
    style W1 fill:#DE8F05,stroke:#000,color:#fff
    style W2 fill:#DE8F05,stroke:#000,color:#fff
    style W3 fill:#029E73,stroke:#000,color:#fff
    style W4 fill:#CA9161,stroke:#000,color:#fff
  
#include <stdio.h>

int max_sum_subarray_k(int arr[], int n, int k) {
    // => find maximum sum of any contiguous subarray of exactly k elements
    if (n < k) return -1;
    // => sentinel for invalid input (no window of size k exists)

    int window_sum = 0;
    for (int i = 0; i < k; i++) window_sum += arr[i];
    // => compute sum of first window [0 .. k-1]
    int max_sum = window_sum;

    for (int i = k; i < n; i++) {
        // => slide window one position to the right
        window_sum += arr[i];
        // => add incoming element (right edge of new window)
        window_sum -= arr[i - k];
        // => remove outgoing element (left edge of old window)

        if (window_sum > max_sum) max_sum = window_sum;
    }

    return max_sum;
}

int main(void) {
    int arr[] = {2, 1, 5, 1, 3, 2};
    printf("%d\n", max_sum_subarray_k(arr, 6, 3)); // => Output: 9  (window [5,1,3])
    printf("%d\n", max_sum_subarray_k(arr, 6, 2)); // => Output: 6  (window [1,5])
    return 0;
}

Key Takeaway: The fixed sliding window adds the incoming element and subtracts the outgoing element each step, maintaining an O(1) update cost. The total time across all n-k+1 windows is O(n) regardless of window size k.

Why It Matters: Fixed sliding windows power real-time analytics: computing moving averages for stock price smoothing, calculating rolling error rates in observability dashboards, and implementing network rate limiting that counts requests in a fixed time window. Any metric that requires “sum/average/max over the last N events” can be computed with O(1) updates per new event using this pattern, enabling streaming calculations over unbounded data streams without storing all history.


Example 49: Variable-Size Sliding Window — Longest Substring Without Repeating Characters

A variable-size sliding window expands when conditions are met and contracts when they are violated. The two-pointer approach with a set finds the longest contiguous substring without duplicate characters in O(n) time.

#include <stdio.h>
#include <string.h>

int length_of_longest_unique_substring(const char *s) {
    int char_in_window[256] = {0};
    // => tracks characters currently in the window (ASCII range)
    int left = 0;
    int max_length = 0;
    int n = (int)strlen(s);

    for (int right = 0; right < n; right++) {
        // => right pointer expands window one character at a time
        while (char_in_window[(unsigned char)s[right]]) {
            // => current character already in window: shrink from left
            char_in_window[(unsigned char)s[left]] = 0;
            left++;
        }

        char_in_window[(unsigned char)s[right]] = 1;
        // => add current character to window
        int window_length = right - left + 1;
        if (window_length > max_length) max_length = window_length;
    }

    return max_length;
}

int main(void) {
    printf("%d\n", length_of_longest_unique_substring("abcabcbb")); // => Output: 3  ("abc")
    printf("%d\n", length_of_longest_unique_substring("bbbbb"));    // => Output: 1  ("b")
    printf("%d\n", length_of_longest_unique_substring("pwwkew"));   // => Output: 3  ("wke")
    printf("%d\n", length_of_longest_unique_substring(""));         // => Output: 0  (empty string)
    return 0;
}

Key Takeaway: The variable window shrinks from the left whenever the right pointer encounters a character already in the window. Because each character enters and leaves the membership table (char_in_window) at most once, the total work is O(n) amortised.

Why It Matters: Variable sliding windows solve many substring/subarray optimisation problems: minimum window substring (find smallest window containing all characters of a target), longest subarray with sum ≤ k, and maximum number of consecutive 1s with k flips allowed. These patterns appear in text editors (efficient search-and-highlight), network protocol parsers (framing variable-length messages), and genomics (finding repeating motifs in DNA sequences). The amortised O(n) proof — each pointer moves at most n steps — is a recurring argument in algorithm analysis.


Prefix Sums

Example 50: Prefix Sum Array for Range Queries

A prefix sum array stores cumulative totals so that any range sum sum(arr[l..r]) can be answered in O(1) after an O(n) preprocessing step. This trades O(n) preprocessing for O(1) query time, benefiting workloads with many range queries.

    graph LR
    A["arr: [2,4,3,7,1,5]"]
    P["prefix: [0,2,6,9,16,17,22]"]
    Q["Query sum#40;2,4#41;: prefix#91;5#93;-prefix#91;2#93;=17-6=11"]

    A -->|build| P
    P -->|O#40;1#41; lookup| Q

    style A fill:#0173B2,stroke:#000,color:#fff
    style P fill:#DE8F05,stroke:#000,color:#fff
    style Q fill:#029E73,stroke:#000,color:#fff
  
#include <stdio.h>

void build_prefix_sum(int arr[], int n, int prefix[]) {
    prefix[0] = 0;
    // => prefix[0] = 0 (sum of zero elements)
    for (int i = 0; i < n; i++) {
        prefix[i + 1] = prefix[i] + arr[i];
        // => each prefix entry adds one more element to the running sum
    }
}

int range_sum(int prefix[], int left, int right) {
    // => sum of arr[left..right] inclusive
    return prefix[right + 1] - prefix[left];
    // => O(1) per query
}

int count_subarrays_with_sum_k(int arr[], int n, int k) {
    // => simple O(n^2) version scanning all (i, j) pairs;
    //    a hash-map complement trick brings this to O(n)
    int count = 0;
    int prefix[n + 1];
    prefix[0] = 0;
    for (int i = 0; i < n; i++) prefix[i + 1] = prefix[i] + arr[i];

    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            if (prefix[j + 1] - prefix[i] == k) count++;
        }
    }
    return count;
}

int main(void) {
    int arr[] = {2, 4, 3, 7, 1, 5};
    int n = 6;
    int prefix[7];
    build_prefix_sum(arr, n, prefix);

    printf("[");
    for (int i = 0; i <= n; i++) { if (i) printf(", "); printf("%d", prefix[i]); }
    printf("]\n"); // => Output: [0, 2, 6, 9, 16, 17, 22]

    printf("%d\n", range_sum(prefix, 0, 2)); // => Output: 9   (2+4+3)
    printf("%d\n", range_sum(prefix, 1, 4)); // => Output: 15  (4+3+7+1)
    printf("%d\n", range_sum(prefix, 2, 5)); // => Output: 16  (3+7+1+5)

    int arr2[] = {1, 1, 1};
    printf("%d\n", count_subarrays_with_sum_k(arr2, 3, 2));
    // => Output: 2  ([1,1] at positions 0-1 and 1-2)
    return 0;
}

Key Takeaway: The prefix sum pattern reduces n range-sum queries from O(n²) to O(n) preprocessing plus O(1) per query. Pairing prefix sums with a hash map (prefix sum complement trick) solves subarray sum problems in O(n).

Why It Matters: Prefix sums underpin database aggregate functions over sliding windows, image processing (summed-area tables for fast blur filters), and financial analytics (cumulative return calculations). The prefix sum complement trick (freq[prefix_sum - k]) appears in LeetCode’s top 10 most-asked problems and is used in fraud detection systems to find suspicious transaction subsequences that sum to threshold amounts in a single O(n) pass.


Graph Representation

Example 51: Adjacency List Representation

An adjacency list maps each node to the list of its neighbours — a dictionary in Python, per-node arrays in the C version below. This is space-efficient for sparse graphs (few edges relative to nodes) and lets traversal touch only the edges that actually exist, O(degree) per node.

    graph LR
    A["A"] --> B["B"]
    A --> C["C"]
    B --> D["D"]
    C --> D
    D --> E["E"]

    style A fill:#0173B2,stroke:#000,color:#fff
    style B fill:#DE8F05,stroke:#000,color:#fff
    style C fill:#DE8F05,stroke:#000,color:#fff
    style D fill:#029E73,stroke:#000,color:#fff
    style E fill:#CC78BC,stroke:#000,color:#fff
  
#include <stdio.h>
#include <string.h>

#define MAX_NODES 26
#define MAX_EDGES 10

typedef struct {
    int adj[MAX_NODES][MAX_EDGES];
    int adj_count[MAX_NODES];
    int directed;
} Graph;

void graph_init(Graph *g, int directed) {
    memset(g->adj_count, 0, sizeof(g->adj_count));
    g->directed = directed;
}

void add_edge(Graph *g, int u, int v) {
    g->adj[u][g->adj_count[u]++] = v;
    if (!g->directed) {
        g->adj[v][g->adj_count[v]++] = u;
    }
}

void bfs(Graph *g, int start) {
    int visited[MAX_NODES] = {0};
    int queue[MAX_NODES], front = 0, back = 0;
    visited[start] = 1;
    queue[back++] = start;

    printf("BFS: [");
    int first = 1;
    while (front < back) {
        int node = queue[front++];
        if (!first) printf(", ");
        printf("%c", 'A' + node);
        first = 0;

        for (int i = 0; i < g->adj_count[node]; i++) {
            int nb = g->adj[node][i];
            if (!visited[nb]) {
                visited[nb] = 1;
                queue[back++] = nb;
            }
        }
    }
    printf("]\n");
}

void dfs_helper(Graph *g, int node, int visited[]) {
    visited[node] = 1;
    printf("%c", 'A' + node);

    for (int i = 0; i < g->adj_count[node]; i++) {
        int nb = g->adj[node][i];
        if (!visited[nb]) {
            printf(", ");
            dfs_helper(g, nb, visited);
        }
    }
}

void dfs(Graph *g, int start) {
    int visited[MAX_NODES] = {0};
    printf("DFS: [");
    dfs_helper(g, start, visited);
    printf("]\n");
}

int main(void) {
    Graph g;
    graph_init(&g, 1); // directed
    add_edge(&g, 0, 1); // A->B
    add_edge(&g, 0, 2); // A->C
    add_edge(&g, 1, 3); // B->D
    add_edge(&g, 2, 3); // C->D
    add_edge(&g, 3, 4); // D->E

    bfs(&g, 0); // => Output: BFS: [A, B, C, D, E]
    dfs(&g, 0); // => Output: DFS: [A, B, D, E, C]
    return 0;
}

Key Takeaway: The adjacency list is the standard graph representation for algorithms courses and production systems. BFS uses a queue and visits nodes level by level; DFS uses a stack (or recursion) and explores depth-first. Both run in O(V + E) time where V is vertices and E is edges.

Why It Matters: Adjacency lists represent social networks (Twitter follows), dependency graphs (package managers), and network topologies (router connections). Python’s NetworkX library uses adjacency list storage internally. The visited set is critical — without it, BFS and DFS would loop infinitely on graphs with cycles. O(V + E) traversal enables features like “find all connected users” and “detect import cycles in build systems” to run in time proportional to the data, not its square.


Example 52: Adjacency Matrix Representation

An adjacency matrix stores edges as a V×V boolean or weighted grid. It offers O(1) edge existence checks but uses O(V²) space, making it suitable for dense graphs where most pairs of nodes are connected.

#include <stdio.h>
#include <string.h>

void create_adjacency_matrix(int num_nodes, int edges[][2], int num_edges,
                              int directed, int matrix[][5]) {
    // => column width is fixed at 5 to match the demo below; a general
    // => version would take a C99 variably modified array parameter
    memset(matrix, 0, num_nodes * 5 * sizeof(int));
    for (int i = 0; i < num_edges; i++) {
        int u = edges[i][0], v = edges[i][1];
        matrix[u][v] = 1;
        if (!directed) matrix[v][u] = 1;
    }
}

int has_edge(int matrix[][5], int u, int v) {
    return matrix[u][v] == 1;
    // => O(1) check for edge existence
}

int main(void) {
    int edges[][2] = {{0,1},{0,2},{1,3},{2,3},{3,4}};
    int matrix[5][5];
    create_adjacency_matrix(5, edges, 5, 1, matrix);

    printf("Edge 0->1: %s\n", has_edge(matrix, 0, 1) ? "True" : "False");
    // => Output: Edge 0->1: True
    printf("Edge 1->0: %s\n", has_edge(matrix, 1, 0) ? "True" : "False");
    // => Output: Edge 1->0: False (directed)

    for (int i = 0; i < 5; i++) {
        printf("[");
        for (int j = 0; j < 5; j++) { if (j) printf(", "); printf("%d", matrix[i][j]); }
        printf("]\n");
    }
    // => Output:
    // => [0, 1, 1, 0, 0]
    // => [0, 0, 0, 1, 0]
    // => [0, 0, 0, 1, 0]
    // => [0, 0, 0, 0, 1]
    // => [0, 0, 0, 0, 0]
    return 0;
}

Key Takeaway: Adjacency matrices offer O(1) edge lookup but require O(V²) space and O(V) time to iterate a node’s neighbours. Prefer adjacency lists for sparse graphs; prefer adjacency matrices when V is small and the graph is dense (many edges).

Why It Matters: Adjacency matrices power algorithms that need repeated edge existence checks: Floyd-Warshall all-pairs shortest path runs with matrix multiplication semantics on a weight matrix. Graphics engines use small dense matrices for mesh connectivity where O(1) edge lookup enables real-time collision detection. Recommendation systems represent user-item interaction matrices (a bipartite adjacency matrix) for collaborative filtering. The choice between list and matrix representation directly affects the time and space complexity of the algorithms built on top.


Recursive Patterns Combined

Example 53: Fibonacci with Memoisation

Naive recursive Fibonacci has O(2^n) time due to recomputing the same subproblems. Memoisation caches results, reducing this to O(n) time by solving each subproblem exactly once.

#include <stdio.h>

long long fib_naive(int n) {
    if (n <= 1) return n;
    // => base cases: fib(0)=0, fib(1)=1
    return fib_naive(n - 1) + fib_naive(n - 2);
    // => two recursive calls cause exponential branching
}

long long memo[101] = {0};
int memo_set[101] = {0};

long long fib_memo(int n) {
    if (memo_set[n]) return memo[n];
    // => return cached result immediately
    if (n <= 1) return n;
    memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    // => compute once and store in memo
    memo_set[n] = 1;
    // => mark as cached only after the value is stored
    return memo[n];
}

long long fib_dp(int n) {
    if (n <= 1) return n;
    long long prev2 = 0, prev1 = 1;
    // => prev2 = fib(i-2), prev1 = fib(i-1)
    for (int i = 2; i <= n; i++) {
        long long curr = prev1 + prev2;
        // => fib(i) = fib(i-1) + fib(i-2)
        prev2 = prev1;
        prev1 = curr;
        // => advance window: discard oldest, keep two most recent
    }
    return prev1;
}

int main(void) {
    printf("%lld\n", fib_naive(10));  // => Output: 55
    printf("%lld\n", fib_memo(50));   // => Output: 12586269025
    printf("%lld\n", fib_dp(50));     // => Output: 12586269025
    return 0;
}

Key Takeaway: Memoisation converts exponential recursion to linear time by caching each unique subproblem result. Bottom-up DP is further optimised to O(1) space by storing only the two most recent values instead of the full cache.

Why It Matters: The transition from naive recursion to memoisation is the core insight of dynamic programming, a technique used in sequence alignment algorithms (bioinformatics), optimal routing (Bellman-Ford), compiler optimisation (shortest path through a DAG of instructions), and financial option pricing (binomial tree models). Python’s functools.lru_cache applies this same caching automatically to any pure function, making memoisation a practical production tool rather than just an interview concept.


Example 54: Recursion Pattern — Power Set (Subsets)

Generating all subsets of a set is a classic recursive pattern: for each element, include it or exclude it, leading to 2^n subsets. This is the foundation of combination enumeration and decision-tree algorithms.

#include <stdio.h>

#define MAX_N 10
#define MAX_SUBSETS 1024 // 2^10

int subsets[MAX_SUBSETS][MAX_N];
int subset_sizes[MAX_SUBSETS];
int subset_count = 0;

void backtrack(int nums[], int n, int start, int current[], int cur_size) {
    // => every partial state is a valid subset, so record immediately
    for (int i = 0; i < cur_size; i++) subsets[subset_count][i] = current[i];
    subset_sizes[subset_count] = cur_size;
    subset_count++;

    for (int i = start; i < n; i++) {
        current[cur_size] = nums[i];
        // => include nums[i] in current subset
        backtrack(nums, n, i + 1, current, cur_size + 1);
        // => recurse with remaining elements starting after i
        // => exclude nums[i]: backtrack to try next element (implicit pop)
    }
}

int main(void) {
    int nums[] = {1, 2, 3};
    int current[MAX_N];
    subset_count = 0;
    backtrack(nums, 3, 0, current, 0);

    printf("[");
    for (int s = 0; s < subset_count; s++) {
        if (s) printf(", ");
        printf("[");
        for (int i = 0; i < subset_sizes[s]; i++) {
            if (i) printf(", ");
            printf("%d", subsets[s][i]);
        }
        printf("]");
    }
    printf("]\n");
    // => Output: [[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
    printf("%d\n", subset_count); // => Output: 8  (2^3 = 8)
    return 0;
}

Key Takeaway: An iterative power-set method (e.g., looping over all 2^n bitmasks) is simple but materialises every subset unconditionally. The backtracking version builds subsets incrementally and can prune early when constraints are violated (e.g., only generate subsets with sum ≤ limit), making backtracking more practical for constrained enumeration.

Why It Matters: Power set enumeration appears in feature selection (try every subset of features), distributed system partition fault modeling (consider every subset of nodes failing), and cryptographic key combination analysis. The backtracking variant is the skeleton for combination-sum problems, Sudoku solvers, and branch-and-bound optimisation used in integer linear programming solvers. Understanding the 2^n combinatorial explosion motivates constraint-based pruning as the central efficiency technique in search problems.


Combining Techniques

Example 55: Sliding Window + Hash Map — Minimum Window Substring

Find the smallest window in string s that contains all characters of string t. This combines a variable sliding window with a character frequency hash map for O(n + m) time.

#include <stdio.h>
#include <string.h>
#include <limits.h>

void min_window_substring(const char *s, const char *t, char *result) {
    if (!*t || !*s) { result[0] = '\0'; return; }

    int need[128] = {0};
    int window[128] = {0};
    int required = 0;
    int formed = 0;

    for (int i = 0; t[i]; i++) {
        if (need[(unsigned char)t[i]] == 0) required++;
        need[(unsigned char)t[i]]++;
    }

    int best_len = INT_MAX, best_left = 0;
    int left = 0;
    int slen = (int)strlen(s);

    for (int right = 0; right < slen; right++) {
        char c = s[right];
        window[(unsigned char)c]++;

        if (need[(unsigned char)c] && window[(unsigned char)c] == need[(unsigned char)c]) {
            formed++;
        }

        while (formed == required && left <= right) {
            int wlen = right - left + 1;
            if (wlen < best_len) {
                best_len = wlen;
                best_left = left;
            }

            char lc = s[left];
            window[(unsigned char)lc]--;
            if (need[(unsigned char)lc] && window[(unsigned char)lc] < need[(unsigned char)lc]) {
                formed--;
            }
            left++;
        }
    }

    if (best_len == INT_MAX) {
        result[0] = '\0';
    } else {
        strncpy(result, s + best_left, best_len);
        result[best_len] = '\0';
    }
}

int main(void) {
    char result[256];
    min_window_substring("ADOBECODEBANC", "ABC", result);
    printf("%s\n", result); // => Output: BANC

    min_window_substring("a", "a", result);
    printf("%s\n", result); // => Output: a

    min_window_substring("a", "aa", result);
    printf("\"%s\"\n", result); // => Output: ""  (not enough a's)
    return 0;
}

Key Takeaway: The two-phase loop (expand right until valid, then shrink left while valid) ensures each character is processed at most twice (once entering, once leaving), giving O(n) total operations. The formed counter avoids scanning all characters in need on every step.

Why It Matters: Minimum window substring is one of the most complex sliding window problems and appears in NLP preprocessing (find the smallest sentence fragment containing a set of keywords), security log analysis (find the shortest log sequence containing all required audit events), and text-diff algorithms. The two-pointer shrink pattern is also used in the minimum size subarray sum problem and the fruit-into-baskets problem. Mastering this multi-condition sliding window is a significant step toward solving real-time stream analytics problems efficiently.


Example 56: Two Pointers + Sorting — Three Sum

Find all unique triplets in an array that sum to zero. Sorting and two pointers reduce the O(n³) brute force to O(n²) while the sorted structure enables duplicate skipping.

#include <stdio.h>
#include <stdlib.h>

int cmp(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
    // => avoids the integer overflow that plain x - y can cause for extreme values
}

void three_sum(int nums[], int n) {
    qsort(nums, n, sizeof(int), cmp);
    // => sort first: O(n log n)

    printf("[");
    int first = 1;
    for (int i = 0; i < n - 2; i++) {
        if (i > 0 && nums[i] == nums[i - 1]) continue;
        // => skip duplicate values for the first element

        int left = i + 1, right = n - 1;
        while (left < right) {
            int total = nums[i] + nums[left] + nums[right];
            if (total == 0) {
                if (!first) printf(", ");
                printf("[%d, %d, %d]", nums[i], nums[left], nums[right]);
                first = 0;

                while (left < right && nums[left] == nums[left + 1]) left++;
                while (left < right && nums[right] == nums[right - 1]) right--;
                left++;
                right--;
            } else if (total < 0) {
                left++;
            } else {
                right--;
            }
        }
    }
    printf("]\n");
}

int main(void) {
    int nums1[] = {-1, 0, 1, 2, -1, -4};
    three_sum(nums1, 6);
    // => Output: [[-1, -1, 2], [-1, 0, 1]]

    int nums2[] = {0, 0, 0, 0};
    three_sum(nums2, 4);
    // => Output: [[0, 0, 0]]
    return 0;
}

Key Takeaway: Sort first, then use an outer loop for the first element and two-pointer scan for the remaining pair. Duplicate skipping at each level (for i, left, and right) ensures uniqueness without a separate set to filter results.

Why It Matters: The three-sum pattern generalises to k-sum problems and is a fundamental building block for many problems: closest triplet to a target, 4-sum problems, and triangle inequality checks. In computational geometry, three-sum appears in convex hull algorithms and finding collinear points. The O(n²) solution enabled by sorting demonstrates how transforming input (sorting) can unlock more efficient algorithms than direct enumeration, a pattern that recurs throughout algorithm design.


Example 57: BFS + Graph — Shortest Path in Unweighted Graph

BFS on an unweighted graph finds the shortest path between two nodes because it explores nodes in order of increasing distance from the source. Each level of the BFS queue corresponds to one more edge traversed.

#include <stdio.h>
#include <string.h>

#define MAX_NODES 26
#define MAX_EDGES 10
#define MAX_PATH 100

typedef struct {
    int adj[MAX_NODES][MAX_EDGES];
    int adj_count[MAX_NODES];
} UGraph;

void ugraph_init(UGraph *g) { memset(g->adj_count, 0, sizeof(g->adj_count)); }

void ugraph_add_edge(UGraph *g, int u, int v) {
    g->adj[u][g->adj_count[u]++] = v;
    g->adj[v][g->adj_count[v]++] = u;
}

int shortest_path_bfs(UGraph *g, int start, int end, int path[], int *path_len) {
    if (start == end) { path[0] = start; *path_len = 1; return 1; }

    int visited[MAX_NODES] = {0};
    int parent[MAX_NODES];
    memset(parent, -1, sizeof(parent));
    int queue[MAX_NODES], front = 0, back = 0;
    visited[start] = 1;
    queue[back++] = start;

    while (front < back) {
        int node = queue[front++];
        for (int i = 0; i < g->adj_count[node]; i++) {
            int nb = g->adj[node][i];
            if (!visited[nb]) {
                visited[nb] = 1;
                parent[nb] = node;
                if (nb == end) {
                    // => reconstruct path
                    int temp[MAX_PATH], tlen = 0;
                    for (int c = end; c != -1; c = parent[c]) temp[tlen++] = c;
                    *path_len = tlen;
                    for (int j = 0; j < tlen; j++) path[j] = temp[tlen - 1 - j];
                    return 1;
                }
                queue[back++] = nb;
            }
        }
    }
    *path_len = 0;
    return 0;
}

int main(void) {
    UGraph g;
    ugraph_init(&g);
    // A=0,B=1,C=2,D=3,E=4,F=5
    ugraph_add_edge(&g, 0, 1); ugraph_add_edge(&g, 0, 2);
    ugraph_add_edge(&g, 1, 3); ugraph_add_edge(&g, 1, 4);
    ugraph_add_edge(&g, 2, 5); ugraph_add_edge(&g, 4, 5);

    int path[MAX_PATH], plen;
    char names[] = "ABCDEF";
    if (shortest_path_bfs(&g, 0, 5, path, &plen)) {
        printf("[");
        for (int i = 0; i < plen; i++) { if (i) printf(", "); printf("'%c'", names[path[i]]); }
        printf("]\n"); // => Output: ['A', 'C', 'F']
    }

    if (shortest_path_bfs(&g, 0, 4, path, &plen)) {
        printf("[");
        for (int i = 0; i < plen; i++) { if (i) printf(", "); printf("'%c'", names[path[i]]); }
        printf("]\n"); // => Output: ['A', 'B', 'E']
    }
    return 0;
}

Key Takeaway: BFS guarantees shortest-path correctness on unweighted graphs because it visits nodes in non-decreasing order of edge count from the source. The first time BFS reaches the destination, the current path is necessarily shortest.

Why It Matters: BFS shortest path is used in GPS navigation (hops in road networks), social network analysis (degrees of separation), web crawlers (finding shortest link path between pages), and puzzle solvers (15-puzzle, Rubik’s cube state-space search). For weighted graphs, Dijkstra’s algorithm extends BFS by replacing the FIFO queue with a priority queue (heap), combining the BFS exploration strategy with the heap from Example 35. Understanding BFS correctness is the prerequisite for understanding Dijkstra’s proof and the broader family of best-first search algorithms.


This section covered 29 examples (Examples 29-57) spanning binary search (O(log n) sorted array operations), hash table internals (chaining, frequency counting), binary search trees (insert, search, three traversal orders), heaps (min-heap, max-heap, heapify), efficient sorting algorithms (merge sort O(n log n), quicksort average O(n log n) worst O(n²), counting sort O(n+k)), AVL tree balance concepts, BFS and DFS on trees and graphs, divide-and-conquer and backtracking recursion patterns, two-pointer and sliding window techniques, prefix sums, and adjacency list/matrix graph representations.
