Advanced
This section covers expert-level algorithms and data structures used in production systems and competitive programming. Each example demonstrates a technique that solves problems standard approaches cannot handle efficiently. Topics span dynamic programming, shortest-path algorithms, prefix trees, range queries, disjoint sets, constraint solving, string matching, bitwise methods, and monotonic data structures. All examples include implementations in C, Go, Python, and Java.
Dynamic Programming
Example 58: Memoization — Top-Down Fibonacci
Memoization caches the result of each subproblem the first time it is computed so that repeated calls return in O(1) instead of recomputing exponentially. It transforms a naive O(2^n) recursive solution into O(n) time with O(n) extra space.
#include <stdio.h>
#include <string.h>
// => Use long long for large Fibonacci values; fib(50) exceeds 32-bit range
long long cache[101];
// => cache[i] stores fib(i); -1 means "not yet computed"
int cache_init = 0;
long long fib_memo(int n) {
if (!cache_init) {
memset(cache, -1, sizeof(cache));
// => Initialize all entries to -1 (uncomputed sentinel)
cache_init = 1;
}
if (cache[n] != -1) {
return cache[n]; // => O(1) lookup: already solved this subproblem
}
if (n <= 1) {
return cache[n] = n; // => Base cases: fib(0)=0, fib(1)=1
}
cache[n] = fib_memo(n - 1) + fib_memo(n - 2);
// => Store result before returning — this is the memoization step
// => Without this, fib(50) would require tens of billions of recursive calls (the call count grows like φ^n)
return cache[n]; // => Return cached result
}
int main(void) {
printf("%lld\n", fib_memo(10)); // => Output: 55
printf("%lld\n", fib_memo(50)); // => Output: 12586269025 (computed instantly due to cache)
// => fib(100) exceeds long long range; Python handles arbitrary precision natively
return 0;
}
Key Takeaway: Memoization eliminates redundant computation in recursive problems by caching results keyed on function arguments. Apply it whenever you see overlapping subproblems in a recursive call tree.
Why It Matters: Memoization is the first optimization technique to reach for when a brute-force recursion is correct but too slow. Production systems use it in route-planning, NLP parsing (CYK algorithm), game AI (minimax with alpha-beta pruning), and compiler optimization passes. The pattern extends beyond Fibonacci to any pure function called repeatedly with the same arguments, including database query result caching and HTTP response caching middleware.
Example 59: Tabulation — Bottom-Up Knapsack
Tabulation builds the solution table iteratively from the smallest subproblems up, avoiding recursion overhead and stack-depth limits. The 0/1 Knapsack problem asks: given items with weights and values, which subset fits in capacity W and maximizes value?
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Items + Capacity"] -->|fill table row by row| B["dp table n x W+1"]
B -->|read dp[n][W]| C["Maximum Value"]
B -->|backtrack through table| D["Selected Items"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
style D fill:#CC78BC,stroke:#000,color:#fff
#include <stdio.h>
// => Returns max of two integers
int max(int a, int b) { return a > b ? a : b; }
int knapsack_01(int weights[], int values[], int n, int capacity) {
// => weights[i] and values[i] describe item i
// => capacity is the maximum weight the knapsack can hold
int dp[n + 1][capacity + 1];
// => dp[i][w] = max value using first i items with weight limit w
// => Row i=0 and column w=0 are set to 0 in the loop (base cases): no items or no capacity -> zero value
for (int i = 0; i <= n; i++) {
for (int w = 0; w <= capacity; w++) {
if (i == 0 || w == 0) {
dp[i][w] = 0; // => Base case
} else {
// => Option 1: skip item i (don't include it)
dp[i][w] = dp[i - 1][w]; // => Inherit value from previous row
int wi = weights[i - 1]; // => Weight of current item (0-indexed)
int vi = values[i - 1]; // => Value of current item (0-indexed)
if (wi <= w) {
// => Option 2: include item i if it fits
int include = dp[i - 1][w - wi] + vi;
// => dp[i-1][w-wi]: best value with remaining capacity after taking item i
dp[i][w] = max(dp[i][w], include);
// => Take the better of skipping or including
}
}
}
}
return dp[n][capacity]; // => Maximum achievable value
}
int main(void) {
int weights[] = {2, 3, 4, 5};
int values[] = {3, 4, 5, 6};
int W = 5;
printf("%d\n", knapsack_01(weights, values, 4, W)); // => Output: 7
// => Best: item 0 (weight=2, value=3) + item 1 (weight=3, value=4) = total weight 5, value 7
return 0;
}
Key Takeaway: The 0/1 Knapsack DP table runs in O(n·W) time and space. Each cell represents the optimal solution for a subproblem defined by item count and remaining capacity.
Why It Matters: Knapsack variants appear across scheduling (allocating CPU budget to jobs), finance (portfolio optimization under capital constraints), logistics (packing shipments), and feature selection in machine learning. Understanding the DP table construction lets you adapt the pattern to unbounded knapsack, multi-dimensional constraints, and fractional relaxations used in branch-and-bound solvers.
Example 60: Longest Common Subsequence (LCS)
LCS finds the longest sequence of characters (not necessarily contiguous) present in both strings in order. It underpins diff utilities, DNA sequence alignment, and plagiarism detection.
#include <stdio.h>
#include <string.h>
void lcs(const char *s1, const char *s2) {
int m = strlen(s1), n = strlen(s2);
int dp[m + 1][n + 1];
// => dp[i][j] = LCS length of s1[:i] and s2[:j]
// => Extra row/col for empty-string base cases (all zeros)
memset(dp, 0, sizeof(dp));
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1[i - 1] == s2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
// => Characters match: extend LCS by 1
} else {
dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j] : dp[i][j - 1];
// => No match: take better of skipping a character from either string
}
}
}
// => Reconstruct the actual LCS string by backtracking through dp table
char lcs_str[m + n + 1];
int idx = 0;
int i = m, j = n; // => Start from bottom-right corner
while (i > 0 && j > 0) {
if (s1[i - 1] == s2[j - 1]) {
lcs_str[idx++] = s1[i - 1]; // => This character is in the LCS
i--;
j--;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
i--; // => Move up: came from dp[i-1][j]
} else {
j--; // => Move left: came from dp[i][j-1]
}
}
// => Reverse because we backtracked
for (int l = 0; l < idx / 2; l++) {
char tmp = lcs_str[l];
lcs_str[l] = lcs_str[idx - 1 - l];
lcs_str[idx - 1 - l] = tmp;
}
lcs_str[idx] = '\0';
printf("%d\n", dp[m][n]); // => Output: 4
printf("%s\n", lcs_str); // => Output: BDAB (one of several length-4 LCSs)
}
int main(void) {
lcs("ABCBDAB", "BDCABA");
return 0;
}
Key Takeaway: LCS runs in O(m·n) time and space. The DP recurrence has two cases: characters match (extend by 1) or they don’t (take the max of two neighbors in the table).
Why It Matters: Git’s diff command, patch utilities, and code review tools all rely on LCS or edit-distance variants to show what changed between file versions. Bioinformatics tools like BLAST use LCS to align DNA and protein sequences across genomes. Understanding LCS gives you the foundation for edit distance (Levenshtein), which powers spell checkers, fuzzy search, and OCR post-correction in production systems.
Example 61: Longest Increasing Subsequence (LIS)
LIS finds the length of the longest strictly increasing subsequence in an array. The O(n log n) patience-sorting approach improves on the naive O(n²) DP.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Input Array"] -->|process each element| B["Tails Array<br/>patience sort piles"]
B -->|binary search for position| C["Replace or Append"]
C -->|length of tails array| D["LIS Length"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#CC78BC,stroke:#000,color:#fff
style D fill:#029E73,stroke:#000,color:#fff
#include <stdio.h>
// => Binary search: find leftmost position where tails[pos] >= num
int bisect_left(int tails[], int size, int num) {
int lo = 0, hi = size;
while (lo < hi) {
int mid = (lo + hi) / 2;
if (tails[mid] < num) {
lo = mid + 1;
} else {
hi = mid;
}
}
return lo;
}
int lis_length(int nums[], int n) {
int tails[n];
// => tails[i] = smallest tail element of all increasing subsequences of length i+1
// => tails is always sorted (maintained as invariant)
int size = 0;
for (int i = 0; i < n; i++) {
int pos = bisect_left(tails, size, nums[i]);
// => Binary search: find leftmost position where num can be inserted
// => O(log n) per element instead of O(n) linear scan
if (pos == size) {
tails[size++] = nums[i]; // => num extends the longest subsequence found so far
} else {
tails[pos] = nums[i]; // => Replace: num is a better (smaller) tail for length pos+1
// => This greedy replacement keeps tails as small as possible
// => Smaller tails allow more elements to extend subsequences later
}
}
return size; // => Length of tails = LIS length
}
int main(void) {
int nums[] = {10, 9, 2, 5, 3, 7, 101, 18};
printf("%d\n", lis_length(nums, 8)); // => Output: 4
// => One LIS is [2, 3, 7, 18] or [2, 5, 7, 18] or [2, 3, 7, 101]
return 0;
}
Key Takeaway: The patience-sort approach achieves O(n log n) by maintaining a sorted tails array where binary search finds the correct position for each new element. The length of tails equals the LIS length.
Why It Matters: LIS underlies scheduling problems (scheduling non-overlapping tasks on a timeline), the Dilworth theorem (partitioning a poset into chains), and network packet reordering detection. The O(n log n) improvement over naive DP matters when processing large data streams or when called in a loop over many subarrays.
Example 62: Coin Change — Minimum Coins
Given coin denominations and a target amount, find the minimum number of coins needed to make that amount. This classic unbounded knapsack variant demonstrates bottom-up DP with reuse.
#include <stdio.h>
#include <limits.h>
int coin_change(int coins[], int num_coins, int amount) {
int dp[amount + 1];
// => dp[i] = minimum coins to make amount i
// => Initialize all to INT_MAX (impossible) except dp[0]
for (int i = 0; i <= amount; i++) dp[i] = INT_MAX;
dp[0] = 0; // => Base case: 0 coins needed for amount 0
for (int i = 1; i <= amount; i++) { // => Fill dp table from 1 to amount
for (int c = 0; c < num_coins; c++) {
if (coins[c] <= i && dp[i - coins[c]] != INT_MAX) {
// => coin fits: can we do better using this coin?
int candidate = dp[i - coins[c]] + 1;
if (candidate < dp[i]) dp[i] = candidate;
// => +1 for using this coin; dp[i-coin] subproblem already solved
}
}
}
return dp[amount] != INT_MAX ? dp[amount] : -1;
// => Return -1 if amount is unreachable with given coins
}
int main(void) {
int coins1[] = {1, 5, 6, 9};
printf("%d\n", coin_change(coins1, 4, 11)); // => Output: 2 (coins: 5+6)
int coins2[] = {2};
printf("%d\n", coin_change(coins2, 1, 3)); // => Output: -1 (3 unreachable with only even coins)
int coins3[] = {1, 2, 5};
printf("%d\n", coin_change(coins3, 3, 11)); // => Output: 3 (coins: 5+5+1)
return 0;
}
Key Takeaway: Coin change DP fills a 1D table where each cell represents the best solution using all previously computed results. The recurrence is dp[i] = min(dp[i - coin] + 1) for each valid coin.
Why It Matters: Coin change generalizes to any problem of combining elements to reach a target with minimum count or cost: cutting stock problems in manufacturing, transaction fee minimization in payment systems, and tile-placement optimization in game engines. The unbounded variant (coins reusable) models real inventory scenarios better than the 0/1 variant.
Graph Algorithms
Example 63: Dijkstra’s Shortest Path
Dijkstra’s algorithm finds the shortest path from a source vertex to all other vertices in a weighted graph with non-negative edge weights. Using a min-heap achieves O((V + E) log V) time.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Source Node<br/>dist=0"] -->|relax neighbors| B["Min-Heap<br/>priority queue"]
B -->|pop smallest dist| C["Current Node"]
C -->|update dist[v] if shorter| D["Neighbor v"]
D -->|push updated dist| B
C -->|all nodes settled| E["Shortest Distances"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#CC78BC,stroke:#000,color:#fff
style D fill:#CA9161,stroke:#000,color:#fff
style E fill:#029E73,stroke:#000,color:#fff
#include <stdio.h>
#include <limits.h>
#define MAXV 100
#define INF INT_MAX
// => Simple min-heap for (distance, vertex) pairs
typedef struct { int dist; int node; } HeapItem;
typedef struct {
HeapItem data[MAXV * MAXV];
int size;
} MinHeap;
void heap_push(MinHeap *h, int dist, int node) {
int i = h->size++;
h->data[i] = (HeapItem){dist, node};
while (i > 0) {
int p = (i - 1) / 2;
if (h->data[p].dist <= h->data[i].dist) break;
HeapItem tmp = h->data[p]; h->data[p] = h->data[i]; h->data[i] = tmp;
i = p;
}
}
HeapItem heap_pop(MinHeap *h) {
HeapItem top = h->data[0];
h->data[0] = h->data[--h->size];
int i = 0;
while (1) {
int l = 2*i+1, r = 2*i+2, m = i;
if (l < h->size && h->data[l].dist < h->data[m].dist) m = l;
if (r < h->size && h->data[r].dist < h->data[m].dist) m = r;
if (m == i) break;
HeapItem tmp = h->data[i]; h->data[i] = h->data[m]; h->data[m] = tmp;
i = m;
}
return top;
}
// => Adjacency list representation
typedef struct { int weight; int to; } Edge;
Edge adj[MAXV][MAXV];
int adj_size[MAXV];
void dijkstra(int n, int source, int dist[]) {
// => dist[v] = current best known distance from source to v
for (int i = 0; i < n; i++) dist[i] = INF;
dist[source] = 0; // => Distance from source to itself is 0
MinHeap heap = {.size = 0};
heap_push(&heap, 0, source); // => Min-heap: (distance, node)
while (heap.size > 0) {
HeapItem item = heap_pop(&heap); // => Extract node with minimum distance; O(log V)
int d = item.dist, u = item.node;
if (d > dist[u]) continue; // => Stale entry: a shorter path was already found
for (int i = 0; i < adj_size[u]; i++) {
int w = adj[u][i].weight, v = adj[u][i].to;
int new_dist = dist[u] + w; // => Candidate distance to v via u
if (new_dist < dist[v]) {
dist[v] = new_dist; // => Found shorter path to v
heap_push(&heap, new_dist, v);
}
}
}
}
int main(void) {
// => Map: A=0, B=1, C=2, D=3
int n = 4;
for (int i = 0; i < n; i++) adj_size[i] = 0;
adj[0][adj_size[0]++] = (Edge){1, 1}; // => A -> B weight 1
adj[0][adj_size[0]++] = (Edge){4, 2}; // => A -> C weight 4
adj[1][adj_size[1]++] = (Edge){2, 2}; // => B -> C weight 2
adj[1][adj_size[1]++] = (Edge){5, 3}; // => B -> D weight 5
adj[2][adj_size[2]++] = (Edge){1, 3}; // => C -> D weight 1
int dist[MAXV];
dijkstra(n, 0, dist);
printf("A=%d B=%d C=%d D=%d\n", dist[0], dist[1], dist[2], dist[3]);
// => Output: A=0 B=1 C=3 D=4
return 0;
}
Key Takeaway: Dijkstra requires non-negative weights. The min-heap ensures each node is settled at its true shortest distance the first time it’s popped. Stale heap entries are discarded via the "if (d > dist[u]) continue;" guard.
Why It Matters: Dijkstra powers GPS navigation (Google Maps, OpenStreetMap routing), network routing protocols (OSPF), and game pathfinding (A* is Dijkstra with a heuristic). At scale, production implementations use Fibonacci heaps for O(E + V log V) amortized complexity, or bidirectional Dijkstra to halve search space on road networks.
Example 64: Bellman-Ford — Shortest Path with Negative Weights
Bellman-Ford handles graphs with negative edge weights and detects negative-weight cycles. It relaxes all edges V-1 times in O(V·E) time.
#include <stdio.h>
#include <limits.h>
#include <string.h>
#define MAXV 100
#define INF INT_MAX
typedef struct { int u, v, w; } Edge;
// => Returns 1 on success, 0 if negative cycle detected
int bellman_ford(int nv, Edge edges[], int ne, int source, int dist[]) {
for (int i = 0; i < nv; i++) dist[i] = INF;
dist[source] = 0; // => Source distance is 0
// => Relax all edges V-1 times
for (int iter = 0; iter < nv - 1; iter++) {
int updated = 0;
for (int e = 0; e < ne; e++) {
int u = edges[e].u, v = edges[e].v, w = edges[e].w;
if (dist[u] != INF && dist[u] + w < dist[v]) {
dist[v] = dist[u] + w; // => Relax edge (u,v,w)
updated = 1;
}
}
if (!updated) break; // => Early exit: no changes, already optimal
}
// => Detect negative-weight cycles
for (int e = 0; e < ne; e++) {
int u = edges[e].u, v = edges[e].v, w = edges[e].w;
if (dist[u] != INF && dist[u] + w < dist[v]) {
return 0; // => Negative cycle detected
}
}
return 1;
}
int main(void) {
// => Map: A=0, B=1, C=2, D=3, E=4
Edge edges[] = {
{0,1,-1}, {0,2,4}, {1,2,3}, {1,3,2}, {1,4,2}, {3,1,1}, {3,2,5}, {4,3,-3}
};
int dist[MAXV];
bellman_ford(5, edges, 8, 0, dist);
printf("A=%d B=%d C=%d D=%d E=%d\n", dist[0], dist[1], dist[2], dist[3], dist[4]);
// => Output: A=0 B=-1 C=2 D=-2 E=1
return 0;
}
Key Takeaway: Bellman-Ford handles negative weights but costs O(V·E) vs Dijkstra’s O((V+E) log V). Run the relaxation loop V-1 times, then check once more for negative cycles.
Why It Matters: Bellman-Ford is the foundation of BGP (Border Gateway Protocol), the routing protocol that holds the internet together. It handles arbitrary topologies including those with negative costs (modeled as subsidies or credits). Financial arbitrage detection — finding sequences of currency exchanges that produce profit — maps directly to negative-cycle detection in Bellman-Ford.
Example 65: Floyd-Warshall — All-Pairs Shortest Paths
Floyd-Warshall computes shortest distances between every pair of vertices in O(V³) time and O(V²) space using dynamic programming on intermediate vertices.
#include <stdio.h>
#include <limits.h>
#define MAXV 100
#define INF 999999999
void floyd_warshall(int n, int dist[MAXV][MAXV]) {
// => DP: for each intermediate vertex k, check if going through k improves i->j
for (int k = 0; k < n; k++) { // => k is the intermediate vertex
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
if (dist[i][k] != INF && dist[k][j] != INF &&
dist[i][k] + dist[k][j] < dist[i][j]) {
dist[i][j] = dist[i][k] + dist[k][j];
// => Path i->k->j is shorter than current best i->j
}
}
}
}
}
int main(void) {
int n = 4;
int dist[MAXV][MAXV];
// => Initialize all to INF
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
dist[i][j] = (i == j) ? 0 : INF;
// => edges: (u, v, weight)
int edges[][3] = {{0,1,3},{0,3,7},{1,0,8},{1,2,2},{2,0,5},{2,3,1},{3,0,2}};
for (int e = 0; e < 7; e++) {
int u = edges[e][0], v = edges[e][1], w = edges[e][2];
if (w < dist[u][v]) dist[u][v] = w; // => min handles parallel edges
}
floyd_warshall(n, dist);
for (int i = 0; i < n; i++) {
printf("[");
for (int j = 0; j < n; j++) {
printf("%d%s", dist[i][j], j < n-1 ? ", " : "");
}
printf("]\n");
}
// => Output: [0, 3, 5, 6] [5, 0, 2, 3] [3, 6, 0, 1] [2, 5, 7, 0]
return 0;
}
Key Takeaway: Floyd-Warshall’s triple loop processes one intermediate vertex k at a time. The order of loops (k outermost) ensures dist[i][k] and dist[k][j] are already optimal when computing dist[i][j].
Why It Matters: Floyd-Warshall is preferred over running Dijkstra from every vertex when the graph is dense (many edges relative to vertices) because the O(V³) constant factor is smaller. Applications include routing table generation for small networks, transitive closure computation in databases, and network flow pre-computation. It also detects negative cycles: if dist[i][i] < 0 after the algorithm, vertex i lies on a negative cycle.
Example 66: Topological Sort (Kahn’s Algorithm)
Topological sort orders vertices of a directed acyclic graph (DAG) so that for every edge (u, v), u appears before v. Kahn’s BFS-based algorithm runs in O(V + E).
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Compute In-Degrees"] -->|nodes with in-degree=0| B["Queue"]
B -->|process node| C["Add to Result"]
C -->|decrement neighbor in-degrees| D["Check Neighbor In-Degree"]
D -->|in-degree becomes 0| B
C -->|queue empty| E["Result or Cycle Detected"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
style D fill:#CA9161,stroke:#000,color:#fff
style E fill:#CC78BC,stroke:#000,color:#fff
#include <stdio.h>
#include <string.h>
#define MAXV 100
#define MAXE 1000
int adj[MAXV][MAXV];
int adj_size[MAXV];
int in_degree[MAXV];
int topological_sort(int n, int result[]) {
int queue[MAXV], front = 0, back = 0;
// => Start with all nodes that have no prerequisites (in-degree 0)
for (int i = 0; i < n; i++) {
if (in_degree[i] == 0) queue[back++] = i;
}
int idx = 0;
while (front < back) {
int u = queue[front++]; // => Process next node with no remaining prerequisites
result[idx++] = u; // => Add to topological order
for (int i = 0; i < adj_size[u]; i++) {
int v = adj[u][i];
in_degree[v]--; // => Remove edge u->v
if (in_degree[v] == 0) {
queue[back++] = v; // => v is now ready
}
}
}
return idx == n ? idx : -1; // => -1 if cycle detected
}
int main(void) {
int n = 6;
memset(adj_size, 0, sizeof(adj_size));
memset(in_degree, 0, sizeof(in_degree));
int edges[][2] = {{5,2},{5,0},{4,0},{4,1},{2,3},{3,1}};
for (int e = 0; e < 6; e++) {
int u = edges[e][0], v = edges[e][1];
adj[u][adj_size[u]++] = v;
in_degree[v]++;
}
int result[MAXV];
topological_sort(n, result);
for (int i = 0; i < n; i++) printf("%d ", result[i]);
printf("\n");
// => Output: 4 5 2 0 3 1 (one valid order)
return 0;
}
Key Takeaway: Kahn’s algorithm processes nodes in waves: each wave contains the nodes whose prerequisites are all satisfied. If the result contains fewer nodes than the graph has vertices, a cycle exists.
Why It Matters: Build systems (Make, Bazel, Gradle) use topological sort to determine compilation order. Package managers (npm, pip, cargo) use it to install dependencies in the right sequence. Spreadsheet engines execute cell formulas in topological order of their dependency graph. Any workflow engine where tasks depend on other tasks relies on this algorithm.
Example 67: Kruskal’s Minimum Spanning Tree
Kruskal’s algorithm builds a minimum spanning tree (MST) by greedily adding the cheapest edges that don’t form a cycle. It uses Union-Find to detect cycles in O(E log E) time.
#include <stdio.h>
#include <stdlib.h>
typedef struct { int w, u, v; } Edge;
int parent[100], rnk[100];
int find(int x) {
if (parent[x] != x) parent[x] = find(parent[x]); // => Path compression
return parent[x];
}
int unite(int x, int y) {
int px = find(x), py = find(y);
if (px == py) return 0; // => Same component: cycle
if (rnk[px] < rnk[py]) { int t = px; px = py; py = t; }
parent[py] = px; // => Union by rank
if (rnk[px] == rnk[py]) rnk[px]++;
return 1;
}
int cmp_edge(const void *a, const void *b) {
return ((Edge*)a)->w - ((Edge*)b)->w;
}
int kruskal(int n, Edge edges[], int ne) {
for (int i = 0; i < n; i++) { parent[i] = i; rnk[i] = 0; }
qsort(edges, ne, sizeof(Edge), cmp_edge); // => Sort edges by weight ascending; O(E log E)
int mst_weight = 0, count = 0;
for (int i = 0; i < ne && count < n - 1; i++) {
if (unite(edges[i].u, edges[i].v)) { // => Add edge if it doesn't create a cycle
mst_weight += edges[i].w;
count++;
}
}
return mst_weight;
}
int main(void) {
Edge edges[] = {
{4,0,1},{8,0,7},{11,1,7},{7,1,2},{4,7,8},{9,7,6},
{2,8,2},{6,8,6},{7,2,5},{14,2,3},{10,3,4},{9,3,5},{2,5,4},{6,6,5}
};
printf("%d\n", kruskal(9, edges, 14)); // => Output: 37
return 0;
}
Key Takeaway: Kruskal’s greedy strategy works because the cut property guarantees that the minimum-weight edge crossing any cut belongs to some MST. Union-Find with path compression and union by rank achieves near-O(1) amortized per operation.
Why It Matters: MST algorithms design physical networks — laying fiber-optic cable, building road networks, designing circuit boards — where you want to connect all nodes with minimum total wire length. Approximation algorithms for NP-hard problems (like the TSP 2-approximation) use MST as a building block. Cluster analysis in machine learning uses MST to find natural groupings without specifying the number of clusters.
Example 68: Prim’s Minimum Spanning Tree
Prim’s algorithm grows the MST from a starting vertex, always adding the cheapest edge connecting the current tree to a new vertex. With a min-heap it runs in O((V + E) log V).
#include <stdio.h>
#include <string.h>
#define MAXV 100
#define MAXE 200
typedef struct { int w, v, from; } HItem;
typedef struct { HItem d[MAXE*2]; int sz; } Heap;
void hpush(Heap *h, HItem it) {
int i = h->sz++;
h->d[i] = it;
while (i > 0) {
int p = (i-1)/2;
if (h->d[p].w <= h->d[i].w) break;
HItem t = h->d[p]; h->d[p] = h->d[i]; h->d[i] = t;
i = p;
}
}
HItem hpop(Heap *h) {
HItem top = h->d[0];
h->d[0] = h->d[--h->sz];
int i = 0;
while (1) {
int l=2*i+1, r=2*i+2, m=i;
if (l < h->sz && h->d[l].w < h->d[m].w) m = l;
if (r < h->sz && h->d[r].w < h->d[m].w) m = r;
if (m == i) break;
HItem t = h->d[i]; h->d[i] = h->d[m]; h->d[m] = t;
i = m;
}
return top;
}
typedef struct { int w, v; } AdjE;
AdjE adj[MAXV][MAXE];
int asz[MAXV];
int prim(int n) {
int visited[MAXV]; memset(visited, 0, sizeof(visited));
Heap heap = {.sz = 0};
hpush(&heap, (HItem){0, 0, -1}); // => Start at vertex 0
int total = 0;
while (heap.sz > 0) {
HItem it = hpop(&heap);
if (visited[it.v]) continue; // => Already in MST
visited[it.v] = 1;
total += it.w;
for (int i = 0; i < asz[it.v]; i++) {
if (!visited[adj[it.v][i].v]) {
hpush(&heap, (HItem){adj[it.v][i].w, adj[it.v][i].v, it.v});
}
}
}
return total;
}
int main(void) {
memset(asz, 0, sizeof(asz));
int raw[][3] = {
{4,0,1},{8,0,7},{11,1,7},{7,1,2},{4,7,8},{9,7,6},
{2,8,2},{6,8,6},{7,2,5},{14,2,3},{10,3,4},{9,3,5},{2,5,4},{6,6,5}
};
for (int i = 0; i < 14; i++) {
int w = raw[i][0], u = raw[i][1], v = raw[i][2];
adj[u][asz[u]++] = (AdjE){w, v}; // => Undirected: add both directions
adj[v][asz[v]++] = (AdjE){w, u};
}
printf("%d\n", prim(9)); // => Output: 37
return 0;
}
Key Takeaway: Prim’s and Kruskal’s always produce the same MST total weight (though edge sets may differ on ties). Prim’s is better for dense graphs (many edges), while Kruskal’s suits sparse graphs because sorting E edges is cheaper when E is small.
Why It Matters: Prim’s algorithm is the basis for network design tools in CAD software for VLSI circuit layout. When designing sensor networks or wireless mesh networks, Prim’s helps find the minimum-cost spanning connectivity. The lazy deletion technique (leaving stale entries in the heap) is a general pattern reused in Dijkstra and other priority-queue algorithms.
Advanced Data Structures
Example 69: Trie — Prefix Tree
A trie stores strings character-by-character, enabling O(m) insert, search, and prefix queries where m is the word length — far faster than storing strings in a hash set for prefix queries.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph TD
R["Root"] -->|a| A["a"]
R -->|b| B["b"]
A -->|p| AP["p"]
AP -->|p| APP["p*<br/>app"]
APP -->|l| APPL["l"]
APPL -->|e| APPLE["e*<br/>apple"]
B -->|a| BA["a"]
BA -->|t| BAT["t*<br/>bat"]
style R fill:#0173B2,stroke:#000,color:#fff
style APP fill:#029E73,stroke:#000,color:#fff
style APPLE fill:#029E73,stroke:#000,color:#fff
style BAT fill:#029E73,stroke:#000,color:#fff
style A fill:#DE8F05,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style AP fill:#CA9161,stroke:#000,color:#fff
style APPL fill:#CA9161,stroke:#000,color:#fff
style BA fill:#CA9161,stroke:#000,color:#fff
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ALPHABET 26
typedef struct TrieNode {
struct TrieNode *children[ALPHABET]; // => char -> TrieNode mapping (a=0..z=25)
int is_end; // => 1 if a word ends at this node
} TrieNode;
TrieNode *new_node(void) {
TrieNode *n = calloc(1, sizeof(TrieNode)); // => calloc zeros all fields
return n; // => Root node represents empty prefix
}
void trie_insert(TrieNode *root, const char *word) {
TrieNode *node = root;
for (int i = 0; word[i]; i++) {
int idx = word[i] - 'a';
if (!node->children[idx]) {
node->children[idx] = new_node(); // => Create node for new character
}
node = node->children[idx]; // => Traverse to child
}
node->is_end = 1; // => Mark end of word
}
int trie_search(TrieNode *root, const char *word) {
TrieNode *node = root;
for (int i = 0; word[i]; i++) {
int idx = word[i] - 'a';
if (!node->children[idx]) return 0; // => Character not found -> word absent
node = node->children[idx];
}
return node->is_end; // => True only if exact word was inserted
}
int trie_starts_with(TrieNode *root, const char *prefix) {
TrieNode *node = root;
for (int i = 0; prefix[i]; i++) {
int idx = prefix[i] - 'a';
if (!node->children[idx]) return 0; // => Prefix not in trie
node = node->children[idx];
}
return 1; // => Reached end of prefix -> prefix exists
}
int main(void) {
TrieNode *root = new_node();
const char *words[] = {"apple", "app", "bat", "ball"};
for (int i = 0; i < 4; i++) trie_insert(root, words[i]);
printf("%d\n", trie_search(root, "apple")); // => Output: 1 (True)
printf("%d\n", trie_search(root, "app")); // => Output: 1 (True)
printf("%d\n", trie_search(root, "ap")); // => Output: 0 (False)
printf("%d\n", trie_starts_with(root, "ap")); // => Output: 1 (True)
printf("%d\n", trie_starts_with(root, "xyz")); // => Output: 0 (False)
return 0;
}
Key Takeaway: A trie’s power lies in prefix queries: finding all words with a given prefix requires only traversing the prefix path (O(m)) then enumerating the subtree. This makes it dramatically faster than scanning a hash set for autocomplete.
Why It Matters: Search engines, IDE autocomplete, spell checkers, IP routing (longest-prefix matching), and DNS resolution all use tries or compressed-trie variants (PATRICIA tries, radix trees). Mobile keyboards that suggest next words use tries combined with frequency counts. Production tries compress single-child chains into single nodes (compact/radix trie) to reduce memory from O(alphabet × total_chars) to O(total_chars).
Example 70: Segment Tree — Range Sum and Point Update
A segment tree supports range queries (sum, min, max) and point updates in O(log n) time, compared to O(n) for a plain array. It stores aggregate values for each tree node covering a range.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph TD
R["[0..7] sum=36"] -->|left| L["[0..3] sum=10"]
R -->|right| RR["[4..7] sum=26"]
L -->|left| LL["[0..1] sum=3"]
L -->|right| LR["[2..3] sum=7"]
RR -->|left| RL["[4..5] sum=11"]
RR -->|right| RRR["[6..7] sum=15"]
style R fill:#0173B2,stroke:#000,color:#fff
style L fill:#DE8F05,stroke:#000,color:#fff
style RR fill:#DE8F05,stroke:#000,color:#fff
style LL fill:#029E73,stroke:#000,color:#fff
style LR fill:#029E73,stroke:#000,color:#fff
style RL fill:#029E73,stroke:#000,color:#fff
style RRR fill:#029E73,stroke:#000,color:#fff
#include <stdio.h>
#define MAXN 100
int tree[4 * MAXN]; // => Allocate 4n nodes (safe upper bound)
int n;
void build(int data[], int node, int start, int end) {
if (start == end) {
tree[node] = data[start]; // => Leaf node holds the array value
return;
}
int mid = (start + end) / 2;
build(data, 2 * node, start, mid); // => Build left child
build(data, 2 * node + 1, mid + 1, end); // => Build right child
tree[node] = tree[2 * node] + tree[2 * node + 1]; // => Internal node = sum of children
}
void update(int node, int start, int end, int idx, int val) {
if (start == end) {
tree[node] = val; // => Found the leaf: update it
return;
}
int mid = (start + end) / 2;
if (idx <= mid) update(2 * node, start, mid, idx, val);
else update(2 * node + 1, mid + 1, end, idx, val);
tree[node] = tree[2 * node] + tree[2 * node + 1]; // => Recompute sum
}
int query(int node, int start, int end, int l, int r) {
if (r < start || end < l) return 0; // => No overlap: return identity (0 for sum)
if (l <= start && end <= r) return tree[node]; // => Total overlap
int mid = (start + end) / 2;
return query(2 * node, start, mid, l, r) +
query(2 * node + 1, mid + 1, end, l, r); // => Partial overlap
}
int main(void) {
int data[] = {1, 2, 3, 4, 5, 6, 7, 8};
n = 8;
build(data, 1, 0, n - 1);
printf("%d\n", query(1, 0, 7, 1, 5)); // => Output: 20 (2+3+4+5+6)
update(1, 0, 7, 3, 10); // => Update index 3 from 4 to 10
printf("%d\n", query(1, 0, 7, 1, 5)); // => Output: 26 (2+3+10+5+6)
return 0;
}
Key Takeaway: Segment trees answer range queries and accept point updates in O(log n) each. The tree has O(n) nodes, with each internal node storing the aggregate of its range.
Why It Matters: Segment trees are fundamental in competitive programming and production systems that require dynamic range queries: financial systems computing running totals over time windows, time-series databases (InfluxDB, Prometheus) aggregating metrics, 2D graphics engines computing bounding boxes, and game engines implementing efficient collision queries on spatial ranges.
Example 71: Union-Find (Disjoint Set Union)
Union-Find maintains a collection of disjoint sets with near-O(1) amortized union and find operations using path compression and union by rank. It answers “are these two elements in the same set?” efficiently.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph TD
A["Initial: each node<br/>is its own root"] -->|union operations| B["Merged Components<br/>with rank-based trees"]
B -->|find with path compression| C["Flat Tree<br/>all point to root"]
C -->|O(α(n)) per operation| D["Near-Constant Time"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
style D fill:#CC78BC,stroke:#000,color:#fff
#include <stdio.h>
#define MAXN 100
int parent[MAXN], rnk[MAXN], components;
void uf_init(int n) {
for (int i = 0; i < n; i++) { parent[i] = i; rnk[i] = 0; }
components = n; // => Track number of disjoint sets
}
int uf_find(int x) {
if (parent[x] != x) parent[x] = uf_find(parent[x]);
// => Path compression: make x point directly to root
return parent[x];
}
int uf_union(int x, int y) {
int rx = uf_find(x), ry = uf_find(y);
if (rx == ry) return 0; // => Already in same set
if (rnk[rx] < rnk[ry]) { int t = rx; rx = ry; ry = t; }
parent[ry] = rx; // => Merge: attach smaller under larger
if (rnk[rx] == rnk[ry]) rnk[rx]++;
components--;
return 1;
}
int uf_connected(int x, int y) { return uf_find(x) == uf_find(y); }
int main(void) {
uf_init(6); // => 6 components: {0},{1},{2},{3},{4},{5}
uf_union(0, 1); // => {0,1},{2},{3},{4},{5}
uf_union(1, 2); // => {0,1,2},{3},{4},{5}
uf_union(3, 4); // => {0,1,2},{3,4},{5}
printf("%d\n", uf_connected(0, 2)); // => Output: 1 (True)
printf("%d\n", uf_connected(0, 3)); // => Output: 0 (False)
printf("%d\n", components); // => Output: 3
return 0;
}
Key Takeaway: Union-Find with path compression and union by rank achieves O(α(n)) amortized per operation, where α is the inverse Ackermann function — effectively constant for all practical input sizes (α(n) ≤ 4 for n < 10^600).
Why It Matters: Union-Find is the core data structure in Kruskal’s MST, connected-components labeling in image processing, network connectivity analysis, and online algorithms for dynamic graph connectivity. Percolation simulations (modeling fluid flow, epidemic spread) and social network friend-group detection both reduce to Union-Find operations on large datasets processed in real time.
Advanced Backtracking
Example 72: N-Queens Problem
Place N non-attacking queens on an N×N chessboard. Backtracking prunes branches as soon as a placement conflicts, making it far faster than brute force over all N^N possible placements.
#include <stdio.h>
#include <string.h>
#define MAXN 20
int cols[MAXN], diag1[2*MAXN], diag2[2*MAXN]; // => Track occupied columns and diagonals
int solutions[100][MAXN];
int sol_count;
void backtrack(int n, int row, int placement[]) {
if (row == n) {
memcpy(solutions[sol_count], placement, n * sizeof(int));
sol_count++; // => All n queens placed; record this solution
return;
}
for (int col = 0; col < n; col++) {
int d1 = row - col + n, d2 = row + col; // => Offset d1 to avoid negative index
if (cols[col] || diag1[d1] || diag2[d2]) continue; // => Prune: conflict
cols[col] = diag1[d1] = diag2[d2] = 1;
placement[row] = col;
backtrack(n, row + 1, placement);
cols[col] = diag1[d1] = diag2[d2] = 0; // => Undo placement (backtrack)
}
}
int main(void) {
int n = 4;
int placement[MAXN];
sol_count = 0;
memset(cols, 0, sizeof(cols));
memset(diag1, 0, sizeof(diag1));
memset(diag2, 0, sizeof(diag2));
backtrack(n, 0, placement);
printf("%d\n", sol_count); // => Output: 2
for (int s = 0; s < sol_count; s++) {
printf("[");
for (int i = 0; i < n; i++) printf("%d%s", solutions[s][i], i<n-1?", ":"");
printf("]\n");
}
// => Output: [1, 3, 0, 2] and [2, 0, 3, 1]
return 0;
}
Key Takeaway: N-Queens uses three sets to check queen conflicts in O(1) instead of scanning the board. The key insight is that diagonals have constant row - col and anti-diagonals have constant row + col.
Why It Matters: N-Queens is the canonical constraint-satisfaction problem. Techniques from its solution — constraint propagation, arc consistency, forward checking — underpin production SAT solvers (used in hardware verification), CSP solvers (used in scheduling and planning), and logic programming languages like Prolog. The chess problem is small-scale; the same algorithmic pattern solves protein folding, timetabling, and register allocation in compilers.
Example 73: Sudoku Solver
Sudoku solving by backtracking demonstrates how to combine constraint propagation (eliminate candidates) with backtracking (try remaining candidates) to solve combinatorial puzzles efficiently.
#include <stdio.h>
int board[9][9];
int is_valid(int row, int col, int num) {
for (int i = 0; i < 9; i++) {
if (board[row][i] == num) return 0; // => num already in this row
if (board[i][col] == num) return 0; // => num already in this column
}
int br = 3 * (row / 3), bc = 3 * (col / 3);
for (int r = br; r < br + 3; r++)
for (int c = bc; c < bc + 3; c++)
if (board[r][c] == num) return 0; // => num already in 3x3 box
return 1;
}
int backtrack(void) {
for (int row = 0; row < 9; row++) {
for (int col = 0; col < 9; col++) {
if (board[row][col] == 0) { // => Found an empty cell
for (int num = 1; num <= 9; num++) {
if (is_valid(row, col, num)) {
board[row][col] = num;
if (backtrack()) return 1; // => Solution found
board[row][col] = 0; // => Backtrack
}
}
return 0; // => No valid digit works here
}
}
}
return 1; // => All cells filled
}
int main(void) {
int puzzle[9][9] = {
{5,3,0,0,7,0,0,0,0}, {6,0,0,1,9,5,0,0,0},
{0,9,8,0,0,0,0,6,0}, {8,0,0,0,6,0,0,0,3},
{4,0,0,8,0,3,0,0,1}, {7,0,0,0,2,0,0,0,6},
{0,6,0,0,0,0,2,8,0}, {0,0,0,4,1,9,0,0,5},
{0,0,0,0,8,0,0,7,9}
};
for (int i = 0; i < 9; i++)
for (int j = 0; j < 9; j++) board[i][j] = puzzle[i][j];
backtrack();
for (int j = 0; j < 9; j++) printf("%d ", board[0][j]);
printf("\n"); // => Output: 5 3 4 6 7 8 9 1 2
for (int j = 0; j < 9; j++) printf("%d ", board[1][j]);
printf("\n"); // => Output: 6 7 2 1 9 5 3 4 8
return 0;
}
Key Takeaway: The solver tries each digit in each empty cell, validates placement against row/column/box constraints, and backtracks immediately when no digit works. This prunes the search space dramatically compared to trying all 9^81 possible boards.
Why It Matters: Sudoku solving generalizes to exact-cover problems, which Donald Knuth’s Algorithm X (implemented via Dancing Links) solves optimally. Exact-cover appears in tiling problems, packing problems, and combinatorial design. Industrial constraint solvers (IBM CPLEX, Google OR-Tools) use similar search-with-constraint-propagation strategies to solve scheduling, routing, and resource-allocation problems with thousands of variables.
String Algorithms
Example 74: KMP String Search
The Knuth-Morris-Pratt (KMP) algorithm finds all occurrences of a pattern in a text in O(n + m) time by precomputing a failure function that avoids redundant character comparisons.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Pattern"] -->|build failure function| B["LPS Array<br/>longest proper prefix-suffix"]
C["Text"] -->|slide with LPS on mismatch| D["KMP Search"]
B --> D
D -->|match found| E["Record position"]
D -->|shift by LPS[j-1]| D
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#CC78BC,stroke:#000,color:#fff
style D fill:#CA9161,stroke:#000,color:#fff
style E fill:#029E73,stroke:#000,color:#fff
#include <stdio.h>
#include <string.h>
void kmp_search(const char *text, const char *pattern) {
int n = strlen(text), m = strlen(pattern);
if (m == 0) return;
// => Build the LPS array
int lps[m];
lps[0] = 0;
int length = 0, i = 1;
while (i < m) {
if (pattern[i] == pattern[length]) {
lps[i++] = ++length; // => Found a prefix-suffix
} else if (length > 0) {
length = lps[length - 1]; // => Fall back using lps
} else {
lps[i++] = 0;
}
}
// => KMP search
i = 0; int j = 0;
while (i < n) {
if (text[i] == pattern[j]) { i++; j++; }
if (j == m) {
printf("%d ", i - j); // => Match found at index i-j
j = lps[j - 1];
} else if (i < n && text[i] != pattern[j]) {
if (j > 0) j = lps[j - 1]; // => Shift using LPS
else i++;
}
}
printf("\n");
}
int main(void) {
kmp_search("AABAACAADAABAABA", "AABA"); // => Output: 0 9 12
kmp_search("aababab", "abab"); // => Output: 1 3
return 0;
}
Key Takeaway: KMP’s failure function (LPS array) encodes the longest proper prefix of the pattern that matches a suffix, allowing the algorithm to resume matching from a non-zero position after a mismatch instead of restarting from the beginning.
Why It Matters: KMP achieves O(n + m) compared to the naive O(n·m) approach, which matters when searching terabytes of log data for error patterns, scanning genome sequences for gene markers, or implementing search in text editors. The LPS construction generalizes to the Z-function and Aho-Corasick automaton, which simultaneously searches for thousands of patterns in a single O(n + Σm) pass — essential in antivirus engines and intrusion detection systems.
Example 75: Rabin-Karp Rolling Hash
Rabin-Karp uses polynomial rolling hashes to find pattern occurrences in O(n + m) expected time. It is especially valuable for multi-pattern search and plagiarism detection.
#include <stdio.h>
#include <string.h>
#define BASE 256
#define MOD 1000000007LL
void rabin_karp(const char *text, const char *pattern) {
int n = strlen(text), m = strlen(pattern);
if (m > n) return;
// => Precompute BASE^(m-1) mod MOD
long long high_pow = 1; // => long long: products reach ~2.5e11, beyond 32-bit long
for (int i = 0; i < m - 1; i++) high_pow = (high_pow * BASE) % MOD;
long long pat_hash = 0, win_hash = 0;
for (int i = 0; i < m; i++) {
pat_hash = (pat_hash * BASE + pattern[i]) % MOD;
win_hash = (win_hash * BASE + text[i]) % MOD;
}
for (int i = 0; i <= n - m; i++) {
if (win_hash == pat_hash) {
if (strncmp(text + i, pattern, m) == 0) { // => Verify to guard against collisions
printf("%d ", i);
}
}
if (i < n - m) {
// => Roll hash: remove leading char, add trailing char; O(1) per step
win_hash = (win_hash - text[i] * high_pow % MOD + MOD) % MOD;
win_hash = (win_hash * BASE + text[i + m]) % MOD;
}
}
printf("\n");
}
int main(void) {
rabin_karp("GEEKS FOR GEEKS", "GEEKS"); // => Output: 0 10
rabin_karp("aaaa", "aa"); // => Output: 0 1 2
return 0;
}
Key Takeaway: Rolling hash updates in O(1) by subtracting the contribution of the outgoing character and adding the incoming one, giving O(n + m) expected time. Always verify with string comparison when hashes match to handle collisions.
Why It Matters: Rabin-Karp’s rolling-hash technique extends naturally to multi-pattern search (compute one hash, compare against a hash set of all patterns) and 2D pattern matching (hashing rows then columns). Plagiarism detection services like Turnitin hash document shingles (overlapping k-grams) and compare across a database — a direct application of Rabin-Karp’s rolling hash at massive scale.
Bit Manipulation
Example 76: Bit Manipulation Fundamentals
Bitwise operations manipulate individual bits directly, enabling O(1) solutions to problems that otherwise require O(n) loops. C works on fixed-width integer types; Python’s arbitrary-precision integers let the same bitmask techniques scale to masks of any width.
#include <stdio.h>
int popcount(unsigned int x) {
int count = 0;
while (x) {
x &= x - 1; // => Clear the lowest set bit
count++;
}
return count;
}
int main(void) {
unsigned int n = 0xB6; // => n = 182 = 0b10110110
// => Check if k-th bit is set (0-indexed from right)
int k = 4;
int is_set = (n & (1 << k)) != 0; // => 1 << 4 = 0b10000; AND extracts bit k
printf("%d\n", is_set); // => Output: 1 (True, bit 4 is 1)
// => Set the k-th bit
unsigned int n_set = n | (1 << k); // => OR sets bit k to 1
// => 0b10110110 | 0b00010000 = 0b10110110 (already set; unchanged)
// => Clear the k-th bit
unsigned int n_cleared = n & ~(1 << k); // => ~(1 << k) has all bits 1 except bit k
// => 0b10110110 & 0b11101111 = 0b10100110 = 166
// => Toggle the k-th bit
unsigned int n_toggled = n ^ (1 << k); // => XOR flips bit k
// => Count set bits (popcount / Hamming weight)
printf("%d\n", popcount(182)); // => Output: 5
// => Isolate lowest set bit (used in Fenwick trees)
int x = 12; // => x = 0b1100
int lsb = x & (-x); // => -x is two's complement
printf("%d\n", lsb); // => Output: 4 (0b0100)
(void)n_set; (void)n_cleared; (void)n_toggled;
return 0;
}
Key Takeaway: Bit manipulation replaces loops with constant-time arithmetic. The expression x & (x-1) clears the lowest set bit, and x & (-x) isolates it — two patterns that appear throughout low-level algorithms.
Why It Matters: Bit manipulation is pervasive in systems programming: operating systems use bitmasks for permission flags, network code manipulates IP headers at the bit level, graphics engines pack color channels into 32-bit integers, and database engines use bitmap indices for fast set intersection. Python’s arbitrary-precision integers make bitmask-based set operations feasible for representing sets of up to millions of elements without any additional library.
Example 77: XOR Tricks — Finding the Unique Element
XOR’s self-inverse property (a ^ a = 0, a ^ 0 = a) enables elegant solutions to problems about finding unique elements or pairs that would otherwise require hash maps or sorting.
#include <stdio.h>
int find_single(int nums[], int n) {
int result = 0;
for (int i = 0; i < n; i++) {
result ^= nums[i]; // => XOR cancels pairs: a^a=0; unique survives
}
return result;
}
void find_two_singles(int nums[], int n, int *a, int *b) {
int xor_all = 0;
for (int i = 0; i < n; i++) xor_all ^= nums[i]; // => xor_all = a ^ b
int diff_bit = xor_all & (-xor_all); // => Isolate lowest differing bit
*a = 0;
for (int i = 0; i < n; i++) {
if (nums[i] & diff_bit) *a ^= nums[i]; // => XOR group with diff_bit set
}
*b = xor_all ^ *a; // => b = (a^b) ^ a = b
}
int main(void) {
int nums1[] = {4, 1, 2, 1, 2};
printf("%d\n", find_single(nums1, 5)); // => Output: 4
int nums2[] = {1, 2, 3, 2, 3, 4};
int a, b;
find_two_singles(nums2, 6, &a, &b);
printf("%d %d\n", a, b); // => Output: 1 4 (in some order)
return 0;
}
Key Takeaway: XOR-based tricks work because XOR is commutative, associative, self-inverse (a^a=0), and identity-preserving (a^0=a). These four properties let pairs cancel regardless of their order in the array.
Why It Matters: XOR tricks appear in cryptography (one-time pad encryption uses XOR), hardware parity checking (RAID parity blocks use XOR across disk stripes for error recovery), checksum computation, and interview problems. The two-singles trick generalizes to finding elements appearing odd numbers of times using multi-bit grouping, a pattern used in error-correcting codes.
Monotonic Structures
Example 78: Monotonic Stack — Next Greater Element
A monotonic stack maintains elements in monotone order (increasing or decreasing). It solves “next greater element” and similar problems in O(n) time instead of the naive O(n²).
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Input Array"] -->|push to stack| B["Monotonic Stack<br/>decreasing order"]
B -->|current > stack top| C["Pop: stack top's NGE<br/>is current element"]
B -->|current <= stack top| D["Push current"]
C -->|all elements popped| E["NGE Result Array"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#029E73,stroke:#000,color:#fff
style D fill:#CA9161,stroke:#000,color:#fff
style E fill:#CC78BC,stroke:#000,color:#fff
#include <stdio.h>
void next_greater_element(int nums[], int n, int result[]) {
int stack[n], top = -1; // => Stack holds indices
for (int i = 0; i < n; i++) result[i] = -1; // => Default: no greater element
for (int i = 0; i < n; i++) {
while (top >= 0 && nums[i] > nums[stack[top]]) {
result[stack[top--]] = nums[i]; // => idx's next greater element is nums[i]
}
stack[++top] = i; // => Push current index
}
}
int main(void) {
int nums1[] = {4, 1, 2, 3};
int result1[4];
next_greater_element(nums1, 4, result1);
for (int i = 0; i < 4; i++) printf("%d ", result1[i]);
printf("\n"); // => Output: -1 2 3 -1
int nums2[] = {2, 1, 2, 4, 3};
int result2[5];
next_greater_element(nums2, 5, result2);
for (int i = 0; i < 5; i++) printf("%d ", result2[i]);
printf("\n"); // => Output: 4 2 4 -1 -1
return 0;
}
Key Takeaway: Each element is pushed and popped at most once, giving O(n) total time. The stack stores indices of elements waiting for their “next greater” answer — whenever a larger element arrives, it settles all pending elements smaller than it.
Why It Matters: Monotonic stacks solve a family of problems in O(n): largest rectangle in a histogram (Leetcode 84, used in image processing), trapping rainwater (modeling terrain drainage), daily temperatures, stock span, and visibility problems. These appear frequently in technical interviews and in terrain analysis, financial charting libraries, and compiler optimization (constant folding over expression trees).
Example 79: Monotonic Deque — Sliding Window Maximum
A monotonic deque (double-ended queue) maintains a decreasing window of candidates, giving O(n) sliding-window maximum instead of O(n·k) with a naive approach.
#include <stdio.h>
void sliding_window_max(int nums[], int n, int k, int result[], int *rsize) {
int dq[n], front = 0, back = 0; // => Stores indices; values in dq are decreasing
*rsize = 0;
for (int i = 0; i < n; i++) {
// => Remove indices that are out of the current window
while (front < back && dq[front] < i - k + 1) front++;
// => Maintain decreasing order: remove smaller values from back
while (front < back && nums[dq[back - 1]] < nums[i]) back--;
dq[back++] = i; // => Add current index
if (i >= k - 1) {
result[(*rsize)++] = nums[dq[front]]; // => Front is current window max
}
}
}
int main(void) {
int nums[] = {1, 3, -1, -3, 5, 3, 6, 7};
int result[6];
int rsize;
sliding_window_max(nums, 8, 3, result, &rsize);
for (int i = 0; i < rsize; i++) printf("%d ", result[i]);
printf("\n"); // => Output: 3 3 5 5 6 7
return 0;
}
Key Takeaway: The monotonic deque ensures that the maximum of the current window is always at the front in O(1). Each element is enqueued and dequeued at most once, giving O(n) overall instead of O(n·k).
Why It Matters: Sliding-window maximum appears in computational finance (rolling maximum for drawdown analysis), image processing (morphological dilation), real-time signal processing (envelope detection), and network congestion control algorithms. The deque-based O(n) solution replaces a segment tree when the window moves only forward — a common constraint that makes the deque the right tool.
Advanced Sorting
Example 80: Radix Sort — Non-Comparative Linear Sort
Radix sort sorts integers by processing individual digits from least significant to most significant. It achieves O(d·(n + b)) time where d is the number of digits and b is the base — faster than O(n log n) comparison sorts for bounded integers.
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Input Array"] -->|sort by digit 0 LSD| B["Pass 1<br/>sort by ones"]
B -->|sort by digit 1| C["Pass 2<br/>sort by tens"]
C -->|sort by digit 2| D["Pass 3<br/>sort by hundreds"]
D -->|all digits processed| E["Sorted Array"]
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#CA9161,stroke:#000,color:#fff
style D fill:#CC78BC,stroke:#000,color:#fff
style E fill:#029E73,stroke:#000,color:#fff
#include <stdio.h>
#include <string.h>
void counting_sort_by_digit(int arr[], int n, int exp) {
int output[n], count[10];
memset(count, 0, sizeof(count));
for (int i = 0; i < n; i++) count[(arr[i] / exp) % 10]++;
for (int i = 1; i < 10; i++) count[i] += count[i - 1]; // => Cumulative count
for (int i = n - 1; i >= 0; i--) { // => Backwards for stable sort
int digit = (arr[i] / exp) % 10;
output[--count[digit]] = arr[i];
}
memcpy(arr, output, n * sizeof(int));
}
void radix_sort(int arr[], int n) {
int max_val = arr[0];
for (int i = 1; i < n; i++) if (arr[i] > max_val) max_val = arr[i];
for (int exp = 1; max_val / exp > 0; exp *= 10) {
counting_sort_by_digit(arr, n, exp); // => Each pass: stable sort by one digit
}
}
int main(void) {
int nums[] = {170, 45, 75, 90, 802, 24, 2, 66};
radix_sort(nums, 8);
for (int i = 0; i < 8; i++) printf("%d ", nums[i]);
printf("\n"); // => Output: 2 24 45 66 75 90 170 802
return 0;
}
Key Takeaway: Radix sort beats comparison-based sorts for integers with bounded values because each pass runs in O(n + b) using counting sort. Stability is critical — processing from LSD to MSD with a stable inner sort guarantees correct final order.
Why It Matters: Radix sort powers sorting in systems with bounded key spaces: sorting IP addresses (32-bit integers), phone numbers, ZIP codes, or fixed-length hashed keys. Database engines use radix sort internally for integer column sorting. GPU implementations of radix sort achieve extremely high throughput for parallel data processing because each pass is embarrassingly parallelizable across independent counting stages.
Amortized Analysis
Example 81: Amortized O(1) — Dynamic Array Doubling
A dynamic array amortizes the cost of resizing across many appends. Though individual resize operations are O(n), the amortized cost per append is O(1) using the accounting method.
#include <stdio.h>
#include <stdlib.h>
typedef struct {
int *data;
int capacity;
int size;
} DynamicArray;
DynamicArray da_new(void) {
DynamicArray da;
da.data = malloc(sizeof(int)); // => Backing array starts at capacity 1
da.capacity = 1;
da.size = 0;
return da;
}
void da_append(DynamicArray *da, int val) {
if (da->size == da->capacity) {
da->capacity *= 2; // => Double the capacity: 1->2->4->8->...
int *new_data = malloc(da->capacity * sizeof(int));
for (int i = 0; i < da->size; i++) new_data[i] = da->data[i]; // => Copy all
free(da->data);
da->data = new_data;
}
da->data[da->size++] = val; // => Insert at next free position; O(1)
}
int da_get(DynamicArray *da, int idx) { return da->data[idx]; }
double da_load_factor(DynamicArray *da) {
return (double)da->size / da->capacity; // => Always between 0.5 and 1.0 after any append
}
int main(void) {
DynamicArray da = da_new();
for (int i = 0; i < 10; i++) da_append(&da, i);
printf("%d\n", da.size); // => Output: 10
printf("%d\n", da_get(&da, 5)); // => Output: 5
printf("%d\n", da.capacity); // => Output: 16
printf("%.3f\n", da_load_factor(&da)); // => Output: 0.625
free(da.data);
return 0;
}
Key Takeaway: The doubling strategy guarantees that at most half the capacity is wasted and that the total copy work across all appends is bounded by 2n. This gives O(1) amortized append even though individual resizes are O(n).
Why It Matters: Python’s list, Java’s ArrayList, C++’s std::vector, and Go’s slices all use this doubling strategy internally. Understanding amortized analysis explains why list.append in Python is safe to use in tight loops, and why pre-allocating with [None] * n is faster than repeated appends when n is known in advance (avoiding the log n resize events).
Example 82: Amortized O(1) — Two-Stack Queue
Two stacks can simulate a queue with O(1) amortized enqueue and dequeue, a clean demonstration of amortized analysis: each element is pushed and popped at most twice (once per stack) over its lifetime, so any sequence of n operations costs O(n) total.
#include <stdio.h>
#define MAXN 1000
typedef struct {
int inbox[MAXN], outbox[MAXN];
int in_top, out_top; // => Stack tops (-1 means empty)
} TwoStackQueue;
void tsq_init(TwoStackQueue *q) { q->in_top = q->out_top = -1; }
void tsq_enqueue(TwoStackQueue *q, int val) {
q->inbox[++q->in_top] = val; // => Always O(1): push to inbox
}
int tsq_dequeue(TwoStackQueue *q) {
if (q->out_top < 0) {
while (q->in_top >= 0) {
q->outbox[++q->out_top] = q->inbox[q->in_top--];
// => Transfer: reverse inbox into outbox; each element transferred at most once
}
}
return q->outbox[q->out_top--]; // => O(1): pop from outbox
}
int main(void) {
TwoStackQueue q;
tsq_init(&q);
for (int i = 0; i < 5; i++) tsq_enqueue(&q, i);
printf("%d\n", tsq_dequeue(&q)); // => Output: 0
printf("%d\n", tsq_dequeue(&q)); // => Output: 1
tsq_enqueue(&q, 5);
printf("%d\n", tsq_dequeue(&q)); // => Output: 2
printf("%d\n", tsq_dequeue(&q)); // => Output: 3
printf("%d\n", tsq_dequeue(&q)); // => Output: 4
printf("%d\n", tsq_dequeue(&q)); // => Output: 5
return 0;
}
Key Takeaway: Amortized O(1) means the total work across n operations is O(n), even if individual operations occasionally cost more. The two-stack queue achieves this because each element crosses from inbox to outbox at most once.
Why It Matters: The two-stack queue is used in functional programming languages (where lists are immutable and random access is expensive) to implement efficient queues. Amortized analysis is the mathematical tool that justifies why Python’s list, hash tables, splay trees, and union-find are fast in practice despite occasional expensive operations — understanding it prevents premature optimization and guides correct data structure selection in production systems.
Advanced Patterns
Example 83: Fenwick Tree (Binary Indexed Tree) — Prefix Sums
A Fenwick tree (BIT) supports prefix-sum queries and point updates in O(log n) time with only O(n) space and very simple code. It is more cache-friendly than a segment tree for this specific use case.
#include <stdio.h>
#include <string.h>
#define MAXN 200
int tree[MAXN];
int n;
void ft_update(int i, int delta) {
for (; i <= n; i += i & (-i)) // => Move to next responsible ancestor
tree[i] += delta;
}
int ft_prefix_sum(int i) {
int s = 0;
for (; i > 0; i -= i & (-i)) // => Move to parent: remove lowest set bit
s += tree[i];
return s;
}
int ft_range_sum(int l, int r) {
return ft_prefix_sum(r) - ft_prefix_sum(l - 1);
}
int main(void) {
int data[] = {3, 2, -1, 6, 5, 4, -3, 3, 7, 2, 3};
n = 11;
memset(tree, 0, sizeof(tree));
for (int i = 0; i < n; i++) ft_update(i + 1, data[i]); // => Build tree (1-indexed)
printf("%d\n", ft_prefix_sum(6)); // => Output: 19 (3+2-1+6+5+4)
printf("%d\n", ft_range_sum(2, 6)); // => Output: 16 (2-1+6+5+4)
ft_update(3, 10); // => Change position 3: add 10
printf("%d\n", ft_range_sum(2, 6)); // => Output: 26 (16 + 10)
return 0;
}
Key Takeaway: Fenwick trees use the binary representation of indices to determine which tree nodes cover which ranges. The i += i & (-i) update traversal and i -= i & (-i) query traversal each visit O(log n) nodes.
Why It Matters: Fenwick trees are 2-3x faster in practice than segment trees for prefix-sum problems due to simpler code and better cache locality. They power order-statistic operations in competitive programming, frequency counting in streaming data, and inversion counting in sorting algorithms. Database systems use BIT-like structures for maintaining running aggregates over append-only event logs.
Example 84: Flood Fill — Connected Component Labeling
Flood fill visits all cells reachable from a starting cell via BFS or DFS. It is the algorithm behind paint-bucket tools, island counting, and connected-component labeling.
#include <stdio.h>
#define MAXR 100
#define MAXC 100
int grid[MAXR][MAXC];
int rows, cols;
void flood_fill(int sr, int sc, int new_color) {
int old_color = grid[sr][sc];
if (old_color == new_color) return; // => Prevents infinite loop
int qr[MAXR*MAXC], qc[MAXR*MAXC]; // => BFS queue
int front = 0, back = 0;
qr[back] = sr; qc[back++] = sc;
grid[sr][sc] = new_color; // => Mark start immediately
int dr[] = {0,0,1,-1}, dc[] = {1,-1,0,0}; // => 4-directional
while (front < back) {
int r = qr[front], c = qc[front++];
for (int d = 0; d < 4; d++) {
int nr = r + dr[d], nc = c + dc[d];
if (nr >= 0 && nr < rows && nc >= 0 && nc < cols && grid[nr][nc] == old_color) {
grid[nr][nc] = new_color; // => Mark before enqueuing
qr[back] = nr; qc[back++] = nc;
}
}
}
}
int main(void) {
rows = 3; cols = 3;
int init[3][3] = {{1,1,1},{1,1,0},{1,0,1}};
for (int i = 0; i < rows; i++)
for (int j = 0; j < cols; j++) grid[i][j] = init[i][j];
flood_fill(1, 1, 2);
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) printf("%d ", grid[i][j]);
printf("\n");
}
// => Output: 2 2 2 / 2 2 0 / 2 0 1
return 0;
}
Key Takeaway: Flood fill marks cells as visited by changing their color before enqueuing them — this prevents exponential duplicate processing. BFS fills level by level (shortest path), while DFS is simpler to implement recursively.
Why It Matters: Flood fill is the algorithm in every raster graphics editor’s paint-bucket tool (Photoshop, GIMP, Figma). It counts islands in grid problems (a common interview problem), labels connected components in binary images (essential for document layout analysis and OCR), and fills enclosed regions in game maps. Production implementations use iterative BFS rather than recursive DFS to avoid stack overflow on large grids.
Example 85: A* Search — Heuristic Pathfinding
A* combines Dijkstra’s guaranteed shortest path with a heuristic estimate of the remaining distance to the goal, focusing the search toward the destination. It finds an optimal path whenever the heuristic never overestimates (an admissible heuristic).
%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
graph LR
A["Start Node<br/>f=g+h"] -->|pop min f| B["Open Set<br/>min-heap by f"]
B -->|explore neighbors| C["Update g-scores"]
C -->|compute h heuristic| D["Push to Open Set<br/>with new f"]
D -->|goal reached| E["Reconstruct Path"]
D -->|not goal| B
style A fill:#0173B2,stroke:#000,color:#fff
style B fill:#DE8F05,stroke:#000,color:#fff
style C fill:#CA9161,stroke:#000,color:#fff
style D fill:#CC78BC,stroke:#000,color:#fff
style E fill:#029E73,stroke:#000,color:#fff
#include <stdio.h>
#include <math.h>
#include <string.h>
#define MAXR 50
#define MAXC 50
#define MAXH 10000
#define INF 1e18
typedef struct { double f; int r, c; } HNode;
HNode heap[MAXH]; int hsz;
void hpush(double f, int r, int c) {
int i = hsz++;
heap[i] = (HNode){f, r, c};
while (i > 0) {
int p = (i-1)/2;
if (heap[p].f <= heap[i].f) break;
HNode t = heap[p]; heap[p] = heap[i]; heap[i] = t;
i = p;
}
}
HNode hpop(void) {
HNode top = heap[0]; heap[0] = heap[--hsz];
int i = 0;
while (1) {
int l=2*i+1, r=2*i+2, m=i;
if (l < hsz && heap[l].f < heap[m].f) m = l;
if (r < hsz && heap[r].f < heap[m].f) m = r;
if (m == i) break;
HNode t = heap[i]; heap[i] = heap[m]; heap[m] = t;
i = m;
}
return top;
}
int grid[MAXR][MAXC];
double g_score[MAXR][MAXC];
int from_r[MAXR][MAXC], from_c[MAXR][MAXC];
int closed[MAXR][MAXC];
int rows, cols;
double heuristic(int r1, int c1, int r2, int c2) {
double dr = r1-r2, dc = c1-c2;
return sqrt(dr*dr + dc*dc); // => Euclidean distance heuristic
}
int astar(int sr, int sc, int gr, int gc) {
for (int i = 0; i < rows; i++)
for (int j = 0; j < cols; j++) {
g_score[i][j] = INF; closed[i][j] = 0;
from_r[i][j] = -1; from_c[i][j] = -1;
}
g_score[sr][sc] = 0;
hsz = 0;
hpush(heuristic(sr, sc, gr, gc), sr, sc);
int dr[] = {-1,1,0,0,-1,-1,1,1}, dc[] = {0,0,-1,1,-1,1,-1,1};
while (hsz > 0) {
HNode cur = hpop();
int r = cur.r, c = cur.c;
if (closed[r][c]) continue;
if (r == gr && c == gc) return 1; // => Goal reached
closed[r][c] = 1;
for (int d = 0; d < 8; d++) {
int nr = r+dr[d], nc = c+dc[d];
if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
if (grid[nr][nc] == 1 || closed[nr][nc]) continue;
double step = sqrt(dr[d]*dr[d] + dc[d]*dc[d]);
double ng = g_score[r][c] + step;
if (ng < g_score[nr][nc]) {
g_score[nr][nc] = ng;
from_r[nr][nc] = r; from_c[nr][nc] = c;
hpush(ng + heuristic(nr, nc, gr, gc), nr, nc);
}
}
}
return 0;
}
int main(void) {
rows = 5; cols = 5;
int init[5][5] = {
{0,0,0,0,1},{0,1,1,0,0},{0,0,0,1,0},{0,1,0,0,0},{0,0,0,1,0}
};
for (int i = 0; i < rows; i++)
for (int j = 0; j < cols; j++) grid[i][j] = init[i][j];
if (astar(0, 0, 4, 4)) {
// => Reconstruct path
int pr[50], pc[50], plen = 0;
int r = 4, c = 4;
while (r != -1) {
pr[plen] = r; pc[plen++] = c;
int tr = from_r[r][c], tc = from_c[r][c];
r = tr; c = tc;
}
for (int i = plen-1; i >= 0; i--) printf("(%d,%d) ", pr[i], pc[i]);
printf("\n");
}
// => Output: path from (0,0) to (4,4), e.g. (0,0) (1,0) (2,0) (2,1) (2,2) (3,2) (4,2) (4,3) (4,4)
return 0;
}
Key Takeaway: A* is optimal and complete when the heuristic is admissible (never overestimates). The heuristic guides the search toward the goal, dramatically reducing nodes explored compared to Dijkstra, which expands uniformly in all directions.
Why It Matters: A* is the gold-standard pathfinding algorithm in game development (every pathfinding NPC in open-world games), robotics (ROS navigation stack), and GPS routing for small maps. Production systems use Hierarchical A* (HPA*), Theta* for any-angle paths, or JPS (Jump Point Search) for grid-specific 10-40x speedups. Understanding A* provides the foundation for all heuristic search algorithms including beam search (used in NLP decoders) and Monte Carlo Tree Search (used in AlphaGo).