Streaming Algorithms Explained with Use Cases
1. Misra-Gries Algorithm
Purpose: Identify elements in a data stream that occur frequently (above a given threshold).
Real-Life Use: Detecting most common queries in a search engine, finding trending topics on Twitter.
Pseudocode:
Initialize an empty dictionary C
Set k (maximum number of counters)
For each element x in the stream:
If x in C:
C[x] += 1
Else if len(C) < k - 1:
C[x] = 1
Else:
Decrease all counts in C by 1
Remove entries with count 0
2. Lossy Counting
Purpose: Track frequent items approximately with error tolerance.
Real-Life Use: Online ad clickstream analysis, monitoring frequently accessed web pages.
Pseudocode:
Set epsilon (error parameter), N = 0, empty dict C
For each x:
N += 1
If x in C: C[x][0] += 1
Else: C[x] = [1, current_bucket - 1]
Every bucket_width items: remove entries if C[key][0] + C[key][1] <= current_bucket
3. Reservoir Sampling
Streaming Algorithms Explained with Use Cases
Purpose: Random sampling from a stream of unknown size.
Real-Life Use: Select random user sessions, logs, or tweets from a firehose for analysis.
Pseudocode:
Fill reservoir of size k with first k items
For each i > k, replace item at random with probability k/i
4. KMV (K Minimum Values)
Purpose: Estimate number of distinct elements (cardinality).
Real-Life Use: Estimate unique website visitors, count distinct IP addresses in logs.
Pseudocode:
Hash elements to [0,1]; maintain k smallest values
Estimate = (k - 1) / max(k smallest hash values)
5. Boyer-Moore Majority Vote
Purpose: Find majority element (>50% frequency).
Real-Life Use: Detect most dominant behavior in logs, top-voted answer in feedback.
Pseudocode:
candidate = None, count = 0
For each x:
If count == 0: candidate = x
If x == candidate: count += 1 else: count -= 1
6. Space-Saving Algorithm
Purpose: Find most frequent elements using fixed memory.
Real-Life Use: Top-k search terms, most active users on a platform.
Streaming Algorithms Explained with Use Cases
Pseudocode:
Keep k counters. If x is new and full, replace min entry and increase count.
7. Count Sketch
Purpose: Approximate frequency counts with negative noise correction.
Real-Life Use: Network traffic monitoring, approximate counting in DBMS.
Pseudocode:
Use d hash + sign functions. Update multiple counters using signs.
Estimate = median of (counter * sign) for each row.
8. Count-Min Sketch
Purpose: Estimate frequencies (overestimate only).
Real-Life Use: Spam detection, streaming logs, cache eviction policies.
Pseudocode:
Update count in d hash buckets; estimate = min(counts across all hashes)
9. Bloom Filter
Purpose: Test membership in a set with false positives.
Real-Life Use: Caches, databases (avoid unnecessary disk lookups), network security.
Pseudocode:
Hash x to k bit positions and set them to 1
To query: check all k bits are 1 (else not in set)
10. Sliding Window Model
Purpose: Analyze most recent data (e.g., last 1 min/hour).
Streaming Algorithms Explained with Use Cases
Real-Life Use: Real-time alerts, CPU/memory usage, fraud detection.
Pseudocode:
Maintain deque of last N elements, slide as new ones arrive