[go: up one dir, main page]

0% found this document useful (0 votes)
13 views4 pages

Streaming Algorithms Explained

The document explains various streaming algorithms, including Misra-Gries, Lossy Counting, Reservoir Sampling, and others, detailing their purposes and real-life use cases. Each algorithm is accompanied by pseudocode to illustrate its implementation. The algorithms are designed for tasks such as identifying frequent elements, estimating distinct counts, and analyzing real-time data.

Uploaded by

Yoshi Hao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views4 pages

Streaming Algorithms Explained

The document explains various streaming algorithms, including Misra-Gries, Lossy Counting, Reservoir Sampling, and others, detailing their purposes and real-life use cases. Each algorithm is accompanied by pseudocode to illustrate its implementation. The algorithms are designed for tasks such as identifying frequent elements, estimating distinct counts, and analyzing real-time data.

Uploaded by

Yoshi Hao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Streaming Algorithms Explained with Use Cases

1. Misra-Gries Algorithm

Purpose: Identify elements in a data stream that occur frequently (above a given threshold).

Real-Life Use: Detecting most common queries in a search engine, finding trending topics on Twitter.

Pseudocode:

Initialize an empty dictionary C

Set k (maximum number of counters)

For each element x in the stream:

If x in C:

C[x] += 1

Else if len(C) < k - 1:

C[x] = 1

Else:

Decrease all counts in C by 1

Remove entries with count 0

2. Lossy Counting

Purpose: Track frequent items approximately with error tolerance.

Real-Life Use: Online ad clickstream analysis, monitoring frequently accessed web pages.

Pseudocode:

Set epsilon (error parameter), N = 0, empty dict C

For each x:

N += 1

If x in C: C[x][0] += 1

Else: C[x] = [1, current_bucket - 1]

Every bucket_width items: remove entries if C[key][0] + C[key][1] <= current_bucket

3. Reservoir Sampling
Streaming Algorithms Explained with Use Cases

Purpose: Random sampling from a stream of unknown size.

Real-Life Use: Select random user sessions, logs, or tweets from a firehose for analysis.

Pseudocode:

Fill reservoir of size k with first k items

For each i > k, replace item at random with probability k/i

4. KMV (K Minimum Values)

Purpose: Estimate number of distinct elements (cardinality).

Real-Life Use: Estimate unique website visitors, count distinct IP addresses in logs.

Pseudocode:

Hash elements to [0,1]; maintain k smallest values

Estimate = (k - 1) / max(k smallest hash values)

5. Boyer-Moore Majority Vote

Purpose: Find majority element (>50% frequency).

Real-Life Use: Detect most dominant behavior in logs, top-voted answer in feedback.

Pseudocode:

candidate = None, count = 0

For each x:

If count == 0: candidate = x

If x == candidate: count += 1 else: count -= 1

6. Space-Saving Algorithm

Purpose: Find most frequent elements using fixed memory.

Real-Life Use: Top-k search terms, most active users on a platform.


Streaming Algorithms Explained with Use Cases

Pseudocode:

Keep k counters. If x is new and full, replace min entry and increase count.

7. Count Sketch

Purpose: Approximate frequency counts with negative noise correction.

Real-Life Use: Network traffic monitoring, approximate counting in DBMS.

Pseudocode:

Use d hash + sign functions. Update multiple counters using signs.

Estimate = median of (counter * sign) for each row.

8. Count-Min Sketch

Purpose: Estimate frequencies (overestimate only).

Real-Life Use: Spam detection, streaming logs, cache eviction policies.

Pseudocode:

Update count in d hash buckets; estimate = min(counts across all hashes)

9. Bloom Filter

Purpose: Test membership in a set with false positives.

Real-Life Use: Caches, databases (avoid unnecessary disk lookups), network security.

Pseudocode:

Hash x to k bit positions and set them to 1

To query: check all k bits are 1 (else not in set)

10. Sliding Window Model

Purpose: Analyze most recent data (e.g., last 1 min/hour).


Streaming Algorithms Explained with Use Cases

Real-Life Use: Real-time alerts, CPU/memory usage, fraud detection.

Pseudocode:

Maintain deque of last N elements, slide as new ones arrive

You might also like