Decaying Window Algorithm:
This algorithm allows you to identify the most popular elements (trending, in other words) in
an incoming data stream.
The decaying window algorithm not only tracks the most recurring elements in an incoming
data stream, but also discounts any random spikes or spam requests that might have
boosted an element’s frequency.
In a decaying window, you assign a score/weight to every element of the incoming data
stream. Further, you need to calculate the aggregate sum for each distinct element by
adding all the weights assigned to that element. The element with the highest total score is
listed as trending or the most popular.
In a decaying window algorithm, you assign more weight to newer elements. For a new
element, you first reduce the weight of all the existing elements by a constant factor k and
then assign the new element with a specific weight. The aggregate sum of the decaying
exponential weights can be calculated using the following formula:
S_t = Σ_{i=0}^{t−1} a_{t−i} (1 − c)^i
Here, c is usually a small constant of the order of 10^−6 or 10^−9. Whenever a new element, say
a_{t+1}, arrives in the data stream, you perform the following steps to obtain the updated sum:
1. Multiply the current sum/score by the value (1 − c).
2. Add the weight corresponding to the new element.
In a data stream consisting of various elements, you maintain a separate sum for each
distinct element. For every incoming element, you multiply the sum of all the existing
elements by a value of (1 − c). Further, you add the weight of the incoming element to its
corresponding aggregate sum.
A threshold can be kept so that elements whose weight falls below it are ignored. Finally, the
element with the highest aggregate score is listed as the most popular element.
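The update rule above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation; the function name and the large demonstration value of c (real streams use values around 10^−6) are assumptions for readability.

```python
# Decaying-window trend scores: for each arriving element, every existing
# score is multiplied by (1 - c), then the arriving element's own score
# gains a weight of 1.

def decaying_window(stream, c=0.1):
    """Return the aggregate decayed score for each distinct element."""
    scores = {}
    for element in stream:
        # Step 1: multiply every current sum by (1 - c).
        for key in scores:
            scores[key] *= (1 - c)
        # Step 2: add the new element's weight (here, 1) to its own sum.
        scores[element] = scores.get(element, 0.0) + 1.0
    return scores

scores = decaying_window(["a", "b", "a", "a", "c"], c=0.1)
trending = max(scores, key=scores.get)  # "a" has the highest aggregate score
```

With a realistic c of about 10^−6, a per-element loop over all scores would be too slow; practical implementations only keep elements whose score stays above a threshold, as noted above.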
Advantages of Decaying Window Algorithm:
1. Sudden spikes and spam data are discounted rather than mistaken for trends.
2. Newer elements are given more weight, which keeps the trending output current.
E-Stream: Evolution-Based Technique for Stream Clustering
Basic Concepts and Definitions:
1. Fading decreases the weight of data over time. In a data stream with evolving data, older
data should carry less weight. We decrease the weight of every cluster over time to
achieve a fast, adaptive cluster model. Let λ be the decay rate and t the elapsed time; the
fading function is
f(t) = 2^(−λt)
2. Weight of a cluster is the number of data elements in the cluster, determined according
to the fading function. Initially, each data element has a weight of 1. A cluster can
increase its weight by absorbing incoming data points or by merging with other
clusters.
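The fading function above halves a weight every 1/λ time units. A minimal sketch (the function and parameter names are illustrative):

```python
# Fading function f(t) = 2^(-lambda * t): a cluster's weight decays
# exponentially with the time elapsed since it was last updated.

def fade(weight, decay_rate, elapsed):
    """Decay a cluster weight by the fading function 2^(-lambda * t)."""
    return weight * 2 ** (-decay_rate * elapsed)

# A cluster of 8 points with decay_rate 1 loses half its weight per time unit.
w = fade(8.0, 1.0, 1.0)   # one time unit: 8 -> 4
w = fade(w, 1.0, 2.0)     # two more time units: 4 -> 1
```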
The behaviour of data in a data stream can evolve over time. We can classify this evolution
into five categories:
1. Appearance: A new cluster can appear if there is a sufficiently dense group of data
points in one area. Initially, such elements appear as a group of outliers, but (as more
data appears in the neighbourhood), they are recognized as a cluster.
2. Disappearance: Existing clusters can disappear because the existence of data is
diminished over time. Clusters that contain only old data are faded and eventually
disappear because they do not represent the presence of data.
3. Self-Evolution: Data can change their behaviour, which causes the size or position of a
cluster to evolve. Evolution proceeds faster when the data fade.
4. Merge: A pair of clusters can be merged if their characteristics are very similar. Merged
clusters must cover the behaviour of the pair.
5. Split: A cluster can be split into two smaller clusters if the behaviour inside the cluster is
obviously separated.
E-Stream Algorithm:
In line 1, the algorithm starts by retrieving a new data point. In line 2, it fades all clusters and
deletes those having insufficient weight. Line 3 performs a histogram analysis and cluster
split. Then line 4 checks for overlap clusters and merges them. Line 5 checks the number of
clusters and merges the closest pairs if the cluster count exceeds the limit. Line 6 checks all
clusters whether their status is active. Lines 7-10 find the closest cluster to the incoming
data point. If the distance is less than or equal to radius_factor then the point is assigned to
the cluster, otherwise it is an isolated data point. The flow of control then returns to the top
of the algorithm and waits for a new data point.
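The main loop described above can be sketched as follows. This is a much-simplified, illustrative Python sketch under stated assumptions: clusters are one-dimensional centroid/weight pairs, the histogram-based split, merge, and active-flag steps (lines 3–6) are omitted, and all names and parameter values are assumptions rather than the paper's own.

```python
import math

DECAY_RATE = 0.1        # lambda in f(t) = 2^(-lambda * t); illustrative
REMOVE_THRESHOLD = 0.1  # clusters fading below this weight are deleted
RADIUS_FACTOR = 3.0     # stand-in for the cluster-radius assignment test

clusters = []           # each cluster: {"centroid": float, "weight": float}

def fade_all(elapsed=1.0):
    """Line 2: fade every cluster and delete those with insufficient weight."""
    global clusters
    for c in clusters:
        c["weight"] *= 2 ** (-DECAY_RATE * elapsed)
    clusters = [c for c in clusters if c["weight"] >= REMOVE_THRESHOLD]

def find_closest(x):
    """Lines 7-8: return (distance, cluster) of the closest cluster."""
    best, best_c = math.inf, None
    for c in clusters:
        d = abs(x - c["centroid"])
        if d < best:
            best, best_c = d, c
    return best, best_c

def process(x):
    fade_all()                       # line 2: FadingAll
    # lines 3-6 (CheckSplit, CheckMerge, LimitMaximumCluster,
    # FlagActiveCluster) are omitted in this sketch.
    d, c = find_closest(x)           # lines 7-10
    if c is not None and d <= RADIUS_FACTOR:
        # assign the point to the closest cluster (weighted centroid update)
        total = c["weight"] + 1.0
        c["centroid"] = (c["centroid"] * c["weight"] + x) / total
        c["weight"] = total
    else:
        # isolated data point: start a new cluster
        clusters.append({"centroid": x, "weight": 1.0})

for x in [1.0, 1.5, 10.0, 1.2]:
    process(x)
# points near 1 share one cluster; 10.0 forms its own
```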
FadingAll. The algorithm performs fading of all clusters and deletes the clusters whose
weight is less than remove_threshold.
CheckSplit is used to verify the splitting criteria in each cluster using the histogram. If a
splitting point is found in a cluster, the cluster is split and the index pair of the split clusters
is stored in S.
CheckMerge is an algorithm for merging pairs of similar clusters. This algorithm checks every
pair of clusters and computes the cluster-cluster distance. If the distance is less than
merge_threshold and the merged pair is not in S then merge the pair.
LimitMaximumCluster is used to limit the number of clusters. This algorithm checks
whether the number of clusters exceeds maximum_cluster (an input parameter); if it does,
the closest pairs of clusters are merged until the number of remaining clusters is less than
or equal to the limit.
FlagActiveCluster is used to update each cluster's active status. If the weight of a cluster is
greater than or equal to active_threshold, it is flagged as an active cluster; otherwise, the
flag is cleared.
FindClosestCluster is used to find the distance and index of the closest active cluster for an
incoming data point.
Clique (Clustering in Quest) Algorithm:
Clique is a density-based and grid-based subspace clustering algorithm.
o Grid-based: It discretizes the data space through a grid and estimates the density by
counting the number of points in a grid cell.
o Density-based: A cluster is a maximal set of connected dense units in a subspace.
A unit is dense if the fraction of total data points contained in the unit exceeds
the input model parameter.
Subspace Clustering: A subspace cluster is a set of neighbouring dense cells in an
arbitrary subspace. It also discovers some minimal descriptions of the clusters.
It automatically identifies subspaces of a high-dimensional data space that allow better
clustering than the original space, using the Apriori principle.
Apriori Principle: If a collection of points S is a cluster in a k-dimensional space, then S is also
part of a cluster in every (k − 1)-dimensional projection of that space.
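CLIQUE's first pass can be illustrated with a short sketch: each dimension is discretized into equal-width intervals, points are counted per cell, and a unit is dense when its fraction of points exceeds the threshold. The names and parameter values (xi intervals, tau threshold) are illustrative, and only the one-dimensional pass is shown; the full algorithm then combines dense (k − 1)-dimensional units, Apriori-style, into k-dimensional candidates.

```python
from collections import Counter

def dense_units_1d(points, xi=4, tau=0.25):
    """Find dense 1-D units (dimension, interval) for points in [0, 1)."""
    n = len(points)
    dims = len(points[0])
    counts = Counter()
    for p in points:
        for d in range(dims):
            # map the coordinate to one of xi equal-width grid intervals
            counts[(d, int(p[d] * xi))] += 1
    # a unit is dense if its fraction of all points exceeds tau
    return {unit for unit, cnt in counts.items() if cnt / n > tau}

points = [(0.1, 0.1), (0.15, 0.12), (0.2, 0.9), (0.9, 0.85)]
units = dense_units_1d(points)
```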