This document proposes two new ensemble methods for evolving data streams: Adaptive-Size Hoeffding Tree (ASHT) Bagging and ADWIN Bagging. ASHT Bagging uses an ensemble of decision trees of varying sizes, where smaller trees can adapt more quickly to changes and larger trees perform better with little change. ADWIN Bagging replaces the worst performing classifier with a new one when the ADWIN change detector detects a change. The document evaluates these methods on synthetic and real-world data streams and finds they improve accuracy on data streams experiencing concept drift compared to standard bagging.

Improving Adaptive Bagging Methods
for Evolving Data Streams

A. Bifet, G. Holmes, B. Pfahringer, and R. Gavaldà

University of Waikato
Hamilton, New Zealand

Laboratory for Relational Algorithmics, Complexity and Learning (LARCA)
UPC-Barcelona Tech, Catalonia

Nanjing, 4 November 2009
1st Asian Conference on Machine Learning (ACML’09)
Motivation

MOA Software for Mining Data Streams


Build useful software for mining massive data sets

Bagging for data streams


Improve the accuracy of classification methods for data
streams

2 / 26
Data stream classification cycle

1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
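
These four requirements amount to a single-pass, test-then-train
(prequential) loop. A minimal sketch in Java, using hypothetical
Classifier and Instance types rather than MOA's actual API:

    // Hypothetical types for the cycle above; a generic sketch,
    // not MOA's actual API.
    interface Instance { /* attribute values for one example */ }

    interface Classifier {
        int predict(Instance x);   // requirement 4: ready to predict at any point
        void train(Instance x);    // requirement 1: each example is seen once
    }

    class PrequentialLoop {
        // Test-then-train: score each example before learning from it, so the
        // stream is processed in a single pass and memory is bounded by the
        // model, not by the stream length (requirements 2 and 3).
        static double run(Classifier learner, Iterable<Instance> stream,
                          java.util.function.ToIntFunction<Instance> trueClass) {
            long seen = 0, correct = 0;
            for (Instance x : stream) {
                if (learner.predict(x) == trueClass.applyAsInt(x)) correct++;
                learner.train(x);
                seen++;
            }
            return seen == 0 ? 0.0 : (double) correct / seen;
        }
    }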

3 / 26
Realtime analytics: from Databases to Dataflows
Data streams
Data streams are ordered datasets
Not all datasets are data streams
All datasets may be processed incrementally as a data
stream

MOA: Massive Online Analysis


Faster mining software using fewer resources

Instant mining: more for less

4 / 26
What is MOA?

Massive Online Analysis (MOA) is a framework for online learning
from data streams.

It is closely related to WEKA


It includes a collection of offline and online algorithms, as well
as tools for evaluation:
boosting and bagging
Hoeffding Trees
with and without Naïve Bayes classifiers at the leaves.

5 / 26
WEKA

Waikato Environment for Knowledge Analysis


Collection of state-of-the-art machine learning algorithms
and data processing tools implemented in Java
Released under the GPL
Support for the whole process of experimental data mining
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning

Used for education, research and applications


Complements “Data Mining” by Witten & Frank

6 / 26
WEKA: the bird

7 / 26
MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.

8 / 26
Concept Drift Framework
Figure: the sigmoid transition function f(t), rising from 0 to 1
around t0 over a window of width W (slope given by the angle α).

Definition
Given two data streams a, b, we define c = a ⊕_{t0}^{W} b as the data
stream built joining the two data streams a and b, where
Pr[c(t) = b(t)] = 1/(1 + e^{−4(t−t0)/W})
Pr[c(t) = a(t)] = 1 − Pr[c(t) = b(t)]
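
A small Java sketch of this operator, sampling the next example from
b with the sigmoid probability above (the DriftJoin class is
illustrative, not part of MOA):

    import java.util.Iterator;
    import java.util.Random;

    // Illustrative sketch of c = a ⊕_{t0}^{W} b: emit the t-th example
    // from stream b with probability 1/(1 + e^(-4(t - t0)/W)),
    // otherwise from stream a.
    class DriftJoin<T> implements Iterator<T> {
        private final Iterator<T> a, b;
        private final double t0, w;
        private final Random rnd = new Random(1);  // fixed seed for repeatability
        private long t = 0;

        DriftJoin(Iterator<T> a, Iterator<T> b, double t0, double w) {
            this.a = a; this.b = b; this.t0 = t0; this.w = w;
        }

        @Override public boolean hasNext() { return a.hasNext() && b.hasNext(); }

        @Override public T next() {
            double pB = 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / w));
            t++;
            return rnd.nextDouble() < pB ? b.next() : a.next();
        }
    }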

9 / 26
Concept Drift Framework
Figure: the same sigmoid transition function f(t) around t0 with width W.

Example
(((a ⊕_{t0}^{W0} b) ⊕_{t1}^{W1} c) ⊕_{t2}^{W2} d) . . .

(((SEA9 ⊕_{t0}^{W} SEA8) ⊕_{2t0}^{W} SEA7) ⊕_{3t0}^{W} SEA9.5)

CovPokElec = (CoverType ⊕_{581,012}^{5,000} Poker) ⊕_{1,000,000}^{5,000} ELEC2

9 / 26
New Ensemble Methods For Evolving Data Streams

New Ensemble Methods For Evolving Data Streams (KDD’09)


a new experimental data stream framework for studying
concept drift
two new variants of Bagging:
ADWIN Bagging
Adaptive-Size Hoeffding Tree (ASHT) Bagging.
an evaluation study on synthetic and real-world datasets

10 / 26
Outline

1 Adaptive-Size Hoeffding Tree bagging

2 ADWIN Bagging

3 Empirical evaluation

11 / 26
Adaptive-Size Hoeffding Tree
Figure: trees T1, T2, T3, T4 of increasing maximum size.

Ensemble of trees of different size


each tree has a maximum size
after a node splits, if the tree has grown beyond its
maximum size, it deletes some nodes to reduce its size

12 / 26
Adaptive-Size Hoeffding Tree
Figure: trees T1, T2, T3, T4 of increasing maximum size.

Ensemble of trees of different size


smaller trees adapt more quickly to changes,
larger trees do better during periods with little change,
and the mix of sizes adds diversity to the ensemble (see the sketch below)
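
A skeleton of such an ensemble in Java, assuming the construction
described in the KDD'09 paper: the maximum size doubles from one tree
to the next, and each member votes with weight proportional to the
inverse of its squared error. Member and the tree factory are
illustrative stand-ins, reusing the Instance type from the first sketch:

    // ASHT-bagging skeleton (assumptions as stated in the lead-in).
    class AshtEnsemble {
        interface Member { int predict(Instance x); void train(Instance x); }

        final Member[] trees;
        final double[] error;   // smoothed prequential error estimate per tree

        AshtEnsemble(int n, int baseSize,
                     java.util.function.IntFunction<Member> treeWithMaxSize) {
            trees = new Member[n];
            error = new double[n];
            for (int i = 0, maxSize = baseSize; i < n; i++, maxSize *= 2) {
                trees[i] = treeWithMaxSize.apply(maxSize);  // smallest tree first
                error[i] = 1.0;                             // pessimistic start
            }
        }

        int predict(Instance x, int numClasses) {
            double[] votes = new double[numClasses];
            for (int i = 0; i < trees.length; i++)
                votes[trees[i].predict(x)] += 1.0 / (error[i] * error[i]);
            int best = 0;
            for (int c = 1; c < numClasses; c++)
                if (votes[c] > votes[best]) best = c;
            return best;
        }
    }

The size-limiting behaviour itself (deleting nodes once a tree exceeds
its maximum) would live inside the tree factory passed to the constructor.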

12 / 26
Adaptive-Size Hoeffding Tree

Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging
(right) on dataset RandomRBF with drift, plotting error against
kappa for 90 pairs of classifiers.

13 / 26
Improvement for ASHT Bagging Method

Improvement for ASHT Bagging ensemble method


Bagging using trees of different size
add a change detector for each tree in the ensemble
DDM: Gama et al.
EDDM: Baena-García, del Campo-Ávila, Fidalgo et al.

14 / 26
Outline

1 Adaptive-Size Hoeffding Tree bagging

2 ADWIN Bagging

3 Empirical evaluation

15 / 26
ADWIN Bagging

ADWIN
An adaptive sliding window whose size is recomputed online
according to the rate of change observed.

ADWIN has rigorous guarantees (theorems)


On the ratio of false positives and false negatives
On the relation between the size of the current window and
the rate of change

ADWIN Bagging
When a change is detected, the worst classifier is removed and
a new classifier is added.
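
A sketch of that policy in Java, reusing the Classifier and Instance
types from the first sketch; the Detector interface and the smoothing
constant are illustrative assumptions:

    // ADWIN-bagging step: one change detector per member monitors that
    // member's error; on a detected change, the worst member is replaced.
    class AdwinBagging {
        interface Detector { boolean addAndCheckForChange(double value); }

        final Classifier[] members;
        final Detector[] detectors;
        final double[] errorRate;   // running error estimate per member
        final java.util.function.Supplier<Classifier> freshLearner;
        final java.util.function.Supplier<Detector> freshDetector;

        AdwinBagging(int n, java.util.function.Supplier<Classifier> learner,
                     java.util.function.Supplier<Detector> detector) {
            members = new Classifier[n];
            detectors = new Detector[n];
            errorRate = new double[n];
            freshLearner = learner;
            freshDetector = detector;
            for (int i = 0; i < n; i++) {
                members[i] = learner.get();
                detectors[i] = detector.get();
            }
        }

        void trainOnInstance(Instance x, int trueClass) {
            boolean change = false;
            for (int i = 0; i < members.length; i++) {
                double err = members[i].predict(x) == trueClass ? 0.0 : 1.0;
                errorRate[i] = 0.99 * errorRate[i] + 0.01 * err;  // smoothed
                change |= detectors[i].addAndCheckForChange(err);
                members[i].train(x);  // Poisson(1) bagging resample omitted
            }
            if (change) {             // replace the worst member with a new one
                int worst = 0;
                for (int i = 1; i < members.length; i++)
                    if (errorRate[i] > errorRate[worst]) worst = i;
                members[worst] = freshLearner.get();
                detectors[worst] = freshDetector.get();
                errorRate[worst] = 0.0;
            }
        }
    }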

16 / 26
Optimal Change Detector and Predictor
ADWIN

High accuracy
Fast detection of change
Low false positives and false negatives ratios
Low computational cost: minimum space and time needed
Theoretical guarantees
No parameters needed
Estimator with Memory and Change Detector

17 / 26
Algorithm ADaptive Sliding WINdow
ADWIN

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize Window W
2 for each t > 0
3   do W ← W ∪ {xt } (i.e., add xt to the head of W )
4      repeat Drop elements from the tail of W
5      until |µ̂W0 − µ̂W1 | < εc holds
6        for every split of W into W = W0 · W1
7 Output µ̂W

Example
W = 101010110111111; every split W = W0 · W1 is checked in turn:

W0 = 1          W1 = 01010110111111
W0 = 10         W1 = 1010110111111
W0 = 101        W1 = 010110111111
W0 = 1010       W1 = 10110111111
W0 = 10101      W1 = 0110111111
W0 = 101010     W1 = 110111111
W0 = 1010101    W1 = 10111111
W0 = 10101011   W1 = 0111111
W0 = 101010110  W1 = 111111    |µ̂W0 − µ̂W1 | ≥ εc : CHANGE DETECTED!

Elements are then dropped from the tail of W until the test passes
again:
W = 01010110111111

18 / 26
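
A minimal Java sketch of this loop (SimpleAdwin is an illustrative
class name, not MOA's). It stores the window explicitly and checks
every split naively, costing O(|W|) per example rather than the
O(log W) of the real bucket-based implementation; the cut threshold
follows the Hoeffding-style bound of the original ADWIN paper, which
is an assumption here since the slide leaves εc unspecified:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Naive ADWIN-style change detector (a sketch, not the efficient
    // exponential-histogram version described on the next slide).
    class SimpleAdwin {
        private final Deque<Double> window = new ArrayDeque<>();
        private final double delta;
        private double sum = 0.0;

        SimpleAdwin(double delta) { this.delta = delta; }

        /** Adds a value; returns true if a change was detected. */
        boolean addAndCheckForChange(double x) {
            window.addLast(x);   // add x_t to the head of W
            sum += x;
            boolean changed = false, shrunk = true;
            while (shrunk && window.size() >= 2) {
                shrunk = false;
                double n = window.size(), n0 = 0, sum0 = 0;
                for (double v : window) {          // every split W = W0 · W1
                    n0++; sum0 += v;
                    double n1 = n - n0;
                    if (n1 < 1) break;
                    double mu0 = sum0 / n0, mu1 = (sum - sum0) / n1;
                    double m = 1.0 / (1.0 / n0 + 1.0 / n1);  // m as in the paper
                    double epsC = Math.sqrt(Math.log(4.0 * n / delta) / (2.0 * m));
                    if (Math.abs(mu0 - mu1) >= epsC) {
                        sum -= window.removeFirst(); // drop from the tail of W
                        changed = true; shrunk = true;
                        break;
                    }
                }
            }
            return changed;
        }

        double estimate() { return window.isEmpty() ? 0.0 : sum / window.size(); }
    }

An instance of this class can play the Detector role in the ADWIN
bagging sketch above.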
Algorithm ADaptive Sliding WINdow
ADWIN

Theorem
At every time step we have:
1 (False positive rate bound). If µt remains constant within
W , the probability that ADWIN shrinks the window at this
step is at most δ.
2 (False negative rate bound). Suppose that for some
partition of W into two parts W0 · W1 (where W1 contains the
most recent items) we have |µW0 − µW1 | > 2εc . Then with
probability 1 − δ, ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for
the user to hardwire or precompute parameters.
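
For reference, one concrete cut value meeting these bounds — the
choice made in the original ADWIN paper by Bifet and Gavaldà, stated
here as an assumption since the slide omits it — is, in LaTeX:

    \varepsilon_c = \sqrt{\frac{1}{2m}\,\ln\frac{4}{\delta'}},
    \qquad m = \frac{1}{1/n_0 + 1/n_1},
    \qquad \delta' = \frac{\delta}{|W|},

where $n_0 = |W_0|$ and $n_1 = |W_1|$.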

19 / 26
Algorithm ADaptive Sliding WINdow
ADWIN

ADWIN using a Data Stream Sliding Window Model:

can provide the exact counts of 1’s in O(1) time per point
tries O(log W ) cutpoints
uses O((1/ε) log W ) memory words
has a processing time per example of O(log W ) (amortized
and worst-case)

Sliding Window Model

Buckets:    1010101   101   11   1   1
Content:    4         2     2    1   1
Capacity:   7         3     2    1   1

(Content = number of 1’s in the bucket; Capacity = number of
stream elements the bucket summarizes.)

20 / 26
ADWIN bagging using Hoeffding Adaptive Trees
Decision Trees: Hoeffding Adaptive Tree
CVFDT: Hulten, Spencer and Domingos
No theoretical guarantees on the error rate of CVFDT
Parameters needed: size of window, number of
examples, ...

Hoeffding Adaptive Tree:


replace frequency statistics counters by estimators
no need for a window to store examples
use a change detector with theoretical guarantees to
decide when to substitute subtrees

Advantages:
1 Theoretical guarantees
2 No Parameters
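
A sketch of the counter-to-estimator substitution at a single node,
reusing the illustrative SimpleAdwin class from the ADWIN slides (the
δ value is also illustrative):

    // HAT-style node statistic: instead of a raw counter of how often a
    // condition held, keep an adaptive-window estimate of its frequency,
    // so split statistics track the current concept after drift.
    class NodeStatistic {
        private final SimpleAdwin estimator = new SimpleAdwin(0.002);

        void update(boolean conditionHolds) {
            estimator.addAndCheckForChange(conditionHolds ? 1.0 : 0.0);
        }

        double frequency() { return estimator.estimate(); }
    }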

21 / 26
Outline

1 Adaptive-Size Hoeffding Tree bagging

2 ADWIN Bagging

3 Empirical evaluation

22 / 26
Empirical evaluation

Dataset                             Most Accurate Method

Hyperplane Drift 0.0001             Bag10 ASHT W+R
Hyperplane Drift 0.001              DDM Bag10 ASHT W
SEA W = 50                          BagADWIN 10 HAT
SEA W = 50000                       BagADWIN 10 HAT
RandomRBF No Drift 50 centers       Bag10 HT
RandomRBF Drift .0001 50 centers    BagADWIN 10 HAT
RandomRBF Drift .001 50 centers     DDM Bag10 ASHT W
RandomRBF Drift .001 10 centers     BagADWIN 10 HAT
Cover Type                          DDM Bag10 ASHT W
Poker                               BagADWIN 10 HAT
Electricity                         DDM Bag10 ASHT W
CovPokElec                          BagADWIN 10 HAT

23 / 26
Empirical evaluation

SEA, W = 50000

                     Time (s)   Acc. (%)       Mem. (MB)
BagADWIN 10 HAT      154.91     88.88 ± 0.05   2.35
DDM Bag10 ASHT W     44.02      88.72 ± 0.05   0.65
NaiveBayes           5.52       84.60 ± 0.03   0.00
NBADWIN              12.40      87.83 ± 0.07   0.02
HT                   7.20       85.02 ± 0.11   0.33
HT DDM               7.88       88.17 ± 0.18   0.16
HAT                  20.96      88.40 ± 0.07   0.18
BagADWIN 10 HT       53.15      88.58 ± 0.10   0.88
Bag10 HT             30.88      85.38 ± 0.06   3.36
Bag10 ASHT W+R       33.56      88.51 ± 0.06   0.84

(Units — seconds, percent accuracy, and MB of memory — follow the
KDD’09 paper.)

24 / 26
Empirical evaluation

Figure: Accuracy (%) against number of instances (10,000 to
920,000) on dataset LED with three concept drifts, comparing
BagAdwin HAT, DDM BagHAST, EDDM BagHAST, DDM HT, EDDM HT,
BagAdwin HT, and BagHAST; accuracies range from roughly 71.4%
to 73.9%.


25 / 26
Summary

http://www.cs.waikato.ac.nz/~abifet/MOA/

Conclusions
New improvements for ensemble bagging methods:
Adaptive-Size Hoeffding Tree bagging using change
detection methods
ADWIN bagging using Hoeffding Adaptive Trees
MOA is easy to use and extend

Future Work
Extend MOA to more data mining and learning methods.

26 / 26
