[WIP] Bagging ensemble meta-estimator by glouppe · Pull Request #2198 · scikit-learn/scikit-learn · GitHub

[WIP] Bagging ensemble meta-estimator #2198


Closed
glouppe wants to merge 32 commits

Conversation

@glouppe (Contributor) commented Jul 23, 2013

Hi,

This is a very early PR for a meta-estimator implementing ensemble averaging/voting. The idea is a meta-estimator that can take any type of base estimator as input (not only trees) and build an ensemble out of it. This should work quite well for estimators with high variance (typically trees, GBRT, neural networks); see the usage sketch after the TODO list below.


TODO list:

  • rename to BaggingClassifier and BaggingRegressor.
  • add subsampling hyper-parameter.
  • add subsampling_features hyper-parameter.
  • documentation
  • tests
  • examples
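
For context, here is a minimal usage sketch of the intended meta-estimator, referenced above. The BaggingClassifier name is taken from the TODO list; the exact constructor signature is an assumption, modeled on the estimator that later shipped in scikit-learn.

    # Hypothetical usage: wrap an arbitrary base estimator (not only
    # trees) and aggregate its predictions over bootstrap replicates.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier  # name assumed from the TODO list
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    clf = BaggingClassifier(KNeighborsClassifier(), n_estimators=10,
                            random_state=0).fit(X, y)
    print(clf.score(X, y))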

@pprett (Member) commented Jul 23, 2013

For me, RandomPatches sounds good: extracting random patches from the feature matrix X...

# Compute the out-of-bag mask for each fitted base estimator: samples
# that were not drawn to train that estimator remain True.
for estimator in self.estimators_:
    mask = np.ones(n_samples, dtype=np.bool)
    mask[estimator.indices_] = False

Review comment (Member):

Could you add some comments on what's going on below? It seems like you have two cases, depending on whether or not the estimator supports predict_proba.

glouppe (author) replied:

Basically, if predict_proba is not supported, I make the base estimators vote. I'll add some comments to clarify things.
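
A hedged sketch of those two cases (the helper below is hypothetical, for illustration only, and is not the PR's actual code):

    import numpy as np

    def aggregate(estimators, X, classes):
        # `classes` is assumed to be a NumPy array of the class labels.
        if hasattr(estimators[0], "predict_proba"):
            # Case 1: the base estimator supports predict_proba, so
            # average the class probabilities over all members.
            proba = np.mean([est.predict_proba(X) for est in estimators],
                            axis=0)
        else:
            # Case 2: no predict_proba, so each estimator casts one hard
            # vote per sample and votes are tallied per class.
            proba = np.zeros((X.shape[0], len(classes)))
            for est in estimators:
                pred = est.predict(X)
                for k, c in enumerate(classes):
                    proba[pred == c, k] += 1
            proba /= len(estimators)
        return classes[np.argmax(proba, axis=1)]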

@pprett (Member) commented Jul 23, 2013

Code looks good to me, but documentation is missing. It would be great if you could add an example and/or incorporate RandomPatches into one of our examples.

@ogrisel (Member) commented Jul 25, 2013

I think the RandomPatches name is too confusing, as people might expect that it's only relevant to 2D structured data like images or other computer-vision-related tasks.

ResampledClassifiers sounds both concise and explicit to me.

@amueller (Member) commented:

How about ResampledEnsemble?

@amueller (Member) commented:

Actually, maybe ResampledClassifiers is better.

@GaelVaroquaux (Member) commented:

I think the RandomPatches name is too confusing, as people might expect that it's only relevant to 2D structured data like images or other computer-vision-related tasks.

Fully agreed.

ResampledClassifiers sounds both concise and explicit to me.

Sounds good. Or maybe BaggingClassifier: I don't care about the mathematical exactitudes of little details like the fact that this does more than bootstrap: bagging has captured the popular imagination beyond bootstrap.


@ogrisel (Member) commented Aug 4, 2013

Sounds good. Or maybe BaggingClassifier: I don't care about the mathematical exactitudes of little details like the fact that this does more than bootstrap: bagging has captured the popular imagination beyond bootstrap.

Also the recent paper on Google ad click prediction calls the "feature subsampling" strategy (without replacement) "Feature Bagging". So indeed Bagging ain't what it used to be. Therefore I am ok with abusing the name as well.

To sum up, I am OK with either ResampledClassifiers or BaggingClassifier.

@agramfort (Member) commented:
I think that @glouppe did not like Bagging, so we went for ResampledClassifiers, but I am fine with BaggingClassifier too.

emsrc and others added 15 commits August 18, 2013 19:17
- Added 'cosine_distances' function to sklearn.metrics.pairwise.
- Added 'cosine' as metric in 'pairwise_distances' function.
- Corrected doc string of same function, because all metrics based on
the 'manhattan_distances' function (i.e. 'cityblock', 'l1', and
'manhattan') do currently NOT support sparse matrices.
- Added corresponding unit test.
Cosine distance metric for sparse matrices
[MRG] remove warnings in univariate feature selection
… random-patches

Conflicts:
	sklearn/ensemble/__init__.py
@glouppe (Contributor, author) commented Aug 20, 2013

Ping. Just to let you know, I am making progress on this. I have renamed the classes to BaggingClassifier (resp. BaggingRegressor) and added everything that I wanted. This meta-estimator can now handle pasting, bagging, random subspaces, or all of them at once (i.e., random patches).

The only things left are the narrative documentation and writing an example :)
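
For readers following along, here is a hedged sketch of how those four strategies map onto sampling hyper-parameters. The parameter names below assume the BaggingClassifier API that eventually shipped in scikit-learn and may not match this branch exactly.

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    base = DecisionTreeClassifier()
    # Pasting: random subsets of samples, drawn without replacement.
    pasting = BaggingClassifier(base, max_samples=0.5, bootstrap=False)
    # Bagging: samples drawn with replacement (the bootstrap).
    bagging = BaggingClassifier(base, max_samples=1.0, bootstrap=True)
    # Random subspaces: random subsets of the features.
    subspaces = BaggingClassifier(base, max_features=0.5, bootstrap=False)
    # Random patches: random subsets of both samples and features.
    patches = BaggingClassifier(base, max_samples=0.5, max_features=0.5)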

@glouppe (Contributor, author) commented Aug 20, 2013

Also, I have rebased on top of master, but the history in the pull request seems to be kind of screwed up :s It contains duplicate or unrelated commits. Any guess on how to clean that up?

@GaelVaroquaux (Member) commented:

Also, I have rebased on top of master, but the history in the pull request seems to be kind of screwed up :s It contains duplicate or unrelated commits. Any guess on how to clean that up?

Yeah, rebasing confuses git. Unfortunately, the only way out of this, AFAIK, is to create a new PR.

@glouppe closed this Aug 20, 2013
@glouppe mentioned this pull request Aug 20, 2013