[WIP] Bagging ensemble meta-estimator #2198
Conversation
For me,

```python
for estimator in self.estimators_:
    # Mask selecting the samples this estimator was not trained on
    # (its out-of-bag samples): the in-bag indices are set to False.
    mask = np.ones(n_samples, dtype=np.bool)
    mask[estimator.indices_] = False
```
Could you add some comments on what's going on below? It seems like you have two cases, depending on whether or not the estimator supports predict_proba.
Basically, if predict_proba is not supported, I make the base estimators vote. I'll add some comments to clarify things.
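To make the two cases concrete, here is a minimal standalone sketch of the aggregation logic being discussed (illustrative only, not the code from this PR): class probabilities are averaged when every base estimator supports predict_proba, otherwise the hard predictions are aggregated by majority vote.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A handful of independently seeded base estimators standing in for the ensemble.
estimators = [DecisionTreeClassifier(max_depth=3, random_state=i).fit(X, y)
              for i in range(5)]

if all(hasattr(est, "predict_proba") for est in estimators):
    # Soft aggregation: average the predicted class probabilities.
    # (Decision trees expose predict_proba, so this branch is taken here.)
    proba = np.mean([est.predict_proba(X) for est in estimators], axis=0)
    y_pred = np.argmax(proba, axis=1)
else:
    # Hard aggregation: each estimator votes and the majority class wins.
    votes = np.asarray([est.predict(X) for est in estimators])
    y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(),
                                 axis=0, arr=votes)
```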
Code looks good to me. Documentation is missing; it would be great if you could add an example and/or incorporate RandomPatch into one of our existing examples.
I think the
how about
actually maybe
Fully agreed.
Sounds good. Or maybe BaggingClassifier: I don't care about the
Also, the recent paper on Google ad click prediction calls the "feature subsampling" strategy (without replacement) "Feature Bagging". So indeed Bagging ain't what it used to be. Therefore I am OK with abusing the name as well. To sum up, I am OK with either
I think that @glouppe did not like Bagging, so we went for
- Added 'cosine_distances' function to sklearn.metrics.pairwise.
- Added 'cosine' as a metric in the 'pairwise_distances' function.
- Corrected the docstring of the same function, because all metrics based on the 'manhattan_distances' function (i.e. 'cityblock', 'l1', and 'manhattan') currently do NOT support sparse matrices.
- Added a corresponding unit test.
Cosine distance metric for sparse matrices
[MRG] remove warnings in univariate feature selection
… random-patches Conflicts: sklearn/ensemble/__init__.py
Ping. Just to let you know, I am making progress on this. I have renamed the classes to BaggingClassifier and BaggingRegressor. The only things left are the narrative documentation and writing an example :)
Also, I have rebased on top of master, but the history in the pull request seems to be kinda screwed up :s It contains duplicate or unrelated commits. Any guess on how to clean that?
Yeah, rebasing confuses git. Unfortunately, the only way out of this, |
Hi,
This is a very early PR for a meta-estimator implementing ensemble averaging/voting. The idea is to make a meta-estimator that can take any type of base estimator as input (not only trees) and build an ensemble out of it. This should work quite well for estimators with high variance (typically trees, GBRT, neural networks).
TODO list:
- BaggingClassifier and BaggingRegressor.
- subsampling hyper-parameter.
- subsampling_features hyper-parameter.
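For context, here is a rough usage sketch of the kind of estimator being proposed. It uses the parameter names of scikit-learn's BaggingClassifier as eventually released (max_samples and max_features), which appear to correspond to the provisional subsampling and subsampling_features hyper-parameters listed above; the names in this WIP are still in flux.

```python
# Illustrative usage sketch; the released parameter names max_samples and
# max_features stand in for the provisional subsampling / subsampling_features.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = BaggingClassifier(
    DecisionTreeClassifier(),  # any base estimator, not only trees
    n_estimators=10,           # size of the ensemble
    max_samples=0.8,           # fraction of samples drawn for each estimator
    max_features=0.5,          # fraction of features drawn for each estimator
    random_state=0,
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```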