[MRG+3] Added examples to RandomForestClassifier and RandomForestRegressor by lodurality · Pull Request #9368 · scikit-learn/scikit-learn · GitHub

Merged
4 commits merged into scikit-learn:master on Jul 16, 2017

lodurality
Contributor

Reference Issue

Addresses #3846

What does this implement/fix? Explain your changes.

Adds examples for RandomForestClassifier and RandomForestRegressor.

Any other comments?

@agramfort
Member

LGTM

@agramfort agramfort changed the title Added examples to RandomForestClassifier and RandomForestRegressor [MRG+1] Added examples to RandomForestClassifier and RandomForestRegressor Jul 15, 2017
@amueller
Member
amueller commented Jul 15, 2017

Looks ok. Maybe a dataset where it's a bit clearer what's learned would be nice? The feature importances and predictions are not super obvious, are they? But I'm also fine to merge as-is.

@lodurality
Contributor Author

@amueller I chose feature importances because in my experience people use this attribute pretty often. But I agree on clarity: for the classifier I used an XOR training set, but the regressor data is pretty random. I can add data generation to the example to make the data intuition clearer. Also, if it is okay to use data from sklearn.datasets (like Boston), it could make the example more intuitive (and bigger).
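A minimal sketch of the data-generation idea for the regressor, assuming a recent scikit-learn and sklearn.datasets.make_regression (the helper the final commit message mentions). The sample size, feature counts, and max_depth=2 are illustrative choices, not the values used in the PR. With shuffle=False and coef=True the generator also returns the true linear coefficients, so a reader can check directly which features should matter:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative settings (not taken from the PR): 4 features, 2 informative.
# shuffle=False keeps the informative columns first; coef=True also returns
# the ground-truth coefficients used to build y.
X, y, coef = make_regression(n_samples=1000, n_features=4, n_informative=2,
                             random_state=0, shuffle=False, coef=True)
print("true coefficients:", np.round(coef, 2))  # columns 2 and 3 are exactly 0

# Shallow trees keep the example small; importances should concentrate on
# the columns with non-zero coefficients.
regr = RandomForestRegressor(n_estimators=100, max_depth=2, random_state=0)
regr.fit(X, y)
print("feature importances:", np.round(regr.feature_importances_, 3))
```

Because the coefficients are known, the importances can be read against a ground truth instead of against random data.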

@amueller
Member

It was not really clear to me why the feature importances are so different for the XOR dataset. Maybe it's just because there are so few trees? Still a bit odd... Also, @jmschrei pointed out that they don't sum to one, which confused both of us, but that's unrelated, I guess?

@jmschrei
Member
jmschrei commented Jul 15, 2017

I think you should avoid using the XOR dataset because it's an example where trees traditionally do fairly poorly. When I tested it, there was an issue with using too many trees; we actually encountered a variant of this bug in #7406. Try just using sklearn.datasets.make_classification to make some dummy datasets, with n_informative less than n_features to show that some features are not informative.
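A sketch of that suggestion, assuming sklearn.datasets.make_classification with illustrative parameter values (1000 samples, 4 features, 2 informative) that are not taken from the PR. With n_redundant=0 and the default n_repeated=0, the two extra columns are random noise, and shuffle=False keeps the informative columns in positions 0 and 1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 4 features, only 2 informative; no redundant or repeated features, so
# columns 2 and 3 are pure noise. shuffle=False keeps columns 0-1 informative.
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Expect most of the importance on columns 0 and 1.
importances = clf.feature_importances_
print(np.round(importances, 3))
```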

@lodurality
Contributor Author

Yes, I agree it's strange. I've added one point to the examples, and now the feature importances sum to 1. I can commit this, or create examples with the Boston dataset for regression and the iris dataset for classification.
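One possible explanation for the non-unit sums, offered as an assumption rather than something the thread confirms: on a four-point XOR set, some bootstrap samples contain a single class, so those trees never split and report all-zero importances. As far as we know, the forest in the release discussed here simply averaged per-tree importances, which would pull the total below 1; newer releases appear to skip single-node trees and renormalize, so the forest sum may print as 1.0 there. The sketch uses a hypothetical reconstruction of the XOR data (the PR's exact points are not shown) and inspects each tree:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical four-point XOR training set (the PR's exact data is not shown).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# A tree whose bootstrap sample holds only one class is a single leaf
# (node_count == 1) and its feature_importances_ are all zero; every other
# tree's importances sum to 1.
node_counts = [est.tree_.node_count for est in clf.estimators_]
per_tree_sums = [est.feature_importances_.sum() for est in clf.estimators_]

print("nodes per tree:       ", node_counts)
print("per-tree sums:        ", np.round(per_tree_sums, 3))
print("forest importance sum:", clf.feature_importances_.sum())
```

If any entry in the node counts is 1, the matching per-tree sum is 0, which is the case that would drag a plain average below 1.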

@lodurality
Contributor Author

@jmschrei Okay.

min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)
>>> print(clf.feature_importances_)
Member

I don't want to overcomplicate this, but which are the important features? I think we can either get that information out of make_classification or not shuffle the features, so that we know. And then maybe set something in the RF so that it's actually visible?

Member

The informative features allegedly always come first.

Contributor Author

Reworked it to make the feature importances clearer (turned off shuffle, tweaked parameters).

Member

Assuming you turn off shuffle, I mean.

Contributor Author

Yes, I meant set shuffle to False.

@jmschrei
Member

This LGTM as well. Just making sure the tests pass and I'll merge. Thanks!

@amueller
Member

lgtm, thanks :)

@amueller amueller changed the title [MRG+1] Added examples to RandomForestClassifier and RandomForestRegressor [MRG+3] Added examples to RandomForestClassifier and RandomForestRegressor Jul 15, 2017
@NelleV NelleV merged commit 92e38f2 into scikit-learn:master Jul 16, 2017
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017
…essor (scikit-learn#9368)

* added examples to RandomForestClassifier and RandomForestRegressor

* changed example for RandomForestClassifier using make_classification

* changed example for RandomForestRegressor using make_regression

* made more clear which features are important in examples
dmohns pushed 2 commits to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
5 participants