[MRG+3] Added examples to RandomForestClassifier and RandomForestRegressor #9368
Conversation
LGTM
Looks ok. Maybe a dataset where it's a bit more clear what's learned would be nice? The feature importances and predictions are not super obvious, are they? But I'm also fine to merge as-is.
@amueller I chose feature importances because, in my experience, people use this attribute pretty often. But I agree on clarity: for the classifier I used XOR training data, but the regressor data is pretty random. I can add data generation to the example to make the intuition clearer. Also, if it is okay to use data from sklearn.datasets (like Boston), that could make the example more intuitive (and bigger).
It was not really clear to me why the feature importances are so different for the XOR dataset. Maybe it's just because there are so few trees? Still a bit odd... Also, @jmschrei pointed out they don't sum to one, which was confusing both of us, but that's unrelated, I guess?
I think you should avoid using the XOR dataset because it's an example where trees traditionally do fairly poorly. Testing it out, there is an issue with using too many trees. We actually encountered a variant of this bug in #7406. Try just using
Yes, I agree it's strange. I've added one point to the examples and now the feature importances sum to 1. I can commit this, or create an example with the Boston dataset for regression and the iris data for classification.
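For the regressor side of the discussion, a minimal sketch of what such a docstring example could look like with `make_regression` (parameter values here are illustrative, not necessarily the ones that landed in the PR):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data; with shuffle=False the informative
# features occupy the leading columns.
X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

regr = RandomForestRegressor(n_estimators=10, max_depth=2,
                             random_state=0)
regr.fit(X, y)

# Impurity-based importances are normalized to sum to 1.
print(regr.feature_importances_)
```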
@jmschrei Okay. |
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)
>>> print(clf.feature_importances_)
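Filling in the context around that fragment, the classifier example under discussion would look roughly like this (a sketch; the exact parameters in the merged docstring may differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the informative features in the leading
# columns, which makes the importances easier to interpret.
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = RandomForestClassifier(n_estimators=10, max_depth=2,
                             random_state=0)
clf.fit(X, y)

print(clf.feature_importances_)  # normalized: sums to 1
print(clf.predict([[0, 0, 0, 0]]))
```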
I mean, I don't want to overcomplicate this, but which are the important features? I think we can either get that information out of make_classification, or not shuffle the features so we know? And then maybe set something in the RF so that it's actually visible?
The informative features allegedly always come first.
Reworked it to make the feature importances clearer (removed shuffle, tweaked parameters).
Assuming you turn off shuffle, I mean.
Yes, I meant set shuffle to False.
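To make the "informative features come first" claim concrete, here is a quick sanity check (a sketch; the column counts are made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification

# With shuffle=False the column order is: informative features,
# then redundant, then repeated, then pure-noise features.
X, y = make_classification(n_samples=2000, n_features=4,
                           n_informative=2, n_redundant=0,
                           shuffle=False, random_state=0)

# The trailing noise columns are drawn independently of y, so
# their correlation with the target should be close to zero.
corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(4)]
print(corr)
```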
This LGTM as well. Just making sure the tests pass and I'll merge. Thanks!
lgtm, thanks :)
…essor (scikit-learn#9368)
* added examples to RandomForestClassifier and RandomForestRegressor
* changed example for RandomForestClassifier using make_classification
* changed example for RandomForestRegressor using make_regression
* made more clear which features are important in examples
Reference Issue
Addresses #3846
What does this implement/fix? Explain your changes.
Adds examples for RandomForestClassifier and RandomForestRegressor.
Any other comments?