[MRG+3] Added examples to RandomForestClassifier and RandomForestRegressor #9368
Conversation
LGTM
Looks ok. Maybe a dataset where it's a bit more clear what's learned would be nice? The feature importances and predictions are not super obvious, are they? But I'm also fine to merge as-is.
@amueller I chose feature importances because, in my experience, people use this attribute pretty often. But I agree on clarity: for the classifier I used XOR training data, but the regressor data is pretty random. I can add data generation to the example to make the intuition clearer. Also, if it is okay to use data from sklearn.datasets (like Boston), that could make the example more intuitive (and bigger).
It was not really clear to me why the feature importances are so different for the XOR dataset. Maybe it's just because there are so few trees? Still a bit odd... Also, @jmschrei pointed out they don't sum to one, which was confusing both of us, but that's unrelated, I guess?
I think you should avoid using the XOR dataset because it's an example where trees traditionally do fairly poorly. Testing it out, there is an issue with using too many trees. We actually encountered a variant of this bug in #7406. Try just using
Yes, I agree it's strange. I've added one point to the examples and now the feature importances sum to 1. I can commit this, or create an example with the Boston dataset for regression and the iris data for classification.
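For the regressor side of the discussion, a minimal sketch of what such a docstring example could look like with `make_regression` (parameter values here are illustrative, not necessarily the ones that landed in the PR):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data; with shuffle=False the informative
# features occupy the leading columns.
X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

regr = RandomForestRegressor(n_estimators=10, max_depth=2,
                             random_state=0)
regr.fit(X, y)

# Impurity-based importances are normalized to sum to 1.
print(regr.feature_importances_)
```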
@jmschrei Okay. |
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)
>>> print(clf.feature_importances_)
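Filling in the context around that fragment, the classifier example under discussion would look roughly like this (a sketch; the exact parameters in the merged docstring may differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the informative features in the leading
# columns, which makes the importances easier to interpret.
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = RandomForestClassifier(n_estimators=10, max_depth=2,
                             random_state=0)
clf.fit(X, y)

print(clf.feature_importances_)  # normalized: sums to 1
print(clf.predict([[0, 0, 0, 0]]))
```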
I mean, I don't want to overcomplicate this, but which are the important features? I think we can either get that information out of make_classification, or not shuffle the features so we know? And then maybe set something in the RF so that it's actually visible?
The informative features allegedly always come first.
Reworked it to make the feature importances clearer (removed shuffle, tweaked parameters).
Assuming you turn off shuffle, I mean.
Yes, I meant set shuffle to False.
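To make the "informative features come first" claim concrete, here is a quick sanity check (a sketch; the column counts are made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification

# With shuffle=False the column order is: informative features,
# then redundant, then repeated, then pure-noise features.
X, y = make_classification(n_samples=2000, n_features=4,
                           n_informative=2, n_redundant=0,
                           shuffle=False, random_state=0)

# The trailing noise columns are drawn independently of y, so
# their correlation with the target should be close to zero.
corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(4)]
print(corr)
```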
This LGTM as well. Just making sure the tests pass and I'll merge. Thanks!
lgtm, thanks :)
…essor (scikit-learn#9368)
* added examples to RandomForestClassifier and RandomForestRegressor
* changed example for RandomForestClassifier using make_classification
* changed example for RandomForestRegressor using make_regression
* made more clear which features are important in examples
Reference Issue
Addresses #3846
What does this implement/fix? Explain your changes.
Adds examples for RandomForestClassifier and RandomForestRegressor.
Any other comments?