DOC use Ames housing for transformed_target example by lucyleeow · Pull Request #16741 · scikit-learn/scikit-learn · GitHub

DOC use Ames housing for transformed_target example #16741

Merged: 21 commits merged on May 14, 2020


lucyleeow
Member

Towards #16155

Use Ames housing data for plot_transformed_target.py.

Old plots:
image
image

New plots:
image

Hopefully the n_quantiles I used is reasonable. The Ames data has 1460 samples.
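For readers wondering how n_quantiles relates to the sample count, here is a minimal sketch. It is not the code from this PR: the data is synthetic (a skewed target standing in for sale prices), and the value 900 is only an illustrative choice that stays below the number of samples.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
n_samples = 1460  # same size as the Ames data mentioned above

# Synthetic features and a right-skewed, strictly positive target (assumption).
X = rng.normal(size=(n_samples, 5))
coef = np.array([0.5, 0.3, 0.0, -0.2, 0.1])
y = np.exp(X @ coef + rng.normal(scale=0.2, size=n_samples))

# n_quantiles should not exceed the number of samples seen during fit;
# 900 is a hypothetical value chosen for illustration only.
n_quantiles = 900
model = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(
        n_quantiles=n_quantiles, output_distribution="normal"
    ),
)
model.fit(X, y)
y_pred = model.predict(X)  # predictions are mapped back to the original scale
```

The transformer is fitted on the target only, so the regressor itself sees an approximately normal target while predictions come back in the original units.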

@lucyleeow
Member Author

doc-min-dependencies is failing because the pad parameter of matplotlib.axes.Axes.set_title was introduced in matplotlib 2.2.0, whereas the min-dep env uses matplotlib 2.1.1.

An alternative is just to use

ax1.text(s='Ridge regression \n with target transformation', x=-5e4, y=8e5, fontsize=12, multialignment='center')

instead. Though I note that the matplotlib recommended way to add a title to subplot is with matplotlib.axes.Axes.set_title.
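Another option that keeps set_title is sketched below. It assumes that extra keyword arguments to set_title are forwarded to the underlying Text object, so the title can be raised with y instead of pad. The value 1.05 is an arbitrary illustration, and this has not been checked against matplotlib 2.1.1 specifically.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax0, ax1) = plt.subplots(1, 2, sharey=True)

# `y` is given in axes coordinates; values above 1.0 move the title upwards.
# Unlike `pad`, it does not depend on the newer set_title signature.
title0 = ax0.set_title("Ridge regression \n without target transformation", y=1.05)
title1 = ax1.set_title("Ridge regression \n with target transformation", y=1.05)
```

Compared with ax1.text, this keeps the title centered automatically and avoids hard-coding data coordinates.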

@ogrisel
Member
ogrisel commented Mar 22, 2020

This looks good. Personally I think it's more common to have y_pred on the x axis and y_true on the y axis for the scatter plot.

Could you please add a residual plot?

  • y_pred - y_true on the y axis
  • y_pred on the x axis.

I expect the residual plot without the target transformation to be "reverse-smile"/banana shaped, which is a bad sign. With the target quantile transform, the banana should go away, which means that the new model has a better fit.

However, one should observe heteroscedastic noise on the residual plots (larger absolute residuals for larger y_pred), which means that the modeling assumptions of the least-squares loss are not met. This hints that a better model would expect the variance of the residuals to increase with the expected mean value (y_pred). This could probably be better modeled via a Tweedie loss with p in range [1, 2].
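The Tweedie suggestion could be tried roughly as follows. This is a sketch on synthetic data, not part of this PR. It assumes a scikit-learn version that ships TweedieRegressor (0.23 or later), and power=1.5 is just one illustrative value inside the [1, 2] range.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))

# Synthetic positive target whose spread grows with its mean (assumption).
mean = np.exp(1.0 + X @ np.array([0.4, -0.3, 0.2]))
y = rng.gamma(shape=2.0, scale=mean / 2.0)

# power between 1 and 2 gives a compound Poisson-Gamma variance function;
# the log link keeps predictions strictly positive.
glm = TweedieRegressor(power=1.5, link="log", alpha=0.1, max_iter=1000)
glm.fit(X, y)
y_pred = glm.predict(X)
```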

@ogrisel
Member
ogrisel commented Mar 22, 2020

Actually, my second point on heteroscedastic noise is not that obvious with the Ames dataset. Maybe leave that analysis out. I would still love to see the residual plots :)
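A minimal sketch of the requested residual plots, with y_pred on the x axis and y_pred - y_true on the y axis. The data is synthetic rather than the Ames set, and the model settings (including n_quantiles=900) are assumptions for illustration, not the values from this PR.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
X = rng.normal(size=(1460, 5))
coef = np.array([0.5, 0.3, 0.0, -0.2, 0.1])
# Skewed synthetic target standing in for house prices (assumption).
y = np.exp(1.0 + X @ coef + rng.normal(scale=0.2, size=1460))
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "without target transformation": RidgeCV(),
    "with target transformation": TransformedTargetRegressor(
        regressor=RidgeCV(),
        transformer=QuantileTransformer(
            n_quantiles=900, output_distribution="normal"
        ),
    ),
}

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
residuals = {}
for ax, (name, model) in zip(axes, models.items()):
    y_pred = model.fit(X_train, y_train).predict(X_test)
    residuals[name] = y_pred - y_test  # residual as defined in the comment above
    ax.scatter(y_pred, residuals[name], s=10, alpha=0.5)
    ax.axhline(0, color="k", linestyle="--")  # reference line for a perfect fit
    ax.set_xlabel("Target predicted")
    ax.set_title("Ridge regression \n " + name)
axes[0].set_ylabel("Residual (predicted - true)")
```

A curved band of points around the zero line in one panel and a flatter band in the other is the kind of difference the comment above describes.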

ax0.set_ylabel('Target predicted')
ax0.set_xlabel('True Target')
ax0.set_title('Ridge regression \n without target transformation')
ax0.text(1, 9, r'$R^2$=%.2f, MAE=%.2f' % (
ax0.set_title('Ridge regression \n without target transformation', pad=18)
Member

The pad keyword argument is causing the doc build to fail with older, yet supported versions of matplotlib.

Member Author

Yes, see my comment: #16741 (comment)

I don't understand why people can't seem to see my comments on PRs - this is the second time this has happened! Do you think I changed some setting accidentally?

@lucyleeow
Member Author

@ogrisel does this look okay?

image

@lucyleeow
Member Author

whoops, wrong x axis!

image

@ogrisel

@lucyleeow
Member Author

ping @ogrisel

@cmarmo
Contributor
cmarmo commented May 5, 2020

Hi @lucyleeow rendering has some issues:

@lucyleeow
Member Author

Thanks @cmarmo, I think I've fixed the plot problems!

@glemaitre
Member

The banana went away, that's cool :)

@glemaitre glemaitre merged commit 78a213b into scikit-learn:master May 14, 2020
@glemaitre
Member

Thanks @lucyleeow

gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
@lucyleeow lucyleeow deleted the doc_trans_target branch October 21, 2023 03:26
4 participants