Description
Currently, most of the plotting tools available in scikit-learn are related to classification (https://scikit-learn.org/stable/modules/classes.html#id3). It would be good to add more visualizers for regression.
Plotting y_true as a function of y_pred, or the residuals, is done frequently and is simple enough that it probably doesn't need a dedicated plot function.
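To illustrate how little code this takes today, here is a minimal matplotlib sketch on synthetic data (the data and model are made up for illustration, not a proposed API):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Toy data: noisy linear relationship (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=100)
y_true = 3 * X + rng.normal(scale=2.0, size=100)
y_pred = 3 * X  # pretend these came from a fitted regressor

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# Left panel: observed vs. predicted, with the identity line as reference
ax1.scatter(y_pred, y_true, s=10)
ax1.plot([y_true.min(), y_true.max()], [y_true.min(), y_true.max()], "k--")
ax1.set(xlabel="y_pred", ylabel="y_true", title="Predicted vs. observed")

# Right panel: residuals against predictions
residuals = y_true - y_pred
ax2.scatter(y_pred, residuals, s=10)
ax2.axhline(0, color="k", linestyle="--")
ax2.set(xlabel="y_pred", ylabel="residual", title="Residuals")
```

The question is whether wrapping this in a display class adds enough value over the two scatter calls above.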
Other approaches worth considering were discussed in #10084:
- Lorenz curves, and the associated Gini coefficient computed as an AUC: see the example in "Minimal Generalized linear models implementation (L2 + lbfgs)" #14300. This was proposed earlier in "Added gini coefficient to ranking and scorer" #10084 and more recently in "[WIP] Implement Gini coefficient for model selection with positive regression GLMs" #15176 (in particular a608c70). It's more of a ranking metric than a regression metric.
- The name varies, but the idea is to select a numerical feature, bin it, and compare the mean of y_true with the mean of y_pred within each bin. This makes it possible to evaluate how well the model performs on different sub-populations of the data. Example in https://95549-843222-gh.circle-artifacts.com/0/doc/auto_examples/linear_model/plot_poisson_regression_non_normal_loss.html. Somewhat related to partial dependence plots.
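For the Lorenz curve / Gini idea, a minimal NumPy sketch (the `gini_coefficient` helper is hypothetical, not a scikit-learn API; the sign convention is one of several in use):

```python
import numpy as np

def gini_coefficient(y_true, y_pred):
    """Gini coefficient of y_true ranked by y_pred (hypothetical helper).

    Computed from the area under the Lorenz curve: samples are ordered
    by increasing prediction, and we accumulate the share of y_true.
    """
    order = np.argsort(y_pred)
    sorted_true = np.asarray(y_true, dtype=float)[order]
    n = len(sorted_true)
    # Cumulative share of y_true, from lowest to highest prediction
    lorenz = np.cumsum(sorted_true) / np.sum(sorted_true)
    lorenz = np.insert(lorenz, 0, 0.0)
    # Area under the Lorenz curve via the trapezoidal rule (uniform x spacing 1/n)
    auc = np.sum(lorenz[1:] + lorenz[:-1]) / (2 * n)
    # Perfect equality gives auc = 0.5 and Gini = 0; good ranking gives Gini > 0
    return 1.0 - 2.0 * auc
```

A dedicated display could plot the Lorenz curve itself alongside the diagonal "oracle-free" baseline, with the Gini value in the legend.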
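The binned-means comparison can be sketched as follows (the `binned_means` helper and quantile binning strategy are assumptions for illustration, not an existing scikit-learn API):

```python
import numpy as np

def binned_means(feature, y_true, y_pred, n_bins=5):
    """Mean of y_true and y_pred per quantile bin of a feature (hypothetical helper).

    Bins `feature` into quantiles, then averages the observed and
    predicted targets within each bin so the two can be compared.
    """
    feature = np.asarray(feature, dtype=float)
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    # Assign each sample to a bin using the inner edges only, so the
    # minimum falls in bin 0 and the maximum in bin n_bins - 1
    idx = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)
    means_true = np.array(
        [np.mean(np.asarray(y_true)[idx == b]) for b in range(n_bins)]
    )
    means_pred = np.array(
        [np.mean(np.asarray(y_pred)[idx == b]) for b in range(n_bins)]
    )
    return means_true, means_pred
```

Plotting the two returned arrays side by side per bin (e.g. as grouped bars) gives the comparison described above; a large gap in one bin flags a sub-population where the model is miscalibrated.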
There are a few additional ones in yellowbrick; I'm not convinced they would be that useful in scikit-learn.
I'm not saying we need all of them; I just wanted to move this discussion out of the GLM PR.