No, this issue needs discussion before starting the implementation. You could look for issues with the "help wanted" and "good first issue" labels; there are also a number of PRs marked as "stalled" that could be continued.
Currently, most of the plotting tools available in scikit-learn are related to classification (https://scikit-learn.org/stable/modules/classes.html#id3). It would be good to add more visualizers for regression.
Plotting y_true as a function of y_pred, or the residuals, as is frequently done, is simple enough and probably doesn't need a dedicated plot function.
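For reference, a minimal sketch of such a plot with matplotlib (the function name and figure layout are illustrative, not an existing scikit-learn API):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_predictions_and_residuals(y_true, y_pred):
    """Scatter y_true against y_pred, plus the residuals y_true - y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

    # Left panel: predicted vs. true, with the identity line as reference.
    ax1.scatter(y_pred, y_true, alpha=0.5)
    lims = [min(y_pred.min(), y_true.min()), max(y_pred.max(), y_true.max())]
    ax1.plot(lims, lims, "k--")  # perfect predictions fall on this line
    ax1.set(xlabel="y_pred", ylabel="y_true", title="Predicted vs. true")

    # Right panel: residuals against predictions; a good fit shows no trend.
    ax2.scatter(y_pred, residuals, alpha=0.5)
    ax2.axhline(0, color="k", linestyle="--")
    ax2.set(xlabel="y_pred", ylabel="y_true - y_pred", title="Residuals")

    return fig, residuals

rng = np.random.RandomState(0)
y_pred = rng.rand(50)
y_true = y_pred + rng.normal(0, 0.1, size=50)
fig, residuals = plot_predictions_and_residuals(y_true, y_pred)
```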
Other approaches worth considering were discussed in #10084:
Lorenz curves, and the associated Gini coefficient computed as an AUC: see the example in #14300 (Minimal Generalized linear models implementation (L2 + lbfgs)). This was proposed earlier in #10084 (Added gini coefficient to ranking and scorer) and more recently in #15176 ([WIP] Implement Gini coefficient for model selection with positive regression GLMs), in particular a608c70. It's more of a ranking metric than a regression metric.
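To make the ranking interpretation concrete, here is a minimal numpy-only sketch. The function name and the sign convention (Gini = 1 - 2·AUC of the Lorenz curve, with samples sorted by increasing prediction) are illustrative assumptions; the linked PRs use their own conventions:

```python
import numpy as np

def gini_coefficient(y_true, y_pred):
    """Gini coefficient of y_true under the ranking induced by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    # Sort the observed targets by increasing predicted value.
    y_sorted = y_true[np.argsort(y_pred)]
    # Lorenz curve: cumulative share of the target vs. share of samples.
    cum_target = np.cumsum(y_sorted) / y_sorted.sum()
    n = len(y_sorted)
    cum_samples = np.arange(1, n + 1) / n
    # Area under the Lorenz curve via the trapezoidal rule.
    auc = np.sum(
        (cum_target[1:] + cum_target[:-1]) / 2.0 * np.diff(cum_samples)
    )
    # Positive when the ranking is informative, near 0 when it is random.
    return 1.0 - 2.0 * auc

y = np.array([0.0, 0.0, 0.0, 10.0])
good = gini_coefficient(y, y)    # perfect ranking
bad = gini_coefficient(y, -y)    # reversed ranking
```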
The name varies, but the idea is to select a numerical feature, bin it, and compare the mean of y_true with the mean of y_pred in each bin. This makes it possible to evaluate how well the model performs on different sub-categories of the data. Example in https://95549-843222-gh.circle-artifacts.com/0/doc/auto_examples/linear_model/plot_poisson_regression_non_normal_loss.html It is somewhat related to partial dependence plots.
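A minimal sketch of that binning step, assuming quantile-based bins (the helper name `binned_means` and the choice of quantile edges are illustrative; the linked example may bin differently):

```python
import numpy as np

def binned_means(feature, y_true, y_pred, n_bins=5):
    """Mean observed and mean predicted target per quantile bin of a feature."""
    feature = np.asarray(feature, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Quantile-based bin edges so each bin holds roughly the same count.
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    # Assign each sample to a bin using the interior edges only.
    bin_ids = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)

    centers, true_means, pred_means = [], [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # quantile edges can collapse on skewed features
            centers.append(feature[mask].mean())
            true_means.append(y_true[mask].mean())
            pred_means.append(y_pred[mask].mean())
    # Plot true_means and pred_means against centers to compare them per bin.
    return np.array(centers), np.array(true_means), np.array(pred_means)

feature = np.arange(10.0)
centers, true_means, pred_means = binned_means(feature, 2 * feature, 2 * feature)
```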
There are a few additional ones in yellowbrick, but I'm not convinced they would be that useful in scikit-learn.
I'm not saying we need all of them; I just wanted to move this discussion out of the GLM PR.