Description
Describe the workflow you want to enable
While reviewing @AnneBeyer's #33015 PR, I thought it would be nice to be able to display the decision boundaries of an estimator (at least for classifiers, outlier detectors and regressors) along with the data points, using consistent color maps, instead of having to manually add a scatter plot on the same ax instance.
Describe your proposed solution
Add a new show_data boolean kwarg to the from_estimator method that already accepts the X values as argument. The from_estimator method would then call estimator.predict(X) to find the values of the target to be used to set the colors of the dots of the scatter plot.
The from_estimator method could also take an optional y argument with the true class labels. When provided, the scatter plot enabled by show_data=True would use the provided y values instead of estimator.predict(X).
The __init__ of the display would then also accept the X and y values.
Ideally, the legend would make it explicit whether the scatter dots represent predictions or true labels, and could be overridden by a data_label kwarg passed to from_estimator.
Maybe show_data=True could even be the default.
Describe alternatives you've considered, if relevant
While it's possible to add the scatter plot manually, it's quite verbose, and it's not necessarily obvious whether the colors used for the two plots are expected to be consistent.
I think people often want to overlay the training set with the decision boundary when generating such plots for educational purposes.
Additional context
@AnneBeyer if you think this is a good feature to add, would you be interested in contributing it (as a follow-up to #33015 and related PRs)?
Also cc @lucyleeow @glemaitre in case you already discussed this in the past.