-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
FEA Add DecisionBoundaryDisplay #16061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEA Add DecisionBoundaryDisplay #16061
Conversation
I appreciate the benefit to reduce code duplication in our examples, but is
this useful outside of teaching?
|
I had the same discussion with @amueller a few months ago regarding including this plot function. I think it is useful only in teaching, but we do a bunch of teaching in our examples. |
and it will make life much easier for those teaching with sklearn (though that's admittedly a slippery slope). It certainly declutters the examples. |
supersedes #5070 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this.
Not just useful for teaching but also for people learning / experimenting on their own, which represent a huge part of our users I believe
A few comments but looks good.
@thomasjpfan There seem however to be a significant difference between the plots in the examples eg
https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
versus
https://88180-843222-gh.circle-artifacts.com/0/doc/auto_examples/classification/plot_classifier_comparison.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codewise, I think this completely fine. Just some questions regarding the API, the location of the function, and what we expect when we are using it.
estimator : estimator instance | ||
Trained estimator. | ||
|
||
X : array-like of shape (n_samples, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are passing X
, maybe we could have an option to make the scatter
plot at the same time and to set the x_min/x_max/y_min/y_max limits. Basically, I think that most of the time, we plotting both the boundary and the scatter plot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included this at the start, but I felt like it inflated the API. As in, we may need a scater_kwargs
in the plot_decisions_boundary
.
In its current form, plotting the scatter plot isn't too bad:
fig, ax = plt.subplots()
plot_decision_boundary(est, X, y, ax=ax)
ax.scatter(X[:, 0], X[:, 1], ...)
It's common to want to adjust the alpha when there are too many points (or subsample).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would be fine to have scatter_kwargs
in fact, I would like to have to write the minimum lines of code, especially xlim
, ylim
, xlabel
, ylabel
, alpha
, etc., for the largest number of our examples. Then, when one wants to fine-tune it will come back as efficient to write some matplotlib.
Axes object to plot on. If `None`, a new figure and axes is | ||
created. | ||
|
||
**kwargs : dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also thinking that we should have a way to decorate xlabel and ylabel if we are passing a dataframe using the column names.
@pytest.fixture(scope="module") | ||
def data(): | ||
X, y = make_classification(n_informative=1, n_redundant=1, | ||
n_clusters_per_class=1, n_features=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_clusters_per_class=1, n_features=2) | |
n_clusters_per_class=1, n_features=2 | |
random_state=42) |
sklearn/setup.py
Outdated
@@ -39,6 +39,8 @@ def configuration(parent_package='', top_path=None): | |||
config.add_subpackage('impute/tests') | |||
config.add_subpackage('inspection') | |||
config.add_subpackage('inspection/tests') | |||
config.add_subpackage('inspection/_plot') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary because I added a setup.py
in the inspection
module.
CI failing |
I'm giving this PR another pass. I see there are still some comments left to address. |
I updated this PR addressing most of the remaining concerns. The remaining comment is to support |
4cc6c51
to
fdbd493
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM The PR still looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
The resolution of the grid seems a little bit lower than the current one in several examples, which makes the boundary a little bit more dented but it's a really subtle difference.
I agree with what was said previously, adding the possibility to pass the pos_label can be done in a separate PR.
Does anyone have second thoughts before we merge ?
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
This PR adds
BoundaryDecisionDisplay
to the inspection module.What does this implement/fix? Explain your changes.
Many examples were touched because they can take advantage of the new function.