8000 FEA Add DecisionBoundaryDisplay by thomasjpfan · Pull Request #16061 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

FEA Add DecisionBoundaryDisplay #16061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 79 commits into from
Mar 29, 2022

Conversation

thomasjpfan
Copy link
Member
@thomasjpfan thomasjpfan commented Jan 8, 2020

This PR adds BoundaryDecisionDisplay to the inspection module.

What does this implement/fix? Explain your changes.

Many examples were touched because they can take advantage of the new function.

@jnothman
Copy link
Member
jnothman commented Jan 8, 2020 via email

@thomasjpfan
Copy link
Member Author

I had the same discussion with @amueller a few months ago regarding including this plot function. I think it is useful only in teaching, but we do a bunch of teaching in our examples.

@amueller
Copy link
Member
amueller commented Jan 9, 2020

and it will make life much easier for those teaching with sklearn (though that's admittedly a slippery slope). It certainly declutters the examples.

@amueller
Copy link
Member
amueller commented Jan 9, 2020

supersedes #5070

Copy link
Member
@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this.

Not just useful for teaching but also for people learning / experimenting on their own, which represent a huge part of our users I believe

A few comments but looks good.

@thomasjpfan There seem however to be a significant difference between the plots in the examples eg

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
versus
https://88180-843222-gh.circle-artifacts.com/0/doc/auto_examples/classification/plot_classifier_comparison.html

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codewise, I think this completely fine. Just some questions regarding the API, the location of the function, and what we expect when we are using it.

estimator : estimator instance
Trained estimator.

X : array-like of shape (n_samples, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are passing X, maybe we could have an option to make the scatter plot at the same time and to set the x_min/x_max/y_min/y_max limits. Basically, I think that most of the time, we plotting both the boundary and the scatter plot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included this at the start, but I felt like it inflated the API. As in, we may need a scater_kwargs in the plot_decisions_boundary.

In its current form, plotting the scatter plot isn't too bad:

fig, ax = plt.subplots()
plot_decision_boundary(est, X, y, ax=ax)
ax.scatter(X[:, 0], X[:, 1], ...)

It's common to want to adjust the alpha when there are too many points (or subsample).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would be fine to have scatter_kwargs in fact, I would like to have to write the minimum lines of code, especially xlim, ylim, xlabel, ylabel, alpha, etc., for the largest number of our examples. Then, when one wants to fine-tune it will come back as efficient to write some matplotlib.

Axes object to plot on. If `None`, a new figure and axes is
created.

**kwargs : dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also thinking that we should have a way to decorate xlabel and ylabel if we are passing a dataframe using the column names.

@pytest.fixture(scope="module")
def data():
X, y = make_classification(n_informative=1, n_redundant=1,
n_clusters_per_class=1, n_features=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
n_clusters_per_class=1, n_features=2)
n_clusters_per_class=1, n_features=2
random_state=42)

sklearn/setup.py Outdated
@@ -39,6 +39,8 @@ def configuration(parent_package='', top_path=None):
config.add_subpackage('impute/tests')
config.add_subpackage('inspection')
config.add_subpackage('inspection/tests')
config.add_subpackage('inspection/_plot')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary because I added a setup.py in the inspection module.

@amueller
Copy link
Member

CI failing

@thomasjpfan
Copy link
Member Author

There were 2 approvals, I fixed the conflicts and pushed, let's see what happens

I'm giving this PR another pass. I see there are still some comments left to address.

@thomasjpfan
Copy link
Member Author

I updated this PR addressing most of the remaining concerns. The remaining comment is to support pos_label #16061 (comment) which can be done a follow up PR.

@thomasjpfan thomasjpfan force-pushed the plot_decision_bound branch from 4cc6c51 to fdbd493 Compare March 7, 2022 19:49
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM The PR still looks good.

F987
Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

The resolution of the grid seems a little bit lower than the current one in several examples, which makes the boundary a little bit more dented but it's a really subtle difference.

I agree with what was said previously, adding the possibility to pass the pos_label can be done in a separate PR.

Does anyone have second thoughts before we merge ?

@jeremiedbb jeremiedbb merged commit d400723 into scikit-learn:main Mar 29, 2022
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0