FIX use unique values of y_true and y_pred in plot_confusion_matrix instead of estimator.classes_ by kyouma · Pull Request #18405 · scikit-learn/scikit-learn · GitHub

FIX use unique values of y_true and y_pred in plot_confusion_matrix instead of estimator.classes_ #18405


Merged
merged 19 commits into scikit-learn:master on Oct 21, 2020

Conversation

kyouma
Contributor
@kyouma kyouma commented Sep 15, 2020

Although the docstring and the API guide of sklearn.metrics.plot_confusion_matrix() say the following about the labels argument: "If 'None' is given, those that appear at least once in 'y_true' or 'y_pred' are used in sorted order", the estimator.classes_ attribute was used instead.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This change fixes errors that occur when y_true and y_pred don't contain some of the values from estimator.classes_.
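The documented rule can be sketched in a few lines of plain Python. This is an illustration only, not scikit-learn's actual implementation, and derive_display_labels is a hypothetical helper name:

```python
def derive_display_labels(y_true, y_pred):
    """Sorted union of the labels that appear at least once
    in y_true or y_pred (the documented labels=None rule)."""
    return sorted(set(y_true) | set(y_pred))


# A classifier may have been fitted on classes 0..5, yet the evaluation
# data may contain only a subset of them; the confusion matrix then has
# fewer rows and columns than estimator.classes_, so indexing the plot
# by classes_ fails.
print(derive_display_labels([0, 1, 1, 3], [1, 3, 3, 3]))  # [0, 1, 3]
```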

Any other comments?

Although the docstring and the API guide say "If 'None' is given, those that appear at least once in `y_true` or `y_pred` are used in sorted order", the `estimator.classes_` field was used.
Member
@thomasjpfan thomasjpfan left a comment

Thank you for the PR!

Please add a non-regression test that would fail at master but pass in this PR.

A test for plot_confusion_matrix() behaviour when labels=None and the dataset with true labels contains labels previously unseen by the classifier (and therefore not present in its classes_ attribute). According to the function description, it must create a union of the predicted labels and the true labels.
An update to the 'test_error_on_a_dataset_with_unseen_labels()' function to fix 'E501 line too long' errors.
@kyouma
Contributor Author
kyouma commented Sep 17, 2020

Thank you for the PR!

Please add a non-regression test that would fail at master but pass in this PR.

Thank you for the review, @thomasjpfan! This is my first pull request, I will try to do my best to implement and prepare everything correctly.

I have added the test test_error_on_a_dataset_with_unseen_labels() that checks tick labels of the confusion matrix plot.
In an IPython console, matplotlib doesn't throw exceptions on this test, so I had to add this check. In Jupyter Notebook, the very call of plot_confusion_matrix() would raise the exception ValueError("The number of FixedLocator locations (...), usually from a call to set_ticks, does not match the number of ticklabels (...).").
The updated plot_confusion_matrix() function is intended to pass this test.
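The exception above comes from Matplotlib's tick-label consistency check. A simplified stand-in (the function name here is illustrative, not Matplotlib's actual API) shows why the old code failed:

```python
def set_fixed_ticklabels(tick_locations, tick_labels):
    """Loose stand-in for the check Matplotlib performs when tick
    labels are paired with a FixedLocator."""
    if len(tick_locations) != len(tick_labels):
        raise ValueError(
            f"The number of FixedLocator locations "
            f"({len(tick_locations)}), usually from a call to set_ticks, "
            f"does not match the number of ticklabels "
            f"({len(tick_labels)})."
        )
    return list(zip(tick_locations, tick_labels))


# Before the fix: the plot provided len(estimator.classes_) tick labels
# (say 6) while the confusion matrix only produced ticks for the labels
# actually present in y_true/y_pred (say 3), triggering the error above.
```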

@kyouma kyouma requested a review from thomasjpfan September 17, 2020 21:53
Comment on lines 272 to 277
raise TypeError(
f"Labels in y_true and y_pred should be of the same type. "
f"Got y_true={np.unique(y_true)} and "
f"y_pred={np.unique(y_pred)}. Make sure that the "
f"predictions provided by the classifier coincides with "
f"the true labels."
Member

Do we have a test to make sure this error is raised?

Contributor Author

I have removed the try/except wrapping, as confusion_matrix(), which is called above to compute the matrix itself, contains the same unique_labels() call that was wrapped by the try/except block, and unique_labels() raises a descriptive exception when the true and predicted labels have different types. So if execution reaches this line, it will not cause any problems.
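For context, a loose sketch of the type check that makes the wrapper redundant; this mimics sklearn.utils.multiclass.unique_labels only in spirit, not its actual code:

```python
def unique_labels_sketch(y_true, y_pred):
    """Sorted union of labels, rejecting mixed label types up front
    (roughly what unique_labels() does inside confusion_matrix())."""
    labels = set(y_true) | set(y_pred)
    kinds = {type(label) for label in labels}
    if len(kinds) > 1:
        raise ValueError(
            "Mix of label input types: labels in y_true and y_pred "
            "should be of the same type."
        )
    return sorted(labels)
```

Because this check runs inside confusion_matrix() before the plotting code is reached, a second try/except around the later unique_labels() call can never fire.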

labels=None, display_labels=None)

disp_labels = set([tick.get_text() for tick in disp.ax_.get_xticklabels()])
expected_labels = unique_labels(y, fitted_clf.predict(X))
Member

In this case, we can list the labels:

    display_labels = [tick.get_text() for tick in disp.ax_.get_xticklabels()]
    expected_labels = [f'{i}' for i in range(6)]
    assert_array_equal(expected_labels, display_labels)

Contributor Author

Thank you, I have replaced these lines and the assertion check with your code.

This try/except is not necessary, as the same unique_labels() function is called inside confusion_matrix() above and raises a descriptive exception if the types of the true and predicted labels differ.
Member
@thomasjpfan thomasjpfan left a comment

Please add an entry to the change log at doc/whats_new/v0.24.rst with tag |Fix|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@@ -314,3 +315,16 @@ def test_default_labels(pyplot, display_labels, expected_labels):

assert_array_equal(x_ticks, expected_labels)
assert_array_equal(y_ticks, expected_labels)


def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data):
Member

We may need to wrap this to be <= 79:

Suggested change
def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data):
def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data,
                                               n_classes):

Contributor Author

Thank you.

kyouma and others added 4 commits September 24, 2020 21:45
…seen_labels()

- Replaced the assertion check
- Removed the unused import
The `labels` and `display_labels` parameters have been set to their default values.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@kyouma
Contributor Author
kyouma commented Sep 24, 2020

Thank you very much, I have implemented your suggestions and corrections. I have also added the |Fix| entry to doc/whats_new/v0.24.rst.

@kyouma kyouma requested a review from thomasjpfan October 1, 2020 11:43
Member
@thomasjpfan thomasjpfan left a comment

Minor comments, otherwise LGTM

kyouma and others added 3 commits October 1, 2020 16:28
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…nfusion_matrix.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@kyouma
Contributor Author
kyouma commented Oct 1, 2020

Minor comments, otherwise LGTM

I have applied the suggested changes. Thank you for your guidance!

Member
@glemaitre glemaitre left a comment

LGTM. I will merge when the CIs turn green.

@glemaitre glemaitre changed the title FIX for the behaviour of plot_confusion_matrix() with the argument 'labels' equal to 'None' FIX use unique values of y_true and y_pred in plot_confusion_matrix() instead of estimator.classes_ Oct 21, 2020
@glemaitre glemaitre changed the title FIX use unique values of y_true and y_pred in plot_confusion_matrix() instead of estimator.classes_ FIX use unique values of y_true and y_pred in plot_confusion_matrix instead of estimator.classes_ Oct 21, 2020
@glemaitre glemaitre merged commit 90b9b5d into scikit-learn:master Oct 21, 2020
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Oct 28, 2020
…nstead of estimator.classes_ (scikit-learn#18405)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>