Warning
This is a request for comment (RFC), please do not open a pull request for this issue. If you wish to contribute to scikit-learn please have a look at our contributing doc and in particular the section New contributors.
Noticed while reviewing #33015. Also related to #32866
#31553 reverted the plotting behaviour for `plot_method='contour'` so that, as for all plot methods, only thresholded integer class predictions are used for plotting, not continuous proba/decision function values.
This means that the contour lines show where the predicted class changes from one class to another, instead of showing how the continuous predicted values change over the region where each class dominates (figures illustrating both behaviours are in #31546).
The problem is that without setting `levels`, matplotlib infers the number of levels (9 seems to be the default), resulting in a lot of overlapping lines.
Details
The contourf docs aren't very helpful: they say `levels` is optional but don't tell you what the default is. Digging into the code, it seems that `_process_contour_level_args` is responsible. It calls `_ensure_locator_exists`, which does:

```python
if self.locator is None:
    if self.logscale:
        self.locator = ticker.LogLocator(numticks=N)
    else:
        if N is None:
            N = 7  # Hard coded default
        self.locator = ticker.MaxNLocator(N + 1, min_n_ticks=1)
```

so it falls back to `MaxNLocator` with `nbins=7+1=8`, which gives at most `nbins + 1 = 9` ticks. This does seem to be what happens:
```python
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]
classifier = LogisticRegression().fit(X, iris.target)

# Create the contour plot
disp = DecisionBoundaryDisplay.from_estimator(
    classifier,
    X,
    plot_method='contour',
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
)

# Access the matplotlib axes and contour object
ax = disp.ax_
contour = disp.surface_

print(f"Number of classes: {len(iris.target_names)}")
print(f"Number of contour levels: {len(contour.levels)}")
print(f"Contour levels (decision values): {contour.levels}")
```

which prints:

```
Number of classes: 3
Number of contour levels: 9
Contour levels (decision values): [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
```
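As a sanity check, the same 9 levels fall straight out of the locator itself. This is just a sketch poking at matplotlib's public `ticker` API, assuming it behaves the same as the internal call quoted above:

```python
from matplotlib import ticker

# Reproduce the internal fallback for N=7 (the hard-coded default):
# MaxNLocator(8) picks "nice" tick locations with at most 8 intervals.
locator = ticker.MaxNLocator(7 + 1, min_n_ticks=1)

# The thresholded predictions for 3 classes span 0..2, so the locator
# should return 9 ticks spaced 0.25 apart -- the same levels printed above.
print(locator.tick_values(0, 2))
```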
The other problem is that we pass `n_classes` colours, but this does not match the number of levels (the passed colours are used to colour each level). In the case above we pass 3 colours but there are 9 levels, and from the docs:

> The sequence is cycled for the levels in ascending order. If the sequence is shorter than the number of levels, it's repeated.

Thus the colours don't make sense and don't really correspond to any individual class.
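To make the mismatch concrete, here is that cycling rule spelled out in plain Python. This is not matplotlib internals, just the documented behaviour applied to the inferred levels above; the colour names are placeholders standing in for whatever `n_classes` colours are actually passed:

```python
from itertools import cycle

levels = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]  # the 9 inferred levels
class_colours = ["tab:blue", "tab:orange", "tab:green"]      # placeholder: one per class

# The colour sequence is cycled over the levels in ascending order, so e.g.
# level 0.75 reuses the first ("class 0") colour even though it sits
# between classes 0 and 1.
for level, colour in zip(levels, cycle(class_colours)):
    print(f"level {level:<4}: {colour}")
```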
Not sure what the best solution would be. We could specify `levels` such that we get `n_classes` levels, which would already be a great improvement. The question of what colour(s) the lines should be is more difficult. For example, for the case above with 3 classes and `levels=[0, 1, 2]`, 2 lines are drawn:
Code
```python
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]
classifier = LogisticRegression().fit(X, iris.target)

# Create the contour plot
disp = DecisionBoundaryDisplay.from_estimator(
    classifier,
    X,
    plot_method='contour',
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
    levels=[0, 1, 2],
)
```

one for level 0 (the transition from 0 -> 1 / 0 -> 2) and one for level 1 (the transition from 1 -> 2). The level 1 line also appears along part of the 0 -> 2 transition, I think because the interpolated values across that boundary pass through 1; this is the part where there are both blue and orange lines. The line colours don't really map to a single class, mostly because we have 2 lines for 3 classes.
Maybe it would be best to just draw all contour lines in a single colour?
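For reference, a user can already get single-colour boundary lines today because `from_estimator` forwards extra kwargs to `plt.contour`. A minimal sketch; the specific choice of `levels=[0.5, 1.5]` (midway between the integer class predictions) and `colors='k'` is my own suggestion, not a current scikit-learn default:

```python
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]
classifier = LogisticRegression().fit(X, iris.target)

# Levels halfway between the integer class predictions mark where the
# predicted class changes; a single colour avoids implying a
# (non-existent) line-to-class mapping.
disp = DecisionBoundaryDisplay.from_estimator(
    classifier,
    X,
    plot_method='contour',
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
    levels=[0.5, 1.5],
    colors='k',
)
```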
Related: #33015 (comment)