Warning
This is a request for comment (RFC), please do not open a pull request for this issue. If you wish to contribute to scikit-learn please have a look at our contributing doc and in particular the section New contributors.
Noticed while reviewing #33015. Also related to #32866
#31553 reverted the plotting behaviour for `plot_method='contour'` so that, as for all plot methods, only thresholded integer class predictions are used for plotting, not continuous proba/decision function values.
This means that the contour lines show where the predicted class changes from one class to another, instead of showing how the continuous predicted values change over the region where each class dominates (figures illustrating both behaviours are in #31546).
The problem is that without setting `levels`, matplotlib infers the number of levels (9 seems to be the default), resulting in a lot of overlapping lines.
Details
The contourf docs aren't very helpful: they say `levels` is optional but don't tell you what the default is. Digging into the code, it seems that `_process_contour_level_args` is responsible. It calls `_ensure_locator_exists`, which does:

```python
if self.locator is None:
    if self.logscale:
        self.locator = ticker.LogLocator(numticks=N)
    else:
        if N is None:
            N = 7  # Hard coded default
        self.locator = ticker.MaxNLocator(N + 1, min_n_ticks=1)
```

so it falls back to `MaxNLocator` with `nbins=7+1=8`, which gives at most `nbins + 1 = 9` ticks. This does seem to be what happens:
```python
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]
classifier = LogisticRegression().fit(X, iris.target)

# Create the contour plot
disp = DecisionBoundaryDisplay.from_estimator(
    classifier,
    X,
    plot_method='contour',
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
)

# Access the matplotlib axes and contour object
ax = disp.ax_
contour = disp.surface_

print(f"Number of classes: {len(iris.target_names)}")
print(f"Number of contour levels: {len(contour.levels)}")
print(f"Contour levels (decision values): {contour.levels}")
```

which prints:

```
Number of classes: 3
Number of contour levels: 9
Contour levels (decision values): [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
```
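As a sanity check, the same 9 levels fall straight out of the locator itself. This is just a sketch poking at matplotlib's public `ticker` API, assuming it behaves the same as the internal call quoted above:

```python
from matplotlib import ticker

# Reproduce the internal fallback for N=7 (the hard-coded default):
# MaxNLocator(8) picks "nice" tick locations with at most 8 intervals.
locator = ticker.MaxNLocator(7 + 1, min_n_ticks=1)

# The thresholded predictions for 3 classes span 0..2, so the locator
# should return 9 ticks spaced 0.25 apart -- the same levels printed above.
print(locator.tick_values(0, 2))
```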
The other problem is that we pass `n_classes` colours, but this does not match the number of levels (the passed colours are used to colour each level). In the case above we pass 3 colours but there are 9 levels, and from the docs:

> The sequence is cycled for the levels in ascending order. If the sequence is shorter than the number of levels, it's repeated.

Thus the colours don't make sense and don't really correspond to any individual class.
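To make the mismatch concrete, here is that cycling rule spelled out in plain Python. This is not matplotlib internals, just the documented behaviour applied to the inferred levels above; the colour names are placeholders standing in for whatever `n_classes` colours are actually passed:

```python
from itertools import cycle

levels = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]  # the 9 inferred levels
class_colours = ["tab:blue", "tab:orange", "tab:green"]      # placeholder: one per class

# The colour sequence is cycled over the levels in ascending order, so e.g.
# level 0.75 reuses the first ("class 0") colour even though it sits
# between classes 0 and 1.
for level, colour in zip(levels, cycle(class_colours)):
    print(f"level {level:<4}: {colour}")
```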
Not sure what the best solution would be. We could specify `levels` such that we get `n_classes` levels, which would already be a great improvement. The question of what colour(s) the lines should be is more difficult. For example, for the case above with 3 classes and `levels=[0, 1, 2]`, 2 lines are drawn:
Code
```python
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]
classifier = LogisticRegression().fit(X, iris.target)

# Create the contour plot
disp = DecisionBoundaryDisplay.from_estimator(
    classifier,
    X,
    plot_method='contour',
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
    levels=[0, 1, 2],
)
```

one for level 0 (the transition from 0 -> 1 / 0 -> 2) and one for level 1 (the transition from 1 -> 2). The level 1 line also appears along part of the 0 -> 2 transition, I think because the interpolated values across that boundary pass through 1; this is the part where there are both blue and orange lines. The line colours don't really map to a single class, mostly because we have 2 lines for 3 classes.
Maybe it would be best to just draw all contour lines in a single colour?
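For reference, a user can already get single-colour boundary lines today because `from_estimator` forwards extra kwargs to `plt.contour`. A minimal sketch; the specific choice of `levels=[0.5, 1.5]` (midway between the integer class predictions) and `colors='k'` is my own suggestion, not a current scikit-learn default:

```python
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, :2]
classifier = LogisticRegression().fit(X, iris.target)

# Levels halfway between the integer class predictions mark where the
# predicted class changes; a single colour avoids implying a
# (non-existent) line-to-class mapping.
disp = DecisionBoundaryDisplay.from_estimator(
    classifier,
    X,
    plot_method='contour',
    xlabel=iris.feature_names[0],
    ylabel=iris.feature_names[1],
    levels=[0.5, 1.5],
    colors='k',
)
```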
Related: #33015 (comment)