DOC Rework voting classifier example #30985
Merged: lucyleeow merged 19 commits into scikit-learn:main from ArturoAmorQ:rework_voting_classifier on May 1, 2025.

Changes from all commits (19):
41f9e9f DOC Rework VotingClassifier decision boundaries example
22b8854 Update User Guide accordingly
47553ac Add redirect from Plot class probabilities example
28be627 Remove Plot class probabilities example
c4f46c9 Apply suggestions from code review (ArturoAmorQ)
85dedec Address comments from ogrisel
6b56208 Merge branch 'main' into rework_voting_classifier (ArturoAmorQ)
82b0c91 Add comment on thresholding
47ee0e6 Merge main
d634e77 Fix conflicts
b0455e7 Change example's title and match sphinx ref
822c2e7 Address Olivier comment (ArturoAmorQ)
1d352a1 Address Olivier's comment (ArturoAmorQ)
c56852c Merge branch 'main' into rework_voting_classifier (lucyleeow)
fc73e78 fix lint (lucyleeow)
9d01032 Apply suggestions from code review (ArturoAmorQ)
5bbacd6 Address comments from Lucy
eb7c8f5 Add comment on linear-separability
ec42ff1 Apply suggestions from code review (ArturoAmorQ)
The example file is rewritten in a single hunk (@@ -1,73 +1,218 @@). The previous version, "Plot the decision boundaries of a VotingClassifier", plotted the decision boundaries of a :class:`~ensemble.VotingClassifier` for two features of the Iris dataset, using a :class:`~tree.DecisionTreeClassifier` (max_depth=4), a :class:`~neighbors.KNeighborsClassifier` (n_neighbors=7) and an RBF :class:`~svm.SVC`, combined with soft voting and weights `[2, 1, 2]`. The reworked example reads:
""" | ||
================================================== | ||
Plot the decision boundaries of a VotingClassifier | ||
================================================== | ||
=============================================================== | ||
Visualizing the probabilistic predictions of a VotingClassifier | ||
=============================================================== | ||
|
||
.. currentmodule:: sklearn | ||
|
||
Plot the decision boundaries of a :class:`~ensemble.VotingClassifier` for two | ||
features of the Iris dataset. | ||
Plot the predicted class probabilities in a toy dataset predicted by three | ||
different classifiers and averaged by the :class:`~ensemble.VotingClassifier`. | ||
|
||
Plot the class probabilities of the first sample in a toy dataset predicted by | ||
three different classifiers and averaged by the | ||
:class:`~ensemble.VotingClassifier`. | ||
First, three linear classifiers are initialized. Two are spline models with | ||
interaction terms, one using constant extrapolation and the other using periodic | ||
extrapolation. The third classifier is a :class:`~kernel_approximation.Nystroem` | ||
with the default "rbf" kernel. | ||
|
||
First, three exemplary classifiers are initialized | ||
(:class:`~tree.DecisionTreeClassifier`, | ||
:class:`~neighbors.KNeighborsClassifier`, and :class:`~svm.SVC`) and used to | ||
initialize a soft-voting :class:`~ensemble.VotingClassifier` with weights `[2, | ||
1, 2]`, which means that the predicted probabilities of the | ||
:class:`~tree.DecisionTreeClassifier` and :class:`~svm.SVC` each count 2 times | ||
as much as the weights of the :class:`~neighbors.KNeighborsClassifier` | ||
classifier when the averaged probability is calculated. | ||
In the first part of this example, these three classifiers are used to | ||
demonstrate soft-voting using :class:`~ensemble.VotingClassifier` with weighted | ||
average. We set `weights=[2, 1, 3]`, meaning the constant extrapolation spline | ||
model's predictions are weighted twice as much as the periodic spline model's, | ||
and the Nystroem model's predictions are weighted three times as much as the | ||
periodic spline. | ||
|
||
The second part demonstrates how soft predictions can be converted into hard | ||
predictions. | ||
|
||
""" | ||

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
# We first generate a noisy XOR dataset, which is a binary classification task.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap

n_samples = 500
rng = np.random.default_rng(0)
feature_names = ["Feature #0", "Feature #1"]
common_scatter_plot_params = dict(
    cmap=ListedColormap(["tab:red", "tab:blue"]),
    edgecolor="white",
    linewidth=1,
)

xor = pd.DataFrame(
    np.random.RandomState(0).uniform(low=-1, high=1, size=(n_samples, 2)),
    columns=feature_names,
)
noise = rng.normal(loc=0, scale=0.1, size=(n_samples, 2))
target_xor = np.logical_xor(
    xor["Feature #0"] + noise[:, 0] > 0, xor["Feature #1"] + noise[:, 1] > 0
)

X = xor[feature_names]
y = target_xor.astype(np.int32)

fig, ax = plt.subplots()
ax.scatter(X["Feature #0"], X["Feature #1"], c=y, **common_scatter_plot_params)
ax.set_title("The XOR dataset")
plt.show()
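
# %%
# (Editor's sketch, not part of the merged example.) The next cell remarks
# that the XOR dataset is not linearly separable; a quick baseline makes that
# concrete. A logistic regression on the two raw features has no single linear
# boundary available that separates the classes, so its accuracy is expected
# to stay close to the 0.5 chance level.

from sklearn.linear_model import LogisticRegression

baseline = LogisticRegression().fit(X, y)
print(f"Baseline linear model accuracy: {baseline.score(X, y):.2f}")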

# %%
# The XOR dataset is not linearly separable, so tree-based models would often
# be preferred in practice. However, appropriate feature engineering combined
# with a linear model can yield effective results, with the added benefit of
# producing better-calibrated probabilities for samples located in the
# transition regions affected by noise.
#
# We define and fit the models on the whole dataset.

from sklearn.ensemble import VotingClassifier
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, SplineTransformer, StandardScaler

clf1 = make_pipeline(
    SplineTransformer(degree=2, n_knots=2),
    PolynomialFeatures(interaction_only=True),
    LogisticRegression(C=10),
)
clf2 = make_pipeline(
    SplineTransformer(
        degree=2,
        n_knots=4,
        extrapolation="periodic",
        include_bias=True,
    ),
    PolynomialFeatures(interaction_only=True),
    LogisticRegression(C=10),
)
clf3 = make_pipeline(
    StandardScaler(),
    Nystroem(gamma=2, random_state=0),
    LogisticRegression(C=10),
)
weights = [2, 1, 3]
eclf = VotingClassifier(
    estimators=[
        ("constant splines model", clf1),
        ("periodic splines model", clf2),
        ("nystroem model", clf3),
    ],
    voting="soft",
    weights=weights,
)

clf1.fit(X, y)
clf2.fit(X, y)
clf3.fit(X, y)
eclf.fit(X, y)
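
# %%
# (Editor's sketch, not part of the merged example.) The spline pipelines work
# by expanding the two raw features into a higher-dimensional space where the
# classes become linearly separable for the final logistic regression.
# Slicing off the last pipeline step shows how many features each expansion
# produces.

for name, pipe in [("constant splines", clf1), ("periodic splines", clf2)]:
    # Apply every step except the final logistic regression.
    expanded = pipe[:-1].transform(X)
    print(f"{name}: {X.shape[1]} raw features -> {expanded.shape[1]} features")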

# %%
# Finally we use :class:`~inspection.DecisionBoundaryDisplay` to plot the
# predicted probabilities. By using a diverging colormap (such as `"RdBu"`), we
# can ensure that darker colors correspond to `predict_proba` close to either 0
# or 1, and white corresponds to `predict_proba` of 0.5.

from itertools import product

from sklearn.inspection import DecisionBoundaryDisplay

fig, axarr = plt.subplots(2, 2, sharex="col", sharey="row", figsize=(10, 8))
for idx, clf, title in zip(
    product([0, 1], [0, 1]),
    [clf1, clf2, clf3, eclf],
    [
        "Splines with\nconstant extrapolation",
        "Splines with\nperiodic extrapolation",
        "RBF Nystroem",
        "Soft Voting",
    ],
):
    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict_proba",
        plot_method="pcolormesh",
        cmap="RdBu",
        alpha=0.8,
        ax=axarr[idx[0], idx[1]],
    )
    axarr[idx[0], idx[1]].scatter(
        X["Feature #0"],
        X["Feature #1"],
        c=y,
        **common_scatter_plot_params,
    )
    axarr[idx[0], idx[1]].set_title(title)
    fig.colorbar(disp.surface_, ax=axarr[idx[0], idx[1]], label="Probability estimate")

plt.show()
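
# %%
# (Editor's sketch, not part of the merged example.) Cross-validated accuracy
# gives a quick quantitative complement to the probability maps above, for the
# three individual pipelines and for the soft-voting ensemble.

from sklearn.model_selection import cross_val_score

for name, model in [*eclf.estimators, ("soft voting", eclf)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")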

# %%
# As a sanity check, we can verify for a given sample that the probability
# predicted by the :class:`~ensemble.VotingClassifier` is indeed the weighted
# average of the individual classifiers' soft-predictions.
#
# In the case of binary classification such as in the present example, the
# :term:`predict_proba` arrays contain the probability of belonging to class 0
# (here in red) as the first entry, and the probability of belonging to class 1
# (here in blue) as the second entry.

test_sample = pd.DataFrame({"Feature #0": [-0.5], "Feature #1": [1.5]})
predict_probas = [est.predict_proba(test_sample).ravel() for est in eclf.estimators_]
for (est_name, _), est_probas in zip(eclf.estimators, predict_probas):
    print(f"{est_name}'s predicted probabilities: {est_probas}")

# %%
print(
    "Weighted average of soft-predictions: "
    f"{np.dot(weights, predict_probas) / np.sum(weights)}"
)
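
# %%
# (Editor's note, not part of the merged example.) The same weighted average
# can be computed with `np.average`, which normalizes by the sum of the
# weights internally, so no explicit division is needed.

print(
    "Weighted average via np.average: "
    f"{np.average(predict_probas, axis=0, weights=weights)}"
)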

# %%
# We can see that the manual calculation of predicted probabilities above is
# equivalent to that produced by the `VotingClassifier`:

print(
    "Predicted probability of VotingClassifier: "
    f"{eclf.predict_proba(test_sample).ravel()}"
)

# %%
# To convert soft predictions into hard predictions when weights are provided,
# the weighted average predicted probabilities are computed for each class.
# The final class label is then derived from the class label with the highest
# average probability, which corresponds to the default threshold of
# `predict_proba=0.5` in the case of binary classification.

print(
    "Class with the highest weighted average of soft-predictions: "
    f"{np.argmax(np.dot(weights, predict_probas) / np.sum(weights))}"
)

# %%
# This is equivalent to the output of `VotingClassifier`'s `predict` method:

print(f"Predicted class of VotingClassifier: {eclf.predict(test_sample).ravel()}")

# %%
# Soft votes can be thresholded as for any other probabilistic classifier. This
# allows you to set a threshold probability at which the positive class will be
# predicted, instead of simply selecting the class with the highest predicted
# probability.

from sklearn.model_selection import FixedThresholdClassifier

eclf_other_threshold = FixedThresholdClassifier(
    eclf, threshold=0.7, response_method="predict_proba"
).fit(X, y)
print(
    "Predicted class of thresholded VotingClassifier: "
    f"{eclf_other_threshold.predict(test_sample)}"
)

Review comment: "Is it worth adding that it is not linearly separable?"