ENH Add Multiclass Brier Score Loss (#22046) · scikit-learn/scikit-learn@318a282 · GitHub

Commit 318a282

ogrisel, Varun Aggarwal and antoinebaker authored
ENH Add Multiclass Brier Score Loss (#22046)
Co-authored-by: Varun Aggarwal <varunaggarwal@Varuns-MBP.fios-router.home>
Co-authored-by: Antoine Baker <antoine.baker59@gmail.com>
1 parent 0fb9e8c commit 318a282


7 files changed (+609 additions, -181 deletions)


doc/modules/model_evaluation.rst

Lines changed: 45 additions & 22 deletions
@@ -1344,30 +1344,30 @@ probability outputs (``predict_proba``) of a classifier instead of its
 discrete predictions.

 For binary classification with a true label :math:`y \in \{0,1\}`
-and a probability estimate :math:`p = \operatorname{Pr}(y = 1)`,
+and a probability estimate :math:`\hat{p} \approx \operatorname{Pr}(y = 1)`,
 the log loss per sample is the negative log-likelihood
 of the classifier given the true label:

 .. math::

-    L_{\log}(y, p) = -\log \operatorname{Pr}(y|p) = -(y \log (p) + (1 - y) \log (1 - p))
+    L_{\log}(y, \hat{p}) = -\log \operatorname{Pr}(y|\hat{p}) = -(y \log (\hat{p}) + (1 - y) \log (1 - \hat{p}))

 This extends to the multiclass case as follows.
 Let the true labels for a set of samples
 be encoded as a 1-of-K binary indicator matrix :math:`Y`,
 i.e., :math:`y_{i,k} = 1` if sample :math:`i` has label :math:`k`
 taken from a set of :math:`K` labels.
-Let :math:`P` be a matrix of probability estimates,
-with :math:`p_{i,k} = \operatorname{Pr}(y_{i,k} = 1)`.
+Let :math:`\hat{P}` be a matrix of probability estimates,
+with elements :math:`\hat{p}_{i,k} \approx \operatorname{Pr}(y_{i,k} = 1)`.
 Then the log loss of the whole set is

 .. math::

-    L_{\log}(Y, P) = -\log \operatorname{Pr}(Y|P) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} y_{i,k} \log p_{i,k}
+    L_{\log}(Y, \hat{P}) = -\log \operatorname{Pr}(Y|\hat{P}) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} y_{i,k} \log \hat{p}_{i,k}

 To see how this generalizes the binary log loss given above,
 note that in the binary case,
-:math:`p_{i,0} = 1 - p_{i,1}` and :math:`y_{i,0} = 1 - y_{i,1}`,
+:math:`\hat{p}_{i,0} = 1 - \hat{p}_{i,1}` and :math:`y_{i,0} = 1 - y_{i,1}`,
 so expanding the inner sum over :math:`y_{i,k} \in \{0,1\}`
 gives the binary log loss.

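The reduction of the multiclass log loss to the binary one can be checked numerically. The following is a minimal pure-Python sketch, not scikit-learn's `log_loss` implementation; the helper names `multiclass_log_loss` and `binary_log_loss` and the toy values are illustrative assumptions. It encodes a binary problem as K = 2 columns (column 0 holds 1 - p_hat, column 1 holds p_hat) and compares the two formulas.

```python
import math


def multiclass_log_loss(Y, P_hat):
    """Mean negative log-likelihood -(1/N) * sum_i sum_k y_ik * log(p_hat_ik).

    Y is a 1-of-K indicator matrix and P_hat a matrix of probability
    estimates, both given as lists of rows with K entries each.
    """
    total = 0.0
    for y_row, p_row in zip(Y, P_hat):
        for y_ik, p_ik in zip(y_row, p_row):
            if y_ik:  # terms with y_ik = 0 vanish (0 * log p = 0)
                total += y_ik * math.log(p_ik)
    return -total / len(Y)


def binary_log_loss(y, p_hat):
    """Mean of -(y log p + (1 - y) log(1 - p)) over samples with y in {0, 1}."""
    total = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p_hat)
    )
    return -total / len(y)


# Toy binary data (illustrative values only).
y = [0, 1, 1, 0]
p_hat = [0.1, 0.9, 0.8, 0.4]

# Binary case as K = 2: column 1 is the positive class, column 0 its complement.
Y = [[1 - yi, yi] for yi in y]
P_hat = [[1 - pi, pi] for pi in p_hat]

print(math.isclose(multiclass_log_loss(Y, P_hat), binary_log_loss(y, p_hat)))  # True
```

Only the true-class column contributes to each sample's term, which is why expanding the inner sum recovers the binary expression.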
@@ -1923,41 +1923,64 @@ set [0,1] has an error::
 Brier score loss
 ----------------

-The :func:`brier_score_loss` function computes the
-`Brier score <https://en.wikipedia.org/wiki/Brier_score>`_
-for binary classes [Brier1950]_. Quoting Wikipedia:
+The :func:`brier_score_loss` function computes the `Brier score
+<https://en.wikipedia.org/wiki/Brier_score>`_ for binary and multiclass
+probabilistic predictions and is equivalent to the mean squared error.
+Quoting Wikipedia:

-"The Brier score is a proper score function that measures the accuracy of
-probabilistic predictions. It is applicable to tasks in which predictions
-must assign probabilities to a set of mutually exclusive discrete outcomes."
+"The Brier score is a strictly proper scoring rule that measures the accuracy of
+probabilistic predictions. [...] [It] is applicable to tasks in which predictions
+must assign probabilities to a set of mutually exclusive discrete outcomes or
+classes."

-This function returns the mean squared error of the actual outcome
-:math:`y \in \{0,1\}` and the predicted probability estimate
-:math:`p = \operatorname{Pr}(y = 1)` (:term:`predict_proba`) as outputted by:
+Let the true labels for a set of :math:`N` data points be encoded as a 1-of-K binary
+indicator matrix :math:`Y`, i.e., :math:`y_{i,k} = 1` if sample :math:`i` has
+label :math:`k` taken from a set of :math:`K` labels. Let :math:`\hat{P}` be a matrix
+of probability estimates with elements :math:`\hat{p}_{i,k} \approx \operatorname{Pr}(y_{i,k} = 1)`.
+Following the original definition by [Brier1950]_, the Brier score is given by:

 .. math::

-    BS = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1}(y_i - p_i)^2
+    BS(Y, \hat{P}) = \frac{1}{N}\sum_{i=0}^{N-1}\sum_{k=0}^{K-1}(y_{i,k} - \hat{p}_{i,k})^{2}

-The Brier score loss is also between 0 to 1 and the lower the value (the mean
-square difference is smaller), the more accurate the prediction is.
+The Brier score lies in the interval :math:`[0, 2]` and the lower the value the
+better the probability estimates are (the mean squared difference is smaller).
+Moreover, the Brier score is a strictly proper scoring rule, meaning that it
+achieves the best score only when the estimated probabilities equal the
+true ones.
+
+Note that in the binary case, the Brier score is usually divided by two and
+ranges within :math:`[0, 1]`. For binary targets :math:`y_i \in \{0, 1\}` and
+probability estimates :math:`\hat{p}_i \approx \operatorname{Pr}(y_i = 1)`
+for the positive class, the Brier score is then equal to:
+
+.. math::
+
+    BS(y, \hat{p}) = \frac{1}{N} \sum_{i=0}^{N - 1}(y_i - \hat{p}_i)^2
+
+The :func:`brier_score_loss` function computes the Brier score given the
+ground-truth labels and predicted probabilities, as returned by an estimator's
+``predict_proba`` method. The `scale_by_half` parameter controls which of the
+two above definitions to follow.

-Here is a small example of usage of this function::

     >>> import numpy as np
     >>> from sklearn.metrics import brier_score_loss
     >>> y_true = np.array([0, 1, 1, 0])
     >>> y_true_categorical = np.array(["spam", "ham", "ham", "spam"])
     >>> y_prob = np.array([0.1, 0.9, 0.8, 0.4])
-    >>> y_pred = np.array([0, 1, 1, 0])
     >>> brier_score_loss(y_true, y_prob)
     0.055
     >>> brier_score_loss(y_true, 1 - y_prob, pos_label=0)
     0.055
     >>> brier_score_loss(y_true_categorical, y_prob, pos_label="ham")
     0.055
-    >>> brier_score_loss(y_true, y_prob > 0.5)
-    0.0
+    >>> brier_score_loss(
+    ...     ["eggs", "ham", "spam"],
+    ...     [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.2, 0.2, 0.6]],
+    ...     labels=["eggs", "ham", "spam"],
+    ... )
+    0.146...

 The Brier score can be used to assess how well a classifier is calibrated.
 However, a lower Brier score loss does not always mean a better calibration.
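To make the two Brier score definitions concrete, here is a minimal pure-Python sketch of the multiclass formula with an optional halving step. It is not scikit-learn's `brier_score_loss`: the helpers `one_hot` and `brier_score` are hypothetical, and their boolean `scale_by_half` flag only mimics the role the documentation gives to the parameter of the same name, not its actual signature or default. It reproduces the values of the doctests above and shows that, for K = 2, the unscaled multiclass score is exactly twice the usual binary one.

```python
def one_hot(labels, classes):
    """Encode labels as a 1-of-K indicator matrix, columns ordered as `classes`."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def brier_score(Y, P_hat, scale_by_half=False):
    """(1/N) * sum_i sum_k (y_ik - p_hat_ik)**2, optionally halved.

    Unscaled values lie in [0, 2]; halving maps them onto [0, 1].
    """
    total = sum(
        (y_ik - p_ik) ** 2
        for y_row, p_row in zip(Y, P_hat)
        for y_ik, p_ik in zip(y_row, p_row)
    )
    score = total / len(Y)
    return score / 2 if scale_by_half else score


# Multiclass example taken from the doctest above.
classes = ["eggs", "ham", "spam"]
Y = one_hot(["eggs", "ham", "spam"], classes)
P_hat = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.2, 0.2, 0.6]]
print(round(brier_score(Y, P_hat), 4))  # 0.1467

# Binary example: K = 2 columns [1 - p, p], with 1 as the positive class.
y = [0, 1, 1, 0]
p_hat = [0.1, 0.9, 0.8, 0.4]
Y2 = one_hot(y, [0, 1])
P2 = [[1 - p, p] for p in p_hat]
print(round(brier_score(Y2, P2), 3))                      # 0.11
print(round(brier_score(Y2, P2, scale_by_half=True), 3))  # 0.055
```

The factor of two arises because, when both rows sum to one, y_{i,0} - p̂_{i,0} = -(y_{i,1} - p̂_{i,1}), so each binary sample contributes the same squared error once per column.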
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+- :func:`metrics.brier_score_loss` implements the Brier score for multiclass
+  classification problems and adds a `scale_by_half` argument. This metric is
+  notably useful to assess both sharpness and calibration of probabilistic
+  classifiers. See the docstrings for more details. By
+  :user:`Varun Aggarwal <aggvarun01>`, :user:`Olivier Grisel <ogrisel>` and
+  :user:`Antoine Baker <antoinebaker>`.
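The documentation change describes the Brier score as a strictly proper scoring rule, which is what makes it usable for assessing probabilistic classifiers. For one binary outcome with true probability q, the expected (halved) Brier score of a forecast p is q(1 - p)^2 + (1 - q)p^2 = (p - q)^2 + q(1 - q), which is smallest only at p = q. The sketch below checks this on a grid of forecasts; q = 0.3, the grid resolution, and the helper name `expected_brier` are arbitrary assumptions.

```python
def expected_brier(p, q):
    """Expected binary Brier score of forecast p for an event with probability q.

    Algebraically equal to (p - q)**2 + q * (1 - q), so it is minimized only at p = q.
    """
    return q * (1 - p) ** 2 + (1 - q) * p ** 2


q = 0.3  # assumed true probability of the positive class
grid = [i / 100 for i in range(101)]
best_p = min(grid, key=lambda p: expected_brier(p, q))
print(best_p)  # 0.3
```

The constant q(1 - q) term does not depend on the forecast, so any deviation from q is penalized by exactly (p - q)^2.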
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+- :func:`metrics.log_loss` now raises a `ValueError` if values of `y_true`
+  are missing in `labels`. By :user:`Varun Aggarwal <aggvarun01>`,
+  :user:`Olivier Grisel <ogrisel>` and :user:`Antoine Baker <antoinebaker>`.
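The `log_loss` changelog entry describes a validation step: every value of `y_true` must appear in `labels`, so that each sample's true class has a column in the 1-of-K indicator matrix. A minimal sketch of such a check follows; `check_labels_cover_targets` is a hypothetical helper, and its error message does not reproduce scikit-learn's wording.

```python
def check_labels_cover_targets(y_true, labels):
    """Raise ValueError if some values of y_true are missing from labels.

    Hypothetical helper illustrating the changelog entry; not scikit-learn code.
    """
    missing = sorted(set(y_true) - set(labels), key=str)
    if missing:
        raise ValueError(
            f"y_true contains values {missing} that are missing in labels={list(labels)}"
        )


# Every target value has a column: no error is raised.
check_labels_cover_targets(["eggs", "ham", "ham"], labels=["eggs", "ham", "spam"])

# "bacon" has no column in the probability matrix: ValueError.
try:
    check_labels_cover_targets(["eggs", "bacon"], labels=["eggs", "ham", "spam"])
except ValueError as exc:
    print(f"ValueError: {exc}")
```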

examples/calibration/plot_calibration_multiclass.py

Lines changed: 33 additions & 5 deletions
@@ -212,14 +212,30 @@ class of an instance (red: class 1, green: class 2, blue: class 3).

 from sklearn.metrics import log_loss

-score = log_loss(y_test, clf_probs)
-cal_score = log_loss(y_test, cal_clf_probs)
+loss = log_loss(y_test, clf_probs)
+cal_loss = log_loss(y_test, cal_clf_probs)

-print("Log-loss of")
-print(f" * uncalibrated classifier: {score:.3f}")
-print(f" * calibrated classifier: {cal_score:.3f}")
+print("Log-loss of:")
+print(f" - uncalibrated classifier: {loss:.3f}")
+print(f" - calibrated classifier: {cal_loss:.3f}")

 # %%
+# We can also assess calibration with the Brier score for probabilistic predictions
+# (lower is better, possible range is [0, 2]):
+
+from sklearn.metrics import brier_score_loss
+
+loss = brier_score_loss(y_test, clf_probs)
+cal_loss = brier_score_loss(y_test, cal_clf_probs)
+
+print("Brier score of:")
+print(f" - uncalibrated classifier: {loss:.3f}")
+print(f" - calibrated classifier: {cal_loss:.3f}")
+
+# %%
+# According to the Brier score, the calibrated classifier is not better than
+# the original model.
+#
 # Finally we generate a grid of possible uncalibrated probabilities over
 # the 2-simplex, compute the corresponding calibrated probabilities and
 # plot arrows for each. The arrows are colored according to the highest
@@ -274,3 +290,15 @@ class of an instance (red: class 1, green: class 2, blue: class 3).
 plt.ylim(-0.05, 1.05)

 plt.show()
+
+# %%
+# One can observe that, on average, the calibrator is pushing highly confident
+# predictions away from the boundaries of the simplex while simultaneously
+# moving uncertain predictions towards one of three modes, one for each class.
+# We can also observe that the mapping is not symmetric. Furthermore, some
+# arrows seem to cross class assignment boundaries, which is not necessarily
+# what one would expect from a calibration map as it means that some predicted
+# classes will change after calibration.
+#
+# All in all, the One-vs-Rest multiclass-calibration strategy implemented in
+# `CalibratedClassifierCV` should not be trusted blindly.
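The remark that some predicted classes change after calibration follows from the One-vs-Rest construction: each class gets its own monotonic map, but nothing preserves the ordering across classes once the outputs are renormalized. The sketch below illustrates this with made-up per-class maps standing in for fitted calibrators (`CalibratedClassifierCV` fits them from data, e.g. with sigmoid or isotonic regression) and assumes, as a simplification, that the calibrated scores are renormalized to sum to one. `ovr_calibrate` and `argmax` are hypothetical helpers.

```python
def ovr_calibrate(probs, calibrators):
    """Apply one calibration map per class, then renormalize each row to sum to 1."""
    calibrated = []
    for row in probs:
        mapped = [f(p) for f, p in zip(calibrators, row)]
        total = sum(mapped)
        calibrated.append([m / total for m in mapped])
    return calibrated


def argmax(row):
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical monotonic per-class maps standing in for fitted calibrators.
calibrators = [
    lambda p: p ** 2,    # class 0: lowers every probability in (0, 1)
    lambda p: p,         # class 1: identity
    lambda p: p ** 0.5,  # class 2: raises every probability in (0, 1)
]
uncalibrated = [[0.45, 0.35, 0.20], [0.80, 0.15, 0.05]]
calibrated = ovr_calibrate(uncalibrated, calibrators)

for before, after in zip(uncalibrated, calibrated):
    print(argmax(before), "->", argmax(after))
```

With these maps, the uncertain first sample moves from class 0 to class 2, while the confident second sample keeps class 0, even though every per-class map is monotonic.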
