FEA Add d2_log_loss_score by OmarManzoor · Pull Request #28351 · scikit-learn/scikit-learn · GitHub

FEA Add d2_log_loss_score #28351


Merged · 16 commits · May 2, 2024
45 changes: 45 additions & 0 deletions doc/modules/model_evaluation.rst
@@ -2826,6 +2826,51 @@ Here are some usage examples of the :func:`d2_absolute_error_score` function::

|details-end|

|details-start|
**D² log loss score**
Member (review comment): This is under "regression metrics" but it's a classification metric. We can fix that in another PR as it involves a larger change of the user guide.

Member (review comment): Haha, you spotted it as well. I assumed I would just merge it discreetly and open an issue right after 😄

|details-split|

The :func:`d2_log_loss_score` function implements the special case
of D² with the log loss, see :ref:`log_loss`, i.e.:

.. math::

\text{dev}(y, \hat{y}) = \text{log_loss}(y, \hat{y}).

The null model :math:`y_{\text{null}}` for the :func:`log_loss` predicts the
observed per-class proportions of ``y_true`` for every sample.
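
Combined with the general definition of :math:`D^2`, the resulting score is

.. math::

  D^2(y, \hat{y}) = 1 - \frac{\text{log_loss}(y, \hat{y})}{\text{log_loss}(y, y_{\text{null}})}.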

Here are some usage examples of the :func:`d2_log_loss_score` function::

>>> from sklearn.metrics import d2_log_loss_score
>>> y_true = [1, 1, 2, 3]
>>> y_pred = [
... [0.5, 0.25, 0.25],
... [0.5, 0.25, 0.25],
... [0.5, 0.25, 0.25],
... [0.5, 0.25, 0.25],
... ]
>>> d2_log_loss_score(y_true, y_pred)
0.0
>>> y_true = [1, 2, 3]
>>> y_pred = [
... [0.98, 0.01, 0.01],
... [0.01, 0.98, 0.01],
... [0.01, 0.01, 0.98],
... ]
>>> d2_log_loss_score(y_true, y_pred)
0.981...
>>> y_true = [1, 2, 3]
>>> y_pred = [
... [0.1, 0.6, 0.3],
... [0.1, 0.6, 0.3],
... [0.4, 0.5, 0.1],
... ]
>>> d2_log_loss_score(y_true, y_pred)
-0.552...
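
The last value can be cross-checked by hand against the definition above; a
minimal sketch using :func:`log_loss`, where the ``1/3`` entries are the
per-class proportions of ``y_true``::

    >>> import numpy as np
    >>> from sklearn.metrics import log_loss
    >>> y_true = [1, 2, 3]
    >>> y_pred = [
    ...     [0.1, 0.6, 0.3],
    ...     [0.1, 0.6, 0.3],
    ...     [0.4, 0.5, 0.1],
    ... ]
    >>> y_null = np.tile([1 / 3, 1 / 3, 1 / 3], (3, 1))
    >>> numerator = log_loss(y_true, y_pred, normalize=False)
    >>> denominator = log_loss(y_true, y_null, normalize=False)
    >>> 1 - numerator / denominator
    -0.552...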

|details-end|

.. _visualization_regression_evaluation:

Visual evaluation of regression models
6 changes: 5 additions & 1 deletion doc/whats_new/v1.5.rst
@@ -169,7 +169,7 @@ Changelog
..........................

- |Fix| Fixed a regression in :class:`calibration.CalibratedClassifierCV` where
  an error was wrongly raised with string targets.
:pr:`28843` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.cluster`
@@ -406,6 +406,10 @@ Changelog
is deprecated and will raise an error in v1.7.
:pr:`18555` by :user:`Kaushik Amar Das <cozek>`.

- |Feature| :func:`metrics.d2_log_loss_score` has been added, which
  calculates the D^2 score for the log loss.
  :pr:`28351` by :user:`Omar Salman <OmarManzoor>`.

:mod:`sklearn.mixture`
......................

2 changes: 2 additions & 0 deletions sklearn/metrics/__init__.py
@@ -12,6 +12,7 @@
classification_report,
cohen_kappa_score,
confusion_matrix,
d2_log_loss_score,
f1_score,
fbeta_score,
hamming_loss,
@@ -113,6 +114,7 @@
"coverage_error",
"d2_tweedie_score",
"d2_absolute_error_score",
"d2_log_loss_score",
"d2_pinball_score",
"dcg_score",
"davies_bouldin_score",
99 changes: 98 additions & 1 deletion sklearn/metrics/_classification.py
@@ -53,7 +53,11 @@
from ..utils.extmath import _nanaverage
from ..utils.multiclass import type_of_target, unique_labels
from ..utils.sparsefuncs import count_nonzero
from ..utils.validation import _check_pos_label_consistency, _num_samples
from ..utils.validation import (
_check_pos_label_consistency,
_check_sample_weight,
_num_samples,
)


def _check_zero_division(zero_division):
@@ -3257,3 +3261,96 @@ def brier_score_loss(
raise
y_true = np.array(y_true == pos_label, int)
return np.average((y_true - y_proba) ** 2, weights=sample_weight)


@validate_params(
{
"y_true": ["array-like"],
"y_pred": ["array-like"],
"sample_weight": ["array-like", None],
"labels": ["array-like", None],
},
prefer_skip_nested_validation=True,
)
def d2_log_loss_score(y_true, y_pred, *, sample_weight=None, labels=None):
"""
:math:`D^2` score function, fraction of log loss explained.

Best possible score is 1.0 and it can be negative (because the model can be
arbitrarily worse). A model that always predicts the per-class proportions
of `y_true`, disregarding the input features, gets a D^2 score of 0.0.

Read more in the :ref:`User Guide <d2_score>`.

.. versionadded:: 1.5

Parameters
----------
y_true : array-like or label indicator matrix
The actual labels for the n_samples samples.

y_pred : array-like of shape (n_samples, n_classes) or (n_samples,)
Predicted probabilities, as returned by a classifier's
predict_proba method. If ``y_pred.shape = (n_samples,)``
the probabilities provided are assumed to be that of the
positive class. The labels in ``y_pred`` are assumed to be
ordered alphabetically, as done by
:class:`~sklearn.preprocessing.LabelBinarizer`.

sample_weight : array-like of shape (n_samples,), default=None
Sample weights.

labels : array-like, default=None
If not provided, labels will be inferred from y_true. If ``labels``
is ``None`` and ``y_pred`` has shape (n_samples,) the labels are
assumed to be binary and are inferred from ``y_true``.

Returns
-------
d2 : float
The D^2 score.

Notes
-----
This is not a symmetric function.

Like R^2, the D^2 score may be negative (it need not actually be the square of
a quantity D).

This metric is not well-defined for a single sample and will return a NaN
value if n_samples is less than two.
"""
y_pred = check_array(y_pred, ensure_2d=False, dtype="numeric")
check_consistent_length(y_pred, y_true, sample_weight)
if _num_samples(y_pred) < 2:
msg = "D^2 score is not well-defined with less than two samples."
warnings.warn(msg, UndefinedMetricWarning)
return float("nan")

# log loss of the fitted model
numerator = log_loss(
y_true=y_true,
y_pred=y_pred,
normalize=False,
sample_weight=sample_weight,
labels=labels,
)

# Per-class proportions of y_true, weighted by sample_weight: these are
# the constant probabilities predicted by the null model
weights = _check_sample_weight(sample_weight, y_true)

_, y_value_indices = np.unique(y_true, return_inverse=True)
counts = np.bincount(y_value_indices, weights=weights)
y_prob = counts / weights.sum()
# Repeat the proportions for every sample: shape (n_samples, n_classes)
y_pred_null = np.tile(y_prob, (len(y_true), 1))

# log loss of the null model
denominator = log_loss(
y_true=y_true,
y_pred=y_pred_null,
normalize=False,
sample_weight=sample_weight,
labels=labels,
)

return 1 - (numerator / denominator)
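
A brief usage sketch of the 1-D ``y_pred`` path documented above, where the
predictions are probabilities of the positive class (output value approximate):

from sklearn.metrics import d2_log_loss_score

# Binary targets with 1-D probabilities of the positive class; labels are
# inferred from y_true, and the null model predicts [0.5, 0.5] throughout.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.2, 0.9, 0.8]
print(d2_log_loss_score(y_true, y_pred))  # ~0.763, well above the null model's 0.0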