BUG log_loss renormalizes the predictions · Issue #24515 · scikit-learn/scikit-learn · GitHub

Closed
lorentzenchr opened this issue Sep 26, 2022 · 13 comments · Fixed by #25299

Comments

@lorentzenchr
Member

Describe the bug

log_loss(y_true, y_pred) renormalizes y_pred internally so that it sums to 1. This way, a really bad model whose predictions do not sum to 1 gets a better loss than it actually has.

Steps/Code to Reproduce

from scipy.special import xlogy
from sklearn.metrics import log_loss

y_true = [[0, 1]]
y_pred = [[0.2, 0.3]]  # the row does not sum to 1

log_loss(y_true, y_pred)

Expected Results

-xlogy(y_true, y_pred).sum(axis=1)

Result: 1.2039728

Actual Results

Result: 0.5108256237659907
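
The actual result matches what one gets by renormalizing the predictions before computing the loss. A minimal sketch using only NumPy (an illustration of the observed behavior, not scikit-learn's internal code):

import numpy as np

y_true = np.array([[0, 1]])
y_pred = np.array([[0.2, 0.3]])

# Renormalize each row to sum to 1: [[0.2, 0.3]] -> [[0.4, 0.6]]
y_norm = y_pred / y_pred.sum(axis=1, keepdims=True)

# -log(0.6) = 0.5108256237659907, matching the actual result above
-(y_true * np.log(y_norm)).sum(axis=1)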

Versions

System:
    python: 3.9.14
   machine: macOS

Python dependencies:
      sklearn: 1.1.2
@HariharPadhi1412

Can you share a reference to the code where the bug is?

@lorentzenchr
Member Author

@TomDLT I'd be interested in your opinion. We hit this in #24365 (comment).

@TomDLT
Member
TomDLT commented Oct 24, 2022

I feel like computing a log loss with probabilities not summing to one does not make sense, so I am ok with the renormalization.

a really bad model, the predictions of which do not sum to 1

To me this is not a bad model (a model that has poor prediction accuracy), this is an incorrect model (a model that does not return results in the expected format), as would be a model that predicts negative probabilities. We could almost raise an error if the probabilities do not sum to one (or close), but I feel like it is simpler to just renormalize and compute the loss.

@lorentzenchr
Member Author

I feel like computing a log loss with probabilities not summing to one does not make sense, so I am ok with the renormalization.

To be honest, I strongly think renormalizing in the metric is methodologically wrong, because then the metric no longer measures the model's actual predictions. The model should take care of its own predictions (i.e. make them sum to 1), not the metric!

A metric is like a measuring device, say a scale (for measuring weight). We are measuring objects for a flight transport, so lighter is better. Renormalization is like measuring the weight of objects with their packaging removed. But the flight will have to carry the whole objects, with packaging included.

@ogrisel
Member
ogrisel commented Nov 17, 2022

We could almost raise an error if the probabilities do not sum to one (or close), but I feel like it is simpler to just renormalize and compute the loss.

I would be fine with raising a warning first and an error in the future.

@ogrisel removed the Needs Triage label Nov 17, 2022
@ogrisel
Member
ogrisel commented Nov 17, 2022

I agree with @lorentzenchr that the current behavior is very surprising and therefore it's a bug to me.

@thomasjpfan
Member

What should be the behavior for larger eps values? For example, the following has eps=0.1:

from sklearn.metrics import log_loss
import numpy as np

y_true = [0, 1, 2]
y_pred = [[0, 0, 1], [0, 1, 0], [0, 0, 1]]

eps = 0.1
log_loss(y_true, y_pred, eps=eps)
# 0.9330788879075577

# `log_loss` will use a clipped `y_pred` for computing the log loss:
np.clip(y_pred, eps, 1 - eps)
# array([[0.1, 0.1, 0.9],
#        [0.1, 0.9, 0.1],
#        [0.1, 0.1, 0.9]])

We could just validate the input, i.e. np.isclose(y_pred.sum(axis=1), 1.0), but log_loss will use the clipped version for computation. For reference, here is the clipping:

# Clipping
y_pred = np.clip(y_pred, eps, 1 - eps)

For me, eps shouldn't be a parameter; it should always be np.finfo(y_pred.dtype).eps. For reference, PyTorch's binary_cross_entropy clips the log to -100 by default.
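
As an illustration of the suggestion above, here is a sketch (not scikit-learn code) of clipping the same example with the dtype's machine epsilon instead of eps=0.1; the renormalization step is skipped for simplicity:

import numpy as np
from scipy.special import xlogy

y_true = np.eye(3)[[0, 1, 2]]  # one-hot encoding of the labels [0, 1, 2]
y_pred = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=np.float64)

eps = np.finfo(y_pred.dtype).eps  # ~2.22e-16 for float64
y_clipped = np.clip(y_pred, eps, 1 - eps)

# The first sample assigns probability eps to its true class, so its loss
# is -log(eps) ~ 36; the mean over the three samples is ~12, far from the
# 0.933 obtained above with eps=0.1.
-xlogy(y_true, y_clipped).sum(axis=1).mean()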

@lorentzenchr
Member Author
lorentzenchr commented Dec 6, 2022

I agree with @thomasjpfan that eps should not be a parameter. In my opinion, the perfect default value is 0 (what is the use case of having it in the first place?). I'm also fine with np.finfo(y_pred.dtype).eps.

@OmarManzoor
Contributor

@lorentzenchr Would it be okay for me to work on this issue?

@lorentzenchr
Member Author

@OmarManzoor If the solution is clear to you, then yes.

@OmarManzoor
Contributor

@OmarManzoor If the solution is clear to you, then yes.

From what I understand, we need to raise a warning if y_pred does not sum to 1, and we need to remove eps as a parameter of this metric and instead use np.finfo(y_pred.dtype).eps.
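
A hypothetical sketch of that plan; the helper name and warning message are illustrative only, not the final scikit-learn implementation:

import warnings
import numpy as np

def _warn_if_not_normalized(y_pred):
    # Hypothetical helper: warn when rows of y_pred do not sum to 1.
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if not np.isclose(y_pred.sum(axis=1), 1.0).all():
        warnings.warn(
            "The y_pred values do not sum to one; this may become an error "
            "in a future release.",
            UserWarning,
        )
    return y_pred

# eps would then no longer be a user-facing parameter and would default to
# the machine epsilon of the prediction dtype:
# eps = np.finfo(y_pred.dtype).eps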

@lorentzenchr
Member Author

@OmarManzoor Yes. You are very welcome to go ahead, open a PR and link it with this issue. Keep in mind that eps needs a deprecation cycle, i.e. it should emit a warning when set, for 2 releases. Be prepared that a few concerns might be raised during review.

@OmarManzoor
Contributor

@OmarManzoor Yes. You are very welcome to go ahead, open a PR and link it with this issue. Keep in mind that eps needs a deprecation cycle, i.e. it should emit a warning when set, for 2 releases. Be prepared that a few concerns might be raised during review.

Sure, thank you.
