BUG log_loss renormalizes the predictions · Issue #24515 · scikit-learn/scikit-learn · GitHub

Closed
lorentzenchr opened this issue Sep 26, 2022 · 13 comments · Fixed by #25299

Comments

@lorentzenchr
Member

Describe the bug

log_loss(y_true, y_pred) renormalizes y_pred internally so that it sums to 1. This way, a really bad model whose predictions do not sum to 1 gets a better loss than it actually has.

Steps/Code to Reproduce

from scipy.special import xlogy
from sklearn.metrics import log_loss

y_true = [[0, 1]]
y_pred = [[0.2, 0.3]]  # the row does not sum to 1

log_loss(y_true, y_pred)

Expected Results

-xlogy(y_true, y_pred).sum(axis=1)

Result: 1.2039728

Actual Results

Result: 0.5108256237659907
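
The actual result matches what one gets by renormalizing the predictions before computing the loss. A minimal sketch using only NumPy (an illustration of the observed behavior, not scikit-learn's internal code):

import numpy as np

y_true = np.array([[0, 1]])
y_pred = np.array([[0.2, 0.3]])

# Renormalize each row to sum to 1: [[0.2, 0.3]] -> [[0.4, 0.6]]
y_norm = y_pred / y_pred.sum(axis=1, keepdims=True)

# -log(0.6) = 0.5108256237659907, matching the actual result above
-(y_true * np.log(y_norm)).sum(axis=1)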

Versions

System:
    python: 3.9.14
   machine: macOS

Python dependencies:
      sklearn: 1.1.2
@HariharPadhi1412

Can you share a reference to the code where the bug is?

@lorentzenchr
Member Author

@TomDLT I'd be interested in your opinion. We hit this in #24365 (comment).

@TomDLT
Member
TomDLT commented Oct 24, 2022

I feel like computing a log loss with probabilities not summing to one does not make sense, so I am ok with the renormalization.

a really bad model, the predictions of which do not sum to 1

To me this is not a bad model (a model that has poor prediction accuracy), this is an incorrect model (a model that does not return results in the expected format), as would be a model that predicts negative probabilities. We could almost raise an error if the probabilities do not sum to one (or close), but I feel like it is simpler to just renormalize and compute the loss.

@lorentzenchr
Member Author

I feel like computing a log loss with probabilities not summing to one does not make sense, so I am ok with the renormalization.

To be honest, I strongly think renormalizing in the metric is methodologically wrong, because then the metric no longer measures the model's actual predictions. The model should take care of its own predictions (i.e. make them sum to 1), not the metric!

A metric is like a measuring device, say a scale (for measuring weight). We are measuring objects for a flight transport, so lighter is better. Renormalization is like measuring the weight of objects with their packaging removed. But the flight will have to carry the whole objects, with packaging included.

@ogrisel
Member
ogrisel commented Nov 17, 2022

We could almost raise an error if the probabilities do not sum to one (or close), but I feel like it is simpler to just renormalize and compute the loss.

I would be fine with raising a warning first and an error in the future.

@ogrisel removed the Needs Triage label Nov 17, 2022
@ogrisel
Member
ogrisel commented Nov 17, 2022

I agree with @lorentzenchr that the current behavior is very surprising and therefore it's a bug to me.

@thomasjpfan
Member

What should be the behavior for larger eps values? For example, the following has eps=0.1:

from sklearn.metrics import log_loss
import numpy as np

y_true = [0, 1, 2]
y_pred = [[0, 0, 1], [0, 1, 0], [0, 0, 1]]

eps = 0.1
log_loss(y_true, y_pred, eps=eps)
# 0.9330788879075577

# `log_loss` will use a clipped `y_pred` for computing the log loss:
np.clip(y_pred, eps, 1 - eps)
# array([[0.1, 0.1, 0.9],
#        [0.1, 0.9, 0.1],
#        [0.1, 0.1, 0.9]])

We could just validate the input, i.e. np.isclose(y_pred.sum(axis=1), 1.0), but log_loss will use the clipped version for computation. For reference, here is the clipping:

# Clipping
y_pred = np.clip(y_pred, eps, 1 - eps)

For me, eps shouldn't be a parameter; it should always be np.finfo(y_pred.dtype).eps. For reference, PyTorch's binary_cross_entropy clips the log to -100 by default.
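
As an illustration of the suggestion above, here is a sketch (not scikit-learn code) of clipping the same example with the dtype's machine epsilon instead of eps=0.1; the renormalization step is skipped for simplicity:

import numpy as np
from scipy.special import xlogy

y_true = np.eye(3)[[0, 1, 2]]  # one-hot encoding of the labels [0, 1, 2]
y_pred = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=np.float64)

eps = np.finfo(y_pred.dtype).eps  # ~2.22e-16 for float64
y_clipped = np.clip(y_pred, eps, 1 - eps)

# The first sample assigns probability eps to its true class, so its loss
# is -log(eps) ~ 36; the mean over the three samples is ~12, far from the
# 0.933 obtained above with eps=0.1.
-xlogy(y_true, y_clipped).sum(axis=1).mean()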

@lorentzenchr
Member Author
lorentzenchr commented Dec 6, 2022

I agree with @thomasjpfan that eps should not be a parameter. In my opinion, the perfect default value is 0 (what is the use case of having it in the first place?). I'm also fine with np.finfo(y_pred.dtype).eps.

@OmarManzoor
Contributor

@lorentzenchr Would it be okay for me to work on this issue?

@lorentzenchr
Member Author

@OmarManzoor If the solution is clear to you, then yes.

@OmarManzoor
Contributor

@OmarManzoor If the solution is clear to you, then yes.

From what I understand, we need to raise a warning if y_pred does not sum to 1, and we need to remove eps as a parameter of this metric and instead use np.finfo(y_pred.dtype).eps.
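
A hypothetical sketch of that plan; the helper name and warning message are illustrative only, not the final scikit-learn implementation:

import warnings
import numpy as np

def _warn_if_not_normalized(y_pred):
    # Hypothetical helper: warn when rows of y_pred do not sum to 1.
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if not np.isclose(y_pred.sum(axis=1), 1.0).all():
        warnings.warn(
            "The y_pred values do not sum to one; this may become an error "
            "in a future release.",
            UserWarning,
        )
    return y_pred

# eps would then no longer be a user-facing parameter and would default to
# the machine epsilon of the prediction dtype:
# eps = np.finfo(y_pred.dtype).eps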

@lorentzenchr
Member Author

@OmarManzoor Yes. You are very welcome to go ahead, open a PR and link it with this issue. Keep in mind that eps needs a deprecation cycle, i.e. it should emit a warning when set, for 2 releases. Be prepared that a few concerns might be raised during review.

@OmarManzoor
Contributor

@OmarManzoor Yes. You are very welcome to go ahead, open a PR and link it with this issue. Keep in mind that eps needs a deprecation cycle, i.e. it should emit a warning when set, for 2 releases. Be prepared that a few concerns might be raised during review.

Sure, thank you.
