MRG add log loss (cross-entropy loss) to metrics #2013
Conversation
Can you add your function to the common tests?
Can you add some narrative doc?
predictions. For a single sample with true label yₜ ∈ {0, 1} and
estimated probability yₚ that yₜ = 1, the log loss is

    -log P(yₜ|yₚ) = -(yₜ log(yₚ) + (1 - yₜ) log(1 - yₚ))
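A minimal NumPy sketch of this formula, averaged over samples (an illustration only, not the PR's implementation; the clipping constant is my own choice to avoid log(0)):

```python
import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss: -(yt*log(yp) + (1-yt)*log(1-yp))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so hard 0/1 predictions don't produce log(0).
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1 - y_true) * np.log(1 - y_prob))))

print(binary_log_loss([1, 0], [0.9, 0.1]))  # each sample costs -log(0.9)
```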
Is the character ₜ intentional?
The character doesn't render on my screen. I guess I'm missing some unicode font.
I'll change it to yt and yp.
The common tests don't work for this loss since the input format is different; it's labels (or indicators, but usually 1-d) and
Added narrative docs. I hope they build correctly; I tested with
```python
y_pred = np.array([[0.5, 0.5], [0.1, 0.9], [0.01, 0.99],
                   [0.9, 0.1], [0.75, 0.25], [0.001, 0.999]])
loss = log_loss(y_true, y_pred)
assert_almost_equal(loss, 1.8817970689982668)
```
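For reference, a self-contained version of that check. The quoted hunk does not show `y_true`; `[0, 0, 0, 1, 1, 1]` is an assumed value that reproduces the quoted loss, and the loss is computed inline rather than via scikit-learn:

```python
import numpy as np

# Assumed labels -- the quoted hunk does not show y_true.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([[0.5, 0.5], [0.1, 0.9], [0.01, 0.99],
                   [0.9, 0.1], [0.75, 0.25], [0.001, 0.999]])
# Mean negative log of the probability assigned to each true class.
loss = -np.mean(np.log(y_pred[np.arange(len(y_true)), y_true]))
print(loss)
```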
AFAIK assert_almost_equal checks for 7 decimal places by default, hence it would probably make more sense to truncate the excess digits.
This is a leftover from @ephes' original test, but you're right.
Can you add it to the list of classification metrics in
Too bad :-( There is no test for the normalize option. If you want, you can add invariance tests for those kinds of metrics.
I think that this might lead to a bug
I am indeed quite wary of the attempt to make input arguments too
I'm not really sure what you mean. There is a test for normalization: the lines starting with
extend the test before that to a larger (non-square!) matrix of probabilities, then check that the total loss is indeed 6 * the mean loss established earlier. See also the comment. Or should
Sorry, @larsmans: this remark was completely out of place and not useful. I was reviewing other PRs and dealing with mail in parallel, and simply got confused.
Ok! Then @arjoly, what exactly do you mean? And regarding a link, Bishop's book is not available online (I had to borrow it from the library today -- haven't done that in quite some time).
Sorry, I was wrong. I thought it would raise an issue. Edit: there is a test
Parameters
----------
y_true : array-like or list of labels or label indicator matrix
I think you mean `y_true : array-like`.
Actually anything that goes into a LabelBinarizer is accepted. The question is whether we should advertise that (probably not without a test for all cases).
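For readers unfamiliar with it, LabelBinarizer converts a label vector into a 0/1 indicator matrix. A rough NumPy approximation of that multiclass conversion (the real class also handles the binary case and other input types differently):

```python
import numpy as np

def to_indicator(y):
    # One column per class, in sorted class order; 1 where the sample
    # carries that label. Sketch of LabelBinarizer's multiclass case.
    classes = np.unique(y)
    return (np.asarray(y)[:, None] == classes[None, :]).astype(int)

print(to_indicator([0, 2, 1, 2]))
```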
Agreed!
But "list of labels" might not be the right term, even if it's used elsewhere. Seems ambiguous to me.
Just to let you know that if one label is missing, an error is raised: `log_loss([0, 0, 1], [[0, 0, 1], [0.6, 0.2, 0.2], [0.6, 0.1, 0.3]])`. Not a bad behavior.
I would like to review this PR over the weekend.
-------
loss : float

References
Could we get an example here?
When new metrics are added, should they not also be added to
For the record, I have also implemented multiclass log loss here: It takes the output of decision_function as input. It doesn't implement the same formula as in this PR but I would assume that the two should give the same results.
@jnothman A scorer would get
A major intention of the
This definitely needs to be added to
The whole
I can make a
The problem is then that the output in
True. Another option would be to return a pair from
Shall I send a separate PR with the latter idea?
(That's fine conceptually, of course, but it's harder to get the API right. And I'd be very happy to see a PR that does! I agree that checking the
Ok, rebased on master, stuff seems to work. I renamed the scorer. I can haz reviews?
I'm unsure about the naming issue :-/
@@ -1031,6 +1077,7 @@ Scoring Function
    'accuracy'            :func:`sklearn.metrics.accuracy_score`
    'average_precision'   :func:`sklearn.metrics.average_precision_score`
    'f1'                  :func:`sklearn.metrics.f1_score`
    'log_likelihood'      :func:`sklearn.metrics.log_loss`
We should put a comment that the scorer returns the opposite of the loss function, or introduce an alias function named `log_likelihood_score(y_true, y_pred)` that returns `-log_loss(y_true, y_pred)`.
I am fine with having both functions, as in practice sometimes you want one or the other, and that makes the code more readable by avoiding having to throw a `-` in the middle of an expression. We would also increase google-ability by having both, as the reference API apparently has a very good page-rank on such keywords.
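The alias under discussion would be a one-liner. A hypothetical sketch: `log_likelihood_score` is only the proposed name, not an existing function, and the binary `log_loss` below is a NumPy stand-in for `sklearn.metrics.log_loss`:

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    # Binary-case stand-in for sklearn.metrics.log_loss.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1 - y_true) * np.log(1 - y_prob))))

def log_likelihood_score(y_true, y_prob):
    # Proposed alias: greater is better, simply the negated loss.
    return -log_loss(y_true, y_prob)

print(log_likelihood_score([1, 0], [0.9, 0.2]))
```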
Looks like we should add a `_score`, `_loss` or `_error` suffix to make all `score` strings clear. What do you think?
You mean `log_likelihood_score`? That's pretty verbose. I'd like to just say `scoring="log_likelihood"`. Users can look this up.
How about log_loss? I'd really prefer if the metric function and the scorer
name were consistent.
+1 with @larsmans' proposal: the name of the scoring should be short, `"log_likelihood"`, and the function name should be `log_likelihood_score` to follow the convention.
Ok, any votes for adding a
... or rather, the other way around.
Looks good. +1 for merging once it's rebased, conflicts are fixed, and the tests pass.
Ok, merged by rebase!
Coming late to the battle (things have been going too fast for me). I
I agree with @GaelVaroquaux. One can compute the log likelihood of any
It's the log-likelihood of a categorical distribution, i.e. a classifier. We could still rename it
How do you plan to handle the hinge loss, for instance? (sorry I haven't
@larsmans, can you add your name to the authors of the metrics module? :-) Thanks!!!
Isn't it the same thing as a multinomial distribution? It's a name that I
"Log loss" is the name from Bishop, so I think it's pretty standard. The problem is the name of the "score" (which is the negative of the log loss). I see two options:
Then we need to agree on the name of the scoring string to be consistent with other scoring names, which are all positive (greater is better):
I am fine with log_loss
Ok, reverting in a bit.
Hadn't seen @ogrisel's comment. Actually we have the same problem with MSE, which is also a loss function with a scorer that flips the sign.
Reissue of #1125 with a clearer implementation, more documentation and more tests.