MRG add log loss (cross-entropy loss) to metrics by larsmans · Pull Request #2013 · scikit-learn/scikit-learn

Closed
wants to merge 4 commits into from

Conversation

larsmans
Member

Reissue of #1125 with a clearer implementation, more documentation and more tests.

@arjoly
Member
arjoly commented May 28, 2013

Can you add your function to the common tests?

@arjoly
Member
arjoly commented May 28, 2013

Can you add some narrative doc?

predictions. For a single sample with true label yₜ ∈ {0,1} and
estimated probability yₚ that yₜ = 1, the log loss is

-log P(yₜ|yₚ) = -(yₜ log(yₚ) + (1 - yₜ) log(1 - yₚ))
Member

Is the character ₜ intentional?

Member

The character doesn't render on my screen. I guess I'm missing some unicode font.

Member Author

I'll change it to yt and yp.
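
For illustration, a minimal worked example of the binary formula quoted above, using plain yt/yp names (the helper function is hypothetical, not part of the PR):

    import numpy as np

    def binary_log_loss(yt, yp):
        # hypothetical helper, not part of scikit-learn:
        # -log P(yt|yp) = -(yt*log(yp) + (1 - yt)*log(1 - yp))
        return -(yt * np.log(yp) + (1 - yt) * np.log(1 - yp))

    print(binary_log_loss(1, 0.9))  # ~0.105: confident and correct
    print(binary_log_loss(1, 0.1))  # ~2.303: confident and wrong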

@larsmans
Member Author

The common tests don't work for this loss since the input format is different; it's labels (or indicators, but usually 1-d) and predict_proba output (2-d).

@larsmans
Member Author

Added narrative docs. I hope they build correctly; I tested with rst2pdf because make doc is broken on my box.

y_pred = np.array([[0.5, 0.5], [0.1, 0.9], [0.01, 0.99],
                   [0.9, 0.1], [0.75, 0.25], [0.001, 0.999]])
loss = log_loss(y_true, y_pred)
assert_almost_equal(loss, 1.8817970689982668)
Member

AFAIK assert_almost_equal checks for 7 decimal places by default, hence it would probably make more sense to truncate the excess digits.

Member Author

This is a leftover from @ephes' original test, but you're right.
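
A quick illustration of the 7-decimal default mentioned above, with the value from the test (numpy.testing.assert_almost_equal uses decimal=7 by default):

    from numpy.testing import assert_almost_equal

    # both pass: at decimal=7 the tolerance is roughly 1.5e-7,
    # so the extra digits in the expected value carry no weight
    assert_almost_equal(1.8817970689982668, 1.8817971)
    assert_almost_equal(1.8817970689982668, 1.8817971, decimal=7)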

@arjoly
Member
arjoly commented May 28, 2013

Can you add log_loss to the list of classification metrics in doc/module/module_evalutation.rst?
Can you also add a link in the reference?

@arjoly
Member
arjoly commented May 28, 2013

The common tests don't work for this loss since the input format is different; it's labels (or indicators, but usually 1-d) and predict_proba output (2-d).

Too bad :-(

There is no test for the normalize option. If you want, you can add invariance tests for that kind of metric.

@arjoly
Member
arjoly commented May 28, 2013

I think that this might lead to a bug

y_true = [1, 0, 2]
y_pred = [[0, 0, 1], [0.6, 0.2, 0.2], [0.6, 0.1, 0.3]]
loss = log_loss(y_true, y_pred)

@GaelVaroquaux
Member

I think that this might lead to a bug

y_true = [1, 0, 2]
y_pred = [[0, 0, 1], [0.6, 0.2, 0.2], [0.6, 0.1, 0.3]]
loss = log_loss(y_true, y_pred)

I am indeed quite wary of the attempt to make input arguments too flexible in scikit-learn. It can easily open the door to errors or bugs going unnoticed.

@larsmans
Member Author

I'm not really sure what you mean. There is a test for normalization: the lines starting with

y_true *= 2
y_pred *= 2

extend the test before that to a larger (non-square!) matrix of probabilities, then check that the total loss is indeed 6 * the mean loss established earlier. See also the comment.

Or should log_loss only accept indicator matrices? Those are not objects we usually pass around, so I stuck in the binarizer in between.
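
For context, the invariance being tested can be sketched as follows, assuming the normalize keyword discussed here (sum of per-sample losses vs. their mean):

    import numpy as np
    from sklearn.metrics import log_loss

    y_true = [0, 1, 1, 0]
    y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])

    mean_loss = log_loss(y_true, y_pred)                    # normalize=True (default)
    total_loss = log_loss(y_true, y_pred, normalize=False)  # summed over samples
    assert np.isclose(total_loss, len(y_true) * mean_loss)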

@GaelVaroquaux
Member

Sorry, @larsmans : this remark was completely out of place and not useful. I was reviewing other PRs and dealing with mail in parallel, and simply got confused.

@larsmans
Member Author

Ok! Then @arjoly what exactly do you mean? And regarding a link, Bishop's book is not available online (I had to borrow it from the library today -- haven't done that in quite some time).

@arjoly
Member
arjoly commented May 29, 2013

Ok! Then @arjoly what exactly do you mean? And regarding a link, Bishop's book is not available online (I had to borrow it from the library today -- haven't done that in quite some time).

Sorry, I was wrong. I thought it would raise an issue with np.log, but you clip everything.
Maybe add a test to check that case?

edit: there is a test
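
Roughly, the clipping referred to here looks like this (the eps value is an assumption for illustration, not necessarily what the PR uses):

    import numpy as np

    y_pred = np.array([[0.0, 0.0, 1.0], [0.6, 0.2, 0.2], [0.6, 0.1, 0.3]])

    eps = 1e-15  # assumed clipping threshold
    clipped = np.clip(y_pred, eps, 1 - eps)
    # log(clipped) is finite everywhere, so a predicted probability of exactly 0
    # for the true class gives a very large but finite loss instead of -inf/NaN
    assert np.isfinite(np.log(clipped)).all()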


Parameters
----------
y_true : array-like or list of labels or label indicator matrix
Member

I think you mean y_true : array-like.

Member Author

Actually anything that goes into a LabelBinarizer is accepted. The question is whether we should advertise that (probably not without a test for all cases).

Member

Agreed!

Member

But "list of labels" might not be the right term, even if it's used elsewhere. Seems ambiguous to me.

@arjoly
Member
arjoly commented May 29, 2013

Just to let you know that if one label is missing, an error is raised: log_loss([0, 0, 1], [[0, 0, 1], [0.6, 0.2, 0.2], [0.6, 0.1, 0.3]]). Not a bad behavior.

@mblondel
Member

I would like to review this PR over the weekend.

-------
loss : float

References
Member

Could we get an example here?

@jnothman
Member
jnothman commented Jun 2, 2013

When new metrics are added, should they not also be added to scorer.py?

@mblondel
Member
mblondel commented Jun 2, 2013

For the record, I have also implemented multiclass log loss here:
https://github.com/mblondel/lightning/blob/master/lightning/loss_fast.pyx#L158

It takes the output of decision_function as input. It doesn't implement the same formula as in this PR but I would assume that the two should give the same results.

@larsmans
Member Author

@jnothman A scorer would get predict output, right? That doesn't work here since predict_proba output is expected.

@jnothman
Member

A major intention of the Scorer abstraction is that it takes (est, X, y) rather than (y_true, y_pred) and so can call whatever it likes. The only implementation so far will use predict_proba (or decision_function) if its needs_threshold constructor argument is True.
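
In other words, a scorer is a callable over the estimator and the data rather than over precomputed predictions; a rough hand-rolled sketch (an illustration only, not the actual Scorer class):

    from sklearn.metrics import log_loss

    def neg_log_loss_scorer(estimator, X, y):
        # illustration: takes (estimator, X, y), calls predict_proba itself,
        # and negates the loss so that greater is better, as grid search expects
        proba = estimator.predict_proba(X)
        return -log_loss(y, proba)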

@amueller
Member
amueller commented Jul 1, 2013

This definitely needs to be added to scorer.py. @larsmans allowing both outputs is the main reason for having the scorer interface.

@larsmans
Member Author
larsmans commented Jul 1, 2013

The whole Scorer thing mostly went by me because I was busy, but I see why we have it now :)
I'll make a scorer out of this.

@larsmans
Member Author
larsmans commented Jul 1, 2013

I can make a Scorer, but only if I modify that class. This function won't work with a decision_function, it really needs predict_proba.

@amueller
Member
amueller commented Jul 1, 2013

The problem is then that the output in GridSearchCV is also the negative score. That is somewhat suboptimal.
It would be great to get rid of this tie between GridSearchCV and the scorers, but I think printing negative likelihoods is pretty counter-intuitive.

@larsmans
Member Author
larsmans commented Jul 1, 2013

True. Another option would be to return a pair from Scorer.__call__, say (-1, score) or (1, score) where <0 means "minimize this" (greater_is_better=False) and >=0 means "maximize this". (Or the strings "min" and "max", or something like that.) If we do it this way, it becomes much easier to write a scorer as a simple Python function instead of a class or an actual Scorer-typed object.
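
A minimal sketch of that proposal as a plain function (purely hypothetical; this convention was never adopted):

    from sklearn.metrics import log_loss

    def log_loss_scorer(estimator, X, y):
        # hypothetical: a plain function instead of a Scorer object; the first
        # element of the pair (< 0 here) tells the caller to minimize the second
        return (-1, log_loss(y, estimator.predict_proba(X)))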

@larsmans
Member Author
larsmans commented Jul 1, 2013

Shall I send a separate PR with the latter idea?

@jnothman
Member
jnothman commented Jul 1, 2013

@larsmans, I think this might be part of the solution to getting Scorers that return multiple scores (#1837, #1850): we need to separate the concept of the (vector/dict of) scores as returned to the user and the (scalar) objective to be maximised.

@jnothman
Member
jnothman commented Jul 1, 2013

(That's fine conceptually, of course, but it's harder to get the API right. And I'd be very happy to see a PR that does! I agree that checking the greater_is_better attribute of a callable smells fishy.)

@larsmans
Member Author

Ok, rebased on master, stuff seems to work. I renamed the scorer "log_likelihood" because that's actually what it's computing. Should I change the log_loss function as well? It makes the connection clearer, but users would get a metric that returns strictly negative values, which might be confusing.

I can haz reviews?

@amueller
Member

I'm unsure about the naming issue :-/

@@ -1031,6 +1077,7 @@ Scoring Function
'accuracy' :func:`sklearn.metrics.accuracy_score`
'average_precision' :func:`sklearn.metrics.average_precision_score`
'f1' :func:`sklearn.metrics.f1_score`
'log_likelihood' :func:`sklearn.metrics.log_loss`
Member

We should put a comment that the scorer returns the opposite of the loss function. Or introduce an alias function named log_likelihood_score(y_true, y_pred) that returns - log_loss(y_true, y_pred).

I am fine with having both functions, as in practice you sometimes want one or the other, and it makes the code more readable to avoid having to throw a - in the middle of an expression. We would also increase google-ability by having both, since the reference API apparently has a very good page-rank on such keywords.
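
The suggested alias would amount to a one-liner like this sketch (as the later discussion shows, it was ultimately dropped in favour of plain log_loss):

    from sklearn.metrics import log_loss

    def log_likelihood_score(y_true, y_pred):
        # hypothetical alias: the greater-is-better counterpart, simply the negated log loss
        return -log_loss(y_true, y_pred)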

Member

Looks like we should add _score, _loss or _error to all the score strings, to be clear.

What do you think?

Member Author

You mean log_likelihood_score? That's pretty verbose. I'd like to just say scoring="log_likelihood". Users can look this up.

Member

How about log_loss? I'd really prefer if the metric function and the scorer name were consistent.

Member

+1 for @larsmans' proposal: the scoring name should be short, "log_likelihood", and the function name should be log_likelihood_score to follow the convention.

@larsmans
Member Author

Ok, any votes for adding a log_likelihood function as well?

... or rather, the other way around.
@ogrisel
Member
ogrisel commented Jul 26, 2013

Looks good. +1 for merging if the tests pass once it is rebased with the conflicts fixed.

@larsmans
Member Author

Ok, merged by rebase!

@larsmans larsmans closed this Jul 26, 2013
@GaelVaroquaux
Member

Ok, merged by rebase!

Coming late to the battle (things have been going too fast for me). I have a question: in the name 'log_likelihood', what is the model underlying this likelihood? I am not thrilled by this name, as I don't find it very explicit.

@mblondel
Member

I agree with @GaelVaroquaux. One can compute the log likelihood of any model.

@larsmans larsmans deleted the log-loss branch July 27, 2013 07:55
@larsmans
Member Author

It's the log-likelihood of a categorical distribution, i.e. a classifier. We could still rename it classifier_log_likelihood_score, but then I'd rather revert the last commit and call everything log-loss.

@mblondel
Member

How do you plan to handle the hinge loss, for instance? (sorry I haven't had the time to follow the discussion regarding the new API)

@arjoly
Member
arjoly commented Jul 27, 2013

@larsmans can you add your name to the authors of the metrics module? :-)

Thanks !!!

@GaelVaroquaux
Member

It's the log-likelihood of a categorical distribution, i.e. a classifier.

Isn't it the same thing as the multinomial distribution? It's a name that I know better. I would personally feel more at ease with 'multinomial_loss'.

@ogrisel
Member
ogrisel commented Jul 27, 2013

"Log loss" is the name from Bishop so I think it's pretty standard. The problem is the name of the "score" (which is the negative of the log loss). I see two options:

  • just keep the log_loss function, remove the log_likelihood_score function.
  • rename the log_likelihood_score to something more explicit like classification_log_likelihood_score.

Then we need to agree on the name of the scoring string to be consistent with other scoring names which are all positive (greater is better):

  • 'accuracy' sklearn.metrics.accuracy_score
  • 'average_precision' sklearn.metrics.average_precision_score
  • 'f1' sklearn.metrics.f1_score
  • 'precision' sklearn.metrics.precision_score
  • 'recall' sklearn.metrics.recall_score
  • 'roc_auc' sklearn.metrics.auc_score

@GaelVaroquaux
Member

"Log loss" is the name from Bishop so I think it's pretty standard.

I am fine with log_loss

@larsmans
Member Author

Ok, reverting in a bit.

@larsmans
Member Author

Hadn't seen @ogrisel's comment. Actually we have the same problem with MSE, which is also a loss function with a scorer that flips the sign.
