FIX adapt epsilon value depending on the dtype of the input #24354
Conversation
@glemaitre Made the changes as per your comment.
Note to other maintainers: we should not forget to add @gsiisg as co-author when merging the PR.
doc/whats_new/v1.2.rst
Outdated
- |Fix| :func:`metrics.logloss` takes "auto" as default eps value and it will be equal to
  eps value of the `y_pred` if `y_pred` is numpy float else it will be 1e-15. This change was
  made to be able to handle float16 and float32 numpy arrays
Suggested change:
- |Fix| automatically set `eps` in :func:`metrics.logloss` depending on the input
  arrays. `eps` was previously too small by default when passing lower precision
  floating point arrays.
We should also add an entry in the "Change models" section stating that we switch from `1e-15` to `np.finfo(np.float64).eps` by default.
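For context, the magnitudes involved differ sharply by dtype (values from NumPy's `finfo`; a quick illustration, not code from this PR):

```python
import numpy as np

# Machine epsilon per float dtype, versus the old hard-coded default of 1e-15.
print(np.finfo(np.float64).eps)  # ~2.22e-16 (the proposed new default for float64)
print(np.finfo(np.float32).eps)  # ~1.19e-07
print(np.finfo(np.float16).eps)  # ~9.77e-04
print(1e-15)                     # the previous hard-coded default
```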
Where is the "Change models" section in the codebase?
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…gloss_float32_fix
Thanks for the PR @Safikh! A few small notes.
y_true = [[0, 1]]
y_score = np.array([1], dtype=dtype)
loss = log_loss(y_true, y_score, eps="auto")
assert_allclose(loss, 0.000977, atol=1e-3)
It would be clearer if this value were computed rather than hard-coded.
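For instance, a sketch of a computed assertion, assuming `eps="auto"` resolves to `np.finfo(dtype).eps` as this PR proposes (the hard-coded `0.000977` is exactly `np.finfo(np.float16).eps`); the inputs here are illustrative, not the PR's actual test:

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.metrics import log_loss

dtype = np.float16  # the lowest-precision dtype under test
y_true = [0, 1]
y_score = np.array([0, 1], dtype=dtype)

# With eps="auto", predictions are clipped to [eps, 1 - eps] where
# eps = np.finfo(dtype).eps, so each sample contributes -log(1 - eps).
loss = log_loss(y_true, y_score, eps="auto")
expected = -np.log(1 - np.finfo(dtype).eps)
assert_allclose(loss, expected, rtol=1e-2)
```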
doc/whats_new/v1.2.rst
Outdated
- |Fix| automatically set `eps` in :func:`metrics.logloss` depending on the input
  arrays. `eps` was previously too small by default when passing lower precision
  floating point arrays.
  :pr:`24354` by :user:`Safiuddin Khaja <Safikh>` and
  :user:`gsiisg <gsiisg>`
The changelog entry should reflect that you are adding a new keyword option to `eps` and changing the default value of `eps`.
…gloss_float32_fix
Big thanks to @Safikh and @glemaitre and all contributors!
I'm facing a weird bug where if I set the …
I think we should avoid hard-coding dtype=np.float64 in check_array(); it would blow up the memory used if the original input was np.float16/32, wouldn't it? And if y_pred were cast to 64-bit, then we wouldn't have a problem with eps=1e-15 in the first place.
You're right. I think it should instead be …
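As a hedged illustration of the dtype-list behavior the thread converges on: when `check_array` is given a list of dtypes, conversion only happens if the input dtype is not in the list, so a low-precision input is kept as-is and no upcast (or memory blow-up) occurs:

```python
import numpy as np
from sklearn.utils import check_array

y_pred16 = np.array([0.1, 0.9], dtype=np.float16)

# With a dtype *list*, the input is only converted if its dtype is absent
# from the list; float16 is in the list, so it is preserved.
out = check_array(
    y_pred16, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
print(out.dtype)  # float16
```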
…t-learn into logloss_float32_fix
…gloss_float32_fix
sklearn/metrics/_classification.py
Outdated
y_pred = check_array(
    y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps * 1.0001 if eps == "auto" else eps
What is the reason for multiplying by `1.0001`? This looks really arbitrary.
The test `sklearn/metrics/tests/test_common.py::test_not_symmetric_metric` fails for eps. I have not been able to identify why it is happening, but a very minute change to eps seems to fix it.
Yep, the test is not adapted for `log_loss`. Indeed, the loss is symmetric when it contains only 0/1 values but not otherwise. Basically, since it takes `y_proba` and `y_pred`, the tests make little sense. We should correct this.
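To illustrate the symmetry point with a hypothetical helper (`xent` below is just binary cross-entropy written out, not a scikit-learn function): swapping the two arguments gives the same value only when both arrays contain hard 0/1 labels:

```python
import numpy as np

def xent(p, q, eps=np.finfo(np.float64).eps):
    # Binary cross-entropy of q under targets p, with log_loss-style clipping.
    q = np.clip(q, eps, 1 - eps)
    return -np.mean(p * np.log(q) + (1 - p) * np.log(1 - q))

# Hard 0/1 labels: both argument orders agree.
a = np.array([0.0, 1.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
print(xent(a, b), xent(b, a))  # equal

# Soft probabilities: the two orders differ.
p = np.array([0.2, 0.7, 0.9])
q = np.array([0.5, 0.5, 0.5])
print(xent(p, q), xent(q, p))  # ~0.693 vs ~0.967
```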
So, should I remove log_loss from the symmetric metrics? Then there would be no need for modifying the eps?
Yes, we need to remove the loss from the symmetric metrics and remove this `1.0001`.
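A minimal sketch of what the code could look like once the fudge factor is dropped (`_resolve_eps` is a hypothetical stand-in for the relevant part of `log_loss`, not the merged diff):

```python
import numpy as np
from sklearn.utils import check_array

def _resolve_eps(y_pred, eps="auto"):
    # Keep the input precision instead of forcing float64.
    y_pred = check_array(
        y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
    )
    # Derive eps from the input dtype directly, without the 1.0001 multiplier.
    if eps == "auto":
        eps = np.finfo(y_pred.dtype).eps
    return np.clip(y_pred, eps, 1 - eps), eps

clipped, eps = _resolve_eps(np.array([0.0, 1.0], dtype=np.float16))
print(eps)  # ~0.000977 for float16
```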
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…gloss_float32_fix
doc/whats_new/v1.2.rst
Outdated
- |Fix| add a `"auto"` option to `eps` in :func:`metrics.logloss`.
  This option will automatically set the `eps` value depending on the data
  type `y_pred`.
The wording here still needs to be more explicit.
Also, at this point I would consider this an enhancement which also happens to fix a bug. Wondering what the maintainers think.
Suggested change:
- |Enhancement| Adds an `"auto"` option to `eps` in :func:`metrics.logloss`.
  This option will automatically set the `eps` value depending on the data
  type of `y_pred`. In addition, the default value of `eps` is changed from
  `1e-15` to the new `"auto"` option.
Fair enough.
Hi @glemaitre, I think this PR is complete. Is there anything that needs to be changed?
We need to remove this `1.0001`.
sklearn/metrics/_classification.py
Outdated
y_pred = check_array(
    y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps * 1.0001 if eps == "auto" else eps
Yes, we need to remove the loss from the symmetric metrics and remove this `1.0001`.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Overall looks good. I had a couple of small wording nits, and a concern regarding testing the `np.float16` dtype. If those are addressed then this should be ready to merge.
…t-learn into logloss_float32_fix
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
LGTM. @glemaitre feel free to merge if the changes are still acceptable to you.
Thanks @Safikh. Merging.
Thanks everyone!
Reference Issues/PRs
Fixes #24315
What does this implement/fix? Explain your changes.
Change the default epsilon value in `log_loss` from `1e-15` to `"auto"`, which is equal to the eps of `y_pred`'s dtype if `y_pred` is a NumPy float array; otherwise it defaults to `1e-15` as before.
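For background, a minimal sketch of the failure mode being fixed (a hypothetical repro in the spirit of #24315, not code from this PR): with a float32 input, the old clipping bound `1 - 1e-15` rounds back to exactly `1.0`, so a fully confident prediction still hits `log(0)`:

```python
import numpy as np

y_pred32 = np.array([0.1, 1.0], dtype=np.float32)

# Old behavior: clip with the hard-coded eps=1e-15, keeping float32 precision.
clipped = np.clip(y_pred32, 1e-15, 1 - 1e-15).astype(np.float32)
print(clipped[1])              # 1.0 -- the upper bound rounded away in float32
print(np.log(1 - clipped[1]))  # -inf, which blows up the log loss
```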
Any other comments?