FIX adapt epsilon value depending on the dtype of the input by Safikh · Pull Request #24354 · scikit-learn/scikit-learn · GitHub

FIX adapt epsilon value depending on the dtype of the input #24354


Merged · 36 commits into scikit-learn:main on Nov 10, 2022

Conversation

@Safikh (Contributor) commented Sep 4, 2022

Reference Issues/PRs

Fixes #24315

What does this implement/fix? Explain your changes.

Change the default epsilon value in log_loss from 1e-15 to "auto", which equals the machine epsilon of y_pred's dtype when y_pred is a numpy float array, and otherwise falls back to 1e-15 as before.
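
A minimal sketch of that dispatch (hypothetical helper name; not the actual PR diff):

import numpy as np

def resolve_eps(y_pred, eps="auto"):
    # eps="auto" follows the precision of y_pred's dtype when it is a numpy
    # floating array; any other input keeps the old 1e-15 default.
    if eps != "auto":
        return eps
    if isinstance(y_pred, np.ndarray) and np.issubdtype(y_pred.dtype, np.floating):
        return np.finfo(y_pred.dtype).eps
    return 1e-15

print(resolve_eps(np.array([0.5], dtype=np.float32)))  # ~1.19e-07
print(resolve_eps([0.1, 0.9]))                         # 1e-15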


@glemaitre changed the title from "Change default epsilon in logloss metric from 1e-15 to 1e-7" to "FIX adapt epsilon value depending on the dtype of the input" on Sep 5, 2022
@Safikh (Contributor, Author) commented Sep 5, 2022

@glemaitre Made the changes as per your comment.

@glemaitre (Member) left a comment:

Note to other maintainers:
We should not forget to add @gsiisg as Co-Author when merging the PR.

Comment on lines 283 to 285
- |Fix| :func:`metrics.logloss` takes "auto" as default eps value and it will be equal to
eps value of the `y_pred` if `y_pred` is numpy float else it will be 1e-15. This change was
made to be able to handle float16 and float32 numpy arrays
Member:

Suggested change
- |Fix| :func:`metrics.logloss` takes "auto" as default eps value and it will be equal to
eps value of the `y_pred` if `y_pred` is numpy float else it will be 1e-15. This change was
made to be able to handle float16 and float32 numpy arrays
- |Fix| automatically set `eps` in :func:`metrics.logloss` depending on the input
arrays. `eps` was previously too small by default when passing lower precision
floating point arrays.

Member:

We should also add an entry in the "Change models" section stating that we switch from 1e-15 to np.finfo(np.float64).eps by default.
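
For reference, the two defaults differ by roughly a factor of 4.5:

import numpy as np

# New default for float64 inputs vs. the old hard-coded 1e-15.
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16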

Contributor (author):

Where is the "Change models" section in the codebase?

@Micky774 (Contributor) left a comment:

Thanks for the PR @Safikh! A few small notes.

y_true = [[0, 1]]
y_score = np.array([1], dtype=dtype)
loss = log_loss(y_true, y_score, eps="auto")
assert_allclose(loss, 0.000977, atol=1e-3)
@Micky774 (Contributor) commented Sep 6, 2022:

It would be clearer if this value were computed rather than hard-coded.
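
For example, something along these lines (a sketch, not the PR's final test; float16's machine epsilon is the value hard-coded above):

import numpy as np

dtype = np.float16
eps = np.finfo(dtype).eps   # ~0.000977 for float16
# A perfect prediction is clipped to 1 - eps, so the loss is -log(1 - eps),
# which is approximately eps itself.
expected = -np.log1p(-eps)
print(expected)             # ~0.000977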

Comment on lines 283 to 287
- |Fix| automatically set `eps` in :func:`metrics.logloss` depending on the input
arrays. `eps` was previously too small by default when passing lower precision
floating point arrays.
:pr:`24354` by :user:`Safiuddin Khaja <Safikh>` and
:user:`gsiisg <gsiisg>`
Contributor:

The changelog entry should reflect that you are adding a new keyword option to eps and changing the default value of eps.

@gsiisg commented Sep 6, 2022

Big thanks to @Safikh and @glemaitre and all contributors!

This is my first time contributing to scikit-learn, so I wasn't familiar with the process and created some confusion with a second pull request; I'll stick with this one from now on.

I just want to add a note to the eps documentation: all y_pred input passes through sklearn.utils.check_array, which casts plain Python int/float to int64/float64. So even when the dtype is not explicit, the input is treated with 64-bit precision unless specified otherwise (e.g. np.float16 or np.float32). At first I was worried that y_pred.dtype would error out for input like [1], but I later realized it is cast to array([1]), which is 64-bit by default.
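
A quick illustration (not part of the PR) of that casting behavior:

import numpy as np
from sklearn.utils import check_array

# Plain Python lists are promoted to 64-bit arrays, so
# np.finfo(y_pred.dtype).eps is well defined even for list input.
y_pred = check_array([0.1, 0.9], ensure_2d=False)
print(y_pred.dtype)                # float64
print(np.finfo(y_pred.dtype).eps)  # 2.220446049250313e-16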

@Safikh (Contributor, Author) commented Sep 7, 2022

I'm facing a weird bug: if I set eps to the dtype's eps, the test sklearn/metrics/tests/test_common.py::test_not_symmetric_metric fails. If I set it to something else that is not a multiple of eps, it works.

@gsiisg left a comment:

I think we should avoid hard-coding dtype=np.float64 in check_array(); it would blow up the memory used if the original input was np.float16/32, wouldn't it? And if y_pred is cast to 64-bit, then we wouldn't have a problem with eps=1e-15 in the first place.

@Micky774 (Contributor) commented Sep 8, 2022

> I think we should avoid hard-coding dtype=np.float64 in check_array(); it would blow up the memory used if the original input was np.float16/32, wouldn't it? And if y_pred is cast to 64-bit, then we wouldn't have a problem with eps=1e-15 in the first place.

You're right. I think it should instead be dtype=[np.float64, np.float32, np.float16], in which case anything other than those floating types is converted to np.float64. In the int{32, 64} case it converts to np.float64, which should be fine.
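
A small illustration (not from the PR) of check_array's dtype-list semantics:

import numpy as np
from sklearn.utils import check_array

allowed = [np.float64, np.float32, np.float16]
# Dtypes already in the list are preserved as-is ...
kept = check_array(np.array([0.5], dtype=np.float16), ensure_2d=False, dtype=allowed)
# ... anything else is converted to the first entry, np.float64.
converted = check_array(np.array([1], dtype=np.int32), ensure_2d=False, dtype=allowed)
print(kept.dtype)       # float16
print(converted.dtype)  # float64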

y_pred = check_array(
    y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps * 1.0001 if eps == "auto" else eps
Member:

What is the reason for multiplying by 1.0001? This looks really arbitrary.

@Safikh (Contributor, Author) commented Sep 12, 2022:

The test sklearn/metrics/tests/test_common.py::test_not_symmetric_metric fails when eps is exactly the dtype's eps. I have not been able to identify why it happens, but a very minute change to eps seems to fix it.

Member:

Yep, the test is not adapted for log_loss. Indeed, the loss is symmetric when it contains only 0/1 values, but not otherwise. Basically, since it takes y_true and y_proba, the test makes little sense.

We should correct this.
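
An illustration (not from the test suite) of the 0/1 case where symmetry does hold:

import numpy as np
from sklearn.metrics import log_loss

# With hard 0/1 values, both argument orders clip to the same eps terms,
# so swapping y_true and y_pred gives the same loss.
y1 = np.array([0, 1, 1, 0])
y2 = np.array([1, 1, 0, 0])
print(np.isclose(log_loss(y1, y2), log_loss(y2, y1)))  # True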

Contributor (author):

So, should I remove log_loss from the symmetric metrics? Then there would be no need for modifying the eps?

Member:

Yes, we need to remove the loss from the symmetric metrics and remove this 1.0001.

@glemaitre glemaitre self-requested a review September 12, 2022 07:53
Safikh and others added 5 commits September 12, 2022 20:16
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Comment on lines 321 to 323
- |Fix| add a `"auto"` option to `eps` in :func:`metrics.logloss`.
This option will automatically set the `eps` value depending on the data
type `y_pred`.
@Micky774 (Contributor) commented Sep 12, 2022:

The wording here still needs to be more explicit.

Also, at this point I would consider this an enhancement which also happens to fix a bug. Wondering what the maintainers think.

Suggested change
- |Fix| add a `"auto"` option to `eps` in :func:`metrics.logloss`.
This option will automatically set the `eps` value depending on the data
type `y_pred`.
- |Enhancement| Adds an `"auto"` option to `eps` in :func:`metrics.logloss`.
This option will automatically set the `eps` value depending on the data
type of `y_pred`. In addition, the default value of `eps` is changed from
`1e-15` to the new `"auto"` option.

Member:

Fair enough.

@glemaitre glemaitre self-requested a review September 13, 2022 13:15
@Safikh (Contributor, Author) commented Sep 29, 2022

Hi @glemaitre, I think this PR is complete. Is there anything that needs to be changed?

@glemaitre (Member) left a comment:

We need to remove this 1.0001.

y_pred = check_array(
    y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps * 1.0001 if eps == "auto" else eps
Member:

Yes, we need to remove the loss from the symmetric metrics and remove this 1.0001.
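
For reference, a sketch of the snippet once the 1.0001 factor is dropped, as requested (assuming no other changes):

y_pred = check_array(
    y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps if eps == "auto" else eps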

@glemaitre (Member) left a comment:

LGTM. Thanks @Safikh.

@Micky774 @ogrisel, do you want to have a look?

@Micky774 (Contributor) left a comment:

Overall looks good. I had a couple of small wording nits, and a concern regarding testing the np.float16 dtype. If those are addressed then this should be ready to merge.

@Micky774 (Contributor) left a comment:

LGTM. @glemaitre, feel free to merge if the changes are still acceptable to you.

@glemaitre glemaitre merged commit f8986ee into scikit-learn:main Nov 10, 2022
@glemaitre (Member) commented:

Thanks @Safikh. Merging.

@gsiisg commented Nov 10, 2022

Thanks everyone!

@Safikh Safikh deleted the logloss_float32_fix branch November 11, 2022 04:40

Successfully merging this pull request may close these issues.

log_loss giving nan when input is np.float32 and eps is default