mean_squared_log_error - accepts targets with negative values · Issue #9963 · scikit-learn/scikit-learn · GitHub

mean_squared_log_error - accepts targets with negative values #9963

Closed
stefansimik opened this issue Oct 20, 2017 · 5 comments · Fixed by #9968

@stefansimik
stefansimik commented Oct 20, 2017

Description

I found a problem: the mean_squared_log_error() function does not raise an error for some inputs that contain negative values.

Steps/Code to Reproduce

Imagine a simple situation where a regression model predicts a negative value:

import numpy as np
from sklearn.metrics import mean_squared_log_error

y_true = np.array([1, 2, 3])
y_pred = np.array([1, -2, 3])  # one negative prediction
mean_squared_log_error(y_true, y_pred)

This happened when my regression model predicted a single negative value (among thousands of positive values).

Expected behavior:

The following exception should be raised:
ValueError("Mean Squared Logarithmic Error cannot be used when targets contain negative values.")
so that the caller is correctly informed about the underlying problem.

Exact location of the bug

The problematic code is in sklearn/metrics/regression.py (around line 313):

if not (y_true >= 0).all() and not (y_pred >= 0).all():
    raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                     "targets contain negative values.")

The condition is incorrect: it is True only when both y_true and y_pred contain a negative value, so it evaluates to False for the example above, where only y_pred does.
It should evaluate to True and raise the exception.

Suggested solution:

Just change the condition to:

if (y_true < 0).any() or (y_pred < 0).any():
    raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                     "targets contain negative values.")

so that it catches the problem whenever either y_true or y_pred contains a negative value.
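The difference between the two conditions can be checked directly on the arrays from the report. This is a minimal sketch that only evaluates the two boolean expressions quoted above; the names original_check and proposed_check are made up for illustration and do not appear in scikit-learn.

```python
import numpy as np

y_true = np.array([1, 2, 3])
y_pred = np.array([1, -2, 3])

# Condition as quoted in the report: True only if BOTH arrays contain a negative value.
original_check = not (y_true >= 0).all() and not (y_pred >= 0).all()

# Proposed condition: a negative value in EITHER array is enough.
proposed_check = bool((y_true < 0).any() or (y_pred < 0).any())

print(original_check)  # False: the ValueError is skipped
print(proposed_check)  # True: the ValueError would be raised
```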

@amueller
Member

Or we could change the "and" to an "or", right? But I guess the operator precedence is unclear, so a PR with a regression test is welcome. I think your version is clearer.
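Both suggestions should be equivalent: in Python, "not" binds more tightly than "and" and "or", so swapping the operator gives (not A) or (not B), which by De Morgan's law equals not (A and B). The sketch below checks this over all four truth-value combinations; a and b are stand-ins for "y_true has no negative values" and "y_pred has no negative values", and the variable names are illustrative only.

```python
from itertools import product

rows = []
for a, b in product([True, False], repeat=2):
    buggy = not a and not b      # original condition: needs negatives in both arrays
    with_or = not a or not b     # the same condition with "and" swapped for "or"
    should_raise = not (a and b) # intended rule: raise if either array has a negative
    rows.append((a, b, buggy, with_or, should_raise))

for row in rows:
    print(row)

# Count the combinations where the original condition disagrees with the intended rule.
mismatches = sum(1 for _, _, buggy, _, should in rows if buggy != should)
print(mismatches)
```

The "or" variant matches the intended rule in every row, while the original condition disagrees in the two rows where exactly one array contains a negative value.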

gustavodemari added a commit to gustavodemari/scikit-learn that referenced this issue Oct 21, 2017
jnothman pushed a commit that referenced this issue Oct 25, 2017
* fixes msle when the inputs is negative, resolves #9963

* adding some regression tests for msle metric
@loretoparisi
loretoparisi commented Mar 16, 2020

@amueller we have found that release 0.22.1 of 3 March

https://scikit-learn.org/stable/whats_new/v0.22.html#version-0-22-1

has caused this issue again (running the same code):

Traceback (most recent call last):
  File "run.py", line 44, in <module>
    run()
  File "run.py", line 40, in run
    xgboost_train_model(dataset, TRAIN_NAME=TRAIN_NAME, TRAIN_TYPE=TRAIN_TYPE, FEATURE_TYPE=FEATURE_TYPE)
  File "/root/src/train.py", line 191, in xgboost_train_model
    model_metrics = xgboost_test_model(X_test, y_test, model=multioutputregressor)
  File "/root/src/train.py", line 270, in xgboost_test_model
    mean_squared_log_error = np.round(metrics.mean_squared_log_error(y_test, y_predicted), 3)
  File "/usr/local/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 326, in mean_squared_log_error
    raise ValueError("Mean Squared Logarithmic Error cannot be used when "
ValueError: Mean Squared Logarithmic Error cannot be used when targets contain negative values.

Strangely, we have also added the constraint

sklearn<0.22.1

but it does not seem to work. Is it possible that this issue came out in a previous minor release? We will check the exact version we are using and report here.

[UPDATE]
This happens with

>>> sklearn.__version__
'0.21.3'

@loretoparisi
loretoparisi commented Mar 16, 2020

[UPDATE]
@amueller We have tested our code and found that scikit-learn does have this issue again with 0.22.1.
What we then did was force the installation of the requirements one by one, like this:

cat requirements.txt | xargs -n 1 -L 1 pip install

so that pip is forced to install sklearn<0.22.1, that is, the immediately preceding release, '0.21.3'. With the requirements pinned as

req == 0.21.3

the code now works, since 0.22.1 is no longer used.

Hope this helps.

@jnothman
Member

@loretoparisi the post here is referring to a bug where an error was not being raised that should have been raised. You are showing now that this error is correctly raised. Therefore the bug has been fixed.
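To make the distinction concrete, here is a rough sketch of the behavior considered correct after the fix: negative values in either input lead to a ValueError. The function msle_sketch and the helper raises_value_error are hypothetical pure-Python stand-ins written for this illustration, not scikit-learn's implementation.

```python
import math

def msle_sketch(y_true, y_pred):
    """Hypothetical pure-Python MSLE with the input check discussed in this issue."""
    if any(v < 0 for v in y_true) or any(v < 0 for v in y_pred):
        raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                         "targets contain negative values.")
    # Mean of squared differences between log(1 + true) and log(1 + predicted).
    squared = [(math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)

def raises_value_error(y_true, y_pred):
    """Return True if msle_sketch rejects the inputs with a ValueError."""
    try:
        msle_sketch(y_true, y_pred)
    except ValueError:
        return True
    return False

print(raises_value_error([1, 2, 3], [1, -2, 3]))  # True: a negative prediction is rejected
print(msle_sketch([1, 2, 3], [1, 2, 3]))          # 0.0: identical inputs give zero error
```

Under this reading, a traceback ending in that ValueError points to negative values in y_test or y_predicted rather than to a regression in the library.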

@m-zakeri
m-zakeri commented Sep 28, 2020

Why is the ValueError raised when (y_true < 0) or (y_pred < 0), given that log1p is used, which can handle negative numbers greater than -1?
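The premise about log1p can be checked with the standard library. The thread does not record the maintainers' rationale for rejecting every negative value, so the sketch below only illustrates the domain of log1p itself; log1p_or_none is a hypothetical helper written for this example.

```python
import math

def log1p_or_none(x):
    """Return log(1 + x), or None when x is outside the domain of math.log1p."""
    try:
        return math.log1p(x)
    except ValueError:
        return None

print(log1p_or_none(-0.5))  # defined: equals log(0.5), a negative number
print(log1p_or_none(-2.0))  # None: 1 + x is negative, so the logarithm is undefined
```

So values strictly between -1 and 0 are mathematically valid inputs to log1p, while the check discussed in this issue is stricter and rejects any negative value.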
