[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API replace mean_squared_error(square=False) by root_mean_squared_error #26734

Merged
merged 30 commits into from
Sep 16, 2023

Conversation

101AlexMartin
Copy link
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Create ad-hoc function to calculate RMSE

Any other comments?

As an external user of scikit-learn, I found it counter-intuitive to have a flag inside the MSE function to get RMSE. I think it is difficult to find. In fact, I'm not the only one: https://stackoverflow.com/questions/17197492/is-there-a-library-function-for-root-mean-square-error-rmse-in-python
If you also think this PR might be worth it, I will further elaborate it, generating the tests and creating also an ad-hoc function for the root_mean_square_log_error.

@github-actions
Copy link
github-actions bot commented Jun 29, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f7eea0d. Link to the linter CI: here

@jeremiedbb
Copy link
Member

Thanks for the PR @101AlexMartin. I'm not opposed to have 2 distinct functions for mse and rmse if it makes it easier to find and reduce the confusion.

Regarding the code, you should just call mean_squared_error(square=False) instead of duplicating it.
Please also add a test to check that the new rmse and mse(square=False) give the same results.
If we introduce this function we might as well use it for the neg_mean_squared_error scorer, and in all the places we're currently calling mse(square=False).
I wonder if we should deprecate the square parameter of mse then.

@101AlexMartin
Copy link
Contributor Author

I'd deprecate the square parameter. That's why I didn't call to mse inside the rmse function, but duplicated it instead.
And, indeed, if we deprecate it, I'll make sure there's still robustness in the rest of the functions that call to mse(square=False).
Just let me know if you prefer to deprecate it or still call mse(square=False).

PD.: If you want to deprecate, can you please let me know how is that implemented in the code?

@glemaitre glemaitre changed the title DEV: added ad-hoc function for RMSE ENH add root_mean_squared_error function Jun 30, 2023
@glemaitre
Copy link
Member

This is also confusing to have 2 things that do the same. I prefer to deprecate the square parameter from the mean_squared_error.

For the deprecation, you can refer to https://scikit-learn.org/dev/developers/contributing.html#deprecation

@glemaitre glemaitre changed the title ENH add root_mean_squared_error function API replace mean_squared_error(square=False) by root_mean_squared_error Jun 30, 2023
Comment on lines 535 to 548
y_type, y_true, y_pred, multioutput = _check_reg_targets(
y_true, y_pred, multioutput
)
check_consistent_length(y_true, y_pred, sample_weight)
output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight))

if isinstance(multioutput, str):
if multioutput == "raw_values":
return output_errors
elif multioutput == "uniform_average":
# pass None as weights to np.average: uniform mean
multioutput = None

return np.average(output_errors, weights=multioutput)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
y_type, y_true, y_pred, multioutput = _check_reg_targets(
y_true, y_pred, multioutput
)
check_consistent_length(y_true, y_pred, sample_weight)
output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight))
if isinstance(multioutput, str):
if multioutput == "raw_values":
return output_errors
elif multioutput == "uniform_average":
# pass None as weights to np.average: uniform mean
multioutput = None
return np.average(output_errors, weights=multioutput)
output_errors = mean_squared_error(
y_true,
y_pred,
sample_weight=sample_weight,
multioutput="raw_values",
)
if isinstance(multioutput, str):
if multioutput == "raw_values":
return output_errors
elif multioutput == "uniform_average":
# pass None as weights to np.average: uniform mean
multioutput = None
return np.average(output_errors, weights=multioutput)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In here you are missing the square root right?

@glemaitre
Copy link
Member
glemaitre commented Jun 30, 2023 via email

@101AlexMartin
Copy link
Contributor Author

Can you please check the version numbers of deprecation?

@101AlexMartin
Copy link
Contributor Author

I'll do the same changes to the MSLE to be consistent. I'll create a new ad-hoc function to calculate RMSLE

@101AlexMartin
Copy link
Contributor Author

I also ensured there are no leftover references to the parameter square in the rest of the project

Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments. The target version for the deprecation cycle are 1.4 (start) and 1.6 (end).

To fix the linting issues you need to run black and ruff, or you can install pre-commit, as explained here https://scikit-learn.org/dev/developers/contributing.html#how-to-contribute.

Please also add an entry in the v1.4.rst what's new, and add the 2 new functions in doc/modules/classes.rst.

sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Show resolved Hide resolved
sklearn/metrics/tests/test_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_regression.py Outdated Show resolved Hide resolved
@101AlexMartin
Copy link
Contributor Author

After installing the pre-commit I'm no longer allowed to push the changes, maybe you can give me a hand.
I'm getting an error in one file I did not modify, and I believe this is preventing the push.
I'm getting this after doing the commit:

check yaml...........................................(no files to check)Skipped
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
black....................................................................Passed
ruff.....................................................................Passed
mypy.....................................................................Failed
- hook id: mypy
- exit code: 2

sklearn\preprocessing\_polynomial.py: error: invalid syntax  [syntax]
Found 1 error in 1 file (errors prevented further checking)

cython-lint..........................................(no files to check)Skipped

@jeremiedbb
Copy link
Member

The error is not very informative. Hard to guess what's going on with that. git commit -n -m "commit message" allows to skip the pre-commit hook. You can push like that and we'll see in our CI if mypy still complains and if so maybe we'll have a better error description.

@101AlexMartin
Copy link
Contributor Author

Done. Let me know what's the issue, I'm curious about this pre-commit tool.

@jeremiedbb
Copy link
Member

There's a summary of the linting issues in this comment #26734 (comment). Looks like there's only 1 import sorting issue. I don't know what's wrong with your local mypy. The min version we support is 1.3. Another possibility is that you just installed a very recent version of mypy that complains where it did not before.

@101AlexMartin
Copy link
Contributor Author

I can try to install a different version of mypy. However, since changes are already commited, I'm not sure on how to solve this problem. Any suggestion?

@jeremiedbb
Copy link
Member

Our CI is using mypy==1.3. You can install that version. There's nothing more you need to do. Looks like you already pushed your latest changes. You can commit new stuff and push the same way

@101AlexMartin
Copy link
Contributor Author

But there's no more I had to commit for now, all the changes you asked for were implemented. I cannot thus solve the linting issue for now, right?

@jeremiedbb
Copy link
Member

to fix the linting issue you need to run ruff --fix . and commit the resulting changes.
Since you commited without pre-commit enabled, the ruff check was skipped.
If you install mypy==1.3, you can commit with the pre-commit enabled and ruff should fix it automatically. If you've nothing to commit in the first place you can make an empty commit: ```git commit --allow-empty -m "msg"``

@101AlexMartin
Copy link
Contributor Author

Done :)

@OmarManzoor
Copy link
Contributor

@jeremiedbb Could you kindly check if we should consider the codecov errors that are being presented? They seem to relate to the exception conditions in _check_reg_targets, which I don't think are an important part of this PR.

@101AlexMartin
Copy link
Contributor Author

Any update on this? It's been a long while since changes were approved, but still didn't merge the PR

@OmarManzoor
Copy link
Contributor

@101AlexMartin I think the PR looks good overall. There is just the codecov error ( the 1 failing check ). I think it would be good to have a second opinion regarding that and if required we can add a few tests. @glemaitre could you kindly have a quick look?

@101AlexMartin
Copy link
Contributor Author

Hello, any update on this? It's been another month without any update

@glemaitre glemaitre self-requested a review September 15, 2023 11:38
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a mention of the root_mean_squared_error in the model_selection.rst file. I think it woule be enough to add a small paragraph in the "Mean Squared Error" section mentioning the function. We can do the same thing for the root mean squared log error.

sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Show resolved Hide resolved
sklearn/metrics/_scorer.py Show resolved Hide resolved
sklearn/metrics/_scorer.py Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/_regression.py Show resolved Hide resolved
sklearn/metrics/_regression.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_regression.py Outdated Show resolved Hide resolved
@glemaitre
Copy link
Member

I checked the codecov issue but I think this is a false positive.

@101AlexMartin
Copy link
Contributor Author

It would be nice to have a mention of the root_mean_squared_error in the model_selection.rst file. I think it woule be enough to add a small paragraph in the "Mean Squared Error" section mentioning the function. We can do the same thing for the root mean squared log error.

When I access model_selection.rst, I don't see text, just this (so not sure where to write):

.. Places parent toc into the sidebar

:parenttoc: True

.. include:: includes/big_toc_css.rst

.. _model_selection:

Model selection and evaluation
------------------------------

.. toctree::
    :maxdepth: 2

    modules/cross_validation
    modules/grid_search
    modules/model_evaluation
    modules/learning_curve

@glemaitre
Copy link
Member

Oh right, the file is model_evaluation.rst sorry about this.

@101AlexMartin
Copy link
Contributor Author

Thanks for the revision, I pushed the new changes. Now CI fails but when I click on it I think the errors are not related with my push (?)
Also the linux test doesn't even start, so I'm a bit confused

@glemaitre
Copy link
Member

The linting is failing: #26734 (comment). You need to run ruff (or better just install the pre-commit from this repository).

Also, you still have a conflict in the changelog (most probably you need to accept all entries). The CIs will not start until you don't solve this conflict.

@101AlexMartin
Copy link
Contributor Author

The linting is failing: #26734 (comment). You need to run ruff (or better just install the pre-commit from this repository).

Also, you still have a conflict in the changelog (most probably you need to accept all entries). The CIs will not start until you don't solve this conflict.

I have pre-commit installed, but since I did not modify the files failing at ruff then it is skipped. When I run ruff check --fix ., I get these errors, on files that I did not modify (and thus don't know how to fix the issues):

sklearn\metrics\tests\test_classification.py:162:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:163:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:164:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:165:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:164:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:165:12: E721 Do not compare types, use `isinstance()`
sklearn\model_selection\_split.py:2676:12: E721 Do not compare types, use `isinstance()`
sklearn\utils\tests\test_utils.py:528:12: E721 Do not compare types, use `isinstance()`
sklearn\utils\tests\test_utils.py:534:12: E721 Do not compare types, use `isinstance()`
Found 7 errors.

Regarding the conflict on the changelog, I don't know what do you mean by accepting the entries. Maybe you can elaborate a bit more on that.

@glemaitre glemaitre self-requested a review September 16, 2023 11:27
@glemaitre
Copy link
Member

I merge main in the branch to see if it fixes issues and I saw that it was missing a couple of addition to have tests running for the RMSLE scorer. I directly made the changes and pushed them since it was in a file untouched in this PR.

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the CIs are passing, we are good to merge this one. Thanks @101AlexMartin

@101AlexMartin
Copy link
Contributor Author

Cool, thanks

@glemaitre glemaitre merged commit 2f99cda into scikit-learn:main Sep 16, 2023
26 of 27 checks passed
@glemaitre
Copy link
Member

I checked codecov and the newly created function are covered by the tests. So this is still the same false positive.

@glemaitre
Copy link
Member

Merging.

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023
…or (scikit-learn#26734)

Co-authored-by: Alejandro Martin <alejandro.martingil@tno.nl>
Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…or (scikit-learn#26734)

Co-authored-by: Alejandro Martin <alejandro.martingil@tno.nl>
Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants