API replace mean_squared_error(square=False) by root_mean_squared_error #26734

101AlexMartin · 2023-06-29T21:37:42Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Create ad-hoc function to calculate RMSE

Any other comments?

As an external user of scikit-learn, I found it counter-intuitive to have a flag inside the MSE function to get RMSE. I think it is difficult to find. In fact, I'm not the only one: https://stackoverflow.com/questions/17197492/is-there-a-library-function-for-root-mean-square-error-rmse-in-python
If you also think this PR might be worth it, I will further elaborate it, generating the tests and creating also an ad-hoc function for the root_mean_square_log_error.

github-actions · 2023-06-29T21:39:27Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: f7eea0d. Link to the linter CI: here}

jeremiedbb · 2023-06-30T10:12:22Z

Thanks for the PR @101AlexMartin. I'm not opposed to have 2 distinct functions for mse and rmse if it makes it easier to find and reduce the confusion.

Regarding the code, you should just call mean_squared_error(square=False) instead of duplicating it.
Please also add a test to check that the new rmse and mse(square=False) give the same results.
If we introduce this function we might as well use it for the neg_mean_squared_error scorer, and in all the places we're currently calling mse(square=False).
I wonder if we should deprecate the square parameter of mse then.

101AlexMartin · 2023-06-30T12:28:23Z

I'd deprecate the square parameter. That's why I didn't call to mse inside the rmse function, but duplicated it instead.
And, indeed, if we deprecate it, I'll make sure there's still robustness in the rest of the functions that call to mse(square=False).
Just let me know if you prefer to deprecate it or still call mse(square=False).

PD.: If you want to deprecate, can you please let me know how is that implemented in the code?

glemaitre · 2023-06-30T20:48:56Z

This is also confusing to have 2 things that do the same. I prefer to deprecate the square parameter from the mean_squared_error.

For the deprecation, you can refer to https://scikit-learn.org/dev/developers/contributing.html#deprecation

glemaitre · 2023-06-30T20:54:16Z

sklearn/metrics/_regression.py

+    y_type, y_true, y_pred, multioutput = _check_reg_targets(
+        y_true, y_pred, multioutput
+    )
+    check_consistent_length(y_true, y_pred, sample_weight)
+    output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight))
+
+    if isinstance(multioutput, str):
+        if multioutput == "raw_values":
+            return output_errors
+        elif multioutput == "uniform_average":
+            # pass None as weights to np.average: uniform mean
+            multioutput = None
+
+    return np.average(output_errors, weights=multioutput)


Suggested change

y_type, y_true, y_pred, multioutput = _check_reg_targets(

y_true, y_pred, multioutput

)

check_consistent_length(y_true, y_pred, sample_weight)

output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight))

if isinstance(multioutput, str):

if multioutput == "raw_values":

return output_errors

elif multioutput == "uniform_average":

# pass None as weights to np.average: uniform mean

multioutput = None

return np.average(output_errors, weights=multioutput)

output_errors = mean_squared_error(

y_true,

y_pred,

sample_weight=sample_weight,

multioutput="raw_values",

)

if isinstance(multioutput, str):

if multioutput == "raw_values":

return output_errors

elif multioutput == "uniform_average":

# pass None as weights to np.average: uniform mean

multioutput = None

return np.average(output_errors, weights=multioutput)

In here you are missing the square root right?

glemaitre · 2023-06-30T21:19:03Z

Oups indeed 😁 I just wanted to show the pattern to reuse the code of mean squared errorSent from my iPhoneOn 30 Jun 2023, at 23:16, 101AlexMartin ***@***.***> wrote: @101AlexMartin commented on this pull request. In sklearn/metrics/_regression.py:

+ y_type, y_true, y_pred, multioutput = _check_reg_targets(

+ y_true, y_pred, multioutput + ) + check_consistent_length(y_true, y_pred, sample_weight) + output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)) + + if isinstance(multioutput, str): + if multioutput == "raw_values": + return output_errors + elif multioutput == "uniform_average": + # pass None as weights to np.average: uniform mean + multioutput = None + + return np.average(output_errors, weights=multioutput) In here you are missing the square root right? —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>

101AlexMartin · 2023-07-01T11:32:39Z

Can you please check the version numbers of deprecation?

101AlexMartin · 2023-07-01T12:00:37Z

I'll do the same changes to the MSLE to be consistent. I'll create a new ad-hoc function to calculate RMSLE

101AlexMartin · 2023-07-01T12:47:03Z

I also ensured there are no leftover references to the parameter square in the rest of the project

jeremiedbb

Here are some comments. The target version for the deprecation cycle are 1.4 (start) and 1.6 (end).

To fix the linting issues you need to run black and ruff, or you can install pre-commit, as explained here https://scikit-learn.org/dev/developers/contributing.html#how-to-contribute.

Please also add an entry in the v1.4.rst what's new, and add the 2 new functions in doc/modules/classes.rst.

sklearn/metrics/_regression.py

sklearn/metrics/tests/test_regression.py

101AlexMartin · 2023-07-06T13:46:34Z

After installing the pre-commit I'm no longer allowed to push the changes, maybe you can give me a hand.
I'm getting an error in one file I did not modify, and I believe this is preventing the push.
I'm getting this after doing the commit:

check yaml...........................................(no files to check)Skipped
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
black....................................................................Passed
ruff.....................................................................Passed
mypy.....................................................................Failed
- hook id: mypy
- exit code: 2

sklearn\preprocessing\_polynomial.py: error: invalid syntax  [syntax]
Found 1 error in 1 file (errors prevented further checking)

cython-lint..........................................(no files to check)Skipped

jeremiedbb · 2023-07-06T13:52:38Z

The error is not very informative. Hard to guess what's going on with that. git commit -n -m "commit message" allows to skip the pre-commit hook. You can push like that and we'll see in our CI if mypy still complains and if so maybe we'll have a better error description.

101AlexMartin · 2023-07-06T13:55:02Z

Done. Let me know what's the issue, I'm curious about this pre-commit tool.

jeremiedbb · 2023-07-06T14:03:19Z

There's a summary of the linting issues in this comment #26734 (comment). Looks like there's only 1 import sorting issue. I don't know what's wrong with your local mypy. The min version we support is 1.3. Another possibility is that you just installed a very recent version of mypy that complains where it did not before.

101AlexMartin · 2023-07-06T14:10:31Z

I can try to install a different version of mypy. However, since changes are already commited, I'm not sure on how to solve this problem. Any suggestion?

jeremiedbb · 2023-07-06T14:20:45Z

Our CI is using mypy==1.3. You can install that version. There's nothing more you need to do. Looks like you already pushed your latest changes. You can commit new stuff and push the same way

101AlexMartin · 2023-07-06T14:26:08Z

But there's no more I had to commit for now, all the changes you asked for were implemented. I cannot thus solve the linting issue for now, right?

jeremiedbb · 2023-07-06T14:29:57Z

to fix the linting issue you need to run ruff --fix . and commit the resulting changes.
Since you commited without pre-commit enabled, the ruff check was skipped.
If you install mypy==1.3, you can commit with the pre-commit enabled and ruff should fix it automatically. If you've nothing to commit in the first place you can make an empty commit: ```git commit --allow-empty -m "msg"``

sklearn/metrics/_regression.py

OmarManzoor · 2023-07-17T08:16:33Z

@jeremiedbb Could you kindly check if we should consider the codecov errors that are being presented? They seem to relate to the exception conditions in _check_reg_targets, which I don't think are an important part of this PR.

101AlexMartin · 2023-08-17T12:17:48Z

Any update on this? It's been a long while since changes were approved, but still didn't merge the PR

OmarManzoor · 2023-08-17T13:14:14Z

@101AlexMartin I think the PR looks good overall. There is just the codecov error ( the 1 failing check ). I think it would be good to have a second opinion regarding that and if required we can add a few tests. @glemaitre could you kindly have a quick look?

101AlexMartin · 2023-09-11T20:00:45Z

Hello, any update on this? It's been another month without any update

glemaitre

It would be nice to have a mention of the root_mean_squared_error in the model_selection.rst file. I think it woule be enough to add a small paragraph in the "Mean Squared Error" section mentioning the function. We can do the same thing for the root mean squared log error.

sklearn/metrics/_regression.py

sklearn/metrics/_scorer.py

sklearn/metrics/_regression.py

sklearn/metrics/tests/test_regression.py

glemaitre · 2023-09-15T12:01:58Z

I checked the codecov issue but I think this is a false positive.

101AlexMartin · 2023-09-15T19:30:31Z

It would be nice to have a mention of the root_mean_squared_error in the model_selection.rst file. I think it woule be enough to add a small paragraph in the "Mean Squared Error" section mentioning the function. We can do the same thing for the root mean squared log error.

When I access model_selection.rst, I don't see text, just this (so not sure where to write):

.. Places parent toc into the sidebar

:parenttoc: True

.. include:: includes/big_toc_css.rst

.. _model_selection:

Model selection and evaluation
------------------------------

.. toctree::
    :maxdepth: 2

    modules/cross_validation
    modules/grid_search
    modules/model_evaluation
    modules/learning_curve

glemaitre · 2023-09-15T19:34:53Z

Oh right, the file is model_evaluation.rst sorry about this.

101AlexMartin · 2023-09-15T19:52:48Z

Thanks for the revision, I pushed the new changes. Now CI fails but when I click on it I think the errors are not related with my push (?)
Also the linux test doesn't even start, so I'm a bit confused

glemaitre · 2023-09-15T19:55:47Z

The linting is failing: #26734 (comment). You need to run ruff (or better just install the pre-commit from this repository).

Also, you still have a conflict in the changelog (most probably you need to accept all entries). The CIs will not start until you don't solve this conflict.

101AlexMartin · 2023-09-16T11:26:20Z

The linting is failing: #26734 (comment). You need to run ruff (or better just install the pre-commit from this repository).

Also, you still have a conflict in the changelog (most probably you need to accept all entries). The CIs will not start until you don't solve this conflict.

I have pre-commit installed, but since I did not modify the files failing at ruff then it is skipped. When I run ruff check --fix ., I get these errors, on files that I did not modify (and thus don't know how to fix the issues):

sklearn\metrics\tests\test_classification.py:162:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:163:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:164:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:165:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:164:12: E721 Do not compare types, use `isinstance()`
sklearn\metrics\tests\test_classification.py:165:12: E721 Do not compare types, use `isinstance()`
sklearn\model_selection\_split.py:2676:12: E721 Do not compare types, use `isinstance()`
sklearn\utils\tests\test_utils.py:528:12: E721 Do not compare types, use `isinstance()`
sklearn\utils\tests\test_utils.py:534:12: E721 Do not compare types, use `isinstance()`
Found 7 errors.

Regarding the conflict on the changelog, I don't know what do you mean by accepting the entries. Maybe you can elaborate a bit more on that.

glemaitre · 2023-09-16T11:52:34Z

I merge main in the branch to see if it fixes issues and I saw that it was missing a couple of addition to have tests running for the RMSLE scorer. I directly made the changes and pushed them since it was in a file untouched in this PR.

glemaitre

If the CIs are passing, we are good to merge this one. Thanks @101AlexMartin

101AlexMartin · 2023-09-16T12:57:41Z

Cool, thanks

glemaitre · 2023-09-16T12:59:06Z

I checked codecov and the newly created function are covered by the tests. So this is still the same false positive.

glemaitre · 2023-09-16T12:59:11Z

Merging.

…or (scikit-learn#26734) Co-authored-by: Alejandro Martin <alejandro.martingil@tno.nl> Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

DEV: added ad-hoc function for RMSE

d5800ef

github-actions bot added the module:metrics label Jun 29, 2023

glemaitre changed the title ~~DEV: added ad-hoc function for RMSE~~ ENH add root_mean_squared_error function Jun 30, 2023

glemaitre changed the title ~~ENH add root_mean_squared_error function~~ API replace mean_squared_error(square=False) by root_mean_squared_error Jun 30, 2023

glemaitre reviewed Jun 30, 2023

View reviewed changes

API: deprecated square argument in MSE

2b9c65d

AlexMGTNO added 2 commits July 1, 2023 13:39

TST: deprecated square paremeter in test functions

b361563

TST: added deprecation test for MSE

5799341

AlexMGTNO added 3 commits July 1, 2023 14:34

API: added function for RMSLE

0b7f6a1

REF: refactored code for RMSLE

68e4c11

TST: addded tests for RMSLE

2685bb9

jeremiedbb reviewed Jul 6, 2023

View reviewed changes

101AlexMartin force-pushed the add-rmse branch from 49b3e83 to 2685bb9 Compare July 6, 2023 13:29

API: applied changed

5d5344f

OmarManzoor reviewed Jul 17, 2023

View reviewed changes

sklearn/metrics/_regression.py Outdated Show resolved Hide resolved

AlexMGTNO added 2 commits July 17, 2023 12:04

Added point to docstring

a37b284

TST: corrected warning message

1cf0607

glemaitre self-requested a review September 15, 2023 11:38

glemaitre reviewed Sep 15, 2023

View reviewed changes

AlexMGTNO added 2 commits September 15, 2023 21:39

Reviewed changes

eb5b005

Added RMSE and RMSLE to model_evaluation.rst

5d37088

glemaitre self-requested a review September 16, 2023 11:27

glemaitre added 2 commits September 16, 2023 13:28

Merge remote-tracking branch 'origin/main' into pr/101AlexMartin/26734

e3c6475

fixes for RMSLE

f7eea0d

glemaitre approved these changes Sep 16, 2023

View reviewed changes

glemaitre merged commit 2f99cda into scikit-learn:main Sep 16, 2023

einarwar mentioned this pull request Sep 18, 2024

Add stubs for RMSE and RMSLE in sklearn.metrics microsoft/python-type-stubs#317

Merged

Ab2nour mentioned this pull request Apr 20, 2025

[MRG] ENH add macro-averaged mean squared error scikit-learn-contrib/imbalanced-learn#1134

Open

Uh oh!

API replace mean_squared_error(square=False) by root_mean_squared_error #26734

API replace mean_squared_error(square=False) by root_mean_squared_error #26734

Uh oh!

Conversation

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

Uh oh!

✔️ Linting Passed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!