8000 TST Add a check for dtype preservation on Regressors' predictions by Diwakar-Gupta · Pull Request #22763 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

TST Add a check for dtype preservation on Regressors' predictions #22763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

TST Add a check for dtype preservation on Regressors' predictions #22763

wants to merge 2 commits into from

Conversation

Diwakar-Gupta
Copy link
Contributor
@Diwakar-Gupta Diwakar-Gupta commented Mar 11, 2022

Reference Issues/PRs

Fixes #22682

What does this implement/fix? Explain your changes.

check_regressor_preserve_dtypes function is added in estimator_checks.py it looks for regressor models with the 'preserves dtype' tag to ensure that the label dtype is preserved when predict is called.

@Diwakar-Gupta
Copy link
Contributor Author
Diwakar-Gupta commented Mar 11, 2022

If check_regressor_preserve_dtypes miss any test case then please let me know.

How can I hook this function check, when the default test is executed.

After this PR gets merged new issue has to be raised to hook all regressors which holds this property.

Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this, @Diwakar-Gupta.

@jjerphan jjerphan changed the title Regressors check for dtype preservation TST Add a check for dtype preservation on Regressors' predictions Mar 11, 2022
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this PR gets merged new issue has to be raised to hook all regressors which holds this property.

I think we need to hook it up now so we actually run the code on an estimator that has the property. Hooking it up is very similar to #21355 but listing out the regressors that fails to ignore and calling check_regressor_preserve_dtypes directly.

After #22525, I think BayesianRidge should have the property.

Comment on lines 1738 to 1739
X, y = make_regression(random_state=42, n_targets=5, n_samples=100, n_features=3)
X = StandardScaler().fit_transform(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See:

@ignore_warnings(category=FutureWarning)
def check_regressors_train(
name, regressor_orig, readonly_memmap=False, X_dtype=np.float64
):

for a demonstration of generating a regression dataset.

@Diwakar-Gupta
Copy link
Contributor Author
Diwakar-Gupta commented Mar 13, 2022

After adding

def _more_tags(self):
    return {"preserves_dtype": [np.float64, np.float32]} 

to class sklearn.linear_model._base.LinearRegression and executing
pytest sklearn/tests/test_common.py -k "LinearRegression and test_regressor_preserve_dtypes" I was able to get expected error

=========================================== test session starts ===========================================
platform linux -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /media/zero/mediapart/opensource/scikit-learn, configfile: setup.cfg
plugins: cov-3.0.0
collected 8852 items / 8851 deselected / 1 selected                                                                             

sklearn/tests/test_common.py F                                                                                            [100%]

=========================================================== FAILURES ============================================================
_______________________________________ test_regressor_preserve_dtypes[LinearRegression] ________________________________________

Regressor = <class 'sklearn.linear_model._base.LinearRegression'>

    @pytest.mark.parametrize(
        "Regressor",
        [est for name, est in all_estimators() if name in VALIDATE_REGRESSOR_DTYPE],
    )
    def test_regressor_preserve_dtypes(Regressor):
>       check_regressor_preserve_dtypes(Regressor.__name__, Regressor)

sklearn/tests/test_common.py:465: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'LinearRegression', regressor_orig = LinearRegression()

    def check_regressor_preserve_dtypes(name, regressor_orig):
        # Regressors predictions must have the same dtype
        # than the one of the data.
        regressor_orig = regressor_orig()
    
        X, y = _regression_dataset()
        X = _pairwise_estimator_convert_X(X, regressor_orig)
        regressor = clone(regressor_orig)
        y = _enforce_estimator_tags_y(regressor, y)
    
        for dtype in _safe_tags(regressor_orig, key="preserves_dtype"):
            y_cast = y.astype(dtype)
            regressor = clone(regressor_orig)
            regressor.fit(X, y_cast)
            y_pred = regressor.predict(X)
    
>           assert y_pred.dtype == dtype, (
                f"{regressor.__class__.__name__}.predict returns "
                f"{y_pred.dtype} predictions instead of "
                f"{dtype.__name__} predictions."
            )
E           AssertionError: LinearRegression.predict returns float64 predictions instead of float32 predictions.

sklearn/utils/estimator_checks.py:1751: AssertionError
======================================== 1 failed, 8851 deselected, 57 warnings in 4.30s ========================================

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Diwakar-Gupta. Please find below how we expect this new check to be integrated in the scikit-learn test suite to run automatically for any estimator that has the preserves_dtype tag defined.

@ogrisel
Copy link
Member
ogrisel commented Mar 21, 2022

LGTM but before merging I would like to make sure that we have at least one estimator which has the relevant tags to make sure that this new test passes on int. @thomasjpfan mentioned previously that it might be the case for BayesianRidge:

After #22525, I think BayesianRidge should have the property.

Can you please try if this holds by setting the _more_tags attribute on it?

@ogrisel
Copy link
Member
ogrisel commented Mar 22, 2022

The documentation of the preserves_dtype tag needs to updated to also include regressors:

https://scikit-learn.org/stable/developers/develop.html#estimator-tags

The source of this page is in: doc/developers/develop.rst.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Estimator check for dtype preservation for regressors
4 participants
0