check_decision_proba_consistency fails with LinearDiscriminantAnalysis #19224

Open
ogrisel opened this issue Jan 21, 2021 · 0 comments
Labels: Bug, module:test-suite (everything related to our tests)

ogrisel (Member) commented Jan 21, 2021

The following common check recently started to fail on a Windows CI job:

name = 'LinearDiscriminantAnalysis'
estimator_orig = LinearDiscriminantAnalysis()

    @ignore_warnings(category=FutureWarning)
    def check_decision_proba_consistency(name, estimator_orig):
        # Check whether an estimator having both decision_function and
        # predict_proba methods has outputs with perfect rank correlation.
    
        centers = [(2, 2), (4, 4)]
        X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
                          centers=centers, cluster_std=1.0, shuffle=True)
        X_test = np.random.randn(20, 2) + 4
        estimator = clone(estimator_orig)
    
        if (hasattr(estimator, "decision_function") and
                hasattr(estimator, "predict_proba")):
    
            estimator.fit(X, y)
            # Since the link function from decision_function() to predict_proba()
            # is sometimes not precise enough (typically expit), we round to the
            # 10th decimal to avoid numerical issues.
            a = estimator.predict_proba(X_test)[:, 1].round(decimals=10)
            b = estimator.decision_function(X_test).round(decimals=10)
>           assert_array_equal(rankdata(a), rankdata(b))
E           AssertionError: 
E           Arrays are not equal
E           
E           Mismatched elements: 2 / 20 (10%)
E           Max absolute difference: 0.5
E           Max relative difference: 0.02631579
E            x: array([ 7. ,  8. , 11. ,  9. , 17. , 10. ,  5. , 14. ,  6. ,  1. , 19.5,
E                   4. ,  2. , 16. , 12. , 13. ,  3. , 15. , 19.5, 18. ])
E            y: array([ 7.,  8., 11.,  9., 17., 10.,  5., 14.,  6.,  1., 20.,  4.,  2.,
E                  16., 12., 13.,  3., 15., 19., 18.])
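For context on the mismatch above: rounding both outputs to 10 decimals can collapse predict_proba values that sit in the saturated region near 1.0 into an exact tie, while the corresponding decision_function values stay distinct, so rankdata reports a tied rank (19.5, 19.5) on one side but not the other. A minimal standalone sketch of this failure mode (the values below are made up for illustration, not taken from the CI run):

    import numpy as np
    from scipy.stats import rankdata

    # Hypothetical values: the last two probabilities differ only past the
    # 10th decimal (as happens when the link function saturates near 1.0),
    # while the matching decision values are clearly distinct.
    proba = np.array([0.1, 0.5, 1 - 1e-11, 1 - 1e-12])
    decision = np.array([-2.0, 0.0, 25.0, 26.0])

    a = proba.round(decimals=10)     # the last two entries both round to 1.0
    b = decision.round(decimals=10)  # no collapse here

    print(rankdata(a))  # [1.  2.  3.5 3.5] -- tie introduced by rounding
    print(rankdata(b))  # [1. 2. 3. 4.]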

This happened on PR #17743, which should not have any impact on the behavior of LinearDiscriminantAnalysis.predict_proba.

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=25474&view=logs&j=d32b16b6-cb9d-571b-e765-de83708fb8dd&t=b93f76c1-c2c9-579e-c2ec-c4f438af1261

I suspect the test is too brittle. Maybe using a test set drawn from the original distribution (blobs) would avoid the ties caused by the arbitrary rounding?
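A hedged sketch of that suggestion (the random_state and the reuse of the same centers are assumptions, not a tested fix): sampling X_test from the same blob distribution keeps the test points away from the saturated 0/1 probability region, so the 10-decimal rounding is less likely to create ties.

    from sklearn.datasets import make_blobs

    # Hypothetical variant of the check's test-set construction: draw
    # X_test from the same two blobs instead of an arbitrary shifted
    # Gaussian, so predict_proba stays away from the saturated region
    # where rounding collapses distinct values into ties.
    centers = [(2, 2), (4, 4)]
    X_test, _ = make_blobs(n_samples=20, random_state=1,
                           centers=centers, cluster_std=1.0, shuffle=True)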
