New definition of `poor_score` Estimator tag · Issue #17182 · scikit-learn/scikit-learn · GitHub

Closed
cmarmo opened this issue May 11, 2020 · 7 comments · Fixed by #17356

Comments

@cmarmo
Contributor
cmarmo commented May 11, 2020

Following #16155, the Boston dataset is gradually being replaced with other public or synthetic datasets.
The Boston dataset is also used to define the estimator tag that marks a failure to provide a “reasonable” test-set score (see documentation).
This issue is a follow-up to @adrinjalali's comment.
Changing the tag definition will probably affect the tags of some estimators and some test results, which is why I think it is worth a separate issue.
Related issues and PRs: #16897, #16799.
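
For readers unfamiliar with the tag, here is a minimal sketch of how an estimator could declare `poor_score`. It assumes the `_more_tags` mechanism used by scikit-learn releases around the time of this issue (later releases changed how tags are declared), and `MeanRegressor` is a hypothetical class made up for illustration, not part of the library.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class MeanRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical regressor that always predicts the training mean."""

    def fit(self, X, y):
        # Remember the mean of the training targets.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Ignore the features and return the stored mean for every row.
        return np.full(len(X), self.mean_)

    def _more_tags(self):
        # Assumption: tag mechanism of the releases contemporary with
        # this issue. The tag says a "reasonable" score is not expected.
        return {"poor_score": True}


reg = MeanRegressor().fit(np.arange(10).reshape(-1, 1), np.arange(10))
tags = reg._more_tags()
predictions = reg.predict(np.zeros((3, 1)))
```

A constant predictor like this cannot be expected to pass a score check, which is the kind of case the tag is meant to flag.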

@amueller
Member

Good point. I'm not sure if the tag overall is that well-defined. Maybe using a synthetic dataset would be better? If it's synthetic and all estimators are expected to work 'well' on it, it needs to be linear, though, I guess? Which would make it very simple.

@adrinjalali
Member

In one of the PRs (which one was it @lucyleeow ?) we agreed on make_classification I think.

@cmarmo
Contributor Author
cmarmo commented May 11, 2020

Yes, @lucyleeow proposed #16897. There is no explicit decision about poor_score there. For classification problems, poor_score is already defined on a synthetic dataset. The issue is with regressors: they use Boston as the reference. #16897 can't be resolved without a consensus on the poor_score definition for regressors, because it will probably change the current tagging of some regressors.

@lucyleeow
Member

> because it will probably change the current tagging for some regressors.

I guess another question is: do we want estimators to score on the synthetic dataset much as they do on Boston (so that the poor_score tags remain pretty much unchanged), or do we want them to score differently?

@ogrisel
Member
ogrisel commented May 26, 2020

Let's try with make_regression, the point is just to run smoke tests on regressors, right?
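
As a rough illustration of what a smoke test on `make_regression` data might look like, the sketch below only checks that a regressor fits and returns finite predictions of the right shape. The helper name `smoke_test_regressor` and the dataset sizes are assumptions chosen for this example, not the code used in scikit-learn's common tests.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression


def smoke_test_regressor(regressor, random_state=0):
    """Hypothetical helper: fit on synthetic data and sanity-check predict."""
    X, y = make_regression(n_samples=100, n_features=5, random_state=random_state)
    regressor.fit(X, y)
    y_pred = regressor.predict(X)
    # The regressor should return one finite prediction per sample.
    assert y_pred.shape == y.shape
    assert np.all(np.isfinite(y_pred))
    return y_pred


y_pred = smoke_test_regressor(LinearRegression())
```

A check of this kind says nothing about prediction quality; it only verifies that the estimator runs end to end on a regression target.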

@glemaitre
Member

I made a PR at #17356 with make_regression. The scores behave similarly: all estimators score > 0.7, apart from the 3 estimators that fail (PLS, DummyRegressor and CCA).
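
The kind of comparison described here can be sketched as follows. This is not the code from #17356: the dataset parameters, the choice of regressors, and the use of 0.7 as a cutoff (taken from the figure quoted above, which may differ from the threshold in the common tests) are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, Ridge

# Illustrative cutoff borrowed from the comment above (an assumption).
THRESHOLD = 0.7

X, y = make_regression(
    n_samples=200, n_features=10, n_informative=5, noise=1.0, random_state=0
)

scores = {}
for reg in [LinearRegression(), Ridge(), DummyRegressor()]:
    reg.fit(X, y)
    # R^2 on the data the regressor was fitted on.
    scores[type(reg).__name__] = reg.score(X, y)

# Names of regressors that would be candidates for a poor_score tag.
below_threshold = sorted(name for name, s in scores.items() if s <= THRESHOLD)

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
print("Below threshold:", below_threshold)
```

On linear synthetic data the linear models clear the cutoff easily, while `DummyRegressor`, which predicts a constant, does not.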

@glemaitre
Member

> Let's try with make_regression, the point is just to run smoke tests on regressors, right?

The common tests are all smoke tests (checking that the estimator works with y), plus one test that checks the performance of the regressor (which I consider a smoke test as well).
