[WIP] Use dataset factories for estimator checks #17544

rtavenar · 2020-06-09T12:08:33Z

Reference Issues/PRs

This is a follow-up of #14057

What does this implement/fix? Explain your changes.

The idea here is to provide a dataset factory to the estimator checks. This would allow downstream packages to define their own dataset factory to check their estimators on the specific kind of data they expect as input.

Any other comments?

This first commit is a small step towards implementing @rth's suggestion, just to make sure that I did not misunderstand it.
The idea would be to use the dataset factory for all check_* functions (or at least for as many functions as possible).
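As a rough illustration of the idea (not the code in this PR), a check could take the factory as an argument, so that a downstream package can swap in its own data. All names below (`check_predict_shape`, `random_2d_factory`, `random_3d_factory`, `MajorityClassifier`) are hypothetical and only assume NumPy:

```python
import numpy as np


def random_2d_factory(n_samples=20, random_state=0):
    """Hypothetical default factory: dense 2D data with binary targets."""
    rng = np.random.RandomState(random_state)
    X = rng.normal(size=(n_samples, 3))
    y = np.arange(n_samples) % 2
    return X, y


def random_3d_factory(n_samples=20, random_state=0):
    """Hypothetical downstream factory: 3D input of shape
    (n_samples, n_timestamps, n_dims), e.g. for time series estimators."""
    rng = np.random.RandomState(random_state)
    X = rng.normal(size=(n_samples, 8, 2))
    y = np.arange(n_samples) % 2
    return X, y


def check_predict_shape(estimator, dataset_factory=random_2d_factory):
    """Hypothetical check: predictions must have the same shape as y.

    The check never builds data itself; it only calls the factory.
    """
    X, y = dataset_factory()
    estimator.fit(X, y)
    y_pred = estimator.predict(X)
    assert np.shape(y_pred) == np.shape(y)


class MajorityClassifier:
    """Toy estimator that ignores X, so it accepts 2D and 3D input alike."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


# Same check, two kinds of data.
check_predict_shape(MajorityClassifier())
check_predict_shape(MajorityClassifier(), dataset_factory=random_3d_factory)
```

With this shape, the check body stays identical for every package; only the factory passed in changes.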


rth

Thanks for working on this, @rtavenar! Yeah, I was thinking of something along these lines. Maybe,

def default_dataset_factory(n_samples=50, kind='random', **tags):
    ...

and then used as

dataset_factory(kind='random', **estimator.get_tags())

One thing: we would need something like a kind parameter, because I suspect we would still need different types of datasets even for a single input type, e.g. dense data. WDYT?
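To make the proposal above concrete, here is a minimal sketch of what such a factory might look like. It is only an illustration under assumptions: the `"blobs"` kind, the `random_state` argument, the fixed number of features, and the choice of which tags to honour (`requires_positive_X` and `binary_only`, both existing estimator tags) are not part of the suggestion; any other tag is accepted and ignored.

```python
import numpy as np


def default_dataset_factory(n_samples=50, kind="random", random_state=0, **tags):
    """Sketch of a tag-aware dataset factory returning (X, y).

    `random_state` and the "blobs" kind are assumptions made for this
    example; only `n_samples`, `kind` and `**tags` come from the proposal.
    """
    rng = np.random.RandomState(random_state)
    n_features = 5

    # Classification targets: two classes if the estimator is binary-only.
    n_classes = 2 if tags.get("binary_only", False) else 3
    y = np.arange(n_samples) % n_classes

    if kind == "random":
        # Features carry no information about y.
        X = rng.normal(size=(n_samples, n_features))
    elif kind == "blobs":
        # One Gaussian cluster per class, shifted so classes are separable.
        X = rng.normal(size=(n_samples, n_features)) + 5.0 * y[:, np.newaxis]
    else:
        raise ValueError(f"Unknown dataset kind: {kind!r}")

    if tags.get("requires_positive_X", False):
        # Shift the data so that every entry is strictly positive.
        X = X - X.min() + 1.0

    return X, y


# Tags are passed as keyword arguments; unknown ones are simply ignored.
X, y = default_dataset_factory(
    kind="blobs", requires_positive_X=True, binary_only=True, allow_nan=False
)
```

Accepting `**tags` keeps the call site as simple as in the proposal: the full tag dictionary can be unpacked into the factory, and each factory reads only the tags it understands.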

sklearn/utils/estimator_checks.py

rth · 2020-06-09T12:42:17Z

BTW, there might be some intersection with generating datasets for benchmarks in #17026 cc @jeremiedbb

Edit: pinging a few potential reviewers just to get some general feedback before implementing this further.

rtavenar · 2020-06-09T12:43:59Z

Yes, you're right that tags already contain a lot of useful information; I will use those when possible.

Start working on the dataset factory idea

d2bcd5a

github-actions bot added the module:utils label Jun 9, 2020

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

f177d2a

…into dataset_factory # Conflicts: # sklearn/utils/estimator_checks.py

rth reviewed Jun 9, 2020

sklearn/utils/estimator_checks.py (outdated)

rth requested review from NicolasHug and glemaitre June 9, 2020 12:43

rtavenar added 3 commits June 9, 2020 15:36

Use estimator tags when possible

5bcdf31

Fixed linting issue

79e5896

Fix _get_tags

d540dcc

Base automatically changed from master to main January 22, 2021 10:52

glemaitre removed their request for review December 16, 2021 21:45
