Redesign of the sklearn.datasets API #13122

daniel-cortez-stevenson · 2019-02-08T19:45:57Z

@rth At the API level, I agree that the functional programming style is friendlier. The real benefit would come when the sklearn datasets API is extended, allowing for easier maintenance and ensuring similar behavior. The proposed pull request would not make sense if the datasets API is 'closed'.

@rth I also agree that a Datasets API is more obviously necessary in deep learning libraries. But, taking a leap of faith here, I'm hoping that, through a thoughtful redesign, we may identify useful extensions of a Dataset class that support the reproducibility and performance of scikit-learn's more notable functionality (Estimators/Transformers).

For example, could extending datasets to act as generators meaningfully improve the memory or time performance of sklearn?
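To make the question concrete, here is a minimal sketch of a dataset that yields mini-batches lazily instead of materializing everything in memory. `ChunkedDataset`, `iter_batches`, and `synthetic_samples` are hypothetical names used only for illustration; they are not part of scikit-learn or of the prototype in the PR. In practice, each batch could be passed to an estimator that supports `partial_fit`.

```python
from itertools import islice


class ChunkedDataset:
    """Hypothetical dataset wrapper that streams samples in batches."""

    def __init__(self, sample_factory, batch_size):
        # sample_factory: zero-argument callable returning a fresh iterator
        # of (features, target) pairs, so the dataset can be iterated again.
        self.sample_factory = sample_factory
        self.batch_size = batch_size

    def iter_batches(self):
        """Yield (X, y) lists holding at most batch_size samples each."""
        samples = self.sample_factory()
        while True:
            batch = list(islice(samples, self.batch_size))
            if not batch:
                return
            X = [features for features, _ in batch]
            y = [target for _, target in batch]
            yield X, y


def synthetic_samples(n_samples=10):
    # Stand-in for reading rows from disk or a remote source.
    for i in range(n_samples):
        yield [i, i * 2], i % 2


dataset = ChunkedDataset(synthetic_samples, batch_size=4)
batch_sizes = [len(X) for X, y in dataset.iter_batches()]
```

Only one batch is held in memory at a time, which is where any memory benefit would come from. Whether time performance improves would depend on the estimator and on I/O costs, so this is a starting point for the discussion rather than an answer.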

I'll take a look at OpenML and open a fresh issue to start a high-level conversation. I hope this PR serves to get some brain juices flowing.

Originally posted by @daniel-cortez-stevenson in #13120 (comment)


daniel-cortez-stevenson · 2019-02-08T19:50:44Z

Opening this issue to discuss a potential redesign of the sklearn.datasets API to support a more object-oriented design. A prototype of a redesigned sklearn.datasets.base module can be found in PR #13120. This was also briefly discussed in #10733 and may help close #10972 and #11818.
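As a rough illustration of what an object-oriented design could look like, here is a minimal sketch of a shared base class that enforces the same loading behavior for every dataset. `BaseDataset`, `ToyDataset`, and `_read` are hypothetical names and do not reflect the prototype in PR #13120; the `return_X_y` flag mirrors the parameter that existing loaders already accept.

```python
from abc import ABC, abstractmethod


class BaseDataset(ABC):
    """Hypothetical base class; not the API from PR #13120."""

    name = "unnamed"

    @abstractmethod
    def _read(self):
        """Return (data, target) as two sequences of equal length."""

    def load(self, return_X_y=False):
        # Shared validation and return format live in one place, so every
        # subclass behaves the same way.
        data, target = self._read()
        if len(data) != len(target):
            raise ValueError("data and target must have the same length")
        if return_X_y:
            return data, target
        return {"data": data, "target": target, "name": self.name}


class ToyDataset(BaseDataset):
    """Hypothetical concrete dataset that only defines how to read data."""

    name = "toy"

    def _read(self):
        return [[0.0, 1.0], [1.0, 0.0]], [0, 1]


X, y = ToyDataset().load(return_X_y=True)
bunch = ToyDataset().load()
```

A new dataset would then only need to implement `_read`, which is one way the maintenance benefit mentioned above might show up when the API is extended.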

daniel-cortez-stevenson · 2019-02-08T21:45:45Z

Discussion with a better overview and scope definition has been started at #13123. Closing this issue.

daniel-cortez-stevenson closed this as completed Feb 8, 2019
