Opening this issue to discuss a potential redesign of the sklearn.datasets API toward a more object-oriented design. A prototype of a redesigned sklearn.datasets.base module can be found in PR #13120. This was also briefly discussed in #10733 and may facilitate closing #10972 and #11818.
@rth At the API level I agree the functional programming style is friendlier. The benefit really would come when the sklearn datasets API is extended, allowing for easier maintenance and ensuring similar behavior. The proposed pull request would not make sense if the datasets API is 'closed'.
@rth I also agree that the Datasets API in deep learning libraries is more obviously necessary. But, on a leap of faith here, I'm hoping that - through a thoughtful redesign - we may identify useful extensions of a Dataset class that are relevant to supporting reproducibility and performance of scikit-learn's more notable functionality (Estimators/Transformers).
For example, could extending datasets to act as generators increase memory or time performance of sklearn meaningfully?
I'll take a look at OpenML, and open up a fresh issue to start a high-level conversation. I hope this PR serves to get some brain juices flowing.
Originally posted by @daniel-cortez-stevenson in #13120 (comment)
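To make the generator question a bit more concrete, here is a rough sketch of what I have in mind. The `Dataset` class and its `batches` method are hypothetical (they are not part of scikit-learn and not necessarily what PR #13120 implements); only `SGDClassifier.partial_fit` is existing API. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


class Dataset:
    """Hypothetical OO dataset wrapper (an assumption, not scikit-learn API)."""

    def __init__(self, data, target):
        self.data = np.asarray(data)
        self.target = np.asarray(target)

    def __len__(self):
        return self.data.shape[0]

    def batches(self, batch_size):
        """Lazily yield (X, y) slices; basic NumPy slicing returns views."""
        for start in range(0, len(self), batch_size):
            stop = start + batch_size
            yield self.data[start:stop], self.target[start:stop]


# Toy data, purely illustrative.
rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (X[:, 0] > 0).astype(int)
dataset = Dataset(X, y)

# Estimators that implement partial_fit can consume one batch at a time.
clf = SGDClassifier(random_state=0)
classes = np.unique(dataset.target)
for X_batch, y_batch in dataset.batches(batch_size=100):
    clf.partial_fit(X_batch, y_batch, classes=classes)

n_batches = sum(1 for _ in dataset.batches(batch_size=100))
```

Whether this buys anything meaningful probably depends on the data source (an in-memory array gains little, a file or remote source might gain more) and it only applies to estimators that support incremental fitting.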