Add OpenML dataset fetcher · Issue #9543 · scikit-learn/scikit-learn · GitHub
Add OpenML dataset fetcher #9543
Closed
@amueller

Description

@amueller

The OpenML API is now in a state where I think we can implement a fetcher for OpenML datasets relatively easily.

You can see some of the discussion here:
openml/OpenML#218 (comment)

The interface should probably accept either a name or an ID. Names are not unique in OpenML; integer IDs are, but they are less user-friendly.
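
A minimal sketch of the name-or-ID argument handling, assuming a hypothetical fetcher entry point (the helper `normalize_query` below is illustrative, not an existing scikit-learn function). Exactly one of `name` or `data_id` must be given; an integer ID can be used directly because it is unique, while a name still has to be resolved with a search call.

```python
from typing import Optional, Tuple


def normalize_query(name: Optional[str] = None,
                    data_id: Optional[int] = None) -> Tuple[str, object]:
    """Validate the arguments of a hypothetical OpenML fetcher.

    Returns ("id", data_id) when an integer ID is given (IDs are unique),
    or ("name", name) when only a name is given (names may be ambiguous
    and must first be resolved with a search call).
    """
    # Reject both-missing and both-given in one check.
    if (name is None) == (data_id is None):
        raise ValueError("Pass exactly one of `name` or `data_id`.")
    if data_id is not None:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(data_id, int) or isinstance(data_id, bool) or data_id <= 0:
            raise ValueError("`data_id` must be a positive integer.")
        return ("id", data_id)
    if not isinstance(name, str) or not name:
        raise ValueError("`name` must be a non-empty string.")
    return ("name", name)


print(normalize_query(name="anneal"))  # ('name', 'anneal')
print(normalize_query(data_id=2))      # ('id', 2)
```

Keeping the two arguments mutually exclusive avoids having to decide which one wins when a name and an ID disagree.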

My suggestion would be to do a search call like

https://openml.org/api/v1/json/data/list/data_name/anneal/limit/1

which searches for the anneal dataset. The result will contain the ID of the first dataset called anneal, which we can then fetch as a CSV with a second API call.
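
The name-to-ID step could look roughly like the sketch below. The URL pattern is copied from the search call in this issue; the response layout `{"data": {"dataset": [{"did": ...}, ...]}}` is an assumption about the JSON that endpoint returns, and the helper names are hypothetical. The offline part parses a hand-written sample payload; `resolve_name` performs the real request and is not called here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the search call in this issue.
OPENML_JSON_API = "https://openml.org/api/v1/json"


def search_url(name: str, limit: int = 1) -> str:
    """Build the data-list search URL for a dataset name."""
    return f"{OPENML_JSON_API}/data/list/data_name/{quote(name)}/limit/{limit}"


def first_dataset_id(search_response: dict) -> int:
    """Return the ID of the first dataset in a parsed search response.

    ASSUMPTION: the response is shaped like
    {"data": {"dataset": [{"did": ..., "name": ...}, ...]}}.
    """
    datasets = search_response["data"]["dataset"]
    # Defensive guard in case the list comes back empty.
    if not datasets:
        raise LookupError("No dataset matched the search.")
    return int(datasets[0]["did"])


def resolve_name(name: str) -> int:
    """Network step: run the search call and return the first matching ID."""
    with urlopen(search_url(name)) as response:
        return first_dataset_id(json.load(response))


# Offline usage with a hand-written sample payload (not real API output).
sample = {"data": {"dataset": [{"did": 2, "name": "anneal"}]}}
print(search_url("anneal"))  # https://openml.org/api/v1/json/data/list/data_name/anneal/limit/1
print(first_dataset_id(sample))  # 2
```

Splitting URL construction and response parsing from the network call keeps the parsing testable without hitting the server.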
Finally, we probably also need a call for the JSON metadata, which tells us which column is the target, probably also which columns are categorical and which are continuous, and possibly more.
For our interface, we definitely need the target column, though.
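
Once the metadata is available, finding the target and splitting the CSV could be sketched as follows. The layout of the feature metadata (`{"data_features": {"feature": [...]}}` with `data_type` values such as `"nominal"`/`"numeric"` and an `is_target` flag stored as the string `"true"`/`"false"`) is an assumption, as are the helper names and the single-target restriction. The sample metadata and CSV are hand-written with made-up column names.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_feature_metadata(features_response: dict) -> Tuple[str, List[str], List[str]]:
    """Return (target, categorical_columns, continuous_columns).

    ASSUMPTION: the JSON feature metadata looks like
    {"data_features": {"feature": [
        {"name": ..., "data_type": "nominal" | "numeric", "is_target": "true" | "false"}, ...]}}
    Columns with any other data_type end up in neither list.
    """
    features = features_response["data_features"]["feature"]
    targets = [f["name"] for f in features if f.get("is_target") == "true"]
    # This sketch only supports a single target column.
    if len(targets) != 1:
        raise ValueError(f"Expected exactly one target column, found {len(targets)}.")
    target = targets[0]
    categorical = [f["name"] for f in features
                   if f["name"] != target and f["data_type"] == "nominal"]
    continuous = [f["name"] for f in features
                  if f["name"] != target and f["data_type"] == "numeric"]
    return target, categorical, continuous


def split_csv(csv_text: str, target: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split CSV rows into feature dicts (X) and a target list (y)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Remove the target column from each row and collect it separately.
    y = [row.pop(target) for row in rows]
    return rows, y


# Hand-written sample inputs (not real OpenML data).
meta = {"data_features": {"feature": [
    {"name": "color", "data_type": "nominal", "is_target": "false"},
    {"name": "size", "data_type": "numeric", "is_target": "false"},
    {"name": "label", "data_type": "nominal", "is_target": "true"},
]}}
target, cat, num = parse_feature_metadata(meta)
X, y = split_csv("color,size,label\nred,1.5,yes\nblue,2.0,no\n", target)
print(target, cat, num)  # label ['color'] ['size']
print(y)                 # ['yes', 'no']
```

Values stay as strings here; converting continuous columns to floats and encoding categorical ones would be a separate step driven by the two column lists.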

This should be fairly straightforward.

Metadata

Assignees

No one assigned

    Labels

    Enhancement, Moderate (Anything that requires some knowledge of conventions and best practices), help wanted

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests