-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] pulling data from openml.org rather than original data source #12004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
counts.append(1) | ||
else: | ||
# aggregate monthly sum to produce average | ||
ppmv_sums[-1] += float(ppmv) | ||
ppmv_sums[-1] += float(ppmvs[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need this float
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, this is redundant
month_float = y + (m - 1) / 12 | ||
ppmvs = ml_data.target | ||
|
||
for i in range(len(ppmvs)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might as well just iterate over zip(month_float, ppmvs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea.
Why is this marked WIP? |
That is, what work do you intend to do before this is safe to consider merging? |
On the openml.org side, the dataset version still needs to be approved and set to "active". While its status is "in_preparation", their admins could in theory reject the dataset and set as "inactive" (due to compliance issues with tasks or workflows, etc) and would break |
@janvanrijn what's the chance of https://www.openml.org/d/41187 not being approved? ;) @maxcopeland is there a reason not to say "Fixes #..." in the PR description? Using that wording, rather than "Works on" means that github will automatically close the original issue when this is merged. |
@jnothman Sorry about that! Edited my PR comment. I'll use "Fixes #..." in the future. Newbie error :/ |
cool, new datasets :) |
@maxcopeland Thanks for uploading the dataset. Seems that there are still some formatting issues in the wiki part. Also, maybe we can provide more information in the wiki (e.g., the url you obtain the data) |
Thanks @qinhanmin2014, I've updated the wiki here. Let me know if it's acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, the example looks much cleaner using the openml fetcher!
Fixes #11858
What does this implement/fix? Explain your changes.
This pulls data from openml.org rather than the original data source.
Any other comments?
The dataset has spent several days on openml.org but is still "in_preparation". Data can be pulled via
fetch_openml
, but gives a warning about the "in_preparation" status of the dataset's current version.plot_gpr_co2
is technically functional, but need to get dataset to "active" state.Left to-do: