[MRG] pulling data from openml.org rather than original data source #12004

maxcopeland · 2018-09-04T19:51:58Z

What does this implement/fix? Explain your changes.

This pulls data from openml.org rather than the original data source.

Any other comments?

The dataset has been on openml.org for several days but its status is still "in_preparation". The data can be pulled via fetch_openml, but doing so gives a warning about the "in_preparation" status of the dataset's current version. plot_gpr_co2 is technically functional, but the dataset still needs to reach the "active" state.

Remaining to-do:

More efficient aggregation of monthly sums into averages
Upload additional versions of the ARFF file to get the dataset activated by openml admins
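
For the first to-do item, one possible direction is sketched below. It is not the code in this PR: the function name `monthly_means` and the `(year, month, ppmv)` record layout are assumptions made for illustration, and the values are toy numbers rather than real measurements. It groups consecutive readings by integer (year, month), so no running sums or counters are needed.

```python
from itertools import groupby
from statistics import mean


def monthly_means(records):
    """Average readings per month.

    `records` is assumed to be an iterable of (year, month, ppmv) tuples
    already sorted by (year, month); this layout is hypothetical.
    """
    result = []
    # groupby only merges *consecutive* items with equal keys,
    # which is why the input must be sorted by (year, month).
    for (year, month), group in groupby(records, key=lambda r: (r[0], r[1])):
        avg = mean(r[2] for r in group)
        # Same fractional-year convention as the example: y + (m - 1) / 12
        result.append((year + (month - 1) / 12, avg))
    return result


# Toy values, not real CO2 measurements.
toy = [
    (1958, 3, 316.0),
    (1958, 3, 317.0),
    (1958, 4, 317.5),
    (1959, 4, 318.5),
]
means = monthly_means(toy)
```

Grouping on the integer pair rather than on the fractional year avoids relying on float equality to detect a change of month.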

jnothman

Thanks for working on this!

jnothman · 2018-09-04T23:24:47Z

examples/gaussian_process/plot_gpr_co2.py

            counts.append(1)
        else:
            # aggregate monthly sum to produce average
-            ppmv_sums[-1] += float(ppmv)
+            ppmv_sums[-1] += float(ppmvs[i])


do we still need this float?

you're right, this is redundant

jnothman · 2018-09-04T23:26:12Z

examples/gaussian_process/plot_gpr_co2.py

+    month_float = y + (m - 1) / 12
+    ppmvs = ml_data.target
+
+    for i in range(len(ppmvs)):


You might as well just iterate over zip(month_float, ppmvs)
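
A minimal sketch of what iterating over zip could look like, assuming month_float and ppmvs are equal-length sequences. The function name `monthly_averages` and the `months` list are hypothetical; `ppmv_sums` and `counts` follow the names in the diff. The values are toy numbers, not real measurements.

```python
def monthly_averages(month_floats, ppmvs):
    """Average readings that share the same fractional-year month value."""
    months = []
    ppmv_sums = []
    counts = []
    # Pair each month value with its reading instead of indexing by position.
    for month, ppmv in zip(month_floats, ppmvs):
        if not months or month != months[-1]:
            # First reading of a new month
            months.append(month)
            ppmv_sums.append(ppmv)
            counts.append(1)
        else:
            # aggregate monthly sum to produce average
            ppmv_sums[-1] += ppmv
            counts[-1] += 1
    averages = [total / count for total, count in zip(ppmv_sums, counts)]
    return months, averages


# Toy values, not real CO2 measurements.
years = [1958, 1958, 1958, 1958]
month_numbers = [3, 3, 4, 4]
month_floats = [y + (m - 1) / 12 for y, m in zip(years, month_numbers)]
ppmvs = [316.0, 317.0, 317.5, 318.5]

months, averages = monthly_averages(month_floats, ppmvs)
```

This drops the `range(len(ppmvs))` index and the `ppmvs[i]` lookup while keeping the same sum-and-count logic.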

jnothman · 2018-09-05T00:27:27Z

Why is this marked WIP?

jnothman · 2018-09-05T00:27:44Z

That is, what work do you intend to do before this is safe to consider merging?

maxcopeland · 2018-09-05T00:52:28Z

On the openml.org side, the dataset version still needs to be approved and set to "active". While its status is "in_preparation", their admins could in theory reject the dataset and set it to "inactive" (due to compliance issues with tasks or workflows, etc.), which would break fetch_openml. Once it's active, the merge will be safe.

jnothman · 2018-09-05T01:01:00Z

@janvanrijn what's the chance of https://www.openml.org/d/41187 not being approved? ;)

@maxcopeland is there a reason not to say "Fixes #..." in the PR description? Using that wording, rather than "Works on", means that GitHub will automatically close the original issue when this is merged.

maxcopeland · 2018-09-05T01:09:15Z

@jnothman Sorry about that! Edited my PR comment. I'll use "Fixes #..." in the future. Newbie error :/

janvanrijn · 2018-09-05T01:16:24Z

cool, new datasets :)
it's active now

qinhanmin2014 · 2018-09-05T10:16:38Z

@maxcopeland Thanks for uploading the dataset. It seems that there are still some formatting issues in the wiki part. Also, maybe we can provide more information in the wiki (e.g., the URL you obtained the data from).

maxcopeland · 2018-09-05T17:27:43Z

Thanks @qinhanmin2014, I've updated the wiki here. Let me know if it's acceptable.

rth

Very nice, the example looks much cleaner using the openml fetcher!

maxcopeland added 2 commits September 4, 2018 12:32

Changed plot_gpr_co2.py to pull data from openml.org rather than original data source

da044ec

fixing pep8 issues

c600edd

jnothman reviewed Sep 4, 2018

removing type redundancy, iter. over zip, changed fn name

4b92cd7

jnothman approved these changes Sep 5, 2018

maxcopeland changed the title ~~[WIP] pulling data from openml.org rather than original data source~~ [MRG] pulling data from openml.org rather than original data source Sep 5, 2018

rth approved these changes Sep 8, 2018

rth merged commit 2242f4c into scikit-learn:master Sep 8, 2018

maxcopeland deleted the mauna-loa-openml branch September 8, 2018 15:03

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Sep 9, 2018

EXA use openml fetcher in plot_gpr_co2.py example (scikit-learn#12004)

6711952

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Sep 17, 2018

EXA use openml fetcher in plot_gpr_co2.py example (scikit-learn#12004)

9de62e8
