8000 BUG Fixes fetch_kddcup99 for return_X_y and as_frame by thomasjpfan · Pull Request #19011 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

BUG Fixes fetch_kddcup99 for return_X_y and as_frame #19011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Fixes #19010

What does this implement/fix? Explain your changes.

Adds test for return_X_y=True and as_frame=True into the common tests.

@glemaitre glemaitre added this to the 0.24 milestone Dec 15, 2020
@glemaitre glemaitre self-requested a review December 15, 2020 16:43
Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I run the tests locally and it works. Thanks @thomasjpfan

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codecov asserts that the the return_X_y == True case is not covered by the common test:

https://github.com/scikit-learn/scikit-learn/pull/19011/files#diff-6020384f05da59b3ef5b9078a33ceca19b61f65b7b18806c9b91587fd22594e0R209

I suppose that this is because this check needs the network and is therefore skipped on the CI servers.

I tried to run the check locally and it passes:

% SKLEARN_SKIP_NETWORK_TESTS=0 pytest -vk fetch_kddcup99 sklearn/datasets/tests/test_common.py
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.9.1, pytest-6.2.0, py-1.10.0, pluggy-0.13.1 -- /Users/ogrisel/miniforge3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: forked-1.2.0, xdist-2.1.0, timeout-1.4.2
collected 38 items / 35 deselected / 3 selected                                                                                                             

sklearn/datasets/tests/test_common.py::test_common_check_return_X_y[fetch_kddcup99-fetch_kddcup99] PASSED                                             [ 33%]
sklearn/datasets/tests/test_common.py::test_common_check_as_frame[fetch_kddcup99-fetch_kddcup99] PASSED                                               [ 66%]
sklearn/datasets/tests/test_common.py::test_common_check_pandas_dependency[fetch_kddcup99-fetch_kddcup99] SKIPPED (This test requires pandas to n...) [100%]

================================================================== short test summary info ==================================================================
SKIPPED [1] sklearn/datasets/tests/test_common.py:43: This test requires pandas to not be installed
======================================================== 2 passed, 1 skipped, 35 deselected in 9.39s ========================================================

assert isinstance(frame_X, pd.DataFrame)
else:
assert isinstance(frame_y, pd.Series)


def _skip_network_tests():
return os.environ.get('SKLEARN_SKIP_NETWORK_TESTS', '1') == '1'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be:

Suggested change
return os.environ.get('SKLEARN_SKIP_NETWORK_TESTS', '1') == '1'
return os.environ.get('SKLEARN_SKIP_NETWORK_TESTS', '0') == '1'

by default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion, but I was surprise to get those checks skipped by default on my local setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have no network as default to make it easier for repo maintainers such as debian. (One less configuration to set)

On a side note, it would be nice to have a [ci network] to enable this test on the CI for PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion, but I was surprise to get those checks skipped by default on my local setup.

I got a little surprised at first as well :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for [ci network]. In another PR.

@ogrisel ogrisel merged commit eedda91 into scikit-learn:master Dec 16, 2020
@ogrisel
Copy link
Member
ogrisel commented Dec 16, 2020

To be backported to 0.24.X

@ogrisel ogrisel added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Dec 16, 2020
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 22, 2020
@glemaitre glemaitre mentioned this pull request Dec 22, 2020
14 tasks
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021
@adrinjalali adrinjalali mentioned this pull request Sep 3, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:datasets To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fetch_kddcup99(as_frame=True, return_X_y=True) returns numpy arrays
3 participants
0