10000 [WIP] Clean up of `cross_validation` et al. and `model_selection` refactoring by raghavrv · Pull Request #4254 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[WIP] Clean up of cross_validation et al. and model_selection refactoring #4254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

raghavrv
Copy link
Member

fixes #1848

Based off #3340 by @pignacio

@raghavrv raghavrv force-pushed the grid_search_data_indep_cv branch 2 times, most recently from 4380634 to e7fb55b Compare February 16, 2015 21:06
@raghavrv raghavrv force-pushed the grid_search_data_indep_cv branch from e7fb55b to 0945b5b Compare February 16, 2015 21:25
@raghavrv raghavrv mentioned this pull request Feb 17, 2015
@raghavrv
Copy link
Member Author

@jnothman

I have a convoluted hack in mind... You probably don't want this... but could you just share your views on such an approach?

To warn, whenever the a particular module is used from grid_search etc -

@sklearn/__init__.py

    import model_selection as _ms
    .
    .
    .

    # FIXME module definitions to be removed in 0.18

    warning = """**Warning:** Sub modules of :mod:{0} are refactored into 
    :mod:`model_selection`. Please use the same. This module (:mod:{0}) will
    be removed in version 0.18.
    """

    deprecator = deprecated(msg=warning, update_doc=False)

    # -----------------grid_search--------------------------------

    grid_search = ModuleType('grid_search')
    grid_search.__doc__ = """
    The :mod:`sklearn.grid_search` includes utilities to fine-tune the
    parameters of an estimator.
    """ + warning % ('sklearn.grid_search')

    deprecator._msg = warning % 'sklearn.grid_search'

    grid_search.__all__ = ['GridSearchCV', 'ParameterGrid', 'fit_grid_point',
                           'ParameterSampler', 'RandomizedSearchCV']

    for name in grid_search.__all__:
        grid_search.__dict__[name] = deprecator(getattr(_ms, name))

    sys.modules['sklearn.grid_search'] = grid_search
    # ------------------------------------------------------------

So after this

>>> from sklearn.grid_search import GridSearchCV
>>> GridSearchCV(LinearSVC, [])
DeprecationWarning: **Warning:** Submodules of :mod:`sklearn.grid_search` are refactored into 
    :mod:`module_selection`. Please use the same. This module will
    be removed in version 0.18.

  warnings.warn(msg, category=DeprecationWarning)

GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True,
       loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs',
       refit=True, score_func=None, scoring=None, verbose=0)

And

>>> from module_selection import GridSearchCV
>>> GridSearchCV(LinearSVC, [])
GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True,
       loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs',
       refit=True, score_func=None, scoring=None, verbose=0)

This will -

  • Add deprecation warning to individual classes / functions of moved modules
  • avoid grid_search.py etc
  • Make git track sklearn/grid_search.py to the largest file that most resembles grid_search.py [ This will help me rebase with master ( atleast partially ), which other wise has to be manually done

@jnothman
Copy link
Member

You can have module-level deprecation triggered upon import. I won't be
able to focus on exactly what you're proposing for a couple of weeks at
least.

On 17 February 2015 at 20:48, ragv notifications@github.com wrote:

@jnothman https://github.com/jnothman

I have a convoluted hack in mind... You probably don't want this... but
could you just share your views on such an approach?

To warn, whenever the a particular module is used from grid_search etc -

@sklearn/init.py

import model_selection as _ms
.
.
.

# FIXME module definitions to be removed in 0.18

warning = """**Warning:** Sub modules of :mod:{0} are refactored into
:mod:`model_selection`. Please use the same. This module (:mod:{0}) will
be removed in version 0.18.
"""

deprecator = deprecated(msg=warning, update_doc=False)

# -----------------grid_search--------------------------------

grid_search = ModuleType('grid_search')
grid_search.__doc__ = """
The :mod:`sklearn.grid_search` includes utilities to fine-tune the
parameters of an estimator.
""" + warning % ('sklearn.grid_search')

deprecator._msg = warning % 'sklearn.grid_search'

grid_search.__all__ = ['GridSearchCV', 'ParameterGrid', 'fit_grid_point',
                       'ParameterSampler', 'RandomizedSearchCV']

for name in grid_search.__all__:
    grid_search.__dict__[name] = deprecator(getattr(_ms, name))

sys.modules['sklearn.grid_search'] = grid_search
# ------------------------------------------------------------

So after this

from sklearn.grid_search import GridSearchCV
GridSearchCV(LinearSVC, [])
DeprecationWarning: Warning: Submodules of :mod:sklearn.grid_search are refactored into
:mod:module_selection. Please use the same. This module will
be removed in version 0.18.

warnings.warn(msg, category=DeprecationWarning)

GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True,
loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs',
refit=True, score_func=None, scoring=None, verbose=0)

And

from module_selection import GridSearchCV
GridSearchCV(LinearSVC, [])
GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True,
loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs',
refit=True, score_func=None, scoring=None, verbose=0)

This will -

  • Add deprecation warning to individual classes / functions of moved
    modules
  • avoid grid_search.py etc
  • Make git track sklearn/grid_search.py to the largest file that most
    resembles grid_search.py [ This will help me rebase with master (
    atleast partially ), which other wise has to be manually done


Reply to this email directly or view it on GitHub
#4254 (comment)
.

@ogrisel
Copy link
Member
ogrisel commented Feb 18, 2015

+1 for a module level deprecation warning that will be raised upon import of this module.

@amueller
Copy link
Member

The idea of moving modules together with redoing the CV objects is that we have a cleaner deprecation path, right? so the cross_validation module will have the old objects and the model_selection module the new objects?

@amueller
Copy link
Member

I'm wondering about ways to get a cleaner diff.
One way would be to push a branch move_model_selection with all the moving to upstream, then do two PRs: one is merging move_model_selection to master and one is a PR working on the cross-validation, which would be a PR merging into move_model_selection

@amueller
Copy link
Member

Btw, the point of my first comment was that if we want to be backward compatible, there will be both the old and the new code in the PR, right?

@raghavrv
Copy link
Member Author

if we want to be backward compatible, there will be both the old and the new code in the PR, right?

(If I understand your comment correctly), we could just import the respective stuff from the new module into the old cross_validation.py etc... which is how it is done currently ( by @pignacio ). Deprecation alone has to be done for those files.

The reason why I suggested that convoluted hack is :

  • Rebasing such a setup is difficult since git recognizes code that is moved to model_selection/x.py as a new file and continues to compare ( for rebasing purposes ) the master's cross_validation.py with this branch's cross_validation.py ( which essentially contains only the imports and the deprecation alone ) as long as cross_validation.py exists in this branch... Once it is removed it correctly compares cross_validation.py with the file that most resembles it... ( the rest of the code has to be updated manually anyway )
    Also I'd be happy to rebase it manually... but I'm a little paranoid so as not to leave out any minor changes to those files in master since the time of the last PR...

with ref to the above could someone ping me whenever there is any PR that makes changes to cross_validation.py / learning_curve.py / grid_search.py pl? I'll update the same here...

  • The diff is huge hindering code review. ( not sure if this is an important concern )

I've got one more ugly hack based on my question at stack overflow here. Not sure if I myself am +1 for that... but please do take a look

@raghavrv
Copy link
Member Author

Also thanks for neat suggestion on using PR to a new branch... I'll do that after rebasing... :)

This is my plan to split the PRs -

  1. This PR will be only for moving all the files as is ( in master currently ) to model_selection and having only the imports/deprecation in cross_validation.py etc... ( will have huge diff, but should be easier to review since this involves only moving the modules ) ( DEPR. warn should be "moved to model_selection... import from there..." )
  2. Next PR should sort the stuff in model_selection/x.py --> respective files as done by @pignacio along with the changes that does not alter any of the old functionality of the modules...
  3. To clean up, introduce the remaining changes ( as a branch "mov..." ) and fine tune DEPR. warning...
  4. data_indep_cv related changes ( in a separate PR to branch "mov..." ) and add DEPR warning for the old CV...
  5. Merge 4 into "mov..." .... then branch "mov..." into master
  6. Scavenge old grid_search / cross_validation / learning_curve related PRs and pester the OPs to rebase (:p) and complete... ( or take up and complete if within my scope )

1 and 2 should be easier to do perhaps in 3 or 4 days...

3 and 4 can now be done without the worry of deprecation / huge diff etc but parts ( or full, depending on how git handles the new files ) still have to be manually rebased for the time the "mov..." branch remains active...

do you feel this seems like a sane approach?

@pignacio
Copy link
Contributor

I have a convoluted hack in mind... You probably don't want this... but could you just share your views on such an approach?

To warn, whenever the a particular module is used from grid_search etc -

sklearn.grid_search does not have submodules, so a module-wide deprecation should be enough. Here is an example of that: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/metrics.py

As for the double PR matter: That should make the diff cleaner and easier to review.

@amueller, do you see any problem with merging the first PR (moving grid_search to model_selection as it is, docs and test updates, deprecation of grid_search) onto master ASAP? It should be a pretty straightforward change.
I feel we will have to worry less about changes on grid_search while we finish the data independence changes and no PR over PR magic is needed.
Also, I think the data-independent iterator changes will take some time to implement and review, and that it would be best to not hang the model_selection migration on those changes to avoid rebase problems.

@amueller
Copy link
Member

What I understood from the old PR (maybe my misunderstanding) is that we would do renaming together with doing the new CV interface so that we have a decent deprecation path.
If we don't use the renaming for deprecation (as you two suggested) then we should most definitely split this up into two PRs.

But then how would you do the deprecation? How would the GridSearchCV object know which cross-validation interface to use? Duck-typing?

Btw see this comment by @jnothman refering to @GaelVaroquaux :

I am worried this will lay too many usability traps, and I think @GaelVaroquaux was hoping that in renaming the path to this module, we could keep two separate copies of the functionality while deprecating, and not worry about backwards compatibility in the new one. Similarly, the new version could ignore any handling of the deprecated indices argument, etc.

@amueller
Copy link
Member

Btw if we don't go the code duplication way, I would suggest not doing the rename now, and only implement the new interface. The renaming is not super urgent imho.

@jnothman
Copy link
Member

Ducktyping should be fine. One sort of CV object will have a split method
the other will be iterable. We should have no problem supporting both.

On 25 February 2015 at 14:48, Andreas Mueller notifications@github.com
wrote:

Btw if we don't go the code duplication way, I would suggest not doing
the rename now, and only implement the new interface. The renaming is not
super urgent imho.


Reply to this email directly or view it on GitHub
#4254 (comment)
.

@amueller
Copy link
Member

If you think there will be no trouble (and I don't see any), then I'd rather just do that and don't move around objects just before a release.

@raghavrv raghavrv changed the title [WIP] Data independent cv and model_selection refactoring [WIP] Clean up of cross_validation et al. and model_selection refactoring Mar 24, 2015
@raghavrv raghavrv closed this Jun 30, 2015
@raghavrv raghavrv reopened this Jun 30, 2015
@amueller
Copy link
Member
amueller commented Jul 1, 2015

Isn't this superseded by #4294?

@raghavrv
Copy link
Member Author
raghavrv commented Jul 2, 2015

Yes :)

@raghavrv raghavrv closed this Jul 2, 2015
@raghavrv raghavrv deleted the grid_search_data_indep_cv branch February 11, 2016 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename grid_search module
5 participants
0