8000
We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
There was an error while loading. Please reload this page.
cross_validation
model_selection
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixes #1848
Based off #3340 by @pignacio
Sorry, something went wrong.
4380634
e7fb55b
MNT Move learning_curve, grid_search and cv modules into model_selection
d9d3c67
Working split method for KFold and StratifiedKFold
1f0bf1b
Removed _PartitionTestGenerator
9d67b0e
LeaveOut and LeaveLabelOut working with split
0945b5b
@jnothman
I have a convoluted hack in mind... You probably don't want this... but could you just share your views on such an approach?
To warn, whenever the a particular module is used from grid_search etc -
grid_search
@sklearn/__init__.py
import model_selection as _ms . . . # FIXME module definitions to be removed in 0.18 warning = """**Warning:** Sub modules of :mod:{0} are refactored into :mod:`model_selection`. Please use the same. This module (:mod:{0}) will be removed in version 0.18. """ deprecator = deprecated(msg=warning, update_doc=False) # -----------------grid_search-------------------------------- grid_search = ModuleType('grid_search') grid_search.__doc__ = """ The :mod:`sklearn.grid_search` includes utilities to fine-tune the parameters of an estimator. """ + warning % ('sklearn.grid_search') deprecator._msg = warning % 'sklearn.grid_search' grid_search.__all__ = ['GridSearchCV', 'ParameterGrid', 'fit_grid_point', 'ParameterSampler', 'RandomizedSearchCV'] for name in grid_search.__all__: grid_search.__dict__[name] = deprecator(getattr(_ms, name)) sys.modules['sklearn.grid_search'] = grid_search # ------------------------------------------------------------
So after this
>>> from sklearn.grid_search import GridSearchCV >>> GridSearchCV(LinearSVC, []) DeprecationWarning: **Warning:** Submodules of :mod:`sklearn.grid_search` are refactored into :mod:`module_selection`. Please use the same. This module will be removed in version 0.18. warnings.warn(msg, category=DeprecationWarning) GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True, loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None, verbose=0)
And
>>> from module_selection import GridSearchCV >>> GridSearchCV(LinearSVC, []) GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True, loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None, verbose=0)
This will -
sklearn/grid_search.py
grid_search.py
You can have module-level deprecation triggered upon import. I won't be able to focus on exactly what you're proposing for a couple of weeks at least.
On 17 February 2015 at 20:48, ragv notifications@github.com wrote:
@jnothman https://github.com/jnothman I have a convoluted hack in mind... You probably don't want this... but could you just share your views on such an approach? To warn, whenever the a particular module is used from grid_search etc - @sklearn/init.py import model_selection as _ms . . . # FIXME module definitions to be removed in 0.18 warning = """**Warning:** Sub modules of :mod:{0} are refactored into :mod:`model_selection`. Please use the same. This module (:mod:{0}) will be removed in version 0.18. """ deprecator = deprecated(msg=warning, update_doc=False) # -----------------grid_search-------------------------------- grid_search = ModuleType('grid_search') grid_search.__doc__ = """ The :mod:`sklearn.grid_search` includes utilities to fine-tune the parameters of an estimator. """ + warning % ('sklearn.grid_search') deprecator._msg = warning % 'sklearn.grid_search' grid_search.__all__ = ['GridSearchCV', 'ParameterGrid', 'fit_grid_point', 'ParameterSampler', 'RandomizedSearchCV'] for name in grid_search.__all__: grid_search.__dict__[name] = deprecator(getattr(_ms, name)) sys.modules['sklearn.grid_search'] = grid_search # ------------------------------------------------------------ So after this from sklearn.grid_search import GridSearchCV GridSearchCV(LinearSVC, []) DeprecationWarning: Warning: Submodules of :mod:sklearn.grid_search are refactored into :mod:module_selection. Please use the same. This module will be removed in version 0.18. warnings.warn(msg, category=DeprecationWarning) GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True, loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None, verbose=0) And from module_selection import GridSearchCV GridSearchCV(LinearSVC, []) GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True, loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None, verbose=0) This will - Add deprecation warning to individual classes / functions of moved modules avoid grid_search.py etc Make git track sklearn/grid_search.py to the largest file that most resembles grid_search.py [ This will help me rebase with master ( atleast partially ), which other wise has to be manually done — Reply to this email directly or view it on GitHub #4254 (comment) .
@jnothman https://github.com/jnothman
@sklearn/init.py
from sklearn.grid_search import GridSearchCV GridSearchCV(LinearSVC, []) DeprecationWarning: Warning: Submodules of :mod:sklearn.grid_search are refactored into :mod:module_selection. Please use the same. This module will be removed in version 0.18.
sklearn.grid_search
module_selection
warnings.warn(msg, category=DeprecationWarning)
GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True, loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None, verbose=0)
from module_selection import GridSearchCV GridSearchCV(LinearSVC, []) GridSearchCV(cv=None, estimator=LinearSVC(...), fit_params={}, iid=True, loss_func=None, n_jobs=1, param_grid=[], pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None, verbose=0)
— Reply to this email directly or view it on GitHub #4254 (comment) .
+1 for a module level deprecation warning that will be raised upon import of this module.
The idea of moving modules together with redoing the CV objects is that we have a cleaner deprecation path, right? so the cross_validation module will have the old objects and the model_selection module the new objects?
I'm wondering about ways to get a cleaner diff. One way would be to push a branch move_model_selection with all the moving to upstream, then do two PRs: one is merging move_model_selection to master and one is a PR working on the cross-validation, which would be a PR merging into move_model_selection
move_model_selection
master
Btw, the point of my first comment was that if we want to be backward compatible, there will be both the old and the new code in the PR, right?
if we want to be backward compatible, there will be both the old and the new code in the PR, right?
(If I understand your comment correctly), we could just import the respective stuff from the new module into the old cross_validation.py etc... which is how it is done currently ( by @pignacio ). Deprecation alone has to be done for those files.
cross_validation.py
The reason why I suggested that convoluted hack is :
model_selection/x.py
with ref to the above could someone ping me whenever there is any PR that makes changes to cross_validation.py / learning_curve.py / grid_search.py pl? I'll update the same here...
learning_curve.py
I've got one more ugly hack based on my question at stack overflow here. Not sure if I myself am +1 for that... but please do take a look
Also thanks for neat suggestion on using PR to a new branch... I'll do that after rebasing... :)
This is my plan to split the PRs -
1 and 2 should be easier to do perhaps in 3 or 4 days...
3 and 4 can now be done without the worry of deprecation / huge diff etc but parts ( or full, depending on how git handles the new files ) still have to be manually rebased for the time the "mov..." branch remains active...
do you feel this seems like a sane approach?
I have a convoluted hack in mind... You probably don't want this... but could you just share your views on such an approach? To warn, whenever the a particular module is used from grid_search etc -
sklearn.grid_search does not have submodules, so a module-wide deprecation should be enough. Here is an example of that: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/metrics.py
As for the double PR matter: That should make the diff cleaner and easier to review.
@amueller, do you see any problem with merging the first PR (moving grid_search to model_selection as it is, docs and test updates, deprecation of grid_search) onto master ASAP? It should be a pretty straightforward change. I feel we will have to worry less about changes on grid_search while we finish the data independence changes and no PR over PR magic is needed. Also, I think the data-independent iterator changes will take some time to implement and review, and that it would be best to not hang the model_selection migration on those changes to avoid rebase problems.
What I understood from the old PR (maybe my misunderstanding) is that we would do renaming together with doing the new CV interface so that we have a decent deprecation path. If we don't use the renaming for deprecation (as you two suggested) then we should most definitely split this up into two PRs.
But then how would you do the deprecation? How would the GridSearchCV object know which cross-validation interface to use? Duck-typing?
Btw see this comment by @jnothman refering to @GaelVaroquaux :
I am worried this will lay too many usability traps, and I think @GaelVaroquaux was hoping that in renaming the path to this module, we could keep two separate copies of the functionality while deprecating, and not worry about backwards compatibility in the new one. Similarly, the new version could ignore any handling of the deprecated indices argument, etc.
Btw if we don't go the code duplication way, I would suggest not doing the rename now, and only implement the new interface. The renaming is not super urgent imho.
Ducktyping should be fine. One sort of CV object will have a split method the other will be iterable. We should have no problem supporting both.
On 25 February 2015 at 14:48, Andreas Mueller notifications@github.com wrote:
Btw if we don't go the code duplication way, I would suggest not doing the rename now, and only implement the new interface. The renaming is not super urgent imho. — Reply to this email directly or view it on GitHub #4254 (comment) .
If you think there will be no trouble (and I don't see any), then I'd rather just do that and don't move around objects just before a release.
Isn't this superseded by #4294?
Yes :)
Successfully merging this pull request may close these issues.