8000 Enable grid_search with classifiers that fail on individual training folds by romaniukm · Pull Request #2587 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Enable grid_search with classifiers that fail on individual training folds #2587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

romaniukm
Copy link

Enable grid_search with classifiers that fail on individual training folds.

The improved functionality is repeated in two places, with the same code for the case where y is not None and when y is None. I would welcome a suggestion on how to avoid this duplication. There are also two nearly identical tests for those two cases.

@romaniukm romaniukm closed this Nov 13, 2013
@romaniukm romaniukm deleted the gridsearch-failing-classifiers branch November 13, 2013 21:20
@romaniukm romaniukm restored the gridsearch-failing-classifiers branch November 13, 2013 21:20
@romaniukm romaniukm deleted the gridsearch-failing-classifiers branch November 13, 2013 21:21
@romaniukm romaniukm restored the gridsearch-failing-classifiers branch November 13, 2013 21:21
@romaniukm romaniukm reopened this Nov 13, 2013
@romaniukm
Copy link
Author

Is anyone interested in this code?

@jnothman
Copy link
Member

Probably. I'll take a look.

if scorer is not None:
this_score = scorer(clf, X_test, y_test)
try:
clf.fit(X_train, y_train, **fit_params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication could be avoided with something like clf.fit(*fit_args, **fit_params) where fit_args is set differently for the y is None and y is not None cases. (In some PR related to grid search somewhere I have implemented it this way, but the remainder of the PR was too controversial to merge as yet.)

@romaniukm
Copy link
Author

I finally got back to working on this and now I'm wondering what I should do about the divergence between master and my local branch. Should I rebase my branch or merge it with the current master?

@jnothman
Copy link
Member

Usually we use rebase...

On 22 January 2014 07:59, Michal Romaniuk notifications@github.com wrote:

I finally got back to working on this and now I'm wondering what I
should do about the divergence between master and my local branch. Should I
rebase my branch or merge it with the current master?


Reply to this email directly or view it on GitHubhttps://github.com//pull/2587#issuecomment-32962605
.

@romaniukm
Copy link
Author

I noticed that the master diverged quite significantly from my branch and the functionality I'm working on was moved to cross_validation.py so I decided to pull the latest master, create a new branch and work on that.
Now I have the following problem AssertionError: Failed doctest test for sklearn.grid_search.GridSearchCV. I think I know what is causing the assertion to fail, but I wonder where to find this test script for doctest, so that I can fix it?

@jnothman
Copy link
Member

Yes, making a new branch sound sensible.

Usually a doctest failure will give more information than that: what was printed, and what expected. The code it tests is written in a docstring comment, probably here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/grid_search.py#L502

@romaniukm
Copy link
Author

The output says this:

======================================================================
FAIL: Doctest: sklearn.grid_search.GridSearchCV
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/doctest.py", line 2201, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for sklearn.grid_search.GridSearchCV
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py", line 447, in GridSearchCV

----------------------------------------------------------------------
File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py", line 529, in sklearn.grid_search.GridSearchCV
Failed example:
    clf.fit(iris.data, iris.target)
                                # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
Expected:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         degree=..., gamma=..., kernel='rbf', max_iter=-1,
                         probability=False, random_state=None, shrinking=True,
                         tol=..., verbose=False),
           fit_params={}, iid=..., loss_func=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=..., score_func=...,
           scoring=..., verbose=...)
Got:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False),
           fit_exception_score=0.0, fit_exceptions_to_warnings=False,
           fit_params={}, iid=True, loss_func=None, n_jobs=1,
           param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
           pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
           verbose=0)

>>  raise self.failureException(self.format_failure(<StringIO.StringIO instance at 0x1eb8b48>.getvalue()))

@jnothman
Copy link
Member

As I thought. It's because the previous doctest doesn't know about
fit_exception_score=0.0, fit_exceptions_to_warnings=False, which appears now.

On 23 January 2014 09:08, Michal Romaniuk notifications@github.com wrote:

The output says this:
`======================================================================
FAIL: Doctest: sklearn.grid_search.GridSearchCV

Traceback (most recent call last):
File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/doctest.py", line
2201, in runTest
raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for sklearn.grid_search.GridSearchCV
File
"/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py",
line 447, in GridSearchCV


File
"/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py",
line 529, in sklearn.grid_search.GridSearchCV
Failed example:
clf.fit(iris.data, iris.target)

doctest: +NORMALIZE_WHITESPACE +ELLIPSIS

Expected:
GridSearchCV(cv=None,
estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
degree=..., gamma=..., kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True,
tol=..., verbose=False),
fit_params={}, iid=..., loss_func=..., n_jobs=1,
param_grid=..., pre_dispatch=..., refit=..., score_func=...,
scoring=..., verbose=...)
Got:
GridSearchCV(cv=None,
estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False),
fit_exception_score=0.0, fit_exceptions_to_warnings=False,
fit_params={}, iid=True, loss_func=None, n_jobs=1,
param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
verbose=0)

raise self.failureException(self.format_failure(.getvalue()))`


Reply to this email directly or view it on GitHubhttps://github.com//pull/2587#issuecomment-33074306
.

@romaniukm
Copy link
Author

I noticed that too :-) I just need to know where these doctests are stored so that I can update it...

@jnothman
Copy link
Member

"File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/
scikit-learn/sklearn/grid_search.py", line 529"

On 23 January 2014 09:26, Michal Romaniuk notifications@github.com wrote:

I noticed that :-) I just need to know where these doctests are stored
so that I can update it...


Reply to this email directly or view it on GitHubhttps://github.com//pull/2587#issuecomment-33075993
.

@romaniukm
Copy link
Author

Oh... so it's checking the file against its own documentation...
Well, I added the new parameters to the list but for some reason it still gives me an error. It appears that the arguments are in a different order. Do I have to sort them alphabetically in the docs even if they are not alphabetical in the actual code?

Failed example:
    clf.fit(iris.data, iris.target)
                                # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
Expected:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         degree=..., gamma
8000
=..., kernel='rbf', max_iter=-1,
                         probability=False, random_state=None, shrinking=True,
                         tol=..., verbose=False),
           fit_params={}, iid=..., loss_func=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=..., score_func=...,
           scoring=..., verbose=..., fit_exceptions_to_warnings=...,
           fit_exception_score=...)
Got:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False),
           fit_exception_score=0.0, fit_exceptions_to_warnings=False,
           fit_params={}, iid=True, loss_func=None, n_jobs=1,
           param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
           pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
           verbose=0)

@jnothman
Copy link
Member

http://docs.python.org/2/library/doctest.html

It checks exactly the printed output, so the parameters need to be in the
right order. Yes, it's alphabetical (see
sklearn.base.BaseEstimator.repr.

On 23 January 2014 09:48, Michal Romaniuk notifications@github.com wrote:

Oh... so it's checking the file against its own documentation...
Well, I added the new parameters to the list but for some reason it still
gives me an error. It appears that the arguments are in a different order.
Do I have to sort them alphabetically in the docs even if they are not
alphabetical in the actual code?

Failed example:
clf.fit(iris.data, iris.target)
# doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
Expected:
GridSearchCV(cv=None,
estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
degree=..., gamma=..., kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True,
tol=..., verbose=False),
fit_params={}, iid=..., loss_func=..., n_jobs=1,
param_grid=..., pre_dispatch=..., refit=..., score_func=...,
scoring=..., verbose=..., fit_exceptions_to_warnings=...,
fit_exception_score=...)
Got:
GridSearchCV(cv=None,
estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False),
fit_exception_score=0.0, fit_exceptions_to_warnings=False,
fit_params={}, iid=True, loss_func=None, n_jobs=1,
param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
verbose=0)


Reply to this email directly or view it on GitHubhttps://github.com//pull/2587#issuecomment-33077911
.

@romaniukm
Copy link
Author

Ok, I rearranged them alphabetically, but now I'm getting another strange error (sorry about bothering you so much with this!):

ERROR: Failure: ImportError (cannot import name DepthFirstTreeBuilder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/loader.py", line 413, in loadTestsFromName
    addr.filename, addr.module)
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tests/test_grid_search.py", line 34, in <module>
    from sklearn.tree import DecisionTreeRegressor
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tree/__init__.py", line 6, in <module>
    from .tree import DecisionTreeClassifier
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tree/tree.py", line 31, in <module>
    from ._tree import DepthFirstTreeBuilder, BestFirstTreeBuilder
ImportError: cannot import name DepthFirstTreeBuilder

I already did git status to verify that I didn't change any of the tree.py code by accident...

@jnothman
Copy link
Member

run make inplace to recompile

On 24 January 2014 08:30, Michal Romaniuk notifications@github.com wrote:

Ok, I rearranged them alphabetically, but now I'm getting another
strange error (sorry about bothering you so much with this!):


Traceback (most recent call last):
File
"/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/loader.py",
line 413, in loadTestsFromName
addr.filename, addr.module)
File
"/vol/medic02/use
8000
rs/mpr06/anaconda/lib/python2.7/site-packages/nose/importer.py",
line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File
"/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/importer.py",
line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File
"/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tests/test_grid_search.py",
line 34, in
from sklearn.tree import DecisionTreeRegressor
File
"/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tree/
_init_.py", line 6, in
from .tree import DecisionTreeClassifier
File
"/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tree/tree.py",
line 31, in
from ._tree import DepthFirstTreeBuilder, BestFirstTreeBuilder
ImportError: cannot import name DepthFirstTreeBuilder

I already did `git status` to verify that I didn't change any of the tree.py code by accident...

—
Reply to this email directly or view it on GitHubhttps://github.com/scikit-learn/scikit-learn/pull/2587#issuecomment-33170694
.

@romaniukm
Copy link
Author

@jnothman thanks for help! It seems to be working now. I created a new branch for this (because the old one was so outdated) so now I wonder if I should issue a new pull request or try to somehow get this one to switch to a different branch (how to do this?).

@jnothman
Copy link
Member

I don't think you can switch a PR to a different branch, but you can reset this branch to point to the head of the new branch, something like:

$ git checkout pr_branch
$ git reset --hard new_branch 
$ git push origin pr_branch -f

@GaelVaroquaux
Copy link
Member

As Joel said, we prefer rebase, as it gives cleaner histories. However,
if you find that the rebase is very hard, and you are too after of making
errors, merge is accepted.

@romaniukm
Copy link
Author

Well, what I did was just start from scratch on a fresh checkout from master. So now it's technically an entirely new branch and that's why I'm thinking of opening a new pull request...

----- Reply message -----
From: "Gael Varoquaux" notifications@github.com
To: "scikit-learn/scikit-learn" scikit-learn@noreply.github.com
Cc: "Romaniuk, Michal" michal.romaniuk06@imperial.ac.uk
Subject: [scikit-learn] Enable grid_search with classifiers that fail on individual training folds (#2587)
Date: Sat, Jan 25, 2014 15:23

As Joel said, we prefer rebase, as it gives cleaner histories. However,
if you find that the rebase is very hard, and you are too after of making
errors, merge is accepted.


Reply to this email directly or view it on GitHubhttps://github.com//pull/2587#issuecomment-33291206.

@romaniukm romaniukm closed this Jan 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0