Enable grid_search with classifiers that fail on individual training folds #2587

romaniukm · 2013-11-13T20:50:03Z

Enable grid_search with classifiers that fail on individual training folds.

The new functionality is implemented twice with the same code, once for the case where y is not None and once for the case where y is None. I would welcome suggestions on how to avoid this duplication. There are also two nearly identical tests for those two cases.
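
For readers skimming the thread, here is a minimal sketch of the behaviour being proposed, not the code in this PR. The parameter names fit_exception_score and fit_exceptions_to_warnings are taken from the GridSearchCV repr quoted further down; their exact semantics here, the helper fit_and_score and the FailingClassifier class are assumptions made for illustration.

```python
import warnings


def fit_and_score(clf, X_train, y_train, X_test, y_test,
                  fit_exception_score=0.0,
                  fit_exceptions_to_warnings=False):
    """Fit clf on one fold; optionally fall back to a fixed score if fit raises."""
    try:
        clf.fit(X_train, y_train)
    except Exception as exc:
        if not fit_exceptions_to_warnings:
            raise  # assumed default: propagate the failure unchanged
        # Assumed behaviour: report the failure and use the fallback score.
        warnings.warn("fit failed on this fold: %r" % (exc,))
        return fit_exception_score
    return clf.score(X_test, y_test)


class FailingClassifier(object):
    """Hypothetical estimator whose fit always fails."""

    def fit(self, X, y=None):
        raise ValueError("cannot fit this fold")

    def score(self, X, y=None):
        return 1.0
```

With fit_exceptions_to_warnings=True the failing fold contributes fit_exception_score instead of aborting the whole search; with the default the exception still propagates.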


romaniukm · 2013-11-30T20:00:56Z

Is anyone interested in this code?

jnothman · 2013-11-30T21:10:26Z

Probably. I'll take a look.

jnothman · 2013-11-30T21:19:23Z

sklearn/grid_search.py

-        if scorer is not None:
-            this_score = scorer(clf, X_test, y_test)
+        try:
+            clf.fit(X_train, y_train, **fit_params)


The duplication could be avoided with something like clf.fit(*fit_args, **fit_params) where fit_args is set differently for the y is None and y is not None cases. (In some PR related to grid search somewhere I have implemented it this way, but the remainder of the PR was too controversial to merge as yet.)
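
A minimal sketch of that suggestion, assuming a helper named fit_fold and a RecordingEstimator test double, both of which are hypothetical and not part of scikit-learn:

```python
def fit_fold(clf, X_train, y_train=None, **fit_params):
    """Call clf.fit in a single place for both the y and no-y cases."""
    # Build the positional arguments once, depending on whether y is given.
    fit_args = (X_train,) if y_train is None else (X_train, y_train)
    return clf.fit(*fit_args, **fit_params)


class RecordingEstimator(object):
    """Hypothetical estimator that records how fit was called."""

    def fit(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        return self
```

Because the try/except (or any other handling) then wraps one call site, the two duplicated branches collapse into one.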

romaniukm · 2014-01-21T20:59:00Z

I finally got back to working on this and now I'm wondering what I should do about the divergence between master and my local branch. Should I rebase my branch or merge it with the current master?

jnothman · 2014-01-21T21:00:37Z

Usually we use rebase...

romaniukm · 2014-01-22T21:25:41Z

I noticed that master has diverged quite significantly from my branch, and the functionality I'm working on was moved to cross_validation.py, so I decided to pull the latest master, create a new branch and work on that.
Now I have the following problem: AssertionError: Failed doctest test for sklearn.grid_search.GridSearchCV. I think I know what is causing the assertion to fail, but where can I find the test script for this doctest so that I can fix it?

jnothman · 2014-01-22T22:02:52Z

Yes, making a new branch sounds sensible.

Usually a doctest failure will give more information than that: what was printed, and what expected. The code it tests is written in a docstring comment, probably here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/grid_search.py#L502
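
To illustrate how a doctest lives in a docstring and goes stale when the printed output changes, here is a small standard-library sketch. The functions describe, describe_with_new_default and count_doctest_failures are hypothetical and only stand in for the GridSearchCV docstring example.

```python
import doctest


def describe(**params):
    """Return a repr-like string listing params alphabetically.

    >>> describe(a=1)
    'Thing(a=1)'
    """
    items = ", ".join("%s=%r" % (k, v) for k, v in sorted(params.items()))
    return "Thing(%s)" % items


def describe_with_new_default(**params):
    """Like describe, but a new default parameter b was added.

    The expected output below was not updated, so this doctest fails.

    >>> describe_with_new_default(a=1)
    'Thing(a=1)'
    """
    params.setdefault("b", 0)
    return describe(**params)


def count_doctest_failures(obj):
    """Run the doctests in obj's docstring; return (failures, tries)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(obj, name=obj.__name__,
                            globs={obj.__name__: obj}):
        runner.run(test, out=lambda text: None)  # silence the failure report
    return runner.failures, runner.tries
```

The second function mirrors the situation in this thread: the code gained a parameter, the printed output changed, and the example in the docstring was left as it was.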

romaniukm · 2014-01-22T22:08:48Z

The output says this:

======================================================================
FAIL: Doctest: sklearn.grid_search.GridSearchCV
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/doctest.py", line 2201, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for sklearn.grid_search.GridSearchCV
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py", line 447, in GridSearchCV

----------------------------------------------------------------------
File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py", line 529, in sklearn.grid_search.GridSearchCV
Failed example:
    clf.fit(iris.data, iris.target)
                                # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
Expected:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         degree=..., gamma=..., kernel='rbf', max_iter=-1,
                         probability=False, random_state=None, shrinking=True,
                         tol=..., verbose=False),
           fit_params={}, iid=..., loss_func=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=..., score_func=...,
           scoring=..., verbose=...)
Got:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False),
           fit_exception_score=0.0, fit_exceptions_to_warnings=False,
           fit_params={}, iid=True, loss_func=None, n_jobs=1,
           param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
           pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
           verbose=0)

>>  raise self.failureException(self.format_failure(<StringIO.StringIO instance at 0x1eb8b48>.getvalue()))

jnothman · 2014-01-22T22:17:53Z

As I thought. It's because the existing doctest doesn't know about
fit_exception_score=0.0, fit_exceptions_to_warnings=False, which now appear in the output.

romaniukm · 2014-01-22T22:26:58Z

I noticed that too :-) I just need to know where these doctests are stored so that I can update them...

jnothman · 2014-01-22T22:33:28Z

File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/grid_search.py", line 529

romaniukm · 2014-01-22T22:48:52Z

Oh... so it's checking the file against its own documentation...
Well, I added the new parameters to the list but for some reason it still gives me an error. It appears that the arguments are in a different order. Do I have to sort them alphabetically in the docs even if they are not alphabetical in the actual code?

Failed example:
    clf.fit(iris.data, iris.target)
                                # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
Expected:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         degree=..., gamma=..., kernel='rbf', max_iter=-1,
                         probability=False, random_state=None, shrinking=True,
                         tol=..., verbose=False),
           fit_params={}, iid=..., loss_func=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=..., score_func=...,
           scoring=..., verbose=..., fit_exceptions_to_warnings=...,
           fit_exception_score=...)
Got:
    GridSearchCV(cv=None,
           estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False),
           fit_exception_score=0.0, fit_exceptions_to_warnings=False,
           fit_params={}, iid=True, loss_func=None, n_jobs=1,
           param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
           pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
           verbose=0)

jnothman · 2014-01-22T22:54:01Z

http://docs.python.org/2/library/doctest.html

It checks exactly the printed output, so the parameters need to be in the
right order. Yes, it's alphabetical (see
sklearn.base.BaseEstimator.__repr__).
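
A standard-library sketch of why the order matters even with +ELLIPSIS and +NORMALIZE_WHITESPACE. ToyEstimator is a hypothetical stand-in that sorts its parameter names in __repr__; it is not scikit-learn's BaseEstimator, and matches is a helper introduced only for this illustration.

```python
import doctest
import inspect


class ToyEstimator(object):
    """Hypothetical stand-in; not scikit-learn's BaseEstimator."""

    def __init__(self, verbose=0, cv=None, fit_exception_score=0.0):
        self.verbose = verbose
        self.cv = cv
        self.fit_exception_score = fit_exception_score

    def __repr__(self):
        # Sort the constructor's parameter names, ignoring declaration order.
        names = sorted(inspect.signature(self.__init__).parameters)
        args = ", ".join("%s=%r" % (n, getattr(self, n)) for n in names)
        return "%s(%s)" % (type(self).__name__, args)


def matches(want, got):
    """Compare expected and actual output using the flags from this thread."""
    flags = doctest.NORMALIZE_WHITESPACE | doctest.ELLIPSIS
    return doctest.OutputChecker().check_output(want, got, flags)


got = repr(ToyEstimator())
# Parameters listed alphabetically, values elided with "...": accepted.
in_order = matches(
    "ToyEstimator(cv=None, fit_exception_score=..., verbose=...)", got)
# Same parameters in declaration order: rejected, since "..." only
# replaces text and never reorders it.
out_of_order = matches(
    "ToyEstimator(cv=None, verbose=..., fit_exception_score=...)", got)
```

Here in_order is True and out_of_order is False, so new parameters have to be added to the expected output at their alphabetical position rather than appended at the end.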

romaniukm · 2014-01-23T21:30:35Z

Ok, I rearranged them alphabetically, but now I'm getting another strange error (sorry about bothering you so much with this!):

ERROR: Failure: ImportError (cannot import name DepthFirstTreeBuilder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/loader.py", line 413, in loadTestsFromName
    addr.filename, addr.module)
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/vol/medic02/users/mpr06/anaconda/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tests/test_grid_search.py", line 34, in <module>
    from sklearn.tree import DecisionTreeRegressor
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tree/__init__.py", line 6, in <module>
    from .tree import DecisionTreeClassifier
  File "/vol/medic02/users/mpr06/sklearn-dev/anaconda/github/scikit-learn/sklearn/tree/tree.py", line 31, in <module>
    from ._tree import DepthFirstTreeBuilder, BestFirstTreeBuilder
ImportError: cannot import name DepthFirstTreeBuilder

I already did git status to verify that I didn't change any of the tree.py code by accident...

jnothman · 2014-01-23T22:50:32Z

Run make inplace to recompile.

romaniukm · 2014-01-24T18:49:22Z

@jnothman thanks for the help! It seems to be working now. I created a new branch for this (because the old one was so outdated), so now I wonder if I should open a new pull request or somehow get this one to switch to a different branch (how would I do this?).

jnothman · 2014-01-25T11:12:54Z

I don't think you can switch a PR to a different branch, but you can reset this branch to point to the head of the new branch, something like:

$ git checkout pr_branch
$ git reset --hard new_branch 
$ git push origin pr_branch -f

GaelVaroquaux · 2014-01-25T15:23:09Z

As Joel said, we prefer rebase, as it gives cleaner histories. However,
if you find that the rebase is very hard, and you are too afraid of making
errors, merge is accepted.

romaniukm · 2014-01-25T15:35:19Z

Well, what I did was just start from scratch on a fresh checkout from master. So now it's technically an entirely new branch and that's why I'm thinking of opening a new pull request...

Enable grid_search with classifiers that fail on individual training folds. (commit 4f4dd30)

romaniukm closed this Nov 13, 2013

romaniukm deleted the gridsearch-failing-classifiers branch November 13, 2013 21:20

romaniukm restored the gridsearch-failing-classifiers branch November 13, 2013 21:20

romaniukm deleted the gridsearch-failing-classifiers branch November 13, 2013 21:21

romaniukm restored the gridsearch-failing-classifiers branch November 13, 2013 21:21

romaniukm reopened this Nov 13, 2013

jnothman reviewed Nov 30, 2013

romaniukm mentioned this pull request Jan 25, 2014

[MRG+2] Enable grid search with classifiers that may fail on individual fits. #2795

Closed

romaniukm closed this Jan 25, 2014
