MRG: Functionality to change scoring method of searchlight #3502
Conversation
```python
assert_array_equal(score.shape, [n_time])
assert_true(score.dtype == float)
sl = SearchLight(LogisticRegression())
assert_array_equal(cross_val_score(sl, X, y, scoring='roc_auc'), score)
```
`cross_val_score` fails with an error.
File "/home/kaichogami/codes/dev_mne/scikit-learn/sklearn/utils/validation.py", line 561, in column_or_1d
raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (17, 10)
I think the current design won't work, notably because we need to change the type of prediction as a function of the scoring: e.g. if you pass roc_auc_score, the predictions should be continuous (e.g. probabilities or decision-function values). Additionally, could you add tests to make sure that this is working:
And once it passes, update the GeneralizationLight too.
```python
@@ -338,7 +358,10 @@ def _sl_score(estimators, X, y):
    """
    n_iter = X.shape[-1]
    for ii, est in enumerate(estimators):
        _score = est.score(X[..., ii], y)
        if scoring is not None:
            _score = scoring(est, X[..., ii], y)
```
@kingjr I made it with the signature `score_func(estimator, y, y_pred)` by using `make_scorer`, so as to take care of the prediction type, which `make_scorer` handles internally. However, `cross_val_score` still doesn't pass. Let me know if you have any idea about it.
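For context, here is a minimal runnable sketch of what `make_scorer` produces (the dataset and estimator are illustrative, not from the PR): it wraps a metric `f(y_true, y_pred)` into a scorer with signature `scorer(estimator, X, y)` that generates the predictions itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer

X, y = make_classification(n_samples=60, random_state=0)
clf = LogisticRegression().fit(X, y)

# make_scorer turns the metric f1_score(y_true, y_pred) into a callable
# with signature scorer(estimator, X, y); the scorer calls clf.predict
# internally to obtain y_pred.
scorer = make_scorer(f1_score)
print(scorer(clf, X, y))
```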
I'm afraid I don't know. Do you manage to make `cross_val_score(base_estimator, scoring=scoring)` pass?
Your y is 2D and roc_auc requires y to be 1D or a single column. You need to loop over the columns of y.
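A sketch of what looping over the columns could look like; `multi_target_auc` is a hypothetical helper (assuming each column of `y` is a binary target with continuous scores):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def multi_target_auc(y_true, y_score):
    """Hypothetical helper: one ROC AUC per column of a 2D y."""
    return np.array([roc_auc_score(y_true[:, ii], y_score[:, ii])
                     for ii in range(y_true.shape[1])])

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=(100, 3))   # 3 binary targets
y_score = rng.rand(100, 3)                  # continuous scores per target
print(multi_target_auc(y_true, y_score))    # 3 AUCs, one per column
```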
@agramfort this is a single/multi-class scoring problem. Isn't there a way to handle this with sklearn scorers?
Regarding `_score = scoring(est, X[..., ii], y)`: shouldn't this fail as well when the `scoring` parameter is changed to `roc_auc_score` in the test?
Ignore my last comment. Going through the code, I think the error occurs because X is not 2D before being passed to `decision_function`. I can see two solutions:
- Make the input a 2D matrix, or at least convert X to 2D before passing it to `decision_function`
- Follow the other API @kingjr mentioned: `cross_val_score(make_pipeline(SearchLight(Ridge(), scoring=my_scorer())), X, y)`

I hope I make sense.
A sklearn scorer returns a float per fold. Maybe look at multi_output scorers.
@kingjr I am not sure what should be done now.
@kaichogami make CIs happy?
@agramfort yes, however I don't know how to solve the issue with
You cannot obtain more than one value per fold with a sklearn scorer. If you have multiple targets you need a custom scorer that returns more than a float, so `cross_val_score(clf, X, y, scoring='roc_auc')` simply cannot work as is.
@agramfort I think there is a confusion between multiple issues:
Does using the check_scoring function from sklearn help? See also the implementation of the make_scorer function. A sklearn scorer made from a string knows if it needs proba, decision function, or just predict.
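A small illustration of that behavior (the data and estimator are made up): a scorer built from a string resolves the prediction method it needs by itself, so no manual dispatch is required.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = make_classification(n_samples=80, random_state=0)
clf = LogisticRegression().fit(X, y)

# 'roc_auc' knows it needs continuous outputs (decision_function or
# predict_proba), while 'accuracy' knows it only needs predict.
print(get_scorer('roc_auc')(clf, X, y))
print(get_scorer('accuracy')(clf, X, y))
```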
@kaichogami how are we doing over here?
@kingjr I haven't touched it after my last comment. I'll look over it again and let you know if I can solve it, in a few days.
@kingjr as Alex mentioned we need to design a custom scorer to make
+1 for the second option. On 15 Sept 2016 at 04:19, "Asish Panda" notifications@github.com wrote:
Apologies @kingjr, the above design wouldn't work either. We are now returning one value per fold; however, we still have an array of scores, one float per estimator, and this line does not allow the score to be anything other than a float, which makes sense. Error output:
The only solution I can think of is to take the average of the scores, which could be a parameter in the constructor. This should be made true while using with
I think we have to dissociate three problems:
I recommend solving 1 and leaving 2 and 3 for now.
Force-pushed from 636d666 to e19ed6a
@kingjr I have removed the
Not sure what's going on with Travis. Can you repush with a forced set-upstream?
Force-pushed from e19ed6a to 20a3267
Pushing again didn't help. Perhaps @Eric89GXL has some idea?
```python
@@ -4,6 +4,8 @@

import numpy as np

from sklearn.metrics import make_scorer
```
The Travis error says:
"Found un-nested sklearn import"
In other words, this needs to be nested.
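For illustration, a nested import looks like this; `_get_roc_scorer` is a hypothetical function name, not part of the PR:

```python
def _get_roc_scorer():
    # nested import: sklearn is only loaded when this function is called,
    # so importing the enclosing module does not require sklearn at all
    from sklearn.metrics import make_scorer, roc_auc_score
    return make_scorer(roc_auc_score)

scorer = _get_roc_scorer()
print(scorer)
```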
@Eric89GXL Thanks for your help!
Force-pushed from b948831 to 029c716
Current coverage is 87.35% (diff: 96.55%)
I added the import in the relevant methods rather than using global imports, which solved the nested import issue.
```python
@@ -19,6 +19,9 @@ class SearchLight(BaseEstimator, TransformerMixin):
    ----------
    base_estimator : object
        The base estimator to iteratively fit on a subset of the dataset.
    scoring : callable, defaults to None
```
Can you make it accept strings as well, similar to sklearn's cross_val_score?
```python
if not (scoring is None or hasattr(scoring, '__call__')):
    raise ValueError("scoring must be None or a callable type")
```
remove extra line
```python
# generated by estimator.score(). Else, we must first get the
# predictions based on the scorer.
self.scoring = (make_scorer(self.scoring) if self.scoring is not None
                else self.scoring)
```
Object parameters should not be changed. You could store the scorer in `self.scorer_`, but I think we need not store the scorer. WDYT @agramfort?
I am now doing all scorer checks in `__init__`, which also handles strings.
```python
self.n_jobs = n_jobs

if not isinstance(self.n_jobs, int):
    raise ValueError('n_jobs must be int, got %s' % n_jobs)
```
Why did you add the init? GeneralizationLight inherits from SearchLight and should have the same init params, no?
It would otherwise result in an error in the docstring for the scorer parameter, as in GeneralizationLight we have not modified it to change the scorer.
You shouldn't change this init, but rather add a scoring parameter to GeneralizationLight.
```python
@@ -55,6 +57,13 @@ def test_searchlight():
    assert_true(np.sum(np.abs(score)) != 0)
    assert_true(score.dtype == float)

    # change score method
```
Add a test where `scoring='foo'`; check that `sl.scoring == default`.
```python
# change score method
sl = SearchLight(LogisticRegression(), scoring=roc_auc_score)
sl.fit(X, y)
score = sl.score(X, y)
```
Add an explicit test that checks the score:
`assert_equal(score[0], roc_auc_score(y, LogisticRegression().fit(X, y).predict_proba(X)[:, 1]))`
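A runnable version of the expected value in that check, with an illustrative dataset (assuming binary `y`; only the reference score is computed here, not the SearchLight side):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=60, random_state=0)

# the AUC of the fitted base estimator's positive-class probabilities:
# this is the value score[0] should reproduce for the first time point
proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
expected = roc_auc_score(y, proba)
print(expected)
```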
```python
@@ -30,10 +33,24 @@ def __repr__(self):
        repr_str += ', fitted with %i estimators' % len(self.estimators_)
        return repr_str + '>'

-    def __init__(self, base_estimator, n_jobs=1):
+    def __init__(self, base_estimator, scoring=None, n_jobs=1):
+        from sklearn.metrics import make_scorer, get_scorer
```
@kingjr Is it alright to add imports in `__init__` of a class?
No, see below.
@kingjr please have a look at the changes.
```python
elif self.scoring is not None:
    self.scoring = get_scorer(self.scoring)
```
Don't change the param at init, this will break the cloning behavior. get_scorer and make_scorer should be called at .score(), not init.
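A minimal sketch of that convention (the class and method bodies are hypothetical simplifications, not the PR's actual code): `__init__` stores parameters untouched so `clone()` round-trips them, and the scorer is resolved lazily in `score()`.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

class SearchLightSketch(BaseEstimator):
    """Hypothetical sketch: store params as-is, resolve scorer in score()."""

    def __init__(self, base_estimator, scoring=None):
        self.base_estimator = base_estimator
        self.scoring = scoring  # stored untouched so clone() works

    def fit(self, X, y):
        self.estimator_ = clone(self.base_estimator).fit(X, y)
        return self

    def score(self, X, y):
        # resolve string -> scorer here, without mutating self.scoring
        if isinstance(self.scoring, str):
            scorer = get_scorer(self.scoring)
        elif self.scoring is None:
            return self.estimator_.score(X, y)
        else:
            scorer = self.scoring
        return scorer(self.estimator_, X, y)

X, y = make_classification(n_samples=50, random_state=0)
sl = SearchLightSketch(LogisticRegression(), scoring='accuracy')
print(sl.fit(X, y).score(X, y))
# cloning round-trips because __init__ never rewrote the parameter
assert clone(sl).scoring == 'accuracy'
```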
Force-pushed from 24b48f0 to 0bf0626
Thanks @kaichogami, only some docstring fixes, and I'll merge if @agramfort is ok.
Note that we will temporarily privatize this enhancement next week to ensure that SearchLight isn't in the next release.
```python
@@ -328,6 +345,10 @@ def _sl_score(estimators, X, y):
    X : array, shape (n_samples, nd_features, n_estimators)
        The target data. The feature dimension can be multidimensional e.g.
        X.shape = (n_samples, n_features_1, n_features_2, n_estimators)
    scoring : callable or None
```
or string?
```python
@@ -375,11 +399,13 @@ class GeneralizationLight(SearchLight):
    ----------
    base_estimator : object
        The base estimator to iteratively fit on a subset of the dataset.
    scoring : callable, string, defaults to None
```
scoring : callable | string | None
Do you want to give it a look @agramfort?
LGTM. @kingjr merge if you're happy; fix the docstrings in master if needed, or better in your next "privatization" PR.
I'll fix the docstring in the "privatizing" PR ;)
Thanks @kaichogami!
@kingjr You are welcome! What is the "privatizing" PR that is being talked about?
We're making a release next week, but some of the recent decoding contribs
On 25 Sept 2016 at 9:59 PM, "Asish Panda" notifications@github.com wrote:
Enhancement as discussed in #3475. Aim to support