Problem with ducktyping and meta-estimators #1805
Comments
An alternative (which would break backward compat) is to split …
I don't think that distinction separates estimators with/without … And it does nothing for …
Yeah, I think we should stick with the ducktyping. Are there currently any failures?
I agree we should currently stick with ducktyping, but we need to use something like properties to ensure the wrapping estimator reflects the underlying estimator's capabilities, or else the ducktyping is broken. One can come up with examples that fail on the current release, such as calling …
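A minimal, self-contained sketch of the failure mode being described (the class names here are illustrative, not sklearn API): a meta-estimator that delegates via an ordinary method always passes a `hasattr` check, even when the wrapped estimator cannot actually provide the capability.

```python
class NaiveWrapper:
    """Hypothetical meta-estimator that delegates with a plain method."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict_proba(self, X):
        # Defined unconditionally on the class, so hasattr() is always
        # True -- the error only surfaces when the method is called.
        return self.estimator.predict_proba(X)


class NoProba:
    """Stand-in for an estimator without probability estimates."""

    def predict(self, X):
        return [0 for _ in X]


wrapped = NaiveWrapper(NoProba())
print(hasattr(wrapped, "predict_proba"))  # True, misleadingly
```

Any code that branches on `hasattr(wrapped, "predict_proba")` will take the wrong path and fail later, at call time, rather than being routed to `decision_function` or `predict` up front.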
I am OK with properties here: adaptation is a valid use case for … They should be well documented in the class docstring, though. Thanks for bugging us about these core problems!
Basically we need to fix … About the *CV objects: I don't like them anyway ;)
Their code is horrendous, but they are very useful. I agree that they need …
My main problem is that they don't work together with GridSearchCV, so they are useless in Pipelines. |
Yes, we have a problem with a notion of out-of-bag validation. With the …
Have you looked at what I wrote in #1626? Letting the Scorer handle the validation would be a possibility; still, GridSearchCV would need to know which groups of parameters it would have to pass on and how to interpret the returned scores.
You can't correctly throw … But I don't know if … I assume your complaints above are about things like …
Could you elaborate on the last comments? How would you change the parameter API to support changing parameters without refitting? |
One problem with the "no refitting required" idea might be that it needs to know the "worst-case parameter" beforehand, i.e. the setting that needs the most computation. This might not be known beforehand when doing random sampling. While doing random parameter sampling on a CV object alone makes no sense, in a pipeline I think it does.
@jnothman you are obviously right that when doing …
[The topic of changing parameters without refitting isn't really within scope here, and I keep coming up with refinements to each solution I think of. But let's say we didn't just want it to apply to … I think the cases you're talking about in the last comment, @amueller, present exactly the issue. Do we want to be able to use …
Agreed :) [second part]
This is a core problem that we need to solve, but we haven't discussed your fix enough to be able to merge it :-/
#3484 is more-or-less a duplicate |
Yay! Thanks to all involved in finally closing this! On 27 December 2014 at 23:59, Lars notifications@github.com wrote:
|
In a number of places, sklearn controls flow according to the existence of some method on an estimator. For example: `*SearchCV.score` checks for `score` on the estimator; `Scorer` and `multiclass` functions check for `decision_function`; and it is used for validation in `AdaBoostClassifier.fit`, `multiclass._check_estimator` and `Pipeline`; and for testing in `test_common`.

Meta-estimators such as `*SearchCV`, `Pipeline`, `RFECV`, etc. should respond to such `hasattr`s in agreement with their underlying estimators (or else the `hasattr` approach should be avoided).

This is possible by implementing such methods with a `property` that returns the correct method from the sub-estimator (or a closure around it), or raises `AttributeError` if the sub-estimator is found lacking (see #1801). `hasattr` would then function correctly. Caveats: the code would be less straightforward in some cases; `help()`/`pydoc` won't show the methods as methods (with an argument list, etc.), though the `property`'s docstring will show.
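The proposed `property` technique can be sketched as follows. This is a hedged illustration under made-up class names, not the code that was eventually merged into sklearn: the key point is that the attribute lookup on the sub-estimator happens inside the property getter, so a missing method raises `AttributeError` at access time and `hasattr` on the wrapper gives the right answer.

```python
class DelegatingWrapper:
    """Hypothetical meta-estimator delegating via a property."""

    def __init__(self, estimator):
        self.estimator = estimator

    @property
    def predict_proba(self):
        # Looking the method up on the sub-estimator raises
        # AttributeError here if it is missing, so hasattr() on the
        # wrapper now agrees with the wrapped estimator.
        inner = self.estimator.predict_proba

        def predict_proba(X):
            """Delegate probability prediction to the sub-estimator."""
            return inner(X)

        return predict_proba


class HasProba:
    """Stand-in for an estimator with probability estimates."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class NoProba:
    """Stand-in for an estimator without probability estimates."""

    def predict(self, X):
        return [0 for _ in X]


print(hasattr(DelegatingWrapper(HasProba()), "predict_proba"))  # True
print(hasattr(DelegatingWrapper(NoProba()), "predict_proba"))   # False
```

This also exhibits the caveat noted above: `help(DelegatingWrapper)` shows `predict_proba` as a property rather than a method with an argument list, though the property's docstring is still displayed.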