Inheritance based estimator tags #14644
Conversation
There's no chance reordering the mixins can be backward-incompatible for libraries that inherit from our estimators, right?? I like this solution better than creating an artificial base class #14635

This is probably cleaner for now.

Also, this is redoing the tags mechanism, so in terms of tags it has some backward-compatibility issues. In terms of the rest it should be fine.
sklearn/linear_model/bayes.py (outdated)

```diff
@@ -379,7 +379,7 @@ def _log_marginal_likelihood(self, n_samples, n_features, eigen_vals,
 # ARD (Automatic Relevance Determination) regression
-class ARDRegression(LinearModel, RegressorMixin):
+class ARDRegression(RegressorMixin, LinearModel):
```
Somehow, from the names, I'd think the other way is the correct order.
Why? Mixins go to the very left so the super call there doesn't go to `object`. Clearly, as Gaël argues here: #14044 (comment), that's much easier to reason about than having an actual class hierarchy....
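For illustration, a minimal sketch (with hypothetical class names, not scikit-learn's actual classes) of why the mixin has to be on the left:

```python
class Base:
    def _get_tags(self):
        return {'requires_fit': True}

class ExampleMixin:
    def _get_tags(self):
        # Because the mixin precedes Base in the MRO, super() here
        # resolves to Base rather than to object (which has no
        # _get_tags and would raise AttributeError).
        tags = super()._get_tags()
        tags.update({'multioutput': True})
        return tags

class Estimator(ExampleMixin, Base):
    pass

print(Estimator.__mro__)
# (Estimator, ExampleMixin, Base, object)
print(Estimator()._get_tags())
# {'requires_fit': True, 'multioutput': True}
```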
sklearn/linear_model/omp.py (outdated)

```diff
@@ -539,7 +539,7 @@ def orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=None, tol=None,
     return np.squeeze(coef)
-class OrthogonalMatchingPursuit(LinearModel, RegressorMixin, MultiOutputMixin):
+class OrthogonalMatchingPursuit(MultiOutputMixin, RegressorMixin, LinearModel):
```
kinda same here, confused.
Generally: the more specialized, the more to the left. And all the mixins are more specialized than all the estimators, because that's the way mixins work. I argue in #14044 (comment) that I think this is silly and confusing.
NICE!

I think this will be very helpful for extending the tags (on which there is quite a bit of work in other places). Anyone else interested? ;)
So now that parent classes are in the correct MRO order, what's preventing us from iterating over the MRO, getting the dict from `_more_tags`, and overwriting existing keys in the merged result (if any)? That would avoid having to define `_get_tags` in each mixin (and calling super) -- it took me some time to get what's happening. So here we construct tags by recursively calling up through the parent classes in MRO order. It would be less verbose and simpler to just use `getmro` as we did before, no?

Though I might be missing something, it has been a while since the last PR on this..
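For reference, a minimal sketch of the flat `getmro` approach being suggested, roughly the shape of what `_get_tags` looked like before this PR (details simplified):

```python
import inspect

class BaseEstimator:
    def _get_tags(self):
        collected_tags = {}
        # Walk the MRO from most generic to most specific, so more
        # specialized classes overwrite the tags of their bases.
        for base_class in reversed(inspect.getmro(self.__class__)):
            if hasattr(base_class, '_more_tags'):
                # Each class's _more_tags is called explicitly here; no
                # super() chain is involved. Inherited implementations
                # may be called more than once, which is redundant but
                # harmless since update() merges the same dict again.
                collected_tags.update(base_class._more_tags(self))
        return collected_tags
```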
I think it's very hacky to iterate over the MRO ourselves. Why not use the built-in Python way, which is using `super()`?

I'm not sure I understand your point about mixins. How would you add a tag in a mixin without adding a method?
we could shorten it if we don't use `update`:

```python
def _get_tags(self):
    return dict(list(super()._get_tags().items()) + [('multioutput', True)])
```

but that doesn't seem much more readable
Given all the misconceptions around `super`, I'm not convinced we should rely on it here. Also, code like

```python
class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    def _get_tags(self):
        tags = super()._get_tags()
        tags.update({'multioutput': True})
        return tags
```

really makes no sense to me (or at least it makes one think about the MRO more than I would like). It tightly (and implicitly) locks the implementation of this class to the one in `BaseEstimator`. I would very much prefer

```python
class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    def _more_tags(self):
        return {'multioutput': True}
```

or, for that matter, to answer the above question,

```python
class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    _more_tags = {'multioutput': True}
```

and then do the merging in `BaseEstimator`.
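A sketch of how that last, class-attribute variant could be collected, assuming a hypothetical `BaseEstimator._get_tags` that merges the dicts along the MRO (names and details made up, not what this PR implements):

```python
import inspect

class BaseEstimator:
    def _get_tags(self):
        collected_tags = {}
        # Most generic classes first, so subclasses override their bases.
        for klass in reversed(inspect.getmro(self.__class__)):
            # vars() only sees attributes defined on klass itself, not
            # inherited ones, so each dict is merged exactly once.
            more_tags = vars(klass).get('_more_tags')
            if isinstance(more_tags, dict):
                collected_tags.update(more_tags)
        return collected_tags

class MultiOutputMixin:
    _more_tags = {'multioutput': True}

class MyEstimator(MultiOutputMixin, BaseEstimator):
    _more_tags = {'allow_nan': True}

print(MyEstimator()._get_tags())
# {'multioutput': True, 'allow_nan': True}
```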
Indeed, for projects that did start using tags (e.g. imbalanced-learn) it would break CI. I know tags are private, but if we can avoid changing them too often (unless there is a major reason for it) it might be best :) Also, if they ever want to support multiple scikit-learn versions (imbalanced-learn doesn't), they would be in the awkward situation of having to define both `_more_tags` and `_get_tags`.
To expand on that, imagine some contrib project defines a new mixin with tags. With this PR it would do this with the super mechanism. The implementation of tags would then be spread between scikit-learn and their contrib project; we can certainly change it, but those would be breaking changes. Whereas if all the logic lives on the `BaseEstimator` side, the mechanism stays under scikit-learn's control.
@rth I agree that the current implementation is not natural and the coupling between the mixins and `BaseEstimator` is weird, but I would argue that's because we're using mixins in the first place. If we were using a class hierarchy, it would be much cleaner; see #14635 and my comments at #14044 (comment). I don't see any benefits of using mixins instead of having these inherit from `BaseEstimator`. I agree that going from `_more_tags` to `_get_tags` is a breaking change for projects that already use tags.
I don't get this argument. It's spread out in any case. The issue is that it's spread out and this PR changes the API and therefore breaks the coupling. But I wouldn't say it makes it more coupled than it was before. I agree that there are misconceptions about `super`.
In particular, if we assemble the tags by some magic in `BaseEstimator`, everyone who wants to add tags needs to learn that custom mechanism. If we walk the MRO in a private function in `BaseEstimator`, we are reimplementing by hand what `super()` already does for us.
I overall agree with @amueller's position. I would also argue that using `super()` is the standard Python mechanism here. Walking the MRO in the base class is the part that actually looks like magic to me.

Regarding backward incompatibility: we made clear it's private. I understand the inconvenience for authors, but I feel like most libraries only support a handful of estimators, so the required changes are limited.
I see the appeal of using inheritance with super, but as long as we are using mixins, that would be difficult, I think?
It's not just super behavior. It's that we are combining recursive function calls with multiple-inheritance MRO resolution. I did a quick poll (but I suspect a fair amount of the correct responses are from people following this thread ^^), so I'll mark it as inconclusive; I should have added a "don't do that" option :).
It's the other way around.
So in the end it's about whether we prefer to build tags recursively or with a flat loop. In either case we need to document how they are constructed, because saying "we use super recursively so it's just standard Python inheritance" will leave most users perplexed.
I would say the opposite. If we compute tags in BaseEstimator, yes, we need to document how they are computed, but we fully control the mechanism in one place.

If we use recursive super calls, each of the subclasses has access to all tags. Yes, it probably adds/overwrites a few, but we can't make any guarantees about it. It may also very well drop all of them (or perform some other operations) due to an implementation mistake. So the only way to be sure of the tags you get is to read through the parent classes' MRO and check what each `_get_tags` does. One could argue that tests on `_get_tags` would catch such mistakes.
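To make that failure mode concrete, a small sketch (hypothetical classes) of an override that silently drops the inherited tags:

```python
class Base:
    def _get_tags(self):
        return {'multioutput': True, 'allow_nan': True}

class BuggyEstimator(Base):
    def _get_tags(self):
        # Implementation mistake: super()._get_tags() is never called,
        # so every tag defined by the parent classes is silently lost.
        return {'requires_y': True}

print(BuggyEstimator()._get_tags())
# {'requires_y': True}  <- the inherited tags are gone
```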
Putting aside the discussion about the tags API, I think it would be great to merge the mixin re-ordering changes.
…ixin_order (merge commit; conflicts: sklearn/preprocessing/label.py)
see #14884 for a refactoring without all the niceties this refactoring allows :P
(merge commit; conflicts: sklearn/base.py, sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py, sklearn/tests/test_base.py)
@amueller thanks for resolving conflicts!
I would like to note that this is the design that you proposed in the original estimator tags PR, so you probably didn't think that at the time :)
I agree that this is debatable. Personally I would prefer #14892, but if there is enough support for this PR (maybe @jnothman, since you followed the initial PR) I will not argue further about it :)
No, I definitely thought that at the time. But I was willing to use a terrible design to get the thing going. I assumed that rearranging the mixins would prolong the debate for too long, and I wanted to move forward with the tags (which had already taken a couple of years).
To summarize the situation on this issue: Nicolas is +1 I think, I'm +0. We would need one (or ideally two) more approvals to merge this since it's a bit controversial. Anyone interested in tags and willing to review this? :)
Yeah, ideally I'd have some other members of the technical committee look at it before it gets merged ;)
I'm happy with it (although I'm not a TC member).

You're also adding the NaN-support tag for HGBT, which we forgot to set in its PR. This reminds me that we should think about ways to keep the tags consistent with the actual implementation.
```diff
@@ -638,6 +638,11 @@ def n_iter_(self):
         check_is_fitted(self)
         return len(self._predictors)

+    def _get_tags(self):
```
You didn't remove the `_more_tags` method above.
From my minimal experience (especially with the NaN tag in HGBT), if you don't set the tags appropriately the common tests will fail: we have tests to make sure estimators support what they claim to support, and tests to make sure they don't support what they're not supposed to support.
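As an illustration, a hypothetical sketch of such a two-way check (`allow_nan` is the real tag name; the helper itself is made up and the actual common tests are more involved):

```python
import numpy as np

def check_allow_nan_consistency(estimator):
    """Estimators must accept NaN input iff their tags claim support."""
    X = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])
    y = np.array([0, 1, 0])
    if estimator._get_tags().get('allow_nan', False):
        estimator.fit(X, y)  # must not raise on NaN input
    else:
        try:
            estimator.fit(X, y)
        except ValueError:
            pass  # expected: input validation rejects NaN
        else:
            raise AssertionError("fit() accepted NaN but allow_nan is not set")
```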
Weren't we considering another option which kept the `_more_tags` syntax?
Well, there was #14884, which was closed due to the lack of positive feedback. A third alternative is to keep things as they are now in master.
I'm fine with that too, honestly.
I'm pretty sure at least the NaN tag has to be consistent, right? Or is it only tested in one direction? Our test might not assert that it fails if you don't have the tag, but I thought it did.
(merge commit; conflicts: sklearn/preprocessing/_function_transformer.py)
The tag was set, and yes, the tests would fail in case of an inconsistency.
The only solution I can come up with is the following:

```python
def recurse(cls):
    def f(obj):
        s = super(cls, obj)
        if hasattr(s, 'recurse'):
            s.recurse()
        cls.more(obj)
    return f

class RecursionMeta(type):
    def __new__(cls, name, bases, dct):
        x = super().__new__(cls, name, bases, dct)
        x.recurse = recurse(x)
        return x

class A(metaclass=RecursionMeta):
    def more(self):
        print("more A")

class B(A):
    def more(self):
        print("more B")

class C(A):
    def more(self):
        print("more C")

class D(B, C):
    pass

d = D()
d.recurse()
```

To me, this is slightly more elegant than walking the MRO, but probably harder to understand. Also, it requires a common metaclass for everything that wants to add tags. What this solution does is, in effect, add a `recurse` method to every class, as if you had written:

```python
class A:
    def more(self):
        print("more A")
    def recurse(self):
        A.more(self)

class B(A):
    def more(self):
        print("more B")
    def recurse(self):
        super().recurse()
        B.more(self)

class C(A):
    def more(self):
        print("more C")
    def recurse(self):
        super().recurse()
        C.more(self)

class D(B, C):
    pass
```

[edit: fixed recurse to be a method]
@NicolasHug you wanted a programming interview question, right? Can you come up with a solution that doesn't require metaclasses?
Stared at the whiteboard for more than 10 minutes, so I guess I'm not getting an offer.
Regarding the metaclass solution, I'd rather keep the current implementation. It's not standard, but it's at least easier to understand IMO.
I'll close this as I think we can keep it as-is unless there's a strong reason to change it or we run into issues. It was fun doing the metaclass, but I don't think we should actually use it.
This is an alternative to #14635. Fixes #14044.

It uses the more straightforward solution of fixing our inheritance order. This is slightly weird to me because the mixins now call super even though they don't have a base class.
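A minimal sketch of that last point (hypothetical classes): the mixin's only explicit base is `object`, yet `super()` inside it works because the call follows the MRO of the instantiated class, not the mixin's own bases:

```python
class TagsMixin:  # no explicit base class
    def _get_tags(self):
        # On TagsMixin alone this would raise AttributeError, since
        # object has no _get_tags; it only works once the mixin is
        # combined with a class that provides one.
        tags = super()._get_tags()
        tags.update({'from_mixin': True})
        return tags

class Base:
    def _get_tags(self):
        return {'from_base': True}

class Estimator(TagsMixin, Base):
    pass

print(Estimator()._get_tags())
# {'from_base': True, 'from_mixin': True}
```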