8000 Inheritance based estimator tags by amueller · Pull Request #14644 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Inheritance based estimator tags #14644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from

Conversation

amueller
Copy link
Member
@amueller amueller commented Aug 13, 2019

This is an alternative to #14635.
Fixes #14044.
It uses the more straight-forward solution of fixing our inheritance order.

This is slightly weird to me because the mixins now call super even though they don't have a baseclass.

@NicolasHug
Copy link
Member
NicolasHug commented Aug 13, 2019

There's no chance reordering the Mixins can be backward-incompatible for libraries that inherit from our estimators, right??

I like this solution better than creating an artificial base class #14635

@amueller
Copy link
Member Author

This is probably cleaner for now.

@amueller
Copy link
Member Author

Also, this is redoing the tags mechanism, so in terms of tags it has some backward-compatibility issues. In terms of the rest it should be fine.

@@ -379,7 +379,7 @@ def _log_marginal_likelihood(self, n_samples, n_features, eigen_vals,
# ARD (Automatic Relevance Determination) regression


class ARDRegression(LinearModel, RegressorMixin):
class ARDRegression(RegressorMixin, LinearModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some how from the names, I'd think the other way is the correct order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Mixins go to the very left so the super call there doesn't go to object. Clearly as Gaël argues here: #14044 (comment) that's much easier to reason about than having an actual class hierarchy....

@@ -539,7 +539,7 @@ def orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=None, tol=None,
return np.squeeze(coef)


class OrthogonalMatchingPursuit(LinearModel, RegressorMixin, MultiOutputMixin):
class OrthogonalMatchingPursuit(MultiOutputMixin, RegressorMixin, LinearModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda same here, confused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally: the more specialized, the more to the left. And all the mixins are more specialized than all the estimators because that's the way mixins work. I argue in #14044 (comment) that I think this is silly and confusing.

@adrinjalali
Copy link
Member

NICE!

@amueller
Copy link
Member Author

I think this will be very helpful for extending the tags (which there is quite a bit of work on in other places). Anyone else interested? ;)

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now that parent classes are in the correct MRO order, what's preventing us from iterating over MRO, getting the dict from _more_tags and overwriting existing keys in the merged result (if any)?

That would avoid having to define _get_tags in each mixin (and calling super) -- it took me some time to get what's happening. So here we construct tags by recursively iterating over parent classes in MRO order. It would be less verbose and simpler to just use getmro as we did before, no?

Though I might be missing something, it has been a while since the last PR on this..

@amueller
Copy link
Member Author

I think it's very hacky to iterate over the MRO ourselves. Why not use the built-in python way, which is using super?
The only part that's more verbose is having to call super, which is a single line, right? But then there's no behind the scenes magic.

I'm not sure I understand your point about mixins. How would you add a tag in a mixin without adding a method?

@amueller
Copy link
Member Author

we could shorten it if we don't use update:

def _get_tags(self):
	return dict(list(super._get_tags().items()) + [('multioutput', True)]))

but that doesn't seem much more readable
or we define an append_dicts function?

@rth
Copy link
Member
rth commented Aug 21, 2019

I think it's very hacky to iterate over the MRO ourselves. Why not use the built-in python way, which is using super?

Given all the misconceptions around super() (https://fuhm.org/super-harmful/) I'm not sure that using it is better.

Also code like,

class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    def _get_tags(self):
        tags = super()._get_tags()
        tags.update({'multioutput': True})
        return tags

really makes no sense to me (or at least it makes one think about MRO more that what I would like). It tightly (and implicitly) locks the implementation of this class with the BaseEstimator class which I wouldn't consider a good design pattern. It's also not possible to programmatically get tags of that mixin, since MultiOutputMixin()._get_tags() raises 'super' object has no attribute '_get_tags' error.

I would very much prefer,

class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    def _more_tags(self):
        return  {'multioutput': True})

How would you add a tag in a mixin without adding a method?

or for that matter, to answer the above question,

class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    _more_tags = {'multioutput': True}

and then in BaseEstimator._get_tags assemble it all (with a distinction on whether _more_tags is a dict or a callable). I'm not sure if there were previously discussed issues with this approach.

@rth
Copy link
Member
rth commented Aug 21, 2019

so in terms of tags it has some backward-compatibility issues.

Indeed, for projects that did start using tags (e.g. imbalanced-learn) it would break CI. I know tags are private, but if we can avoiding changing them too often (unless there is a major reason for it) it might be best :) Also if they every want to support multiple scikit-learn versions (imbalanced-learn doesn't) they would be in an awkward situation to having to define both _more_tags and _get_tags..

It tightly (and implicitly) locks the implementation of this class with the BaseEstimator class which I wouldn't consider a good design pattern.

To expand on that, imagine some contrib project defines a new mixin with tags. With this PR it would do this with the super mechanism. So then the implementation of tags would be spread between scikit-learn and their contrib project, we can certainly change it but it would be breaking changes. While if all the logic is on the BaseEstimator (assuming _more_tags works as before) we can change this mechanism in some ways later, hopefully in a transparent way for depending projects.

@amueller
Copy link
Member Author

@rth I agree that the current implementation is not natural and the coupling between the Mixins and BaseEstimator is weird, but I would argue that's because we're using Mixins in the first place. If we would be using a class hierarchy that would be much cleaner, see #14635 and my comments at #14044 (comment)

I don't see any benefits of using mixins instead of having these inherit from BaseEstimator.

I agree that going from _more_tags to _get_tags is awkward, but then again this is private API. I'm not sure really how to deal with this. But the design of having two functions seems pretty awkward to me.

So then the implementation of tags would be spread between scikit-learn and their contrib project, we can certainly change it but it would be breaking changes

I don't get this argument. It's spread out in any case. The issue is that it's spread out and this PR changes the api and therefore breaks the coupling. But I wouldn't say it makes it more coupled than it was before.

I agree that there's misconceptions about super but I think this will still be much easier for a contributor or third-party author to wrap their head around than us manually walking the MRO.

@amueller
Copy link
Member Author

In particular, if we assemble the tags by some magic in BaseEstimator, there is no way for users to have any expectations about how it works. If we do it via inheritance, the mechanism is obvious (even though maybe hard to understand), and it's a standard python mechanism.

If we walk the MRO in a private function in BaseEstimator, that's very magic and the user has no way to figure out if we walk it forward or backward unless they read the code for the magic.

@NicolasHug
Copy link
Member

I overall agree with @amueller's position.

I would also argue that super's (somewhat confusing) behavior is something that a third-party package developer should know about. We definitely should, at the very least.

Walking the MRO in the BaseEstimator class is basically us manually re-implementing super's logic. I'm not sure how it can make things less confusing, since the behavior will still be super's behavior. It will rather obfuscate the logic into something non-standard. The benefit of super is that it's a Python 3 standard.

Regarding backward incompatibility: we made clear it's private. I understand the inconvenience for authors, but I feel like most libraries only support a handful of estimators, so the required changes are limited.

@rth
Copy link
Member
rth commented Aug 22, 2019

I see the appeal of using inheritance with super, but I as long as we are using mixins that would be difficult I think?

I would also argue that super's (somewhat confusing) behavior is something that a third-party package developer should know about. We definitely should, at the very least.

It's not just super behavior. It's that we are combining recursive function calls with multiple inheritance MRO resolution. I did a quick poll (but I suspect a fair amount of correct responses are from people following this thread ^^) so I'll mark it as inconclusive, I should have added "don't do that" option :).

Walking the MRO in the BaseEstimator class is basically us manually re-implementing super's logic. I'm not sure how it can make things less confusing, since the behavior will still be super's behavior. It will rather obfuscate the logic into something non-standard.

It's the other way around. inspect.getmro(cls) is exactly cls.__mro__ for new style classes, and super is defined as,

Super Binding

    If a is an instance of super, then the binding super(B, obj).m() searches
    obj.__class__.__mro__ for the base class A immediately preceding B and
    then invokes the descriptor with the call: 
    A.__dict__['m'].__get__(obj, obj.__class__).

so super is an application of cls.__mro__, and with recursive super calls we are manually (and inefficiently) reconstructing cls.__mro__.

In the end it's about whether we prefer to build tags recursively or with a flat loop. In either case we need to document how they are constructed because saying "we use super recursively so it's just standard python inheritance" will leave most users perplexed.

@rth
Copy link
Member
rth commented Aug 22, 2019

In particular, if we assemble the tags by some magic in BaseEstimator, there is no way for users to have any expectations about how it works.

I would say the opposite. If we compute tags in BaseEstimator, yes we need to document how they are computed but we fully control the mechanism in BaseEstimator._get_tags in 20 lines of code, both for scikit-learn and all contrib projects that may use it (unless they overwrite _get_tags of course). All children classes can do is to add or overwrite parent tags in MRO order if we say so.

If we use recursive super calls, each of the subclass have access to all tags. Yes, it probably adds/overwrites a few but we can't make any guarantees about it. It may also very well drop all of them (or make some other operations) due to an implementation mistake. So the only way to be sure of the tags you get, is to read through parent classes MRO, and check what each _get_tags is doing. We would also typically need more tests to ensure that _get_tags of various estimators are working as expected.

One could argue about tests for _more_tags as well, but that's why if we change the API I would be in favor of if being a plain dict (or a property in the rare cases when it it has to be dynamic) as fewer things need to be tested then.

@rth
Copy link
Member
rth commented Aug 26, 2019

Putting aside the discussion about tags API, I think it would be great to merge the mixin re-ordering changes.

…ixin_order

# Conflicts:
#	sklearn/preprocessing/label.py
@amueller
Copy link
Member Author
amueller commented Sep 4, 2019

see #14884 for a refactoring without all the niceties the refactoring allows :P

# Conflicts:
#	sklearn/base.py
#	sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py
#	sklearn/tests/test_base.py
@rth
Copy link
Member
rth commented Sep 5, 2019

@amueller thanks for resolving conflicts!

I think it's a horrible design to assemble stuff in the base class

I would like to note that this is the design that you proposed in the original estimator tags PR, so you probably didn't think that at the time :)

and I think super is a much cleaner design.

I agree that this is debatable. Personally I would rather prefer #14892 but if there is enough support for this PR (maybe @jnothman, since you followed the initial PR) I will not argue further about it :)

@rth rth changed the title Mixin order fix for tags Inheritance based estimator tags Sep 5, 2019
@rth
Copy link
Member
rth commented Sep 5, 2019

@amueller Changed the title to reflect more the contents of this PR since #14884 was merged. Please edit it if you think of something better.

@amueller
Copy link
Member Author
amueller commented Sep 5, 2019

I would like to note that this is the design that you proposed in the original estimator tags PR, so you probably didn't think that at the time :)

No, I definitely thought that at the time. But I was willing to use a terrible design to get the thing going. I assumed that rearranging the mixins would prolong the debate for too long and I wanted to move forward with the tags (that already had taken a couple of years).

@rth
Copy link
Member
rth commented Sep 9, 2019

To summarize the situation on this issue, Nicolas is +1 I think, I'm +0. We would need one (or ideally 2) more approvals to merge this since it's a bit controversial.

Anyone interested in tags and willing to review this? :)

@amueller
Copy link
Member Author
amueller commented Sep 9, 2019

yeah ideally I'd have some other parts of the technical committee look at it before it gets merged ;)

Copy link
Member
@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with it (although not a TC)

You're also adding the support nan tag for HGBT, which we forgot to set in its PR, which reminds me of thinking about ways to maintain the consistency of the tags and the actual implementation.

@@ -638,6 +638,11 @@ def n_iter_(self):
check_is_fitted(self)
return len(self._predictors)

def _get_tags(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't remove the _more_tags method above

@NicolasHug
Copy link
Member

about ways to maintain the consistency of the tags and the actual implementation

From my minimal experience (esp about the nan tag in HBDT), if you don't set the tags appropriately the common tests will fail: we have tests to make sure estimators support what they claim they support, and tests to make sure they don't support what they're not supposed to support.

@jnothman
Copy link
Member
jnothman commented Sep 10, 2019 via email

@rth
Copy link
Member
rth commented Sep 10, 2019

Weren't we considering another option which kept the _more_tags syntax?

Well there was #14884 that was closed due to the lack of positive feedback. A third alternative is to keep things as they are now in master.

@NicolasHug
Copy link
Member

A third alternative is to keep things as they are now in master.

I'm fine with that too honestly

@amueller
Copy link
Member Author

You're also adding the support nan tag for HGBT, which we forgot to set in its PR, which reminds me of thinking about ways to maintain the consistency of the tags and the actual implementation.

I'm pretty sure at least the NaN tag has to be consistent, right? Or is it only testing in one direction? Our test might not assert that it fails if you don't have the tag, but I thought it did.

# Conflicts:
#	sklearn/preprocessing/_function_transformer.py
@NicolasHug
Copy link
Member

The tag was set and yes the tests would fail in case of an inconsistency.

@amueller
Copy link
Member Author
amueller commented Sep 10, 2019

Weren't we considering another option which kept the _more_tags syntax?

The only solution I can come up with is the following:

def recurse(cls):
    def f(obj):
        s = super(cls, obj)
        if hasattr(s, 'recurse'):
            s.recurse()
        cls.more(obj)
    return f

class RecursionMeta(type):
    def __new__(cls, name, bases, dct):
        x = super().__new__(cls, name, bases, dct)
        x.recurse = recurse(x)
        return x

class A(metaclass=RecursionMeta):
    def more(self):
        print("more A")
        
class B(A):
    def more(self):
        print("more B")
    
class C(A):
    def more(self):
        print("more C")

class D(B, C):
    pass

d = D()
d.recurse()

To me, this is slightly more elegant than walking the MRO, but probably harder to understand. Also, it requires a common metaclass for everything that wants to add tags.

What this solution does is in effect add a method recurse (i.e. _get_tags)
to each class, so that the resulting hierarchy is something like this:

class A:
    def more(self):
        print("more A")
    
    def recurse(self):
        A.more(self)
        
class B(A):
    def more(self):
        print("more B")
    
    def recurse(self):
        super().recurse()
        B.more(self)
        
        
class C(A):
    def more(self):
        print("more C")
        
    def recurse(self):
        super().recurse()
        C.more(self)

class D(B, C):
    pass

[edit: fixed recurse to be a method]

@amueller
Copy link
Member Author

@NicolasHug you wanted a programming interview question, right? Can you come up with a solution that doesn't require metaclasses?

@NicolasHug
Copy link
Member

Stared at the white board for more than 10 minutes so I guess I'm not getting an offer.

@NicolasHug
Copy link
Member

regarding the metaclass solution, I'd rather keep the current implementation. It's not standard but it's at least easier to understand IMO.

@amueller
Copy link
Member Author

I'll close this as I think we can keep it as-is unless there's a strong reason to change it or we run into issues. It was fun doing the metaclass but I don't think we should actually use it.

@amueller amueller closed this Sep 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Estimator tag overwriting and update in _get_tags
5 participants
0