Inheritance based estimator tags #14644

amueller · 2019-08-13T17:33:07Z

This is an alternative to #14635.
Fixes #14044.
It uses the more straight-forward solution of fixing our inheritance order.

This is slightly weird to me because the mixins now call super even though they don't have a baseclass.

NicolasHug · 2019-08-13T18:58:50Z

There's no chance reordering the Mixins can be backward-incompatible for libraries that inherit from our estimators, right??

I like this solution better than creating an artificial base class #14635

amueller · 2019-08-13T19:01:45Z

This is probably cleaner for now.

amueller · 2019-08-13T20:02:27Z

Also, this is redoing the tags mechanism, so in terms of tags it has some backward-compatibility issues. In terms of the rest it should be fine.

adrinjalali · 2019-08-14T13:27:56Z

sklearn/linear_model/bayes.py

@@ -379,7 +379,7 @@ def _log_marginal_likelihood(self, n_samples, n_features, eigen_vals,
 # ARD (Automatic Relevance Determination) regression


-class ARDRegression(LinearModel, RegressorMixin):
+class ARDRegression(RegressorMixin, LinearModel):


Some how from the names, I'd think the other way is the correct order.

Why? Mixins go to the very left so the super call there doesn't go to object. Clearly as Gaël argues here: #14044 (comment) that's much easier to reason about than having an actual class hierarchy....

adrinjalali · 2019-08-14T13:28:39Z

sklearn/linear_model/omp.py

@@ -539,7 +539,7 @@ def orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=None, tol=None,
        return np.squeeze(coef)


-class OrthogonalMatchingPursuit(LinearModel, RegressorMixin, MultiOutputMixin):
+class OrthogonalMatchingPursuit(MultiOutputMixin, RegressorMixin, LinearModel):


kinda same here, confused.

Generally: the more specialized, the more to the left. And all the mixins are more specialized than all the estimators because that's the way mixins work. I argue in #14044 (comment) that I think this is silly and confusing.

adrinjalali · 2019-08-14T13:29:31Z

NICE!

amueller · 2019-08-21T19:09:31Z

I think this will be very helpful for extending the tags (which there is quite a bit of work on in other places). Anyone else interested? ;)

rth

So now that parent classes are in the correct MRO order, what's preventing us from iterating over MRO, getting the dict from _more_tags and overwriting existing keys in the merged result (if any)?

That would avoid having to define _get_tags in each mixin (and calling super) -- it took me some time to get what's happening. So here we construct tags by recursively iterating over parent classes in MRO order. It would be less verbose and simpler to just use getmro as we did before, no?

Though I might be missing something, it has been a while since the last PR on this..

amueller · 2019-08-21T20:39:52Z

I think it's very hacky to iterate over the MRO ourselves. Why not use the built-in python way, which is using super?
The only part that's more verbose is having to call super, which is a single line, right? But then there's no behind the scenes magic.

I'm not sure I understand your point about mixins. How would you add a tag in a mixin without adding a method?

amueller · 2019-08-21T20:46:09Z

we could shorten it if we don't use update:

def _get_tags(self):
	return dict(list(super._get_tags().items()) + [('multioutput', True)]))

but that doesn't seem much more readable
or we define an append_dicts function?

rth · 2019-08-21T20:59:43Z

I think it's very hacky to iterate over the MRO ourselves. Why not use the built-in python way, which is using super?

Given all the misconceptions around super() (https://fuhm.org/super-harmful/) I'm not sure that using it is better.

Also code like,

class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    def _get_tags(self):
        tags = super()._get_tags()
        tags.update({'multioutput': True})
        return tags

really makes no sense to me (or at least it makes one think about MRO more that what I would like). It tightly (and implicitly) locks the implementation of this class with the BaseEstimator class which I wouldn't consider a good design pattern. It's also not possible to programmatically get tags of that mixin, since MultiOutputMixin()._get_tags() raises 'super' object has no attribute '_get_tags' error.

I would very much prefer,

class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    def _more_tags(self):
        return  {'multioutput': True})

How would you add a tag in a mixin without adding a method?

or for that matter, to answer the above question,

class MultiOutputMixin:
    """Mixin to mark estimators that support multioutput."""
    _more_tags = {'multioutput': True}

and then in BaseEstimator._get_tags assemble it all (with a distinction on whether _more_tags is a dict or a callable). I'm not sure if there were previously discussed issues with this approach.

rth · 2019-08-21T21:16:33Z

so in terms of tags it has some backward-compatibility issues.

Indeed, for projects that did start using tags (e.g. imbalanced-learn) it would break CI. I know tags are private, but if we can avoiding changing them too often (unless there is a major reason for it) it might be best :) Also if they every want to support multiple scikit-learn versions (imbalanced-learn doesn't) they would be in an awkward situation to having to define both _more_tags and _get_tags..

It tightly (and implicitly) locks the implementation of this class with the BaseEstimator class which I wouldn't consider a good design pattern.

To expand on that, imagine some contrib project defines a new mixin with tags. With this PR it would do this with the super mechanism. So then the implementation of tags would be spread between scikit-learn and their contrib project, we can certainly change it but it would be breaking changes. While if all the logic is on the BaseEstimator (assuming _more_tags works as before) we can change this mechanism in some ways later, hopefully in a transparent way for depending projects.

amueller · 2019-08-21T23:40:26Z

@rth I agree that the current implementation is not natural and the coupling between the Mixins and BaseEstimator is weird, but I would argue that's because we're using Mixins in the first place. If we would be using a class hierarchy that would be much cleaner, see #14635 and my comments at #14044 (comment)

I don't see any benefits of using mixins instead of having these inherit from BaseEstimator.

I agree that going from _more_tags to _get_tags is awkward, but then again this is private API. I'm not sure really how to deal with this. But the design of having two functions seems pretty awkward to me.

So then the implementation of tags would be spread between scikit-learn and their contrib project, we can certainly change it but it would be breaking changes

I don't get this argument. It's spread out in any case. The issue is that it's spread out and this PR changes the api and therefore breaks the coupling. But I wouldn't say it makes it more coupled than it was before.

I agree that there's misconceptions about super but I think this will still be much easier for a contributor or third-party author to wrap their head around than us manually walking the MRO.

amueller · 2019-08-21T23:44:52Z

In particular, if we assemble the tags by some magic in BaseEstimator, there is no way for users to have any expectations about how it works. If we do it via inheritance, the mechanism is obvious (even though maybe hard to understand), and it's a standard python mechanism.

If we walk the MRO in a private function in BaseEstimator, that's very magic and the user has no way to figure out if we walk it forward or backward unless they read the code for the magic.

NicolasHug · 2019-08-22T20:50:59Z

I overall agree with @amueller's position.

I would also argue that super's (somewhat confusing) behavior is something that a third-party package developer should know about. We definitely should, at the very least.

Walking the MRO in the BaseEstimator class is basically us manually re-implementing super's logic. I'm not sure how it can make things less confusing, since the behavior will still be super's behavior. It will rather obfuscate the logic into something non-standard. The benefit of super is that it's a Python 3 standard.

Regarding backward incompatibility: we made clear it's private. I understand the inconvenience for authors, but I feel like most libraries only support a handful of estimators, so the required changes are limited.

rth · 2019-08-22T21:36:26Z

I see the appeal of using inheritance with super, but I as long as we are using mixins that would be difficult I think?

I would also argue that super's (somewhat confusing) behavior is something that a third-party package developer should know about. We definitely should, at the very least.

It's not just super behavior. It's that we are combining recursive function calls with multiple inheritance MRO resolution. I did a quick poll (but I suspect a fair amount of correct responses are from people following this thread ^^) so I'll mark it as inconclusive, I should have added "don't do that" option :).

Walking the MRO in the BaseEstimator class is basically us manually re-implementing super's logic. I'm not sure how it can make things less confusing, since the behavior will still be super's behavior. It will rather obfuscate the logic into something non-standard.

It's the other way around. inspect.getmro(cls) is exactly cls.__mro__ for new style classes, and super is defined as,

Super Binding

    If a is an instance of super, then the binding super(B, obj).m() searches
    obj.__class__.__mro__ for the base class A immediately preceding B and
    then invokes the descriptor with the call: 
    A.__dict__['m'].__get__(obj, obj.__class__).

so super is an application of cls.__mro__, and with recursive super calls we are manually (and inefficiently) reconstructing cls.__mro__.

In the end it's about whether we prefer to build tags recursively or with a flat loop. In either case we need to document how they are constructed because saying "we use super recursively so it's just standard python inheritance" will leave most users perplexed.

rth · 2019-08-22T22:26:52Z

In particular, if we assemble the tags by some magic in BaseEstimator, there is no way for users to have any expectations about how it works.

I would say the opposite. If we compute tags in BaseEstimator, yes we need to document how they are computed but we fully control the mechanism in BaseEstimator._get_tags in 20 lines of code, both for scikit-learn and all contrib projects that may use it (unless they overwrite _get_tags of course). All children classes can do is to add or overwrite parent tags in MRO order if we say so.

If we use recursive super calls, each of the subclass have access to all tags. Yes, it probably adds/overwrites a few but we can't make any guarantees about it. It may also very well drop all of them (or make some other operations) due to an implementation mistake. So the only way to be sure of the tags you get, is to read through parent classes MRO, and check what each _get_tags is doing. We would also typically need more tests to ensure that _get_tags of various estimators are working as expected.

One could argue about tests for _more_tags as well, but that's why if we change the API I would be in favor of if being a plain dict (or a property in the rare cases when it it has to be dynamic) as fewer things need to be tested then.

rth · 2019-08-26T13:34:06Z

Putting aside the discussion about tags API, I think it would be great to merge the mixin re-ordering changes.

…ixin_order # Conflicts: # sklearn/preprocessing/label.py

amueller · 2019-09-04T18:39:28Z

see #14884 for a refactoring without all the niceties the refactoring allows :P

# Conflicts: # sklearn/base.py # sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py # sklearn/tests/test_base.py

rth · 2019-09-05T16:25:59Z

@amueller thanks for resolving conflicts!

I think it's a horrible design to assemble stuff in the base class

I would like to note that this is the design that you proposed in the original estimator tags PR, so you probably didn't think that at the time :)

and I think super is a much cleaner design.

I agree that this is debatable. Personally I would rather prefer #14892 but if there is enough support for this PR (maybe @jnothman, since you followed the initial PR) I will not argue further about it :)

rth · 2019-09-05T16:28:06Z

@amueller Changed the title to reflect more the contents of this PR since #14884 was merged. Please edit it if you think of something better.

amueller · 2019-09-05T18:57:08Z

I would like to note that this is the design that you proposed in the original estimator tags PR, so you probably didn't think that at the time :)

No, I definitely thought that at the time. But I was willing to use a terrible design to get the thing going. I assumed that rearranging the mixins would prolong the debate for too long and I wanted to move forward with the tags (that already had taken a couple of years).

rth · 2019-09-09T16:30:12Z

To summarize the situation on this issue, Nicolas is +1 I think, I'm +0. We would need one (or ideally 2) more approvals to merge this since it's a bit controversial.

Anyone interested in tags and willing to review this? :)

amueller · 2019-09-09T17:22:30Z

yeah ideally I'd have some other parts of the technical committee look at it before it gets merged ;)

adrinjalali

I'm happy with it (although not a TC)

You're also adding the support nan tag for HGBT, which we forgot to set in its PR, which reminds me of thinking about ways to maintain the consistency of the tags and the actual implementation.

NicolasHug · 2019-09-10T10:59:11Z

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

@@ -638,6 +638,11 @@ def n_iter_(self):
        check_is_fitted(self)
        return len(self._predictors)

+    def _get_tags(self):


You didn't remove the _more_tags method above

NicolasHug · 2019-09-10T11:06:55Z

about ways to maintain the consistency of the tags and the actual implementation

From my minimal experience (esp about the nan tag in HBDT), if you don't set the tags appropriately the common tests will fail: we have tests to make sure estimators support what they claim they support, and tests to make sure they don't support what they're not supposed to support.

jnothman · 2019-09-10T12:06:55Z

Weren't we considering another option which kept the _more_tags syntax?

rth · 2019-09-10T12:45:52Z

Weren't we considering another option which kept the _more_tags syntax?

Well there was #14884 that was closed due to the lack of positive feedback. A third alternative is to keep things as they are now in master.

NicolasHug · 2019-09-10T12:49:49Z

A third alternative is to keep things as they are now in master.

I'm fine with that too honestly

amueller · 2019-09-10T14:59:13Z

You're also adding the support nan tag for HGBT, which we forgot to set in its PR, which reminds me of thinking about ways to maintain the consistency of the tags and the actual implementation.

I'm pretty sure at least the NaN tag has to be consistent, right? Or is it only testing in one direction? Our test might not assert that it fails if you don't have the tag, but I thought it did.

# Conflicts: # sklearn/preprocessing/_function_transformer.py

NicolasHug · 2019-09-10T15:25:46Z

The tag was set and yes the tests would fail in case of an inconsistency.

amueller · 2019-09-10T15:42:16Z

Weren't we considering another option which kept the _more_tags syntax?

The only solution I can come up with is the following:

def recurse(cls):
    def f(obj):
        s = super(cls, obj)
        if hasattr(s, 'recurse'):
            s.recurse()
        cls.more(obj)
    return f

class RecursionMeta(type):
    def __new__(cls, name, bases, dct):
        x = super().__new__(cls, name, bases, dct)
        x.recurse = recurse(x)
        return x

class A(metaclass=RecursionMeta):
    def more(self):
        print("more A")
        
class B(A):
    def more(self):
        print("more B")
    
class C(A):
    def more(self):
        print("more C")

class D(B, C):
    pass

d = D()
d.recurse()

To me, this is slightly more elegant than walking the MRO, but probably harder to understand. Also, it requires a common metaclass for everything that wants to add tags.

What this solution does is in effect add a method recurse (i.e. _get_tags)
to each class, so that the resulting hierarchy is something like this:

class A:
    def more(self):
        print("more A")
    
    def recurse(self):
        A.more(self)
        
class B(A):
    def more(self):
        print("more B")
    
    def recurse(self):
        super().recurse()
        B.more(self)
        
        
class C(A):
    def more(self):
        print("more C")
        
    def recurse(self):
        super().recurse()
        C.more(self)

class D(B, C):
    pass

[edit: fixed recurse to be a method]

amueller · 2019-09-10T16:02:00Z

@NicolasHug you wanted a programming interview question, right? Can you come up with a solution that doesn't require metaclasses?

NicolasHug · 2019-09-10T16:56:49Z

Stared at the white board for more than 10 minutes so I guess I'm not getting an offer.

NicolasHug · 2019-09-10T16:58:16Z

regarding the metaclass solution, I'd rather keep the current implementation. It's not standard but it's at least easier to understand IMO.

amueller · 2019-09-23T16:05:39Z

I'll close this as I think we can keep it as-is unless there's a strong reason to change it or we run into issues. It was fun doing the metaclass but I don't think we should actually use it.

amueller added 6 commits August 12, 2019 11:58

use super for tags

db527b6

use inheritance to allow overwriting tags

5982331

typo

3a9b17e

fix pep8 line length

5e9a79d

remove TagsBase baseclass

4f822cc

fix order of mixin and BaseEstimator

8eab08a

amueller force-pushed the mixin_order branch from ac8a13d to 8eab08a Compare August 13, 2019 17:37

fix OVR classifier

507e411

amueller mentioned this pull request Aug 13, 2019

Estimator tag overwriting and update in _get_tags #14044

Closed

oeo8

ce29977

adrinjalali reviewed Aug 14, 2019

View reviewed changes

amueller added 2 commits August 16, 2019 12:11

Merge branch 'master' into mixin_order

6db8b65

Merge branch 'master' into mixin_order

e6e8708

rth reviewed Aug 21, 2019

View reviewed changes

Merge branch 'mixin_order' of github.com:amueller/scikit-learn into m…

a28d115

…ixin_order # Conflicts: # sklearn/preprocessing/label.py

amueller mentioned this pull request Sep 4, 2019

fix mixin inheritance order, allow overwriting tags #14884

Merged

Merge branch 'master' into mixin_order

5305bdb

# Conflicts: # sklearn/base.py # sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py # sklearn/tests/test_base.py

amueller mentioned this pull request Sep 5, 2019

Allow overwriting tags by rewriting the mechanism to use inheritance #14635

Closed

undo merge mistake

8ff29c7

rth mentioned this pull request Sep 5, 2019

Less verbose estimator tags mechanism #14892

Closed

rth changed the title ~~Mixin order fix for tags~~ Inheritance based estimator tags Sep 5, 2019

amueller added 2 commits September 5, 2019 14:58

Merge branch 'master' into mixin_order

93e773c

fix KnnImputer tags

6a7a638

adrinjalali approved these changes Sep 10, 2019

View reviewed changes

NicolasHug reviewed Sep 10, 2019

View reviewed changes

Merge branch 'master' into mixin_order

51259a7

# Conflicts: # sklearn/preprocessing/_function_transformer.py

amueller closed this Sep 23, 2019

Uh oh!

Inheritance based estimator tags #14644

Inheritance based estimator tags #14644

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!