MNT add __reduce__ to loss objects by adrinjalali · Pull Request #30356 · scikit-learn/scikit-learn · GitHub

MNT add __reduce__ to loss objects #30356


Merged
merged 1 commit into scikit-learn:main on Nov 29, 2024

Conversation

adrinjalali
Member

This adds __reduce__ to the loss objects, using the same format that was already used in the sgd_fast loss objects. It changes the default Cython format:

loss.__reduce__()
(<cyfunction __pyx_unpickle_CyHalfTweedieLossIdentity at 0x793fed21c520>,
 (_loss.CyHalfTweedieLossIdentity, 8221022, (5.0,)))

to the simpler format:

Hinge().__reduce__()
(sklearn.linear_model._sgd_fast.Hinge, (1.0,))

This doesn't change anything with respect to pickle, but it makes it possible for skops.io to save/load these objects.
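The difference matters because pickle rebuilds an object by calling the first element of the `__reduce__` tuple with the second element as arguments. A minimal sketch of the simpler format, using a toy stand-in class rather than the actual Cython code:

```python
import pickle

class HalfTweedieLoss:
    """Toy stand-in for a loss object with a single parameter."""
    def __init__(self, power):
        self.power = power

    def __reduce__(self):
        # Simpler format: (class, constructor args). pickle -- and a
        # tool like skops -- can rebuild the object by calling
        # HalfTweedieLoss(power), with no opaque helper function.
        return (self.__class__, (self.power,))

loss = HalfTweedieLoss(5.0)
restored = pickle.loads(pickle.dumps(loss))
print(restored.power)  # -> 5.0
```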

cc @lesteve @glemaitre

Since some objects are removed from sgd_fast in this release, and the errors are triggered in skops for the 1.6 release, it'd be nice to backport this to the release branch.

@adrinjalali adrinjalali added this to the 1.6 milestone Nov 27, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: de0b93a.

@lesteve
Member
lesteve commented Nov 28, 2024

The change seems small enough, but at the same time I find it weird that skops cares about this; the main reason is probably that I know close to nothing about skops ...

Can you post a snippet reproducing the issue and give a short summary of skops' expectations, so that we can come back to it if the need arises?

So far my understanding is that skops does not want to rely on the first element of the __reduce__ output (the reconstruct function) for security reasons, and expects to be able to recreate any object using the class and the remaining elements of __reduce__, doing something like:

# I am assuming that you get sklearn_cls and reduce_info from the skops persisted file somehow
reconstruct_func, *args = reduce_info
sklearn_cls(*args)
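For illustration, with the simpler format this reconstruction becomes straightforward. A rough sketch using a plain-Python stand-in for the real Cython class:

```python
class Hinge:
    """Toy stand-in for sklearn.linear_model._sgd_fast.Hinge."""
    def __init__(self, threshold=1.0):
        self.threshold = threshold

    def __reduce__(self):
        return (Hinge, (self.threshold,))

# With the simpler format, the first element of __reduce__ is the
# class itself, so the object can be rebuilt from the class and the
# remaining arguments, with no opaque reconstruct function:
cls, args = Hinge(1.0).__reduce__()
rebuilt = cls(*args)
print(rebuilt.threshold)  # -> 1.0
```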

@@ -818,6 +818,9 @@ cdef inline double_pair cgrad_hess_exponential(
cdef class CyLossFunction:
"""Base class for convex loss functions."""

def __reduce__(self):
Member


Is it actually needed? It seems to me like CyLossFunction is only used as a base class and you are not really meant to create an instance of CyLossFunction.

Member Author


This is needed since not all losses have a parameter, which means they won't have our custom __reduce__ generated for them. Note the {{if param is not None}} around the __reduce__ implementation below.

Member Author
@adrinjalali adrinjalali left a comment


Pickling works by asking objects how they should be constructed, which is done via __reduce__. In many cases there is no need for a custom constructor function, since objects can simply be constructed by calling their constructors. However, this doesn't generalise if the object does fancy things in its __init__ method. Therefore tools such as Cython work around it by generating a constructor function, which is returned by __reduce__.
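To make this concrete, here is a toy sketch of why a constructor function can be needed when __init__ transforms its arguments (illustrative code only; it assumes nothing about the real loss classes):

```python
import pickle

def _rebuild(cls, value):
    # Stand-in for a Cython-generated __pyx_unpickle_* helper: it
    # bypasses __init__ and restores the internal state directly.
    obj = cls.__new__(cls)
    obj.value = value
    return obj

class Fancy:
    """__init__ transforms its argument, so re-calling it with the
    stored state would apply the transformation twice."""
    def __init__(self, raw):
        self.value = raw * 2

    def __reduce__(self):
        # Returning (Fancy, (self.value,)) would be wrong here:
        # unpickling would run __init__ again and double the value.
        return (_rebuild, (Fancy, self.value))

f = Fancy(3)                        # f.value == 6
g = pickle.loads(pickle.dumps(f))
print(g.value)                      # -> 6, not 12
```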

However, that defeats the purpose of not calling random functions, not clearly related to the object being created, when constructing objects.

Now when it comes to skops, if we don't care about this, I'm not sure how we're supposed to reconstruct the objects. We could have custom code which completely ignores the output of __reduce__ and constructs each class with the right arguments, but that seems somewhat unnecessary when we can have these __reduce__ methods here.

An alternative is to have something like __skops_init_args__, which would return what can be given to the constructor. I'd be happier with that than changing __reduce__ here, but it would bring more discussion as to whether we want skops-specific code here and whether it's the right thing to do. So @glemaitre suggested we simply implement these __reduce__ methods; that won't bring any discussion, since we had them for the old loss objects and we're bringing the same thing here.

Does this help @lesteve ?

@adrinjalali
Member Author

The change seems small enough, but at the same time I find it weird that skops cares about this; the main reason is probably that I know close to nothing about skops ...

We persist objects pickle-free in skops.io, so the pickle machinery becomes very relevant there BTW 😁

@jeremiedbb jeremiedbb added the "To backport" label (PR merged in master that needs a backport to a release branch, defined based on the milestone) Nov 28, 2024
Member
@jeremiedbb jeremiedbb left a comment


The change looks minimal and I'm fine with it if it makes things easier for skops.

I'm wondering if we can add some tests in scikit-learn directly to not rely on skops only.

I also think it deserves a changelog entry, since it's a regression introduced in 1.6.

@adrinjalali
Member Author

This isn't really a regression for people who use pickle and don't use skops: it doesn't change the pickle-ability of objects, which is already tested in our common tests. Also, this is a change in private modules which users shouldn't be accessing directly anyway. So I don't think it warrants a changelog entry.

@lesteve
Member
lesteve commented Nov 28, 2024

Thanks for the explanation @adrinjalali, it does help! (Edit: sorry, I originally pinged the wrong Adrin ...)

I am fine with the change to make life easier for skops for 1.6, even if in an ideal world skops could deal with the default Cython __reduce__ implementation. That said, I have to say I don't understand the 8221022 and why it is useful at all in the following snippet, so 🤷

In [4]: from sklearn._loss._loss import CyHalfTweedieLossIdentity

In [5]: loss = CyHalfTweedieLossIdentity(2)

In [6]: loss.__reduce__()
Out[6]: 
(<cyfunction __pyx_unpickle_CyHalfTweedieLossIdentity at 0x7b4424573e00>,
 (_loss.CyHalfTweedieLossIdentity, 8221022, (2.0,)))

I double-checked, and indeed these __reduce__ methods were implemented in sgd before, see

def __reduce__(self):
    return Hinge, (self.threshold,)

I am going to wild-guess that this was because otherwise Cython class instances were not serializable at the time, see e.g. this very old #946 (comment). I also found the Cython doc that seems to imply that there is now a default Cython __reduce__ implementation in some cases.

I guess one of my hidden questions was whether there was something scikit-learn specific, but looking at https://github.com/skops-dev/skops/blob/fb356747ff465658ec4f8359b61fda895206d17f/skops/io/_numpy.py for example, I guess you have some custom logic for other types, like numpy or scipy types.

Member
@glemaitre glemaitre left a comment


LGTM

@glemaitre glemaitre merged commit 0d9fb78 into scikit-learn:main Nov 29, 2024
38 checks passed
@adrinjalali adrinjalali deleted the loss/reduce branch November 29, 2024 11:25
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 4, 2024
jeremiedbb pushed a commit that referenced this pull request Dec 6, 2024
virchan pushed a commit to virchan/scikit-learn that referenced this pull request Dec 9, 2024
Labels
cython, To backport