MNT add __reduce__ to loss objects #30356
Conversation
The change seems small enough, but at the same time I find it weird that it's needed. Can you post a snippet reproducing the issue and a short summary of why this is necessary?

So far my understanding is that skops reconstructs objects roughly like this:

```python
# I am assuming that you get the sklearn_cls and reduce_info from the skops persisted file somehow
reduce_info = reconstruct_func, *args
sklearn_cls(*args)
```
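To make the sketch above concrete, here is a runnable toy version. The class name and its parameter are invented for illustration; this is not skops' actual loading code:

```python
# Toy stand-in for a Cython loss class; names here are illustrative only.
class HalfSquaredError:
    def __init__(self, closs_param=1.0):
        self.closs_param = closs_param

    def __reduce__(self):
        # (callable, args): pickle reconstructs by calling callable(*args).
        return (self.__class__, (self.closs_param,))

obj = HalfSquaredError(closs_param=2.0)
reconstruct_func, args = obj.__reduce__()  # what a persisted file would carry
restored = reconstruct_func(*args)         # reconstruction without pickle bytes
assert isinstance(restored, HalfSquaredError)
assert restored.closs_param == 2.0
```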
```
@@ -818,6 +818,9 @@ cdef inline double_pair cgrad_hess_exponential(

 cdef class CyLossFunction:
     """Base class for convex loss functions."""

+    def __reduce__(self):
```
Is it actually needed? It seems to me like `CyLossFunction` is only used as a base class, and you are not really meant to create an instance of `CyLossFunction`.
This is needed since not all losses have a parameter, which means they won't have our custom `__reduce__` generated for them. Note the `{{if param is not None}}` around the `__reduce__` implementation below.
Pickling works by asking objects how they should be constructed, which is done via `__reduce__`. In many cases there is no need for a custom constructor function, since objects can simply be constructed by calling their constructors. However, this doesn't generalise if the object does some fancy things in its `__init__` method. Therefore tools such as Cython work around it by creating a constructor function which is returned by `__reduce__`.

However, that defeats the purpose: when constructing objects, we don't want to call arbitrary functions that aren't clearly related to the object being created.

Now when it comes to skops, if we don't care about this, I'm not sure how we're supposed to reconstruct the objects. We could have custom code which completely ignores the output of `__reduce__` and constructs each class with the right arguments, but that seems somewhat unnecessary when we can have these `__reduce__` methods here.
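A minimal pure-Python contrast of the two reconstruction styles (simplified, assumed shapes; real Cython-generated helpers such as `__pyx_unpickle_<cls>` are more involved):

```python
import pickle

def _generated_rebuild(param):
    # Plays the role of a generated module-level helper (in the spirit of
    # Cython's __pyx_unpickle_<cls>): a free function not obviously
    # related to the class it rebuilds.
    return LossWithHelper(param)

class LossWithHelper:
    def __init__(self, param):
        self.param = param
    def __reduce__(self):
        return (_generated_rebuild, (self.param,))

class LossWithCtor:
    def __init__(self, param):
        self.param = param
    def __reduce__(self):
        # Transparent form: a pickle-free loader can call the class directly.
        return (self.__class__, (self.param,))

# Both round-trip through pickle; only the second is transparent to a
# loader that refuses to call arbitrary reconstruction functions.
for cls in (LossWithHelper, LossWithCtor):
    assert pickle.loads(pickle.dumps(cls(0.5))).param == 0.5
```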
An alternative is to have something like `__skops_init_args__`, which would return what can be passed to the constructor. I'd be happier with that than changing `__reduce__` here, but it would bring more discussion about whether or not we want skops-specific code here and whether it's the right thing to do. So @glemaitre suggested we simply implement these `__reduce__` methods, which won't bring any discussions since we've had them for the old loss objects and we're bringing the same thing here.

Does this help @lesteve ?
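For comparison, the rejected alternative could look roughly like this (`__skops_init_args__` is only a proposed name in this discussion, not an existing API anywhere):

```python
class Loss:
    """Toy class; the hook below is hypothetical and not implemented anywhere."""
    def __init__(self, param):
        self.param = param

    def __skops_init_args__(self):
        # Expose constructor arguments to a pickle-free loader
        # without touching __reduce__ at all.
        return (self.param,)

obj = Loss(2.0)
restored = type(obj)(*obj.__skops_init_args__())
assert restored.param == 2.0
```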
We persist objects pickle-free in
The change looks minimal and I'm fine with it if it makes things easier for skops.

I'm wondering if we can add some tests in scikit-learn directly, to not rely on skops only.

I also think it deserves a changelog entry since it's a regression introduced in 1.6.
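Such a direct test might, for instance, assert the shape of what `__reduce__` returns (sketch with a stand-in class; a real test would import the actual Cython loss classes):

```python
import pickle

class CyHalfSquaredError:
    """Stand-in for a parameterless Cython loss class (name is illustrative)."""
    def __reduce__(self):
        return (self.__class__, ())

loss = CyHalfSquaredError()
func, args = loss.__reduce__()
assert func is CyHalfSquaredError  # the class itself, not a generated helper
assert args == ()
# Round-trip through pickle still works with this format.
assert isinstance(pickle.loads(pickle.dumps(loss)), CyHalfSquaredError)
```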
This isn't really a regression for people who don't use skops and use pickle. This doesn't change the pickle-ability of objects, which is already tested in our common tests. Also, this is a change in private modules which users shouldn't be directly accessing anyway. So I don't think it warrants a changelog entry.
Thanks for the explanation @adrinjalali, it does help! (Edit: sorry, I pinged the wrong Adrin originally.) I am fine with the change to make life easier for skops for 1.6, even if in an ideal world skops could deal with the default Cython `__reduce__`.

I double-checked and indeed scikit-learn/sklearn/linear_model/_sgd_fast.pyx.tp (lines 228 to 229 in 6e90391) already has a custom `__reduce__` implementation in some cases.
I guess one of my hidden questions was whether there was something scikit-learn-specific, but looking at https://github.com/skops-dev/skops/blob/fb356747ff465658ec4f8359b61fda895206d17f/skops/io/_numpy.py for example, I guess you have some custom logic for other types like numpy or scipy types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This adds `__reduce__` with the same format as was already included in the `sgd_fast` loss objects (the other, simpler format).

This doesn't change anything w.r.t. `pickle`, but it makes `skops.io` able to save/load these objects.

cc @lesteve @glemaitre
Since some objects are removed from `sgd_fast` in this release, and the errors are triggered in skops for this 1.6 release, it'd be nice to backport this to the release.