8000 [WIP] Make SGD support Cython fused types by yenchenlin · Pull Request #6889 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[WIP] Make SGD support Cython fused types #6889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

yenchenlin
Copy link
Contributor

Current implementation of SGDClassifier and SGDRegressor doesn't allow the user to specify the dtype they want and may cause a MemoryError when the user wants to train the model since it will try to copy input data into np.float64. (See #5776).

To address the problem, this PR wants to make SGD related algorithms in scikit-learn supports Cython fused types.

@jnothman
Copy link
Member

What's interesting about this error is that it gets triggered for Classification but not Regression. I'll try to think about it or look closely at the code. What I'd like you to experiment with is creating a minimum example code snippet that fails to compile.

@agramfort
Copy link
Member

would be neat to have this

@yenchenlin for looking into it

@yenchenlin
Copy link
Contributor Author
yenchenlin commented Jun 17, 2016

Hello @agramfort, @jnothman and @MechCoder,
I doubt that there are some bugs in Cython to make Fused Types and Inheritance work together.

I've created a small example here,
you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

Any idea?

@MechCoder
Copy link
Member

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together. See https://groups.google.com/forum/#!topic/cython-users/HPgmlIMabz0

@MechCoder
Copy link
Member

Can someone come up with a better idea other than duplication of loss and dloss methods in LossFunction to handle float and double separately?

@jnothman
Copy link
Member
jnothman commented Jun 20, 2016

I've created a small example here, you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

It's interesting that it works with floating in the pyx; only when you add the pxd it breaks.

@jnothman
Copy link
Member

Ah, no. It fails at C compilation with just the pyx.

@MechCoder
Copy link
Member
MechCoder commented Jun 29, 2016

@yenchenlin @jnothman What do you think of this approach?

cdef class LossFunction:
    cdef float float_loss(self, float p, float y) nogil:
        return 0.
    cdef double double_loss(self, double p, double y) nogil:
        return 0.   

cdef class DerivedLossFunction(LossFunction):

    cdef floating _loss(floating p, floating y) nogil:
        if floating is float:
           ....
        else:
            ....       

    cdef float float_loss(self, float p, float y) nogil:
        return _loss(p, y)

    cdef double double_loss(self, double p, double y) nogil:
        return _loss(p, y)

And then in plain_sgd and elsewhere call the required methods according to the dtype.

(Of course, this means the method signatures are duplicated twice, but SGD is commonly used and hence it might be worth it)

@jnothman
Copy link
Member

As I pointed out in hangouts, I don't think we need fused type support for the loss functions. We need fused type support for X as represented by the SequentialDataset object. I'm sure that'll have its own problems, but I think this PR has led us up the garden path.

@MechCoder
Copy link
Member

Also, do we really care that p and y are not upcasted to double?

@jnothman
Copy link
Member

Our comments overlapped, but the implication is: no, we don't care.

@MechCoder
Copy link
Member

@yenchenlin Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class? No worries, if you are currently busy with coordinate descent. We can postpone this to after that.

@jnothman
Copy link
Member

Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class?

I suspect we'll find we need to use a templated def cppclass SequentialDataset[T] and instantiate for the appropriate type. (If that doesn't work out it might be changed to using void *s and an element size argument...?)

@yenchenlin
Copy link
Contributor Author

@MechCoder sorry to see this now.

Will do!

@rth
Copy link
Member
rth commented Apr 20, 2018

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together.

I wonder if the situation improved since: this was with Cython ~ v0.20 (also confirmed in v0.23dev) and we are now at v0.28..

Update (2019): It didn't.

@jjerphan
Copy link
Member
jjerphan commented Jul 6, 2021

To follow-up with @yenchenlin's experiments:

I doubt that there are some bugs in Cython to make Fused Types and Inheritance work together.

I've created a small example here,
you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

Any idea?

I have been able to compile your project after having changed doubleto floating using Cython 0.29.23.

Still I get a similar problem when recompiling this branch after rebasing on main:

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
        cdef floating loss(self, floating p, floating y) nogil
        cdef floating dloss(self, floating p, floating y) nogil


    cdef class Classification(LossFunction):
        cdef floating loss(self, floating p, floating y) nogil
                         ^
    ------------------------------------------------------------

    sklearn/linear_model/_sgd_fast.pxd:17:22: '__pyx_fuse_0loss' redeclared

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    """Helper to load LossFunction from sgd_fast.pyx to sag_fast.pyx"""

    from cython cimport floating

    cdef class LossFunction:
        cdef floating loss(self, floating p, floating y) nogil
                         ^
    ------------------------------------------------------------

    sklearn/linear_model/_sgd_fast.pxd:7:22: Previous declaration is here

To follow up with @rth's comment:

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together.

I wonder if the situation improved since: this was with Cython ~ v0.20 (also confirmed in v0.23dev) and we are now at v0.28..

Update (2019): It didn't.

Some problems might have been addressed in the latest release, for instance the ones of the thread mentioned by @MechCoder.

@jjerphan
Copy link
Member
jjerphan commented Jul 7, 2021

Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class?

I suspect we'll find we need to use a templated def cppclass SequentialDataset[T] and instantiate for the appropriate type. (If that doesn't work out it might be changed to using void *s and an element size argument...?)

As #13243 implemented a template for SequentialDataset, is there still a need to support double and float (via fused types) for losses?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0