Add Spline Transformer #17027
Comments
Can I work on this, or does it need some further discussion? |
@Reksbril Thanks for volunteering. It needs discussion first. |
Any comment from a core developer is very welcome, especially on the question of whether it is worth starting a PR. |
I'm not very familiar with the topic, so I can't comment on practical considerations. I have used splines for interpolation in the past but not so much in the ML context. Overall it seems to be a fairly standard and well established approach that would pass the inclusion criteria. I'm a bit surprised spline regression isn't more mainstream in the Python ML ecosystem. In particular, I can't find any earlier issues in scikit-learn about this. If we ever want to go beyond 1-D splines for multi-dimensional data, the placement and number of knots seem less straightforward, as discussed e.g. in this SO answer. Also the ESL book says on this topic,
MARS is implemented in https://github.com/scikit-learn-contrib/py-earth BTW. Another question I have is why B-splines and not, say, smoothing splines, which would have fewer hyper-parameters? It would be nice to have a few examples, maybe using patsy, on how this would compare for linear models and multi-dimensional data e.g. with Maybe @agramfort @ogrisel would have other comments? |
@amueller and I have been interested in splines in the context of GAMs, which would start with 1-D splines for each feature. I have been planning on pushing this forward for scikit-learn. |
B-splines are just a numerically convenient 1-D basis for splines and are available in scipy. You can represent a smoothing spline = natural cubic spline in the form of a B-spline. For scikit-learn, it would be nice to have splines available at all. Penalties are more tricky due to API constraints (SLEP006 sample properties and maybe also feature names ring a bell) as the
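To make the "1-D basis available in scipy" remark concrete, here is a minimal sketch of building a cubic B-spline basis with `scipy.interpolate.BSpline`. The knot grid, degree, and evaluation points are assumptions chosen for illustration; nothing here is taken from the scikit-learn implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3
# Assumed setup: 5 equidistant knots on [0, 1], with the boundary knots
# repeated `degree` extra times so the basis is complete on the interval.
inner = np.linspace(0.0, 1.0, 5)
t = np.concatenate([[0.0] * degree, inner, [1.0] * degree])
n_basis = len(t) - degree - 1

# Passing an identity matrix as coefficients evaluates every basis
# function at once: column j is the j-th B-spline evaluated at x.
x = np.linspace(0.0, 1.0, 50, endpoint=False)
basis = BSpline(t, np.eye(n_basis), degree)(x)

# Inside the knot range the basis functions are non-negative and sum to one.
row_sums = basis.sum(axis=1)
```

Each column of `basis` can then be used as a feature in a linear model, which is the idea behind a spline transformer.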
Nice to hear :smirk: |
@thomasjpfan I like splines not so much for their interpretability, but for their flexibility in modelling continuous features in a smooth and controllable way (good mix between manual and automatic). As a counterexample, the fashionable and trendy decision tree based methods have discontinuities all over the place. Depending on the application, this may be a concern. |
I think a lot of the feature-based machine learning community could learn
more about spline bases in predictive modelling, and it would be valuable
to have these available and discussed.
|
Great initiative - I really miss splines in scikit-learn. In practical applications, I very often work with natural cubic splines, see its options in R. They are very stable, have acceptable extrapolation properties and use astonishingly few extra parameters compared to a polynomial approximation. It would be great if one could (optionally) pass the knot positions. |
Very nice to see that spline-based features made their way to scikit-learn! Today I posted a short demo of a multivariate spline-based transformer able to capture correlation between features, in case this could be useful for future developments. https://gist.github.com/ecm0/fe8966f9170409cfbc4f34c919462f98 |
@ecm0 Nice to hear you like splines. |
@lorentzenchr By the way, I'm happy to help in case multivariate splines are considered in the future. |
You're welcome to open an issue to propose and motivate new functionality. But note that we have a high barrier for new features, see this FAQ section. |
@lorentzenchr I had not thought about combining PolynomialFeatures and SplineTransformer. I agree this does the same as what I did in my demo. So this is already covered by the new implementation. |
Describe the workflow you want to enable
I propose to add a SplineTransformer to preprocessing. This is similar to PolynomialFeatures, but gives more flexibility (and numerical stability) for linear models to deal with continuous numerical features.
Describe your proposed solution
Add SplineTransformer and internally use scipy for splines. Start with
- 1-dimensional B-splines
- equidistant knots
- quantile based knots
Additional context
Patsy has an implementation of those that matches the R versions.
References
Eilers, Marx "Flexible Smoothing with B-splines and Penalties" passes the scikit-learn inclusion criteria by some margin 😏