[MRG+1] Add Huber Estimator to sklearn linear models #5291
Conversation
HuberLoss is already implemented in ensemble/GradientBoosting.py. I feel like we need to have a more centralized loss module, rather than reimplementing them when needed. What are your thoughts?
I agree with you. Has there been an ongoing discussion already that I've missed?
#5044 has some comments about it, but no consensus has been reached yet.
I just looked through the code in
What do you think?
Btw, this fixes #4990
If there is a way of doing it currently, it may be worth benchmarking it to see how expensive it is. If there is a significant slowdown, you should go ahead if it solves a pressing issue.
I don't think this would be so useful. This PR uses the gradient w.r.t. the linear model parameters (size n_features). GradientBoosting uses the gradient with respect to the predictions (size n_samples). So the code is different. The one part that could possibly be shared is computing the objective value, but the regularizer is different.
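For reference, a rough sketch of the two gradients being contrasted (notation is mine, not from the PR): the linear model differentiates the Huber objective with respect to the coefficient vector, while gradient boosting differentiates the loss with respect to the per-sample predictions.

```latex
% Linear model: gradient w.r.t. the coefficients, a vector of length n_features
\nabla_w \sum_{i=1}^{n} H_\epsilon\bigl(y_i - x_i^\top w\bigr)
  = -\sum_{i=1}^{n} H_\epsilon'\bigl(y_i - x_i^\top w\bigr)\, x_i

% Gradient boosting: gradient w.r.t. the predictions, a vector of length n_samples
\frac{\partial}{\partial \hat{y}_i} \sum_{j=1}^{n} H_\epsilon\bigl(y_j - \hat{y}_j\bigr)
  = -H_\epsilon'\bigl(y_i - \hat{y}_i\bigr), \qquad i = 1, \dots, n
```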
Okay, then!
Force-pushed 6caebb4 to 096ecef
I am working on robust regression for Spark's MLlib project based on Prof. Art Owen's paper, A robust hybrid of lasso and ridge regression. In MLlib/Breeze, since we don't support L-BFGS-B while the scaling factor \sigma in Eq. (6) has to be >= 0, we're going to replace it by \exp(\sigma). However, the second derivative of the Huber loss is not continuous; this will cause some stability issues since L-BFGS requires it for guaranteed convergence. The workaround I'm going to implement is the Pseudo-Huber loss function, which can be used as a smooth approximation of the Huber loss and ensures that derivatives of all degrees are continuous.

BTW, in robust regression, the scaling factor \sigma has to be estimated as well, and this is \epsilon in your case. This value cannot be a constant. Imagine that, when the optimization is just started with some initial condition, if the initial guess is not good, then most of the training instances will be treated as outliers. As a result, \epsilon will be larger, but it will be one of the parameters that will be estimated. See the details in Prof. Art Owen's paper, section 4. Thanks.
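For readers following along, here is a sketch of the two losses being discussed (my paraphrase, with an L2 penalty as in this PR; the paper pairs the loss with an L1 penalty and the exact constants differ): the Huber objective with a concomitant scale \sigma estimated jointly with the coefficients, and the pseudo-Huber loss proposed as a smooth approximation.

```latex
% Huber loss with a jointly estimated scale sigma (paraphrase of Owen's formulation)
\min_{w,\ \sigma > 0}\ \sum_{i=1}^{n}
  \left( \sigma + H_\epsilon\!\left(\frac{y_i - x_i^\top w}{\sigma}\right)\sigma \right)
  + \lambda\,\|w\|^2,
\qquad
H_\epsilon(z) =
\begin{cases}
  z^2 & |z| \le \epsilon \\
  2\epsilon\,|z| - \epsilon^2 & |z| > \epsilon
\end{cases}

% Pseudo-Huber loss: smooth, with continuous derivatives of all orders
L_\epsilon(z) = \epsilon^2\left(\sqrt{1 + (z/\epsilon)^2} - 1\right)
```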
Thanks for the comment and the link to the paper. (And it comes at a time when my benchmarks weren't looking too great.) Previously I was using grid search to find the optimal value of epsilon, but it always corresponded to the lowest value of epsilon (i.e., assuming both X and y are centered and scaled). To describe the paper briefly:
You are right. But you may want to replace \sigma with \exp(\alpha) so you don't need the condition that \sigma > 0. In theory, the Hessian is not continuous, so L-BFGS may not work well, but I don't know the exact impact of this. We may need to do some benchmarking on this.
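In other words, a one-line sketch of the suggested reparameterization: substituting \sigma = \exp(\alpha) removes the positivity constraint, so plain unconstrained L-BFGS can be used.

```latex
\sigma = e^{\alpha},\ \alpha \in \mathbb{R}
\quad\Longrightarrow\quad
\min_{w,\ \alpha}\ \sum_{i=1}^{n}
  \left( e^{\alpha} + H_\epsilon\!\left(\frac{y_i - x_i^\top w}{e^{\alpha}}\right)e^{\alpha} \right)
  + \lambda\,\|w\|^2
```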
Also, for the Pseudo-Huber loss, there is no proof that it will be jointly convex with \sigma. Although I guess it will be if we go through the proof.
Great, I'll try two things. Change the present loss function to accommodate minimizing sigma. (and after that)
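A minimal sketch (not the PR's code) of what minimizing over sigma jointly with w could look like, using the \exp(\alpha) reparameterization discussed above; the function name and data are illustrative, and the gradient is left to L-BFGS-B's finite-difference approximation.

```python
# Sketch: jointly minimize over w and sigma, with sigma = exp(log_sigma)
# so the problem is unconstrained.
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def huber_loss(params, X, y, epsilon=1.35, alpha=0.0001):
    """Huber loss with a jointly estimated scale; params = [w..., log(sigma)]."""
    w, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    z = np.abs(y - X.dot(w)) / sigma
    # Quadratic for small scaled residuals, linear for large ones.
    rho = np.where(z <= epsilon, z ** 2, 2 * epsilon * z - epsilon ** 2)
    return np.sum(sigma + sigma * rho) + alpha * np.dot(w, w)


rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X.dot([1.0, 2.0, 3.0]) + 0.1 * rng.randn(100)
y[:5] += 10.0  # a few gross outliers

x0 = np.zeros(X.shape[1] + 1)  # coefficients plus log(sigma) = 0, i.e. sigma = 1
params, obj, info = fmin_l_bfgs_b(huber_loss, x0, args=(X, y), approx_grad=True)
print(params[:-1], np.exp(params[-1]))  # fitted coefficients and estimated scale
```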
Sounds great. Let me know the result, so I can learn from you when I implement this in Spark. Thanks.
@dbtsai I've made changes to the loss function, but I'm not getting good results. Could you please verify if the loss function is right?
@MechCoder I will compare with the note I have at home tonight. What do you mean, you don't get good results? How do you test it? Also, when
Can you try to make
I tried that as well, but it seems that epsilon has almost no effect (for both very high and very low values), since
Or am I understanding it wrong?
Just in case you are interested in the plot generation
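The plotting script itself isn't reproduced in this thread, but a sketch along these lines (using the estimator as it was eventually merged, with illustrative data and epsilon values) gives roughly the kind of Ridge-vs-Huber comparison being discussed.

```python
# Hedged sketch of a Ridge-vs-Huber comparison plot; details are assumptions,
# not the original script.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=50).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=50)
y[:4] = 20.0  # inject a few gross outliers

plt.scatter(X, y, color='black')
x_plot = np.linspace(-3, 3, 100).reshape(-1, 1)
for epsilon in (1.35, 1.75, 10.0):
    huber = HuberRegressor(epsilon=epsilon).fit(X, y)
    plt.plot(x_plot, huber.predict(x_plot), label='Huber, epsilon=%.2f' % epsilon)
ridge = Ridge().fit(X, y)
plt.plot(x_plot, ridge.predict(x_plot), label='Ridge')
plt.legend()
plt.show()
```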
Oops, seems like I made a mistake. Just a second.
Force-pushed cae9e26 to 8eee69e
@dbtsai I modified the loss function just before your comment :P . I have commented out the gradient for now; I just wanted to get an approximate idea of whether the loss function is correct. After making the changes to the loss function (it should be clearer now), I am able to replicate the behavior of ridge for high values of epsilon (note that the lines coincide).
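Relatedly, one common way to sanity-check an analytic gradient before re-enabling it is finite differencing. Below is a self-contained sketch with SciPy's check_grad on a plain (unscaled) Huber loss; the function names and constants are illustrative, not the PR's.

```python
# Sketch: verify an analytic Huber gradient against finite differences.
import numpy as np
from scipy.optimize import check_grad

EPSILON = 1.35


def huber_loss(w, X, y):
    r = y - X.dot(w)
    abs_r = np.abs(r)
    quad = abs_r <= EPSILON
    # Quadratic branch for small residuals, linear branch for large ones.
    return np.sum(0.5 * r[quad] ** 2) + np.sum(EPSILON * abs_r[~quad] - 0.5 * EPSILON ** 2)


def huber_grad(w, X, y):
    r = y - X.dot(w)
    # Derivative of the loss w.r.t. each residual, clipped at +/- EPSILON.
    dloss = np.clip(r, -EPSILON, EPSILON)
    return -X.T.dot(dloss)


rng = np.random.RandomState(0)
X, y = rng.randn(20, 3), rng.randn(20)
err = check_grad(huber_loss, huber_grad, rng.randn(3), X, y)
print(err)  # norm of the difference; should be small (roughly 1e-6 or less here)
```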
Parameters
----------
w: ndarray, shape (n_features + 1,) or (n_features + 2,)
nitpick: space in front of `:` for consistency
lgtm apart from nitpicks. Maybe adding an attribute that stores which points are outliers on the training set would be interesting.
feel free to merge once tests pass. we can always add an attribute for the outliers.
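For context, the estimator in scikit-learn today does expose such an attribute: `outliers_`, a boolean mask over the training samples flagged as outliers. A small usage sketch (data and threshold behavior are illustrative):

```python
# Sketch of inspecting which training samples were treated as outliers.
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = X.dot([1.0, 2.0]) + 0.1 * rng.randn(100)
y[:5] += 15.0  # corrupt a few targets

huber = HuberRegressor().fit(X, y)
print(np.flatnonzero(huber.outliers_))  # indices flagged as outliers, ideally 0..4
```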
I have 3 more minor todos
Will address in a while
ping us when we should look again
Force-pushed 3a3a230 to be913c9
@agramfort done!!
Squashed commits:
- Add gradient calculation in _huber_loss_and_gradient
- Add tests to check the correctness of the loss and gradient
- Fix for old scipy
- Add parameter sigma for robust linear regression
- Add gradient formula to robust _huber_loss_and_gradient
- Add fit_intercept option and fix tests
- Add docs to HuberRegressor and the helper functions
- Add example demonstrating ridge_regression vs huber_regression
- Add sample_weight implementation
- Add scaling invariant huber test
- Remove exp and add bounds to fmin_l_bfgs_b
- Add sparse data support
- Add more tests and refactoring of code
- Add narrative docs
- review huber regressor
- Minor additions to docs and tests
- Minor fixes dealing with NaN values in targets and old versions of SciPy and NumPy
- Add HuberRegressor to robust estimator
- Refactored computation of gradient and make docs render properly
- Temp
- Remove float64 dtype conversion
- trivial optimizations and add a note about R
- Remove sample_weights special_casing
- address @amueller comments
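One of the commits above ("Remove exp and add bounds to fmin_l_bfgs_b") swaps the \exp(\alpha) reparameterization for a box constraint on sigma. A minimal sketch of that pattern (not the PR's exact call; names and values are illustrative):

```python
# Sketch: keep sigma positive with an l_bfgs_b box constraint instead of exp(alpha).
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def huber_loss(params, X, y, epsilon=1.35, alpha=0.0001):
    w, sigma = params[:-1], params[-1]
    z = np.abs(y - X.dot(w)) / sigma
    rho = np.where(z <= epsilon, z ** 2, 2 * epsilon * z - epsilon ** 2)
    return np.sum(sigma + sigma * rho) + alpha * np.dot(w, w)


rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X.dot([1.0, 2.0, 3.0]) + 0.1 * rng.randn(50)

x0 = np.concatenate([np.zeros(3), [1.0]])      # initial w and sigma
bounds = [(None, None)] * 3 + [(1e-10, None)]  # coefficients free, sigma > 0
params, loss, info = fmin_l_bfgs_b(huber_loss, x0, args=(X, y),
                                   bounds=bounds, approx_grad=True)
print(params[:-1], params[-1])
```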
Force-pushed be913c9 to 7e79adc
Tests pass!! Merging with master :D
great work @MechCoder !
thanks @MechCoder ! 🍻
> thanks @MechCoder ! 🍻
Yes. Awesome!!
Add robust regression model that filters outliers based on http://statweb.stanford.edu/~owen/reports/hhu.pdf
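For completeness, a short usage sketch of the merged estimator compared against Ridge on data with a few corrupted targets (data and parameter values are illustrative):

```python
# Quick usage sketch: HuberRegressor vs Ridge on data with gross outliers.
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.RandomState(0)
X = rng.randn(200, 1)
y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=200)
outlier_idx = np.argsort(X.ravel())[-10:]
y[outlier_idx] -= 30.0  # corrupt the targets at the largest X values

huber = HuberRegressor(epsilon=1.35, alpha=0.0001).fit(X, y)
ridge = Ridge(alpha=0.0001).fit(X, y)
print("Huber coef:", huber.coef_)  # typically stays close to 3
print("Ridge coef:", ridge.coef_)  # pulled down by the corrupted targets
```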