8000 [MRG+1] Add Huber Estimator to sklearn linear models by MechCoder · Pull Request #5291 · scikit-learn/scikit-learn · GitHub

Merged
merged 1 commit into scikit-learn:master from the huber_loss branch
Feb 25, 2016

Conversation

MechCoder
Member

Add robust regression model that filters outliers based on http://statweb.stanford.edu/~owen/reports/hhu.pdf

  • Add fix for random OverflowErrors.
  • Add documentation to the helper function
  • Add extensive testing
  • Add narrative docs
  • Add example
  • Support for sparse data
  • Support sample_weights
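
For context, here is a minimal usage sketch of the estimator this PR adds, assuming the public API as it eventually landed in `sklearn.linear_model` (`HuberRegressor` with `epsilon` and `alpha` parameters); exact names and defaults may differ from intermediate states of this branch:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
y[:5] += 10.0  # corrupt a few targets with large outliers

huber = HuberRegressor(epsilon=1.35, alpha=0.0001).fit(X, y)
ridge = Ridge(alpha=0.0001).fit(X, y)
print(huber.coef_)  # close to [1.0, 2.0]; the outliers are down-weighted
print(ridge.coef_)  # pulled noticeably towards the outliers
```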

@jmschrei
Member

HuberLoss is already implemented in ensemble/gradient_boosting.py. I feel like we need to have a more centralized loss module, rather than reimplementing them when needed. What are your thoughts?

@MechCoder
Member Author

I agree with you. Has there been an ongoing discussion already that I've missed?

@jmschrei
Member

#5044 has some comments about it, but no consensus has been reached yet.

@MechCoder
Member Author

I just looked through the code in sklearn.ensemble.gradient_boosting. I do agree that the refactoring has to be done, but I'm not sure it should block this PR.

  1. The current code in HuberLoss does not take regularization into account (it looks like the alpha there is not the regularization alpha, but a way to calculate the epsilon value).
  2. I could also wrap my methods with calls to HuberLoss(n_c)(y, pred) and negative_gradient(), but there is some duplicated computation, which might be expensive when calculating the loss and the gradient.

What do you think?

@MechCoder
Member Author

Btw, this fixes #4990

@jmschrei
Member

If there is a way of doing it with the current code, it may be worth benchmarking to see how expensive it is. If there is a significant slowdown, you should go ahead with this PR, if it solves a pressing issue.

@mblondel
Member

> I feel like we need to have a more centralized loss module, rather than reimplementing them when needed. What are your thoughts?

I don't think this would be so useful. This PR uses the gradient w.r.t. the linear model parameters (size is n_features). GradientBoosting uses the gradient with respect to the predictions (size is n_samples). So the code is different. The one part that could possibly be shared is computing the objective value, but the regularizer is different.
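
To make the distinction concrete, here is a small illustrative sketch (squared loss for simplicity, not the Huber loss, and not code from either module):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))       # n_samples=50, n_features=3
w = rng.normal(size=3)
y = rng.normal(size=50)

residual = X @ w - y               # predictions minus targets, squared loss
grad_wrt_params = X.T @ residual   # shape (3,)  -> what a linear model optimizes over
grad_wrt_preds = residual          # shape (50,) -> per-sample gradient that GradientBoosting works with
print(grad_wrt_params.shape, grad_wrt_preds.shape)  # (3,) (50,)
```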

@jmschrei
Member

Okay, then!

@MechCoder force-pushed the huber_loss branch 5 times, most recently from 6caebb4 to 096ecef on September 25, 2015 20:51
@dbtsai
dbtsai commented Sep 26, 2015

I am working on robust regression for Spark's MLlib project based on Prof. Art Owen's paper, A robust hybrid of lasso and ridge regression. In MLlib/Breeze we don't support L-BFGS-B, while the scaling factor \sigma in Eq. (6) has to be >= 0, so we are going to replace it with \exp(\sigma). However, the second derivative of the Huber loss is not continuous, which can cause stability issues, since L-BFGS requires it for guaranteed convergence. The workaround I am going to implement is the Pseudo-Huber loss function, which can be used as a smooth approximation of the Huber loss and ensures that derivatives are continuous of all orders.
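
For reference, the standard Pseudo-Huber loss mentioned above can be written as the following sketch (plain NumPy, not MLlib/Breeze code):

```python
import numpy as np

def pseudo_huber(residual, delta=1.35):
    """Smooth approximation of the Huber loss: behaves like 0.5 * r**2 for
    small residuals and like delta * |r| for large ones, with continuous
    derivatives of all orders."""
    return delta ** 2 * (np.sqrt(1.0 + (residual / delta) ** 2) - 1.0)
```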

BTW, in robust regression the scaling factor \sigma has to be estimated as well, and this is \epsilon in your case. This value cannot be a constant. Imagine that the optimization has just started from some initial condition: if the initial guess is not good, most of the training instances will be treated as outliers. As a result, \epsilon would need to be larger, so it should be one of the parameters that gets estimated. See the details in section 4 of Prof. Art Owen's paper. Thanks.

@MechCoder
Member Author

Thanks for the comment and the link to the paper. (And it comes at a time when my benchmarks weren't looking too great)

Previously I was using grid search to find the optimal value of epsilon, but it always corresponded to the lowest value of epsilon (i.e., assuming both X and y are centered and scaled).

To summarize the paper briefly (a sketch of the resulting objective follows this list):

  1. It seems that the epsilon here corresponds more to the parameter M in the paper, which is said to be fixed at 1.35.
  2. In addition, the term y - X*w - c is scaled down by a factor sigma, which makes the algorithm scale invariant.
  3. And since the new function described in Eq. (8) is jointly convex in the coefficients and sigma, we could optimize sigma together with the coefficients, right?
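
For concreteness, here is a sketch of the jointly convex objective in the form it takes in the final scikit-learn implementation (per the HuberRegressor documentation), with epsilon playing the role of M above; this is a transcription for reference, not a quote from the paper:

```latex
\min_{w,\,\sigma > 0}\; \sum_{i=1}^{n}\left(\sigma + H_{\epsilon}\!\left(\frac{X_i w - y_i}{\sigma}\right)\sigma\right) + \alpha \lVert w \rVert_2^2,
\qquad
H_{\epsilon}(z) =
\begin{cases}
z^2 & \text{if } \lvert z \rvert < \epsilon,\\
2\epsilon\lvert z \rvert - \epsilon^2 & \text{otherwise.}
\end{cases}
```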

@dbtsai
dbtsai commented Sep 28, 2015

You are right. But you may want to replace \sigma with \exp(\alpha) so you don't need the condition \sigma > 0. In theory the Hessian is not continuous, so L-BFGS may not work well, but I don't know the exact impact of this. We may need to do some benchmarking.
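
A tiny sketch of the reparameterization being suggested (hypothetical helper names; note that the merged code ultimately enforced \sigma > 0 with a bound in fmin_l_bfgs_b instead, per the commit log below):

```python
import numpy as np

def loss_and_grad_wrt_log_sigma(a, loss_and_grad_wrt_sigma):
    """Optimize over a = log(sigma) so that sigma = exp(a) is always positive.

    `loss_and_grad_wrt_sigma` is any callable returning (loss, dloss/dsigma).
    """
    sigma = np.exp(a)
    loss, dloss_dsigma = loss_and_grad_wrt_sigma(sigma)
    # chain rule: dL/da = dL/dsigma * dsigma/da = dL/dsigma * exp(a)
    return loss, dloss_dsigma * sigma
```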

@dbtsai
dbtsai commented Sep 28, 2015

Also, for the Pseudo-Huber loss there is no proof that it is jointly convex with \sigma, although I guess it would be if we worked through the proof.

@MechCoder
Member Author

Great, I'll try two things.

Change the present loss function to accommodate jointly minimizing sigma.
After that, I can try the Pseudo-Huber loss to check whether there is any noticeable change in convergence, etc.

@dbtsai
dbtsai commented Sep 28, 2015

Sounds great. Let me know the result, so I can learn from you when I implement this in Spark. Thanks.

@MechCoder
Member Author

@dbtsai I've made changes to the loss function, but I'm not getting good results. Could you please verify if the loss function is right?

@MechCoder
Member Author

"Not good results" meaning that this is the plot I generated from the coefficients :/

figure_1

The red line is from the HuberRegressor and the green one from RidgeRegression. As you can see, it is not what it is supposed to look like.

@dbtsai
dbtsai commented Sep 29, 2015

@MechCoder I will compare with the note I have at home tonight. What do you mean by not getting good results? How did you test it? Also, when \epsilon is large, does it converge to normal linear regression? Thanks.

@dbtsai
dbtsai commented Sep 29, 2015

Can you try to make \epsilon very large and see if you can reproduce the RidgeRegression?

@MechCoder
Member Author

I tried that as well, but it seems that epsilon has almost no effect (for both very high and very low values), since

|(y - X'w) / exp(sigma)| < M is the same as |y - X'w| < M*exp(sigma).

So the threshold will change on every iteration in which the loss function is called, no?

Or am I understanding it wrong?

@MechCoder
Member Author

Just in case you are interested, here is the plot-generation script:

https://gist.github.com/MechCoder/8205f0fce4395a9ab907

@MechCoder
Member Author

Oops, seems like I made a mistake. Just a second.

@MechCoder force-pushed the huber_loss branch 2 times, most recently from cae9e26 to 8eee69e on September 29, 2015 20:24
@dbtsai
dbtsai commented Sep 29, 2015

Here is the note in which I compute dL/d\sigma. Let's compare whether we get the same formula.

img_0271 (handwritten derivation of dL/d\sigma)
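
For readers who cannot view the image, one way to write this derivative under the joint objective sketched earlier (my own derivation, for reference only; the handwritten note may use a different parameterization, e.g. \sigma = \exp(\alpha)):

```latex
\frac{\partial L}{\partial \sigma}
= \sum_{\lvert r_i \rvert \le \epsilon\sigma} \left(1 - \frac{r_i^2}{\sigma^2}\right)
+ \sum_{\lvert r_i \rvert > \epsilon\sigma} \left(1 - \epsilon^2\right),
\qquad r_i = y_i - X_i w .
```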

@MechCoder
Member Author

@dbtsai I modified the loss function just before your comment :P. I have commented out the gradient for now and set approx_grad=True in fmin_l_bfgs_b.

I just wanted an approximate idea of whether the loss function is correct. After making the changes to the loss function (it should be clearer now), I am able to replicate the behavior of ridge for high values of epsilon (note that the lines coincide).

figure_1
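
For illustration, here is a minimal sketch of that kind of sanity check: optimizing a Huber-style joint loss with approx_grad=True (finite-difference gradients) before trusting a hand-written gradient. The huber_loss function below is a stand-in (no regularization, no intercept), not the PR's _huber_loss_and_gradient helper:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def huber_loss(params, X, y, epsilon=1.35):
    """Joint loss over (w, sigma); piecewise quadratic/linear in the residuals."""
    w, sigma = params[:-1], params[-1]
    residual = np.abs(y - X @ w)
    outliers = residual > epsilon * sigma
    loss = (sigma + residual[~outliers] ** 2 / sigma).sum()
    loss += (sigma + 2 * epsilon * residual[outliers] - epsilon ** 2 * sigma).sum()
    return loss

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=50)

x0 = np.concatenate([np.zeros(2), [1.0]])
bounds = [(None, None)] * 2 + [(1e-12, None)]  # keep sigma positive
params, fval, info = fmin_l_bfgs_b(
    huber_loss, x0, args=(X, y), approx_grad=True, bounds=bounds)
print(params[:-1])  # should be close to [1, -1]
```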


Inline review comment on a docstring excerpt:

Parameters
----------
w: ndarray, shape (n_features + 1,) or (n_features + 2,)

nitpick: space in front of : for consistency

@amueller
Member

lgtm apart from nitpicks. Maybe adding an attribute that stores which points are outliers on the training set would be interesting.
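
A sketch of what such an attribute could compute (it later landed as HuberRegressor.outliers_; the helper name here is hypothetical):

```python
import numpy as np

def training_outliers(X, y, coef, intercept, scale, epsilon=1.35):
    """Boolean mask of training samples whose absolute residual exceeds
    epsilon times the estimated scale (sigma)."""
    residual = np.abs(y - X @ coef - intercept)
    return residual > epsilon * scale
```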

@amueller
Member

Feel free to merge once tests pass. We can always add an attribute for the outliers later.
We do need an entry in whatsnew and a "versionadded" tag.

@MechCoder
Member Author

I have three more minor TODOs:

  1. Add a property for the outliers
  2. Check the gradient doc rendering
  3. Modify the documentation of the example

Will address them in a while.

@agramfort
Member
agramfort commented Feb 25, 2016 via email

@MechCoder
Member Author

@agramfort done!!

Add gradient calculation in _huber_loss_and_gradient

Add tests to check the correctness of the loss and gradient

Fix for old scipy

Add parameter sigma for robust linear regression

Add gradient formula to robust _huber_loss_and_gradient

Add fit_intercept option and fix tests

Add docs to HuberRegressor and the helper functions

Add example demonstrating ridge_regression vs huber_regression

Add sample_weight implementation

Add scaling invariant huber test

Remove exp and add bounds to fmin_l_bfgs_b

Add sparse data support

Add more tests and refactoring of code

Add narrative docs

review huber regressor

Minor additions to docs and tests

Minor fixes dealing with NaN values in targets
and old versions of SciPy and NumPy

Add HuberRegressor to robust estimator

Refactored computation of gradient and make docs render properly

Temp

Remove float64 dtype conversion

trivial optimizations and add a note about R

Remove sample_weights special_casing

address @amueller comments
@MechCoder
Member Author

Tests pass!! Merging with master :D

MechCoder added a commit that referenced this pull request Feb 25, 2016
[MRG+1] Add Huber Estimator to sklearn linear models
@MechCoder merged commit 540c7c6 into scikit-learn:master on Feb 25, 2016
@MechCoder deleted the huber_loss branch on February 25, 2016 23:44
@agramfort
Member

great work @MechCoder !

@amueller
Member

thanks @MechCoder ! 🍻

@GaelVaroquaux
Member
GaelVaroquaux commented Feb 26, 2016 via email
