8000 [MRG+1] Add Huber Estimator to sklearn linear models by MechCoder · Pull Request #5291 · scikit-learn/scikit-learn · GitHub

Merged
merged 1 commit into scikit-learn:master from the huber_loss branch
Feb 25, 2016

Conversation

MechCoder
Member

Add robust regression model that filters outliers based on http://statweb.stanford.edu/~owen/reports/hhu.pdf

  • Add fix for random OverflowErrors.
  • Add documentation to the helper function
  • Add extensive testing
  • Add narrative docs
  • Add example
  • Support for sparse data
  • Support sample_weights
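
For context, here is a minimal usage sketch of the estimator this PR adds, assuming the public API as it eventually landed in `sklearn.linear_model` (`HuberRegressor` with `epsilon` and `alpha` parameters); exact names and defaults may differ from intermediate states of this branch:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
y[:5] += 10.0  # corrupt a few targets with large outliers

huber = HuberRegressor(epsilon=1.35, alpha=0.0001).fit(X, y)
ridge = Ridge(alpha=0.0001).fit(X, y)
print(huber.coef_)  # close to [1.0, 2.0]; the outliers are down-weighted
print(ridge.coef_)  # pulled noticeably towards the outliers
```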

@jmschrei
Member

HuberLoss is already implemented in ensemble/gradient_boosting.py. I feel like we need to have a more centralized loss module, rather than reimplementing them when needed. What are your thoughts?

@MechCoder
Member Author

I agree with you. Has there been an ongoing discussion already that I've missed?

@jmschrei
Member

#5044 has some comments about it, but no consensus has been reached yet.

@MechCoder
Member Author

I just looked through the code in sklearn.ensemble.gradient_boosting. I do agree that the refactoring has to be done, but I'm not sure it should block this PR.

  1. The current code in HuberLoss does not take regularization into account (it looks like the alpha there is not the regularization alpha, but a way to calculate the epsilon value).
  2. I could also wrap my methods with calls to HuberLoss(n_c)(y, pred) and negative_gradient(), but there is some duplicated computation, which might be expensive when calculating the loss and the gradient.

What do you think?

@MechCoder
Member Author

Btw, this fixes #4990

@jmschrei
Member

If there is a way of doing it with the current code, it may be worth benchmarking to see how expensive it is. If there is a significant slowdown, you should go ahead with this PR, if it solves a pressing issue.

@mblondel
Member

> I feel like we need to have a more centralized loss module, rather than reimplementing them when needed. What are your thoughts?

I don't think this would be so useful. This PR uses the gradient w.r.t. the linear model parameters (size is n_features). GradientBoosting uses the gradient with respect to the predictions (size is n_samples). So the code is different. The one part that could possibly be shared is computing the objective value, but the regularizer is different.
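
To make the distinction concrete, here is a small illustrative sketch (squared loss for simplicity, not the Huber loss, and not code from either module):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))       # n_samples=50, n_features=3
w = rng.normal(size=3)
y = rng.normal(size=50)

residual = X @ w - y               # predictions minus targets, squared loss
grad_wrt_params = X.T @ residual   # shape (3,)  -> what a linear model optimizes over
grad_wrt_preds = residual          # shape (50,) -> per-sample gradient that GradientBoosting works with
print(grad_wrt_params.shape, grad_wrt_preds.shape)  # (3,) (50,)
```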

@jmschrei
Member

Okay, then!

@MechCoder force-pushed the huber_loss branch 5 times, most recently from 6caebb4 to 096ecef on September 25, 2015 20:51
@dbtsai
dbtsai commented Sep 26, 2015

I am working on robust regression for Spark's MLlib project based on Prof. Art Owen's paper, A robust hybrid of lasso and ridge regression. In MLlib/Breeze we don't support L-BFGS-B, while the scaling factor \sigma in Eq. (6) has to be >= 0, so we are going to replace it with \exp(\sigma). However, the second derivative of the Huber loss is not continuous, which can cause stability issues, since L-BFGS requires it for guaranteed convergence. The workaround I am going to implement is the Pseudo-Huber loss function, which can be used as a smooth approximation of the Huber loss and ensures that derivatives are continuous of all orders.
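
For reference, the standard Pseudo-Huber loss mentioned above can be written as the following sketch (plain NumPy, not MLlib/Breeze code):

```python
import numpy as np

def pseudo_huber(residual, delta=1.35):
    """Smooth approximation of the Huber loss: behaves like 0.5 * r**2 for
    small residuals and like delta * |r| for large ones, with continuous
    derivatives of all orders."""
    return delta ** 2 * (np.sqrt(1.0 + (residual / delta) ** 2) - 1.0)
```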

BTW, in robust regression the scaling factor \sigma has to be estimated as well, and this is \epsilon in your case. This value cannot be a constant. Imagine that the optimization has just started from some initial condition: if the initial guess is not good, most of the training instances will be treated as outliers. As a result, \epsilon would need to be larger, so it should be one of the parameters that gets estimated. See the details in section 4 of Prof. Art Owen's paper. Thanks.

@MechCoder
Member Author

Thanks for the comment and the link to the paper. (And it comes at a time when my benchmarks weren't looking too great)

Previously I was using grid search to find the optimal value of epsilon, but it always corresponded to the lowest value of epsilon (i.e., assuming both X and y are centered and scaled).

To summarize the paper briefly (a sketch of the resulting objective follows this list):

  1. It seems that the epsilon here corresponds more to the parameter M in the paper, which is said to be fixed at 1.35.
  2. In addition, the term y - X*w - c is scaled down by a factor sigma, which makes the algorithm scale invariant.
  3. And since the new function described in Eq. (8) is jointly convex in the coefficients and sigma, we could optimize sigma together with the coefficients, right?
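
For concreteness, here is a sketch of the jointly convex objective in the form it takes in the final scikit-learn implementation (per the HuberRegressor documentation), with epsilon playing the role of M above; this is a transcription for reference, not a quote from the paper:

```latex
\min_{w,\,\sigma > 0}\; \sum_{i=1}^{n}\left(\sigma + H_{\epsilon}\!\left(\frac{X_i w - y_i}{\sigma}\right)\sigma\right) + \alpha \lVert w \rVert_2^2,
\qquad
H_{\epsilon}(z) =
\begin{cases}
z^2 & \text{if } \lvert z \rvert < \epsilon,\\
2\epsilon\lvert z \rvert - \epsilon^2 & \text{otherwise.}
\end{cases}
```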

@dbtsai
dbtsai commented Sep 28, 2015

You are right. But you may want to replace \sigma with \exp(\alpha) so you don't need the condition \sigma > 0. In theory the Hessian is not continuous, so L-BFGS may not work well, but I don't know the exact impact of this. We may need to do some benchmarking.
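
A tiny sketch of the reparameterization being suggested (hypothetical helper names; note that the merged code ultimately enforced \sigma > 0 with a bound in fmin_l_bfgs_b instead, per the commit log below):

```python
import numpy as np

def loss_and_grad_wrt_log_sigma(a, loss_and_grad_wrt_sigma):
    """Optimize over a = log(sigma) so that sigma = exp(a) is always positive.

    `loss_and_grad_wrt_sigma` is any callable returning (loss, dloss/dsigma).
    """
    sigma = np.exp(a)
    loss, dloss_dsigma = loss_and_grad_wrt_sigma(sigma)
    # chain rule: dL/da = dL/dsigma * dsigma/da = dL/dsigma * exp(a)
    return loss, dloss_dsigma * sigma
```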

@dbtsai
dbtsai commented Sep 28, 2015

Also, for the Pseudo-Huber loss there is no proof that it is jointly convex with \sigma, although I guess it would be if we worked through the proof.

@MechCoder
Member Author

Great, I'll try two things.

Change the present loss function to accommodate jointly minimizing sigma.
After that, I can try the Pseudo-Huber loss to check whether there is any noticeable change in convergence, etc.

@dbtsai
dbtsai commented Sep 28, 2015

Sounds great. Let me know the result, so I can learn from you when I implement this in Spark. Thanks.

@MechCoder
Member Author

@dbtsai I've made changes to the loss function, but I'm not getting good results. Could you please verify if the loss function is right?

@MechCoder
Member Author

"Not good results" meaning that this is the plot I generated from the coefficients :/

figure_1

The red line is from the HuberRegressor and the green one from RidgeRegression. As you can see, it is not what it is supposed to look like.

@dbtsai
dbtsai commented Sep 29, 2015

@MechCoder I will compare with the note I have at home tonight. What do you mean by not getting good results? How did you test it? Also, when \epsilon is large, does it converge to normal linear regression? Thanks.

@dbtsai
dbtsai commented Sep 29, 2015

Can you try to make \epsilon very large and see if you can reproduce the RidgeRegression?

@MechCoder
Member Author

I tried that as well, but it seems that epsilon has almost no effect (for both very high and very low values), since

|(y - X'w) / exp(sigma)| < M is the same as |y - X'w| < M*exp(sigma).

So the threshold will change on every iteration in which the loss function is called, no?

Or am I understanding it wrong?

@MechCoder
Member Author

Just in case you are interested, here is the plot-generation script:

https://gist.github.com/MechCoder/8205f0fce4395a9ab907

@MechCoder
Member Author

Oops, seems like I made a mistake. Just a second.

@MechCoder force-pushed the huber_loss branch 2 times, most recently from cae9e26 to 8eee69e on September 29, 2015 20:24
@dbtsai
dbtsai commented Sep 29, 2015

Here is the note in which I compute dL/d\sigma. Let's compare whether we get the same formula.

img_0271 (handwritten derivation of dL/d\sigma)
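
For readers who cannot view the image, one way to write this derivative under the joint objective sketched earlier (my own derivation, for reference only; the handwritten note may use a different parameterization, e.g. \sigma = \exp(\alpha)):

```latex
\frac{\partial L}{\partial \sigma}
= \sum_{\lvert r_i \rvert \le \epsilon\sigma} \left(1 - \frac{r_i^2}{\sigma^2}\right)
+ \sum_{\lvert r_i \rvert > \epsilon\sigma} \left(1 - \epsilon^2\right),
\qquad r_i = y_i - X_i w .
```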

@MechCoder
Member Author

@dbtsai I modified the loss function just before your comment :P. I have commented out the gradient for now and set approx_grad=True in fmin_l_bfgs_b.

I just wanted an approximate idea of whether the loss function is correct. After making the changes to the loss function (it should be clearer now), I am able to replicate the behavior of ridge for high values of epsilon (note that the lines coincide).

figure_1
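
For illustration, here is a minimal sketch of that kind of sanity check: optimizing a Huber-style joint loss with approx_grad=True (finite-difference gradients) before trusting a hand-written gradient. The huber_loss function below is a stand-in (no regularization, no intercept), not the PR's _huber_loss_and_gradient helper:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def huber_loss(params, X, y, epsilon=1.35):
    """Joint loss over (w, sigma); piecewise quadratic/linear in the residuals."""
    w, sigma = params[:-1], params[-1]
    residual = np.abs(y - X @ w)
    outliers = residual > epsilon * sigma
    loss = (sigma + residual[~outliers] ** 2 / sigma).sum()
    loss += (sigma + 2 * epsilon * residual[outliers] - epsilon ** 2 * sigma).sum()
    return loss

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=50)

x0 = np.concatenate([np.zeros(2), [1.0]])
bounds = [(None, None)] * 2 + [(1e-12, None)]  # keep sigma positive
params, fval, info = fmin_l_bfgs_b(
    huber_loss, x0, args=(X, y), approx_grad=True, bounds=bounds)
print(params[:-1])  # should be close to [1, -1]
```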


Inline review comment on a docstring excerpt:

Parameters
----------
w: ndarray, shape (n_features + 1,) or (n_features + 2,)

nitpick: space in front of : for consistency

@amueller
Member

lgtm apart from nitpicks. Maybe adding an attribute that stores which points are outliers on the training set would be interesting.
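
A sketch of what such an attribute could compute (it later landed as HuberRegressor.outliers_; the helper name here is hypothetical):

```python
import numpy as np

def training_outliers(X, y, coef, intercept, scale, epsilon=1.35):
    """Boolean mask of training samples whose absolute residual exceeds
    epsilon times the estimated scale (sigma)."""
    residual = np.abs(y - X @ coef - intercept)
    return residual > epsilon * scale
```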

@amueller
Member

Feel free to merge once tests pass. We can always add an attribute for the outliers later.
We do need an entry in whatsnew and a "versionadded" tag.

@MechCoder
Member Author

I have three more minor TODOs:

  1. Add a property for the outliers
  2. Check the gradient doc rendering
  3. Modify the documentation of the example

Will address them in a while.

@agramfort
Member
agramfort commented Feb 25, 2016 via email

@MechCoder
Member Author

@agramfort done!!

Add gradient calculation in _huber_loss_and_gradient

Add tests to check the correctness of the loss and gradient

Fix for old scipy

Add parameter sigma for robust linear regression

Add gradient formula to robust _huber_loss_and_gradient

Add fit_intercept option and fix tests

Add docs to HuberRegressor and the helper functions

Add example demonstrating ridge_regression vs huber_regression

Add sample_weight implementation

Add scaling invariant huber test

Remove exp and add bounds to fmin_l_bfgs_b

Add sparse data support

Add more tests and refactoring of code

Add narrative docs

review huber regressor

Minor additions to docs and tests

Minor fixes dealing with NaN values in targets
and old versions of SciPy and NumPy

Add HuberRegressor to robust estimator

Refactored computation of gradient and make docs render properly

Temp

Remove float64 dtype conversion

trivial optimizations and add a note about R

Remove sample_weights special_casing

address @amueller comments
@MechCoder
Member Author

Tests pass!! Merging with master :D

MechCoder added a commit that referenced this pull request Feb 25, 2016
[MRG+1] Add Huber Estimator to sklearn linear models
@MechCoder merged commit 540c7c6 into scikit-learn:master on Feb 25, 2016
@MechCoder deleted the huber_loss branch on February 25, 2016 23:44
@agramfort
Member

great work @MechCoder !

@amueller
Member

thanks @MechCoder ! 🍻

@GaelVaroquaux
Member
GaelVaroquaux commented Feb 26, 2016 via email
