[MRG] implement least absolute deviation loss in GBDTs by NicolasHug · Pull Request #13896 · scikit-learn/scikit-learn

Merged: 19 commits into scikit-learn:master, Sep 9, 2019

Conversation

NicolasHug
Member

This PR implements the least absolute deviation (or MAE) loss for GBDTs.

It's not as trivial as it seems, since the leaf values have to be updated after the tree is trained. Indeed, for MAE we train the trees to fit the gradients (or more accurately a Newton–Raphson step), but we need them to predict the median of the residuals. See the original paper for more details.

See also the current implementation of this for the old version of GBDTs, in particular _update_terminal_region() in sklearn/ensemble/gb_losses.py
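
To make the idea concrete, here is a minimal sketch of such a post-fit leaf update (illustrative only, not the PR's actual code: the function and argument names are made up, and shrinkage is ignored):

    import numpy as np

    def update_leaves_with_median(leaf_indices, y_true, raw_predictions):
        """Override each leaf value with the median of the residuals of the
        samples that ended up in that leaf.

        leaf_indices : array of shape (n_samples,), id of the leaf each sample
            falls into once the tree has been fit on the gradients.
        Returns a dict mapping leaf id -> new leaf value.
        """
        residuals = y_true - raw_predictions
        new_values = {}
        for leaf in np.unique(leaf_indices):
            mask = leaf_indices == leaf
            # The median minimizes the sum of absolute deviations in the leaf.
            new_values[leaf] = np.median(residuals[mask])
        return new_values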

Will ping when this is ready.


This is a bit slower than LightGBM, since the leaf values update is not done in parallel here.

python benchmarks/bench_hist_gradient_boosting.py --lightgbm --problem regression --n-samples-max 5000000 --n-trees 50 --loss least_absolute_deviation

[benchmark plot: mae]
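
For reference, the new loss is then selected through the public estimator like this (a minimal usage sketch, assuming a scikit-learn version that includes this PR; the estimator was still experimental at the time, hence the explicit enable import):

    from sklearn.experimental import enable_hist_gradient_boosting  # noqa
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=1000, noise=10, random_state=0)
    est = HistGradientBoostingRegressor(loss='least_absolute_deviation')
    est.fit(X, y)
    print(est.predict(X[:5]))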

@NicolasHug changed the title from "[WIP] implement least absolute deviation loss in GBDTs" to "[MRG] implement least absolute deviation loss in GBDTs" on May 16, 2019
@NicolasHug
Member Author

I think it's ready, ping @ogrisel @glemaitre @adrinjalali since you guys are the most familiar with the code ;)

# loss
# If the hessians are constant, we consider they are equal to 1.
# - This is correct for the half LS loss
# - For LAD loss, hessians are actually 0, but they are always
Member

if that's the case, doesn't it make sense to actually set them to 0? I'm kinda worried about maintainability of this.

Member

Or even have a parameter in the loss class which is the function giving you the constant hessians, e.g. for LAD it'd be np.zeros.

Member Author

IMO it's more maintainable to have the convention that constant hessians are always 1 rather than have custom constant hessians for each loss. Especially when these hessians are never used, like here.

Member Author

Also, this is always left implicit and not explained in the papers, but the hessians need to be treated as 1 (even though they're 0) so that the leaf value computation is an average instead of a sum.
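
To make this concrete, here is a small numerical sketch based on the standard second-order leaf value formula (variable names are illustrative):

    import numpy as np

    # For LAD, the per-sample gradients are sign(raw_prediction - y_true)
    # and the true hessians are 0.
    gradients = np.array([1., -1., -1., 1., -1.])
    l2_regularization = 1.0

    # Standard second-order leaf value:
    #     -sum(gradients) / (sum(hessians) + l2_regularization)
    # Treating the hessians as 1 makes this (up to regularization) minus the
    # *average* gradient:
    hessians = np.ones_like(gradients)
    leaf_value = -gradients.sum() / (hessians.sum() + l2_regularization)
    print(leaf_value)  # -(-1) / (5 + 1), close to -mean(gradients) = 0.2

    # With the hessians literally set to 0, the denominator would reduce to
    # the regularization term alone and the leaf value would behave like a
    # (scaled) *sum* of gradients rather than an average.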

@glemaitre (Member) left a comment

It looks good otherwise. You will need an entry in what's new as well.

@adrinjalali
Member

I just went to check our docs, and noticed/remembered we still don't have good examples/user guide for these classes. But other than that, it seems the loss function names are not consistent with the other ensemble methods; for instance, GradientBoostingRegressor calls this 'lad', I think.

    ('binary_crossentropy', 2, 1),
    ('categorical_crossentropy', 3, 3),
])
@pytest.mark.skipif(Y_DTYPE != np.float64,
                    reason='Need 64 bits float precision for numerical checks')
def test_numerical_gradients(loss, n_classes, prediction_dim):

@pytest.mark.parametrize('seed', range(1))
Member

O_o

Member Author

^^

I can remove it, it's just convenient when testing locally

@NicolasHug
Member Author
NicolasHug commented Jun 19, 2019

> we still don't have good examples/user guide for these classes.

Don't worry, that's still on my stack; I'm just waiting for a few more features.

> the loss function names are not consistent with the other ensemble methods

Right, we already broke name consistency with the other implementation ('ls' vs 'least_squares', 'deviance' vs 'binary_crossentropy', etc.).
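
For concreteness, the spellings being compared look like this (a sketch; 'lad' is the old estimator's name for this loss and 'least_absolute_deviation' is the one introduced here):

    from sklearn.experimental import enable_hist_gradient_boosting  # noqa
    from sklearn.ensemble import (GradientBoostingRegressor,
                                  HistGradientBoostingRegressor)

    # Old (exact) gradient boosting: short loss names.
    old = GradientBoostingRegressor(loss='lad')

    # New histogram-based gradient boosting (this PR): spelled-out names.
    new = HistGradientBoostingRegressor(loss='least_absolute_deviation')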

@adrinjalali (Member) left a comment

LGTM :)

@NicolasHug
Member Author

@glemaitre you were OK with this I think, do you wanna have another look so we can merge? Thanks!

@NicolasHug
Member Author

Ping @glemaitre ;)

Maybe @ogrisel too?

@@ -668,7 +672,8 @@ class HistGradientBoostingRegressor(BaseHistGradientBoosting, RegressorMixin):

     Parameters
     ----------
-    loss : {'least_squares'}, optional (default='least_squares')
+    loss : {'least_squares', 'least_absolute_deviation'}, \
+        optional (default='least_squares')
Member

Do we remove the 'optional' and the parentheses right away?

Member Author

I'd rather keep the current docstring consistent with the other entries, but I'm +1 on updating all of them in another PR.

Member

OK fine with me

@glemaitre
Member

I am still fine with the PR. But before merging, would it make sense to add a test in test_compare_lightgbm.py to check that we get a similar result?

@glemaitre
Member

and regarding the test, WDYT?

@NicolasHug
Member Author
NicolasHug commented Aug 23, 2019

I agree. It currently fails and I'm investigating why. This has to do with the initial predictions being different between LightGBM and sklearn.

@NicolasHug
Member Author

OK I sorted it out. Added a comment:

    # - We don't check the least_absolute_deviation loss here. This is because
    #   LightGBM's computation of the median (used for the initial value of
    #   raw_prediction) is a bit off (they'll e.g. return midpoints when there
    #   is no need to). Since these tests only run 1 iteration, the
    #   discrepancy between the initial values leads to biggish differences in
    #   the predictions. These differences are much smaller with more
    #   iterations.
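
As a toy illustration of the kind of discrepancy described in that comment (not LightGBM's actual code, just a contrast between two median conventions):

    import numpy as np

    residuals = np.array([1., 2., 10.])

    # np.median on an odd-length array returns an actual sample value:
    print(np.median(residuals))  # 2.0

    # A convention that always averages the two values bracketing the 50%
    # quantile would instead return a midpoint (e.g. (1. + 2.) / 2 = 1.5),
    # shifting the initial raw_prediction and hence, after a single boosting
    # iteration, the predictions compared in the test.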

@glemaitre merged commit 5cf88db into scikit-learn:master on Sep 9, 2019
@glemaitre
Member

OK, makes sense. Merging then. Thanks @NicolasHug
