Don't allow criterion='mae' for gradient boosting estimators #18263


Closed
NicolasHug opened this issue Aug 26, 2020 · 6 comments · Fixed by #18326

@NicolasHug (Member)

The MAE criterion for trees was introduced in #6667. This PR also started exposing the criterion parameter to GradientBoostingClassifier and GradientBoostingRegressor, thus allowing 'mae', 'mse', and 'friedman_mse'. Before that, the GBDTs were hardcoded to use 'friedman_mse'.

I think we should stop allowing criterion='mae' for GBDTs.

My understanding of Gradient Boosting is that the trees should be predicting gradients using a least squares criterion. If we want to minimize the absolute error, we should be using loss='lad', but the criterion used for splitting the tree nodes should still be a least-squares ('mse' or 'friedman_mse'). I think that splitting the gradients using mae isn't methodologically correct.
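
For concreteness, a minimal sketch contrasting the two combinations with the current parameter names (illustrative only, using a toy dataset):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# Sound pairing: minimize the absolute error through the boosting loss,
# while still splitting tree nodes with a least-squares criterion.
GradientBoostingRegressor(loss='lad', criterion='friedman_mse').fit(X, y)

# The pairing this issue proposes to deprecate: splitting the gradient
# targets themselves with an L1 criterion.
GradientBoostingRegressor(loss='lad', criterion='mae').fit(X, y)
```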

In his original paper, Friedman does mention the possibility of fitting a tree to the residuals using an lad criterion. But he never suggests that one could fit the trees to the gradients using lad, which is what we are currently allowing.

I ran some benchmarks on the PMLB datasets (most of them are balanced, hence accuracy is a decent measure).

[Benchmark figure: test scores across the PMLB datasets for criterion='mae', 'mse', and 'friedman_mse']

We can see that using criterion=mae usually performs worse than using mse or friedman_mse, even when loss=lad. Also, criterion=mae is 60 times slower than the other criteria (see notebook for details).
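
For anyone who wants a quick local check of the slowdown, here is a rough sketch (not the PMLB benchmark itself; dataset size and timings are illustrative):

```python
from time import perf_counter

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=2000, n_features=20, random_state=0)

# Fit the same model with each criterion and report wall-clock time.
for criterion in ('friedman_mse', 'mse', 'mae'):
    est = GradientBoostingRegressor(criterion=criterion, random_state=0)
    tic = perf_counter()
    est.fit(X, y)
    print(f'{criterion}: {perf_counter() - tic:.2f}s')
```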

Note: From the benchmarks, friedman_mse does seem to (marginally) outperform mse, so I guess keeping it as the default makes sense. CC @thomasjpfan @lorentzenchr

@lorentzenchr (Member)

> My understanding of Gradient Boosting is that the trees should be predicting gradients using a least squares criterion. If we want to minimize the absolute error, we should be using loss='lad', but the criterion used for splitting the tree nodes should still be a least-squares ('mse' or 'friedman_mse'). I think that splitting the gradients using mae isn't methodologically correct.

@NicolasHug I see it the same way.

@glemaitre (Member)

Something not related to the theoretical part but rather to the implementation: I would take a second look at the implementation of the weighted median for this criterion. In the gradient boosting code, we initialize the weights to an array of ones even if sample_weight is None:

```python
sample_weight = _check_sample_weight(sample_weight, X)
```

So we will probably fall back to the weighted implementation of the median, and it is worth checking which choices have been made there. To be honest, I have never looked at this part of the code :)
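
To illustrate the point (a minimal sketch using the private helper mentioned above):

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

X = np.zeros((5, 2))
print(_check_sample_weight(None, X))  # [1. 1. 1. 1. 1.]
# Even without user-supplied weights, the trees receive an explicit
# weight vector, so criterion='mae' takes its weighted-median code path.
```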

@NicolasHug (Member, Author)

Seems like we're OK to do this. Tagging it as Help Wanted.

For contributors

The goal here is to deprecate passing 'mae' as the criterion to both GradientBoostingClassifier and GradientBoostingRegressor. Please make sure to read our contributing guidelines about deprecation; a sketch of the usual warning pattern follows below.

You can comment below with "take" to get this issue assigned to you.
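
For orientation, a minimal sketch of the usual warning pattern (the class name, placement, and message here are assumptions for illustration, not the final implementation):

```python
import warnings


class _GBDTSketch:
    # Hypothetical stand-in for the gradient boosting base class.
    def __init__(self, criterion='friedman_mse'):
        self.criterion = criterion

    def fit(self):
        # Warn during the deprecation period instead of raising outright.
        if self.criterion == 'mae':
            warnings.warn(
                "criterion='mae' is deprecated and will be removed in a "
                "future release. To minimize the absolute error, use "
                "loss='lad' with criterion='friedman_mse' instead.",
                FutureWarning,
            )
        return self


_GBDTSketch(criterion='mae').fit()  # emits a FutureWarning
```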

@madhuracj (Contributor)

take

@nikhilreddybilla28

take

@NicolasHug (Member, Author)

@nikhilreddybilla28, this issue was already claimed by @madhuracj
