GBRT quantile loss terminal region update and possible feature request #4599
I am not sure I understand. Could you please put the diff of your changes in a gist.github.com or push them as a branch in your repo, and make it more explicit in the plot labels which curve is which (cyan, red, blue and green)?
Sorry for not being clearer; I don't have any code that implements the feature I'm asking for. As a stopgap I just made a new loss function [1] which would have the desired effect. I'll try to explain the plot better. I had some private data for an energy use forecasting problem. I plotted the (.95 - .05) and (.8 - .2) predicted percentile ranges for both the existing loss function and [1].
With the existing code I was having problems with predicted quantiles not being ordered properly (e.g. the 20th percentile coming out above the median) and with fewer samples than expected falling in the quantile ranges. I hope that helps, thanks for your time.
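To make the quantile-crossing problem concrete, here is a minimal sketch on synthetic data (the original issue used private data): each quantile model is fit independently, so nothing constrains the predictions to be monotone in alpha, and the 20th-percentile model can occasionally predict above the median model.

```python
# Illustrative check for quantile crossing: fit two independent GBRT
# quantile models and count points where the lower quantile prediction
# exceeds the median prediction. Synthetic data, not the original dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(300, 1))
y = X.ravel() + rng.standard_t(df=2, size=300)  # heavy-tailed noise

preds = {
    a: GradientBoostingRegressor(loss="quantile", alpha=a, n_estimators=200,
                                 random_state=0).fit(X, y).predict(X)
    for a in (0.2, 0.5)
}
crossings = np.sum(preds[0.2] > preds[0.5])
print(f"{crossings} of {len(y)} points have q20 above the median")
```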
Thanks for looking into this @jwkvam. It seems that the quantile loss needs some closer attention (more than I can dedicate currently).
The loss in [1] seems to be the same as implemented in #924, but with the terminal region update being just `y_pred[:, k] += learning_rate * tree.predict(X).ravel()`. In ESL 2nd Ed, Algorithm 10.3 (p. 361) instructs to find terminal regions in trees by a squared error criterion on the gradients, but then to set the value of each terminal node according to the minimal loss in that region, which is the quantile in our case (and not the tree predicting the gradient).
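The two-step recipe from ESL Algorithm 10.3 can be sketched as follows. This is illustrative code, not scikit-learn's internal implementation: the function names `fit_quantile_stage` and `predict_stage` are my own, and the tree is refit per stage with the leaf values overwritten by the alpha-quantile of the residuals in each region.

```python
# Sketch of one boosting stage for the quantile (pinball) loss, following
# ESL Algorithm 10.3: grow the tree on the negative gradient using squared
# error, then set each terminal region's value to the loss-minimizing
# constant, i.e. the alpha-quantile of the residuals in that region.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_quantile_stage(X, y, y_pred, alpha, max_depth=3):
    residual = y - y_pred
    # Negative gradient of the quantile loss:
    # alpha where residual > 0, alpha - 1 otherwise.
    gradient = np.where(residual > 0, alpha, alpha - 1.0)
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X, gradient)  # regions found by squared error on the gradient
    # Re-estimate each terminal region's value as the alpha-quantile of the
    # residuals falling in that leaf (not the mean gradient the tree stores).
    leaves = tree.apply(X)
    leaf_values = {leaf: np.percentile(residual[leaves == leaf], alpha * 100)
                   for leaf in np.unique(leaves)}
    return tree, leaf_values

def predict_stage(tree, leaf_values, X):
    leaves = tree.apply(X)
    return np.array([leaf_values[leaf] for leaf in leaves])
```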
I have been playing around with the quantile loss implementation in GBRT. I have had somewhat mixed results. In the plot below are the differences between the .05 and .95 quantiles and between the .2 and .8 quantiles. The jagged plots use the default algorithm. The smooth plots just replace the `_update_terminal_region()` / `_update_terminal_regions()` functions with the ones in the `LeastSquaresError` class. I'm getting better results with the new quantiles, i.e. ~90% of my predictions fall within the .05-.95 quantiles and ~60% fall within the .2-.8 quantiles. With the default terminal updates, on my dataset I'm getting about 5-10% lower.
I'm wondering if it makes sense to make this an option, i.e. a boolean to just use the MSE estimates of the leaves in the tree to update the prediction values, for cases where the more sophisticated steps don't happen to work as well as the default one.
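The coverage numbers quoted above can be reproduced in spirit with a short script; this is a hedged sketch on synthetic data (the original dataset is private), using only scikit-learn's public `GradientBoostingRegressor` API rather than the internal `_update_terminal_region` machinery.

```python
# Measure empirical coverage of a GBRT quantile interval: fit .05 and .95
# quantile models and count the fraction of targets inside the predicted
# interval. Ideally this is close to 0.90 for the .05-.95 pair.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.5, size=500)

lower = GradientBoostingRegressor(loss="quantile", alpha=0.05,
                                  random_state=0).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95,
                                  random_state=0).fit(X, y)

inside = (y >= lower.predict(X)) & (y <= upper.predict(X))
coverage = inside.mean()
print(f"empirical coverage of the .05-.95 interval: {coverage:.2f}")
```

The same check on the .2-.8 pair (target coverage 0.60) gives the second number discussed above.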
I realize this isn't probably a very strong argument yet but I wanted to see how others felt.
This may be related to #4210?
cc @pprett @glouppe