Store the OOB Loss for GradientBoostingClassifier #23400
Closed
@multimeric

Description


Describe the workflow you want to enable

Currently, the only OOB-related performance metric stored on `GradientBoostingClassifier` is `oob_improvement_`, an array of per-iteration decreases in OOB loss. However, it would also be useful to track the actual OOB loss value at each iteration. That value can serve as an estimate of the generalization error and might bypass the need for cross-validation in some cases, as sketched below. It would also help the estimator integrate into the framework I proposed in #23391.
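As a sketch of the workflow this would enable (hypothetical: the `oob_loss_` attribute below is the proposed addition and does not exist yet; everything else is the current API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# subsample < 1 enables stochastic gradient boosting, so each stage
# has out-of-bag samples to evaluate on.
clf = GradientBoostingClassifier(n_estimators=200, subsample=0.8, random_state=0)
clf.fit(X, y)

# Hypothetical: with the proposed attribute, the per-iteration OOB loss
# could be read off directly, e.g. to pick the number of boosting stages
# without a separate cross-validation loop.
best_n_estimators = np.argmin(clf.oob_loss_) + 1
```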

Describe your proposed solution

I propose we add a new attribute, `oob_score_` (or alternatively `oob_loss_`), to `GradientBoostingClassifier`. It would only be set when `subsample < 1` and would be updated at each iteration. We already calculate this value; we just throw it away:

```python
if do_oob:
    # training loss on the in-bag samples
    self.train_score_[i] = loss_(
        y[sample_mask],
        raw_predictions[sample_mask],
        sample_weight[sample_mask],
    )
    # the OOB loss is computed here, but only its decrease is kept
    self.oob_improvement_[i] = old_oob_score - loss_(
        y[~sample_mask],
        raw_predictions[~sample_mask],
        sample_weight[~sample_mask],
    )
```
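A minimal sketch of the change (assuming an `oob_score_` array is allocated alongside `oob_improvement_` earlier in `fit`; which attribute name to use is the open question above):

```python
if do_oob:
    self.train_score_[i] = loss_(
        y[sample_mask],
        raw_predictions[sample_mask],
        sample_weight[sample_mask],
    )
    # Compute the OOB loss once, store it, and reuse it for the
    # improvement instead of discarding it.
    oob_loss = loss_(
        y[~sample_mask],
        raw_predictions[~sample_mask],
        sample_weight[~sample_mask],
    )
    self.oob_score_[i] = oob_loss  # proposed new attribute
    self.oob_improvement_[i] = old_oob_score - oob_loss
```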

Describe alternatives you've considered, if relevant

You might think that the cumulative sum of `oob_improvement_` would give us the loss values, and this is almost true: the improvements telescope down from the initial OOB loss, but that first value isn't stored anywhere, so the reconstruction has no starting point (see the sketch below). So this doesn't solve the issue.
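To make the gap concrete, here is a sketch of the reconstruction that would be possible if that first value were exposed (`initial_oob_loss` is the hypothetical missing quantity):

```python
import numpy as np

def reconstruct_oob_loss(initial_oob_loss, oob_improvement):
    """Recover the per-iteration OOB losses from the improvements.

    oob_improvement[i] is (OOB loss before stage i) - (OOB loss after
    stage i), so the values telescope down from the initial loss.
    `initial_oob_loss` is computed internally by scikit-learn but never
    stored, which is exactly why this reconstruction fails today.
    """
    return initial_oob_loss - np.cumsum(oob_improvement)
```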

Additional context

No response
