Store the OOB Loss for GradientBoostingClassifier #23400
Closed
@multimeric

Description


Describe the workflow you want to enable

Currently, the only OOB-related performance metric stored on `GradientBoostingClassifier` is `oob_improvement_`, an array of per-iteration decreases in OOB loss. However, it would also be useful to track the actual OOB loss value at each iteration. That value can serve as an estimate of the generalization error and might bypass the need for cross-validation in some cases, as sketched below. It would also help the estimator integrate into the framework I proposed in #23391.
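As a sketch of the workflow this would enable (hypothetical: the `oob_loss_` attribute below is the proposed addition and does not exist yet; everything else is the current API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# subsample < 1 enables stochastic gradient boosting, so each stage
# has out-of-bag samples to evaluate on.
clf = GradientBoostingClassifier(n_estimators=200, subsample=0.8, random_state=0)
clf.fit(X, y)

# Hypothetical: with the proposed attribute, the per-iteration OOB loss
# could be read off directly, e.g. to pick the number of boosting stages
# without a separate cross-validation loop.
best_n_estimators = np.argmin(clf.oob_loss_) + 1
```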

Describe your proposed solution

I propose we add a new attribute, `oob_score_` (or alternatively `oob_loss_`), to `GradientBoostingClassifier`. It would only be set when `subsample < 1` and would be updated at each iteration. We already calculate this value; we just throw it away:

```python
if do_oob:
    # training loss on the in-bag samples
    self.train_score_[i] = loss_(
        y[sample_mask],
        raw_predictions[sample_mask],
        sample_weight[sample_mask],
    )
    # the OOB loss is computed here, but only its decrease is kept
    self.oob_improvement_[i] = old_oob_score - loss_(
        y[~sample_mask],
        raw_predictions[~sample_mask],
        sample_weight[~sample_mask],
    )
```
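A minimal sketch of the change (assuming an `oob_score_` array is allocated alongside `oob_improvement_` earlier in `fit`; which attribute name to use is the open question above):

```python
if do_oob:
    self.train_score_[i] = loss_(
        y[sample_mask],
        raw_predictions[sample_mask],
        sample_weight[sample_mask],
    )
    # Compute the OOB loss once, store it, and reuse it for the
    # improvement instead of discarding it.
    oob_loss = loss_(
        y[~sample_mask],
        raw_predictions[~sample_mask],
        sample_weight[~sample_mask],
    )
    self.oob_score_[i] = oob_loss  # proposed new attribute
    self.oob_improvement_[i] = old_oob_score - oob_loss
```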

Describe alternatives you've considered, if relevant

You might think that the cumulative sum of `oob_improvement_` would give us the loss values, and this is almost true: the improvements telescope down from the initial OOB loss, but that first value isn't stored anywhere, so the reconstruction has no starting point (see the sketch below). So this doesn't solve the issue.
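To make the gap concrete, here is a sketch of the reconstruction that would be possible if that first value were exposed (`initial_oob_loss` is the hypothetical missing quantity):

```python
import numpy as np

def reconstruct_oob_loss(initial_oob_loss, oob_improvement):
    """Recover the per-iteration OOB losses from the improvements.

    oob_improvement[i] is (OOB loss before stage i) - (OOB loss after
    stage i), so the values telescope down from the initial loss.
    `initial_oob_loss` is computed internally by scikit-learn but never
    stored, which is exactly why this reconstruction fails today.
    """
    return initial_oob_loss - np.cumsum(oob_improvement)
```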

Additional context

No response
