Current situation
We are in the unfortunate situation of having 2 different versions of gradient boosting: the old estimators (GradientBoostingClassifier and GradientBoostingRegressor) as well as the new ones using binning and histogram strategies similar to LightGBM (HistGradientBoostingClassifier and HistGradientBoostingRegressor).
This makes advertising the new ones harder, e.g. #26826, and also results in a growing feature gap between the two.
Based on discussions in #27139 and during a monthly meeting (maybe not documented), I'd like to call for comments on the following:
Proposition
Unify both types of gradient boosting in a single class per task, i.e. keep the old names GradientBoostingClassifier and GradientBoostingRegressor and make them switch the underlying estimator class based on a parameter value, e.g. max_bins (None -> old classes, integer -> new classes); a rough sketch of such a dispatch is given below.
Note that binning and histograms are not the only difference; see the comparison below.
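As a minimal sketch of the idea (hypothetical, not a proposed implementation; the helper `make_gb_regressor` is made up and the naming conflicts listed in the parameter table below are ignored here):

```python
# Hypothetical sketch, not actual scikit-learn API: how a unified regressor
# could dispatch on `max_bins` (None -> old exact-split GBT, int -> new HGBT).
from sklearn.ensemble import (
    GradientBoostingRegressor,
    HistGradientBoostingRegressor,
)

def make_gb_regressor(max_bins=None, **common_params):
    """Return the old exact-split GBT when max_bins is None,
    otherwise the histogram-based HGBT with `max_bins` bins.

    `common_params` stands for parameters shared by both estimators.
    """
    if max_bins is None:
        return GradientBoostingRegressor(**common_params)
    return HistGradientBoostingRegressor(max_bins=max_bins, **common_params)

reg_old = make_gb_regressor(learning_rate=0.1)                 # old algorithm
reg_new = make_gb_regressor(max_bins=255, learning_rate=0.1)   # binned features
```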
Comparison
Algorithm
The old GBT uses Friedman's gradient boosting with a line search step. (The line search sometimes, e.g. for the log loss, uses a 2nd-order approximation and is therefore sometimes called "hybrid gradient-Newton boosting".) The trees are trained on the gradients. A tree searches for the best split among all (very many) split candidates for all features. After a single tree is fit, the terminal node values are re-computed, which corresponds to a line search step.
The new HGBT uses a 2nd-order approximation of the loss, i.e. gradients and hessians (as in the XGBoost paper, therefore sometimes called Newton boosting). In addition, it bins/discretizes the features X and uses a histogram of gradients/hessians/counts per feature. A tree then searches for the best split candidate, but there are only n_features * n_bins candidates (much fewer than in GBT).
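To make the difference in the number of split candidates concrete, here is a minimal counting sketch on toy data, assuming 255 bins per feature (the HGBT default) and continuous features with all-unique values:

```python
# Minimal sketch of how binning shrinks the split-candidate set.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # 1000 samples, 5 continuous features
n_bins = 255

# Old GBT: every threshold between consecutive sorted unique values of a
# feature is a split candidate.
n_exact = sum(np.unique(X[:, j]).size - 1 for j in range(X.shape[1]))

# New HGBT: each feature is discretized into at most n_bins bins, so there are
# at most n_features * (n_bins - 1) candidate thresholds in total.
n_binned = X.shape[1] * (n_bins - 1)

print(n_exact, n_binned)   # 4995 vs 1270 for this toy data
```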
| estimator | trees train on | terminal node values (consequence of the tree fit) | features X |
|---|---|---|---|
| GBT | gradients | re-computed in a line search | used as is |
| HGBT | gradients/hessians | sum(gradients)/sum(hessians) | binned/discretized |
In fact, one could use a 2nd-order loss approximation (gradients and hessians) without binning X, and vice versa, use binning while fitting trees on gradients only (without hessians).
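A minimal numerical sketch of the two leaf-value rules for the binary log loss (ignoring HGBT's l2_regularization and shrinkage); as noted above, for this particular loss the old GBT's line search is itself a one-step Newton update, so the two rules coincide here:

```python
# Leaf-value computation for the samples falling into one leaf, log loss example.
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100).astype(float)   # binary targets in the leaf
raw = rng.normal(size=100)                       # current raw predictions
p = 1.0 / (1.0 + np.exp(-raw))                   # predicted probabilities

g = p - y              # per-sample gradient of the log loss w.r.t. raw
h = p * (1.0 - p)      # per-sample hessian of the log loss w.r.t. raw

# HGBT-style value: one Newton step, -sum(gradients) / sum(hessians)
# (l2_regularization would be added to the denominator).
value_hgbt = -g.sum() / h.sum()

# GBT-style value: the line-search re-computation; for the log loss this is the
# same one-step Newton update, written in terms of the residuals y - p.
value_gbt = np.sum(y - p) / np.sum(p * (1.0 - p))

print(value_hgbt, value_gbt)   # identical for this loss
```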
Parameters
| HistGradientBoostingRegressor | GradientBoostingRegressor | Same | Comment |
|---|---|---|---|
| loss | loss | ✅ | |
| quantile | alpha | ❌ | |
| learning_rate | learning_rate | ✅ | |
| max_iter | n_estimators | ❌ | #12807 (comment) |
| max_leaf_nodes | max_leaf_nodes | ✅ | |
| max_depth | max_depth | ✅ | |
| min_samples_leaf | min_samples_leaf | ✅ | |
| l2_regularization | ⛔ | ❌ | |
| max_features | max_features | ✅ | |
| max_bins | ⛔ (nonsense) | ❌ | |
| categorical_features | ⛔ | ❌ | |
| monotonic_cst | ⛔ | ❌ | #27305 |
| interaction_cst | ⛔ | ❌ | |
| warm_start | warm_start | ✅ | |
| early_stopping | ⛔ | ❌ | |
| scoring | ⛔ | ❌ | |
| validation_fraction | validation_fraction | ✅ | |
| n_iter_no_change | n_iter_no_change | ✅ | |
| tol | tol | ✅ | |
| verbose | verbose | ✅ | |
| random_state | random_state | ✅ | |
| class_weight | ⛔ | ❌ | |
| ⛔ | subsample | ❌ | #16062 |
| ⛔ (nonsense) | criterion | ❌ | |
| ⛔ | min_samples_split | ❌ | |
| ⛔ | min_weight_fraction_leaf | ❌ | |
| ⛔ | min_impurity_decrease | ❌ | |
| ⛔ | init | ❌ | #27109 |
| ⛔ | ccp_alpha | ❌ | |
In fact, only the quantile/alpha and max_iter/n_estimators parameter pairs conflict in naming.
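As a hypothetical illustration (the mapping and the helper `translate_params` are made up, not a proposed API), these two conflicts could be bridged by a thin translation layer during a deprecation period:

```python
# Hypothetical sketch: translate the two conflicting old parameter names to
# their HGBT counterparts; all other shared names can pass through unchanged.
OLD_TO_NEW = {
    "alpha": "quantile",         # quantile level, e.g. for loss="quantile"
    "n_estimators": "max_iter",  # number of boosting iterations
}

def translate_params(params):
    """Map old GradientBoosting* parameter names to HGBT-style names."""
    return {OLD_TO_NEW.get(name, name): value for name, value in params.items()}

print(translate_params({"n_estimators": 100, "learning_rate": 0.1}))
# {'max_iter': 100, 'learning_rate': 0.1}
```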