RFC Unify old GradientBoosting estimators and HGBT #27873

@lorentzenchr


Current situation

We are in the unfortunate situation of having two different versions of gradient boosting: the old estimators (GradientBoostingClassifier and GradientBoostingRegressor) and the new ones using binning and histogram strategies similar to LightGBM (HistGradientBoostingClassifier and HistGradientBoostingRegressor).

This makes advertising the new ones harder, e.g. #26826, and also results in a growing feature gap between the two.
Based on discussions in #27139 and during a monthly meeting (maybe not documented), I'd like to call for comments on the following:

Proposition

Unify both types of gradient boosting in a single class, i.e. keep the old names GradientBoostingClassifier and GradientBoostingRegressor and make them switch the underlying implementation based on a parameter value, e.g. max_bins (None → old algorithm, integer → new histogram-based algorithm).

Note that binning and histograms are not the only difference.
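To illustrate the idea, here is a minimal sketch of the proposed dispatch. The current two-class usage is real scikit-learn code; the unified form in the trailing comment is purely hypothetical and only mirrors the proposition above, not an agreed-upon API.

```python
# Current situation: two separate classes.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor

X, y = make_regression(n_samples=1_000, n_features=10, random_state=0)

exact = GradientBoostingRegressor(n_estimators=100).fit(X, y)   # old, exact splits
hist = HistGradientBoostingRegressor(max_iter=100).fit(X, y)    # new, binned/histogram

# Under the proposal, a single class would dispatch internally (illustrative only,
# not an existing scikit-learn API):
#
#   GradientBoostingRegressor(max_bins=None)   # -> old algorithm, exact splits
#   GradientBoostingRegressor(max_bins=255)    # -> new algorithm, binned features
```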

Comparison

Algorithm

The old GBT uses Friedman gradient boosting with a line search step. (For some losses, e.g. the log loss, the line search uses a second-order approximation, which is why the method is sometimes called "hybrid gradient-Newton boosting".) The trees are learned on the gradients. A tree searches for the best split among all (veeeery many) split candidates for all features. After a single tree is fit, the terminal node values are re-computed, which corresponds to the line search step.
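To make this concrete, here is a minimal NumPy/scikit-learn sketch of the scheme for the absolute (LAD) loss. It is illustrative only, not scikit-learn's internal implementation: each tree is fit on the negative gradients, and its terminal node values are then overwritten with the per-leaf loss-optimal constants, which is the line search step.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt_lad(X, y, n_estimators=50, learning_rate=0.1, max_depth=3):
    init = np.median(y)                       # initial constant prediction
    pred = np.full(y.shape, init)
    trees = []
    for _ in range(n_estimators):
        residual = y - pred
        gradient = np.sign(residual)          # negative gradient of the absolute loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, gradient)
        # Line search: overwrite each terminal node value with the loss-optimal
        # constant for the samples in that leaf (here the median of the residuals).
        leaves = tree.apply(X)
        for leaf in np.unique(leaves):
            tree.tree_.value[leaf, 0, 0] = np.median(residual[leaves == leaf])
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    # Prediction for new data: init + learning_rate * sum(t.predict(X_new) for t in trees)
    return init, trees
```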

The new HGBT uses a second-order approximation of the loss, i.e. gradients and hessians (as in the XGBoost paper, therefore sometimes called Newton boosting). In addition, it bins/discretizes the features X and uses a histogram of gradients/hessians/counts per feature. A tree then searches for the best split candidate, but there are only n_features * n_bins candidates (muuuuch less than in GBT).

| estimator | trees train on | terminal node values (consequence of tree train) | features X |
|---|---|---|---|
| GBT | gradients | re-computed in line search | used as is |
| HGBT | gradients/hessians | sum(gradient)/sum(hessian) | binned/discretized |
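For comparison, here is a minimal NumPy sketch of the two HGBT ingredients: binning the features once up front, accumulating per-bin histograms of gradients/hessians/counts so that only n_features * n_bins split candidates need to be scored, and computing leaf values with a Newton step. Function names and the exact binning scheme are illustrative, not scikit-learn's internal code.

```python
import numpy as np

def bin_features(X, max_bins=255):
    """Map each feature to integer bins via quantile thresholds."""
    X_binned = np.empty_like(X, dtype=np.uint8)
    for j in range(X.shape[1]):
        thresholds = np.quantile(X[:, j], np.linspace(0, 1, max_bins + 1)[1:-1])
        X_binned[:, j] = np.searchsorted(thresholds, X[:, j])
    return X_binned

def histogram_split_candidates(X_binned, gradients, hessians, max_bins=255):
    """Per feature, accumulate sums of gradients, hessians and counts per bin.
    Only n_features * max_bins split candidates are then evaluated."""
    n_features = X_binned.shape[1]
    hist_g = np.zeros((n_features, max_bins))
    hist_h = np.zeros((n_features, max_bins))
    hist_c = np.zeros((n_features, max_bins), dtype=np.int64)
    for j in range(n_features):
        np.add.at(hist_g[j], X_binned[:, j], gradients)
        np.add.at(hist_h[j], X_binned[:, j], hessians)
        np.add.at(hist_c[j], X_binned[:, j], 1)
    return hist_g, hist_h, hist_c

def newton_leaf_value(gradients, hessians, l2_regularization=0.0):
    """Terminal node value under the second-order (Newton) approximation."""
    return -gradients.sum() / (hessians.sum() + l2_regularization)
```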

In fact, one could use a second-order loss (gradients and hessians) without binning X, and vice versa, use binning while fitting trees on gradients only (without hessians).

Parameters

| HistGradientBoostingRegressor | GradientBoostingRegressor | Same | Comment |
|---|---|---|---|
| loss | loss | ✓ | |
| quantile | alpha | | |
| learning_rate | learning_rate | ✓ | |
| max_iter | n_estimators | | #12807 (comment) |
| max_leaf_nodes | max_leaf_nodes | ✓ | |
| max_depth | max_depth | ✓ | |
| min_samples_leaf | min_samples_leaf | ✓ | |
| l2_regularization | | | |
| max_features | max_features | ✓ | |
| max_bins | ⛔ (nonsense) | | |
| categorical_features | | | |
| monotonic_cst | | | #27305 |
| interaction_cst | | | |
| warm_start | warm_start | ✓ | |
| early_stopping | | | |
| scoring | | | |
| validation_fraction | validation_fraction | ✓ | |
| n_iter_no_change | n_iter_no_change | ✓ | |
| tol | tol | ✓ | |
| verbose | verbose | ✓ | |
| random_state | random_state | ✓ | |
| class_weight | | | |
| | subsample | | #16062 |
| ⛔ (nonsense) | criterion | | |
| | min_samples_split | | |
| | min_weight_fraction_leaf | | |
| | min_impurity_decrease | | |
| | init | | #27109 |
| | ccp_alpha | | |
In fact, only the quantile/alpha and max_iter/n_estimators parameter pairs have conflicting names.
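As an illustration of how such a name conflict could be handled in a unified estimator, here is a purely hypothetical sketch of mapping the old names onto the new ones with a deprecation warning. The class name, defaults, and sentinel values are made up for the example and do not reflect a decided design.

```python
import warnings

class UnifiedGradientBoostingRegressor:  # hypothetical name, for illustration only
    def __init__(self, max_iter=100, n_estimators="deprecated",
                 quantile=None, alpha="deprecated", max_bins=None):
        self.max_iter = max_iter
        self.n_estimators = n_estimators
        self.quantile = quantile
        self.alpha = alpha
        self.max_bins = max_bins

    def _resolve_params(self):
        # Old names would be accepted for backward compatibility and mapped
        # onto the new ones with a deprecation warning.
        max_iter = self.max_iter
        if self.n_estimators != "deprecated":
            warnings.warn("`n_estimators` is deprecated, use `max_iter`.",
                          FutureWarning)
            max_iter = self.n_estimators

        quantile = self.quantile
        if self.alpha != "deprecated":
            warnings.warn("`alpha` is deprecated, use `quantile`.", FutureWarning)
            quantile = self.alpha
        return max_iter, quantile
```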
