ENH Support sample weights in HGBT by adrinjalali · Pull Request #14696 · scikit-learn/scikit-learn · GitHub

ENH Support sample weights in HGBT #14696


Merged: 79 commits, Feb 24, 2020
Commits (79)
376c477
check consistent lengths
adrinjalali Aug 20, 2019
f9e0a1b
changes to loss and gradient_boosting.py
adrinjalali Aug 20, 2019
3793661
pep8
adrinjalali Aug 21, 2019
e02126c
merge upstream/master
adrinjalali Aug 21, 2019
4e686d4
revert loss.py to not take into account sample weights, the caller ha…
adrinjalali Aug 30, 2019
8cd484c
merge upstream/master
adrinjalali Aug 30, 2019
a9c30d5
gb tests pass, loss has a different average method
adrinjalali Aug 30, 2019
f09beb6
loss handles sample weight
adrinjalali Aug 30, 2019
c922d30
more fixes for tests
adrinjalali Aug 30, 2019
39282a0
pep8
adrinjalali Aug 30, 2019
934323e
fix constant hessian and sample weight
adrinjalali Aug 31, 2019
54d3c27
fix classification losses, and test
adrinjalali Aug 31, 2019
df28919
fix the test
adrinjalali Aug 31, 2019
56b385a
minor fix
adrinjalali Sep 1, 2019
434d0d0
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Sep 3, 2019
66f7ad3
address comments, move sample_weight to cython
adrinjalali Sep 3, 2019
a1440bb
change loss API
adrinjalali Sep 4, 2019
34ad5a5
_loss perf improvement
adrinjalali Sep 4, 2019
892a5b5
adding more tests
adrinjalali Sep 6, 2019
bd9ae1d
merge upstream/master
adrinjalali Sep 14, 2019
78b8f64
fixing more of the sample weight for LAD
adrinjalali Sep 17, 2019
11a07dd
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Sep 18, 2019
eb3f8f1
fix LAD sample weight API
adrinjalali Sep 18, 2019
cccde96
apply Guillaume's suggestions
adrinjalali Sep 18, 2019
7500c78
almost weighted binning
adrinjalali Sep 18, 2019
ccc527b
weighted quantiles
adrinjalali Sep 19, 2019
868a2f0
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Sep 19, 2019
bbe5fce
merge upstream/master
adrinjalali Sep 26, 2019
35cf4b6
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Oct 1, 2019
a2f79b2
add loss tests, and fixes
adrinjalali Oct 1, 2019
0ff3b72
fix missing arg
adrinjalali Oct 2, 2019
ed24433
add comment
adrinjalali Oct 2, 2019
05140c4
Merge branch 'master' into pr-14696
NicolasHug Oct 16, 2019
7656657
fix missing loss param in test
NicolasHug Oct 17, 2019
ec570ee
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Oct 17, 2019
44b5d1c
factorized tests and used reasonable values for raw_predictions for g…
NicolasHug Oct 17, 2019
728b32e
Added test for init_gradient_and_hessians
NicolasHug Oct 17, 2019
3a0b62f
fix typo in test
NicolasHug Oct 17, 2019
9e0e7b6
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Oct 18, 2019
cb8b94c
Added test for sum_hessians in histogram
NicolasHug Oct 18, 2019
90550af
make test pass, but need to unit test binning
NicolasHug Oct 18, 2019
885c0b9
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Oct 22, 2019
b233c63
fix test
NicolasHug Oct 22, 2019
964c68a
Added some tests for binning (failing)
NicolasHug Oct 22, 2019
8e4d207
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pr…
NicolasHug Oct 28, 2019
0931920
WIP
NicolasHug Oct 28, 2019
0616163
Slight test refactoring
NicolasHug Oct 28, 2019
9987ff7
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Oct 30, 2019
6116a0d
whats_new
adrinjalali Oct 30, 2019
9ff57a6
add content to user guide
adrinjalali Oct 30, 2019
2ffee14
remove todo, fix pep8
adrinjalali Oct 31, 2019
ec8b37d
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Nov 1, 2019
d6e0482
pdp notimplementederror
adrinjalali Nov 1, 2019
942d542
apply Nicolas's suggestions
adrinjalali Nov 4, 2019
5d16e27
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Nov 4, 2019
45b360c
fix
adrinjalali Nov 4, 2019
fc2ad10
revert return_ones
adrinjalali Nov 4, 2019
ac3df84
add sw to the benchmark
adrinjalali Nov 5, 2019
dbbe1ee
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Nov 5, 2019
577a4f3
check -> test in test
adrinjalali Nov 5, 2019
916eaa5
typo
adrinjalali Nov 5, 2019
d0775e4
merge upstream/master
adrinjalali Nov 6, 2019
aba5160
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Nov 11, 2019
1938175
pass sample_weight to loss's init, and set hessians_are_constant there
adrinjalali Nov 11, 2019
469b6d9
simply hessians_are_constant
adrinjalali Nov 11, 2019
30964d8
merge upstream/master
adrinjalali Jan 6, 2020
b759142
pass tests after merge
adrinjalali Jan 6, 2020
76dc710
sample with replacement before binning
adrinjalali Jan 6, 2020
9bc22d9
pass ints to choice and don't always subsample
adrinjalali Jan 14, 2020
df55f08
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Jan 14, 2020
a4877f9
fix local var
adrinjalali Jan 14, 2020
d5f98d7
Revert "fix local var"
adrinjalali Feb 11, 2020
cfaa057
Revert "pass ints to choice and don't always subsample"
adrinjalali Feb 11, 2020
f5960f2
Revert "sample with replacement before binning"
adrinjalali Feb 11, 2020
c9030bc
merge upstream/master
adrinjalali Feb 11, 2020
13d120a
CLN Address comments
thomasjpfan Feb 22, 2020
dcccf01
address Thomas's comments
adrinjalali Feb 24, 2020
1c81ebd
Merge branch 'hgbt/sample_weights' of github.com:adrinjalali/scikit-l…
adrinjalali Feb 24, 2020
47b11d6
Merge remote-tracking branch 'upstream/master' into hgbt/sample_weights
adrinjalali Feb 24, 2020
31 changes: 25 additions & 6 deletions benchmarks/bench_hist_gradient_boosting.py
@@ -32,6 +32,9 @@
parser.add_argument('--n-samples-max', type=int, default=int(1e6))
parser.add_argument('--n-features', type=int, default=20)
parser.add_argument('--max-bins', type=int, default=255)
parser.add_argument('--random-sample-weights', action="store_true",
default=False,
help="generate and use random sample weights")
args = parser.parse_args()

n_leaf_nodes = args.n_leaf_nodes
@@ -46,6 +49,7 @@ def get_estimator_and_data():
n_features=args.n_features,
n_classes=args.n_classes,
n_clusters_per_class=1,
n_informative=args.n_classes,
random_state=0)
return X, y, HistGradientBoostingClassifier
elif args.problem == 'regression':
@@ -60,15 +64,30 @@ def get_estimator_and_data():
np.bool)
X[mask] = np.nan

X_train_, X_test_, y_train_, y_test_ = train_test_split(
X, y, test_size=0.5, random_state=0)
if args.random_sample_weights:
sample_weight = np.random.rand(len(X)) * 10
else:
sample_weight = None

if sample_weight is not None:
(X_train_, X_test_, y_train_, y_test_,
sample_weight_train_, _) = train_test_split(
X, y, sample_weight, test_size=0.5, random_state=0)
else:
X_train_, X_test_, y_train_, y_test_ = train_test_split(
X, y, test_size=0.5, random_state=0)
sample_weight_train_ = None


def one_run(n_samples):
X_train = X_train_[:n_samples]
X_test = X_test_[:n_samples]
y_train = y_train_[:n_samples]
y_test = y_test_[:n_samples]
if sample_weight is not None:
sample_weight_train = sample_weight_train_[:n_samples]
else:
sample_weight_train = None
assert X_train.shape[0] == n_samples
assert X_test.shape[0] == n_samples
print("Data size: %d samples train, %d samples test."
@@ -93,7 +112,7 @@ def one_run(n_samples):
if loss == 'default':
loss = 'least_squares'
est.set_params(loss=loss)
est.fit(X_train, y_train)
est.fit(X_train, y_train, sample_weight=sample_weight_train)
sklearn_fit_duration = time() - tic
tic = time()
sklearn_score = est.score(X_test, y_test)
@@ -110,7 +129,7 @@ def one_run(n_samples):
lightgbm_est = get_equivalent_estimator(est, lib='lightgbm')

tic = time()
lightgbm_est.fit(X_train, y_train)
lightgbm_est.fit(X_train, y_train, sample_weight=sample_weight_train)
lightgbm_fit_duration = time() - tic
tic = time()
lightgbm_score = lightgbm_est.score(X_test, y_test)
@@ -127,7 +146,7 @@ def one_run(n_samples):
xgb_est = get_equivalent_estimator(est, lib='xgboost')

tic = time()
xgb_est.fit(X_train, y_train)
xgb_est.fit(X_train, y_train, sample_weight=sample_weight_train)
xgb_fit_duration = time() - tic
tic = time()
xgb_score = xgb_est.score(X_test, y_test)
@@ -144,7 +163,7 @@ def one_run(n_samples):
cat_est = get_equivalent_estimator(est, lib='catboost')

tic = time()
cat_est.fit(X_train, y_train)
cat_est.fit(X_train, y_train, sample_weight=sample_weight_train)
cat_fit_duration = time() - tic
tic = time()
cat_score = cat_est.score(X_test, y_test)
36 changes: 34 additions & 2 deletions doc/modules/ensemble.rst
@@ -856,8 +856,7 @@ leverage integer-based data structures (histograms) instead of relying on
sorted continuous values when building the trees. The API of these
estimators is slightly different, and some of the features from
:class:`GradientBoostingClassifier` and :class:`GradientBoostingRegressor`
are not yet supported: in particular sample weights, and some loss
functions.
are not yet supported, for instance some loss functions.

These estimators are still **experimental**: their predictions
and their API might change without any deprecation cycle. To use them, you
@@ -957,6 +956,39 @@ If no missing values were encountered for a given feature during training,
then samples with missing values are mapped to whichever child has the most
samples.

Sample weight support
---------------------

:class:`HistGradientBoostingClassifier` and
:class:`HistGradientBoostingRegressor` support sample weights during
:term:`fit`.

The following toy example demonstrates how the model ignores the samples with
zero sample weights:

>>> from sklearn.experimental import enable_hist_gradient_boosting  # noqa
>>> from sklearn.ensemble import HistGradientBoostingClassifier
>>> X = [[1, 0],
... [1, 0],
... [1, 0],
... [0, 1]]
>>> y = [0, 0, 1, 0]
>>> # ignore the first 2 training samples by setting their weight to 0
>>> sample_weight = [0, 0, 1, 1]
>>> gb = HistGradientBoostingClassifier(min_samples_leaf=1)
>>> gb.fit(X, y, sample_weight=sample_weight)
HistGradientBoostingClassifier(...)
>>> gb.predict([[1, 0]])
array([1])
>>> gb.predict_proba([[1, 0]])[0, 1]
0.99...

As you can see, the `[1, 0]` is comfortably classified as `1` since the first
two samples are ignored due to their sample weights.

[Review comment] Member: Would the probability show more comfort?

gb.predict_proba([[1, 0]])[0, 1]
# 0.99...

Implementation detail: taking sample weights into account amounts to
multiplying the gradients (and the hessians) by the sample weights. Note that
the binning stage (specifically the quantiles computation) does not take the
weights into account.
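
As a rough editorial sketch (mirroring the `_update_gradients_hessians_least_squares`
kernel shown further down in this PR), the least-squares case of this weighting
amounts to:

>>> import numpy as np
>>> raw_predictions = np.array([0.5, 2.0, -1.0])
>>> y_true = np.array([0.0, 1.0, -1.0])
>>> sample_weight = np.array([1.0, 2.0, 0.5])
>>> # gradient of the unweighted loss, scaled by the per-sample weight
>>> gradients = (raw_predictions - y_true) * sample_weight
>>> # the unweighted hessian is the constant 1, so the weighted one is the weight itself
>>> hessians = np.ones_like(y_true) * sample_weight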

Low-level parallelism
---------------------

4 changes: 2 additions & 2 deletions doc/whats_new/v0.22.rst
@@ -401,11 +401,11 @@ Changelog
<glemaitre>` and :user:`Caio Oliveira <caioaao>` and :pr:`15138` by
:user:`Jon Cusick <jcusick13>`..

- Many improvements were made to
- |MajorFeature| Many improvements were made to
:class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor`:

- |MajorFeature| Estimators now natively support dense data with missing
- |Feature| Estimators now natively support dense data with missing
values both for training and predicting. They also support infinite
values. :pr:`13911` and :pr:`14406` by `Nicolas Hug`_, `Adrin Jalali`_
and `Olivier Grisel`_.
4 changes: 4 additions & 0 deletions doc/whats_new/v0.23.rst
@@ -142,6 +142,10 @@ Changelog
:mod:`sklearn.ensemble`
.......................

- |MajorFeature| :class:`ensemble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingRegressor` now support
:term:`sample_weight`. :pr:`14696` by `Adrin Jalali`_ and `Nicolas Hug`_.

- |API| Added boolean `verbose` flag to classes:
:class:`ensemble.VotingClassifier` and :class:`ensemble.VotingRegressor`.
:pr:`15991` by :user:`Sam Bail <spbail>`,
96 changes: 80 additions & 16 deletions sklearn/ensemble/_hist_gradient_boosting/_loss.pyx
@@ -27,9 +27,51 @@ def _update_gradients_least_squares(

n_samples = raw_predictions.shape[0]
for i in prange(n_samples, schedule='static', nogil=True):
# Note: a more correct expression is 2 * (raw_predictions - y_true)
# but since we use 1 for the constant hessian value (and not 2) this
# is strictly equivalent for the leaves values.
gradients[i] = raw_predictions[i] - y_true[i]
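
# Editorial aside, a quick check of the note above: assuming a leaf value of
# -sum(gradients) / sum(hessians) (ignoring shrinkage and L2 regularization),
# using g = (p - y) with h = 1 gives the same leaf value as the textbook
# g = 2 * (p - y) with h = 2, since the factor of 2 cancels.
import numpy as np
p = np.array([0.3, -1.2, 0.7])   # made-up raw predictions of samples in a leaf
y = np.array([0.0, -1.0, 1.0])   # their targets
leaf_halved = -(p - y).sum() / len(p)                  # g = p - y,     h = 1
leaf_textbook = -(2 * (p - y)).sum() / (2 * len(p))    # g = 2*(p - y), h = 2
assert np.isclose(leaf_halved, leaf_textbook)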


def _update_gradients_hessians_least_squares(
G_H_DTYPE_C [::1] gradients, # OUT
G_H_DTYPE_C [::1] hessians, # OUT
const Y_DTYPE_C [::1] y_true, # IN
const Y_DTYPE_C [::1] raw_predictions, # IN
const Y_DTYPE_C [::1] sample_weight): # IN

cdef:
int n_samples
int i

n_samples = raw_predictions.shape[0]
for i in prange(n_samples, schedule='static', nogil=True):
# Note: a more correct expression is 2 * (raw_predictions - y_true) * sample_weight
# but since we use 1 for the constant hessian value (and not 2) this
# is strictly equivalent for the leaves values.
gradients[i] = (raw_predictions[i] - y_true[i]) * sample_weight[i]
hessians[i] = sample_weight[i]


def _update_gradients_hessians_least_absolute_deviation(
G_H_DTYPE_C [::1] gradients, # OUT
G_H_DTYPE_C [::1] hessians, # OUT
const Y_DTYPE_C [::1] y_true, # IN
const Y_DTYPE_C [::1] raw_predictions, # IN
const Y_DTYPE_C [::1] sample_weight): # IN

cdef:
int n_samples
int i

n_samples = raw_predictions.shape[0]
for i in prange(n_samples, schedule='static', nogil=True):
# gradient = sign(raw_prediction - y_true) * sample_weight
gradients[i] = sample_weight[i] * (2 *
(y_true[i] - raw_predictions[i] < 0) - 1)
hessians[i] = sample_weight[i]

[Review thread]

Member: Does this work because sample_weight is non-negative? If so, let's leave a comment?

Member: > Does this work because sample_weight is non-negative?

No, this works because:

  • accounting for SW means we need to multiply gradients and hessians by SW
  • without SW, the hessian of this loss is constant and is equal to 1.

Member: I was thinking about the math. lightgbm does the same thing with the l1 loss.

https://github.com/microsoft/LightGBM/blob/4adb9ff71f41f6b5c7a51f667a8fb9adf38cf602/src/objective/regression_objective.hpp#L225-L229

When I see the derivative of sign(x), I think of the Dirac delta function: https://en.wikipedia.org/wiki/Sign_function

Member: Ah, I think I see what you did here: #13896 (comment)

Member Author: Yeah, also, if you look at the sklearn wrapper in lightgbm, all hessians and gradients are simply multiplied by SW.
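
A small NumPy sketch (editorial, made-up numbers) of the weighted LAD update in the
hunk above, also showing that `2 * (y_true - raw_predictions < 0) - 1` is just the
sign of the residual (ties, where raw_predictions == y_true, are sent to -1 by the
kernel while `np.sign` would return 0):

import numpy as np
y_true = np.array([1.0, 2.0, 3.0])
raw_predictions = np.array([2.0, 1.5, 3.5])
sample_weight = np.array([1.0, 0.5, 2.0])
sign = 2.0 * (y_true - raw_predictions < 0) - 1   # +1 where raw > y_true, else -1
gradients = sample_weight * sign                  # multiply the gradient by SW
hessians = sample_weight.copy()                   # constant hessian 1, times SW
assert np.array_equal(sign, np.sign(raw_predictions - y_true))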



def _update_gradients_least_absolute_deviation(
G_H_DTYPE_C [::1] gradients, # OUT
const Y_DTYPE_C [::1] y_true, # IN
@@ -49,44 +91,66 @@ def _update_gradients_hessians_binary_crossentropy(
G_H_DTYPE_C [::1] gradients, # OUT
G_H_DTYPE_C [::1] hessians, # OUT
const Y_DTYPE_C [::1] y_true, # IN
const Y_DTYPE_C [::1] raw_predictions): # IN
const Y_DTYPE_C [::1] raw_predictions, # IN
const Y_DTYPE_C [::1] sample_weight): # IN
cdef:
int n_samples
Y_DTYPE_C p_i # proba that ith sample belongs to positive class
int i

n_samples = raw_predictions.shape[0]
for i in prange(n_samples, schedule='static', nogil=True):
p_i = _cexpit(raw_predictions[i])
gradients[i] = p_i - y_true[i]
hessians[i] = p_i * (1. - p_i)
if sample_weight is None:
for i in prange(n_samples, schedule='static', nogil=True):
p_i = _cexpit(raw_predictions[i])
gradients[i] = p_i - y_true[i]
hessians[i] = p_i * (1. - p_i)
else:
for i in prange(n_samples, schedule='static', nogil=True):
p_i = _cexpit(raw_predictions[i])
gradients[i] = (p_i - y_true[i]) * sample_weight[i]
hessians[i] = p_i * (1. - p_i) * sample_weight[i]
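
An editorial NumPy sketch of the weighted branch above, with `scipy.special.expit`
standing in for the `_cexpit` helper:

import numpy as np
from scipy.special import expit
raw_predictions = np.array([-1.0, 0.0, 2.0])
y_true = np.array([0.0, 1.0, 1.0])
sample_weight = np.array([1.0, 2.0, 0.5])
p = expit(raw_predictions)                 # probability of the positive class
gradients = (p - y_true) * sample_weight
hessians = p * (1.0 - p) * sample_weight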


def _update_gradients_hessians_categorical_crossentropy(
G_H_DTYPE_C [:, ::1] gradients, # OUT
G_H_DTYPE_C [:, ::1] hessians, # OUT
const Y_DTYPE_C [::1] y_true, # IN
const Y_DTYPE_C [:, ::1] raw_predictions): # IN
const Y_DTYPE_C [:, ::1] raw_predictions, # IN
const Y_DTYPE_C [::1] sample_weight): # IN
cdef:
int prediction_dim = raw_predictions.shape[0]
int n_samples = raw_predictions.shape[1]
int k # class index
int i # sample index
Y_DTYPE_C sw
# p[i, k] is the probability that class(ith sample) == k.
# It's the softmax of the raw predictions
Y_DTYPE_C [:, ::1] p = np.empty(shape=(n_samples, prediction_dim))
Y_DTYPE_C p_i_k

for i in prange(n_samples, schedule='static', nogil=True):
# first compute softmaxes of sample i for each class
for k in range(prediction_dim):
p[i, k] = raw_predictions[k, i] # prepare softmax
_compute_softmax(p, i)
# then update gradients and hessians
for k in range(prediction_dim):
p_i_k = p[i, k]
gradients[k, i] = p_i_k - (y_true[i] == k)
hessians[k, i] = p_i_k * (1. - p_i_k)
if sample_weight is None:
for i in prange(n_samples, schedule='static', nogil=True):
# first compute softmaxes of sample i for each class
for k in range(prediction_dim):
p[i, k] = raw_predictions[k, i] # prepare softmax
_compute_softmax(p, i)
# then update gradients and hessians
for k in range(prediction_dim):
p_i_k = p[i, k]
gradients[k, i] = p_i_k - (y_true[i] == k)
hessians[k, i] = p_i_k * (1. - p_i_k)
else:
for i in prange(n_samples, schedule='static', nogil=True):
# first compute softmaxes of sample i for each class
for k in range(prediction_dim):
p[i, k] = raw_predictions[k, i] # prepare softmax
_compute_softmax(p, i)
# then update gradients and hessians
sw = sample_weight[i]
for k in range(prediction_dim):
p_i_k = p[i, k]
gradients[k, i] = (p_i_k - (y_true[i] == k)) * sw
hessians[k, i] = (p_i_k * (1. - p_i_k)) * sw
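
An editorial vectorized NumPy sketch of the weighted multiclass update above
(shapes follow the `(prediction_dim, n_samples)` layout of `raw_predictions`):

import numpy as np
rng = np.random.default_rng(0)
n_classes, n_samples = 3, 5
raw_predictions = rng.normal(size=(n_classes, n_samples))
y_true = rng.integers(0, n_classes, size=n_samples)
sample_weight = rng.random(n_samples)
# softmax over classes, stabilized by subtracting the per-sample max
p = np.exp(raw_predictions - raw_predictions.max(axis=0))
p /= p.sum(axis=0)
one_hot = (np.arange(n_classes)[:, None] == y_true)   # shape (n_classes, n_samples)
gradients = (p - one_hot) * sample_weight              # broadcast over classes
hessians = p * (1.0 - p) * sample_weight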


cdef inline void _compute_softmax(Y_DTYPE_C [:, ::1] p, const int i) nogil: