[MRG] FIX ensure equivalence between resampling and sample_weight in AdaBoost #14252
glemaitre wants to merge 6 commits into scikit-learn:master
Conversation
So we previously supported regressors without sample_weight support but we did so incorrectly?
```python
estimator.fit(X_, y_)
# create a bootstrap sample with observations having non-null weights
n_samples = X.shape[0]
sample_mask = sample_weight > 0
```
Here, I assume that we cannot have negative weights. Does it make sense?
The bootstrapping is equivalent to what was done previously. However, the previous way of constructing the bootstrap sample did not ensure that passing `sample_weight` was equivalent to resampling. Then, there is a bug when computing the error vector: we should not consider the samples with 0 weight when finding the maximum, and we should normalize the computed error properly (which influences the weight of each weak learner). Now I see that we could break code which was using a regressor without `sample_weight` support.
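As a sanity check of the masking discussed here (data and names are made up, this is not the PR's code), zero-weight observations can be excluded before drawing the bootstrap indices:

```python
import numpy as np

rng = np.random.RandomState(42)

# made-up data: 6 observations, two of them with zero weight
X = np.arange(12).reshape(6, 2)
sample_weight = np.array([0.2, 0.0, 0.3, 0.0, 0.4, 0.1])

# keep only observations with strictly positive weight
sample_mask = sample_weight > 0
candidates = np.flatnonzero(sample_mask)

# draw the bootstrap among those observations, proportionally to their weight
p = sample_weight[sample_mask] / sample_weight[sample_mask].sum()
bootstrap_idx = rng.choice(candidates, size=X.shape[0], replace=True, p=p)

# zero-weight observations can never appear in the bootstrap sample
assert not np.isin(bootstrap_idx, np.flatnonzero(~sample_mask)).any()
```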
```python
@staticmethod
def _fit_estimator(estimator, X, y, sample_weight):
    """Fit an estimator checking support for sample weight."""
    try:
```
@jnothman I meant something like:
```python
try:
    return estimator.fit(X, y, sample_weight=sample_weight)
except TypeError:
    # generate a bootstrap based on sample_weight
    bootstrap_idx = random_state.choice(
        np.arange(sample_weight.size), size=sample_weight.size,
        replace=True, p=sample_weight
    )
    return estimator.fit(safe_indexing(X, bootstrap_idx),
                         safe_indexing(y, bootstrap_idx))
```

```python
sample_weight_bootstrap = (np.ones((n_samples,)) *
                           np.bincount(indices, minlength=n_samples))
# combine the bootstrap sample with the original sample_weight
sample_weight_bootstrap *= sample_weight
```
This might be normalized to sum to 1
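Exercised in isolation, the counting-plus-normalization idea might look like this (the data and the way `indices` is drawn are illustrative, not the PR's code):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 5
sample_weight = np.array([0.1, 0.2, 0.3, 0.25, 0.15])

# draw a bootstrap sample with probabilities proportional to the weights
indices = rng.choice(n_samples, size=n_samples, replace=True, p=sample_weight)

# turn the drawn indices into per-sample multiplicities ...
sample_weight_bootstrap = np.bincount(indices, minlength=n_samples).astype(float)
# ... combine them with the original weights ...
sample_weight_bootstrap *= sample_weight
# ... and normalize so the combined weights sum to 1
sample_weight_bootstrap /= sample_weight_bootstrap.sum()

assert np.isclose(sample_weight_bootstrap.sum(), 1.0)
```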
Uhm, I am having second thoughts here. The failure of all ensemble algorithms is due to the bootstrap sample. For instance, let's define … When using …
I am more inclined toward 2. @ogrisel @jnothman any thoughts (if my explanation is good enough and not confusing :))?
closing in favor of #14294
Toward solving #14191
`AdaBoostRegressor` did not ensure that resampling `X` and `y` was equivalent to using `sample_weight`. The current changes ensure that this is true, keeping the rest of the algorithm unchanged. However, `AdaBoostRegressor` will now require the `base_estimator` to support `sample_weight`, as does `AdaBoostClassifier`. In addition, some tests were fixed since they were not checking the right thing.
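A generic way to probe the equivalence this PR is after, using `DecisionTreeRegressor` as a stand-in for the base estimator (synthetic data; a sketch, not the PR's actual test): repeating a row w times should give the same model as passing `sample_weight=w`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(20, 3))
y = rng.uniform(size=20)
# strictly positive integer weights, so resampling is a plain repetition
sample_weight = rng.randint(1, 4, size=20)

X_resampled = np.repeat(X, sample_weight, axis=0)
y_resampled = np.repeat(y, sample_weight)

pred_weighted = DecisionTreeRegressor(random_state=0).fit(
    X, y, sample_weight=sample_weight
).predict(X)
pred_resampled = DecisionTreeRegressor(random_state=0).fit(
    X_resampled, y_resampled
).predict(X)

# with a deterministic splitter, both routes should agree
assert np.allclose(pred_weighted, pred_resampled)
```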