Description
In various parts of the code, we have tests for `sample_weight` support, including in metrics and for individual estimators. We have some common estimator checks for `class_weight`, but not really for `sample_weight` functionality (only for weight type invariance).
Recent implementations of `sample_weight` include #10933 (KMeans) and #10803 (density estimation). But as well as estimators, we have things like common tests for evaluation metrics.
Invariance testing for sample weights should include:
- `sample_weight=np.ones(len(X))` makes the same model as `sample_weight=None`
- `sample_weight=random` can make a different model to `sample_weight=None`
- `sample_weight=s` for integer arrays makes the same model as `X=np.repeat(X, s, axis=0), y=np.repeat(y, s, axis=0)` (although there may be exceptions to this depending on how the estimator defines iteration, convergence, etc., as in "Test test_weighted_vs_repeated is somehow flaky" #11236)
- `sample_weight=s * k` for arrays `s` and positive constant `k` makes the same model as `sample_weight=s`
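
To make these properties concrete, here is a minimal sketch using a toy weighted-mean "estimator". `fit_weighted_mean` is a hypothetical helper invented for illustration, not part of scikit-learn, and the data are random; a real estimator might need looser tolerances than `np.allclose` defaults.

```python
import numpy as np


def fit_weighted_mean(X, sample_weight=None):
    """Hypothetical toy estimator: the 'model' is the weighted column mean."""
    return np.average(X, axis=0, weights=sample_weight)


rng = np.random.RandomState(0)
X = rng.rand(20, 3)
s = rng.randint(1, 4, size=len(X))  # integer weights in {1, 2, 3}
k = 2.5                             # arbitrary positive constant

unweighted = fit_weighted_mean(X)
ones_model = fit_weighted_mean(X, sample_weight=np.ones(len(X)))
random_model = fit_weighted_mean(X, sample_weight=rng.rand(len(X)))
weighted_model = fit_weighted_mean(X, sample_weight=s)
repeated_model = fit_weighted_mean(np.repeat(X, s, axis=0))
scaled_model = fit_weighted_mean(X, sample_weight=s * k)

# Unit weights behave like no weights.
assert np.allclose(ones_model, unweighted)
# Random weights can change the model (they do for this particular data).
assert not np.allclose(random_model, unweighted)
# Integer weights behave like repeating each sample s[i] times.
assert np.allclose(weighted_model, repeated_model)
# Rescaling all weights by a positive constant leaves the model unchanged.
assert np.allclose(scaled_model, weighted_model)
```

The second property is deliberately the weakest: it only says the model may differ, so a common check could at most assert it for estimators and data where a difference is expected.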
I wonder if it is possible to establish a generic test for this, e.g. something like:
def check_sample_weight_invariance(data_args, fit, is_equal):
    """
    Parameters
    ----------
    data_args : dict
        Keyword arguments to pass to fit, and which would need to be repeated
        to test equivalence to integer sample weights.
    fit : callable
        Passed data args, returns a model that can be compared with is_equal
    is_equal : callable
        Passed two models returned from fit, returns a bool to indicate equality
        between models
    """