SLEP006: default routing · Issue #26179 · scikit-learn/scikit-learn · GitHub

Open
adrinjalali opened this issue Apr 14, 2023 · 1 comment

@adrinjalali
Member

In the context of:

we've also discussed the possibility of developing a default routing strategy for certain metadata. Within scikit-learn itself, this mostly means sample_weight and probably groups.

This would mean that, after the introduction of SLEP006, the following code would work and would route sample_weight to every object that accepts it (since that's what we think is suitable in this case):

# The exact API is TBD
sklearn.set_config(enable_auto_routing=True)

GridSearchCV(
    LogisticRegression(),
    scoring=a_scorer_supporting_sample_weight, ...
).fit(..., sample_weight=sw)

We can then decide whether we want auto-routing to be enabled by default. One major thing to consider here is that with auto-routing enabled, the behavior of the same code can change from version to version, for two main reasons:

  • we change our mind / find bugs / etc. in the routing and in how we want to route things
  • estimator A might not support sample_weight in version x but start supporting it in version x+1; with default routing, the behavior of the same code then changes

Notes

  • auto-routing can always be overridden by the user, for more advanced use cases
  • third-party developers can use the same mechanism, for sample_weight or other metadata, if they see fit
  • we might deem SLEP006: globally setting request values #26050 unnecessary if we develop this feature

cc @scikit-learn/core-devs

@ogrisel
Member
ogrisel commented Jan 10, 2025

For the case of sample_weight I think:

Some estimators, such as BaggingClassifier/Regressor, should consume the weights to do the resampling but not broadcast them to the underlying estimator. We could allow the user to override this (e.g. perform uniform resampling and route the weights to the underlying estimator instead), but I am not sure how to express that with the routing API alone.
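The consume-versus-route distinction above can be made concrete with a toy sketch (this is not scikit-learn's API; `fit_bagging_like` and `sub_fit` are hypothetical names for illustration):

```python
import numpy as np

def fit_bagging_like(sub_fit, X, y, sample_weight, rng, route_weights=False):
    # Consume: draw a bootstrap sample with probability proportional
    # to each sample's weight.
    p = sample_weight / sample_weight.sum()
    idx = rng.choice(len(X), size=len(X), p=p)
    if route_weights:
        # Route as well: the sub-estimator also sees the (resampled) weights.
        return sub_fit(X[idx], y[idx], sample_weight=sample_weight[idx])
    # Consume only: the weights already shaped the sample,
    # so the sub-estimator is fit unweighted.
    return sub_fit(X[idx], y[idx], sample_weight=None)
```

Both variants are defensible, which is why a default routing policy has to pick one and let the user override it.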

Meta-estimators such as RANSACRegressor are even trickier: at different points in its fit method, it either iteratively fits the sub-estimator without weights on a weight-aware resampled training set, or fits the sub-estimator with weights on the final outlier-filtered but otherwise non-resampled dataset (#15836). Furthermore, the meta-estimator's fit needs to compare weight-aware score values, but those score functions should not be passed the weights when computed on a weight-aware resampled dataset.
