SLEP006: default routing · Issue #26179 · scikit-learn/scikit-learn · GitHub

Open
adrinjalali opened this issue Apr 14, 2023 · 1 comment

@adrinjalali
Member

In the context of:

we've also discussed the possibility of developing a default routing strategy for certain metadata. Within scikit-learn itself, this mostly means sample_weight and probably groups.

This would mean that, after the introduction of SLEP006, the following code would work and would route sample_weight to every object that accepts it (since that's what we think is suitable in this case):

# The exact API is TBD
sklearn.set_config(enable_auto_routing=True)

GridSearchCV(
    LogisticRegression(),
    scoring=a_scorer_supporting_sample_weight, ...
).fit(..., sample_weight=sw)

We can then decide whether we want auto-routing to be enabled by default. One major thing to consider here is that with auto-routing enabled, the behavior of the same code can change from version to version, for two main reasons:

  • we change our mind / find bugs / etc. in the routing and in how we want to route things
  • estimator A might not support sample_weight in version x but start supporting it in version x+1; with default routing, the behavior of the same code then changes

Notes

  • auto-routing can always be overridden by the user, for more advanced use cases
  • third-party developers can use the same mechanism, for sample_weight or other metadata, if they see fit
  • we might deem SLEP006: globally setting request values #26050 unnecessary if we develop this feature

cc @scikit-learn/core-devs

@ogrisel
Member
ogrisel commented Jan 10, 2025

For the case of sample_weight I think:

Some estimators, such as BaggingClassifier/Regressor, should consume the weights to do the resampling but not broadcast them to the underlying estimator. We could allow the user to override this (e.g. perform uniform resampling and route the weights to the underlying estimator instead), but I am not sure how to express that with the routing API alone.
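The consume-versus-route distinction above can be made concrete with a toy sketch (this is not scikit-learn's API; `fit_bagging_like` and `sub_fit` are hypothetical names for illustration):

```python
import numpy as np

def fit_bagging_like(sub_fit, X, y, sample_weight, rng, route_weights=False):
    # Consume: draw a bootstrap sample with probability proportional
    # to each sample's weight.
    p = sample_weight / sample_weight.sum()
    idx = rng.choice(len(X), size=len(X), p=p)
    if route_weights:
        # Route as well: the sub-estimator also sees the (resampled) weights.
        return sub_fit(X[idx], y[idx], sample_weight=sample_weight[idx])
    # Consume only: the weights already shaped the sample,
    # so the sub-estimator is fit unweighted.
    return sub_fit(X[idx], y[idx], sample_weight=None)
```

Both variants are defensible, which is why a default routing policy has to pick one and let the user override it.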

Meta-estimators such as RANSACRegressor are even trickier: at different points in its fit method, it either iteratively fits the sub-estimator without weights on a weight-aware resampled training set, or fits the sub-estimator with weights on the final outlier-filtered but otherwise non-resampled dataset (#15836). Furthermore, the meta-estimator's fit needs to compare weight-aware score values, but those score functions should not be passed the weights when computed on a weight-aware resampled dataset.
