Describe set_{method}_request() API, expose _MetadataRequester, or expose _BaseScorer #31360
/take
Changed the title from "set_{method}_request() API, expose _MetadataRequester, or make a base class for Scorers" to "set_{method}_request() API, expose _MetadataRequester, or expose _BaseScorer".
@MagicDake thanks for your enthusiasm! However, I am not sure this issue is yet ready for someone to go implement a solution. The suggested actions read (to me) like a list of alternatives, so the first step here is to have a discussion about the options and form a consensus. One way to do that is for you to describe how you understand the issue and its solution(s).
Okay, based on our deep dive into the source code, here's a more informed and detailed English reply you can post:

Hi @betatim,

Thanks for the guidance and for encouraging a thorough discussion before implementation. I completely agree that reaching a consensus on the approach is the best way forward, and I have looked into the relevant parts of the codebase.

Understanding of the issue: the core challenge is to provide a clear, public, and user-friendly way for custom scorers to participate in scikit-learn's metadata routing mechanism. Users, like Jacob-Stevens-Haas, find it difficult to make their scorers request and consume metadata.

Proposed solutions (informed by the source code):

Path forward: I believe a combination of these, perhaps starting with improved documentation and then working towards a public API. The key is that much of the underlying machinery is already in place. I'm keen to discuss these points and any other perspectives the team has. My aim is to contribute to a solution that is both powerful and user-friendly.

Best regards,
Exactly - I know metadata routing is newish and evolving, and I'm interested in what other considerations and plans y'all have. I'd be hesitant to suggest anything without that knowledge.
I don't know that it's inviable; after all, it is possible to just subclass it.
Two ancillary confusions I've run into trying to understand metadata routing:
As a side note about AI/automatically generated work and comments for scikit-learn, see https://scikit-learn.org/stable/developers/contributing.html#automated-contributions-policy. Please review and think about the content before posting it. That way we can make sure it is helpful and doesn't lead to a lot of spam.
Hi @Jacob-Stevens-Haas,

You're right to question the terminology here, as it can be a bit circular at first glance. Think of the estimator as the "router device" and the MetadataRouter object as its "configuration sheet" or "routing table."

This is another good catch regarding potentially ambiguous phrasing in the documentation. Essentially, it gives you an inspectable data structure representing the routing rules, rather than a binary blob.

Hope this helps clarify those points! Let me know your thoughts.
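For reference, here is a minimal sketch, not part of the original comment, of what that inspectable structure looks like in practice; it assumes a recent scikit-learn (1.3+) where metadata routing can be enabled via `set_config`:

```python
# Inspect the routing information an object exposes via get_metadata_routing().
from sklearn import set_config
from sklearn.linear_model import LogisticRegression

set_config(enable_metadata_routing=True)

est = LogisticRegression().set_fit_request(sample_weight=True)

# For a plain consumer this returns a MetadataRequest; for a meta-estimator it
# returns a MetadataRouter. Both print as a readable mapping of
# method -> {metadata name: requested or not}, not an opaque blob.
print(est.get_metadata_routing())
```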
@betatim I feel you... @MagicDake, if this is a genuine translation of your own words, can you make it shorter? Tell the translator to assume I know enough to understand a reply in two or three sentences, and if I don't, I'll ask for clarification. I also don't need the flowery language... I trust everyone's good intent.

Returning to the main point... I think either of the options in the title depends upon the last question, i.e. the difference between

Looks like there's supporting information in SLEP006, its reference implementation, and the PR that added
Describe the issue linked to the documentation
TL;DR: Metadata routing for scoring could either use a base class or documentation of how to write `set_score_request()`.

Currently the Metadata Estimator Dev Guide has examples of a metadata-consuming estimator and a metadata-routing estimator. However, metadata routing is also designed for scorers and CV splitters, which may or may not be estimators. Fortunately, `sklearn.model_selection` exposes `BaseCrossValidator`, which, like `BaseEstimator`, subclasses `_MetadataRequester`. Unfortunately, ~~there's no base class for scorers~~ the base class for scorers, `_BaseScorer`, is not public. I don't understand how to string together the relevant methods that should be a part of `set_score_params`. The current workaround is to simply subclass `BaseEstimator`, even if I'm not making an estimator, or to subclass `_MetadataRequester`, even though it's not part of the public API. ~~Or use `make_scorer` to pin the kwargs when instantiating the meta-estimator, rather than in `fit()`.~~
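For concreteness, here is a hedged sketch of the subclass-`_MetadataRequester` workaround described above. It is not code from the issue: the class name, the `true_coef` metadata parameter, and the reliance on the private `__metadata_request__score` class attribute (mirroring how `GroupsConsumerMixin` declares `groups` for splitters) are assumptions, and the private API may change between versions.

```python
# Hypothetical sketch of a callable scorer that participates in metadata
# routing by subclassing the private _MetadataRequester.
import numpy as np
from sklearn.utils._metadata_requests import _MetadataRequester  # private API


class TrueCoefScorer(_MetadataRequester):
    # Declare that the "score" step consumes `true_coef` by default, so the
    # routing machinery can generate set_score_request() and report the
    # request via get_metadata_routing().
    __metadata_request__score = {"true_coef": True}

    def __call__(self, estimator, X, y, true_coef=None):
        # Unlike a make_scorer score function, a callable scorer receives the
        # fitted estimator, so it can be compared against the known
        # data-generating coefficients (negated error: greater is better).
        return -float(np.linalg.norm(estimator.coef_ - true_coef))
```

With `set_config(enable_metadata_routing=True)`, an instance of this class should report its request through `get_metadata_routing()`, so a router such as `RandomizedSearchCV(..., scoring=TrueCoefScorer())` can route `true_coef` passed to `fit()` on to the scorer call; whether every detail of the private mechanism behaves this way across versions is exactly the documentation gap this issue is about.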
My use case is scoring a time-series model where the data-generating mechanism is known to the experiment, but not to the model, and I need to compare the fitted model to the true data-generating mechanism. I understand how to use a custom scorer in `RandomizedSearchCV`, and the metadata API explains how meta-estimators like `RandomizedSearchCV` can pass additional arguments to my custom scorer.
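By way of contrast, here is a sketch (metric, parameter names, and data invented for illustration) of the public `make_scorer` path for passing additional arguments to a custom scorer through a meta-estimator; as the EDIT at the end notes, the score function only ever receives `y_true`/`y_pred`, not the fitted estimator, which is why it doesn't cover this use case.

```python
# Sketch of routing extra metadata to a make_scorer-based scorer.
import numpy as np
from sklearn import set_config
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV

set_config(enable_metadata_routing=True)


def coef_aware_error(y_true, y_pred, true_coef=None):
    # Receives `true_coef` via routing, but only sees predictions -- the
    # fitted estimator itself is never passed to a make_scorer score function.
    assert true_coef is not None
    return -float(np.mean((y_true - y_pred) ** 2))


scorer = make_scorer(coef_aware_error).set_score_request(true_coef=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = rng.normal(size=5)
y = X @ true_coef + 0.1 * rng.normal(size=200)

search = RandomizedSearchCV(
    Ridge(),
    {"alpha": list(np.logspace(-3, 3, 20))},
    n_iter=5,
    scoring=scorer,
    random_state=0,
)
# `true_coef` is routed from fit() to the scorer and forwarded to the metric.
search.fit(X, y, true_coef=true_coef)
```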
Suggest a potential alternative/fix

Expose `_MetadataRequester` or `_BaseScorer`, or describe the `set_{method}_request()` methods. It looks like `_MetadataRequester` uses a descriptor, `RequestMethod`, which relies on an instance having a `_get_metadata_request` method and a `_metadata_request` attribute (which it doesn't really describe).
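To illustrate the descriptor pattern being referred to, here is a simplified schematic; it is not scikit-learn's actual `RequestMethod` implementation, and the names `_RequestSetter` and `MyScorer` are invented.

```python
# Schematic of a descriptor that synthesizes a set_<method>_request()-style
# setter recording requests on the instance. NOT scikit-learn's RequestMethod;
# it only illustrates the pattern.
class _RequestSetter:
    def __init__(self, method, keys):
        self.method, self.keys = method, keys

    def __get__(self, instance, owner=None):
        def set_request(**kwargs):
            unknown = set(kwargs) - set(self.keys)
            if unknown:
                raise TypeError(f"Unknown metadata: {sorted(unknown)}")
            # Store the requests on the instance, keyed by method name,
            # loosely analogous to a _metadata_request attribute.
            requests = getattr(instance, "_metadata_request", {})
            requests.setdefault(self.method, {}).update(kwargs)
            instance._metadata_request = requests
            return instance  # allow chaining, as scikit-learn's setters do
        return set_request


class MyScorer:
    set_score_request = _RequestSetter("score", keys=["true_coef"])


scorer = MyScorer().set_score_request(true_coef=True)
print(scorer._metadata_request)  # {'score': {'true_coef': True}}
```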
EDIT: Found out about `_BaseScorer`. Also, removed the `make_scorer` workaround, as it doesn't result in passing the fitted estimator to the `_score_func`.