[API] A public API for creating and using multiple scorers in the sklearn-ecosystem #28299
Comments
MultiMetricScorer
This is a good proposal. I'm not sure if we have bandwidth to prioritize this right now, but I'd be happy to see this happen. WDYT @scikit-learn/core-devs
Let me know if you need help with the bandwidth, or if there are any refinements to the API that need to be considered! I am admittedly not familiar with all the nuances that might arise, but I am happy to spend the time to investigate such a proposal further.
From what I am reading, the quick solution is to have

I recall wanting to make this change ~4 years ago, but ran into an API limitation when using it with GridSearch. Looking over the code now, I think we can make this happen. I'll be happy to look into it.
I've not read in great detail, but just want to note a few things that come to my mind quickly:
@thomasjpfan To be honest, I don't really mind how it's implemented, as long as it's advertised as non-breaking. I'm sure breaking changes will need to happen at some point, but at least they will then be considered something to be documented and advertised. Some considerations:

```python
from sklearn.metrics import get_scorer

# Strawman user: "I know about `get_scorer`, I will use that to get
# multiple scorers and evaluate them sequentially"
metrics = {"acc": get_scorer("accuracy"), "bal_acc": get_scorer("balanced_accuracy")}
scores = {metric_name: scorer(estimator, X, y) for metric_name, scorer in metrics.items()}

# Pro-user: "I know about this `check_multimetric_scoring` from <somewhere>,
# I can use that for efficient caching."
mm = check_multimetric_scoring(
    {"acc": get_scorer("accuracy"), "bal_acc": get_scorer("balanced_accuracy")},  # Boilerplate
    backwards_compatable_default_set_to_default_False=True,
)
scores = mm(estimator, X, y)
```

I think centralizing how scorers are obtained would benefit both users and the maintenance burden. The benefit is primarily that no new methods or concepts need to be introduced for metrics, and everything lives in one place, both for users and for any maintenance considerations:

```python
mm = get_scorer(["accuracy", "balanced_accuracy"])
scores = mm(estimator, X, y)
```
I did a little more code search on the use of

I also found this search rather interesting: I searched for usages of `_MultimetricScorer(` in `*.py` files: https://github.com/search?q=_MultimetricScorer%28+path%3A*.py&type=code&ref=advsearch
@jnothman From my reading of this issue, this is more about making
This is currently possible with `cross_validate`:

```python
from pprint import pprint

from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_validate

iris = datasets.load_iris()


def my_scorer(est, X, y):
    y_pred = est.predict(X)
    return {
        "accuracy": accuracy_score(y, y_pred),
        "f1_micro": f1_score(y, y_pred, average="micro"),
    }


svc = svm.SVC()
results = cross_validate(svc, iris.data, iris.target, scoring=my_scorer)
pprint(results)
```
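For reference, with a dict-returning callable the per-fold scores show up under `test_<key>` alongside the usual timing entries; the exact keys below are my expectation for a recent scikit-learn release, not quoted output:

```python
# Per-fold scores are prefixed with "test_" for each key of the returned dict.
print(sorted(results))
# ['fit_time', 'score_time', 'test_accuracy', 'test_f1_micro']
```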
We have all the tools to build it now. REF: #12385

In any case, I think this is orthogonal to making multi-metric scoring public. I opened #28360 to add multi-metric support to `check_scoring`:

```python
multi_scorer = check_scoring(scoring=["r2", "roc_auc", "accuracy"])

# or

multi_scorer = check_scoring(scoring={
    "acc": "accuracy",
    "custom": make_scorer(
        user_custom_metric,
        response_method="predict",
        greater_is_better=False,
    ),
})
```
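A usage sketch of what I would expect from such an object; this is my assumption about the behaviour targeted by #28360 rather than confirmed API, with `estimator`, `X`, `y` standing in for a fitted estimator and validation data. The returned object would be a single callable that computes the model's response values once and returns a dict of scores:

```python
# Hypothetical usage of the object returned by the multi-metric check_scoring:
scores = multi_scorer(estimator, X, y)
# e.g. {"r2": 0.83, "roc_auc": 0.91, "accuracy": 0.88}
```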
Describe the workflow you want to enable
I would like a stable public interface for multiple scorers that the sklearn ecosystem can develop against.

Without this, it is difficult for libraries to provide any consistent API for evaluation with multiple scorers unless they use `cross_validate` for evaluation, as it is the only place where user input for multiple metrics can be funneled directly through to sklearn.

Why developers may prefer an externally supported sklearn multi-metric API:
Context for suggestion:
In re-developing Auto-Sklearn, we perform hyperparameter optimization, which can include evaluating many metrics. We require custom evaluation protocols that are not trivially satisfied by `cross_validate` or the related family of sklearn functions. Previously, Auto-Sklearn would implement its own metrics; however, we'd like to extend this to any sklearn-compliant scorer. Using a `_MultiMetricScorer` is ideal for its caching and its handling of model response values to fit each scorer. Ideally we could also access this cache, but that is a secondary concern for now.

I had previous solutions which emulated `_MultiMetricScorer`, but they broke with sklearn `1.3` and `1.4` due to changes in scorers. I'm unsure how to reliably build a stable API against sklearn for multiple metrics.

An example use case where a user may want to evaluate against
Describe your proposed solution
My proposed solution would involve making some variant of `_MultiMetricScorer` public API. Perhaps this could be made accessible through a non-backwards-breaking change to `get_scorer`. This would allow a user to pass in a `MultiMetricScorer` which I can act upon, or at the very least a `list[str]` that I can reliably convert into one.

This might cause inconsistency issues internally within `sklearn`, which could be problematic. One additional change that might be required would be to add a new non-backwards-breaking default to `check_scoring(..., *, allow_multi_scoring: bool = False)`.
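To make the proposal concrete, here is a sketch of how the two entry points above might look from a library author's side. Both the list-accepting `get_scorer` and the `allow_multi_scoring` flag on `check_scoring` are hypothetical (they come from this proposal, not from any released scikit-learn API), and `estimator`, `X`, `y` stand in for a fitted estimator and validation data:

```python
from sklearn.metrics import check_scoring, get_scorer

# Hypothetical: get_scorer accepts a list of names and returns one
# multi-metric scorer instead of a single-metric scorer.
mm_scorer = get_scorer(["accuracy", "balanced_accuracy"])

# Hypothetical: check_scoring gains an opt-in flag so existing callers that
# expect a single scorer keep today's behaviour by default.
mm_scorer = check_scoring(
    estimator,
    scoring=["accuracy", "balanced_accuracy"],
    allow_multi_scoring=True,
)

# Either way, downstream code evaluates every metric with one call, sharing
# a single set of (cached) predictions:
scores = mm_scorer(estimator, X, y)  # {"accuracy": ..., "balanced_accuracy": ...}
```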
**Issues with this proposal**
`Scorer` class API: perhaps this suggestion makes no sense without a public `Scorer` API. However, I think that even if the `_MultiMetricScorer` class were to remain hidden, as long as there is a publicly advertised method to construct one with reliable usage semantics, then both classes can remain hidden.

Describe alternatives you've considered, if relevant
The easiest solution in most cases is to rely on the private `_check_multimetric_scoring` and just instantiate a `_MultiMetricScorer`, relying on private functionality.

Previous solutions relied on using the private `_MultiMetricScorer` and the family of `_BaseScorer` and its previous sub-families. Understandably, these private classes are subject to change: they broke with the 1.3 changes to metadata routing and the 1.4 changes to the `_Scorer` hierarchy.

I will rely on private functionality if I have to, but it makes developing a library against sklearn quite difficult due to versioning.
If this will not be supported, I will likely go with some wrapper class that is dependent upon the version of scikit-learn in use.
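For illustration, a minimal sketch of the kind of version-dependent wrapper I have in mind. It assumes only the public scorer-calling convention `scorer(estimator, X, y)`; the private class (spelled `_MultimetricScorer` in the scikit-learn source) is imported defensively because its location and constructor have changed between releases, and the fallback path simply calls each scorer in turn, giving up the prediction caching:

```python
from sklearn.metrics import get_scorer


class MultiScorerWrapper:
    """Sketch of a version-tolerant multi-metric scorer wrapper."""

    def __init__(self, names):
        self._scorers = {name: get_scorer(name) for name in names}

    def __call__(self, estimator, X, y):
        try:
            # Private API: prediction-caching multi-metric scorer.
            from sklearn.metrics._scorer import _MultimetricScorer

            try:
                mm = _MultimetricScorer(scorers=self._scorers)  # newer releases
            except TypeError:
                mm = _MultimetricScorer(**self._scorers)        # older releases
            return mm(estimator, X, y)
        except Exception:
            # Portable fallback: evaluate scorers one by one; the model's
            # predictions are recomputed for each metric.
            return {
                name: scorer(estimator, X, y)
                for name, scorer in self._scorers.items()
            }
```

For example, `MultiScorerWrapper(["accuracy", "balanced_accuracy"])(est, X, y)` would return a dict of scores regardless of which code path was taken.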
Additional context
Currently, the only way to use multiple scorers for a model is through the interface to `cross_validate(scoring=["a", "b", "c"])` or to `permutation_importance`, which internally rely on:

- `_check_multimetric_scoring`
- `_MultiMetricScorer`
- `check_scoring`
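To make the private route concrete, here is a sketch of how those internals fit together today. It relies on `sklearn.metrics._scorer` (private, spelled `_MultimetricScorer` in the source) and reflects my understanding of recent releases only; the constructor signature has changed across versions, so treat it as illustrative rather than stable:

```python
from sklearn.datasets import load_iris
from sklearn.metrics._scorer import _MultimetricScorer, _check_multimetric_scoring
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
est = SVC().fit(X, y)

# Private helper: resolve metric names into a dict of name -> scorer.
scorers = _check_multimetric_scoring(est, ["accuracy", "balanced_accuracy"])

# Private class: a single callable that shares cached predictions across scorers.
# Recent releases take a keyword-only `scorers` argument; older ones took **kwargs.
mm = _MultimetricScorer(scorers=scorers)
scores = mm(est, X, y)  # {"accuracy": ..., "balanced_accuracy": ...}
```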
Further Comments
Having access to the transformed, cached predictions post-scoring would be useful as well, but I think that lies outside the scope for now.