Add a list_scorers function to sklearn.metrics · Issue #10712 · scikit-learn/scikit-learn · GitHub

Add a list_scorers function to sklearn.metrics #10712

Closed
jnothman opened this issue Feb 26, 2018 · 22 comments
Labels
Easy · Enhancement · good first issue

Comments

@jnothman
Member

The scoring parameter allows users to specify a scoring method by name. Currently a list of names is available by getting it wrong:

>>> from sklearn.metrics import get_scorer
>>> get_scorer('rubbish')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joel/repos/scikit-learn/sklearn/metrics/scorer.py", line 239, in get_scorer
    % (scoring, sorted(scorers)))
ValueError: 'rubbish' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']

I think this error message, and maintaining it, is getting a bit absurd. Instead we should have a function sklearn.metrics.list_scorers implemented in sklearn/metrics/scorer.py and the error message should say "Use sklearn.metrics.list_scorers to get valid strings.". Perhaps we would eventually have list_scorers allow users to filter scorers by task type (binary classification, multiclass classification, multilabel classification, regression, etc.), or even to provide metadata about each scorer (a description, for instance), but initially we should just be able to list them.
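
As a rough illustration (not an actual implementation; it assumes only the existing SCORERS dict in sklearn/metrics/scorer.py), the initial version could be as small as:

# hypothetical sketch; inside sklearn/metrics/scorer.py the SCORERS dict
# is already defined at module level
def list_scorers():
    """Return the scorer names accepted by ``scoring``, in sorted order."""
    return sorted(SCORERS)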

@danielleshwed
Contributor

I'd like to try this :)

@jnothman
Member Author
jnothman commented Feb 27, 2018 via email

@qinhanmin2014
Member

@jnothman It seems that we already have a simple way

from sklearn.metrics import SCORERS
print(SCORERS.keys())

to list all the scorers (this is what generates the error message above)?
How about just improving the error message to "Use sklearn.metrics.SCORERS.keys() to get valid strings."?

@jnothman
Member Author
jnothman commented Feb 27, 2018 via email

@jnothman
Member Author

sorted(SCORERS.keys()) might be better advice.

But I still think there's a usability problem in the length of the list returned and its heterogeneity.

Let's make these scorers usable (I think @amueller will agree):

Let's define:

def list_scorers(task=None):
    """Get a list of named scorers with filtering applied

    Parameters
    ----------
    task : str, optional
        Regression tasks: 'regression', 'multiple-regression'
        Classification tasks: 'binary', 'multiclass', 'multilabel', 'multiple-classification'
        Clustering tasks: 'clustering', 'clustering-without-truth'
        Etc.

    Returns
    -------
    list of str
        In lexicographic order
    """

An alternative to this is to provide some kind of structured data that can be interpreted as a dataframe and let the user do their own filtering:

[{'name': 'f1', 'binary': True, 'multiclass': False, 'regression': False},
 ...]
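
As a quick illustration of the user-side filtering (a sketch only; it assumes pandas is installed and uses made-up records in the shape above):

import pandas as pd

# hypothetical metadata records in the shape sketched above
records = [
    {'name': 'f1', 'binary': True, 'multiclass': False, 'regression': False},
    {'name': 'r2', 'binary': False, 'multiclass': False, 'regression': True},
]
df = pd.DataFrame(records).set_index('name')
# e.g. all scorers flagged as suitable for binary classification
print(df[df['binary']].index.tolist())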

Personally, I think list_scorers is more directly useful and should be available.

Either way, we would not just be storing SCORERS, but would also have a _register_scorer(scorer, tasks=[...]) function (or perhaps a public one!) that catalogues scorer metadata; a rough sketch of this follows the notes below.

Notes:

  • We may also need to provide a way to say that the estimator needs to have predict_proba, or either predict_proba or decision_function, although this duplicates the arguments to make_scorer
  • We could imagine validating, in cross_validate for instance, that the scorer metadata matched the estimator and the target type. This may create backwards compatibility issues, so should be avoided for now. For instance, our metadata might be a bit dogmatic: we might not want 'multiclass' as a task for f1_micro, since it is identical to accuracy, but existing code may still use it.
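
For concreteness, a rough sketch of the registration side (the _register_scorer function and the _SCORER_TASKS registry are hypothetical, continuing the sketch above; SCORERS, make_scorer and fbeta_score do exist in sklearn.metrics):

from sklearn.metrics import SCORERS, make_scorer, fbeta_score

_SCORER_TASKS = {}  # hypothetical registry, as in the sketch above

def _register_scorer(name, scorer, tasks=()):
    """Catalogue a scorer under a name together with its task metadata."""
    SCORERS[name] = scorer
    _SCORER_TASKS[name] = set(tasks)

# if made public, this would let users register custom scorers by name,
# e.g. so that scoring='f2' works in cross_validate or GridSearchCV
_register_scorer('f2', make_scorer(fbeta_score, beta=2), tasks=['binary'])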

@qinhanmin2014
Member

I'm +0 for such a function.
(1) We have sorted(SCORERS.keys()) to list all the scorers
(2) We have a table in the user guide to show different kinds of scorers (regression, classification, clustering) and corresponding metrics.
So currently, as far as I can see, I don't fully understand what list_scorers would add.

@jnothman
Member Author

You might be right, but:

  • It makes it easier for users (and automated tools) to find metrics appropriate for their task. Currently the docs list a mix of classification metrics without distinguishing binary from multiclass etc. clearly.
  • Currently a bad choice (e.g. 'recall' for multiclass) results in the cross validation crashing only when it reaches scoring. If the user picks their scorer from a list for their task, they can recognise the mismatch before the search gets that far. We could also provide a warning or error at validation time if the metric seems unsuitable.
  • It may make the docs easier to maintain
  • It makes it easier for frameworks built around scikit-learn to offer users appropriate choices without having to maintain their own scorer metadata
  • Making register_scorer public would allow users to use custom scorers by name, making the interface for scorer selection more consistent.

Opinion @amueller?

@mohamed-ali
Contributor

@jnothman, if there is a consensus, and nobody is working on this, I'd like to take it.

@jnothman
Member Author
jnothman commented Mar 1, 2018 via email

@jnothman
Member Author
jnothman commented Mar 1, 2018

Also, you should check if @danielleshwed hopes to work on it

@mohamed-ali
Contributor

I think this can be added at least to the testing functions, just like sklearn.utils.testing.all_estimators.

@amueller
Member

not sold on this one. Seems harder to maintain. And in the end, even for other interfaces, the user needs to select, right? So it's more a matter of pruning down which are appropriate for which task to select from... Automatic selection of metrics is not really a thing, right? My notebooks (and book?) have SCORERS.keys(). And "recall" is also a bad choice for binary, right? So I'm not sure what makes a "good" choice.

@jnothman
Member Author
jnothman commented May 1, 2018 via email

@amueller
Member
amueller commented May 1, 2018

Only monitoring recall and not precision means you could just change the class_weight and get a better score, right?

@amueller
Member
amueller commented May 1, 2018

Maybe the docs should show SCORERS.keys() instead of the error message, though.

@jnothman
Member Author
jnothman commented May 1, 2018 via email

@qinhanmin2014
Member

+1 to simplifying the error message with sorted(SCORERS.keys()); marking as a good first issue.

@qinhanmin2014 added the good first issue label on May 2, 2018
@princejha95

Can I try this if it's okay?

@qinhanmin2014
Member

@princejha95 Please go ahead :)

@princejha95

Current:

raise ValueError('%r is not a valid scoring value. '
                 'Valid options are %s'
                 % (scoring, sorted(scorers)))

Proposed:

raise ValueError('%r is not a valid scoring value. '
                 'For valid options use sorted(SCORERS.keys())'
                 % (scoring))


By doing this, the new error message is: ValueError: 'rubbish' is not a valid scoring value. For valid options use sorted(SCORERS.keys())

@jnothman
Member Author
jnothman commented May 3, 2018 via email

@princejha95

I have changed the error message as suggested by @qinhanmin2014 above. Now, if the user gives a scoring value that is not present in the SCORERS dictionary, the error message suggests finding valid scoring values with sorted(SCORERS.keys()).
