Add a list_scorers function to sklearn.metrics · Issue #10712 · scikit-learn/scikit-learn · GitHub

Add a list_scorers function to sklearn.metrics #10712

Closed
jnothman opened this issue Feb 26, 2018 · 22 comments
Labels
Easy · Enhancement · good first issue

Comments

@jnothman
Member

The scoring parameter allows users to specify a scoring method by name. Currently a list of names is available by getting it wrong:

>>> from sklearn.metrics import get_scorer
>>> get_scorer('rubbish')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joel/repos/scikit-learn/sklearn/metrics/scorer.py", line 239, in get_scorer
    % (scoring, sorted(scorers)))
ValueError: 'rubbish' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']

I think this error message, and maintaining it, is getting a bit absurd. Instead we should have a function sklearn.metrics.list_scorers implemented in sklearn/metrics/scorer.py and the error message should say "Use sklearn.metrics.list_scorers to get valid strings.". Perhaps we would eventually have list_scorers allow users to filter scorers by task type (binary classification, multiclass classification, multilabel classification, regression, etc.), or even to provide metadata about each scorer (a description, for instance), but initially we should just be able to list them.
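
As a rough illustration (not an actual implementation; it assumes only the existing SCORERS dict in sklearn/metrics/scorer.py), the initial version could be as small as:

# hypothetical sketch; inside sklearn/metrics/scorer.py the SCORERS dict
# is already defined at module level
def list_scorers():
    """Return the scorer names accepted by ``scoring``, in sorted order."""
    return sorted(SCORERS)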

@danielleshwed
Contributor

I'd like to try this :)

@jnothman
Member Author
jnothman commented Feb 27, 2018 via email

@qinhanmin2014
Member

@jnothman It seems that we already have a simple way

from sklearn.metrics import SCORERS
print(SCORERS.keys())

to list all the scorers (this is what generates the error message above)?
How about just improving the error message to "Use sklearn.metrics.SCORERS.keys() to get valid strings."?

@jnothman
Member Author
jnothman commented Feb 27, 2018 via email

@jnothman
Member Author

sorted(SCORERS.keys()) might be better advice.

But I still think there's a usability problem in the length of the list returned and its heterogeneity.

Let's make these scorers usable (I think @amueller will agree):

Let's define:

def list_scorers(task=None):
    """Get a list of named scorers with filtering applied

    Parameters
    ----------
    task : str, optional
        Regression tasks: 'regression', 'multiple-regression'
        Classification tasks: 'binary', 'multiclass', 'multilabel', 'multiple-classification'
        Clustering tasks: 'clustering', 'clustering-without-truth'
        Etc.

    Returns
    -------
    list of str
        In lexicographic order
    """

An alternative to this is to provide some kind of structured data that can be interpreted as a dataframe and let the user do their own filtering:

[{'name': 'f1', 'binary': True, 'multiclass': False, 'regression': False},
 ...]
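
As a quick illustration of the user-side filtering (a sketch only; it assumes pandas is installed and uses made-up records in the shape above):

import pandas as pd

# hypothetical metadata records in the shape sketched above
records = [
    {'name': 'f1', 'binary': True, 'multiclass': False, 'regression': False},
    {'name': 'r2', 'binary': False, 'multiclass': False, 'regression': True},
]
df = pd.DataFrame(records).set_index('name')
# e.g. all scorers flagged as suitable for binary classification
print(df[df['binary']].index.tolist())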

Personally, I think list_scorers is more directly useful and should be available.

Either way, we would not just be storing SCORERS, but would also have a _register_scorer(scorer, tasks=[...]) function (or perhaps a public one!) that catalogues scorer metadata; a rough sketch of this follows the notes below.

Notes:

  • We may also need to provide a way to say that the estimator needs to have predict_proba, or either predict_proba or decision_function, although this duplicates the arguments to make_scorer
  • We could imagine validating, in cross_validate for instance, that the scorer metadata matched the estimator and the target type. This may create backwards compatibility issues, so should be avoided for now. For instance, our metadata might be a bit dogmatic: we might not want 'multiclass' as a task for f1_micro, since it is identical to accuracy, but existing code may still use it.
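
For concreteness, a rough sketch of the registration side (the _register_scorer function and the _SCORER_TASKS registry are hypothetical, continuing the sketch above; SCORERS, make_scorer and fbeta_score do exist in sklearn.metrics):

from sklearn.metrics import SCORERS, make_scorer, fbeta_score

_SCORER_TASKS = {}  # hypothetical registry, as in the sketch above

def _register_scorer(name, scorer, tasks=()):
    """Catalogue a scorer under a name together with its task metadata."""
    SCORERS[name] = scorer
    _SCORER_TASKS[name] = set(tasks)

# if made public, this would let users register custom scorers by name,
# e.g. so that scoring='f2' works in cross_validate or GridSearchCV
_register_scorer('f2', make_scorer(fbeta_score, beta=2), tasks=['binary'])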

@qinhanmin2014
Member

I'm +0 for such a function.
(1) We have sorted(SCORERS.keys()) to list all the scorers
(2) We have a table in the user guide to show different kinds of scorers (regression, classification, clustering) and corresponding metrics.
So currently, as far as I can see, I don't fully understand what list_scorers would add.

@jnothman
Member Author

You might be right, but:

  • It makes it easier for users (and automated tools) to find metrics appropriate for their task. Currently the docs list a mix of classification metrics without distinguishing binary from multiclass etc. clearly.
  • Currently a bad choice (e.g. 'recall' for multiclass) results in the cross validation crashing only when it reaches scoring. If the user picks their scorer from a list for their task, they can recognise the mismatch before the search gets that far. We could also provide a warning or error at validation time if the metric seems unsuitable.
  • It may make the docs easier to maintain
  • It makes it easier for frameworks built around scikit-learn to offer users appropriate choices without having to maintain their own scorer metadata
  • Making register_scorer public would allow users to use custom scorers by name, making the interface for scorer selection more consistent.

Opinion @amueller?

@mohamed-ali
Contributor

@jnothman, if there is a consensus, and nobody is working on this, I'd like to take it.

@jnothman
Member Author
jnothman commented Mar 1, 2018 via email

@jnothman
Member Author
jnothman commented Mar 1, 2018

Also, you should check if @danielleshwed hopes to work on it

@mohamed-ali
Contributor

I think this can be added at least to the testing functions, just like sklearn.utils.testing.all_estimators.

@amueller
Member

not sold on this one. Seems harder to maintain. And in the end, even for other interfaces, the user needs to select, right? So it's more a matter of pruning down which are appropriate for which task to select from... Automatic selection of metrics is not really a thing, right? My notebooks (and book?) have SCORERS.keys(). And "recall" is also a bad choice for binary, right? So I'm not sure what makes a "good" choice.

@jnothman
Member Author
jnothman commented May 1, 2018 via email

@amueller
Member
amueller commented May 1, 2018

Only monitoring recall and not precision means you could just change the class_weight and get a better score, right?

@amueller
Member
amueller commented May 1, 2018

Maybe the docs should show SCORERS.keys() instead of the error message, though.

@jnothman
Member Author
jnothman commented May 1, 2018 via email

@qinhanmin2014
Member

+1 to simplifying the error message with sorted(SCORERS.keys()); marking as a good first issue.

@qinhanmin2014 added the good first issue label on May 2, 2018
@princejha95

Can I try this if it's okay?

@qinhanmin2014
Member

@princejha95 Please go ahead :)

@princejha95

Current:

raise ValueError('%r is not a valid scoring value. '
                 'Valid options are %s'
                 % (scoring, sorted(scorers)))

Proposed:

raise ValueError('%r is not a valid scoring value. '
                 'For valid options use sorted(SCORERS.keys())'
                 % (scoring))


By doing this, the new error message is: ValueError: 'rubbish' is not a valid scoring value. For valid options use sorted(SCORERS.keys())

@jnothman
Member Author
jnothman commented May 3, 2018 via email

@princejha95

I have changed the error message as suggested by @qinhanmin2014 above. Now, if the user gives a scoring value that is not present in the SCORERS dictionary, the error message suggests finding valid scoring values with sorted(SCORERS.keys()).
