Function to get scorers for task #12385
@jnothman Trying to understand: can this be implemented by maintaining separate dicts of scorers, as in scikit-learn/sklearn/metrics/scorer.py (lines 501 to 521 in a1d0e96), for different tasks such as classification, regression, clustering, etc.?
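A minimal sketch of the kind of per-task registry this comment seems to describe, assuming the groupings and scorer choices below (they are illustrative, not scikit-learn's actual internals):

```python
from sklearn.metrics import get_scorer

# Hypothetical per-task registries; the groupings and scorer names are illustrative only.
CLASSIFICATION_SCORERS = {name: get_scorer(name)
                          for name in ("accuracy", "f1_macro", "roc_auc")}
REGRESSION_SCORERS = {name: get_scorer(name)
                      for name in ("r2", "neg_mean_squared_error")}
CLUSTERING_SCORERS = {name: get_scorer(name)
                      for name in ("adjusted_rand_score",)}

SCORERS_BY_TASK = {
    "classification": CLASSIFICATION_SCORERS,
    "regression": REGRESSION_SCORERS,
    "clustering": CLUSTERING_SCORERS,
}
```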
No, while related to that:
Not sure if I really get what you want, but can't we do a class wrapper for the scorers? For example, a scorer wrapper class would have something like:

```python
class Scorer:
    is_binary_scorer = True
    needs_probability = True

    def __init__(self, *args, **kwargs):
        pass

    def compute(self, y_true, y_pred):
        pass
```

Then to take the binary scorers you can do a list comprehension:

```python
params = {
    'weights': 10,
    'something': 'pass'
}
binary_scorers = [s(**params) for s in ALL_SCORERS if s.is_binary_scorer]
```

If a scorer needs a `predict_proba` when doing multi-metric scoring, you can also skip it with a warning if the model used doesn't have it. Just my 2 cents. You can also do something like this for the multi metrics:

```python
for s in scorers:
    if s.needs_probability:
        if proba is None:
            try:
                proba = estimator.predict_proba(X_test)
                score = s.compute(y_true, proba)
            except AttributeError:
                # Estimator has no predict_proba; skip this scorer.
                pass
        else:
            score = s.compute(y_true, proba)
    else:
        if y_pred is None:
            y_pred = estimator.predict(X_test)
        score = s.compute(y_true, y_pred)
```

Oops, seems like something like this was already posted in the referenced issue.
This is not about computing various metrics efficiently, but about helping users find metrics appropriate for a task, and making sure they do not get caught in some traps. For example, the scorers available do not return any per-class metrics in the case of multiclass, multilabel, etc. They also make some sometimes-poor assumptions, including that:
If that's the case, won't subclasses or a separate metrics module do fine? If I'm working on a classification problem and
It still doesn't satisfy the need for specifying labels, nor does it readily create scorers for search.
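For context, this is roughly what wiring labels into a search scorer looks like today with the public `make_scorer` API; the estimator, parameter grid, and label values below are placeholders:

```python
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Labels must be passed by hand today; [2, 3] is a placeholder label subset.
macro_f1_on_selected_labels = make_scorer(f1_score, labels=[2, 3], average="macro")

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]},
                      scoring=macro_f1_on_selected_labels)
```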
I suppose that most likely you will need the model too.
Yes, you're right, the estimator might be useful to determine
Bit stale, but still: the `sklearn.utils.discovery` module could be the best place for a function `all_scorers([task_filter])` to live in.
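A rough sketch of what such a helper could look like, built on the public `get_scorer_names` and `get_scorer` functions; the `task_filter` behaviour shown here (plain substring matching on scorer names) is an assumption for illustration, since per-scorer task metadata doesn't exist yet:

```python
from sklearn.metrics import get_scorer, get_scorer_names

def all_scorers(task_filter=None):
    """Return a dict mapping scorer names to scorer callables.

    ``task_filter`` is treated as a plain name substring (e.g. "f1" or
    "neg_mean"); a real implementation would need genuine task metadata.
    """
    names = get_scorer_names()
    if task_filter is not None:
        names = [name for name in names if task_filter in name]
    return {name: get_scorer(name) for name in names}

# Example: every F1-based scorer currently registered.
f1_scorers = all_scorers("f1")
```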
I would like to see a utility which would construct a set of applicable scorers for a particular task, returning a Mapping from string to callable scorer. It will be hard to design the API of this right the first time. [Maybe this should be initially developed outside this project and contributed to scikit-learn-contrib, but I think it reduces risk of mis-specifying scorers, so it's of benefit to this project.]
The user will be able to select a subset of the scorers, either with a dict comprehension or with some specialised methods or function parameters. Initially it wouldn't be efficient to run all these scorers, but hopefully we can do something to fix #10802 :|.
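For example, assuming the proposed utility returned an ordinary dict-like mapping of names to scorers, subsetting could be a one-line dict comprehension (the stand-in mapping below is built from existing scorer names rather than the proposed utility):

```python
from sklearn.metrics import get_scorer

# Stand-in for the proposed utility's output: an ordinary name -> scorer mapping.
scorers = {name: get_scorer(name)
           for name in ("accuracy", "precision", "recall", "f1", "roc_auc")}

# Selecting a subset with a plain dict comprehension, as described above.
pr_scorers = {name: scorer for name, scorer in scorers.items()
              if name.startswith(("precision", "recall"))}
```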
Let's take for instance a binary classification task. For binary `y`, the function `get_applicable_scorers(y, pos_label='yes')` might produce something like the mapping sketched below.

Doing the same for multiclass classification would pass `labels` as appropriate, and would optionally get per-class binary metrics as well as overall multiclass metrics.

I'm not sure how `sample_weight` fits in here, but ha! we still don't support weighted scoring in cross validation (#1574), so let's not worry about that.
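A minimal sketch of the kind of mapping such a call might return for the binary case above; the scorer selection, key names, and the `needs_threshold` flag (which follows the `make_scorer` API of scikit-learn versions contemporary with this issue) are illustrative assumptions rather than a specification:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score, roc_auc_score)

def get_applicable_scorers(y, pos_label=None):
    """Hypothetical sketch: build scorers suited to a binary target ``y``."""
    if len(np.unique(y)) != 2:
        raise ValueError("this sketch only covers binary targets")
    kwargs = {} if pos_label is None else {"pos_label": pos_label}
    return {
        "accuracy": make_scorer(accuracy_score),
        "precision": make_scorer(precision_score, **kwargs),
        "recall": make_scorer(recall_score, **kwargs),
        "f1": make_scorer(f1_score, **kwargs),
        # roc_auc scores rankings, so it needs decision values rather than labels.
        "roc_auc": make_scorer(roc_auc_score, needs_threshold=True),
    }

y = np.array(["yes", "no", "yes", "yes", "no"])
binary_scorers = get_applicable_scorers(y, pos_label="yes")
```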