🚀 Feature
A new metric computing the Expected Calibration Error (ECE) from Naeini et al. 2015. Useful for determining whether a classifier's softmax probability scores are well calibrated and represent reasonable probabilities.
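For reference, a minimal sketch of the binned ECE computation (the function name, bin count, and tensor shapes here are illustrative choices of mine, not an existing API):

```python
import torch

def expected_calibration_error(probs: torch.Tensor, targets: torch.Tensor, n_bins: int = 15) -> torch.Tensor:
    """ECE over equal-width confidence bins; probs is (N, C) softmax output, targets is (N,) class indices."""
    confidences, predictions = probs.max(dim=1)            # top-1 probability and predicted class
    accuracies = predictions.eq(targets).float()           # 1 if the prediction is correct, else 0
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)  # samples whose confidence falls in this bin
        prop_in_bin = in_bin.float().mean()                # |B_m| / n
        if prop_in_bin > 0:
            acc_in_bin = accuracies[in_bin].mean()         # acc(B_m)
            conf_in_bin = confidences[in_bin].mean()       # conf(B_m)
            ece += (acc_in_bin - conf_in_bin).abs() * prop_in_bin
    return ece
```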
Motivation
I ran into this metric after seeing Guo et al. 2017, which discusses how very large and deep networks are vulnerable to calibration issues (i.e., they tend to be systematically over- or under-confident) and suggests temperature scaling as a method for producing reasonably calibrated softmax outputs using a validation dataset. I've implemented a simple version of this metric using the old PyTorch Lightning Metric API, mostly based on @gpleiss's PyTorch nn.Module implementation of the metric here.
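As a rough illustration of the temperature scaling idea (not the metric proposed here; the variable names and LBFGS settings below are assumptions on my part, loosely following Guo et al. 2017):

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_targets: torch.Tensor) -> torch.Tensor:
    """Fit a single scalar temperature T on held-out validation logits by minimizing NLL."""
    temperature = torch.nn.Parameter(torch.ones(1) * 1.5)  # illustrative initial guess for T
    optimizer = torch.optim.LBFGS([temperature], lr=0.01, max_iter=50)

    def closure():
        optimizer.zero_grad()
        # Dividing logits by a single scalar T leaves the argmax (and thus accuracy)
        # unchanged while rescaling the softmax confidences.
        loss = F.cross_entropy(val_logits / temperature, val_targets)
        loss.backward()
        return loss

    optimizer.step(closure)
    return temperature.detach()
```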
Additional context
There was an interesting preprint discussing alternatives to ECE that might also be worth integrating - the main criticism being that ECE can be reductive in multi-class settings because it only looks at the probability of the predicted class (i.e. the highest predicted probability) rather than all of the probabilities produced by the model. I'd have to think more about this, but it could also be worth adding.