Classifier metrics (such as accuracy, sensitivity, specificity, precision, ...) are highly uncertain when they are calculated from a small sample. Unfortunately, these point estimates are often treated as exact. We present a Bayesian method to quantify metric uncertainty. Our paper explains the underlying concepts and shows that many published classifiers have surprisingly large metric uncertainties. This repository contains the implementation in Python.
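To illustrate the core idea (this is a minimal sketch using SciPy, not the repository's API): with a uniform Beta(1, 1) prior, the posterior of accuracy after observing `k` correct predictions out of `n` is Beta(k + 1, n - k + 1), and its credible interval makes the uncertainty of the point estimate explicit.

```python
from scipy.stats import beta

def accuracy_posterior(correct, total, level=0.95):
    """Posterior median and equal-tailed credible interval for accuracy.

    Assumes a uniform Beta(1, 1) prior; the function name and signature
    are illustrative, not part of this repository.
    """
    posterior = beta(correct + 1, total - correct + 1)
    lo, hi = posterior.interval(level)
    return posterior.median(), lo, hi

# 90 correct out of 100: the point estimate is 0.9,
# but the 95% credible interval is noticeably wide.
median, lo, hi = accuracy_posterior(90, 100)
print(f"accuracy ~ {median:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With only 100 test samples, the interval spans several percentage points, which is exactly the kind of uncertainty the point estimate hides.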
The easiest way to calculate metric uncertainty is via our interactive, browser-based tool. The site may take a few minutes to load: it does not install any packages or execute any code on your machine, but the environment must first be started on the host, which causes the small delay. Please follow this link to the browser-based tool.
If you want to calculate metric uncertainties on a regular basis, or even integrate the method into your own workflow, feel free to clone this repository.
The notebook `tutorial.ipynb` should give you an idea of how to use the most important parts of the code. To ensure that all dependencies work as intended, run `pip install -r requirements.txt`.
The project builds on the following tools:

- PyMC3 (Gelman-Rubin diagnostics and tests)
- SymPy (metric definition)
- Voilà (turns my Jupyter notebook into a standalone application)
- Binder (hosts the application)
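As a rough sketch of what symbolic metric definition with SymPy can look like (the variable names below are illustrative and not taken from this repository): metrics are expressed as symbolic functions of the confusion-matrix entries, and substituting observed counts yields exact rational point estimates.

```python
import sympy as sp

# Confusion-matrix entries as positive symbols (illustrative names)
TP, FP, TN, FN = sp.symbols("TP FP TN FN", positive=True)

# Common metrics defined symbolically from the confusion matrix
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / (TP + FP + TN + FN)

# Substituting observed counts gives an exact rational estimate
print(accuracy.subs({TP: 45, TN: 45, FP: 5, FN: 5}))  # 9/10
```

Defining metrics symbolically keeps a single source of truth for each formula, which the probabilistic machinery can then evaluate on posterior samples of the confusion-matrix entries.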
```bibtex
@article{toetsch2021classifier,
  title     = {Classifier uncertainty: evidence, potential impact, and probabilistic treatment},
  author    = {Tötsch, Niklas and Hoffmann, Daniel},
  journal   = {PeerJ Computer Science},
  volume    = {7},
  pages     = {e398},
  year      = {2021},
  publisher = {PeerJ Inc.}
}
```
If you have questions or comments, please create an issue.