Are Experts Well-Calibrated? An Equivalence-Based Hypothesis Test
Figure 1. The power of the direct and equivalence tests to correctly identify 90% well-calibrated experts at a 90% true level of coverage of elicited intervals.
Figure 2. The probabilities of the direct and equivalence tests identifying experts as 90% well-calibrated when the true levels of coverage of the elicited intervals are less than 90%.
Figure 3. The probabilities of the direct test identifying experts as 90% well-calibrated when the true levels of coverage of the elicited intervals are less than 90%, for small numbers of elicited intervals.
Figure 4. The size of the direct and equivalence tests when testing experts’ calibration on eliciting 90% credible intervals, for small numbers of elicited intervals.
Figure 5. The power of the direct and non-randomized equivalence tests to correctly identify 90% well-calibrated experts at a 90% true level of coverage of elicited intervals.
Figure 6. The probabilities of the direct and non-randomized equivalence tests identifying experts as 90% well-calibrated when the true levels of coverage of the elicited intervals are less than 90%.
Abstract
1. Introduction
2. Materials and Methods
2.1. Direct Comparison as a Hypothesis Test
2.2. Equivalence-Based Hypothesis Test
- experts have an underlying true level of calibration that is fixed for the given set of calibration questions, and
- exposure to calibration questions can be treated as independent and identically distributed experiments, so that the outcomes of the comparisons of the expert’s intervals against the known values are Bernoulli random variables.
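Under these assumptions, the number of elicited intervals that cover the true values is a Binomial random variable, and both tests can be run as exact binomial computations. The sketch below is illustrative only: the one-sided formulations, the equivalence margin `delta`, and the significance level `alpha` are assumptions made for the example, not the paper's exact specification.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def direct_test(x, n, p0=0.9, alpha=0.05):
    """Direct comparison as a one-sided exact binomial test of
    H0: p >= p0 against H1: p < p0. Returns True when H0 is rejected,
    i.e. when the expert is flagged as NOT well-calibrated."""
    return binom_cdf(x, n, p0) < alpha

def equivalence_test(x, n, p0=0.9, delta=0.05, alpha=0.05):
    """Two one-sided tests (TOST) of H0: |p - p0| >= delta against
    H1: |p - p0| < delta. Returns True when the expert is declared
    (approximately) well-calibrated."""
    # Reject p <= p0 - delta: observed hits significantly above the lower margin.
    p_lower = 1 - binom_cdf(x - 1, n, p0 - delta)
    # Reject p >= p0 + delta: observed hits significantly below the upper margin.
    p_upper = binom_cdf(x, n, p0 + delta)
    return max(p_lower, p_upper) < alpha
```

Under these illustrative settings, an expert with 14 of 16 intervals covering the truth is not rejected by the direct test, yet the equivalence test cannot declare calibration either: with so few intervals neither margin can be excluded. This asymmetry is why the two tests' properties need separate study.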
2.3. Simulated Data
2.4. Real-Life Data
2.4.1. Four-Step Format Elicited Data
2.4.2. Three-Step Format Elicited Data
3. Results
3.1. Analysis of the Simulated Data
3.1.1. Power Analysis
3.1.2. Different True Levels of Coverage of Intervals
3.1.3. Improving on the Equivalence Test?
3.1.4. Test Properties Established through the Simulation Study
3.2. Analysis of the Real-Life Data
3.2.1. The 4-Step Elicited Data
3.2.2. The 3-Step Elicited Data
3.2.3. Implications for the Real-Life Examples
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Coverage Probability | Number of Elicited Intervals (n) |
---|---|
80% | 10, 20, 30, 40, 50, 80, 100, 150, 200, 250 |
90% | 10, 20, 30, 40, 50, 80, 100, 150, 200, 250 |
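The simulation design crosses each true coverage probability with each number of elicited intervals. Its logic can be reproduced in outline with a small Monte Carlo study: draw Bernoulli coverage indicators for a simulated expert and record how often a test retains the expert as well-calibrated. The one-sided exact binomial "direct" rule and α = 0.05 below are illustrative assumptions, not the paper's exact test.

```python
import random
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def passes_direct(x, n, p0=0.9, alpha=0.05):
    """Retain the expert as well-calibrated when the one-sided exact
    binomial test of H0: p >= p0 does not reject."""
    return binom_cdf(x, n, p0) >= alpha

def estimated_pass_rate(true_coverage, n, reps=5000, seed=1):
    """Monte Carlo estimate of the probability that a simulated expert
    with the given true coverage is retained as well-calibrated."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(reps):
        x = sum(rng.random() < true_coverage for _ in range(n))
        passed += passes_direct(x, n)
    return passed / reps

# Retention stays near 1 for truly calibrated experts at every n, while
# overconfident experts (true coverage 0.6) are caught only as n grows.
for n in (10, 50, 250):
    print(n, estimated_pass_rate(0.9, n), estimated_pass_rate(0.6, n))
```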
Number of Elicited Intervals (n) | C1 | C2 |
---|---|---|
10 | 9 | 10 |
20 | 18 | 19 |
30 | 27 | 28 |
40 | 36 | 37 |
50 | 45 | 46 |
80 | 72 | 73 |
100 | 90 | 92 |
150 | 134 | 138 |
200 | 178 | 185 |
250 | 222 | 232 |
Expert ID | Num. Elicited Questions | Num. Intervals Covering the Truth |
---|---|---|
52b | 13 | 9 |
54h | 12 | 9 |
64i | 13 | 10 |
Expert ID | Num. Elicited Questions | Num. Intervals Covering the Truth |
---|---|---|
Exp16 | 16 | 14 |
Exp21 | 16 | 12 |
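The direct comparison for the experts in the table above can be checked with a one-line exact binomial computation. The one-sided p-value used here (small values contradict 90% calibration) is an illustrative form of the direct test, not necessarily the paper's exact specification.

```python
from math import comb

def pvalue_direct(x, n, p0=0.9):
    """One-sided exact binomial p-value, P(X <= x) under Binomial(n, p0).
    Small values are evidence against 90% calibration."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x + 1))

# Counts taken from the 3-step table above.
for expert, x, n in [("Exp16", 14, 16), ("Exp21", 12, 16)]:
    print(expert, round(pvalue_direct(x, n), 4))
```

Under this form, Exp16 (14 of 16 intervals covering the truth) is comfortably consistent with 90% calibration, while Exp21 (12 of 16) sits just above the conventional 0.05 threshold.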
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dharmarathne, G.; Hanea, A.M.; Robinson, A. Are Experts Well-Calibrated? An Equivalence-Based Hypothesis Test. Entropy 2022, 24, 757. https://doi.org/10.3390/e24060757