Abstract
In this paper, we present a new nonparametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered an extension of BBQ (Naeini et al., in: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, 2015b), a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) (Zadrozny and Elkan, in: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, 2002). ENIR is designed to address the key limitation of IsoRegC, namely its assumption that the calibration mapping is monotonic in the classifier's predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be combined with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real datasets we evaluated, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in \(O(N \log N)\) time, where N is the number of samples.
Notes
Note that the running time per test instance can be reduced to O(1) in any post-processing calibration model by using a simple caching technique that trades some calibration precision for a decrease in calibration time [27].
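The caching idea in the note above can be sketched as follows. This is a minimal illustration, not the implementation from [27]: the helper names (`build_cache`, `cached_calibrate`) and the fixed-grid design are assumptions, and `calibrate` stands in for any post-processing calibration map on [0, 1].

```python
def build_cache(calibrate, precision=3):
    """Precompute calibrated probabilities on a rounded-score grid.

    `calibrate` is any map from a raw score in [0, 1] to a calibrated
    probability. Quantizing scores to `precision` decimal places turns
    test-time calibration into a single O(1) list lookup, at the cost
    of that much calibration precision.
    """
    n = 10 ** precision
    return [calibrate(i / n) for i in range(n + 1)]

def cached_calibrate(cache, score, precision=3):
    """O(1) lookup of the precomputed calibrated probability."""
    return cache[round(score * 10 ** precision)]
```

For example, with `precision=3` the cache holds 1001 entries, and every test-time query reduces to rounding the score and indexing the list.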
For classifiers that output scores that are not in the unit interval (e.g., SVM), we use a simple sigmoid transformation \(f(x) = \frac{1}{1 + \exp (-x)}\) to transform the scores into the unit interval.
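The transformation in the note above is the standard logistic sigmoid; a minimal sketch (the function name is ours):

```python
import math

def to_unit_interval(score):
    """Map an unbounded classifier score (e.g., an SVM margin)
    into the unit interval with the sigmoid f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-score))
```

A score of 0 maps to 0.5, and large-magnitude scores approach 0 or 1 monotonically, so the ranking of instances is preserved.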
Note that we exclude the highly overfitted model that corresponds to \(\lambda = 0\) from the set of models in ENIR.
Note that, as recommended in [35], we use the expected degrees of freedom of the nearly isotonic regression models, which is equivalent to the number of bins, as the number of parameters in the BIC scoring function.
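Concretely, the BIC score described in this note takes the standard form \(\mathrm{BIC} = -2 \log \hat{L} + k \log N\) with \(k\) set to the number of bins. A minimal sketch (the function name is ours, and `log_likelihood` is assumed to be the model's maximized log-likelihood on the training data):

```python
import math

def bic_score(log_likelihood, num_bins, n_samples):
    """BIC = -2 * log-likelihood + k * log(N), where the number of
    parameters k is taken to be the number of bins (the expected
    degrees of freedom of the nearly isotonic regression model)."""
    return -2.0 * log_likelihood + num_bins * math.log(n_samples)
```

Lower scores indicate a better trade-off between fit and model complexity, so models with many bins are penalized unless the extra bins improve the likelihood enough.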
Note that more than one bin could achieve the minimum in Eq. 9; in that case, all such bins should be merged with their adjacent bins.
The datasets used were as follows: spect, adult, breast, pageblocks, pendigits, ad, mamography, satimage, australian, code rna, colon cancer, covtype, letter unbalanced, letter balanced, diabetes, duke, fourclass, german numer, gisette scale, heart, ijcnn1, ionosphere scale, liver disorders, mushrooms, sonar scale, splice, svmguide1, svmguide3, coil2000, balance, breast cancer, leu, w1a, thyroid sick, scene, uscrime, solar, car34, car4, protein homology.
It is possible to generalize ELiTE to obtain piecewise polynomial calibration functions; however, we observed inferior results when using polynomial degrees higher than 1, which we hypothesize is due to overfitting the training data.
Note that an element of \(\mathbf {v}\) is zero if and only if there is no change in the slope between two successively predicted points.
An R implementation of ENIR and ELiTE can be found at the following address: https://github.com/pakdaman/calibration.git.
References
Bahnsen AC, Stojanovic A, Aouada D, Ottersten B (2014) Improving credit card fraud detection with calibrated probabilities. In: Proceedings of the 2014 SIAM international conference on data mining
Barlow RE, Bartholomew DJ, Bremner J, Brunk HD (1972) Statistical inference under order restrictions: theory and application of isotonic regression. Wiley, New York
Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ (2013) On the effect of calibration in classifier combination. Appl Intell 38(4):566–585
Cavanaugh JE (1997) Unifying the derivations for the Akaike and corrected Akaike information criteria. Stat Probab Lett 33(2):201–208
Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):27
Cohen I, Goldszmidt M (2004) Properties and benefits of calibrated classifiers. In: Proceedings of the European conference on principles of data mining and knowledge discovery. Springer, pp 125–136
DeGroot M, Fienberg S (1983) The comparison and evaluation of forecasters. Statistician 32:12–22
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dong X, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 601–610
Fawcett T, Niculescu-Mizil A (2007) PAV and the ROC convex hull. Mach Learn 68(1):97–106
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701
Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London
Gronat P, Obozinski G, Sivic J, Pajdla T (2013) Learning and calibrating per-location classifiers for visual place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 907–914
Hashemi HB, Yazdani N, Shakery A, Naeini MP (2010) Application of ensemble models in web ranking. In: Proceedings of 5th international symposium on telecommunications (IST). IEEE, pp 726–731
Heckerman D, Geiger D, Chickering D (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20(3):197–243
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–401
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat Theory Methods 9(6):571–595
Jiang L, Zhang H, Su J (2005) Learning k-nearest neighbor naïve Bayes for ranking. In: Proceedings of the advanced data mining and applications. Springer, pp 175–185
Jiang X, Osl M, Kim J, Ohno-Machado L (2012) Calibrating predictive model estimates to support personalized medicine. J Am Med Inform Assoc 19(2):263–274
Kim S-J, Koh K, Boyd S, Gorinevsky D (2009) \(\ell _1\) trend filtering. SIAM Rev 51(2):339–360
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 15 Nov 2015
Menon A, Jiang X, Vembu S, Elkan C, Ohno-Machado L (2012) Predicting accurate probabilities with a ranking loss. In: Proceedings of the international conference on machine learning, pp 703–710
Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. In: Proceedings of the international conference on machine learning, pp 625–632
Naeini MP, Cooper GF (2016a) Binary classifier calibration using an ensemble of linear trend estimation. In: Proceedings of the 2016 SIAM international conference on data mining. SIAM, pp 261–269
Naeini MP, Cooper GF (2016b) Binary classifier calibration using an ensemble of near isotonic regression models. In: 2016 IEEE 16th International Conference on data mining (ICDM). IEEE, pp 360–369
Naeini MP, Cooper GF, Hauskrecht M (2015a) Binary classifier calibration using a Bayesian non-parametric approach. In: Proceedings of the SIAM data mining (SDM) conference
Naeini MP, Cooper G, Hauskrecht M (2015b) Obtaining well calibrated probabilities using Bayesian binning. In: Proceedings of twenty-ninth AAAI conference on artificial intelligence
Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 10(3):61–74
Ramdas A, Tibshirani RJ (2016) Fast and flexible ADMM algorithms for trend filtering. J Comput Graph Stat 25(3):839–858
Robnik-Šikonja M, Kononenko I (2008) Explaining classifications for individual instances. IEEE Trans Knowl Data Eng 20(5):589–600
Russell S, Norvig P (2010) Artificial intelligence: a modern approach. Prentice Hall, Englewood Cliffs
Schwarz G et al (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Takahashi K, Takamura H, Okumura M (2009) Direct estimation of class membership probabilities for multiclass classification using multiple scores. Knowl Inf Syst 19(2):185–210
Tibshirani RJ, Hoefling H, Tibshirani R (2011) Nearly-isotonic regression. Technometrics 53(1):54–61
Wallace BC, Dahabreh IJ (2014) Improving class probability estimates for imbalanced data. Knowl Inf Syst 41(1):33–52
Whalen S, Pandey G (2013) A comparative analysis of ensemble classifiers: case studies in genomics. In: 2013 IEEE 13th international conference on data mining (ICDM). IEEE, pp 807–816
Zadrozny B, Elkan C (2001a) Learning and making decisions when costs and probabilities are both unknown. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 204–213
Zadrozny B, Elkan C (2001b) Obtaining calibrated probability estimates from decision trees and naïve Bayesian classifiers. In: Proceedings of the international conference on machine learning, pp 609–616
Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 694–699
Zhang H, Su J (2004) Naïve Bayesian classifiers for ranking. In: Proceedings of the European conference on machine learning (ECML). Springer, pp 501–512
Zhong LW, Kwok JT (2013) Accurate probability calibration for multiple classifiers. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1939–1945
Acknowledgements
We thank anonymous reviewers for their very useful comments and suggestions. Research reported in this publication was supported by Grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. It was also supported in part by NIH Grants R01GM088224 and R01LM012095. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This research was also supported by Grant #4100070287 from the Pennsylvania Department of Health. The Department specifically disclaims responsibility for any analyses, interpretations, or conclusions.
Cite this article
Pakdaman Naeini, M., Cooper, G.F. Binary classifier calibration using an ensemble of piecewise linear regression models. Knowl Inf Syst 54, 151–170 (2018). https://doi.org/10.1007/s10115-017-1133-2