Article

Using asymmetric distributions to improve text classifier probability estimates

Author:

Paul N. Bennett

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 111 - 118

https://doi.org/10.1145/860435.860457

Published: 28 July 2003

Abstract

Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than committing to a single fixed decision threshold, they can be used in a Bayesian risk model to issue a run-time decision that minimizes a user-specified cost function chosen dynamically at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers into high-quality estimates, and we introduce new models motivated by the intuition that the empirical score distributions for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items are often significantly different. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
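As a rough illustration of the two ideas in the abstract (mapping raw classifier scores to probabilities through class-conditional score densities, then choosing the action that minimizes expected cost at prediction time), the following Python sketch may help. It is not the paper's implementation: the abstract does not state the form of the models, so the asymmetric Laplace parameterization used here (mode `theta`, left decay rate `beta`, right decay rate `gamma`), the function names, and every parameter value are assumptions chosen only for illustration.

```python
import math


def asym_laplace_pdf(s, theta, beta, gamma):
    """Asymmetric Laplace density (assumed parameterization).

    theta is the mode; beta and gamma are the exponential decay rates
    to the left and right of the mode. beta == gamma gives the ordinary
    symmetric Laplace density.
    """
    norm = beta * gamma / (beta + gamma)  # makes the density integrate to 1
    if s <= theta:
        return norm * math.exp(-beta * (theta - s))
    return norm * math.exp(-gamma * (s - theta))


def posterior_positive(s, pos_params, neg_params, prior_pos):
    """P(positive | score s) via Bayes' rule over the two class densities."""
    p_pos = asym_laplace_pdf(s, *pos_params) * prior_pos
    p_neg = asym_laplace_pdf(s, *neg_params) * (1.0 - prior_pos)
    return p_pos / (p_pos + p_neg)


def min_risk_decision(p_pos, cost_fp, cost_fn):
    """Return True (predict positive) if that action has lower expected cost.

    cost_fp is the cost of a false positive, cost_fn of a false negative;
    correct decisions are assumed to cost nothing.
    """
    risk_if_positive = (1.0 - p_pos) * cost_fp
    risk_if_negative = p_pos * cost_fn
    return risk_if_positive < risk_if_negative


if __name__ == "__main__":
    # Hypothetical fitted parameters (theta, beta, gamma) for each class.
    pos_params = (1.5, 0.8, 2.0)   # positives: long tail toward low scores
    neg_params = (-1.0, 3.0, 0.9)  # negatives: long tail toward high scores
    prior_pos = 0.1                # assumed class prior

    for score in (-2.0, 0.0, 1.0, 2.5):
        p = posterior_positive(score, pos_params, neg_params, prior_pos)
        # The same posterior is reused under two different cost settings.
        print(score, round(p, 4),
              min_risk_decision(p, cost_fp=1.0, cost_fn=1.0),
              min_risk_decision(p, cost_fp=1.0, cost_fn=10.0))
```

Because the costs enter only in `min_risk_decision`, the same calibrated posterior can serve different user-specified cost functions without refitting anything, which is the practical appeal of probability outputs over a single fixed threshold.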



Index Terms

Using asymmetric distributions to improve text classifier probability estimates
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval



Published In

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

July 2003

490 pages

ISBN:1581136463

DOI:10.1145/860435

General Chairs: Charles Clarke (University of Waterloo, Canada), Gordon Cormack (University of Waterloo, Canada)

Program Chairs: Jamie Callan (Carnegie Mellon University, Pittsburgh, PA), David Hawking (Australian National University, Australia), Alan Smeaton (Dublin City University, Ireland)

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SIGIR03

Sponsor:

SIGIR

SIGIR03: The 26th ACM/SIGIR International Symposium on Information Retrieval

July 28 - August 1, 2003

Toronto, Canada

Acceptance Rates

SIGIR '03 Paper Acceptance Rate 46 of 266 submissions, 17%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%
