DOI: 10.1145/860435.860457
Article

Using asymmetric distributions to improve text classifier probability estimates

Published: 28 July 2003

Abstract

Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than relying on one fixed decision threshold, they can be used in a Bayesian risk model to issue a run-time decision that minimizes a user-specified cost function chosen dynamically at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers into high-quality estimates, and we introduce new models motivated by the intuition that the empirical score distributions for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items often differ significantly. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
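The pipeline the abstract outlines (model the score distribution of each class, turn a score into a probability with Bayes' rule, then let a Bayesian risk model pick the decision from costs supplied at prediction time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact method: each class's scores are assumed to follow an asymmetric Laplace density with mode θ and separate decay rates β (left of the mode) and γ (right of the mode), fit by maximum likelihood; only two error costs are modelled; and all names (`AsymmetricLaplace`, `fit_asymmetric_laplace`, `posterior_positive`, `decide`) and the toy scores are hypothetical. For a fixed θ, with D_L = Σ_{x≤θ}(θ - x) and D_R = Σ_{x>θ}(x - θ), the likelihood is maximized by β = n / (√D_L (√D_L + √D_R)) and γ = n / (√D_R (√D_L + √D_R)), which leaves √D_L + √D_R to be minimized over θ.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Hypothetical helpers; the asymmetric Laplace is an assumed choice of
# class-conditional score density, used here only for illustration.


@dataclass
class AsymmetricLaplace:
    """f(x) = c * exp(-beta * (theta - x))  if x <= theta
       f(x) = c * exp(-gamma * (x - theta)) if x >  theta
    with c = beta * gamma / (beta + gamma), so f integrates to 1."""
    theta: float  # mode
    beta: float   # decay rate left of the mode
    gamma: float  # decay rate right of the mode

    def logpdf(self, x: float) -> float:
        log_c = math.log(self.beta * self.gamma / (self.beta + self.gamma))
        if x <= self.theta:
            return log_c - self.beta * (self.theta - x)
        return log_c - self.gamma * (x - self.theta)


def fit_asymmetric_laplace(scores: Sequence[float]) -> AsymmetricLaplace:
    """Maximum-likelihood fit. For fixed theta the rates have a closed form
    and the profile objective is sqrt(D_left) + sqrt(D_right). That objective
    is concave between consecutive scores, so only observed scores are tried
    as theta (O(n^2), fine for a sketch)."""
    n = len(scores)
    best = None
    for theta in sorted(set(scores)):
        d_left = sum(theta - x for x in scores if x <= theta)
        d_right = sum(x - theta for x in scores if x > theta)
        if d_left == 0 or d_right == 0:
            continue  # would need an infinite rate; keep only finite-rate fits
        objective = math.sqrt(d_left) + math.sqrt(d_right)
        if best is None or objective < best[0]:
            best = (objective, theta, d_left, d_right)
    if best is None:
        raise ValueError("need at least three distinct scores")
    _, theta, d_left, d_right = best
    s_l, s_r = math.sqrt(d_left), math.sqrt(d_right)
    beta = n / (s_l * (s_l + s_r))
    gamma = n / (s_r * (s_l + s_r))
    return AsymmetricLaplace(theta, beta, gamma)


def posterior_positive(score: float, pos: AsymmetricLaplace,
                       neg: AsymmetricLaplace, prior_pos: float) -> float:
    """P(+ | score) by Bayes' rule; prior_pos must lie strictly in (0, 1)."""
    a = math.log(prior_pos) + pos.logpdf(score)
    b = math.log(1.0 - prior_pos) + neg.logpdf(score)
    # Numerically stable logistic of (a - b).
    if a >= b:
        return 1.0 / (1.0 + math.exp(b - a))
    e = math.exp(a - b)
    return e / (1.0 + e)


def decide(p_pos: float, cost_fp: float, cost_fn: float) -> bool:
    """Minimum expected cost decision (correct decisions cost 0).
    Saying "+" costs (1 - p) * cost_fp in expectation; saying "-" costs
    p * cost_fn. The costs can be supplied at prediction time."""
    return (1.0 - p_pos) * cost_fp < p_pos * cost_fn


if __name__ == "__main__":
    # Toy classifier scores, made up for illustration.
    pos_scores = [0.5, 1.2, 1.8, 2.0, 2.3, 3.1, 4.0]
    neg_scores = [-3.5, -2.2, -1.9, -1.5, -1.1, -0.4, 0.3]
    pos = fit_asymmetric_laplace(pos_scores)
    neg = fit_asymmetric_laplace(neg_scores)
    prior = len(pos_scores) / (len(pos_scores) + len(neg_scores))
    p = posterior_positive(0.0, pos, neg, prior)
    print(f"P(+ | score=0.0) = {p:.3f}")
    print(decide(p, cost_fp=1.0, cost_fn=1.0))  # equal costs: False
    print(decide(p, cost_fp=1.0, cost_fn=5.0))  # missed positive 5x worse: True
```

On these toy scores the posterior at a score of 0.0 comes out near 0.29, so equal costs yield a negative decision while a fivefold cost for missing a positive flips it to positive without retraining anything. Giving each side of the mode its own rate lets a class's tail toward the decision boundary decay differently from its far tail, which is one plausible way to reflect the abstract's point that "extremely irrelevant" and "hard to discriminate" items behave differently.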

      Information

      Published In

      ACM Conferences
      SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2003
      490 pages
      ISBN:1581136463
      DOI:10.1145/860435
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 July 2003

      Author Tags

      1. active learning
      2. classifier combination
      3. cost-sensitive learning
      4. text classification

      Qualifiers

      • Article

      Conference

      SIGIR '03

      Acceptance Rates

      SIGIR '03 paper acceptance rate: 46 of 266 submissions (17%)
      Overall acceptance rate: 792 of 3,983 submissions (20%)

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 11
      • Downloads (last 6 weeks): 2
      Reflects downloads up to 14 Nov 2024

      Cited By

      • (2020) An Experimental Investigation of Calibration Techniques for Imbalanced Data. IEEE Access 8, 127343-127352. DOI: 10.1109/ACCESS.2020.3008150. Online publication date: 2020.
      • (2020) On the appropriateness of platt scaling in classifier calibration. Information Systems, 101641. DOI: 10.1016/j.is.2020.101641. Online publication date: Sep-2020.
      • (2015) Simultaneous high-dimensional clustering and feature selection using asymmetric Gaussian mixture models. Image and Vision Computing 34:C, 27-41. DOI: 10.1016/j.imavis.2014.10.011. Online publication date: 1-Feb-2015.
      • (2014) Large-scale high-precision topic modeling on twitter. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 1907-1916. DOI: 10.1145/2623330.2623336. Online publication date: 24-Aug-2014.
      • (2014) Rates of convergence of extreme for asymmetric normal distribution. Statistics & Probability Letters 84, 158-168. DOI: 10.1016/j.spl.2013.10.003. Online publication date: Jan-2014.
      • (2013) A Descriptive Bayesian Approach to Modeling and Calibrating Drivers' En Route Diversion Behavior. IEEE Transactions on Intelligent Transportation Systems 14:4, 1817-1824. DOI: 10.1109/TITS.2013.2270974. Online publication date: 1-Dec-2013.
      • (2012) Coreference Resolution: To What Extent Does It Help NLP Applications? Text, Speech and Dialogue, 16-27. DOI: 10.1007/978-3-642-32790-2_2. Online publication date: 2012.
      • (2012) Bayesian Decision Theory. Data Fusion: Concepts and Ideas, 273-293. DOI: 10.1007/978-3-642-27222-6_13. Online publication date: 10-Feb-2012.
      • (2011) Smooth receiver operating characteristics (smROC) curves. Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II, 193-208. DOI: 10.5555/2034117.2034131. Online publication date: 5-Sep-2011.
      • (2011) Variational bayes for modeling score distributions. Information Retrieval 14:1, 47-67. DOI: 10.1007/s10791-010-9156-2. Online publication date: 1-Feb-2011.