DOI: 10.1145/1557019.1557129

Information theoretic regularization for semi-supervised boosting

Published: 28 June 2009

Abstract

We present novel semi-supervised boosting algorithms that incrementally build linear combinations of weak classifiers through generic functional gradient descent, using both labeled and unlabeled training data. Our approach extends the information regularization framework to boosting, yielding loss functions that combine log loss on labeled data with information-theoretic measures that encode the unlabeled data. Although the information-theoretic regularization terms make the optimization non-convex, we propose simple sequential gradient descent algorithms and obtain substantially improved results on synthetic, benchmark, and real-world tasks over both supervised boosting algorithms that use the labeled data alone and a state-of-the-art semi-supervised boosting algorithm.
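
To make the approach concrete, here is a rough sketch of the kind of objective the abstract describes. Assuming a LogitBoost-style model p_F(y|x) = 1 / (1 + exp(-2 y F(x))), a semi-supervised boosting loss might take the form L(F) = -sum over labeled (x_i, y_i) of log p_F(y_i | x_i) + lambda * sum over unlabeled x_j of H(p_F(. | x_j)), where H is the Shannon entropy and the ensemble F is grown by fitting each weak learner to the negative functional gradient of L. The Python sketch below illustrates that generic recipe; the function names, the entropy form of the regularizer, and the shallow-tree weak learner are assumptions for illustration, not the paper's exact formulation.

# Hypothetical sketch: semi-supervised boosting with log loss on labeled data
# and an entropy-style regularizer on unlabeled data, optimized by generic
# functional gradient descent. Illustrative only, not the paper's algorithm.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def semi_supervised_boost(X_lab, y_lab, X_unl, rounds=50, lam=0.1, step=0.5):
    """y_lab takes values in {-1, +1}; returns a list of (weak_learner, step) pairs."""
    X_all = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    F = np.zeros(len(X_all))            # current ensemble scores F(x)
    ensemble = []
    for _ in range(rounds):
        grad = np.zeros_like(F)
        # Gradient of the log loss -log p_F(y|x) on labeled points,
        # with p_F(y|x) = sigmoid(2 y F(x)).
        grad[:n_lab] = -2.0 * y_lab * sigmoid(-2.0 * y_lab * F[:n_lab])
        # Gradient of the entropy H(p_F(.|x)) on unlabeled points; minimizing
        # it pushes unlabeled predictions toward confident (low-entropy) values.
        p_unl = np.clip(sigmoid(2.0 * F[n_lab:]), 1e-6, 1 - 1e-6)
        grad[n_lab:] = lam * np.log((1 - p_unl) / p_unl) * 2.0 * p_unl * (1 - p_unl)
        # Fit a weak regressor (a shallow tree) to the negative functional
        # gradient and take a small step along it.
        h = DecisionTreeRegressor(max_depth=2).fit(X_all, -grad)
        F += step * h.predict(X_all)
        ensemble.append((h, step))
    return ensemble

def predict(ensemble, X):
    F = sum(step * h.predict(np.asarray(X)) for h, step in ensemble)
    return np.sign(F)

Because the unlabeled-data term is non-convex in F, this greedy round-by-round descent only reaches a local optimum, which is consistent with the simple sequential optimization strategy the abstract describes.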

Supplementary Material

JPG File (p1017-zheng.jpg)
MP4 File (p1017-zheng.mp4)





    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. ensemble
    2. semi-supervised learning

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


    Cited By

    • (2018) Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-018-0163-5. Online publication date: 27-Dec-2018.
    • (2017) A Cluster-Based Semisupervised Ensemble for Multiclass Classification. IEEE Transactions on Emerging Topics in Computational Intelligence, 1(6):408-420. DOI: 10.1109/TETCI.2017.2743219. Online publication date: Dec-2017.
    • (2015) A direct boosting approach for semi-supervised classification. Proceedings of the 24th International Conference on Artificial Intelligence, 4025-4032. DOI: 10.5555/2832747.2832810. Online publication date: 25-Jul-2015.
    • (2014) Structured Sparse Boosting for Graph Classification. ACM Transactions on Knowledge Discovery from Data, 9(1):1-22. DOI: 10.1145/2629328. Online publication date: 25-Aug-2014.
    • (2014) Named Entity Extraction via Automatic Labeling and Tri-training: Comparison of Selection Methods. Information Retrieval Technology, 244-255. DOI: 10.1007/978-3-319-12844-3_21. Online publication date: 2014.
    • (2013) A robust semi-supervised boosting method using linear programming. 2013 IEEE Global Conference on Signal and Information Processing, 1101-1104. DOI: 10.1109/GlobalSIP.2013.6737086. Online publication date: Dec-2013.
    • (2012) Semisupervised Classification With Cluster Regularization. IEEE Transactions on Neural Networks and Learning Systems, 23(11):1779-1792. DOI: 10.1109/TNNLS.2012.2214488. Online publication date: Nov-2012.
    • (2012) Semi-Supervised Logistic Discrimination Via Graph-Based Regularization. Neural Processing Letters, 36(3):203-216. DOI: 10.1007/s11063-012-9231-3. Online publication date: 29-May-2012.
    • (2012) Boosting Algorithms: A Review of Methods, Theory, and Applications. Ensemble Machine Learning, 35-85. DOI: 10.1007/978-1-4419-9326-7_2. Online publication date: 19-Jan-2012.
    • (2010) Boosting with structure information in the functional space. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 643-652. DOI: 10.1145/1835804.1835886. Online publication date: 25-Jul-2010.
