Abstract
We propose a generic multiple classifier system, built solely from pairwise classifiers, for classifying web pages. Web page classification is attracting considerable attention because it can improve the accuracy of search engines and help summarize web content for small-screen handheld devices. We use a Support Vector Machine (SVM) as the core pairwise classifier. The proposed system has produced very encouraging results on the problem of web page classification. The solution is fully generic and should be applicable to a wide range of multi-class pattern recognition problems.
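As an illustration of the pairwise idea, the following is a minimal sketch and not the authors' implementation. It assumes that one binary classifier is trained for every pair of classes and that the pairwise decisions are combined by simple majority voting; the abstract does not state the combination rule, so the voting step is an assumption. To keep the example dependency-free, a nearest-centroid rule stands in for the SVM, and the class names and feature vectors are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def train_centroid_pair(xs_a, xs_b, label_a, label_b):
    """Stand-in binary classifier (NOT an SVM): picks the nearer class centroid."""
    def centroid(xs):
        return [sum(col) / len(xs) for col in zip(*xs)]

    ca, cb = centroid(xs_a), centroid(xs_b)

    def predict(x):
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return label_a if da <= db else label_b

    return predict


def train_pairwise(samples, labels):
    """Train one binary classifier per unordered pair of classes."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return [
        train_centroid_pair(by_class[a], by_class[b], a, b)
        for a, b in combinations(sorted(by_class), 2)
    ]


def classify(pair_classifiers, x):
    """Assumed combination rule: each pairwise classifier casts one vote."""
    votes = Counter(clf(x) for clf in pair_classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three page categories, two numeric features each.
samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ["news", "news", "shop", "shop", "blog", "blog"]

classifiers = train_pairwise(samples, labels)
prediction = classify(classifiers, [0.2, 0.5])
```

With K classes this scheme trains K(K-1)/2 binary classifiers, so the three toy classes yield three pairwise classifiers. Vote ties are not handled specially in this sketch.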
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alam, H., Rahman, F., Tarnikova, Y. (2003). Solving Problems Two at a Time: Classification of Web Pages Using a Generic Pair-Wise Multiple Classifier System. In: Windeatt, T., Roli, F. (eds) Multiple Classifier Systems. MCS 2003. Lecture Notes in Computer Science, vol 2709. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44938-8_39
DOI: https://doi.org/10.1007/3-540-44938-8_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40369-2
Online ISBN: 978-3-540-44938-6
eBook Packages: Springer Book Archive