Abstract
In this paper we propose a methodology for automatically retrieving topic-specific document collections from the web, organizing them, and keeping them up to date over time according to users' specific, persistent information needs. The collected documents are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer supports the exploration of large document sets while monitoring and recording user interaction with these collections. The quality of the system is monitored continuously: the system periodically measures its quality parameters and stores their values. Using this quality log, it is possible to maintain the quality of the resources by triggering procedures aimed at correcting or preventing quality degradation.
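The quality-log mechanism described in the abstract (periodic measurement, storage, and triggering of corrective procedures) can be illustrated with a minimal sketch. This is not the webTopic implementation. The abstract names neither the quality parameters nor the trigger rule, so the class `QualityLog`, the parameter names `precision` and `freshness`, the `window`/`tolerance` rule, and the `on_degradation` callback are all hypothetical. The sketch assumes every parameter is "higher is better" and flags a parameter when its latest value falls more than `tolerance` below the mean of the preceding `window` measurements.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class QualityLog:
    """Hypothetical store of periodic quality measurements: (timestamp, {parameter: value})."""
    entries: List[Tuple[int, Dict[str, float]]] = field(default_factory=list)

    def record(self, timestamp: int, metrics: Dict[str, float]) -> None:
        # Append one periodic measurement of the quality parameters.
        self.entries.append((timestamp, dict(metrics)))

    def degraded(self, window: int = 3, tolerance: float = 0.05) -> List[str]:
        """Return parameters whose latest value fell more than `tolerance`
        below the mean of the previous `window` measurements (assumed rule)."""
        if len(self.entries) < window + 1:
            return []  # not enough history to establish a baseline
        latest = self.entries[-1][1]
        history = [m for _, m in self.entries[-window - 1:-1]]
        flagged = []
        for name, value in latest.items():
            past = [m[name] for m in history if name in m]
            if past and value < mean(past) - tolerance:
                flagged.append(name)
        return sorted(flagged)


def monitor(log: QualityLog, on_degradation: Callable[[List[str]], None]) -> None:
    # Trigger a (hypothetical) corrective procedure only when some parameter degraded.
    flagged = log.degraded()
    if flagged:
        on_degradation(flagged)


if __name__ == "__main__":
    log = QualityLog()
    # Hypothetical measurements: precision drops sharply at the last time step.
    for t, p in enumerate([0.80, 0.82, 0.81, 0.70]):
        log.record(t, {"precision": p, "freshness": 0.9})
    monitor(log, lambda params: print("corrective procedure for:", params))
```

Running the script prints `corrective procedure for: ['precision']`, since 0.70 is more than 0.05 below the baseline mean of 0.81, while `freshness` stays flat. A moving-average baseline is used here only for simplicity; a real system might instead apply per-parameter thresholds or preventive rules that fire before a parameter actually degrades.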
Supported by the POSC/EIA/58367/2004/Site-o-Matic Project (Fundação Ciência e Tecnologia), FEDER, and the Programa de Financiamento Plurianual de Unidades de I & D.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Escudeiro, N.F., Jorge, A.M. (2006). Semi-automatic Creation and Maintenance of Web Resources with webTopic. In: Ackermann, M., et al. Semantics, Web and Mining. EWMF KDO 2005. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_6
DOI: https://doi.org/10.1007/11908678_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47697-9
Online ISBN: 978-3-540-47698-6
eBook Packages: Computer Science, Computer Science (R0)