Abstract
In this paper we explore some of the opportunities and challenges for machine learning on the Semantic Web. The Semantic Web provides standardized formats for representing both data and ontological background knowledge. Semantic Web standards are used to describe metadata, but they also have great potential as a general format for data communication and data integration. Within a broad range of possible applications, machine learning will play an increasingly important role: machine learning solutions have been developed to support the management of ontologies, to semi-automatically annotate unstructured data, and to integrate semantic information into web mining. Machine learning will increasingly be employed to analyze distributed data sources described in Semantic Web formats and to support approximate Semantic Web reasoning and querying. We discuss existing and future applications of machine learning on the Semantic Web, with a strong focus on learning algorithms that are suited to the relational character of the Semantic Web’s data structure. We also discuss particular aspects of learning that we expect to be relevant for the Semantic Web, such as scalability, missing and contradictory data, and the potential to integrate ontological background knowledge. In addition, we review some of the work on ontology learning and ontology population, mostly in the context of textual data.
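To make the relational character of Semantic Web data concrete, here is a minimal Python sketch under stated assumptions; it is not one of the learning algorithms discussed in the paper. It stores a toy graph as RDF-style (subject, predicate, object) triples, flattens the foaf:knows relation into an entity-by-entity matrix, and ranks unobserved pairs with a simple friend-of-a-friend count. The ex: resources and the helper names instances_of, relation_matrix and two_hop_scores are hypothetical; foaf:Person and foaf:knows are real FOAF terms. Unobserved cells are left as None rather than 0, reflecting the open-world assumption under which a missing triple is unknown rather than false, which is one reason missing data matters for learning on the Semantic Web.

```python
from typing import Optional

# Hypothetical toy graph of (subject, predicate, object) triples. The ex: resources
# are invented for illustration; foaf:Person and foaf:knows are real FOAF terms.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:carol", "rdf:type", "foaf:Person"),
    ("ex:dave", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:dave"),
}


def instances_of(triples: set[Triple], cls: str) -> list[str]:
    """Return all subjects typed as `cls`, in a stable (sorted) order."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


def relation_matrix(
    triples: set[Triple], predicate: str, entities: list[str]
) -> list[list[Optional[int]]]:
    """Flatten one binary relation into an entity x entity matrix.

    Observed triples become 1. All other cells are None ("unknown") rather than 0:
    under the open-world assumption, a missing triple is not evidence of falsehood.
    """
    observed = {(s, o) for s, p, o in triples if p == predicate}
    return [[1 if (s, o) in observed else None for o in entities] for s in entities]


def two_hop_scores(
    matrix: list[list[Optional[int]]], entities: list[str]
) -> dict[tuple[str, str], int]:
    """Count, for each unknown off-diagonal cell (i, j), the entities k with
    i -> k and k -> j both observed (a friend-of-a-friend heuristic).
    Only cells with at least one such path are returned."""
    n = len(entities)
    scores: dict[tuple[str, str], int] = {}
    for i in range(n):
        for j in range(n):
            if i == j or matrix[i][j] is not None:
                continue  # skip self-pairs and already observed links
            paths = sum(1 for k in range(n) if matrix[i][k] == 1 and matrix[k][j] == 1)
            if paths:
                scores[(entities[i], entities[j])] = paths
    return scores


if __name__ == "__main__":
    people = instances_of(TRIPLES, "foaf:Person")
    knows = relation_matrix(TRIPLES, "foaf:knows", people)
    for name, row in zip(people, knows):
        print(name, row)
    print(two_hop_scores(knows, people))
```

Running the script prints the partially observed matrix and scores the unobserved pairs (ex:alice, ex:carol) and (ex:alice, ex:dave) with one two-hop path each. A learned statistical relational model could take the place of the hand-written count; the heuristic is only a placeholder baseline.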
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tresp, V., Bundschus, M., Rettinger, A., Huang, Y. (2008). Towards Machine Learning on the Semantic Web. In: da Costa, P.C.G., et al. (eds.) Uncertainty Reasoning for the Semantic Web I. URSW 2005, URSW 2006, URSW 2007. Lecture Notes in Computer Science, vol. 5327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89765-1_17
DOI: https://doi.org/10.1007/978-3-540-89765-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89764-4
Online ISBN: 978-3-540-89765-1
eBook Packages: Computer Science, Computer Science (R0)