DOI: 10.1145/3401832.3402681
research-article

Ontology mediated information extraction in financial domain with Mastro System-T

Published: 15 June 2020

Abstract

Information extraction (IE) refers to the task of turning text documents into a structured form, so that the information they contain can be processed automatically. Ontology Mediated Information Extraction (OMIE) is a new paradigm for IE that seeks to exploit the semantic knowledge expressed in ontologies to improve query answering over unstructured data (more precisely, raw text). In this paper we present Mastro System-T, an OMIE tool developed jointly by the University of Rome "La Sapienza" and IBM Research Almaden, together with its first application in the financial domain, namely facilitating access to, and sharing of, data extracted from the EDGAR system.
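The abstract describes OMIE only at a high level. As a rough illustration of the paradigm, and not of Mastro System-T's actual interfaces, the Python sketch below assumes a toy ontology with two subclass axioms and a hypothetical regex rule standing in for a real extractor. All names (`SUBCLASS_AXIOMS`, `ENTITY_RULE`, `extract_assertions`, `rewrite`, `answer`) are invented for illustration. Facts extracted from raw text become concept assertions, and an atomic query is rewritten with the ontology so that asking for `Organization` also returns entities that were extracted only as `Company` or `Bank`.

```python
import re
from collections import defaultdict

# Toy ontology (assumption): atomic subclass axioms "Sub ⊑ Super",
# the simplest kind of axiom expressible in DL-Lite / OWL 2 QL.
SUBCLASS_AXIOMS = [("Bank", "Company"), ("Company", "Organization")]

# Hypothetical extraction rule standing in for a real extractor:
# a sequence of capitalized words followed by a legal-form suffix.
ENTITY_RULE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) (Inc|Corp|Bank)\b")


def extract_assertions(text):
    """Turn raw text into ontology assertions of the form (concept, individual)."""
    facts = set()
    for match in ENTITY_RULE.finditer(text):
        name, suffix = match.group(1), match.group(2)
        concept = "Bank" if suffix == "Bank" else "Company"
        facts.add((concept, f"{name} {suffix}"))
    return facts


def rewrite(query_concept):
    """Rewrite an atomic query into the set of concepts that entail it:
    the query concept plus all of its direct and indirect subclasses."""
    subs = defaultdict(set)
    for sub, sup in SUBCLASS_AXIOMS:
        subs[sup].add(sub)
    result, frontier = {query_concept}, [query_concept]
    while frontier:
        for sub in subs[frontier.pop()]:
            if sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def answer(query_concept, facts):
    """Evaluate the rewritten query directly over the extracted facts."""
    concepts = rewrite(query_concept)
    return sorted(ind for concept, ind in facts if concept in concepts)


doc = "Acme Widgets Inc. filed a 10-K. First Federal Bank reported earnings."
facts = extract_assertions(doc)
print(answer("Organization", facts))  # → ['Acme Widgets Inc', 'First Federal Bank']
```

Rewriting the query instead of materializing inferred facts mirrors the query-rewriting style of DL-Lite-based ontology-based data access; a real OMIE system would handle far richer ontologies, extraction rules and query languages than this toy.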


Cited By

  • (2023) A knowledge recommendation approach in design for multi-material 4D printing based on semantic similarity vector space model and case-based reasoning. Computers in Industry 145 (103824). DOI: 10.1016/j.compind.2022.103824. Online publication date: Feb-2023.
  • (2022) Graphol: A Graphical Language for Ontology Modeling Equivalent to OWL 2. Future Internet 14:3 (78). DOI: 10.3390/fi14030078. Online publication date: 28-Feb-2022.
  • (2022) Demonstrating ASET: Ad-hoc Structured Exploration of Text Collections. Proceedings of the 2022 International Conference on Management of Data, 2393-2396. DOI: 10.1145/3514221.3520174. Online publication date: 10-Jun-2022.
  • (2021) Database Principles and Challenges in Text Analysis. ACM SIGMOD Record 50:2 (6-17). DOI: 10.1145/3484622.3484624. Online publication date: 31-Aug-2021.


      Published In

      ACM Conferences
      DSMM '20: Proceedings of the Sixth International Workshop on Data Science for Macro-Modeling
      June 2020
      23 pages
      ISBN:9781450380300
      DOI:10.1145/3401832
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. financial domain
      2. information extraction
      3. ontology
      4. ontology based data access
      5. ontology mediated information extraction

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '20

      Acceptance Rates

      Overall Acceptance Rate 32 of 64 submissions, 50%

      Article Metrics

      • Downloads (last 12 months): 15
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 10 Oct 2024.
