Abstract
Today’s knowledge bases (KBs) capture facts about the world’s entities, their properties, and their semantic relationships in the form of subject-predicate-object (SPO) triples. Domain-oriented KBs, such as DBpedia, Yago, Wikidata or Freebase, capture billions of facts that have been (semi-)automatically extracted from Wikipedia articles. Their commercial counterparts at Google, Bing or Baidu provide back-end support for search engines, online recommendations, and various knowledge-centric services.
This invited talk provides an overview of our recent contributions—and also highlights a number of open research challenges—in the context of extracting, managing, and reasoning with large semantic KBs. Compared to domain-oriented extraction techniques, we aim to acquire facts for a much broader set of predicates. Compared to open-domain extraction methods, the SPO arguments of our facts are canonicalized, thus referring to unique entities with semantically typed predicates. A core part of our work focuses on developing scalable inference techniques for querying an uncertain KB in the form of a probabilistic database. A further, very recent research focus lies also in scaling out these techniques to a distributed setting. Here, we aim to process declarative queries, posed in either SQL or logical query languages such as Datalog, via a proprietary, asynchronous communication protocol based on the Message Passing Interface.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB 10(13), 2049–2060 (2017)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Ives, Z.: DBpedia: a nucleus for a web of open data. In: ISWC, pp. 11–15 (2007)
Benjelloun, O., Sarma, A.D., Halevy, A.Y., Theobald, M., Widom, J.: Databases with uncertainty and lineage. VLDB J. 17(2), 243–264 (2008)
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp. 1247–1250 (2008)
Dylla, M., Miliaraki, I., Theobald, M.: A temporal-probabilistic database model for information extraction. PVLDB 6(14), 1810–1821 (2013a)
Dylla, M., Miliaraki, I., Theobald, M.: Top-k query processing in probabilistic databases with non-materialized views. In: ICDE, pp. 122–133 (2013b)
Dylla, M., Sozio, M., Theobald, M.: Resolving temporal conflicts in inconsistent RDF knowledge bases. In: BTW, pp. 474–493 (2011)
Dylla, M., Theobald, M., Miliaraki, I.: Querying and learning in probabilistic databases. In: Reasoning Web, pp. 313–368 (2014)
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing. In: SIGMOD, pp. 289–300 (2014a)
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: Using graph summarization for join-ahead pruning in a distributed RDF engine. In: SWIM, pp. 41:1–41:4 (2014b)
Gurajada, S., Theobald, M.: Distributed processing of generalized graph-pattern queries in SPARQL 1.1. CoRR, abs/1609.05293 (2016a)
Gurajada, S., Theobald, M.: Distributed set reachability. In: SIGMOD, pp. 1247–1261 (2016b)
Hoffart, J., Seufert, S., Nguyen, D.B., Theobald, M., Weikum, G.: KORE: keyphrase overlap relatedness for entity disambiguation. In: CIKM, pp. 545–554 (2012)
Mutsuzaki, M., et al.: Trio-one: layering uncertainty and lineage on a conventional DBMS. In: CIDR, pp. 269–274 (2007)
Nakashole, N., Theobald, M., Weikum, G.: Scalable knowledge harvesting with high precision and high recall. In: WSDM, pp. 227–236 (2011)
Nakashole, N., Weikum, G., Suchanek, F.M.: PATTY: a taxonomy of relational patterns with semantic types. In: EMNLP-CoNLL, pp. 1135–1145 (2012)
Nguyen, D.B., Abujabal, A., Tran, K., Theobald, M., Weikum, G.: Query-driven on-the-fly knowledge base construction. PVLDB 11(1), 66–79 (2017a)
Nguyen, D.B., Hoffart, J., Theobald, M., Weikum, G.: AIDA-light: high-throughput named-entity disambiguation. In: LDOW (2014)
Nguyen, D.B., Theobald, M., Weikum, G.: J-NERD: joint named entity recognition and disambiguation with rich linguistic features. TACL 4, 215–229 (2016)
Nguyen, D.B., Theobald, M., Weikum, G.: J-REED: joint relation extraction and entity disambiguation. In: CIKM, pp. 2227–2230 (2017b)
Papaioannou, K., Theobald, M., Böhlen, M.H.: Supporting set operations in temporal-probabilistic databases. In: ICDE, pp. 1180–1191 (2018)
Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic databases. Synth. Lect. Data Manag. 3(2), 1–180 (2011)
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Comm. of the ACM 57(10), 78–85 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Theobald, M. (2019). From Big Data to Big Knowledge. In: Catania, B., Královič, R., Nawrocki, J., Pighizzini, G. (eds) SOFSEM 2019: Theory and Practice of Computer Science. SOFSEM 2019. Lecture Notes in Computer Science(), vol 11376. Springer, Cham. https://doi.org/10.1007/978-3-030-10801-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-10801-4_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10800-7
Online ISBN: 978-3-030-10801-4
eBook Packages: Computer ScienceComputer Science (R0)