HadoopRDF: A Scalable Semantic Data Analytical Engine

Jin-Hang Du²³,
Hao-Fen Wang²³,
Yuan Ni²⁴ &
…
Yong Yu²³

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7390)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

With the rapid growth of semantic data, analyzing data at this scale has become a hot topic. Traditional triple stores deployed on a single machine have proved effective for storing and retrieving RDF data. However, their scalability is limited, and they cannot handle billions of ever-growing triples. Hadoop, on the other hand, is an open-source project that provides HDFS as a distributed file storage system and MapReduce as a computing framework for distributed processing. It has proved to perform well for large-scale data analysis.

In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. It benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experimental evaluation results show the effectiveness and efficiency of the approach.
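
The general idea of pairing local triple stores with a map-and-merge computation can be illustrated with a small, single-process sketch. This is not the HadoopRDF implementation, which the abstract does not describe. The partitioning rule (CRC32 of the subject), the function names, and the `ex:` sample data are all assumptions made for illustration. Each partition stands in for a node-local triple store, a triple pattern with `None` as a wildcard stands in for a SPARQL triple pattern, and the loop over partitions stands in for the map and merge phases.

```python
import zlib
from collections import defaultdict

# Toy (subject, predicate, object) triples; the ex: names are made up.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:carol", "ex:worksAt", "ex:acme"),
]


def partition_by_subject(triples, num_nodes):
    """Assign each triple to a partition using a deterministic hash of its subject.

    Assumption: subject hashing is only one possible placement rule.
    """
    partitions = defaultdict(list)
    for triple in triples:
        node_id = zlib.crc32(triple[0].encode("utf-8")) % num_nodes
        partitions[node_id].append(triple)
    return partitions


def match_pattern(triples, pattern):
    """Local lookup: a None slot in the pattern matches any value."""
    return [
        t for t in triples
        if all(q is None or q == v for q, v in zip(pattern, t))
    ]


def distributed_match(partitions, pattern):
    """Map: every partition answers the pattern locally. Merge: combine results."""
    results = []
    for node_id in sorted(partitions):
        results.extend(match_pattern(partitions[node_id], pattern))
    return sorted(results)


if __name__ == "__main__":
    parts = partition_by_subject(TRIPLES, num_nodes=3)
    # Roughly: SELECT ?s WHERE { ?s ex:worksAt ex:acme }
    for s, _, _ in distributed_match(parts, (None, "ex:worksAt", "ex:acme")):
        print(s)
```

Because a single triple pattern can be answered independently on every partition, the merged result does not depend on how many partitions are used. Joins across several patterns are where data placement starts to matter, and this sketch does not cover them.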

Author information

Authors and Affiliations

Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, 200240, China
Jin-Hang Du, Hao-Fen Wang & Yong Yu
IBM China Research Lab, China
Yuan Ni

Editor information

Editors and Affiliations

School of Electronics and Information Engineering, Machine Learning and Systems Biology Laboratory, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
Faculty of Computer and Information Sciences, Hosei University, 3-7-2, Kajino-Cho, Koganei-Shi, Japan
Jianhua Ma
School of Electrical Engineering, University of Ulsan, #7-413, San 29, Muger Dong, 680-749, Ulsan, South Korea
Kang-Hyun Jo
Department of Biotechnology, Indian Institute of Technology (IIT) Madras, 600 036, Chennai, Tamilnadu, India
M. Michael Gromiha

Cite this paper

Du, JH., Wang, HF., Ni, Y., Yu, Y. (2012). HadoopRDF: A Scalable Semantic Data Analytical Engine. In: Huang, DS., Ma, J., Jo, KH., Gromiha, M.M. (eds) Intelligent Computing Theories and Applications. ICIC 2012. Lecture Notes in Computer Science, vol 7390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31576-3_80

DOI: https://doi.org/10.1007/978-3-642-31576-3_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31575-6
Online ISBN: 978-3-642-31576-3
eBook Packages: Computer Science, Computer Science (R0)
