Abstract
With the rapid growth of semantic data, analyzing data at this scale has become a hot topic. Traditional triple stores deployed on a single machine have proved effective for storing and retrieving RDF data. However, their scalability is limited, and they cannot handle billions of ever-growing triples. Hadoop, on the other hand, is an open-source project that provides HDFS as a distributed file storage system and MapReduce as a computing framework for distributed processing. It has been shown to perform well for large-scale data analysis.
In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. It benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experimental evaluation results show the effectiveness and efficiency of the approach.
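The general idea of pairing per-node triple stores with a MapReduce-style computation can be illustrated with a small sketch. This is not the HadoopRDF implementation, which the abstract does not describe; it is a toy, single-process illustration that assumes triples are hash-partitioned by subject, that each partition stands in for a local triple store, and that a query is a single triple pattern. The function names (partition_triples, match_pattern, distributed_query) are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_triples(triples, num_nodes):
    """Assign each (subject, predicate, object) triple to a node by hashing its subject.

    Assumption: subject hashing is only one possible partitioning scheme.
    """
    partitions = defaultdict(list)
    for s, p, o in triples:
        node = zlib.crc32(s.encode("utf-8")) % num_nodes
        partitions[node].append((s, p, o))
    return partitions


def match_pattern(store, pattern):
    """'Map' step: a local store evaluates one triple pattern.

    A None in the pattern plays the role of a SPARQL variable.
    """
    return [
        triple
        for triple in store
        if all(q is None or q == v for q, v in zip(pattern, triple))
    ]


def distributed_query(partitions, pattern):
    """'Reduce' step: merge the partial results from every node."""
    results = []
    for node in sorted(partitions):
        results.extend(match_pattern(partitions[node], pattern))
    return sorted(results)


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "age", "30"),
    ]
    parts = partition_triples(triples, num_nodes=2)
    # Roughly: SELECT ?s ?o WHERE { ?s knows ?o }
    print(distributed_query(parts, (None, "knows", None)))
```

Because all triples sharing a subject land on the same node under this assumed scheme, patterns that join on the subject could be answered locally; joins across subjects would need an extra shuffle step, which the sketch omits.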
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, JH., Wang, HF., Ni, Y., Yu, Y. (2012). HadoopRDF: A Scalable Semantic Data Analytical Engine. In: Huang, DS., Ma, J., Jo, KH., Gromiha, M.M. (eds) Intelligent Computing Theories and Applications. ICIC 2012. Lecture Notes in Computer Science, vol 7390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31576-3_80
DOI: https://doi.org/10.1007/978-3-642-31576-3_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31575-6
Online ISBN: 978-3-642-31576-3
eBook Packages: Computer Science, Computer Science (R0)