HadoopRDF: A Scalable Semantic Data Analytical Engine

  • Conference paper
Intelligent Computing Theories and Applications (ICIC 2012)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7390))

Abstract

With the rapid growth of semantic data, analyzing such large-scale data has become a hot topic. Traditional triple stores deployed on a single machine have proved effective for storing and retrieving RDF data. However, their scalability is limited, and they cannot handle billions of ever-growing triples. Hadoop, on the other hand, is an open-source project that provides HDFS as a distributed file storage system and MapReduce as a computing framework for distributed processing. It has proved to perform well for large-scale data analysis.

In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. It benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experimental evaluation results show the effectiveness and efficiency of the approach.
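
The abstract does not describe how HadoopRDF distributes data or evaluates queries, so the following is only a minimal sketch of the general idea of combining per-node triple stores with a map/reduce-style computation. It is not the authors' design. The subject-hash partitioning scheme, the in-memory lists standing in for triple stores, the sample triples, and all function names are assumptions made for illustration.

```python
from zlib import crc32

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def partition_by_subject(triples, num_nodes):
    """Assign each triple to a node by hashing its subject (an assumed scheme)."""
    partitions = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        # crc32 gives a hash that is stable across processes, unlike hash().
        partitions[crc32(s.encode("utf-8")) % num_nodes].append((s, p, o))
    return partitions


def match_pattern(store, pattern):
    """Local lookup on one node's store; None in the pattern acts as a variable."""
    return [
        triple for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def distributed_match(partitions, pattern):
    """Map step: query every partition. Reduce step: union the partial results."""
    partial_results = (match_pattern(store, pattern) for store in partitions)
    return sorted(t for result in partial_results for t in result)


partitions = partition_by_subject(TRIPLES, num_nodes=3)
knows = distributed_match(partitions, (None, "foaf:knows", None))
```

Here `knows` holds every triple matching the pattern `?s foaf:knows ?o`, regardless of which partition stored it. A real deployment would replace the in-memory lists with actual triple stores and run the map and reduce steps as Hadoop jobs.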

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Du, JH., Wang, HF., Ni, Y., Yu, Y. (2012). HadoopRDF: A Scalable Semantic Data Analytical Engine. In: Huang, DS., Ma, J., Jo, KH., Gromiha, M.M. (eds) Intelligent Computing Theories and Applications. ICIC 2012. Lecture Notes in Computer Science, vol 7390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31576-3_80

  • DOI: https://doi.org/10.1007/978-3-642-31576-3_80

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31575-6

  • Online ISBN: 978-3-642-31576-3

  • eBook Packages: Computer Science, Computer Science (R0)
