[go: up one dir, main page]

Skip to main content

Differential Erasure Codes for Efficient Archival of Versioned Data in Cloud Storage Systems

  • Chapter
  • First Online:
Transactions on Large-Scale Data- and Knowledge-Centered Systems XXX

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 10130))

  • 448 Accesses

Abstract

In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner. The proposed technique is relevant in cloud settings, where, because of the huge volume of data to be stored, distributed (scale-out) storage systems deploying erasure codes for fault tolerance is typical. However existing erasure coding techniques do not leverage redundancy of information across multiple versions of a file. We propose a new technique called differential erasure coding (DEC) where the differences (deltas) between subsequent versions are stored rather than the whole objects, akin to a typical delta encoding technique. However, unlike delta encoding techniques, DEC opportunistically exploits the sparsity (i.e., when the differences between two successive versions have few non-zero entries) in the updates to store the deltas using sparse sampling techniques applied with erasure coding. We first show that DEC provides significant savings in the storage size for versioned data whenever the update patterns are characterized by in-place alterations. Subsequently, we propose a practical DEC framework so as to reap storage size benefits against not just in-place alterations but also real-world update patterns such as insertions and deletions that alter the overall data sizes. We conduct experiments with several synthetic and practical workloads to demonstrate that the practical variant of DEC provides significant reductions in storage-overhead.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

eBook
USD 12.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    The proof for the proposition follows from the property that any \(2\gamma \) columns of a parity check matrix of a linear code with minimum distance \(2\gamma +1\) are linearly independent.

References

  1. Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S.: Erasure coding in windows azure storage. In: The Proceedings of the USENIX Annual Technical Conference (ATC) (2012)

    Google Scholar 

  2. Thusoo, A., Shao, Z., Anthony, S., Borthakur, D., Jain, N., Sen Sarma, J., Murthy, R., Liu, H.: Data warehousing and analytics infrastructure at Facebook. In: The Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010)

    Google Scholar 

  3. Ford, D., Labelle, F., Popovici, F.I., Stokely, M., Truong, V.A., Barroso, L., Grimes, C., Quinlan, S.: Availability in globally distributed storage systems. In: The 9th USENIX Conference on Operating Systems Designand Implementation (OSDI) (2010)

    Google Scholar 

  4. Dimakis, A.G., Ramchandran, K., Wu, Y., Suh, C.: A survey on network codes for distributed storage. Proc. IEEE 99, 476–489 (2011)

    Article  Google Scholar 

  5. Oggier, F., Datta, A.: Coding techniques for repairability in networked distributed storage systems. In: Foundations and Trends in Communications and Information Theory, vol. 9, no. 4, pp. 383–466. Now Publishers, June 2013

    Google Scholar 

  6. http://subversion.apache.org/

  7. http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html

  8. Borthakur, D.: HDFS and Erasure Codes (HDFS-RAID), August 2009. http://hadoopblog.blogspot.com/2009/08/hdfs-and-erasure-codes-hdfs-raid.html

  9. The Coding for Distributed Storage wiki. http://storagewiki.ece.utexas.edu/

  10. Wang, Z., Cadambe, V.: Multi-version Coding for Distributed Storage. In: Proceedings of IEEE ISIT 2014, Honalulu, USA (2014)

    Google Scholar 

  11. Rouayheb, S., Goparaju, S., Kiah, H., Milenkovic, O.: Synchronising edits in distributed storage networks. In: The Proceedings of the IEEE International Symposium on Information Theory, Hong Kong (2015)

    Google Scholar 

  12. Rawat, A., Vishwanath, S., Bhowmick, A., Soljanin, E.: Update efficient codes for distributed storage. In: IEEE International Symposium on Information Theory (2011)

    Google Scholar 

  13. Han, Y., Pai, H.-T., Zheng, R., Varshney, P.K.: Update-efficient regenerating codes with minimum per-node storage. In: Proceedings of IEEE International Symposium on Information Theory (ISIT 2013), Istanbul (2013)

    Google Scholar 

  14. Mazumdar, A., Wornell, G.W., Chandar, V.: Update efficient codes for error correction. In: The Proceedings of IEEE IEEE International Symposium on Information Theory, Cambridge, MA, pp. 1558–1562, July 2012

    Google Scholar 

  15. Esmaili, K.S., Chiniah, A., Datta, A.: Efficient updates in cross-object erasure-coded storage systems. In: IEEE International Conference on Big Data, Silicon Valley, CA, October 2013

    Google Scholar 

  16. Tarasov, V., Mudrankit, A., Buik, W., Shilane, P., Kuenning, G., Zadok, E.: Generating realistic datasets for deduplication analysis. In: The Proceedings of the 2012 USENIX Conference on Annual Technical Conference (2012)

    Google Scholar 

  17. Harshan, J., Oggier, F., Datta, A.: Sparsity exploiting erasure coding for resilient storage and efficient I, O access in delta based versioning systems. In: The Proceedings of IEEE ICDCS, Columbus, Ohio, USA (2015)

    Google Scholar 

  18. Harshan, J., Oggier, F., Datta, A.: Sparsity exploiting erasure coding for distributed storage of versioned data. Computing 98, 1305–1329 (2016). Springer

    Article  MathSciNet  Google Scholar 

  19. http://rsync.samba.org/

  20. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theor. 52(4), 1289–1306 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  21. Zhang, F., Pfister, H.D.: Compressed sensing and linear codes over real numbers. In: Information Theory and Applications Workshop (ITA) (2008)

    Google Scholar 

  22. McWilliams, F.J., Sloane, N.J.A.: The Theory of Error Correcting Codes. North Holland, Amsterdam (1977)

    Google Scholar 

  23. Lacan, J., Fimes, J.: A construction of matrices with no singular square submatrices. In: The Proceedings of International Conference on Finite Fields and Applications, pp. 145–147 (2003)

    Google Scholar 

  24. Harshan, J., Datta, A., Oggier, F.: DiVers: an erasure code based storage architecture for versioning exploiting sparsity. Future Gener. Comput. Syst. 59, 47–62 (2016). Elsevier

    Article  Google Scholar 

  25. http://www.txtwizard.net/compression

  26. https://en.wikipedia.org/wiki/United_States

Download references

Acknowledgements

This work was supported by the MoE Tier-2 grant “eCode: erasure codes for data center environments" (MOE2013-T2-1-068).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to J. Harshan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Harshan, J., Datta, A., Oggier, F. (2016). Differential Erasure Codes for Efficient Archival of Versioned Data in Cloud Storage Systems. In: Hameurlain, A., Küng, J., Wagner, R., Schewe, KD., Bosa, K. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXX. Lecture Notes in Computer Science(), vol 10130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-54054-1_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-54054-1_2

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-54053-4

  • Online ISBN: 978-3-662-54054-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics