[go: up one dir, main page]

Skip to main content
Log in

Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composition of linear algebra operations. We then describe a high performance implementation of these primitives using C\(++\) and subsequently its integration with the Spark framework to achieve the scalability we need for large systems. We provide an overview of our implementation and also compare and contrast the expressiveness and performance of various algorithms implemented with our approach with that of the current Spark GraphX implementation of those algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19

Similar content being viewed by others

References

  1. Bader, D., Buluç, A., Gilbert, J., Gonzalez, J., Kepner, J., Mattson, T.: The Graph BLAS effort and its implications for Exascale. In: SIAM Workshop on Exascale Applied Mathematics Challenges and Opportunities (EX14) (2014) (extended abstract)

  2. Brandes, U.: A faster algorithm for betweenness centrality. J. Math. Sociol. 25(2), 163–177 (2001)

    Article  MATH  Google Scholar 

  3. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)

    Article  Google Scholar 

  4. Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix–vector and matrix-transpose–vector multiplication using compressed sparse blocks. In: Symposium on Parallelism in Algorithms and Architectures, pp. 233–244. ACM (2009)

  5. Buluç, A., Gilbert, J.R.: The combinatorial BLAS: design, implementation, and applications. Int. J. High Perform. Comput. Appl. 25(4), 496–509 (2011)

    Article  Google Scholar 

  6. Buono, D., Gunnels, J.A., Que, X., Checconi, F., Petrini, F., Tuan, T.-C., Long, C.: Optimizing sparse linear algebra for large-scale graph analytics. Computer 8, 26–34 (2015)

    Article  Google Scholar 

  7. Cahill, J., Nguyen, T., Vega, M., Baska, D., Szerdi, D., Pross, H., Arroyo, R., Nguyen, H., Mueller, M., Henderson, D., et al.: IBM power systems built with the POWER8 architecture and processors. IBM J. Res. Dev. 59(1), 4:1–4:10 (2015)

  8. Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-mat: a recursive model for graph mining. In: Proceedings of the 2004 SIAM International Conference on Data Mining 2004 Apr 22, pp. 442–446, Society for Industrial and Applied Mathematics

  9. Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., Muthukrishnan, S.: One trillion edges: graph processing at Facebook-scale. Proc. VLDB Endow. 8(12), 1804–1815 (2015)

    Article  Google Scholar 

  10. Dagum, L., Enon, R.: OpenMP: an industry standard API for shared-memory programming. Comput. Sci. Eng. IEEE 5(1), 46–55 (1998)

    Article  Google Scholar 

  11. Ekanadham, K., Horn, B., Jann, J., Kumar, M., Moreira, J., Pattnaik, P., Serrano, M., Tanase, G., Yu, H.: Graph programming interface: rationale and specification. Technical report, IBM Research (2014)

    Google Scholar 

  12. Fineman, J.T., Robinson, E.: Fundamental graph algorithms. Graph Algorithms Lang. Linear Algebra 22, 45–58 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  13. Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pp. 17–30, Hollywood, CA, USA, 2012. USENIX Association, Berkeley, CA (2012)

  14. Harshvardhan, B., West, A., Fidel, N., Amato, M., Rauchwerger, L.: A hybrid approach to processing big data graphs on memory-restricted systems. In: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, IPDPS ’15, pp. 799–808, Hyderabad, India, 2015. IEEE Computer Society (2015)

  15. IBM. IBM Power System S812L and S822L (2014). [Online; Accessed 10-October-2015]

  16. Kang, U., Tsourakakis, C., Faloutsos, C.: Pegasus: A peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 IEEE International Conference on Data Mining, ICDM ’09, pp. 229–238, Miami, FL, 2009. IEEE Computer Society, Washington, DC (2009)

  17. Kepner, J., Gilbert, J.: Graph algorithms in the language of linear algebra, vol. 22. SIAM, Philadelphia, PA (2011)

    Book  MATH  Google Scholar 

  18. Leskovec, J., Sosič, R.: SNAP: A General Purpose Network Analysis and Graph Mining Library in C++ ACM Trans. Intell. Syst. Technol. 8(1):1:1–1:20 (2016)

  19. Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C.E., Hellerstein, J.: Graphlab: a new framework for parallel machine learning. arXiv preprint arXiv:1408.2041 (2014)

  20. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp. 135–146, Indianapolis, IN, 2010. ACM. (2010)

  21. T. Mattson, D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, S. Reinhardt, M. Stonebraker, S. Wallach, and A. Yoo. Standards for graph algorithm primitives. In: High Performance Extreme Computing Conference (HPEC), 2013 IEEE, pp. 1–2 (2013)

  22. Mehlhorn, K., Näher, S.: LEDA: a platform for combinatorial and geometric computing. Commun. ACM 38(1), 96–102 (1995)

    Article  MATH  Google Scholar 

  23. Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the Graph 500. In: Proceedings of the CUG 2010: Simulation Comes of Age [online], Edinburgh, UK, 2010. Cray User’s Group (CUG), Delaware (2010)

  24. Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. In: Proceedings of ACM Symposium on Operating Systems Principles, SOSP ’13, pp. 456–471 (2013)

  25. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia, PA (2003)

    Book  MATH  Google Scholar 

  26. Siek, J.G., Lee, L.-Q., Lumsdaine, A.: Boost Graph Library: User Guide and Reference Manual. The Pearson Education, London (2001)

    Google Scholar 

  27. Turi Inc. SFrame. https://github.com/turi-code/SFrame (2016)

  28. Van De Geijn, R.A., Watts, J.: Summa: scalable universal matrix multiplication algorithm. Concurr. Pract. Exp. 9(4), 255–274 (1997)

    Article  Google Scholar 

  29. Whang, J.J., Lenharth, A., Dhillon, I.S., Pingali, K.: Scalable data-driven pagerank: algorithms, system issues, and lessons learned. In: Euro-Par 2015: Parallel Processing—21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24–28, 2015, Proceedings, pp. 438–450 (2015)

  30. Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: International Workshop on Graph Data Management Experiences and Systems. ACM (2013)

  31. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, pp. 2–2, San Jose, CA (2012)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to José Moreira.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Horn, W., Kumar, M., Jann, J. et al. Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations. Int J Parallel Prog 46, 412–440 (2018). https://doi.org/10.1007/s10766-016-0481-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10766-016-0481-y

Keywords

Navigation