Abstract
Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composition of linear algebra operations. We then describe a high performance implementation of these primitives using C\(++\) and subsequently its integration with the Spark framework to achieve the scalability we need for large systems. We provide an overview of our implementation and also compare and contrast the expressiveness and performance of various algorithms implemented with our approach with that of the current Spark GraphX implementation of those algorithms.
Similar content being viewed by others
References
Bader, D., Buluç, A., Gilbert, J., Gonzalez, J., Kepner, J., Mattson, T.: The Graph BLAS effort and its implications for Exascale. In: SIAM Workshop on Exascale Applied Mathematics Challenges and Opportunities (EX14) (2014) (extended abstract)
Brandes, U.: A faster algorithm for betweenness centrality. J. Math. Sociol. 25(2), 163–177 (2001)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix–vector and matrix-transpose–vector multiplication using compressed sparse blocks. In: Symposium on Parallelism in Algorithms and Architectures, pp. 233–244. ACM (2009)
Buluç, A., Gilbert, J.R.: The combinatorial BLAS: design, implementation, and applications. Int. J. High Perform. Comput. Appl. 25(4), 496–509 (2011)
Buono, D., Gunnels, J.A., Que, X., Checconi, F., Petrini, F., Tuan, T.-C., Long, C.: Optimizing sparse linear algebra for large-scale graph analytics. Computer 8, 26–34 (2015)
Cahill, J., Nguyen, T., Vega, M., Baska, D., Szerdi, D., Pross, H., Arroyo, R., Nguyen, H., Mueller, M., Henderson, D., et al.: IBM power systems built with the POWER8 architecture and processors. IBM J. Res. Dev. 59(1), 4:1–4:10 (2015)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-mat: a recursive model for graph mining. In: Proceedings of the 2004 SIAM International Conference on Data Mining 2004 Apr 22, pp. 442–446, Society for Industrial and Applied Mathematics
Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., Muthukrishnan, S.: One trillion edges: graph processing at Facebook-scale. Proc. VLDB Endow. 8(12), 1804–1815 (2015)
Dagum, L., Enon, R.: OpenMP: an industry standard API for shared-memory programming. Comput. Sci. Eng. IEEE 5(1), 46–55 (1998)
Ekanadham, K., Horn, B., Jann, J., Kumar, M., Moreira, J., Pattnaik, P., Serrano, M., Tanase, G., Yu, H.: Graph programming interface: rationale and specification. Technical report, IBM Research (2014)
Fineman, J.T., Robinson, E.: Fundamental graph algorithms. Graph Algorithms Lang. Linear Algebra 22, 45–58 (2011)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pp. 17–30, Hollywood, CA, USA, 2012. USENIX Association, Berkeley, CA (2012)
Harshvardhan, B., West, A., Fidel, N., Amato, M., Rauchwerger, L.: A hybrid approach to processing big data graphs on memory-restricted systems. In: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, IPDPS ’15, pp. 799–808, Hyderabad, India, 2015. IEEE Computer Society (2015)
IBM. IBM Power System S812L and S822L (2014). [Online; Accessed 10-October-2015]
Kang, U., Tsourakakis, C., Faloutsos, C.: Pegasus: A peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 IEEE International Conference on Data Mining, ICDM ’09, pp. 229–238, Miami, FL, 2009. IEEE Computer Society, Washington, DC (2009)
Kepner, J., Gilbert, J.: Graph algorithms in the language of linear algebra, vol. 22. SIAM, Philadelphia, PA (2011)
Leskovec, J., Sosič, R.: SNAP: A General Purpose Network Analysis and Graph Mining Library in C++ ACM Trans. Intell. Syst. Technol. 8(1):1:1–1:20 (2016)
Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C.E., Hellerstein, J.: Graphlab: a new framework for parallel machine learning. arXiv preprint arXiv:1408.2041 (2014)
Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp. 135–146, Indianapolis, IN, 2010. ACM. (2010)
T. Mattson, D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, S. Reinhardt, M. Stonebraker, S. Wallach, and A. Yoo. Standards for graph algorithm primitives. In: High Performance Extreme Computing Conference (HPEC), 2013 IEEE, pp. 1–2 (2013)
Mehlhorn, K., Näher, S.: LEDA: a platform for combinatorial and geometric computing. Commun. ACM 38(1), 96–102 (1995)
Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the Graph 500. In: Proceedings of the CUG 2010: Simulation Comes of Age [online], Edinburgh, UK, 2010. Cray User’s Group (CUG), Delaware (2010)
Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. In: Proceedings of ACM Symposium on Operating Systems Principles, SOSP ’13, pp. 456–471 (2013)
Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia, PA (2003)
Siek, J.G., Lee, L.-Q., Lumsdaine, A.: Boost Graph Library: User Guide and Reference Manual. The Pearson Education, London (2001)
Turi Inc. SFrame. https://github.com/turi-code/SFrame (2016)
Van De Geijn, R.A., Watts, J.: Summa: scalable universal matrix multiplication algorithm. Concurr. Pract. Exp. 9(4), 255–274 (1997)
Whang, J.J., Lenharth, A., Dhillon, I.S., Pingali, K.: Scalable data-driven pagerank: algorithms, system issues, and lessons learned. In: Euro-Par 2015: Parallel Processing—21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24–28, 2015, Proceedings, pp. 438–450 (2015)
Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: International Workshop on Graph Data Management Experiences and Systems. ACM (2013)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, pp. 2–2, San Jose, CA (2012)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Horn, W., Kumar, M., Jann, J. et al. Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations. Int J Parallel Prog 46, 412–440 (2018). https://doi.org/10.1007/s10766-016-0481-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-016-0481-y