Abstract
Load balance is important because it may affect the speedup attained through the concurrent execution of loop iterations on a parallel processor. We study loop load balance in the context of the well-known Perfect benchmarks. Several static and dynamic characteristics of the Perfect benchmark DOALL loops are observed and interpreted. The late arrival of processors is noted as a major source of load imbalance. This observation suggested the idea of processor preallocation. An analytic cost model is presented and the advantages of processor preallocation are demonstrated by experimental evaluation on a CRAY Y-MP8 under the Unicos operating system.
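The effect of late processor arrival on a DOALL loop can be illustrated with a small simulation. The sketch below is not the analytic cost model of the article; it is a toy model under assumptions made here for illustration: every iteration has the same cost, iterations are handed out one at a time to whichever processor is free first, and scheduling overhead is ignored. The names `doall_finish_time`, `speedup`, and `arrivals` are hypothetical.

```python
import heapq


def doall_finish_time(n_iters, iter_cost, arrivals):
    """Completion time of a DOALL loop under a toy self-scheduling model.

    arrivals[i] is the time at which processor i joins the loop
    (assumption: a processor does no loop work before it arrives).
    """
    # Min-heap of the times at which each processor is next free.
    free_at = list(arrivals)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_iters):
        start = heapq.heappop(free_at)   # earliest available processor
        done = start + iter_cost         # it runs exactly one iteration
        finish = max(finish, done)
        heapq.heappush(free_at, done)
    return finish


def speedup(n_iters, iter_cost, arrivals):
    """Serial time divided by the simulated parallel completion time."""
    serial = n_iters * iter_cost
    return serial / doall_finish_time(n_iters, iter_cost, arrivals)


if __name__ == "__main__":
    # All four processors present when the loop starts
    # (an idealized stand-in for preallocation).
    print(speedup(8, 1.0, [0.0, 0.0, 0.0, 0.0]))
    # Three of the four processors arrive three time units late.
    print(speedup(8, 1.0, [0.0, 3.0, 3.0, 3.0]))
```

In this toy setting the loop with late arrivals finishes well after the loop whose processors are all present at the start, even though the work and the processor count are identical. That gap is the kind of imbalance the abstract attributes to late processor arrival.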
Cite this article
Elsesser, G.W., Ngo, V.N., Bhattacharya, S. et al. Processor preallocation and load balancing of DOALL loops. J Supercomput 8, 135–161 (1994). https://doi.org/10.1007/BF01204659
DOI: https://doi.org/10.1007/BF01204659