Processor preallocation and load balancing of DOALL loops

Gary W. Elsesser¹, Viet N. Ngo¹, Sourav Bhattacharya² & Wei-Tek Tsai³

Abstract

Load balance is important because it affects the speedup attained through the concurrent execution of loop iterations on a parallel processor. We study loop load balance in the context of the well-known Perfect benchmarks. Several static and dynamic characteristics of the Perfect benchmark DOALL loops are observed and interpreted. The late arrival of processors is noted as a major source of load imbalance. This observation suggested the idea of processor preallocation. An analytic cost model is presented, and the advantages of processor preallocation are demonstrated by experimental evaluation on a CRAY Y-MP8 under the UNICOS operating system.
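The effect of late processor arrival described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions and is not the analytic cost model of the paper: it assumes every iteration has the same cost, that iterations are handed out greedily to whichever processor is free first (self-scheduling), and that preallocation simply means all processors are available at time zero. The function names and the arrival times are hypothetical.

```python
import heapq


def doall_finish_time(n_iters, iter_cost, arrival_times):
    """Completion time of a DOALL loop under greedy self-scheduling.

    Assumption (not from the paper): each processor joins the loop at its
    arrival time and then repeatedly takes the next unclaimed iteration.
    """
    # Min-heap of the times at which each processor is next free.
    free_at = list(arrival_times)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_iters):
        start = heapq.heappop(free_at)  # earliest available processor
        end = start + iter_cost
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def speedup(n_iters, iter_cost, arrival_times):
    """Serial time divided by the simulated parallel completion time."""
    serial = n_iters * iter_cost
    return serial / doall_finish_time(n_iters, iter_cost, arrival_times)


# Hypothetical scenario: 8 unit-cost iterations on 4 processors.
late_arrival = [0.0, 2.0, 2.0, 2.0]   # three helpers join 2 time units late
preallocated = [0.0, 0.0, 0.0, 0.0]   # all processors ready at loop entry

print(speedup(8, 1.0, late_arrival))   # helpers arrive after work has begun
print(speedup(8, 1.0, preallocated))   # ideal case for this toy model
```

In this toy setting the late helpers cut the achievable speedup in half compared with having every processor ready at loop entry, which is the kind of imbalance that preallocation is meant to avoid. Real DOALL loops have variable iteration costs and scheduling overheads that this sketch ignores.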

Author information

Authors and Affiliations

Cray Research Park, 655 Lone Oak Drive, 55121, Eagan, MN
Gary W. Elsesser & Viet N. Ngo
Honeywell Technology Center, 3600 Technology Drive, 55418, Minneapolis, MN
Sourav Bhattacharya
Department of Computer Science, University of Minnesota, 55455, MN
Wei-Tek Tsai

About this article

Cite this article

Elsesser, G.W., Ngo, V.N., Bhattacharya, S. et al. Processor preallocation and load balancing of DOALL loops. J Supercomput 8, 135–161 (1994). https://doi.org/10.1007/BF01204659

Received: 15 November 1992
Accepted: 15 March 1994
Issue Date: June 1994
DOI: https://doi.org/10.1007/BF01204659
