Abstract
In this paper we propose a general framework for compiling, scheduling, and executing parallel programs on parallel computers. We discuss important aspects of program partitioning, scheduling, and execution, and consider possible realistic alternatives for each issue. Subsequently we propose a possible implementation of an auto-scheduling compiler and give simple examples to illustrate the principles. Our approach to the entire problem is to utilize program information available to the compiler while, at the same time, allowing for run-time “corrections” and flexibility.
Similar content being viewed by others
References
Aho, A. V., Sethi, R., and Ullman, J. D. 1986. Compilers: Principles, Techniques, and Tools. Addison Wesley, Reading, Mass.
Allen, F. E. and Cocke, J. 1972. A catalogue of optimizing transformations. In Design and Optimization of Compilers (R. Rustin, ed.), Prentice-Hall, Englewood Cliffs, N.J., pp. 1–30.
Allen, J. R., and Kennedy, K. 1982. PFC: A program to convert Fortran to parallel form. Tech. Rept. MASC-TR82–6, Rice University, Houston, Texas.
Allen R., and Kennedy, K. 1987. Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4).
Alliant Computer Systems Corp. 1985. FX/Series Architecture Manual. Acton, Mass.
American National Standards Institute. 1981. American National Standard for Information Systems. Programming Language Fortran S8 (X3.9–198x). Revision of X3.9–1978, Draft S8, Version 99, ANSI, New York.
Banerjee, U. 1979. Speedup of ordinary programs. Ph.D. Thesis, University of Illinois at Urbana-Champaign, DCS Report No. UIUCDCS-R-79–989.
Bokhari, S. 1988. Partitioning problems in parallel, pipelined and distributed computing. IEEE Transactions on Computers, 37(1).
Chen, S. 1983. Large-scale and high-speed multiprocessor system for scientific applications—Cray-X-MP-2 series. Proc. of NATO Advanced Research Workshop on High Speed Computing, Kowalik (ed.), pp.59–67.
Coffman, E. G. Jr., ed. 1976. Computer and Job-Shop Scheduling Theory. Wiley, New York, 1976.
Coffman. E. G., and Graham R. L. 1972. Optimal scheduling on two processor systems. Acta Informatica, 1(3).
Cray, 1985. Multitasking user guide. Cray Computer Systems Technical Note, SN-0222, January 1985.
Cytron, R. G. 1986. Doacross: Beyond vectorization for multiprocessors (extended abstract). In Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Ill., pp.836–844.
Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco.
Gajski, D., Padua, D., Kuck, D., and Kuhn, R. 1982. A second opinion on dataflow machines and languages, IEEE Computer, Feb.
Girkar, M., and Polychronopoulos, C. 1988. Compiler issues for supercomputers. Technical Report CSRD No. 676, Center for Supercomputing Research and Development, University of Illinois.
Gottlieb, A., Grishman, R., Kruskal, C. P., McAuliffe, K. P., Rudolph, L., and Snir, M. 1983. The NYU Ultracomputer—Designing an MIMD shared-memory parallel machine. IEEE Trans. Computers, C-32,(2).
Graham, R. L. 1972. Bounds on multiprocessor scheduling anomalies and related packing algorithms. Proceedings of Spring Joint Computer Conference.
Guzzi, M., Padua, D., and Lawrie, D. H. 1988. Cedar Fortran. Internal Document, Center for Supercomputing Research and Development, University of Illinois.
Kennedy, K. 1980. Automatic vectorization of Fortran programs to vector form. Technical Report, Rice University, Houston, Texas.
Kuck, D. J., Kuhn, R. H., Leasure, B., and Wolfe, M. 1980. The structure of an advanced vectorizer for pipelined processors. Fourth International Computer Software and Applications Conference.
Kuck, D. J., Davidson, E. S., Lawrie, D. H., and Sameh, A. H. 1986. Parallel supercomputing today and the Cedar approach. Science231 (4740): 967–974.
Kuck, D. J., Kuhn, R., Padua, D., Leasure, B., and Wolfe, M. 1981. Dependence graphs and compiler optimizations. In Proceedings of the 8th ACM Symposium on Principles of Programming Languages, pp. 207–218.
Mehrotra, P., and Van Rosendale, J. 1985. The Blaze language: A parallel language for scientific programming. Rep. 85–29, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, Va.
Midkiff S. P., and Padua, D. A. 1987. Compiler algorithms for synchronization. IEEE Transactions on Computers, 36, 12, December.
Miura, K., and Uchida, K. 1984. Facom vector processor VP-100/VP-200. High Speed Computation, NATO ASI Series, Vol. F7 (J. S. Kowalik, ed.), Springer-Verlag, New York.
Nicolau, A. 1984. Parallelism, memory anti-aliasing and correctness for trace scheduling compilers. Ph.D. Thesis, Yale University, New Haven, Conn.
Padua, D; A. and Wolfe, M. 1986. Advanced compiler optimizations for supercomputers. Communications of the ACM, 29(12): 1184–1201.
Padua, D. A., Haiek, . ., Kuck, D. J., Lawrie, D. H., 1980. High-speed multiprocessors and compilation techniques. IEEE Transactions on Computers, C-29(9).
Polychronopoulos, C. D. 1986. On program restructuring, scheduling, and communication for parallel processor systems. Ph.D. Thesis, CSRD No. 595, Center for Supercomputing Research and Development, University of Illinois.
Polychronopoulos, C. D. 1987. Loop coalescing: A compiler transformation for parallel machines. Proceedings of the 1987 International Conference on Parallel Processing (St. Charles, Ill., Aug, 1987).
Polychronopoulos, C. D. 1988a. Compiler optimizations for enhancing parallelism and their impact on architecture design. IEEE Transactions on Computers, C-37(8).
Polychronopoulos, C. D. 1988b. The impact of run-time overhead on usable parallelism. To appear in the Proceedings of the 1988 International Conference on Parallel Processing (St Charles, Ill. August 15–19, 1988).
Polychronopoulos, C. D., and Banerjee, U. 1987. Processor allocation for horizontal and vertical parallelism and related speedup bounds. IEEE Transactions on Computers, C-36(4).
Polychronopoulos, C. D., and Kuck, D. J. 1987. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Transactions on Computers, C-36(12).
Sahni, S. 1984. Scheduling multipipeline and multiprocessor computers. IEEE Trans. Comput., C-33(7).
Schwan, K., and Gaimon, C. 1985. Automatic resource allocation for the Cm* multiprocessor. In Proc. of the 1985 International Conference on Distributed Computing Systems.
Smith, B. 1981. Architecture and applications of the HEP multiprocessor computer system. Real Time Processing IV, Proc. of SPIE, pp.241–248.
Stone, H. S. 1977. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Trans. Software Engineering, SE-3(1).
Tang, P., and Yew, P. C. 1986. Processor self-scheduling for multiple-nested parallel loops. Proceedings of the 1986 International Conference on Parallel Processing.
Veidenbaum, A. V. 1985. Compiler optimizations and architecture design issues for multiprocessors. Ph.D. Thesis, UIUC-DCS-85-8012, CSRD, University of Illinois.
Wolfe, M. J. 1982. Optimizing supercompilers for supercomputers. Ph.D. Thesis, University of Illinois at Urbana-Champaign, DCS Report No. UIUCCDCS-R-82–1105.
Zhu, C. Q., and Yew, P. C. 1984. A synchronization scheme and its applications for large multiprocessor systems. In Proc. of the 1984 International Conference on Distributed Computing Systems, pp.486–493.
Author information
Authors and Affiliations
Additional information
This work was supported in part by the National Science Foundation under Grant NSF MIP-8410110, the U.S. Department of Energy under Grant DE-FG02-85ER25001, an IBM donation and a grant from AT&T.
Rights and permissions
About this article
Cite this article
Polychronopoulos, C.D. Toward auto-scheduling compilers. J Supercomput 2, 297–330 (1988). https://doi.org/10.1007/BF00129782
Issue Date:
DOI: https://doi.org/10.1007/BF00129782