Abstract Memory-mapped transactions combine the advantages of both memory mapping and transactions to provide a programming interface for concurrently accessing data on disk without explicit I/O or locking operations. This interface enables a programmer to design a complex serial program that accesses only main memory, and with little to no modification, convert the program into correct code with multiple processes that can simultaneously access disk.
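The memory-mapping half of this interface can be illustrated with the standard `mmap` facility: a file is mapped into the address space and then read and written like an ordinary byte array, with no explicit I/O calls. The sketch below (plain Python, with an assumed file name `demo.dat`) shows only the memory-mapping idea; the transactional concurrency control described in the abstract is not modeled here.

```python
import mmap
import os

# Create a small backing file and map it into memory.  This is a
# sketch of the memory-mapping half only: reads and writes go through
# memory, with no explicit read()/write() calls and no transactions.
path = "demo.dat"
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as mm:
        mm[0:5] = b"hello"      # write to disk through memory
        first = bytes(mm[0:5])  # read back the same region

os.remove(path)
print(first)  # b'hello'
```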
An Implementation of the Karp-Zhang Parallel Branch-and-Bound Algorithm. Tiankai Liu, under the direction of Prof. Charles E. Leiserson, Massachusetts Institute of Technology, Research Science Institute, July 29, 2003. Abstract This paper studies an implementation of the Karp-Zhang parallel branch-and-bound algorithm on a shared-memory machine. By employing it to solve a solitaire card puzzle, empirical data on the speedup of the algorithm are obtained.
Ф Ь м ОКМ is a multimedia player designed especially for the display of recorded classroom lectures. It plays multimedia consisting of synchronized audio, video, and "whiteboard"-style slides. In addition to the controls commonly found in multimedia players, it features controls designed especially for lecture multimedia, such as customizable speed, variable-speed play with pitch normalization, and browsing a timeline of slides.
Arguably, one of the biggest deterrents for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic problem inherent in the coordination of concurrent tasks is enforcing atomicity so that the partial results of one task do not inadvertently corrupt another task.
Abstract The difficulties in designing systolic processors can be reduced by applying the architectural transformations of code motion, retiming, slowdown, coalescing, parallel/serial compromises and partitioning to a more easily designed combinational or semisystolic form of the processor. In this paper, the use of these transformations and the attendant tradeoffs in the design of architectures for adaptive filtering based on the Gram-Schmidt algorithm are considered.
Abstract Existing concurrency platforms for dynamic multithreading do not provide repeatable parallel random-number generators. This paper proposes that a mechanism called pedigrees be built into the runtime system to enable efficient deterministic parallel random-number generation. Experiments with the open-source MIT Cilk runtime system show that the overhead for maintaining pedigrees is negligible.
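The pedigree idea can be sketched as follows: each strand of the computation is named by the sequence of spawn ranks on the path from the root, and the random number for a strand is a deterministic function of that name, so the same strand draws the same value on every run regardless of how the scheduler interleaves work. The encoding and hash below are illustrative choices, not the paper's actual compression scheme.

```python
import hashlib

def pedigree_rand(pedigree):
    """Deterministically map a pedigree (tuple of spawn ranks on the
    path from the root of the computation) to a value in [0, 1).
    Illustrative only: the real mechanism maintains and compresses
    pedigrees inside the runtime system."""
    data = b",".join(str(rank).encode() for rank in pedigree)
    h = hashlib.sha256(data).digest()
    return int.from_bytes(h[:8], "big") / 2**64

# The same strand gets the same number on every run, independent of
# scheduling; distinct strands get (almost surely) distinct numbers.
a = pedigree_rand((0, 2, 1))
b = pedigree_rand((0, 2, 1))
c = pedigree_rand((0, 2, 2))
assert a == b and a != c
```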
JCilk is a Java-based multithreaded language for parallel programming that extends the semantics of Java by introducing "Cilk-like" [1, 2] linguistic constructs for parallel control. The original Cilk language provides a dynamic multithreading model that supports call-return semantics in a C language context. The Cilk system also includes a provably good scheduler that guarantees programs can take full advantage of the resources available at runtime.
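The call-return semantics of dynamic multithreading can be sketched with futures: a spawned call may run in parallel with the rest of the function, and waiting on its result plays the role of a sync. This is a rough library-level analogue, not JCilk or Cilk syntax, and the oversized worker pool is an assumption made to keep this toy free of blocking-chain deadlocks.

```python
from concurrent.futures import ThreadPoolExecutor

# Rough analogue of Cilk-style spawn/sync using futures (illustrative;
# Cilk and JCilk express this with language constructs, not a library).
# max_workers is deliberately larger than the number of spawns so that
# blocked parents can never exhaust the pool in this toy example.
pool = ThreadPoolExecutor(max_workers=128)

def fib(n):
    if n < 2:
        return n
    x = pool.submit(fib, n - 1)   # "spawn": child may run in parallel
    y = fib(n - 2)                # continuation runs meanwhile
    return x.result() + y         # "sync": wait for the spawned child

print(fib(10))  # 55
```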
Abstract A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write.
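For concreteness, here is the naive serial form of such a computation: a 1-dimensional three-point averaging stencil applied repeatedly. Trapezoidal decompositions reorganize exactly this doubly nested loop (over time steps and grid points) for parallelism and cache efficiency; this sketch shows only the straightforward version.

```python
def stencil_step(u):
    """One update of a 1-D three-point stencil (Jacobi-style averaging)
    with fixed boundary values: each interior point becomes a function
    of itself and its two near neighbors."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v

u = [0.0] * 5 + [1.0] + [0.0] * 5   # a spike that diffuses outward
for _ in range(10):                  # the time loop of the computation
    u = stencil_step(u)
```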
I have developed a multithreaded implementation of breadth-first search (BFS) of a sparse graph using the Cilk++ extensions to C++. My PBFS program on a single processor runs as quickly as a standard C++ breadth-first search implementation. PBFS achieves high work efficiency by using a novel implementation of a multiset data structure, called a "bag," in place of the FIFO queue usually employed in serial breadth-first search algorithms.
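The key substitution can be sketched serially: process the frontier level by level as an unordered collection rather than a strict FIFO queue, since BFS distances depend only on the level at which a vertex is first reached, not on intra-level visit order. The actual bag is a specialized divide-and-conquer structure that splits for parallel traversal; a plain Python set stands in for it here.

```python
def level_bfs(adj, source):
    """Level-synchronous BFS.  The frontier is an unordered collection
    (a set stands in for PBFS's "bag"); correctness needs only the
    level structure, not FIFO order within a level."""
    dist = {source: 0}
    frontier = {source}
    d = 0
    while frontier:
        d += 1
        nxt = set()
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = d
                    nxt.add(v)
        frontier = nxt
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
assert level_bfs(adj, 0) == {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```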
The purpose of this programming assignment is to develop an understanding of asynchronous algorithms involving active messages. You will improve existing code that draws pictures of the so-called Mandelbrot set. Your Mandelbrot program will be written in C, and it will run on the xolas cluster. The assignment explores issues such as deadlock prevention, request-reply protocols, and termination protocols.
In this research, we address the problem of adaptive scheduling and resource allocation in the domain of dynamic multithreading. Most existing parallel programming systems are nonadaptive, where each job is assigned a fixed number of processors. This policy places the burden of estimating the parallelism of the job on the programmer. In addition, nonadaptive scheduling may lead to a poor use of available resources.
In this research we address the problem of scheduling many adaptively parallel jobs on a multiprocessor system [4, 5, 6]. An adaptively parallel job is a job that can change its parallelism in the course of its execution. Today, most multiprocessor systems use static allocation, where a fixed number of processors is allocated to the job for its lifetime. This policy places the burden of estimating the parallelism of the job on the programmer.
Abstract The term "macro-level scheduling" refers to finding and recruiting idle workstations and allocating them to various adaptively parallel applications. In this thesis, I have designed and implemented a macro-level scheduler for the Cilk Network of Workstations environment. Cilk-NOW provides the "micro-level scheduling" needed to allow programs to be executed adaptively in parallel on an unreliable network of workstations. This macro-level scheduler is designed to be hassle-free and easy to use and customize.
In this model we are adding numbers x_j and excluding a neighborhood of x_i. In the FMM, the x_j become representations of functions, which are accurate only at some distance from point i. This core idea, though perhaps obvious, was buried for many years. It took a trip to Japan, years of classroom presentations (Edelman, Leiserson), and a recent conversation over lunch at MIT (Demaine, Demaine, Edelman, Persson) before we could articulate the essence of the FMM.
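The stated core can be written as a single identity: the far-field sum at point i equals the shared total minus the near-field terms, sum_{j not near i} x_j = (sum_j x_j) - sum_{j near i} x_j, so one precomputed total serves every point. In the FMM proper, the shared total is replaced by compressed far-field function representations; the toy sketch below uses plain numbers and a hypothetical `near` map from each i to its excluded indices.

```python
def far_sums(x, near):
    """For each point i, sum the x_j with j outside i's neighborhood,
    reusing one shared total:
        sum_{j not near i} x_j = total - sum_{j near i} x_j.
    `near[i]` lists the excluded indices (including i itself).  In the
    FMM, the shared total becomes a hierarchy of compressed far-field
    representations that are accurate only away from point i."""
    total = sum(x)
    return [total - sum(x[j] for j in near[i]) for i in range(len(x))]

x = [1.0, 2.0, 3.0, 4.0]
near = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
assert far_sums(x, near) == [7.0, 4.0, 1.0, 3.0]
```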
This document describes Cilk 1.2 (Version 1), a C language extension and its supporting runtime system intended for developing continuation-passing-style multithreaded programs on the CM-5. Cilk grew out of efforts in implementing a simple scheduling and execution model on top of the CM-5's active message layer, and in adapting it to the needs of real-life application programs.
Abstract The ftIO system provides portable and fault-tolerant file I/O by enhancing the functionality of the ANSI C file system without changing its application programmer interface and without depending on system-specific implementations of the standard file operations. The ftIO system is an extension of the porch compiler and its runtime system. The porch compiler automatically generates code to save the internal state of a program in a portable checkpoint.
We provide new competitive upper bounds on the performance of the memoryless, randomized caching algorithm RAND. Our bounds are expressed in terms of the inherent hit rate α of the sequence of memory references, which is the highest possible hit rate that any algorithm can achieve on the sequence for a cache of a given size. Our results show that RAND is (1 − αe^(−1/α))/(1 − α)-competitive on any reference sequence with inherent hit rate α.
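The stated ratio is easy to evaluate numerically: it tends to 1 for easy reference sequences (α → 0) and grows without bound as α → 1, where even small miss-rate differences are costly. A minimal sketch, assuming the formula reads (1 − αe^(−1/α))/(1 − α) for 0 < α < 1:

```python
import math

def rand_bound(alpha):
    """The competitive ratio stated for RAND,
    (1 - alpha * e^(-1/alpha)) / (1 - alpha),
    as a function of the inherent hit rate 0 < alpha < 1."""
    return (1.0 - alpha * math.exp(-1.0 / alpha)) / (1.0 - alpha)

# The bound approaches 1 as alpha -> 0 and diverges as alpha -> 1.
for a in (0.1, 0.5, 0.9):
    print(f"alpha={a}: ratio <= {rand_bound(a):.3f}")
```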
An irreversible shift towards multicore x86 processors is underway. Building multicore processors delivers on the promise of Moore's Law, but it creates an enormous problem for developers. Multicore processors are parallel computers, and parallel computers are notoriously difficult to program.
Abstract Pochoir is a compiler for a domain-specific language embedded in C++ that produces excellent code from a simple specification of a desired stencil computation. Pochoir allows a wide variety of boundary conditions to be specified, and it automatically parallelizes and optimizes cache performance. Benchmarks of Pochoir-generated code demonstrate a performance advantage of 2–10 times over standard parallel loop implementations.
Open-nested transactions [2–5] have been proposed as a loophole for transactional memory (TM) to increase concurrency on highly contended resources in transactional programs. Programs that use open nesting can be difficult to reason about because open nesting breaks serializability at the level of memory semantics. Evidence suggests that an unconstrained use of open nesting cannot be encapsulated, i.e., that programmers may need to be aware of whether subroutines contain open-nested transactions.