Proceedings of the Second International Conference on Systems Integration
An efficient, massively parallel optimization technique is developed for solving the dynamic data allocation problem in medium- to large-scale applications such as computer integrated manufacturing (CIM) systems. The method is based on a significantly reduced feasible state search space. A statistical evaluation framework compares the performance of the proposed technique with other dynamic data allocation strategies. The algorithms are implemented for a variety of I/O task activation scenarios, with the number of task activation nodes ranging from 50 to 250. The proposed method performs significantly better than the other optimization strategies, especially as the number of task activation nodes increases.
The problem of scheduling directed acyclic task graphs on an unbounded number of processors is considered. We present a single algorithm that is applicable to several special cases, thus effecting a unified approach to task scheduling independent of the task graph. We start by considering multi-stage DAGs and present an algorithm that computes a schedule in O(Nq log q) time,
We investigate the potential of workstation clusters for use in high performance computation for some selected applications. Currently, the network speed found in most of the existing systems is quite low, but higher speed networks are already emerging in the market. We present four parallel algorithms that performed astonishingly well on a cluster of workstations connected by Ethernet. Three of these are algorithms for sorting, matrix multiplication, and
The grid and the mesh of trees (or MOT) are among the best-known parallel architectures in the literature. Both of them enjoy efficient VLSI layouts, simplicity of topology, and a large number of parallel algorithms that can efficiently execute on them. One drawback of these architectures is that algorithms that perform best on one of them do not perform very well on the other. Thus there is a gap between the algorithmic capabilities of these two architectures. We propose a new class of parallel architectures, called the mesh-connected trees (or MCT), that can execute grid algorithms as efficiently as the grid, and MOT algorithms as efficiently as the MOT, up to a constant amount of slowdown. In particular, the MCT topology contains the MOT as a subgraph and emulates the grid via embedding with dilation 3 and congestion 2. This significant amount of computational versatility offered by the MCT comes at no additional VLSI area cost over these earlier networks. Many topological,...
There is a major problem with algorithm portability when the user switches from one parallel architecture to another. Since algorithms are usually architecture-dependent, the algorithm running on the old architecture may not run on the new one. Standard techniques, like parallelizing compilers or emulation, have efficacies far below those of algorithms specifically developed for the individual architecture. This paper proposes a two-level approach to programming parallel computers that is applicable as long as the underlying interconnection architecture can be modeled as a product network (e.g. grid, torus, hypercube, etc.). Our approach assumes that there are some low-level routines optimized for the "factor" networks comprising the product network. The set of low-level routines can be implemented as library routines. The high-level programming is then achieved, oblivious to the topology of the factor networks, by decomposing computations in a manner that only uses the se...
With the rapid growth of the number of web pages, designing a search engine that can retrieve high quality information in response to a user query is a challenging task. Automated search engines that rely on keyword matching usually return too many low quality matches and take a long time to run. It is argued in the literature that link-following search methods can substantially increase the search quality, provided that these methods use an accurate assumption about useful patterns in the hyperlink topology of the web. Recent work in the field has focused on detecting identifiable patterns in the web graph and exploiting this information to improve the performance of search algorithms. We survey relevant work in this area and comment on the implications of these patterns for other areas such as advertisement and marketing.
If G is a connected graph with N nodes, its r-dimensional product contains $N^r$ nodes. We present an algorithm that sorts $N^r$ keys stored in the r-dimensional product of any graph G in $O(r^2 S(N))$ time, where $S(N)$ depends on G. We show that for any graph G, $S(N)$ is bounded above by $O(N)$, establishing an upper bound of $O(r^2 N)$ for the time complexity of sorting $N^r$ keys on any product network. When r is fixed, this gives an asymptotic complexity of $O(N)$ to sort $N^r$ keys, which is optimal for several instances of product networks. There are graphs for which $S(N) = O(\log^2 N)$, which gives an asymptotic running time of $O(\log^2 N)$. Keywords: sorting, interconnection networks, product networks, algorithms, odd-even merge. 1 Introduction In [1], Batcher presented two efficient sorting networks. Algorithms derived from these networks have been presented for a number of different parallel architectures, like the shuffle-exchange network [10], the grid [11, 5], ...
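The node count stated in this abstract can be checked on a small case. The following is a minimal sketch, assuming that "product" means the Cartesian graph product (the abstract does not define it); the function name `cartesian_power` and the adjacency-dictionary representation are illustrative choices, not taken from the paper.

```python
from itertools import product


def cartesian_power(adj, r):
    """Return the adjacency dict of the r-fold Cartesian product of a graph.

    adj maps each node of the factor graph G to the set of its neighbours.
    Nodes of the product are r-tuples of factor nodes; two tuples are
    adjacent when they differ in exactly one coordinate and the differing
    entries are adjacent in G.
    """
    power = {}
    for u in product(adj, repeat=r):
        neighbours = set()
        for d in range(r):
            for w in adj[u[d]]:
                neighbours.add(u[:d] + (w,) + u[d + 1:])
        power[u] = neighbours
    return power


# Factor graph: a path on N = 3 nodes (assumed example).
path3 = {0: {1}, 1: {0, 2}, 2: {1}}

grid = cartesian_power(path3, 2)   # the 3 x 3 grid
cube = cartesian_power(path3, 3)   # the 3 x 3 x 3 grid

print(len(grid), len(cube))        # node counts N**2 and N**3
```

With a path as the factor graph, the product is the familiar grid, which is why the grid appears among the product networks discussed in these abstracts.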
This paper first presents some general properties of product networks pertinent to parallel architectures and then focuses on three case studies. These are products of complete binary trees, shuffle-exchange, and de Bruijn networks. It is shown that all of these are powerful architectures for parallel computation, as evidenced by their ability to efficiently emulate numerous other architectures. In particular, r-dimensional grids and r-dimensional meshes of trees can be embedded efficiently in products of these graphs, i.e. either as a subgraph or with small constant dilation and congestion. In addition, the shuffle-exchange network can be embedded in the r-dimensional product of shuffle-exchange networks with dilation cost 2r and congestion cost 2. Similarly, the de Bruijn network can be embedded in the r-dimensional product of de Bruijn networks with dilation cost r and congestion cost 4. Moreover, it is well known that shuffle-exchange and de Bruijn graphs can emulate the hypercu...
The optimal bisection width of the r-dimensional N × ··· × N grid is known to be $N^{r-1}$ when N is even, but when N is odd, only approximate values are available. This paper shows that the exact bisection width of the grid is $\frac{N^r - 1}{N - 1}$ when N is odd.
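The two closed forms can be compared against exhaustive search on very small grids. This is a rough sketch, taking the odd-N bisection width to be (N^r − 1)/(N − 1), which is how the formula in the abstract above is read here, and assuming a bisection splits the nodes into parts whose sizes differ by at most one. The function names are hypothetical, and the brute-force search is only feasible for a handful of nodes; it is not the paper's proof technique.

```python
from itertools import combinations, product


def grid_graph(N, r):
    """Nodes and edges of the r-dimensional N x ... x N grid."""
    nodes = list(product(range(N), repeat=r))
    edges = []
    for u in nodes:
        for d in range(r):
            if u[d] + 1 < N:
                v = u[:d] + (u[d] + 1,) + u[d + 1:]
                edges.append((u, v))
    return nodes, edges


def brute_force_bisection_width(N, r):
    """Minimum number of edges cut over all balanced two-way splits."""
    nodes, edges = grid_graph(N, r)
    half = len(nodes) // 2            # the other part has the remaining nodes
    best = len(edges)
    for part in combinations(nodes, half):
        side = set(part)
        cut = sum((u in side) != (v in side) for u, v in edges)
        best = min(best, cut)
    return best


def closed_form(N, r):
    """Bisection width as stated in the abstract (N >= 2)."""
    if N % 2 == 0:
        return N ** (r - 1)
    return (N ** r - 1) // (N - 1)


for N, r in [(3, 1), (3, 2), (4, 2), (2, 3)]:
    print(N, r, brute_force_bisection_width(N, r), closed_form(N, r))
```

For the 3 × 3 grid the search and the odd-N formula agree, which also serves as a sanity check on the reading of the formula.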
Gagan Agarwal, Ohio State University, USA Aijun An, York University, Canada Peter Andreae, Victoria University of Wellington, New Zealand Luiza Antonie, University of Alberta, Canada Chris Bailey-Kellogg, Dartmouth College, USA Arindam Banerjee, University of Minnesota, Twin Cities, USA Rohan Baxter, ATO, Australia Roberto Bayardo, Google, USA Chiranjib Bhattacharya, Indian Institute of Science, Bangalore, India Indrajit Bhattacharya, IBM Research, Delhi, India Sourav Bhowmick, Nanyang Technological University, Singapore ...
IEEE International Symposium on Circuits and Systems, 1990
An efficient search method for the configuration of stack filters is presented. Nonlinear digital filters such as these are known to be flexible and effective in applications where it is difficult to characterize the noise type. An open problem associated with ...
[1991] Proceedings. 11th International Conference on Distributed Computing Systems, 1991
A detailed analytical and simulation model that accurately captures the effect of communication delay for local area networks is presented. To demonstrate the framework, load sharing algorithms are presented and evaluated both with and without the effect of the communication network delay. The algorithms use the Ethernet communication protocol to their advantage and provide superior performance compared to several published
[1993] Proceedings. The 13th International Conference on Distributed Computing Systems, 1993
In a set of high performance workstations connected by a network, many workstations may be underutilized by their owners. While each workstation may be primarily responsible for executing its owner's tasks with the highest priority, the unused processing capacity may be made available to computationally intensive tasks submitted externally to the system. Static co-scheduling for such an environment has been
Proceedings., 2nd Symposium on the Frontiers of Massively Parallel Computation, 1989
A network based on the hypercube, called the multiply twisted cube, is proposed. This network preserves many of the desirable properties of the hypercube, but has a diameter of only $\lceil (n+1)/2 \rceil$ for an n-dimensional multiply twisted cube, a reduction of nearly 50% compared to the ordinary hypercube. Some of the basic topological properties of multiply twisted cubes are discussed,
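The "nearly 50%" figure follows from simple arithmetic on the two diameters. The sketch below assumes the bracketed expression in the abstract denotes the ceiling of (n+1)/2 and uses the standard fact that an n-dimensional hypercube has diameter n; the function names are illustrative only.

```python
def hypercube_diameter(n):
    """Diameter of the ordinary n-dimensional hypercube."""
    return n


def twisted_cube_diameter(n):
    """Diameter quoted in the abstract, read as ceil((n + 1) / 2)."""
    return (n + 2) // 2   # integer form of ceil((n + 1) / 2)


def reduction(n):
    """Fractional reduction in diameter relative to the hypercube."""
    return 1 - twisted_cube_diameter(n) / hypercube_diameter(n)


for n in (4, 8, 16, 32, 64):
    print(n, hypercube_diameter(n), twisted_cube_diameter(n), round(reduction(n), 3))
```

The reduction approaches one half from below as n grows, since the quoted diameter exceeds n/2 by at most one.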
1994 International Conference on Parallel Processing-Vol 1 (ICPP'94), 1994
Recently, the mesh connected trees (MCT) network has been proposed as a possible architecture for parallel computers. MCT networks are obtained by combining complete binary trees using the cross product operation. This paper focuses on structural, embedding, routing, and layout properties of the MCT networks. We show that MCT networks are computationally more powerful than grids and complete binary trees, and at least as powerful as meshes of trees (MOT). Analysis of VLSI complexity shows that the ...