ABSTRACT In this paper, we present the first general-purpose GPU thermal management design that combines both hardware architecture and OS scheduler changes. Our techniques schedule thread blocks from multiple computational kernels in spatial, temporal, and spatio-temporal ways depending on the thermal state of the system. We reduce the computation slowdown by 60% on average relative to state-of-the-art techniques while meeting the thermal constraints. We also extend our work to multiple GPGPU cards and show improvements of 44% on average relative to existing techniques.
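As a rough illustration of the mode-selection logic described above, here is a minimal sketch in Python; the thresholds, the per-SM temperature inputs, and the function names are all hypothetical, and the paper's actual design is a hardware/OS co-design rather than a standalone policy function:

```python
# Illustrative sketch of a thermally guided scheduling decision for thread
# blocks from two concurrent kernels. All names, thresholds, and the
# temperature model are hypothetical.

T_CRIT = 85.0   # thermal limit (deg C), hypothetical
T_SAFE = 75.0   # temperature below which no throttling is needed, hypothetical

def choose_schedule(sm_temps):
    """Pick a scheduling mode from per-SM temperatures."""
    hottest = max(sm_temps)
    if hottest < T_SAFE:
        # All SMs cool: co-run both kernels on every SM.
        return "spatio-temporal"
    if hottest < T_CRIT:
        # Some SMs warm: partition SMs between kernels so hot units
        # receive the less power-intensive kernel.
        return "spatial"
    # Near the limit: time-multiplex kernels and idle hot SMs to cool.
    return "temporal"

if __name__ == "__main__":
    for temps in ([70, 71, 69, 72], [78, 80, 74, 76], [86, 83, 84, 85]):
        print(temps, "->", choose_schedule(temps))
```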
Abstract—As GPUs are quickly evolving in complexity, tuning numerical libraries for them is becoming more challenging. We present an autotuning approach in the area of dense linear algebra (DLA) libraries for GPUs. The MAGMA library is used to demonstrate the techniques and ...
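The core autotuning loop can be sketched as a plain empirical search: enumerate candidate kernel parameters, time each variant on the target hardware, and keep the fastest. The sketch below uses a cache-blocked transpose as a stand-in for a parameterized GPU kernel; it only shows the shape of the search, not MAGMA's actual tuning infrastructure, and the candidate values are illustrative:

```python
# Minimal empirical autotuning loop: time each parameter candidate on the
# target and keep the fastest. The cache-blocked transpose is a stand-in for
# a parameterized GPU kernel; candidate values are illustrative.
import time
import numpy as np

def blocked_transpose(A, nb):
    """Transpose A tile by tile; nb is the tunable blocking factor."""
    n = A.shape[0]
    out = np.empty_like(A)
    for i in range(0, n, nb):
        for j in range(0, n, nb):
            out[j:j+nb, i:i+nb] = A[i:i+nb, j:j+nb].T
    return out

def autotune(n=2048, candidates=(16, 32, 64, 128, 256)):
    A = np.random.rand(n, n)
    best = None
    for nb in candidates:
        t0 = time.perf_counter()
        blocked_transpose(A, nb)
        dt = time.perf_counter() - t0
        print(f"nb={nb:4d}: {dt*1e3:7.1f} ms")
        if best is None or dt < best[1]:
            best = (nb, dt)
    return best[0]

if __name__ == "__main__":
    print("best blocking factor:", autotune())
```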
Abstract. We present a Cholesky factorization for multicore with GPU accelerators. The challenges in developing scalable high performance algorithms for these emerging systems stem from their heterogeneity, massive parallelism, and the huge gap between the GPUs' compute power and the CPU-GPU communication speed. We show an approach that is largely based on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithm features two ...
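To make the division of labor concrete, here is a minimal NumPy sketch of a right-looking blocked Cholesky, the algorithmic skeleton such hybrid codes build on: the small panel factorization is the kind of latency-bound work suited to the CPU, while the large trailing-matrix update carries almost all the flops and maps to the GPU. The CPU/GPU split is indicated only in comments; the block size and names are illustrative:

```python
# Right-looking blocked Cholesky. The panel step is small and sequential
# (CPU-suited); the trailing-matrix update holds the bulk of the flops
# (GPU-suited). Both run in NumPy here; the split is indicated in comments.
import numpy as np

def blocked_cholesky(A, nb=64):
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Panel: factor the diagonal block (CPU-suited).
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:
            # Triangular solve forming the column panel.
            L_kk = A[k:e, k:e]
            A[e:, k:e] = np.linalg.solve(L_kk, A[e:, k:e].T).T
            # Trailing-matrix update: the O(n^3) bulk (GPU-suited).
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)

if __name__ == "__main__":
    n = 300
    M = np.random.rand(n, n)
    A = M @ M.T + n * np.eye(n)   # symmetric positive definite test matrix
    L = blocked_cholesky(A)
    print("max error:", np.max(np.abs(L @ L.T - A)))
```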
Abstract GPUs are excellent accelerators for data parallel applications with regular data access patterns. It is challenging, however, to optimize computations with irregular data access patterns on GPUs. One such computation is the Symmetric Matrix Vector product (SYMV) for dense linear algebra. Optimizing the SYMV kernel is important because it forms the basis of fundamental algorithms such as linear solvers and eigenvalue solvers on symmetric matrices.
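The irregularity comes from exploiting symmetry: only one triangle of the matrix is read, and each off-diagonal block must contribute to two different pieces of the result. A minimal NumPy sketch of that blocked access pattern follows (the block size and storage convention are illustrative, not the paper's GPU kernel):

```python
# Blocked symmetric matrix-vector product: only the lower triangle of A is
# read, and each off-diagonal block contributes twice (A_ij @ x_j and
# A_ij^T @ x_i). On a GPU this halves memory traffic but creates the
# irregular access pattern being optimized. Block size is illustrative.
import numpy as np

def blocked_symv(A_lower, x, nb=64):
    n = A_lower.shape[0]
    y = np.zeros(n)
    for i in range(0, n, nb):
        ie = min(i + nb, n)
        # Diagonal block: symmetrize the stored lower triangle.
        D = np.tril(A_lower[i:ie, i:ie])
        D = D + np.tril(D, -1).T
        y[i:ie] += D @ x[i:ie]
        for j in range(0, i, nb):
            je = min(j + nb, n)
            B = A_lower[i:ie, j:je]   # read once from lower storage...
            y[i:ie] += B @ x[j:je]    # ...used for the lower contribution
            y[j:je] += B.T @ x[i:ie]  # ...and for the mirrored upper one
    return y

if __name__ == "__main__":
    n = 200
    A = np.random.rand(n, n); A = (A + A.T) / 2
    x = np.random.rand(n)
    print("max error:", np.max(np.abs(blocked_symv(np.tril(A), x) - A @ x)))
```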
Abstract In this work we propose a joint energy, thermal, and cooling management technique (JETC) that significantly reduces per-server cooling and memory energy costs. Our analysis shows that decoupling the optimization of CPU and memory cooling energy from the optimization of memory energy leads to suboptimal solutions, due to the thermal dependencies between CPU and memory and the non-linearity of cooling energy.
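A toy numerical model makes the coupling argument concrete: cooling power grows roughly with the cube of fan speed, and CPU exhaust heat raises memory temperature, so a memory state that looks slightly worse in isolation can win jointly by letting the fan slow down. All constants, states, and the thermal model below are invented for illustration and are not JETC's actual models:

```python
# Toy model: decoupled vs. joint optimization of cooling and memory energy.
# Cooling power ~ fan^3 (fan law); CPU exhaust heats the DIMMs, so the memory
# power state affects the required fan speed. All constants are invented.
import itertools

T_AMB, T_MAX = 25.0, 60.0
P_CPU = 40.0                                   # fixed CPU power (W)
MEM_STATES = {"fast": (20.0, 1.00), "low": (8.0, 1.15)}
# state -> (memory power in W, relative memory energy per unit of work)

def temps(fan, p_mem):
    t_cpu = T_AMB + 0.6 * P_CPU / fan          # airflow lowers thermal resistance
    t_mem = T_AMB + (0.5 * P_CPU + 1.5 * p_mem) / fan  # CPU heat recirculates
    return t_cpu, t_mem

def feasible(fan, state):
    return max(temps(fan, MEM_STATES[state][0])) <= T_MAX

def energy(fan, state):
    return 1.2 * fan ** 3 + 10.0 * MEM_STATES[state][1]  # cooling + memory

fans = [f / 10 for f in range(8, 41)]          # candidate fan speeds

# Decoupled: the memory controller first picks its locally cheapest state,
# then the cooling controller picks the slowest fan that stays under T_MAX.
s_dec = min(MEM_STATES, key=lambda s: MEM_STATES[s][1])
f_dec = min(f for f in fans if feasible(f, s_dec))

# Joint: search both knobs together.
f_j, s_j = min(((f, s) for f, s in itertools.product(fans, MEM_STATES)
                if feasible(f, s)), key=lambda fs: energy(*fs))

print(f"decoupled: mem={s_dec}, fan={f_dec:.1f}, E={energy(f_dec, s_dec):.2f}")
print(f"joint:     mem={s_j}, fan={f_j:.1f}, E={energy(f_j, s_j):.2f}")
```

With these made-up constants the decoupled controllers settle on the "fast" memory state and a faster fan, while the joint search finds that the "low" state's lighter heat load pays for its small memory-energy penalty through cubic fan savings.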
Abstract: Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures. We show that it is hard to rely on a model, which motivates us to design a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space.
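The flavor of such pruning can be sketched as follows. Suppose, as hypothetical stand-ins for the paper's observed properties, that the inner blocking ib must divide the outer tile size nb and that execution time is unimodal in nb for fixed ib; the search can then skip invalid points and stop each sweep early. The timing function below is synthetic, only standing in for benchmarking the real tiled QR:

```python
# Pruned empirical search over (nb, ib) instead of timing the full Cartesian
# product. Property 1: ib must divide nb. Property 2: time is unimodal in nb
# for fixed ib, so each sweep stops once timings degrade. The "measurement"
# is a synthetic stand-in for running the real benchmark.

def measure(nb, ib):
    """Synthetic 'execution time' standing in for a real tiled-QR run."""
    return (nb - 200) ** 2 / 1e4 + (ib - 40) ** 2 / 1e3 + 1.0

def pruned_search(nbs=range(40, 401, 40), ibs=range(8, 81, 8)):
    trials = 0
    best = (float("inf"), None)
    for ib in ibs:
        prev = float("inf")
        for nb in nbs:
            if nb % ib != 0:          # property 1: ib must divide nb
                continue
            t = measure(nb, ib)
            trials += 1
            if t < best[0]:
                best = (t, (nb, ib))
            if t > prev:              # property 2: unimodal in nb -> stop
                break
            prev = t
    return best, trials

if __name__ == "__main__":
    (t, (nb, ib)), trials = pruned_search()
    full = len(range(40, 401, 40)) * len(range(8, 81, 8))
    print(f"best nb={nb}, ib={ib} (t={t:.3f}) after {trials} timings "
          f"instead of {full}")
```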
Abstract We present an improved matrix-matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets the NVIDIA Fermi graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA). We show how to modify the previous MAGMA GEMM kernels to make more efficient use of Fermi's new architectural features, most notably its extended memory hierarchy and memory sizes.
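The structure of such a tiled GEMM can be sketched at a high level: each thread block computes one BM × BN tile of C, marching along the K dimension in BK-wide steps and staging sub-tiles of A and B into shared memory before accumulating. The NumPy sketch below mirrors only that loop structure; local arrays stand in for shared memory and registers, and the tile sizes are illustrative rather than the tuned Fermi values:

```python
# Structural sketch of a tiled GEMM: each "thread block" owns one BM x BN
# tile of C and marches over K in BK-wide steps, staging sub-tiles of A and B
# into fast memory before the inner product. Tile sizes are illustrative.
import numpy as np

def tiled_gemm(A, B, BM=64, BN=64, BK=16):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for i in range(0, M, BM):              # one (i, j) pair per "thread block"
        for j in range(0, N, BN):
            acc = np.zeros((min(BM, M - i), min(BN, N - j)))  # "registers"
            for k in range(0, K, BK):
                As = A[i:i+BM, k:k+BK]     # stage A tile into "shared memory"
                Bs = B[k:k+BK, j:j+BN]     # stage B tile into "shared memory"
                acc += As @ Bs             # accumulate from the staged tiles
            C[i:i+BM, j:j+BN] = acc
    return C

if __name__ == "__main__":
    A = np.random.rand(200, 150); B = np.random.rand(150, 170)
    print("max error:", np.max(np.abs(tiled_gemm(A, B) - A @ B)))
```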
Abstract Dense linear algebra (DLA) is one of the most important software layers in high performance computing. It is also important for its wide use in other application domains such as machine learning, gaming, speech processing, image processing, etc. The introduction of new machines from vendors gives us opportunities to optimize DLA libraries for them and thus exploit their power. Unfortunately, the optimization process is not always straightforward.
We present a Hessenberg reduction (HR) algorithm for hybrid systems of homogeneous multicore with GPU accelerators that can exceed 25× the performance of the corresponding LAPACK algorithm running on current homogeneous multicores. This enormous acceleration is due to proper matching of algorithmic requirements to architectural strengths of the system's hybrid components.
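For reference, the computation being accelerated is the Householder reduction to upper Hessenberg form; a plain (unblocked) NumPy version is sketched below. The two bulk updates per step are exactly the memory-bound, matrix-vector-heavy work that the hybrid algorithm maps to the GPU; the blocked LAPACK-style variant batches these updates, which this sketch does not attempt:

```python
# Reference (unblocked) Householder reduction to upper Hessenberg form.
# Each step applies a similarity transform P H P with P = I - 2 v v^T,
# zeroing one column below the subdiagonal while preserving eigenvalues.
import numpy as np

def hessenberg(A):
    H = A.astype(float)
    n = H.shape[0]
    for k in range(n - 2):
        x = H[k+1:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm = np.linalg.norm(v)
        if norm == 0.0:
            continue
        v /= norm
        # Apply P from the left: the bulk, memory-bound update (GPU-suited).
        H[k+1:, k:] -= 2.0 * np.outer(v, v @ H[k+1:, k:])
        # Apply P from the right, completing the similarity transform.
        H[:, k+1:] -= 2.0 * np.outer(H[:, k+1:] @ v, v)
    return H

if __name__ == "__main__":
    A = np.random.rand(6, 6)
    H = hessenberg(A)
    print("below subdiagonal ~ 0:", np.allclose(np.tril(H, -2), 0))
    print("eigenvalues match:", np.allclose(
        np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(H))))
```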
The goal of the Matrix Algebra on GPU and Multicore Architectures (MAGMA) project is to create a new generation of linear algebra libraries that achieve the fastest possible time to an accurate solution on hybrid/heterogeneous architectures, starting with current multicore + multi-GPU systems.
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are a major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs.
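A quick flop count shows why optimized BLAS dominates DLA performance: in a blocked Cholesky of an n × n matrix, for example, the non-BLAS panel work is O(n·nb²) out of n³/3 total flops, so the Level-3 BLAS share approaches 100% as n grows. A back-of-the-envelope sketch (leading-order formulas only, illustrative numbers):

```python
# Why BLAS speed dominates DLA: in blocked Cholesky the panel (POTRF-like)
# work is O(n * nb^2) of the n^3/3 total flops; everything else is Level-3
# BLAS (TRSM/SYRK/GEMM). Leading-order flop formulas only.
def blas_share(n, nb):
    panel = (n // nb) * (nb ** 3 / 3.0)   # non-BLAS panel flops
    total = n ** 3 / 3.0                  # whole factorization
    return 1.0 - panel / total

for n in (1024, 4096, 16384):
    print(f"n={n:6d}, nb=128: {100 * blas_share(n, 128):.2f}% of flops in BLAS")
```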
Recent activities of major chip manufacturers, such as Intel, AMD, IBM and NVIDIA, make it more evident than ever that future designs of microprocessors and large HPC systems will be hybrid/heterogeneous in nature, relying on the integration (in varying proportions) of two major types of components:
Abstract Solving dense linear systems of equations is a fundamental problem in scientific computing. Numerical simulations involving complex systems represented in terms of unknown variables and relations between them often lead to linear systems of equations that must be solved as fast as possible. We describe current efforts toward the development of these critical solvers in the area of dense linear algebra (DLA) for multicore with GPU accelerators.
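The solver pipeline these efforts target follows the standard two-phase pattern: factor once at O(n³) cost, the part worth offloading to GPUs, then solve by two O(n²) triangular substitutions. Below is a minimal sketch using SciPy's LAPACK-backed routines as stand-ins for the tuned hybrid implementations described here:

```python
# Standard dense-solver pipeline: factor once (O(n^3), the kernel worth
# accelerating), then solve with two O(n^2) triangular substitutions.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

n = 500
A = np.random.rand(n, n) + n * np.eye(n)   # well-conditioned test system
b = np.random.rand(n)

lu, piv = lu_factor(A)       # LU with partial pivoting: the O(n^3) phase
x = lu_solve((lu, piv), b)   # forward/back substitution: O(n^2)
print("residual:", np.linalg.norm(A @ x - b))
```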