IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000
ABSTRACT The use of turbo codes has been proposed for several applications, including the development of wireless systems, where highly reliable transmission is required at very low signal-to-noise ratios (SNR). The problem of extracting the best coding gains from these ...
IEEE Transactions on Circuits and Systems II: Express Briefs, 2000
ABSTRACT In this brief, we propose local automatic rate adjustment in network-on-chips (NoC) (LAURA-NoC), a NoC with a distributed approach to dynamic voltage and frequency scaling (DVFS). The utilization of the switch buffers is used in a local feedback loop to automatically determine the appropriate clock frequency and voltage that allow the switch to sustain the rate at its input ports, without a global controller. The DVFS controller is simple and uses 2 voltage and 16 frequency values. We report a significant power saving compared to a global DVFS approach in a 45-nm CMOS technology: 33% on average over four realistic video applications.
The design and the performance evaluation of a parallel architecture for digital signal processing (DSP) are described. Many DSP applications cannot be efficiently solved by using standard DSP sequential microprocessors. In this case, using dedicated parallel architectures is the best way to obtain high performance, but they need flexibility to apply different DSP algorithms efficiently. The Pandora architecture provides a
This paper proposes a novel decoder architecture for the brand-new image coding standard JPEG2000. Innovative technologies involved in the JPEG2000 framework will lead to rapid development of new mobile multimedia applications. In this scenario, a well-known critical factor is that computational resources available on tetherless terminals are limited and valuable. Dedicated hardware units can completely fulfill elaboration tasks,
... In order to validate high-level design choices, finite precision effects have been thoroughly investigated together with H.263t low bit-rate requirements. In this work an 8 x 8 DCTI FPGA implementation is proposed, specifically ...
ACM Journal on Emerging Technologies in Computing Systems, 2013
ABSTRACT Biosequence alignment has recently received increasing support from both commodity and dedicated hardware platforms. Processing capabilities are constantly rising, but still do not satisfy the limitless requirements of this application. We offer insight into the contribution to this need that can be expected from emerging technology devices and architectures, focusing as an example on nanofabrics based on silicon nanowires. By varying a few parameters we explore the solution space, and demonstrate with proper figures of merit how this family of beyond-CMOS structures could be considered an effective disruptive technology for biosequence analysis applications.
ACM Journal on Emerging Technologies in Computing Systems, 2014
ABSTRACT Density and regularity are deemed the major advantages of nanoarray architectures based on nanowires. The literature has shown that proper reliability analyses must be performed and solutions devised to improve nanoarray yield. Their complexity and high fault probability call for specific design automation tools able to explore circuit solutions, performance, and fault-tolerant approaches. We envision a simulator conceived to carry out characterizations in terms of logic behavior, defect-induced output error rate assessment, switching activity, power, and timing performance. Though such tools already exist for traditional technology, a simulator based on specific technological and topological tiled nanoarray descriptions, and conceived to join both device and architecture levels, has never been attempted at the degree of accuracy we present. Our contribution is twofold. First, marking a difference with respect to the state of the art, we developed an algorithm based on an event-driven engine which works at switch level and is not simply built on top of cost-function evaluations. The straightforward advantage is the possibility to follow the evolution of dynamic control sequences throughout all the inner components of the nanoarray and, as a consequence, to obtain circuit-level characterization as a projection of the real internal parameters. Second, we added to our simulator the capability to inject faults with specific statistical distributions associated with the nanoarray topology. Here we extract output error rates and yield for one of the possible nanoarray structures proposed in the literature, the NASIC. The specificity and accuracy of the results demonstrate the simulator's trustworthiness, its effectiveness for extensive nanoarray characterization, and its suitability as a foundation for both higher architectural and lower device simulation levels.
The aim of this work, then, is to provide insights into the intertwined relation between actual technology and circuit design for these emerging fabrics, and, as a consequence, to clarify how defects and variability affect circuits and systems performance.
ACM Transactions on Embedded Computing Systems, 2014
ABSTRACT A UWB microwave imaging system for breast cancer detection consists of antennas, transceivers, and a high-performance embedded system for elaborating the received signals and reconstructing breast images. In this article we focus on this embedded system. To accelerate the image reconstruction, the beamforming phase has to be implemented in a parallel fashion. We assess its implementation on three currently available high-end platforms based on a multicore CPU, a GPU, and an FPGA, respectively. We then project the results, applying technology scaling rules, to future many-core CPUs, many-thread GPUs, and advanced FPGAs. We consider an optimistic case in which available resources increase according to Moore's law only, and a pessimistic case in which only a fraction of those resources is available due to a limited power budget. In both scenarios, an implementation that includes a high-end FPGA outperforms the other alternatives. Since the number of effectively usable cores in future many-cores will be power-limited, and there is a trend toward the integration of power-efficient accelerators, we conjecture that a chip consisting of a many-core section and a reconfigurable logic section will be the perfect platform for this application.
Papers by M. Zamboni