Shanthala 2009
Abstract: In the majority of digital signal processing (DSP) applications, the critical operations usually involve many multiplications and/or accumulations. For real-time signal processing applications, a high-throughput multiplier-accumulator (MAC) is therefore always a key element in achieving a high-performance digital signal processing application. In the last few years, the main consideration of MAC design has been to enhance its speed, because speed and throughput rate are always the concerns of digital signal processing systems. However, due to the increase of portable electronic products, low-power design has also become a major consideration, because the limited battery energy of these portable products restricts the power consumption of the system. Therefore, the main motivation is to investigate various pipelined MAC architectures and circuit design techniques which are suitable for the implementation of high-throughput signal processing algorithms. The goal of this project was the design and VLSI implementation of a pipelined MAC for high-speed DSP applications in 180 nm technology. For designing the pipelined MAC, various architectures of multipliers and one-bit full adders are considered. The static and dynamic one-bit full adder was implemented as the basic block. To check the functionality of the whole system, SPICE code is written in HSPICE, defining all the blocks in the circuit as subcircuits. Schematic capture is then done using the schematic composer from Virtuoso, starting from the bottom level and proceeding to the top level. Finally, the layout for the complete MAC is done using Virtuoso.

Keywords: DSP, MAC, CMOS, Pipeline, static and dynamic

1. INTRODUCTION

... consumption of the system. Therefore, the main motivation of this research is to investigate various pipelined multiplier/accumulator architectures and circuit design techniques that are suitable for implementing high-throughput signal processing algorithms and at the same time achieve low power consumption.

2. ONE BIT FULL ADDER ARCHITECTURES

The major cell of the multiplier and accumulator is a 1-bit full adder, which decides the operational speed, power dissipation and area of the MAC. That is, the operational speed of the full adder between two pipeline stages decides the system clock rate. A fully pipelined full adder designed with True Single Phase Clock (TSPC) has been considered, but its transistor count and power dissipation are large. Complementary Pass Transistor Logic (CPL) based designs have shown the advantage of low power dissipation and rather high speed of operation, but their transistor count for a pipelined design is also high due to the requirement to latch both the outputs and their complements. In this section, we propose a new circuit design, which has high operation speed, smallest transistor count and lowest power/speed ratio. Our design is based on the quasi-domino dynamic full adder circuit design, but several modifications have been made.
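As a behavioral reference for the 1-bit full adder cell discussed in Section 2, the following minimal Python sketch models only the logic function (sum and carry-out). It says nothing about the TSPC, CPL or quasi-domino circuit implementations, and the names `full_adder` and `ripple_add` are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Return (sum, carry_out) for three 1-bit inputs."""
    s = a ^ b ^ cin                          # sum is the XOR of all three inputs
    cout = (a & b) | (a & cin) | (b & cin)   # carry is the majority function
    return s, cout


def ripple_add(x: int, y: int, width: int) -> int:
    """Add two unsigned `width`-bit numbers by chaining 1-bit full adders.

    Hypothetical helper: it only shows how the cell composes into a
    multi-bit adder; it is not the pipelined structure of the MAC.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)         # final carry becomes the top bit
```

A circuit-level netlist of the cell can be checked against this truth function for all eight input combinations.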
Authorized licensed use limited to: UNIVERSITY TEKNOLOGI MALAYSIA. Downloaded on September 15,2020 at 06:42:33 UTC from IEEE Xplore. Restrictions apply.
... signal in the P-block and an NMOS transistor controlled by the clock signal in the N-block. In addition, a resistive NMOS is added in series with the N-switch under the P-block to purposely reduce the pull-down voltage swing during precharge phases without enlarging the N-switch gate length, which avoids increasing the capacitive load on the clock signals. A complementary single-phase clock scheme is used in this pipeline system: clk and clkb are interchanged from one pipeline stage to the next.

During the precharge phase, depending on the input pattern, nodes 1 and 2 are not always fully driven to VSS and VDD, respectively. This is because a conduction path may exist between VDD and VSS, and the output node may then stay halfway between VDD and VSS. However, the preceding stages are in the evaluation phase, so at the end of the precharge phase the inputs are correct. Thus, during the following evaluation phase, the discharging (charging) switch is turned off and the PMOS logic tree (NMOS logic tree) pulls node 1 (2) back to VDD (VSS). The existence of a dc path in dynamic gates does not affect the accuracy of the operation of the full adder, but it can reduce the discharging or charging time in the evaluation phase. This is a merit of the original circuit.

The worst-case delay of the quasi-domino full adder is the time to evaluate node 1 (through the PMOS logic tree) and node 2 (through the NMOS logic tree). We find that during the precharge phase the preceding pipeline stage is in its evaluation phase, and at the end of that phase the output signals of the preceding stage are already correct. So, if we change the carry block of the cell to static logic, the carry block can be partially evaluated during the precharge phase. As a result, the delay can be shortened. In the new full adder design, we therefore use static pseudo-NMOS logic instead of the conventional PMOS-type domino logic shown in Figure 2. The merit of the change is twofold: an increase in speed and a decrease in power consumption and area overhead.

Fig 2: Conventional CMOS logic

The increase in speed has been described earlier. With regard to the decrease in power consumption and area overhead, as we know, the PMOS device is slower than the NMOS device. To compensate, the PMOS is usually sized two or three times larger than the NMOS. The increase in PMOS size also increases the parasitic capacitance and therefore the setup time and power dissipation. If complementary static CMOS logic is used, the above condition still exists. However, if static pseudo-NMOS logic is used, the weakness of the PMOS can be avoided. Further, it removes one transistor and one clock input, so the layout area and power dissipation will be reduced. However, the use of pseudo-NMOS logic will increase the static power consumption, so the size of the PMOS transistor should be carefully chosen to trade off power consumption against speed. Note that it is the dynamic power (charging and discharging of load capacitors) that dominates the overall power consumption.

Fig 3: Pseudo-NMOS logic

For the sum block, we use the original NMOS logic tree dynamic circuit, because node 1 is not yet correct at the beginning of the evaluation phase. There is no advantage in changing the sum block into static logic. Thus, the NMOS logic tree of the quasi-domino structure is still used to achieve high speed and low power dissipation. One modification to the sum block is that we place the NMOS device controlled by the last-arriving carry signals nearest the output of the sum block. With this scheme, the early signals in effect discharge the internal nodes, and the last-arriving signals only have to switch a transistor with minimum body effect. The new full adder is a combination of static and dynamic logic design [Jou, Chen, Chung and Su, 2000], so we name it the S&D full adder.

3. MULTIPLIER ARCHITECTURES

... size n bits has n squared gates. For multiplication algorithms performed in DSP applications, latency and throughput are the two major constraints from a delay perspective.
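The remark that a multiplier of size n bits has n squared gates is consistent with an n x n array multiplier, in which every pair of operand bits is ANDed to form one partial-product bit. The surrounding text is truncated, so treating it as an array multiplier is an assumption. The Python sketch below counts those AND gates and sums the partial products behaviorally; `partial_products` and `array_multiply` are hypothetical names.

```python
def partial_products(x: int, y: int, n: int):
    """Return an n x n matrix where rows[i][j] = x_j AND y_i.

    Each entry corresponds to one 2-input AND gate (assumed array-multiplier
    structure; the adder cells that sum the rows are not modeled here).
    """
    return [[((x >> j) & 1) & ((y >> i) & 1) for j in range(n)]
            for i in range(n)]


def array_multiply(x: int, y: int, n: int):
    """Multiply two unsigned n-bit numbers; return (product, and_gate_count)."""
    rows = partial_products(x, y, n)
    and_gates = sum(len(row) for row in rows)   # n * n partial-product gates
    product = 0
    for i, row in enumerate(rows):
        for j, bit in enumerate(row):
            product += bit << (i + j)           # bit (i, j) has weight 2^(i+j)
    return product, and_gates
```

For example, `array_multiply(13, 11, 4)` uses 16 partial-product gates, matching n squared for n = 4.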
L: low, M: medium, H: high, V.H: very high, SI: simple, S: small, LG: larger, A: average
5. RESULTS
Latency 6 cycles
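To illustrate what a latency of 6 cycles means for a pipelined MAC, the following Python sketch models a hypothetical pipeline that accepts one operand pair per clock and updates the accumulator 6 clocks after the pair was issued. The class `PipelinedMac` and its stage model are assumptions made for illustration; the stage partitioning of the actual design is not described in this text.

```python
from collections import deque


class PipelinedMac:
    """Cycle-level behavioral model of a MAC with a fixed pipeline latency."""

    def __init__(self, latency: int = 6):
        self.latency = latency
        # In-flight products; None marks an empty slot (pipeline bubble).
        self.stages = deque([None] * (latency - 1))
        self.acc = 0

    def clock(self, operands=None) -> int:
        """Advance one cycle. `operands` is an (a, b) pair or None.

        Returns the accumulator value after this clock edge.
        """
        product = None if operands is None else operands[0] * operands[1]
        self.stages.appendleft(product)   # new product enters the pipeline
        done = self.stages.pop()          # oldest entry leaves the pipeline
        if done is not None:
            self.acc += done
        return self.acc


mac = PipelinedMac(latency=6)
pairs = [(1, 2), (3, 4), (5, 6)]
trace = [mac.clock(p) for p in pairs] + [mac.clock() for _ in range(6)]
```

In this model the first product reaches the accumulator on the sixth clock, and later results follow at one per clock, which is the distinction between latency and throughput.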
6. CONCLUSION

Initially, different MAC architectures were analyzed to determine the optimal topology for the given performance specifications with respect to power and speed. Second, the exact implementation of the chosen architecture was investigated in an effort to achieve the maximum speed. After a complete analysis, the block was simulated for functionality verification. Once correct operation was confirmed, a complete layout was done in order to optimize the area. Simulation results from SPICE demonstrate that the MAC achieves all required performance specifications in terms of accuracy and performance parameters such as delay and power. In terms of power and area, the design dissipates only 50.26 mW of power and occupies 3 × 1.05 mm² of area. The latency of the design is 6 clock cycles.

REFERENCES

[1] Kihak Shin, Ik Kyun Oh, Sang Min, Beom Seom Ryu, Kie Young Lee and Tae Won Cho, “A Multi-Level Approach to Low Power MAC Design”, IEEE Trans. VLSI Systems, vol. 48, pp. 361-763, 1999.
[2] Ichiro Kuroda, Eri Murata, Kouhei Nadehara, Kazumasa Suzuki, Tomohisa Arai and Atsushi Okamura, “A 16-bit Parallel MAC Architecture for a Multimedia RISC Processor”, IEEE Trans. VLSI Systems, vol. 83, no. 83, pp. 103-112, 1995.
[3] Jae Sung Lee, Young Seop Jeon and Myung H. Sunwoo, “Design of New DSP Instructions and their Hardware Architecture for High-Speed FFT”, IEEE Trans. VLSI Systems, pp. 80-90, 2001.
[4] Dusan Suvakovic and C. Andre Salama, “A Pipelined Multiply-Accumulate Unit Design for Energy Recovery DSP Systems”, IEEE International Symposium on Circuits and Systems, May 28-31, 2000.
[5] Shyh-Jye Jou, Chang-Yu Chen, En-Chung and Chau-Chin Su, “A Pipeline Multiplier-Accumulator Using a High Speed Low-Power Static and Dynamic Full Adder”, Journal of Solid State Circuits, vol. 32, no. 1, January 2000.
[6] G. Goto, et al., “A 54x54-b regularly structured tree multiplier”, IEEE J. Solid-State Circuits, vol. 27, no. 9, Sept. 1992.
[7] Pascal C.H. Meier, Rob A. Rutenbar and Richard Carley, “Exploring multiplier architecture and Layout for low power”, IEEE Custom Integrated Circuits Conference, 1996.
[8] John Kim and Earl E. Swartzlander, “Improving the Recursive Multiplier”, IEEE Trans. VLSI Systems, vol. 5, pp. 2-5, 2000.
[9] Beril Seda Çiftçi, “Design and Realization of a High Speed 64 x 64-bit Multiplier for Low Power Applications”, Sabancı University, Spring 2003.
[10] S. Shah, A. J. Al-Khalili and D. Al-Khalili, “Comparison of 32-bit Multipliers for Various Performance Measures”, The 12th International Conference on Microelectronics, Tehran, Oct. 31 - Nov. 2, 2000.
[11] Sung-Mo Kang and Yusuf Leblebici, “CMOS Digital Integrated Circuits”, Tata McGraw-Hill Publishing Company Limited, 2003.
[12] Sung-Mo Kang and Yusuf Leblebici, “CMOS Digital Integrated Circuits”, Third Edition, Tata McGraw-Hill Publishing Company Limited, 2003.
[13] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, “Digital Integrated Circuits”, Second Edition, Prentice Hall Electronics and VLSI Series, 2004.
[14] M.Tech. Credit Seminar Report, Electronics Systems Group, EE Dept, IIT Bombay, submitted November ’02-2000, “DSP Architectures For System Design”, Vi (02307910), Supervisor: Prof A.N. Chandorkar.