DSP Architecture - Part 1
x_{n+1} is obtained by shifting x_n so that the (n+1)th sample becomes the first element and all the elements of the x array are shifted right, such that the ith element of x_n becomes the (i+1)th element of x_{n+1}
The content of the product register is added to the accumulator before the new product is stored
Further, the content of dma is copied to the next location, whose address is dma+1
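The shift-and-accumulate step described above can be sketched in Python. This is only an illustration under stated assumptions: the names `fir_step`, `acc` and `preg` are hypothetical and do not correspond to any particular P-DSP instruction set, and the data move is done in a separate loop rather than interleaved with the multiplies.

```python
def fir_step(coeffs, delay_line, new_sample):
    """Compute one filter output using a shifted delay line.

    delay_line[i] holds the sample that is i steps old; it is
    modified in place.
    """
    # Data move: element i is copied to location i + 1, walking from
    # the end so that nothing is overwritten before it has been copied.
    for i in range(len(delay_line) - 1, 0, -1):
        delay_line[i] = delay_line[i - 1]
    delay_line[0] = new_sample

    acc = 0   # accumulator
    preg = 0  # product register
    for a, x in zip(coeffs, delay_line):
        acc += preg   # add the previous product before it is replaced
        preg = a * x  # store the new product
    acc += preg       # add the final product
    return acc
```

Calling `fir_step` once per incoming sample keeps the newest sample at index 0 and pushes older samples toward the end of the array.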
Harvard architecture
Harvard architecture explained
It employs entirely separate memory systems to store instructions and data
The CPU fetches the next instruction and fetches data simultaneously
Its distinguishing feature is that the instruction address space and the data address space are separate
The same address can exist in each address space
So an address alone does NOT uniquely specify a memory location
You also need to know which address space it refers to
It uses two buses, one for accessing instructions and one for accessing data
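A minimal sketch of why an address alone is ambiguous in a Harvard machine. The class `HarvardMemory` and the space names `"program"` and `"data"` are hypothetical, chosen only for illustration.

```python
class HarvardMemory:
    """Toy model with separate program and data address spaces."""

    def __init__(self, program, data):
        # Two independent memories; both start at address 0.
        self.spaces = {"program": list(program), "data": list(data)}

    def read(self, space, address):
        # The address is only meaningful together with its address space.
        return self.spaces[space][address]
```

With `mem = HarvardMemory(["LOAD", "ADD"], [10, 20])`, address 1 refers to `"ADD"` in the program space and to `20` in the data space.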
Von Neumann architecture
Von Neumann architecture explained
It employs a single address space
Instructions and data are stored in the same address space
The PC points to the next instruction
The CPU fetches the instruction and decodes it; the instruction contains pointers to its operands
If a pointer gets corrupted, the program may terminate abnormally
Because it fetches the instruction first and then the data over the same bus, this architecture is slow
So P-DSPs rarely use this architecture
Modified Harvard architecture
In a pure Harvard architecture, mechanisms must be provided to load programs into program memory and initial data into data memory
Modern machines use multiple buses
One bus can access both program memory and data memory
Another bus accesses only data memory
Data can also be transferred from one memory to the other
This feature is used in modern P-DSPs
It is also helpful at start-up, since constant data can be transferred from program memory to data memory
Advantage of having multiple buses
The number of memory accesses per cycle can be increased
Motorola DSP5600X and DSP96002 have three memory buses and allow three memory accesses per cycle
TMS320C54X has four memory buses and allows four memory accesses per cycle
Multiple access memory
Memory that permits more than one access per cycle is called multiple access memory
Dual access RAM technology permits two memory accesses per clock cycle
Four memory accesses per cycle are also possible if dual access RAM is connected to the P-DSP through two independent address and data buses
Multiported memory
The number of accesses can be increased using multiport memory
A typical 2-port memory has two address buses and two data buses
Thus two separate memory chips need not be used in a Harvard architecture
Disadvantage
Increased complexity
More pins, more chip area and increased cost
VLIW architecture
VLIW stands for Very Long Instruction Word
Transmeta Crusoe is a chip that uses this technique
TMS320C6X also uses a similar technique
These processors read a relatively large group of instructions and execute them at the same time
For this purpose they have
Many ALUs
Many multipliers
Many shifters, etc.
A very long instruction word is fetched from memory; it specifies the operations and operands for the different data paths
This increases the number of instructions executed per cycle
The performance gain with VLIW depends on the parallelism achievable in the algorithm
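As a rough illustration of one long instruction word driving several functional units in the same cycle, the sketch below models a bundle as a list of independent operations. The function `execute_bundle`, the tuple layout and the register names are all assumptions made for this example, not the encoding of any real VLIW processor.

```python
import operator


def execute_bundle(bundle, regs):
    """Execute every operation of one long instruction word "at once".

    bundle: list of (dest, op, src_a, src_b) tuples, one per functional unit.
    regs:   dict mapping register names to values; updated in place.
    """
    # Read all source operands first so that no operation sees a result
    # produced by another operation in the same bundle.
    results = [(dest, op(regs[a], regs[b])) for dest, op, a, b in bundle]
    for dest, value in results:
        regs[dest] = value
    return regs
```

A bundle such as `[("r2", operator.add, "r0", "r1"), ("r3", operator.mul, "r0", "r1")]` completes an addition and a multiplication in a single step; the gain disappears if the algorithm offers no independent operations to pack together.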
Instruction pipelining
An instruction may have many phases
Fetch
Decode
Execute
Write
Throughput will be low if these phases are executed serially, because while one stage is busy the others are idle
With pipelining, all these stages operate in parallel on different instructions, which improves throughput
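The throughput gain can be estimated with a small calculation. The sketch assumes an idealized pipeline in which every stage takes exactly one cycle and there are no stalls; the function names are hypothetical.

```python
def serial_cycles(n_instructions, n_stages):
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Cycles needed in an ideal pipeline: fill it once, then one result per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1
```

For 10 instructions and the 4 phases listed above, serial execution takes 40 cycles under these assumptions, while the ideal pipeline takes 13.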
Pipelining diagram
Special addressing modes in P-DSPs
Short immediate addressing
Short direct addressing
Memory mapped addressing
Indirect addressing
Bit reversed addressing
Circular addressing
Special addressing modes explained - 1
Short immediate addressing
The operand is specified as a short constant
It forms part of the instruction
Its length depends on the P-DSP
Example: in the TMS320C5X an 8-bit constant can be used
Short direct addressing
The lower-order bits of the operand address are specified as part of the instruction
The higher-order bits are stored elsewhere, for example in a page pointer
Example
TI TMS320 DSP: the lower 7 bits are specified in the instruction
Motorola DSP5600X: the lower 6 bits are specified in the instruction
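Short direct addressing can be sketched as concatenating a page pointer with the offset carried in the instruction. The function `direct_address` and its parameter `offset_bits` are hypothetical; the sketch assumes the page pointer simply supplies all the remaining high-order address bits.

```python
def direct_address(page_pointer, offset, offset_bits=7):
    """Form a full address from a page pointer and a short in-instruction offset."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the instruction field")
    # High-order bits come from the page pointer, low-order bits from the instruction.
    return (page_pointer << offset_bits) | offset
```

With a 7-bit offset each page covers 128 locations, so page 2 with offset 5 gives address 261; passing `offset_bits=6` models the 6-bit case.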
Special addressing modes explained - 2
Memory mapped addressing
CPU registers and I/O registers are accessed as memory locations
This is done by mapping them into the first or last page of memory
Example
TMS320C5x: page 0 corresponds to the CPU registers and I/O registers
Motorola DSP5600X: the last page is used
Indirect addressing
The addresses of operands are stored in registers called indirect access registers
When operands are fetched from the addresses held in these registers, the registers are updated
This is done by a separate arithmetic unit (ALU) dedicated to updating these addresses
The increment can be 1 or an offset held in a special register
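A minimal sketch of indirect addressing with automatic register update. The function `read_post_modify` and the register name `AR0` are hypothetical; on a real P-DSP the update is performed by dedicated address hardware in parallel with the fetch, which this sequential Python code only imitates.

```python
def read_post_modify(memory, regs, reg_name, step=1):
    """Fetch the operand addressed by a register, then update the register."""
    address = regs[reg_name]
    value = memory[address]
    # Post-modify: the step is 1 by default, or any offset supplied by the caller.
    regs[reg_name] = address + step
    return value
```

Repeated calls walk through `memory` without any separate address arithmetic in the calling code.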
Special addressing modes explained - 3
Bit reversed addressing
The bit-reversed form of a number is obtained by writing its natural binary equivalent in reverse order
Therefore the LSB becomes the MSB and the MSB becomes the LSB
The address is incremented or decremented in bit-reversed form
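The reversal can be sketched as follows. The function names are hypothetical, and the address width `n_bits` is a parameter the caller must supply.

```python
def bit_reverse(value, n_bits):
    """Return value with its n_bits-wide binary pattern written in reverse order."""
    result = 0
    for _ in range(n_bits):
        # Shift the current LSB of value into the result.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_sequence(n_bits):
    """Addresses visited when a counter 0, 1, 2, ... is bit-reversed."""
    return [bit_reverse(i, n_bits) for i in range(1 << n_bits)]
```

For 3-bit addresses the counter 0 to 7 yields the sequence 0, 4, 2, 6, 1, 5, 3, 7.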
Circular addressing mode
In real-time applications, data arrives continuously
If it is stored in a linear buffer, the buffer is eventually exhausted
If it is stored in a circular buffer, new data overwrites the oldest data
There is no need to check whether the end of the buffer has been reached
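A software sketch of a circular buffer. The class `CircularBuffer` is hypothetical, and the wrap-around is done here with a modulo operation; with circular addressing on a P-DSP the address hardware is expected to perform this wrap, so the program needs no end-of-buffer test.

```python
class CircularBuffer:
    """Fixed-size buffer in which new samples overwrite the oldest ones."""

    def __init__(self, size):
        self.data = [0] * size
        self.index = 0  # location that the next sample will be written to

    def write(self, sample):
        self.data[self.index] = sample
        # Wrap back to the start after the last location.
        self.index = (self.index + 1) % len(self.data)
```

Writing four samples into a buffer of size three leaves the fourth sample in location 0, where the first sample used to be.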
Use of linear buffer
Use of circular buffer
Example of circular addressing
Limitations of circular buffering
Methodology for a circular buffer
On Chip peripherals
On chip timer
It generates periodic interrupts to the DSP
It also generates sampling clocks for A/D converters
Serial port
It enables data communication between the P-DSP and peripherals such as an ADC, a DAC or an RS-232C device
These ports are buffered: the DSP writes and reads data in parallel form, while the port sends and receives the data in serial form
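The parallel-to-serial conversion performed by such a port can be imitated in a few lines. The function names, the 8-bit default word length and the MSB-first bit order are assumptions made for illustration; actual serial ports differ in word length and bit order.

```python
def to_serial(word, n_bits=8):
    """Split a parallel word into a list of bits, most significant bit first."""
    return [(word >> i) & 1 for i in range(n_bits - 1, -1, -1)]


def from_serial(bits):
    """Reassemble a parallel word from bits received most significant bit first."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word
```

A word written by the DSP passes through `to_serial` on the way out, and received bits pass through `from_serial` before the DSP reads them as a word.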
On Chip peripherals contd..
TDM serial port
A special serial port that permits the P-DSP to communicate with other devices or other P-DSPs using a time division multiplexed format
Parallel ports
They are faster than serial ports
Bit I/O port
These ports are only a single bit wide
The bits can be individually set, reset or read
They are used for control purposes and also for data transfer
On Chip peripherals contd..
Host port
A special type of parallel port found on P-DSPs
It enables the P-DSP to communicate with a processor or a PC, which is called the host
Data can be exchanged through this port
It can generate interrupts
It also helps the P-DSP load a program from ROM to RAM
Communication (comm) ports
They are used for communication between many P-DSPs in a multiprocessor system
On Chip ADCs and DACs
They enable the P-DSP to communicate with the analog world
They are used in cellular phones and tapeless answering machines
TMS320C50
Complex DSP operations
Sum of products is the key element in most DSP algorithms
Algorithm and equation
Finite Impulse Response Filter
$$y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$$
Infinite Impulse Response Filter
$$y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$$
Convolution
$$y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$$
Discrete Fourier Transform
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$$
Discrete Cosine Transform
$$F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \cos\left[\frac{\pi}{2N} \, u \, (2x+1)\right]$$
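To show how these operations reduce to a sum of products, the sketch below computes one FIR output and one DFT bin with the same multiply-accumulate loop. The function names are hypothetical, samples outside the input array are assumed to be zero, and the DFT uses the conventional negative exponent.

```python
import cmath


def sum_of_products(a, b):
    """Multiply-accumulate over two equal-length sequences."""
    acc = 0
    for p, q in zip(a, b):
        acc += p * q
    return acc


def fir(coeffs, x, n):
    """FIR output y(n): coefficients a_k times samples x(n-k), summed over k."""
    samples = [x[n - k] if 0 <= n - k < len(x) else 0
               for k in range(len(coeffs))]
    return sum_of_products(coeffs, samples)


def dft(x, k):
    """DFT bin X(k): samples x(n) times exp(-j 2 pi n k / N), summed over n."""
    n_points = len(x)
    twiddles = [cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points)]
    return sum_of_products(x, twiddles)
```

Both functions differ only in how the two input sequences are prepared; the inner loop is the same multiply-accumulate pattern in each case.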