Programmable digital signal processors
Introduction
Leading manufacturers of integrated circuits such as Texas
Instruments (TI), Analog Devices, and Motorola manufacture
digital signal processor (DSP) chips
These manufacturers have developed a range of DSP chips with
varied complexity
The TMS320 family consists of two types of single-chip DSPs:
16-bit fixed-point and 32-bit floating-point
These DSPs possess the operational flexibility of high-speed
controllers and the numerical capability of array processors
Commercial Digital Signal-Processing Devices
There are several families of commercial DSP devices
Since the early 1980s, when these devices began to appear on
the market, they have been used in numerous applications such as
communication, control, computers, instrumentation, and consumer
electronics
The architectural features and the processing power of these devices
have been constantly upgraded based on the advances in technology
and the application needs
In their basic versions, however, most of them have a Harvard
architecture, a single-cycle hardware multiplier, an address
generation unit with dedicated address registers, special addressing
modes, and on-chip peripheral interfaces
Of the various families of programmable DSP devices that are
commercially available, the three most popular ones are those from
Texas Instruments, Motorola, and Analog Devices
Texas Instruments was one of the first to come out with a
commercial programmable DSP with the introduction of its
TMS32010 in 1982
Table 1: Summary of the architectural features of three fixed-point DSPs
Architectural feature     TMS320C25                 DSP56000                    ADSP-2100
Data representation       16-bit fixed point        24-bit fixed point          16-bit fixed point
format
Hardware multiplier       16 x 16                   24 x 24                     16 x 16
ALU                       32 bits                   56 bits                     40 bits
Internal bus              16-bit program bus        24-bit program bus          24-bit program bus
                          16-bit data bus           2 x 24-bit data buses       16-bit data bus
                                                    24-bit global data bus      16-bit result bus
External bus              16-bit program/data bus   24-bit program/data bus     24-bit program bus
                                                                                16-bit data bus
On-chip memory            544 words RAM             512 words PROM              -
                          4K words ROM              2 x 256 words data RAM
                                                    2 x 256 words data ROM
Off-chip memory           64K words program         64K words program           16K words program
                          64K words data            2 x 64K words data          16K words data
Cache memory              -                         -                           16 words program
Instruction cycle time    100 ns                    97.5 ns                     125 ns
Special addressing        Bit reversed              Modulo                      Modulo
modes                                               Bit reversed                Bit reversed
Data address modes        1                         2                           2
Interfacing features      Synchronous serial I/O    Synchronous and             DMA
                          DMA                       asynchronous serial I/O
                                                    DMA
The architecture of TMS320C54xx digital signal processors
TMS320C54xx processors retain the basic Harvard architecture
of their predecessor, the TMS320C25, but have several additional
features that improve their performance over it
Figure 3.1 shows a functional block diagram of TMS320C54xx
processors
They have one program and three data memory spaces with
separate buses, which provide simultaneous access to a program
instruction and two data operands and enable writing of a result at
the same time
Part of the memory is implemented on-chip and consists of
combinations of ROM, dual-access RAM, and single-access RAM
Transfers between the memory spaces are also possible
The central processing unit (CPU) of TMS320C54xx processors
consists of
1. A 40-bit arithmetic logic unit (ALU)
2. Two 40-bit accumulators
3. A barrel shifter
4. A 17x17 multiplier
5. A 40-bit adder
6. Data address generation logic (DAGEN) with its own arithmetic
unit
7. Program address generation logic (PAGEN)
These major functional units are supported by a number of registers
and logic in the architecture
A powerful instruction set with a hardware-supported, single-
instruction repeat and block repeat operations, block memory move
instructions, instructions that pack two or three simultaneous reads,
and arithmetic instructions with parallel store and load make these
devices very efficient for running high-speed DSP algorithms
Several peripherals, such as a clock generator, a hardware timer, a
wait state generator, parallel I/O ports, and serial I/O ports, are also
provided on-chip
These peripherals make it convenient to interface the signal
processors to the outside world
In the following sections, we examine in detail the various
architectural features of the TMS320C54xx family of processors
Functional architecture for TMS320C54xx processors
Bus Structure
The performance of a processor gets enhanced with the provision of
multiple buses to provide simultaneous access to various parts of
memory or peripherals
The 54xx architecture is built around four pairs of 16-bit buses with
each pair consisting of an address bus and a data bus
As shown in the figure above, these are:
1. The program bus pair (PAB, PB), which carries the instruction
code from the program memory
2. Three data bus pairs (CAB, CB; DAB, DB; and EAB, EB), which
interconnect the various units within the CPU
3. In addition, the pairs CAB, CB and DAB, DB are used to read from
the data memory, while the pair EAB, EB carries the data to be
written to the memory
The 54xx can generate up to two data-memory addresses per cycle
using the two auxiliary register arithmetic units (ARAU0 and
ARAU1) in the DAGEN block
This enables two operands to be accessed simultaneously
Central Processing Unit (CPU)
The 54xx CPU is common to all the 54xx devices
The 54xx CPU contains
1. A 40-bit arithmetic logic unit (ALU)
2. Two 40-bit accumulators (A and B)
3. A barrel shifter
4. A 17 x 17-bit multiplier
5. A 40-bit adder
6. A compare, select and store unit (CSSU)
7. An exponent encoder (EXP)
8. A data address generation unit (DAGEN) and
9. A program address generation unit (PAGEN)
The ALU performs 2’s complement arithmetic operations and bit-
level Boolean operations on 16, 32, and 40-bit words
It can also function as two separate 16-bit ALUs and perform two
16-bit operations simultaneously
The figure below shows the functional diagram of the ALU of the
TMS320C54xx family of devices
Functional diagram of the central processing unit of the
TMS320C54xx processors
Accumulators A and B
They store the output from the ALU or the multiplier/adder block and
provide a second input to the ALU
Each accumulator is divided into three parts: guard bits (bits 39-32),
high-order word (bits 31-16), and low-order word (bits 15-0), which
can be stored and retrieved individually
Each accumulator is memory-mapped and partitioned
Either accumulator can be configured as the destination register
The guard bits provide headroom for computations, so intermediate
results can grow beyond 32 bits without overflowing
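The partition of an accumulator into guard bits, high-order word, and low-order word can be sketched in a few lines of Python. This is only an illustrative model; the function names are hypothetical and not part of any TI tool.

```python
# Illustrative model of the 40-bit accumulator layout:
# bits 39-32 = guard bits, bits 31-16 = high word, bits 15-0 = low word.
# Function names are hypothetical.

ACC_MASK = (1 << 40) - 1


def split_accumulator(acc40):
    """Return (guard, high, low) fields of a 40-bit accumulator value."""
    acc40 &= ACC_MASK
    guard = (acc40 >> 32) & 0xFF
    high = (acc40 >> 16) & 0xFFFF
    low = acc40 & 0xFFFF
    return guard, high, low


def join_accumulator(guard, high, low):
    """Rebuild the 40-bit value from its three fields."""
    return ((guard & 0xFF) << 32) | ((high & 0xFFFF) << 16) | (low & 0xFFFF)


guard, high, low = split_accumulator(0xFF80001234)
print(hex(guard), hex(high), hex(low))
```

Each field can be handled on its own, which mirrors the way the three parts can be stored and retrieved individually.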
Barrel shifter
It provides the capability to scale the data during an operand read or
write
No overhead is required to implement the shift needed for the scaling
operations
The 54xx barrel shifter can produce a left shift of 0 to 31 bits or a
right shift of 0 to 16 bits on the input data
The shift count is taken from the shift-count field of the instruction,
from the shift-count field (ASM) of status register ST1, or from the
temporary register T
Figure below shows the functional diagram of the barrel shifter of
TMS320C54xx processors
The barrel shifter and the exponent encoder normalize the values in
an accumulator in a single cycle
The LSBs of the output are filled with 0s, and the MSBs can be
either zero filled or sign extended, depending on the state of the sign-
extension mode bit in the status register ST1
An additional shift capability enables the processor to perform
numerical scaling, bit extraction, extended arithmetic, and overflow
prevention operations
Functional diagram of the barrel shifter
Multiplier/adder unit
The kernel of the DSP device architecture is the multiplier/adder unit
The multiplier/adder unit of TMS320C54xx devices performs a 17 x 17
2’s complement multiplication with a 40-bit addition, effectively
in a single instruction cycle
In addition to the multiplier and adder, the unit consists of control
logic for integer and fractional computations and a 16-bit temporary
storage register, T
The figure below shows the functional diagram of the multiplier/adder
unit of TMS320C54xx processors
The compare, select, and store unit (CSSU) is a hardware unit
specifically incorporated to accelerate the add/compare/select
operation
This operation is essential to implement the Viterbi algorithm used in
many signal-processing applications
The exponent encoder unit supports the EXP instruction, which
stores in the T register the number of leading redundant bits of the
accumulator content
This information is useful while shifting the accumulator content for
the purpose of scaling
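A rough Python model of the exponent computation is given below. It assumes that the value written to T is the number of redundant sign bits of the 40-bit accumulator minus 8 (the width of the guard field), and that a zero accumulator gives 0; the function name is hypothetical, and the exact definition should be checked against the instruction set reference.

```python
# Sketch of the exponent encoder, ASSUMING T = (redundant sign bits) - 8
# and T = 0 for a zero accumulator. The function name is hypothetical.

def exp_value(acc40):
    """Return the exponent of a 40-bit two's-complement accumulator value."""
    acc40 &= (1 << 40) - 1
    if acc40 == 0:
        return 0
    sign = acc40 >> 39
    redundant = 0
    # Count the bits below bit 39 that merely repeat the sign bit.
    for bit in range(38, -1, -1):
        if (acc40 >> bit) & 1 == sign:
            redundant += 1
        else:
            break
    return redundant - 8


print(exp_value(0x0000004000))  # small positive value, large exponent
print(exp_value(0x0785432105))  # value that spills into the guard bits
```

A positive result is the left shift that normalizes the value to 32 bits; a negative result means the value already occupies guard bits and needs a right shift.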
Internal Memory and Memory-Mapped Registers
The amount and the types of memory of a processor have direct
relevance to the efficiency and performance obtainable in
implementations with the processors
The 54xx memory is organized into three individually selectable
spaces: program, data, and I/O spaces
All 54xx devices contain both RAM and ROM
RAM can be either dual-access type (DARAM) or single-access type
(SARAM)
The on-chip RAM for these processors is organized in pages having
128 word locations on each page
The 54xx processors have a number of CPU registers to support
operand addressing and computations
The CPU registers and peripheral registers are all located on page 0
of the data memory
Figures (a) and (b) show the internal CPU registers and peripheral
registers with their addresses
The processor mode status (PMST) register is used to configure the
processor
It is a memory-mapped register located at address 1Dh on page 0 of
the RAM
A part of the on-chip ROM may contain a boot loader and look-up tables
for functions such as sine, cosine, μ-law, and A-law
(a) Internal memory-mapped registers of TMS320C54xx processors
(b) Peripheral registers for the TMS320C54xx processors
Status registers (ST0, ST1)
ST0: Contains the status of flags (OVA, OVB, C, TC) produced by
arithmetic operations & bit manipulations
ST1: Contains the status of various conditions & modes
Bits of the ST0 & ST1 registers can be set or cleared with the SSBX &
RSBX instructions
PMST: Contains memory-setup status & control information
Status register0 diagram
ARP: Auxiliary register pointer
TC: Test/control flag
C: Carry bit
OVA: Overflow flag for accumulator A
OVB: Overflow flag for accumulator B
DP: Data-memory page pointer
Status register1 diagram
BRAF: Block repeat active flag
BRAF = 0: the block repeat is deactivated
BRAF = 1: the block repeat is activated
CPL: Compiler mode
CPL = 0: the relative direct addressing mode using the data page
pointer (DP) is selected
CPL = 1: the relative direct addressing mode using the stack pointer
(SP) is selected
HM: Hold mode, indicates whether the processor continues internal
execution when it acknowledges a hold request from the external interface
INTM: Interrupt mode, globally masks or enables all interrupts
INTM = 0: all unmasked interrupts are enabled
INTM = 1: all maskable interrupts are disabled
0: Reserved bit, always read as 0
OVM: Overflow mode
OVM = 1: on overflow, the destination accumulator is set to either the
most positive value or the most negative value
OVM = 0: the overflowed result is left in the destination accumulator
SXM: Sign extension mode
SXM = 0: sign extension is suppressed
SXM = 1: data is sign extended
C16: Dual 16-bit/double-precision arithmetic mode
C16 = 0: the ALU operates in double-precision arithmetic mode
C16 = 1: the ALU operates in dual 16-bit arithmetic mode
FRCT: Fractional mode
FRCT = 1: the multiplier output is left-shifted by 1 bit to compensate
for an extra sign bit
CMPT: Compatibility mode
CMPT = 0: ARP is not updated in the indirect addressing mode
CMPT = 1: ARP is updated in the indirect addressing mode
ASM: Accumulator shift mode, a 5-bit field that specifies a shift
value in the range -16 to 15
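The effect of the FRCT and OVM bits can be illustrated with a small Python model. It is a sketch under simplifying assumptions: operands are treated as signed 16-bit words, overflow is judged against the signed 32-bit range, and the function names are hypothetical.

```python
# Simplified model of two ST1 mode bits (function names are hypothetical):
#   FRCT = 1 -> product shifted left by 1 bit (removes the extra sign bit)
#   OVM  = 1 -> an overflowed result is replaced by the most positive or
#               most negative 32-bit value

MAX_POS = 0x7FFFFFFF
MAX_NEG = -0x80000000


def to_signed16(word):
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def multiply(t, smem, frct):
    """Signed 16 x 16 multiply with the optional fractional-mode shift."""
    product = to_signed16(t) * to_signed16(smem)
    return product << 1 if frct else product


def apply_ovm(result, ovm):
    """Saturate an overflowed result when OVM = 1, otherwise keep it."""
    if ovm and result > MAX_POS:
        return MAX_POS
    if ovm and result < MAX_NEG:
        return MAX_NEG
    return result


# 0.5 x 0.5 in Q15: with FRCT = 1 the product is 0.25 in Q31 (20000000h)
print(hex(multiply(0x4000, 0x4000, frct=1)))
print(hex(apply_ovm(0x7FFFFFFF + 1, ovm=1)))
```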
Processor Mode Status Register (PMST)
IPTR: Interrupt vector pointer, points to the 128-word program page
where the interrupt vectors reside
MP/MC: Microprocessor/microcomputer mode
MP/MC = 0: the on-chip ROM is enabled
MP/MC = 1: the on-chip ROM is disabled
OVLY: RAM overlay, enables on-chip dual-access data
RAM blocks to be mapped into program space
AVIS: Address visibility, enables or disables the internal program
address being visible at the address pins
DROM: Data ROM, enables on-chip ROM to be mapped
into data space
CLKOFF: CLOCKOUT off
SMUL: Saturation on multiplication
SST: Saturation on store
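A possible way to decode a PMST value in Python is sketched below. The bit positions are an assumption (interrupt vector pointer, called IPTR here, in bits 15-7, followed by MP/MC, OVLY, AVIS, DROM, CLKOFF, SMUL, and SST in bits 6 down to 0) and should be confirmed against the device data sheet; the function name is hypothetical.

```python
# Sketch of PMST decoding. ASSUMED layout (verify against the data sheet):
# bits 15-7 IPTR, bit 6 MP/MC, bit 5 OVLY, bit 4 AVIS, bit 3 DROM,
# bit 2 CLKOFF, bit 1 SMUL, bit 0 SST. decode_pmst is a hypothetical name.

FLAG_BITS = {"MP/MC": 6, "OVLY": 5, "AVIS": 4, "DROM": 3,
             "CLKOFF": 2, "SMUL": 1, "SST": 0}


def decode_pmst(pmst):
    """Return a dict with the PMST fields and the interrupt vector base."""
    pmst &= 0xFFFF
    fields = {name: (pmst >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["IPTR"] = pmst >> 7
    # The vector table occupies the 128-word page selected by IPTR.
    fields["vector_base"] = fields["IPTR"] << 7
    return fields


info = decode_pmst(0xFFE0)
print(hex(info["vector_base"]), info["MP/MC"], info["OVLY"])
```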
Data Addressing Modes of TMS320C54X Processors
Data addressing modes provide various ways to access operands to
execute instructions and place results in the memory or the registers
The 54XX devices offer seven basic addressing modes
1. Immediate addressing
2. Absolute addressing
3. Accumulator addressing
4. Direct addressing
5. Indirect addressing
6. Memory mapped addressing
7. Stack addressing
1. Immediate addressing
The instruction contains the specific value of the operand
The operand can be short (3, 5, 8, or 9 bits in length) or long (16 bits
in length)
An instruction with a short operand occupies one memory location;
an instruction with a long operand occupies two
Example: LD #20, DP
RPT #0FFFFh
2. Absolute Addressing
The instruction contains a specified address in the operand
(i). dmad addressing
MVDK Smem, dmad
MVDM dmad, MMR
(ii). pmad addressing
MVDP Smem, pmad
MVPD pmad, Smem
(iii). PA addressing
PORTR PA, Smem
(iv). *(lk) addressing
Example:
MVKD 1000h, *AR5 ; data at 1000h → location pointed to by AR5 (dmad addressing)
MVPD 1000h, *AR7 ; program word at 1000h → location pointed to by AR7 (pmad addressing)
PORTR 05h, *AR3 ; port 05h → location pointed to by AR3 (PA addressing)
LD *(1000h), A ; data at 1000h → A (*(lk) addressing)
3. Accumulator Addressing
Accumulator content is used as address to transfer data between
Program and Data memory
Ex: READA *AR2
4. Direct Addressing
The 16-bit data address is formed from a base (DP or SP) and the
7-bit offset contained in the instruction
A page of 128 locations can be accessed without changing DP or SP
The compiler mode bit (CPL) in the ST1 register selects the base
CPL = 0 selects DP
CPL = 1 selects SP
When DP is used, the 9-bit DP forms the upper bits of the address and
the 7-bit offset forms the lower bits
When SP is used instead of DP, the effective address is computed by
adding the 7-bit offset to SP
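The address calculation for direct addressing can be sketched in Python as follows. The model assumes a 9-bit DP concatenated with the 7-bit offset when CPL = 0, and a 16-bit addition of the offset to SP when CPL = 1; the function name is hypothetical.

```python
# Sketch of direct-addressing address generation (hypothetical helper).
# CPL = 0: address = 9-bit DP concatenated with the 7-bit offset
# CPL = 1: address = SP + 7-bit offset (16-bit wrap-around)

def direct_address(offset7, dp=0, sp=0, cpl=0):
    """Return the 16-bit data address for a 7-bit instruction offset."""
    offset7 &= 0x7F
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset7
    return (sp + offset7) & 0xFFFF


print(hex(direct_address(0x05, dp=0x002)))         # page 2, offset 5
print(hex(direct_address(0x05, sp=0x3FF0, cpl=1)))  # SP-relative
```

With DP = 2 the page starts at 100h, so offset 5 addresses location 105h; changing only the offset reaches any of the 128 words on that page.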
Block diagram of the direct addressing mode for TMS320C54xx
Processors
5. Indirect Addressing
Data space is accessed through the address held in an auxiliary register
The 54xx has eight 16-bit auxiliary registers (AR0 – AR7)
Two auxiliary register arithmetic units (ARAU0 & ARAU1) modify them
They are used to step through memory locations in a fixed step size
The AR0 register is used for indexed and bit-reversed addressing modes
For single-operand addressing
MOD → type of indirect addressing
ARF → AR used for addressing
The use of ARP depends on the CMPT bit in ST1
CMPT = 0, standard mode: ARF selects the AR and ARP stays at zero
CMPT = 1, compatibility mode: the AR can be selected by ARP
Block diagram of the indirect addressing mode for TMS320C54xx
Processors
Table: Indirect addressing options with a single data-memory operand
Operand syntax    Function
*ARx              Addr = ARx
*ARx-             Addr = ARx; ARx = ARx - 1
*ARx+             Addr = ARx; ARx = ARx + 1
*+ARx             Addr = ARx + 1; ARx = ARx + 1
*ARx-0B           Addr = ARx; ARx = B(ARx - AR0)
*ARx-0            Addr = ARx; ARx = ARx - AR0
*ARx+0            Addr = ARx; ARx = ARx + AR0
*ARx+0B           Addr = ARx; ARx = B(ARx + AR0)
*ARx-%            Addr = ARx; ARx = circ(ARx - 1)
*ARx-0%           Addr = ARx; ARx = circ(ARx - AR0)
*ARx+%            Addr = ARx; ARx = circ(ARx + 1)
*ARx+0%           Addr = ARx; ARx = circ(ARx + AR0)
B( ) denotes modification with reverse carry propagation and circ( )
denotes circular modification
Circular Addressing
Used in convolution, correlation and FIR filters
A circular buffer is a sliding window that contains the most recent data
A circular buffer of size R must start on an N-bit boundary, where N is
the smallest integer such that 2^N > R
The circular buffer size register (BK) specifies the size of the circular
buffer
Effective base address (EFB): obtained by zeroing the N LSBs of a
user-selected AR (ARx)
End of buffer address (EOB): obtained by replacing the N LSBs of ARx
with the N LSBs of BK
The index is the N LSBs of ARx and is updated as follows:
if 0 ≤ index + step < BK: index = index + step
else if index + step ≥ BK: index = index + step - BK
else if index + step < 0: index = index + step + BK
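The index update rule can be tried out with a short Python model. It is a sketch that takes N as the smallest integer with 2^N > BK and treats the N LSBs of ARx as the index; the function name is hypothetical.

```python
# Sketch of circular address modification (hypothetical helper name).
# N is taken as the smallest integer with 2**N > BK, the base address is
# ARx with its N LSBs cleared, and the index is the N LSBs of ARx.

def circular_update(arx, step, bk):
    """Return ARx after a circular modification by a signed step."""
    n = bk.bit_length()           # smallest N such that 2**N > bk
    mask = (1 << n) - 1
    base = arx & ~mask            # effective base address (EFB)
    index = arx & mask
    total = index + step
    if 0 <= total < bk:
        index = total
    elif total >= bk:
        index = total - bk        # wrapped past the end of the buffer
    else:
        index = total + bk        # wrapped below the start of the buffer
    return base | index


# Buffer of 40h words based at 1000h, pointer at 1020h, step of 25h
print(hex(circular_update(0x1020, 0x25, 0x40)))
```

The same helper handles negative steps, so a post-decrement from the first location of the buffer wraps to the last one.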
Block diagram of the circular addressing mode for TMS320C54xx Processors
Circular addressing mode implementation for TMS320C54xx Processors
Bit-Reversed Addressing
Used for FFT algorithms
AR0 specifies one half of the size of the FFT
AR0 = 2^(N-1), where N is an integer and the FFT size is 2^N
AR0 is added to the selected AR with reverse carry propagation to
give bit-reversed addressing
The carry propagates from left to right, instead of from right to left
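Reverse carry propagation can be modelled in Python by mirroring the bits, doing an ordinary addition or subtraction, and mirroring the result back. This is an illustrative sketch with hypothetical function names, not a description of the hardware implementation.

```python
# Sketch of bit-reversed (reverse-carry) address modification.
# Mirroring the 16-bit words turns a left-to-right carry into an ordinary
# right-to-left carry. Function names are hypothetical.

BITS = 16
MASK = (1 << BITS) - 1


def mirror(value):
    """Reverse the bit order of a 16-bit word."""
    return int(format(value & MASK, "016b")[::-1], 2)


def reverse_carry_add(ar, ar0):
    return mirror((mirror(ar) + mirror(ar0)) & MASK)


def reverse_carry_sub(ar, ar0):
    return mirror((mirror(ar) - mirror(ar0)) & MASK)


# 8-point FFT: AR0 = 4 (half the FFT size), starting from offset 0
ar = 0
order = []
for _ in range(8):
    order.append(ar)
    ar = reverse_carry_add(ar, 4)
print(order)  # bit-reversed access order for 8 points
```

For an 8-point FFT this steps through offsets 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed ordering of 0 to 7.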
Dual-Operand Addressing
Dual data-memory operand addressing is used for instructions that
simultaneously perform two reads (32-bit read) or a single read
(16-bit read) and a parallel store (16-bit store), indicated by two
vertical bars, ||
These instructions access operands using indirect addressing mode
If, in an instruction with a parallel store, the source operand and the
destination operand point to the same location, the source is read
before the destination is written
Only 2 bits are available in the instruction code for selecting each
auxiliary register in this mode
Thus, just four of the auxiliary registers, AR2-AR5, can be used
The ARAUs, together with these registers, provide the capability to
access two operands in a single cycle
The figure below shows how an address is generated using dual
data-memory operand addressing
Table: Function of the different fields in dual data-memory operand addressing
Name      Function
Opcode    Contains the operation code for the instruction
Xmod      Defines the type of indirect addressing mode used for accessing the Xmem operand
Xar       Xmem AR selection field; defines the AR that contains the address of Xmem
Ymod      Defines the type of indirect addressing mode used for accessing the Ymem operand
Yar       Ymem AR selection field; defines the AR that contains the address of Ymem
Block diagram of the Indirect addressing options with a dual data –memory
operand
6. Memory-Mapped Register Addressing
Used to modify the memory-mapped registers without affecting the
current data page pointer (DP) or stack-pointer (SP)
– Overhead for writing to a register is minimal
– Works for direct and indirect addressing
– Scratch-pad RAM located on data page 0 can be modified
STM #x, DIRECT
STM #tbl, AR1
16 bit memory mapped register address generation
7. Stack Addressing
Used to automatically store the program counter during interrupts
and subroutines
Can be used to store additional items of context or to pass data
values
Uses a 16-bit memory-mapped register, the stack pointer (SP)
PSHD X2
Values of stack & SP before and after operation
1. Assuming the current content of AR3 to be 200h, what will be its
contents after each of the following TMS320C54xx addressing
modes is used? Assume that the contents of AR0 are 20h
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3-
e. *AR3
f. *+AR3(40h)
g. *+AR3(-40h)
Solution:
a. AR3 ←AR3 + AR0;
AR3 = 200h + 20h = 220h
b. AR3 ←AR3 - AR0;
AR3 = 200h - 20h = 1E0h
c. AR3 ← AR3 + 1;
AR3 = 200h + 1 = 201h
d. AR3 ← AR3 - 1;
AR3 = 200h - 1 = 1FFh
e. AR3 is not modified
AR3 = 200h
f. AR3 ← AR3 + 40h;
AR3 = 200h + 40h = 240h
g. AR3 ← AR3 - 40h;
AR3 = 200h - 40h = 1C0h
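The register updates in this problem can be cross-checked with a small Python model of the single-operand modification options. It is only a sketch covering the modes used here; the mode strings and the function name are hypothetical.

```python
# Sketch of auxiliary-register modification for a few indirect modes.
# Returns (address used, new AR value). Names are hypothetical.

def indirect_access(mode, ar, ar0=0, lk=0):
    wrap = lambda value: value & 0xFFFF
    if mode == "*ARx":
        return ar, ar
    if mode == "*ARx+":
        return ar, wrap(ar + 1)
    if mode == "*ARx-":
        return ar, wrap(ar - 1)
    if mode == "*ARx+0":
        return ar, wrap(ar + ar0)
    if mode == "*ARx-0":
        return ar, wrap(ar - ar0)
    if mode == "*+ARx(lk)":
        # Pre-modification: the offset is added before the access
        return wrap(ar + lk), wrap(ar + lk)
    raise ValueError("mode not modelled: " + mode)


for mode, lk in [("*ARx+0", 0), ("*ARx-0", 0), ("*ARx+", 0), ("*ARx-", 0),
                 ("*ARx", 0), ("*+ARx(lk)", 0x40), ("*+ARx(lk)", -0x40)]:
    _, new_ar = indirect_access(mode, 0x200, ar0=0x20, lk=lk)
    print(mode, hex(new_ar))
```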
2. Assume that the register AR3 with contents 1020h is selected as
the pointer for the circular buffer. Let BK = 40h to specify the
circular buffer size as 40h. Determine the start and the end
addresses for the buffer. What will be the contents of register
AR3 after the execution of the instruction LD *AR3+0%, A, if
the contents of register AR0 are 0025h?
Solution:
AR3 = 1020h means that currently it points to location 1020h
With BK = 40h, N = 7 is the smallest value for which 2^N > BK
Masking the lower 7 bits of AR3 to zero gives the start address of the
buffer as 1000h
Replacing the same bits with BK gives the end address as 1040h
The instruction LD *AR3+0%, A modifies AR3 by adding AR0 to it
and applying the circular modification
It yields
AR3 = circ(1020h + 0025h) = circ(1045h) = 1045h - 40h = 1005h
Thus the location 1005h is the one pointed to by AR3
3. Assuming the current contents of AR3 to be 200h, what will be its
contents after each of the following TMS320C54xx addressing
modes is used? Assume that the contents of AR0 are 20h
a. *AR3 + 0B
b. *AR3 – 0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation;
AR3 = 200h + 20h (with reverse carry propagation) = 220h
b. AR3 ← AR3 - AR0 with reverse carry propagation;
AR3 = 200h - 20h (with reverse carry propagation) = 23Fh
Program memory: To store program instructions & tables used in the
execution of programs
Organized into 128 pages, each of 64k word size
Table: Function of the PMST register bits
PMST bit    Logic    On-chip memory configuration
MP/MC       0        ROM enabled
            1        ROM not available
OVLY        0        RAM in data space
            1        RAM in program space
DROM        0        ROM not in data space
            1        ROM in data space
Memory map for the TMS320C5416 Processor
Program Control
It contains the program counter (PC), the program-counter-related
hardware, the hardware stack, repeat counters & status registers
The PC addresses memory in several ways, namely:
Branch: The PC is loaded with the immediate value following the
branch instruction
Subroutine call: The PC is loaded with the immediate value
following the call instruction
Interrupt: The PC is loaded with the address of the appropriate
interrupt vector
Instructions such as BACC, CALA, etc.: The PC is loaded with the
contents of the accumulator low word
End of a block repeat loop: The PC is loaded with the contents of
the block repeat program address start register
Return: The PC is loaded from the top of the stack
TMS320C54xx Instructions and programming
Assembly language instructions can be classified as:
Arithmetic operations
1. Addition instruction: ex-ADD, ADDC
2. Subtract instruction: ex-SUB, SUBB
3. Multiply instruction: ex-MPY, MPYA
4. Multiply accumulate instruction: ex-MAC, MACD
5. Multiply subtract instruction: ex-MAS, MASA
6. Double (32-bit operand) instruction: ex-DADD, DSUB
7. Application specific instruction: ex-EXP, LMS
Load and store instructions
1. Load instruction: ex-LD, LDM
2. Store instruction: ex-ST, STM
3. Conditional store instruction: ex-CMPS, STRCD
4. Parallel load and store instruction: ex-ST||LD
5. Parallel load and multiply instruction: ex-LD||MAC, LD||MAS
6. Parallel store and add/sub instruction: ex-ST||ADD, ST||SUB
7. Parallel store and multiply instruction: ex-ST||MPY, ST||MAC
8. Miscellaneous load type instruction: ex-MVDD, MVPD
Logical operations
1. AND instruction: ex AND, ANDM
2. OR instruction: ex OR, ORM
3. XOR instruction: ex XOR, XORM
4. Shift instruction: ex ROL, SFTL
5. Test instruction: ex BIT, CMPM
Program-control operations
1. Branch instruction: ex B, BACC
2. Call instruction: ex CALL, CALA
3. Interrupt instruction: ex INTR, TRAP
4. Return instruction: ex RET, FRET
5. Repeat instruction: ex RPT, RPTB
6. Stack-manipulating instruction: ex PSHD, POPD
7. Miscellaneous PC instruction: ex IDLE, RESET
MPY: Multiply With/Without Rounding
Syntax:                       Operation:
1: MPY[R] Smem, dst           (T) x (Smem) → dst
2: MPY Xmem, Ymem, dst        (Xmem) x (Ymem) → dst
                              (Xmem) → T
3: MPY #lk, dst               (T) x lk → dst
4: MPY Smem, #lk, dst         (Smem) x lk → dst
                              (Smem) → T
Operands:
Smem: Single data-memory operand
Xmem, Ymem: Dual data-memory operands
dst: A (accumulator A)
B (accumulator B)
–32 768 ≤ lk ≤32 767
Status Bits:
Affected by FRCT and OVM
Affects OVdst
MPYA: Multiply by Accumulator A
Syntax:                       Operation:
1: MPYA Smem                  (Smem) x (A(32-16)) → B
                              (Smem) → T
2: MPYA dst                   (T) x (A(32-16)) → dst
Operands:
Smem: Single data-memory operand
dst: A (accumulator A)
B (accumulator B)
Status Bits:
Affected by FRCT and OVM
Affects OVdst (OVB in syntax 1)
MPYU: Multiply Unsigned
Syntax: Operation:
MPYU Smem, dst unsigned(T)x unsigned(Smem)→dst
Operands:
Smem: Single data-memory operand
dst: A (accumulator A)
B (accumulator B)
Status Bits:
Affected by FRCT and OVM
Affects OVdst
MAC[R]: Multiply Accumulate With/Without Rounding
Syntax:                               Operation:
1: MAC[R] Smem, src                   (Smem) x (T) + (src) → src
2: MAC[R] Xmem, Ymem, src[, dst]      (Xmem) x (Ymem) + (src) → dst
                                      (Xmem) → T
3: MAC #lk, src[, dst]                (T) x lk + (src) → dst
4: MAC Smem, #lk, src[, dst]          (Smem) x lk + (src) → dst
                                      (Smem) → T
Operands:
Smem: Single data-memory operand
dst: A (accumulator A)
B (accumulator B)
–32 768 ≤ lk ≤32 767
Status Bits:
Affected by FRCT and OVM
Affects OVdst (or OVsrc, if dst is not specified)
MACA[R]: Multiply by Accumulator A and Accumulate With/Without
Rounding
Syntax:                       Execution:
1: MACA[R] Smem[, B]          (Smem) x (A(32-16)) + (B) → B
                              (Smem) → T
2: MACA[R] T, src[, dst]      (T) x (A(32-16)) + (src) → dst
Operands:
Smem: Single data-memory operand
dst: A (accumulator A)
B (accumulator B)
Status Bits:
Affected by FRCT and OVM
Affects OVdst (or OVsrc, if dst is not specified) and OVB in syntax 1
MACD: Multiply by Program Memory and Accumulate With
Delay
Syntax:
MACD Smem, pmad, src
Operands:
Smem: Single data-memory operand
src: A (accumulator A)
B (accumulator B)
0≤pmad≤65 535
Execution:
pmad → PAR
If (RC) ≠ 0
Then
(Smem) x (Pmem addressed by PAR) + (src) → src
(Smem) → T
(Smem) → Smem + 1
(PAR) + 1 → PAR
Else
(Smem) x (Pmem addressed by PAR) + (src) → src
(Smem) → T
(Smem) → Smem + 1
Status Bits:
Affected by FRCT and OVM
Affects OVsrc
MACP: Multiply by Program Memory and Accumulate
Syntax
MACP Smem, pmad, src
Operands:
Smem: Single data-memory operand
src: A (accumulator A)
B (accumulator B)
0≤pmad≤65 535
Execution:
pmad → PAR
If (RC) ≠ 0
Then
(Smem) x (Pmem addressed by PAR) + (src) → src
(Smem) → T
(PAR) + 1 → PAR
Else
(Smem) x (Pmem addressed by PAR) + (src) → src
(Smem) → T
Status Bits:
Affected by FRCT and OVM
Affects OVsrc
MACSU: Multiply Signed by Unsigned and Accumulate
Syntax:                       Execution:
MACSU Xmem, Ymem, src         unsigned(Xmem) x signed(Ymem) + (src) → src
                              (Xmem) → T
Operands:
Xmem, Ymem: Dual data-memory operands
src: A (accumulator A)
B (accumulator B)
Status Bits:
Affected by FRCT and OVM
Affects OVsrc
MAS[R]: Multiply and Subtract With/Without Rounding
Syntax:                               Execution:
1: MAS[R] Smem, src                   (src) - (Smem) x (T) → src
2: MAS[R] Xmem, Ymem, src[, dst]      (src) - (Xmem) x (Ymem) → dst
                                      (Xmem) → T
Operands:
Smem: Single data-memory operand
Xmem, Ymem: Dual data-memory operands
src, dst: A (accumulator A)
B (accumulator B)
Status Bits:
Affected by FRCT and OVM
Affects OVdst (or OVsrc, if dst=src)
MASA[R]: Multiply by Accumulator A and Subtract With/Without
Rounding
Syntax:                       Execution:
1: MASA Smem[, B]             (B) - (Smem) x (A(32-16)) → B
                              (Smem) → T
2: MASA[R] T, src[, dst]      (src) - (T) x (A(32-16)) → dst
Operands:
Smem: Single data-memory operand
src, dst: A (accumulator A)
B (accumulator B)
Status Bits:
Affected by FRCT and OVM
Affects OVdst (or OVsrc, if dst is not specified) and OVB in syntax 1
Repeat Instructions
RPT: Repeat Next Instruction
RPTB[D]: Block Repeat
RPTZ: Repeat Next Instruction and Clear Accumulator
Programming Examples
Basic assembler directives
Example 1: Write a program to find the sum of a series of signed
numbers stored at successive locations in the data memory and
place the result in the accumulator
Solution:
AR1 is used as the pointer to the numbers
AR2 is used as the counter for the numbers
The accumulator value is set to zero
Sign extension mode is selected
Each number is added into the accumulator
The pointer is incremented & the counter is decremented
The loop repeats until the count in AR2 reaches zero
The accumulator then contains the sum of the numbers
This program computes the signed sum of data memory locations from
address 410h to 41fh
The result is placed in A
A=dmad(410h)+dmad(411h)+………..+ dmad(41fh)
.mmregs
.global _c_int00
.text
_c_int00:
STM #0Fh, AR2 ; Initialize counter AR2 = 0Fh (number of values - 1,
              ; because BANZ tests AR2 before decrementing it)
STM #410h, AR1 ; Initialize pointer AR1 = 410h
LD #0h, A ; Initialize sum A = 0
SSBX SXM ; Select sign extension mode
START:
ADD *AR1+, A ; Add the next data value
BANZ START, *AR2- ; Repeat if not done
NOP ; No operation
.end
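The behaviour of the ADD/BANZ loop can be checked with a short Python model. It assumes that BANZ tests the auxiliary register for zero before the post-decrement takes effect, so a counter loaded with n makes n + 1 passes; the function names and the dictionary used as data memory are hypothetical.

```python
# Behavioural sketch of the summation loop (hypothetical helper names).
# ASSUMPTION: BANZ branches when the counter is nonzero *before* the
# post-decrement, so an initial count of n gives n + 1 passes.

def to_signed16(word):
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def sum_loop(memory, start, counter):
    """Return (sum, passes) for the ADD *AR1+ / BANZ *AR2- loop."""
    acc, ar1, ar2, passes = 0, start, counter, 0
    while True:
        acc += to_signed16(memory[ar1])   # ADD *AR1+, A with SXM = 1
        ar1 += 1
        passes += 1
        branch = ar2 != 0                 # BANZ tests AR2 ...
        ar2 = (ar2 - 1) & 0xFFFF          # ... then AR2 is decremented
        if not branch:
            return acc, passes


memory = {0x410 + i: i + 1 for i in range(17)}   # values 1, 2, ..., 17
print(sum_loop(memory, 0x410, 0x0F))   # 16 passes: locations 410h-41Fh
print(sum_loop(memory, 0x410, 0x10))   # 17 passes: one location too many
```

Under this assumption, summing the 16 locations 410h to 41Fh needs an initial counter of 0Fh rather than 10h.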
Example 2: Program to compute multiply and accumulate using
direct addressing mode: y(n) = h0x(n) + h1x(n-1) + h2x(n-2)
Solution:
h0x(n), h1x(n-1) & h2x(n-2) are computed using the MPY
instruction
(T) x (dmad) → Acc A or B
The accumulator contains the output value
Acc (15-0) → dmad
Acc (31-16) → dmad+1
.global _c_int00
x .usect "Input Samples", 3
y .usect "output", 2
h .usect "coefficients", 3
.text
_c_int00:
SSBX SXM ; Select sign extension mode
LD #h, DP ; Select the data page for coefficients
LD @h, T ; Get the coefficient h(0)
LD #x, DP ; Select the data page for input samples
MPY @x, A ; A = x(n) * h(0)
LD #h, DP ; Select the data page for coefficients
LD @h+1, T ; Get the coefficient h(1)
LD #x, DP ; Select the data page for input samples
MPY @x+1, B ; B = x(n-1) * h(1)
ADD A, B ; B = x(n)*h(0) + x(n-1)*h(1)
LD #h, DP ; Select the data page for coefficients
LD @h+2, T ; Get the coefficient h(2)
LD #x, DP ; Select the data page for input samples
MPY @x+2, A ; A = x(n-2) * h(2)
ADD A, B ; B = x(n)*h(0) + x(n-1)*h(1) + x(n-2)*h(2)
LD #y, DP ; Select the data page for outputs
STL B, @y ; Save low part of output
STH B, @y+1 ; Save high part of output
NOP ; No operation
.end
Example 3: Program to compute multiply and accumulate using
indirect addressing mode
.global _c_int00
.data
h .int 10, 20, 30
.text
_c_int00:
SSBX SXM ; Select sign extension mode
STM #310H, AR2 ; Initialize pointer AR2 for x(n) stored at 310H
STM #h, AR3 ; Initialize pointer AR3 for coefficients
MPY *AR2+, *AR3+, A ; A = x(n) * h(0)
MPY *AR2+, *AR3+, B ; B = x(n-1) * h(1)
ADD A, B ; B = x(n)*h(0) + x(n-1)*h(1)
MPY *AR2+, *AR3+, A ; A = x(n-2) * h(2)
ADD A, B ; B = x(n)*h(0) + x(n-1)*h(1) + x(n-2)*h(2)
STL B, *AR2+ ; Save low part of result
STH B, *AR2+ ; Save high part of result
NOP ; No operation
.end
Example 4: Program to compute multiply and accumulate using the
MAC instruction
.global _c_int00
.data
.bss x, 3
.bss y, 2
h .int 10, 20, 30
.text
_c_int00:
SSBX SXM ; Select sign extension mode
STM #x, AR2 ; Initialize AR2 to point to x(n)
STM #h, AR3 ; Initialize AR3 to point to h(0)
LD #0H, A ; Initialize result in A = 0
RPT #2 ; Repeat the next operation 3 times
MAC *AR2+, *AR3+, A ; y(n) computed
STM #y, AR2 ; Point AR2 to y(n)
STL A, *AR2+ ; Save the low part of y(n)
STH A, *AR2+ ; Save the high part of y(n)
NOP ; No operation
.end
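A Python model of the repeated MAC operation and of storing the low and high words of the result is sketched below. It assumes signed 16-bit operands with FRCT = 0 and ignores overflow; the function names are hypothetical.

```python
# Behavioural sketch of RPT #2 / MAC followed by STL and STH.
# ASSUMPTIONS: signed 16-bit operands, FRCT = 0, no overflow handling.
# Function names are hypothetical.

def to_signed16(word):
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mac_sum(x, h):
    """Accumulate x[i] * h[i], one MAC per pair of operands."""
    acc = 0
    for xi, hi in zip(x, h):
        acc += to_signed16(xi) * to_signed16(hi)
    return acc


def store_words(acc):
    """Return (low word, high word) as written by STL and STH."""
    return acc & 0xFFFF, (acc >> 16) & 0xFFFF


y = mac_sum([1, 2, 3], [10, 20, 30])
print(y, store_words(y))
```

The low and high words come from different halves of the accumulator, which is why one store uses STL and the other STH.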
On chip peripherals
On-chip peripherals facilitate interfacing with external devices
The peripherals are:
General purpose I/O pins
A software programmable wait state generator
Hardware timer
Host port interface (HPI)
Clock generator
Serial port
1. It has two general purpose I/O pins:
BIO→input pin used to monitor the status of external devices
XF →output pin, software controlled used to signal external devices
2. Software programmable wait state generator:
Extends external bus cycles up to seven machine cycles
3. Hardware Timer
An on chip down counter
Used to generate signal to initiate any interrupt or any other process
Consists of 3 memory-mapped registers:
i. The timer register (TIM)
ii. The timer period register (PRD)
iii. The timer control register (TCR)
Other elements of the timer circuit:
Prescaler block (PSC)
TDDR (timer divide-down ratio)
TINT & TOUT
The timer register (TIM) is a 16-bit memory-mapped register that
decrements at every pulse from the prescaler block (PSC)
The timer period register (PRD) is a 16-bit memory-mapped register
whose contents are loaded onto the TIM whenever the TIM
decrements to zero or the device is reset (SRESET)
The timer can also be independently reset using the TRB signal
The timer control register (TCR) is a 16-bit memory-mapped register
that contains status and control bits
Table shows the functions of the various bits in the TCR
The prescaler block is also an on-chip counter
Whenever the prescaler bits count down to 0, a clock pulse is given
to the TIM register that decrements the TIM register by 1
The TDDR bits contain the divide-down ratio, which is loaded onto
the prescaler block after each time the prescaler bits count down to 0
That is to say that the 4-bit value of TDDR determines the divide-by
ratio of the timer clock with respect to the system clock
In other words, the TIM decrements either at the rate of the system
clock or at a rate slower than that as decided by the value of the
TDDR bits
TOUT and TINT are the output signals generated as the TIM register
decrements to 0
TOUT can trigger the start of the conversion signal in an ADC
interfaced to the DSP
The sampling frequency of the ADC determines how frequently it
receives the TOUT signal
TINT is used to generate interrupts, which are required to service a
peripheral such as a DRAM controller periodically
The timer can also be stopped, restarted, reset, or disabled by specific
status bits
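The relation between the clock, the TDDR value, and the PRD value can be explored with a short Python sketch. It assumes that the prescaler passes one pulse to TIM every TDDR + 1 clock cycles and that TIM reaches zero every PRD + 1 pulses, so the timer event rate is the clock rate divided by (TDDR + 1)(PRD + 1); the function names and the 100 MHz clock are hypothetical.

```python
# Sketch of the timer rate calculation (hypothetical helper names).
# ASSUMPTION: rate = clock / ((TDDR + 1) * (PRD + 1)),
# with a 4-bit TDDR and a 16-bit PRD.

def timer_rate_hz(clock_hz, tddr, prd):
    """Rate at which TIM reaches zero (TINT/TOUT events per second)."""
    if not 0 <= tddr <= 0xF or not 0 <= prd <= 0xFFFF:
        raise ValueError("TDDR is 4 bits and PRD is 16 bits")
    return clock_hz / ((tddr + 1) * (prd + 1))


def prd_for_rate(clock_hz, tddr, rate_hz):
    """PRD value that gives (approximately) the requested event rate."""
    prd = round(clock_hz / ((tddr + 1) * rate_hz)) - 1
    if not 0 <= prd <= 0xFFFF:
        raise ValueError("rate not reachable with this TDDR")
    return prd


# Example with an assumed 100 MHz clock
print(timer_rate_hz(100e6, tddr=9, prd=999))
print(prd_for_rate(100e6, tddr=0, rate_hz=8000))
```

Such a calculation is useful when TOUT is meant to start conversions in an ADC at a chosen sampling frequency.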
Logical block diagram of timer circuit
4. Host port interface (HPI):
Allows the DSP to interface to 8-bit or 16-bit host devices or a host
processor
Signals in HPI are:
Host interrupt (HINT)
HRDY
HCNTL0 &HCNTL1
HBIL
HR/W
A generic diagram of the host port interface (HPI)
Important signals in the HPI are as follows:
The 16-bit data bus and the 18-bit address bus
The host interrupt, HINT, for the DSP to signal the host when its
attention is required
HRDY, a DSP output indicating that the DSP is ready for transfer
HCNTL0 and HCNTL1, control signals that indicate the type of
transfer to carry out
The transfer types are data, address, etc.
HBIL: if this is low, it indicates that the current byte is the first byte;
if it is high, it indicates that it is the second byte
HR/W indicates if the host is carrying out a read operation or a write
operation
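The role of HBIL when a 16-bit word is moved over an 8-bit host bus can be illustrated with a short sketch. The function name and the `first_byte_is_msb` parameter are hypothetical: the text above does not say which byte is the more significant one, so the byte order is left as an explicit assumption.

```python
def assemble_hpi_word(transfers, first_byte_is_msb=True):
    """Combine two 8-bit host transfers into one 16-bit word.

    `transfers` is an iterable of (hbil, byte) pairs, where hbil == 0 marks
    the first byte of the word and hbil == 1 marks the second byte.
    The byte order is an assumption, controlled by `first_byte_is_msb`.
    """
    halves = {}
    for hbil, byte in transfers:
        if hbil not in (0, 1) or not 0 <= byte <= 0xFF:
            raise ValueError("expected HBIL in {0, 1} and an 8-bit value")
        halves[hbil] = byte
    if set(halves) != {0, 1}:
        raise ValueError("need one first byte (HBIL=0) and one second byte (HBIL=1)")
    first, second = halves[0], halves[1]
    if first_byte_is_msb:
        return (first << 8) | second
    return (second << 8) | first


if __name__ == "__main__":
    word = assemble_hpi_word([(0, 0x12), (1, 0x34)])
    print(hex(word))
```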
5. Clock Generator:
The clock generator on TMS320C54xx devices has two options: an
external clock and an internal clock
In the case of the external clock option, a clock source is directly
connected to the device
The internal clock source option, on the other hand, uses an internal
clock generator and a phase locked loop (PLL) circuit
The PLL, in turn, can be hardware configured or software
programmed
Not all devices of the TMS320C54xx family have all these clock
options; they vary from device to device
6. Serial I/O Ports:
Three types of serial ports are available:
i. Synchronous ports
ii. Buffered ports
iii. Time-division multiplexed ports
The synchronous serial ports are high-speed, full-duplex ports
that provide direct communication with serial devices, such as
codecs and analog-to-digital (A/D) converters
A buffered serial port (BSP) is a synchronous serial port that is
provided with an auto-buffering unit and is clocked at the full
clock rate
A time-division multiplexed (TDM) serial port is a synchronous
serial port that is provided to allow time-division multiplexing of the
data
The functioning of each of these on-chip peripherals is controlled by
memory-mapped registers assigned to the respective peripheral
Interrupts of TMS320C54xx Processors
Often, while the CPU is in the midst of executing a program, a
peripheral device may require service from the CPU
In such a situation, the main program may be interrupted by a signal
generated by the peripheral devices
This results in the processor suspending the main program in order to
execute another program, called an interrupt service routine, to
service the peripheral device
On completion of the interrupt service routine, the processor returns
to the main program and continues from where it left off
An interrupt may be generated either by an internal or an external
device
It may also be generated by software
Not all interrupts are serviced when they occur
Only those interrupts that are called nonmaskable are serviced
whenever they occur
Other interrupts, which are called maskable interrupts, are serviced
only if they are enabled
There is also a priority scheme that determines which interrupt is
serviced first if more than one interrupt occurs simultaneously
Almost all the devices of the TMS320C54xx family have 32 interrupts
However, the types and the number under each type vary from
device to device
Some of these interrupts are reserved for use by the CPU
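The selection rules described above (nonmaskable interrupts are always serviced, maskable ones only if enabled, and priority breaks ties) can be sketched as follows. The interrupt names and priority numbers in the usage lines are hypothetical placeholders, and "a lower number means a higher priority" is an assumption made for this sketch, not a statement about any particular device.

```python
def next_interrupt(pending, enabled, nonmaskable, priority):
    """Pick the interrupt to service next, or None if nothing qualifies.

    pending     -- set of interrupt names currently requesting service
    enabled     -- set of maskable interrupts that are currently enabled
    nonmaskable -- set of interrupts that cannot be masked
    priority    -- dict name -> rank (assumption: lower rank wins)
    """
    serviceable = [
        name for name in pending
        if name in nonmaskable or name in enabled
    ]
    if not serviceable:
        return None
    return min(serviceable, key=lambda name: priority[name])


if __name__ == "__main__":
    # Hypothetical names and ranks, for illustration only.
    priority = {"NMI": 1, "HOST": 2, "TIMER": 3}
    print(next_interrupt({"TIMER", "HOST"}, {"TIMER"}, {"NMI"}, priority))
    print(next_interrupt({"TIMER", "NMI"}, set(), {"NMI"}, priority))
```

In the first call the host interrupt is pending but not enabled, so the timer interrupt is chosen; in the second call nothing is enabled, yet the nonmaskable interrupt is still serviced.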
Pipeline operation of TMS320C54xx Processors
The CPU of ‘54xx devices has a six-level-deep instruction pipeline
The six stages of the pipeline are independent of each other
This allows overlapping execution of instructions
During any given cycle, up to six different instructions can be active,
each at a different stage of processing
The six levels of the pipeline structure are program prefetch,
program fetch, decode, access, read and execute
1. During program prefetch, the program address bus, PAB, is loaded
with the address of the next instruction to be fetched
2. In the fetch phase, an instruction word is fetched from the program
bus, PB, and loaded into the instruction register, IR
These two phases form the instruction fetch sequence
3. During the decode stage, the contents of the instruction register, IR,
are decoded to determine the type of memory access operation and
the control signals required for the data-address generation unit and
the CPU
4. The access phase outputs the read operand's address on the data
address bus, DAB
If a second operand is required, the other data address bus, CAB,
is also loaded with an appropriate address
Auxiliary registers in indirect addressing mode and the stack pointer
(SP) are also updated
5. In the read phase the data operand(s), if any, are read from the data
buses, DB and CB
This phase completes the two-phase read process and starts the
two-phase write process
The data address of the write operand, if any, is loaded into the data
write address bus, EAB
6. The execute phase writes the data using the data write bus, EB, and
completes the operand write sequence
The instruction is executed in this phase
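The overlap of the six stages can be made concrete with a small sketch that builds an ideal pipe-flow table. It assumes every instruction is a single word and that no stalls or conflicts occur, so instruction number i simply enters the prefetch stage in cycle i and moves one stage forward per cycle; the function names and stage labels are chosen for this illustration.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipe_flow(instructions):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    table = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i occupies stage s during cycle i + s + 1 (1-based).
            table.setdefault(i + s + 1, {})[stage] = instr
    return table


def format_pipe_flow(instructions):
    """Render the table as text, one row per cycle and one column per stage."""
    table = pipe_flow(instructions)
    width = max(len(x) for x in list(instructions) + STAGES) + 2
    rows = ["cycle".ljust(7) + "".join(s.ljust(width) for s in STAGES)]
    for cycle in sorted(table):
        cells = "".join(table[cycle].get(s, "-").ljust(width) for s in STAGES)
        rows.append(str(cycle).ljust(7) + cells)
    return "\n".join(rows)


if __name__ == "__main__":
    print(format_pipe_flow(["I1", "I2", "I3", "I4", "I5", "I6"]))
```

With six instructions in flight, the sixth cycle has all six stages occupied by different instructions, which is the "up to six active instructions" case described above.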
Pipe flow diagram
Example 1: Show the pipeline operation of the following sequence
of instructions if the initial value of AR3 is 80 and the values
stored in memory locations 80, 81 and 82 are 1, 2 and 3
LD *AR3+, A
ADD #1000h, A
STL A, *AR3+
Pipeline operation for Example 1
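As a cross-check on the register and memory values that the pipeline diagram for Example 1 should end with, the three instructions can be stepped through functionally. This sketch is not cycle-accurate: it applies each instruction's full effect at once and ignores the stage in which each update happens. The addresses are taken as written in the example, and the initial value of A, which the example does not state, is assumed to be 0 (it is overwritten by LD in any case).

```python
def run_example_1():
    """Functional (not cycle-accurate) walk through Example 1."""
    mem = {80: 1, 81: 2, 82: 3}   # data memory contents given in the example
    ar3 = 80                      # initial AR3 given in the example
    a = 0                         # assumption: initial A is not stated
    trace = []

    # LD *AR3+, A : load A from the word AR3 points to, then post-increment AR3
    a = mem[ar3]
    ar3 += 1
    trace.append(("LD *AR3+, A", ar3, a))

    # ADD #1000h, A : add the immediate constant 1000h to A
    a += 0x1000
    trace.append(("ADD #1000h, A", ar3, a))

    # STL A, *AR3+ : store the low 16 bits of A, then post-increment AR3
    mem[ar3] = a & 0xFFFF
    ar3 += 1
    trace.append(("STL A, *AR3+", ar3, a))

    return trace, mem


if __name__ == "__main__":
    trace, mem = run_example_1()
    for instr, ar3, a in trace:
        print(f"{instr:<16} AR3={ar3}  A={a:#x}")
    print("memory:", mem)
```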
Example 2: Show the pipeline operation of the following sequence
of instructions if the initial values of AR1, AR3 and A are 84, 81
and 1, and the values stored in memory locations 81, 82, 83 and 84
are 2, 3, 4 and 6
Also provide the values of registers AR3, AR1, T and accumulator
A after completion of each cycle
ADD *AR3+,A
LD *AR1+, T
MPY *AR3+, B
ADD B, A
Pipeline operation for Example 2
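The register values that Example 2 asks for can be checked with a functional walk through the four instructions. This sketch records the state after each instruction rather than after each pipeline cycle, so it gives the values the cycle-by-cycle table must eventually reach, not the cycle in which each update appears. It assumes integer (non-fractional) multiplication and initial values of 0 for T and B, which the example does not state (both are written before they are read).

```python
def run_example_2():
    """Functional (not cycle-accurate) walk through Example 2."""
    mem = {81: 2, 82: 3, 83: 4, 84: 6}           # memory given in the example
    r = {"AR1": 84, "AR3": 81, "A": 1,           # initial values from the example
         "B": 0, "T": 0}                         # assumption: B and T start at 0
    trace = []

    def snapshot(instr):
        trace.append((instr, dict(r)))           # copy of the register state

    # ADD *AR3+, A : A = A + mem[AR3], then post-increment AR3
    r["A"] += mem[r["AR3"]]
    r["AR3"] += 1
    snapshot("ADD *AR3+, A")

    # LD *AR1+, T : T = mem[AR1], then post-increment AR1
    r["T"] = mem[r["AR1"]]
    r["AR1"] += 1
    snapshot("LD *AR1+, T")

    # MPY *AR3+, B : B = T * mem[AR3], then post-increment AR3
    r["B"] = r["T"] * mem[r["AR3"]]
    r["AR3"] += 1
    snapshot("MPY *AR3+, B")

    # ADD B, A : A = A + B
    r["A"] += r["B"]
    snapshot("ADD B, A")

    return trace


if __name__ == "__main__":
    for instr, regs in run_example_2():
        print(f"{instr:<14}", regs)
```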
Some assembler directives:
Assembler directive   Description
.mmregs               Permits the memory-mapped registers to be referred to
                      by names such as AR0, SP, etc.
.include "XX"         Directs the assembler to insert the instructions in
                      file XX at this point and assemble them
.end                  Marks the end of the assembly language program
.data                 Assemble into the data memory area
.text                 Assemble into the program memory area
.equ                  Equate a symbol to a constant
.word x,y,...,z       Reserves 16-bit locations and initializes them with
                      the values x, y, ..., z; may be used in both the
                      text and data sections
.space n              Reserves and initializes n bits of memory; when a
                      label is used with this directive, the label is
                      assigned the address of the first word of the
                      reserved block
.bes n                Reserves and initializes n bits of memory; when a
                      label is used with this directive, the label is
                      assigned the address of the last word of the
                      reserved block
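The only difference between .space and .bes in the table above is which end of the reserved block the label refers to. The following sketch shows that difference with simple address arithmetic. It assumes 16-bit words and that a request of n bits occupies a whole number of words (rounded up); the function name and the start address in the usage lines are hypothetical.

```python
WORD_BITS = 16  # the '54xx data word size


def reserve(start_address, n_bits, directive):
    """Return (label_address, next_free_address) for a .space or .bes request.

    Assumption: the reserved block is rounded up to whole 16-bit words.
    """
    if directive not in (".space", ".bes"):
        raise ValueError("directive must be '.space' or '.bes'")
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    n_words = -(-n_bits // WORD_BITS)            # ceiling division
    first = start_address
    last = start_address + n_words - 1
    label = first if directive == ".space" else last
    return label, last + 1


if __name__ == "__main__":
    # Hypothetical start address 0x100, reserving 64 bits (four words).
    print([hex(x) for x in reserve(0x100, 64, ".space")])
    print([hex(x) for x in reserve(0x100, 64, ".bes")])
```

A label on .bes is convenient for structures that are accessed from the top downwards, since it already points at the last word of the block.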