Processor technology
• Processors vary in their customization for the problem at hand
• Desired functionality:
total = 0
for i = 1 to N loop
total += M[i]
end loop
[Figure: the desired functionality implemented on three processor types: general-purpose (“software”), application-specific, and single-purpose (“hardware”)]
• “Pentium” is the most well-known general-purpose processor
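The desired functionality on this slide is a plain array summation. A minimal C sketch of it follows, assuming M is an array of N integers; the function name sum_array and the 0-based indexing are assumptions (the slide counts from 1).

```c
#include <stddef.h>

/* Sums the n elements of m, mirroring "total += M[i]" from the slide.
   The name sum_array and the 0-based indexing are assumptions. */
long sum_array(const int *m, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += m[i];
    }
    return total;
}
```

On a general-purpose processor this loop would be compiled to instructions held in program memory; a single-purpose processor would implement the same loop directly as a controller and datapath.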
Introduction
• General-Purpose Processor
• Processor designed for a variety of computation tasks
• Low unit cost, in part because manufacturer spreads NRE over large numbers
of units
• Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
• Carefully designed since higher NRE is acceptable
• Can yield good performance, size and power
• Low NRE cost, short time-to-market/prototype, high flexibility
• User just writes software; no processor design
• a.k.a. “microprocessor” – “micro” used when they were implemented on one
or a few chips rather than entire rooms
Basic Architecture
• Control unit and datapath
• Note similarity to single-purpose processor
• Key differences
• Datapath is general
[Figure: processor made of a control unit (controller) and a datapath (ALU, registers), connected by control/status signals]
Datapath Operations
• Load
• Read memory location into register
• ALU operation
• Store
• Write register to memory location
[Figure: processor (control unit, datapath with ALU and registers) connected to I/O and memory; the ALU applies +1 to the value 10 read from memory, giving 11]
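The datapath operations can be sketched in C as functions over a toy processor state. This is an illustrative model only: the Datapath struct, its sizes, and the dp_* names are assumptions, the ALU operation is fixed to the +1 shown in the figure, and the store operation is inferred from the figure's write of 11 back to memory.

```c
#include <stdint.h>

#define MEM_WORDS 1024   /* hypothetical memory size */
#define NUM_REGS  2      /* R0 and R1, as in the figures */

/* Toy processor state; the layout is an assumption for illustration. */
typedef struct {
    int32_t reg[NUM_REGS];
    int32_t mem[MEM_WORDS];
} Datapath;

/* Load: read a memory location into a register. */
void dp_load(Datapath *dp, int r, int addr)
{
    dp->reg[r] = dp->mem[addr];
}

/* ALU operation: pass a register through the ALU (+1 here)
   and put the result back into a register. */
void dp_inc(Datapath *dp, int dst, int src)
{
    dp->reg[dst] = dp->reg[src] + 1;
}

/* Store: write a register to a memory location. */
void dp_store(Datapath *dp, int addr, int r)
{
    dp->mem[addr] = dp->reg[r];
}
```

Calling dp_load, dp_inc and dp_store in that order on a memory that holds 10 at address 500 reproduces the values in the figure: 10 is read, incremented, and 11 is written back.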
Control Unit
• Control unit: configures the datapath operations
• Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.: fetch, decode, fetch operands, execute, store results
[Figure: processor with control unit (controller) and datapath (ALU, registers), linked by control/status signals]
Control Unit Sub-Operations
• Fetch
• Get next instruction into IR
• PC: program counter, always points to next instruction
• IR: holds the fetched instruction
[Figure: PC = 100, IR = load R0, M[500]. Memory contents:]
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
500 10
501 ...
Control Unit Sub-Operations
• Decode
[Figure: processor with PC = 100 and IR = load R0, M[500]; memory holds the program at addresses 100 to 102 and the value 10 at address 500]
Control Unit Sub-Operations
• Fetch operands
• Move data from memory to datapath register
[Figure: processor with PC = 100 and IR = load R0, M[500]; the value 10 at address 500 is copied into R0]
Control Unit Sub-Operations
• Execute
• The load instruction does nothing during this sub-operation
[Figure: processor with PC = 100, IR = load R0, M[500], and R0 = 10]
Control Unit Sub-Operations
• Store results
• The load instruction does nothing during this sub-operation
[Figure: processor with PC = 100, IR = load R0, M[500], and R0 = 10]
Instruction Cycles
[Figure: instruction cycle at PC=100; IR = load R0, M[500], which places the value 10 from address 500 into R0]
Instruction Cycles
[Figure: timeline of two instruction cycles (PC=100, then PC=101), each made up of the clock cycles Fetch, Decode, Fetch ops, Exec., Store result. IR = inc R1, R0; R0 = 10, R1 = 11]
Instruction Cycles
[Figure: timeline of three instruction cycles (PC=100, PC=101, PC=102), each made up of the clock cycles Fetch, Decode, Fetch ops, Exec., Store result. IR = store M[501], R1; R0 = 10, R1 = 11, and address 501 now holds 11]
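The three instruction cycles can be traced with a small simulator. The sketch below is a simplified model under stated assumptions, not the processor on the slides: the Instr and Cpu structs, the cpu_run function, the separate program array, and the OP_HALT opcode (needed so the loop can stop) are all hypothetical. Each instruction is charged five clock cycles, one per sub-operation.

```c
#include <stdint.h>

/* OP_HALT is not on the slides; it is added so the sketch can stop. */
typedef enum { OP_LOAD, OP_INC, OP_STORE, OP_HALT } Opcode;

typedef struct {
    Opcode op;
    int reg_a;  /* load/inc: destination register; store: source register */
    int reg_b;  /* inc: source register; unused otherwise */
    int addr;   /* load/store: data memory address; unused otherwise */
} Instr;

typedef struct {
    int pc;               /* program counter */
    Instr ir;             /* instruction register */
    int32_t reg[2];       /* R0, R1 */
    int32_t data[1024];   /* data memory (hypothetical size) */
    long clocks;          /* clock cycles consumed so far */
} Cpu;

/* Runs the program until OP_HALT. base is the address of program[0]. */
void cpu_run(Cpu *cpu, const Instr *program, int base)
{
    for (;;) {
        cpu->ir = program[cpu->pc - base];  /* fetch: next instruction into IR */
        cpu->pc++;                          /* PC now points to the next instruction */
        switch (cpu->ir.op) {               /* decode */
        case OP_LOAD:   /* work happens in "fetch operands": memory -> register */
            cpu->reg[cpu->ir.reg_a] = cpu->data[cpu->ir.addr];
            break;
        case OP_INC:    /* work happens in "execute": register through the ALU (+1) */
            cpu->reg[cpu->ir.reg_a] = cpu->reg[cpu->ir.reg_b] + 1;
            break;
        case OP_STORE:  /* work happens in "store results": register -> memory */
            cpu->data[cpu->ir.addr] = cpu->reg[cpu->ir.reg_a];
            break;
        case OP_HALT:
            return;
        }
        cpu->clocks += 5;  /* fetch, decode, fetch ops, exec., store result */
    }
}
```

Starting with pc = 100 and data[500] = 10, the program load R0, M[500]; inc R1, R0; store M[501], R1 leaves 11 in data[501] after 15 clock cycles in this model.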
Architectural Considerations
• N-bit processor
• PC size determines address space
[Figure: processor connected to I/O and memory]
Two Memory Architectures
• Princeton
• Fewer memory wires
• Harvard
• Simultaneous program and data memory access
[Figure: Harvard: processor with separate program memory and data memory; Princeton: processor with a single memory (program and data)]
Cache Memory
• Memory access may be slow
• Cache is small but fast memory close to processor
• Fast/expensive technology, usually on the same chip
[Figure: cache placed between processor and memory]
Programmer’s View
• Programmer doesn’t need detailed understanding of architecture
• Instead, needs to know what instructions can be executed
• Two levels of instructions:
• Assembly level
• Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages
• But, some assembly level programming may still be necessary
• Drivers: portion of program that communicates with and/or controls (drives) another device
• Often have detailed timing considerations, extensive bit manipulation
• Assembly level may be best for these
Assembly-Level Instructions
Instruction 1 opcode operand1 operand2
...
• Instruction Set
• Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1
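To make the opcode/operand layout concrete, the sketch below packs and unpacks one instruction word in C. The field widths (4-bit opcode, 4-bit operand1, 8-bit operand2 in a 16-bit word) and the names Fields, encode and decode are assumptions chosen for illustration; the slides do not fix an encoding here.

```c
#include <stdint.h>

/* Decoded view of one instruction word (hypothetical layout). */
typedef struct {
    unsigned opcode;    /* bits 15..12 */
    unsigned operand1;  /* bits 11..8  */
    unsigned operand2;  /* bits 7..0   */
} Fields;

/* Packs the three fields into a 16-bit word; out-of-range bits are masked off. */
uint16_t encode(unsigned opcode, unsigned operand1, unsigned operand2)
{
    return (uint16_t)(((opcode & 0xFu) << 12) |
                      ((operand1 & 0xFu) << 8) |
                      (operand2 & 0xFFu));
}

/* Splits a 16-bit word back into its fields. */
Fields decode(uint16_t word)
{
    Fields f;
    f.opcode = (word >> 12) & 0xFu;
    f.operand1 = (word >> 8) & 0xFu;
    f.operand2 = word & 0xFFu;
    return f;
}
```

With this layout, encode(2, 1, 0x2A) yields the word 0x212A, and decode recovers the same three fields.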
A Simple (Trivial) Instruction Set
Assembly instruct. First byte Second byte Operation
Addressing Modes
• Immediate: the operand field holds the data
• Register-direct: the operand field holds a register address; that register holds the data
• Register indirect: the operand field holds a register address; that register holds a memory address; that memory location holds the data
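The immediate, register-direct and register-indirect modes differ only in how many lookups separate the operand field from the data. A minimal C sketch, assuming a register file and memory modelled as plain arrays (the Mode enum and fetch_operand are hypothetical names):

```c
#include <stdint.h>

typedef enum {
    MODE_IMMEDIATE,
    MODE_REGISTER_DIRECT,
    MODE_REGISTER_INDIRECT
} Mode;

/* Resolves an operand field to the data it designates.
   No bounds checking is done; indices are assumed valid. */
int32_t fetch_operand(Mode mode, int32_t field,
                      const int32_t *regs, const int32_t *mem)
{
    switch (mode) {
    case MODE_IMMEDIATE:          /* field is the data itself */
        return field;
    case MODE_REGISTER_DIRECT:    /* field names a register holding the data */
        return regs[field];
    case MODE_REGISTER_INDIRECT:  /* register holds the address of the data */
        return mem[regs[field]];
    }
    return 0;  /* not reached for valid modes */
}
```

Each added level of indirection costs an extra lookup but lets a short operand field reach data anywhere in memory.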
Internal structure and basic operation of
microprocessor
[Figure: microprocessor block diagram showing the ALU, the register section, the address bus, and the data bus]
Control unit
• The circuitry that controls the flow of
information through the processor, and
coordinates the activities of the other units
within it.
• In a way, it is the "brain within the brain", as it
controls what happens inside the processor,
which in turn controls the rest of the PC.
• On a regular processor, the control unit
performs the tasks of fetching, decoding,
managing execution and then storing results.
Register sets
• The register section/array consists completely of
circuitry used to temporarily store data or
program codes until they are sent to the ALU or
to the control section or to memory.
Program counter (PC)
• A 16-bit register that stores the address of the next operation code to be fetched by the CPU.
• Rarely manipulated directly by the programmer; it mainly serves as an indicator to the user.
• Purpose of the PC in a microprocessor: to store the address of the next instruction to be executed.
• It does not store the address of the top of stack (the stack pointer does that), nor does it count the number of instructions.
Stack pointer (SP)
• The stack is configured as a data structure that
grows downward from high memory to low
memory.
• At any given time, the SP holds the 16-bit
address of the next free location in the stack.
• The stack behaves like any other stack on a subroutine call or an interrupt: the return address is pushed when control transfers, and it is popped when the routine completes so that execution resumes at the original location.
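The push and pop behaviour described above can be sketched in C. This is an illustrative model, assuming a stack of 16-bit words in which SP indexes the next free location and moves toward lower indices on a push; the Stack struct, STACK_WORDS, and the stack_* names are hypothetical, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_WORDS 256  /* hypothetical stack size */

typedef struct {
    uint16_t mem[STACK_WORDS];
    uint16_t sp;  /* index of the next free location */
} Stack;

/* Empty stack: SP starts at the highest location. */
void stack_init(Stack *s) { s->sp = STACK_WORDS - 1; }

/* Push: write at the free location, then move SP down. */
void stack_push(Stack *s, uint16_t value)
{
    s->mem[s->sp] = value;
    s->sp--;
}

/* Pop: move SP back up, then read the value stored there. */
uint16_t stack_pop(Stack *s)
{
    s->sp++;
    return s->mem[s->sp];
}

/* Subroutine call: save the return address and jump to target. */
uint16_t stack_call(Stack *s, uint16_t return_addr, uint16_t target)
{
    stack_push(s, return_addr);
    return target;  /* new PC value */
}

/* Return: restore the saved address as the new PC value. */
uint16_t stack_return(Stack *s)
{
    return stack_pop(s);
}
```

An interrupt would use the same mechanism, typically saving more state than the return address alone.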
Data bus
• The data bus is 'bi-directional'
• Data or instruction codes from memory or input/output are transferred into the microprocessor
• The result of an operation or computation is sent out from the microprocessor to the memory or input/output
• Depending on the particular microprocessor, the data bus can handle 8-bit or 16-bit data.
Address bus
• The address bus is 'unidirectional', over which
the microprocessor sends an address code to
the memory or input/output.
• The size (width) of the address bus is specified
by the number of bits it can handle.
• The more bits there are in the address bus, the
more memory locations a microprocessor can
access.
• A 16 bit address bus is capable of addressing
65,536 (64K) addresses.
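The relationship between bus width and addressable locations is a power of two. A minimal C sketch, where the function name addressable_locations is an assumption:

```c
#include <stdint.h>

/* Number of distinct addresses reachable with an n-bit address bus.
   Valid for 0 <= bits < 64. */
uint64_t addressable_locations(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```

For a 16-bit bus this gives 65,536, matching the 64K figure above; each additional address line doubles the count.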
Control bus
• The control bus is used by the microprocessor to
send out or receive timing and control signals in
order to coordinate and regulate its operation
and to communicate with other devices, i.e.
memory or input/output.
Microprocessor clock
• Clock speed, also called clock rate, is the speed at which a microprocessor executes instructions.
• Every computer contains an internal clock that regulates the rate at which instructions are executed and synchronizes all the various computer components.
Examples of microprocessors
• Intel 8086
• Motorola 6800
• Zilog Z80
Application-Specific Instruction-Set
Processors (ASIPs)
• General-purpose processors
• Sometimes too general to be effective in demanding application
• e.g., video processing – requires huge video buffers and operations on large arrays of
data, inefficient on a GPP
• But single-purpose processor has high NRE, not programmable
• ASIPs – targeted to a particular domain
• Contain architectural features specific to that domain
• e.g., embedded control, digital signal processing, video processing, network processing,
telecommunications, etc.
• Still programmable
A Common ASIP: Microcontroller
• For embedded control applications
• Reading sensors, setting actuators
• Mostly dealing with events (bits): data is present, but not in huge amounts
• e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine,
microwave oven
• Microcontroller features
• On-chip peripherals
• Timers, analog-digital converters, serial communication, etc.
• Tightly integrated for programmer, typically part of register space
• On-chip program and data memory
• Direct programmer access to many of the chip’s pins
• Specialized instructions for bit-manipulation and other low-level operations
Another Common ASIP: Digital Signal
Processors (DSP)
• For signal processing applications
• Large amounts of digitized data, often streaming
• Data transformations must be applied fast
• e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
• Several instruction execution units
• Multiply-accumulate single-cycle instruction, other instrs.
• Efficient vector operations – e.g., add two arrays
• Vector ALUs, loop buffers, etc.
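The multiply-accumulate and vector operations named above can be written out in C to show what a DSP does in hardware. This is a plain software sketch: the names mac, dot_product and vec_add are assumptions, and on a general-purpose processor each loop iteration takes several instructions, whereas a DSP aims to complete a multiply-accumulate per cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate: acc + a * b, widened to avoid 32-bit overflow. */
int64_t mac(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * b;
}

/* Sum of coeff[k] * x[k], the inner loop of a typical digital filter. */
int64_t dot_product(const int32_t *coeff, const int32_t *x, size_t n)
{
    int64_t acc = 0;
    for (size_t k = 0; k < n; k++) {
        acc = mac(acc, coeff[k], x[k]);
    }
    return acc;
}

/* Vector operation: element-wise addition of two arrays. */
void vec_add(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        out[k] = a[k] + b[k];
    }
}
```

Loop buffers and vector ALUs target exactly these loops: the body is tiny, so fetch and loop overhead would otherwise dominate.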
Trend: Even More Customized ASIPs
• In the past, microprocessors were acquired as chips
• Today, we increasingly acquire a processor as Intellectual Property (IP)
• e.g., synthesizable VHDL model
• Opportunity to add a custom datapath hardware and a few custom instructions,
or delete a few instructions
• Can have significant performance, power and size impacts
• Problem: need compiler/debugger for customized ASIP
• Remember, most development uses structured languages
• One solution: automatic compiler/debugger generation
• e.g., www.tensillica.com
• Another solution: retargetable compilers
• e.g., www.improvsys.com (customized VLIW architectures)
Selecting a Microprocessor
• Issues
• Technical: speed, power, size, cost
• Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor’s speed?
• Clock speed – but instructions per cycle may differ
• Instructions per second – but work per instr. may differ
• Dhrystone: Synthetic benchmark, developed in 1984. Dhrystones/sec.
• MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone
MIPS. Commonly used today.
• So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
• SPEC: set of more realistic benchmarks, but oriented to desktops
• EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
• Suites of benchmarks: automotive, consumer electronics, networking, office automation,
telecommunications
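The Dhrystone MIPS conversion above is a single multiplication by the VAX 11/780 baseline of 1757 Dhrystones per second. A minimal C sketch, with hypothetical function names:

```c
/* Dhrystones per second scored by the VAX 11/780, defined as 1 MIPS. */
#define DHRYSTONES_PER_MIPS 1757.0

/* Converts a Dhrystone MIPS rating to Dhrystones per second. */
double dhrystones_per_second(double mips)
{
    return mips * DHRYSTONES_PER_MIPS;
}

/* Converts a measured Dhrystones-per-second score to Dhrystone MIPS. */
double dhrystone_mips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / DHRYSTONES_PER_MIPS;
}
```

For 750 MIPS the first function returns 1,317,750, the figure worked out on the slide.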
General Purpose Processors
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K, 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Summary
• General-purpose processors
• Good performance, low NRE, flexible
• Controller, datapath, and memory
• Structured languages prevail
• But some assembly level programming still necessary
• Many tools available
• Including instruction-set simulators, and in-circuit emulators
• ASIPs
• Microcontrollers, DSPs, network processors, more customized ASIPs
• Choosing among processors is an important step
• Designing a general-purpose processor is conceptually the same as designing a single-purpose processor
Designing a Single Purpose Processor
and Optimization Issues
Embedded Systems
CS 504
Single-purpose processors
• Digital circuit designed to execute exactly one program
• a.k.a. coprocessor, accelerator or peripheral
• Features
• Contains only the components needed to execute a single program
• No program memory
• Benefits
• Fast
• Low power
• Small size
[Figure: controller (control logic, state register) and datapath (index, total, +) connected to data memory, implementing:]
total = 0
for i = 1 to N loop
total += M[i]
end loop
Single Purpose Processors
[Figure: single-purpose processors divide into custom and standard]
• A single-purpose processor is a digital system intended to solve a
specific computation task.
• The processor may be a standard one, intended for use in a wide
variety of applications in which the same task must be performed.
The manufacturer of such an off-the-shelf processor sells the device
in large quantities.
• On the other hand, the processor may be a custom one, built by a
designer to implement a task specific to a particular application.
• An embedded system designer choosing to use a standard single
purpose, rather than a general-purpose, processor to implement part
of a system’s functionality may achieve several benefits.
Standard single-purpose processors
• Known as peripherals (they exist on the periphery of the CPU)
• “Off-the-shelf”: pre-designed for a common task
• Embedded system designers use standard single-purpose processors rather than general-purpose processors to achieve the following benefits:
• Fast performance
• Fewer clock cycles
• Shorter cycles
• Small size
• No program memory
• Small instruction set
• Simple datapath and controller
• Low unit cost
Introduction
• Processor
• Digital circuit that performs computation tasks
• A single-purpose processor may be
• Fast, small, low power
• But, high NRE, longer time-to-market, less flexible
[Figure: digital camera chip containing a CCD interface, memory controller, ISA bus interface, UART, and LCD ctrl]
BENEFITS OF CUSTOM SINGLE PURPOSE PROCESSOR
[Figure: basic logic gates and their truth tables]
Inverter: F = x’
x | F
0 | 1
1 | 0
NAND: F = (x y)’, NOR: F = (x+y)’, XNOR: F = x ⊙ y
x y | NAND | NOR | XNOR
0 0 | 1 | 1 | 1
0 1 | 1 | 0 | 0
1 0 | 1 | 0 | 0
1 1 | 0 | 0 | 1
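The four gates in the figure can be modelled as one-line C functions, which makes the truth tables easy to check. The gate_* names are assumptions:

```c
#include <stdbool.h>

/* Inverter: F = x' */
bool gate_not(bool x) { return !x; }

/* NAND: F = (x y)' */
bool gate_nand(bool x, bool y) { return !(x && y); }

/* NOR: F = (x + y)' */
bool gate_nor(bool x, bool y) { return !(x || y); }

/* XNOR: F is 1 when both inputs are equal. */
bool gate_xnor(bool x, bool y) { return x == y; }
```

Combinational components such as multiplexers and adders are built by wiring gates like these together, with outputs depending only on the current inputs.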
Combinational Logic Design
Combinational components
Sequential Components
Sequential Logic Design
Custom Single Purpose Processor basic model
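Example: Greatest Common Divisor (GCD)
/* go, x and y are inputs of the processor; z is its output */
int p, q;
while (1) {
    while (!go);        /* wait until go is asserted */
    p = x;
    q = y;
    while (p != q) {    /* subtract the smaller from the larger */
        if (p < q)
            q = q - p;
        else
            p = p - q;
    }
    z = p;              /* p == q is the GCD */
}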
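The GCD description above is written against hardware ports, so it does not run as ordinary software. A minimal runnable equivalent in C follows, assuming both inputs are positive (the subtraction loop would never terminate if either were 0); the function name gcd is an assumption.

```c
/* Subtractive GCD, mirroring the inner loop of the slide's description.
   Assumes x > 0 and y > 0. */
unsigned gcd(unsigned x, unsigned y)
{
    unsigned p = x;
    unsigned q = y;
    while (p != q) {
        if (p < q) {
            q = q - p;
        } else {
            p = p - q;
        }
    }
    return p;
}
```

The outer while(1) and the wait on go are dropped here because a function call plays the role of the go signal.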
State Diagram templates
Creating the datapath
Creating the controller’s FSM
Splitting into a controller and datapath
Controller state table for the GCD example
Completing the GCD custom single-purpose processor design
[Figure: completed design, split into controller and datapath]
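The split into a controller and a datapath can be imitated in C with an explicit state variable. This is a sketch under stated assumptions, not the state table from the slides (which is not reproduced in the text): the state names, the Gcd struct, and the gcd_tick and gcd_run functions are hypothetical. The state field plays the role of the controller's state register, while p and q are datapath registers. As before, x and y are assumed positive.

```c
typedef enum { S_WAIT, S_LOAD, S_COMPARE, S_SUB_Q, S_SUB_P, S_DONE } State;

typedef struct {
    State state;     /* controller: state register */
    unsigned p, q;   /* datapath: working registers */
    unsigned z;      /* output register */
} Gcd;

/* Advances the machine by one clock tick. */
void gcd_tick(Gcd *g, int go, unsigned x, unsigned y)
{
    switch (g->state) {
    case S_WAIT:                       /* while (!go); */
        if (go) g->state = S_LOAD;
        break;
    case S_LOAD:                       /* p = x; q = y; */
        g->p = x;
        g->q = y;
        g->state = S_COMPARE;
        break;
    case S_COMPARE:                    /* while (p != q), if (p < q) */
        if (g->p == g->q) g->state = S_DONE;
        else g->state = (g->p < g->q) ? S_SUB_Q : S_SUB_P;
        break;
    case S_SUB_Q:                      /* q = q - p; */
        g->q -= g->p;
        g->state = S_COMPARE;
        break;
    case S_SUB_P:                      /* p = p - q; */
        g->p -= g->q;
        g->state = S_COMPARE;
        break;
    case S_DONE:                       /* z = p; */
        g->z = g->p;
        g->state = S_WAIT;
        break;
    }
}

/* Pulses go once, then clocks the machine until it returns to S_WAIT. */
unsigned gcd_run(unsigned x, unsigned y)
{
    Gcd g = { S_WAIT, 0, 0, 0 };
    gcd_tick(&g, 1, x, y);
    while (g.state != S_WAIT) {
        gcd_tick(&g, 0, x, y);
    }
    return g.z;
}
```

In hardware, the comparisons p == q and p < q would be status signals from the datapath to the controller, and the register loads would be control signals in the other direction.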