A True Single Cycle RISC Processor Without Pipelining

This paper details the design of an embedded RISC controller used for mixed signal audio integrated circuits. This processor replaced an existing 8 bit CISC embedded processor and obtained a performance improvement of about 6x. This performance improvement was entirely due to architectural improvements using the same input clock rate and external ROM.

ESS Design White Paper – RISC Embedded Controller 1

A True Single Cycle RISC Processor without Pipelining
Robert S. Plachno, VP of Audio

Abstract: This paper details the design of an embedded RISC controller used for mixed signal audio integrated circuits. This processor replaced an existing 8 bit CISC embedded processor and obtained a performance improvement of about 6x. This performance improvement was entirely due to architectural improvements using the same input clock rate and external ROM IP block.

Index Terms: Computer architecture, Memory management, Pipeline processing, Reduced instruction set computing.

This technical paper describes an embedded RISC controller used in mixed signal audio products from 1993 through 1999.

I. INTRODUCTION

THE architecture of a RISC processor should support single cycle operation. Single cycle operation is defined here as continuous instruction fetches from the instruction memory (ROM in this case) at the maximum access rate of the memory. Most RISC processor designs obtain this performance by pipelining. With a pipelined architecture each instruction is fetched assuming the next instruction is at the next physical instruction address (PC+1). If a jump instruction occurs then the pipeline is flushed or a delay must occur while the correct instruction address is calculated. This paper describes a RISC architecture in which single cycle operation is obtained without using a pipelined design.

This RISC processor was designed as an embedded controller. Other architectural advantages of this design will be discussed as well as the implementation and design techniques.

II. THE LIMITATIONS OF THE PREVIOUS DESIGN

A. CISC versus RISC

The previous designs used an 8-bit CISC as an embedded controller. This CISC was inefficient and had difficulty supporting continuous customer requests for additional features. The register instructions required 5 internal clock cycles and instructions using external memory usually required 6 internal cycles. The internal register to register instructions had a low utilization in the program. Instructions using external memory were more common. The instruction fetch occurred over an 8 bit bus using multiple cycles which depended on the instruction type. The CISC design used a ROM organized as 24K by 8 for the instruction memory. Most instruction fetches required 2 to 3 ROM accesses.

Adopting a RISC architecture was forced by the new customer feature requirements. This integrated circuit did not use a PLL so the input clock rate could not simply be sped up. Any performance improvement had to come purely from an architectural change by obtaining single cycle operation.

B. Operating Voltage Range

This integrated circuit was for PC audio products that could be designed into desktop PCs or notebooks. In this time frame desktops used 5V operation while notebooks required 3.3V operation. The analog circuits were designed to run over this wide power supply range and at multiple fabrication houses. It was surprising to find that the circuit with the least operating voltage margin was the CISC processor. The CISC was a fully custom design that had issues with low voltage operation. The new RISC design lowered the minimum operating voltage dramatically, to the point where the processor was not the limiting factor and the chip gained over half a volt of margin.

C. Royalty Cost

The CISC was a purchased design which required royalty payments. Since the PC audio products had volume shipments of over 2M per month it was desirable to eliminate this cost burden.

III. MEMORY FOR PROCESSOR DESIGNS

My original experience was in memory design including pseudo-static designs where the memory pre-charge is hidden from the user. Later I worked on several processor designs including a 64 bit processor that had a large design group. It became apparent that most processor engineers did not understand how memories worked and I had to teach them how to efficiently interface to their memory blocks.

Memories require a pre-charge. In this time frame it was a standard practice to divide the memory cycle in two. The first half-cycle is for pre-charge and the second half-cycle is the actual memory access. Addresses must only change during the pre-charge time and the address set-up time is actually measured to the center clock transition (at the end of the first half-cycle). The existing ROM used for the CISC was designed exactly in this manner.

As time progressed logic engineers became even more ignorant of their memory blocks and the circuit designers of the memories made their specified interfaces safer. With the change in philosophy to synchronous designs, latches fell out of favor and were replaced by flip-flops. Most engineers now put flip-flops to fix the addresses at the input of the
memory. This wastes almost a full half cycle. A correct design would use a latch that is open during the pre-charge period. It is also extremely wasteful to insert flip-flops on the output data of the memory. A correct design would have a latch open during the second half cycle of the memory and memory designs should already include this latch internally.

Mentally this problem is a conceptual difference caused by going to a synchronous design philosophy. The pipeline design engineer has flip-flops in too many places. How can you do an instruction decode and the next instruction address calculation (for jump instructions) all on the one clock edge between two instruction fetches? In reality you have about half a clock cycle to perform these calculations. You have the margin for the ROM access time for the last instruction fetch plus the pre-charge time minus the address set-up time for the next instruction fetch.

IV. BASIC OVERVIEW OF THE RISC DESIGN

The bit widths of each unit are as follows:
• Instruction Unit: 24 bits
• Execution Unit: 8 bits
• Memory Unit: 16 bits

The instruction ROM used for the CISC was an 8Kx8 block repeated 3 times and organized as 24K by 8. The RISC used the same original ROM 8Kx8 block but has it organized as 8Kx24 since the instruction is 24 bits wide. All instruction fetches are a single cycle 24 bit access.

Both the instruction ROM and any external RAM have 16 bit addressing as indicated by the 16 bit memory unit width. This allows the RISC to have an 8 bit opcode and 16 bit address for jump instructions, etc. all in one instruction fetch. This means there are no relative addressing jump calculations. The address for the jump instructions is always immediate. It does not have to be calculated but only multiplexed with PC+1. ROM and RAM accesses have separate opcodes so in effect there is 17 bit addressing for external memories.

The register file feeding the execution unit has 8 bit addressing but the MSB for the register file is always 0. Since the RISC is an embedded controller the other 128 register addresses (MSB=1) are reserved for external user defined registers. Within the 24 bit instruction width you have 8 bits for the opcode and two 8 bit addresses for the two operands for the execution unit instructions. Immediate commands have an 8 bit opcode, an 8 bit address for the operand destination and the 8 bit immediate value.

There is an obvious trade off to have only an 8 bit execution unit. For 16 bit audio calculations two instructions are performed. For example, 16 bit values are added by two instructions: ADD followed by an ADC (add with carry). The CISC performed in the same manner.

The assembler cross assembled all of the previous CISC instructions directly to the RISC instruction set using multiple RISC instructions if required. For example a DJNZ (decrement and jump if not zero) command is assembled as two instructions: ADD Rnum, #%FF (add negative 1) and JP NZ, “label” (conditional jump if not zero). Fig 1 documents the RISC instruction set and decode table.

V. THE REGISTER FILE

The operands are read from and written to a 128 byte register file. All registers are general purpose. This register file is double pumped, meaning there are two accesses in the same time period as one ROM access (instruction fetch), and it is a true dual-port memory, meaning both source operands can be read at the same time. Since it is a dual port memory there are two 8 bit data busses, A and B, that connect the register file to the execution unit. The A bus is used for reading operand A and for writing the result. The B bus is only for reading operand B. Fig 2 shows the memory cell for the register file.

VI. EXECUTION UNIT

The execution unit consists of two operand latches, a barrel shifter, and an ALU.

A. Operand Latches

Two operand busses (A and B) are required for single cycle operation. Both input operands are read from the register file simultaneously. There are two non-overlapping clocks in the RISC called CK1 and CK2. The operand latches are loaded during CK1 from the register file and the output from the ALU is stored back into the register file during CK2. Note that the operand data is allowed to ripple through the latches during the time the data becomes valid. The latches are closed later to avoid corruption before the register file goes into pre-charge. Both operand latches are identical and can be independently reset or inverted (reset and invert together is a set). Clearing and inverting the operands are required for some of the operations in both the ALU and the shifter.

B. Barrel Shifter

This shifter can rotate right or left inserting zeroes, ones or wrapping around LSB-to-MSB. The trick is all in the layout and in how the operand registers drive it. Logically the shifter is nothing more than 8 to 1 multiplexers for every bit of the operand. Physically the A bus drives into the top and then the wires shift down one bit to the left for each multiplexer input. Physically the B bus drives into the bottom and then the wires shift up one bit to the right for each multiplexer input. If you drive both operands with the same value then you barrel shift. All eight shift possibilities are available at the inputs of the multiplexers. The shift amount is determined by selecting one of the eight possibilities. By clearing or setting one of the operand registers you insert either leading or trailing 0’s or 1’s as you shift. Fig 4 shows the shifter wiring.

C. ALU

The Arithmetic Logic Unit can be described as an LFU (Logical Function Unit) with a generate-propagate static CMOS Manchester carry chain design. An LFU means it has a programmable truth table operation for any Boolean function of the two input operands. The four control lines, LFU0 to LFU3, specify each bit of the truth table function. Both the carry chain and the zero detect logic are buffered every four bits. To generate the result for the 8 bit ALU, the carry ripples through only four inverting stages (total) and each of these stages is an inverter. This carry chain implementation is fully static. It does not require any pre-charge clocks. Fig 3 shows the schematic for the ALU design.

VII. INSTRUCTION UNIT

The instruction unit consists of the Program Counter (PC), the ROM interface registers, and a hardware stack for subroutines and interrupts.

A. Program Counter Register

This is a 16 bit register. A separate adder which is a simplified but similar design to the ALU does the count function. This register always holds the current ROM instruction address plus one. This is the value which is pushed onto the stack for subroutine CALLs and for interrupts so that you RETURN to the next valid instruction. If the present instruction is not a jump, call, branch, etc. then the default is to use the PC register value for the next ROM address.

B. The ROM Interface Registers

This logic has several functions. The next ROM address is multiplexed from the Program Counter, the stack (for Returns), or the immediate value in the instruction word (for Jumps). Reset forces the address 0 and interrupt addresses can be forced to 1, 2, or 3 (which always contain unconditional jumps). This circuit also does the addressing manipulation for reading data from the ROM. The data stored in the ROM uses byte addressing which has to be unpacked from an 8Kx24 or a 16Kx24 configuration.

C. The Hardware Stack

The stack design is a pure synchronous implementation. It is four 16 bit registers which are always loaded every cycle. If the current opcode is a CALL then each register is loaded from the register above it (pushed). If the current opcode is a RETURN then each register is loaded from the register below it (popped). If the opcode is neither then each register's own output is multiplexed back to retain its present value.

VIII. MEMORY UNIT

The RISC does not have complicated memory management. However, the memory unit does the functions of the register file address generation, the A-B-C data bus multiplexing, and the flip-flops for the instruction word for data unpacking.

IX. OTHER FEATURES

This RISC was designed as an embedded controller for audio applications and has some unique features.

A. Indirect Addressing

Since the execution unit does not have a multiplier, audio compression and decompression is done by table look-up. For this function indirect addressing is very important. Both the register file addresses and the external memory addressing can be done through another set of registers. The register file addressing costs another instruction to set a unique page register. The external memory can be addressed using a register pair from the register file. The logic for the register pair addressing can also be used for indirect addressing on subroutine calls or jump instructions.

B. External Register Addressing

As mentioned before there are 8 address bits for the register and the register file only uses the lower 128 bytes. The higher 128 bytes are user defined to be specific registers through the integrated circuit design. These external registers interface to the RISC module through the C bus. This means that external registers can be specified as a source or destination in an ADD or other execution unit instruction. Other architectures require loading the values first to an internal register.

C. User Defined Flags

The RISC has the four standard condition flags for Carry, Zero, Sign, and Overflow. However there are eight flags total that can be utilized. The other four flag bits are user defined. For example this can be a signal such as a FIFO “full” flag.

D. ROM Data Packing

The instruction width is 24 bits. However, to perform table look-up compression, data must be read from the ROM using byte addressing. The three byte wide words are packed with the LSB two data bytes sequentially up to the top of the memory and then back down using the MSB bytes.

E. Hardware Stack

This is a feature which can be a disadvantage. Using a hardware stack simplifies the design and speeds up the execution. The subroutine RETURN in the CISC took 19 cycles while the RETURN in the RISC takes one cycle. However, the design is limited to only four nested calls and interrupts which is sufficient for the audio design. This can be un-nerving to some programmers.
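
The behavior of the four-entry hardware stack can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, index 0 is taken to be the top of the stack, and the bottom register is assumed to keep its value on a RETURN, which the text does not specify.

```python
class HardwareStack:
    """Model of four 16-bit registers that shift as a unit (index 0 = top)."""

    DEPTH = 4

    def __init__(self):
        self.regs = [0] * self.DEPTH

    def clock(self, opcode, pc_plus_1=0):
        """Advance one cycle; return the popped address on RETURN, else None."""
        old = self.regs
        if opcode == "CALL":
            # Push: PC+1 enters at the top, every entry moves one place down,
            # and the oldest entry falls off the bottom.
            self.regs = [pc_plus_1 & 0xFFFF] + old[:-1]
            return None
        if opcode == "RETURN":
            # Pop: every entry moves one place up.
            # Assumption: the bottom register simply keeps its value.
            self.regs = old[1:] + [old[-1]]
            return old[0]
        # Any other opcode: each register reloads its own output (hold).
        return None


stack = HardwareStack()
for address in (0x0101, 0x0202, 0x0303, 0x0404, 0x0505):
    stack.clock("CALL", address)
returns = [stack.clock("RETURN") for _ in range(4)]
```

With five nested calls the first return address (0x0101 here) is silently lost, which is the four-level nesting limit described above.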

F. Large Register File

The RISC has 128 bytes of general purpose registers. This was large enough that no external RAM was used for the audio application. The original CISC design did use an external RAM in addition to its internal registers.

X. DESIGN IMPLEMENTATION

A. Engineers

The RISC was designed by Roi Peers and Robert Plachno. Roi was the architect on the PC audio chips. He did the system level design and the software coding. For the RISC he defined the requirements and helped with the architecture. Robert Plachno had designed several processors prior to this RISC including smaller embedded controllers and a larger 64 bit processor. Robert did the design and simulation of the RISC. Vincent Chueng replaced the CISC with the RISC in the audio chip and fixed the software timing issues.

B. Semi-Custom Design

The RISC was designed in CMOS technology and run at numerous fabrication houses from 0.6µ to 0.35µ channel lengths. The data paths were designed at a transistor level. Four sections including the data paths for the execution unit, instruction unit, memory unit, and the register file were custom laid out and placed together in a rectangular area. The remaining control logic was routed as standard cells. Figure 5 shows the PC audio chip with the RISC in the top left.

C. CAE Tools

The original design was entered using the ORCAD schematic tool. The simulations were done using Robert Plachno’s EESIM. This simulator allows a mixed mode spice and logic netlist. Modules can be simulated at the transistor, gate or behavior level. The simulator also indicates the 10 worst (or specified) set-up times, hold times, etc. and calculates the power dissipation and test vector coverage. The initial design was simulated on a PC and then progressed to a UNIX system. Eventually the schematics were recaptured into Cadence and simulated using Verilog.

D. Initial Debug

The design was initially done as a test chip on a multi-up mask set. This was debugged using test vectors transferred to an IMS tester. Then the CISC was replaced by the RISC in a full PC audio design. The assembler (written by Plachno) automatically cross-assembles the CISC instruction set to the RISC instruction set. However, certain parts of the program were found to be self-timed (software emulated serial port) and other problems occurred since the processor performed about 6x faster. I would describe replacing the CISC with the RISC in the design as having moderately few issues.

XI. CONCLUSION

A design for a single cycle RISC processor has been discussed that does not use pipelining. This operation is obtained by folding the processor execution into the memory cycle. The RISC uses a ripple through latch style of design as opposed to a synchronous flip-flop style design. The design improvements include:
• 6x performance improvement.
• Wide power supply range for both desktop and notebook applications.
• No royalty fees.
• Minimum software impact.
• This is an architectural change only. No process improvements or clock rate increase were required.

Figure 1. RISC Instruction Set Definition
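
The decode table itself is not reproduced in this copy. As a rough illustration of the three 24 bit instruction layouts described in Section IV, the sketch below packs and unpacks instruction words and expands a DJNZ into its two RISC instructions. The field order (opcode in the top byte), the opcode numbers, and all function names are assumptions made for the example and are not taken from the figure.

```python
# Sketch of the 24-bit instruction layouts described in Section IV.
# Assumed, not from the paper: the opcode sits in the top byte, and the
# opcode numbers below are placeholders.

OP_ADD_IMM = 0x01  # hypothetical opcode for ADD Rn, #imm
OP_JP_NZ = 0x02    # hypothetical opcode for JP NZ, label


def pack_reg_reg(opcode, addr_a, addr_b):
    """8-bit opcode followed by two 8-bit operand addresses."""
    return ((opcode & 0xFF) << 16) | ((addr_a & 0xFF) << 8) | (addr_b & 0xFF)


def pack_immediate(opcode, dest, value):
    """8-bit opcode, 8-bit destination address, 8-bit immediate value."""
    return pack_reg_reg(opcode, dest, value)


def pack_jump(opcode, target):
    """8-bit opcode followed by a 16-bit immediate ROM address."""
    return ((opcode & 0xFF) << 16) | (target & 0xFFFF)


def unpack(word):
    """Split a 24-bit word into the opcode and every possible operand field."""
    return {
        "opcode": (word >> 16) & 0xFF,
        "field_a": (word >> 8) & 0xFF,
        "field_b": word & 0xFF,
        "address": word & 0xFFFF,
    }


def is_external_register(addr):
    """Register addresses with MSB=1 select user defined external registers."""
    return bool(addr & 0x80)


def next_rom_address(pc_plus_1, word, take_jump):
    """No address arithmetic: select either PC+1 or the immediate address."""
    return unpack(word)["address"] if take_jump else pc_plus_1


def assemble_djnz(reg, target):
    """Expand DJNZ into ADD Rn, #%FF followed by JP NZ, target."""
    return [pack_immediate(OP_ADD_IMM, reg, 0xFF), pack_jump(OP_JP_NZ, target)]


words = assemble_djnz(0x10, 0x1234)
```

Because the jump target is carried whole inside the instruction word, `next_rom_address` is a pure selection, which mirrors the point that the address only has to be multiplexed with PC+1.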


Figure 2. Register File Dual Port Memory Cell

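[Schematic not reproduced. Labels recoverable from the figure: bit lines BL1, BLB1, BL2; word lines WL1, WLB2; supply VCC; transistors M1, M2, M4, M5, M6, M8, M9, M10.]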

Figure 3. ALU Design
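
The schematic is not reproduced in this copy. The programmable truth table idea behind the LFU in Section VI.C can be sketched in software as follows. The mapping of control lines to input combinations (LFUn selected by n = 2a + b), the use of AND and XOR as the generate and propagate terms, and the function names are assumptions for illustration; the actual circuit may assign them differently.

```python
# Assumed bit order: control line LFUn gives the output for the input
# pair (a, b) with n = 2*a + b. The paper does not state this mapping.
LFU_AND = 0b1000
LFU_OR = 0b1110
LFU_XOR = 0b0110


def lfu(table, a, b, width=8):
    """Apply a 4-bit truth table to each bit position of two operands."""
    result = 0
    for i in range(width):
        index = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        result |= ((table >> index) & 1) << i
    return result


def add8(a, b, carry_in=0):
    """8-bit add from generate/propagate terms and a rippled carry."""
    generate = lfu(LFU_AND, a, b)
    propagate = lfu(LFU_XOR, a, b)
    carry = carry_in & 1
    total = 0
    for i in range(8):
        p = (propagate >> i) & 1
        g = (generate >> i) & 1
        total |= (p ^ carry) << i
        carry = g | (p & carry)
    return total, carry


def add16(x, y):
    """16-bit add as two 8-bit steps: ADD on the low bytes, ADC on the high."""
    low, carry = add8(x & 0xFF, y & 0xFF)                      # ADD
    high, carry = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)  # ADC
    return (high << 8) | low, carry
```

The same `add8` also covers the DJNZ expansion from Section IV: adding 0xFF to an 8 bit register decrements it, and the result reaches zero only when the register held 1.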


Figure 4. Barrel Shifter.
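
The wiring diagram is not reproduced in this copy. One way to picture the two-operand shifter of Section VI.B is as a funnel: for each of the eight multiplexer selections, operand A is moved left and the vacated bits are filled from the top of operand B. The sketch below is a behavioral model under that assumption; the exact wire ordering in the layout may differ, and the function names are hypothetical.

```python
def shifter(a, b, select):
    """Multiplexer selection 0..7: A moved left, low bits filled from B's top."""
    if not 0 <= select <= 7:
        raise ValueError("select must be in the range 0..7")
    return ((a << select) | (b >> (8 - select))) & 0xFF


def rotate_left(value, n):
    """Driving both operands with the same value gives a barrel rotate."""
    return shifter(value, value, n)


def rotate_right(value, n):
    """A right rotate by n (1..7) is a left rotate by 8 - n."""
    return shifter(value, value, 8 - n)


def shift_left(value, n, fill=0):
    """Clearing or setting operand B inserts trailing 0's or 1's."""
    return shifter(value, 0xFF if fill else 0x00, n)


def shift_right(value, n, fill=0):
    """Clearing or setting operand A inserts leading 0's or 1's (n in 1..7)."""
    if not 1 <= n <= 7:
        raise ValueError("n must be in the range 1..7")
    return shifter(0xFF if fill else 0x00, value, 8 - n)
```

In this model every rotate and shift variant is the same selection among eight inputs; only the values placed in the two operand latches change, which is why the latches need independent reset and invert controls.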

Figure 5. PC Audio Chip with RISC
