A True Single Cycle RISC Processor Without Pipelining
memory. This wastes almost a full half cycle. A correct design would use a latch that is open during the pre-charge period. It is also extremely wasteful to insert flip-flops on the output data of the memory. A correct design would have a latch open during the second half cycle of the memory, and memory designs should already include this latch internally.

This problem is really a conceptual one, caused by moving to a synchronous design philosophy: the pipeline design engineer puts flip-flops in too many places. How can you do the instruction decode and the next-instruction address calculation (for jump instructions) all on the one clock edge between two instruction fetches? In reality you have about half a clock cycle to perform these calculations. Your margin is the ROM access time of the previous instruction fetch plus the pre-charge time, minus the address set-up time of the next instruction fetch.

IV. BASIC OVERVIEW OF THE RISC DESIGN

The bit widths of each unit are as follows:
• Instruction Unit: 24 bits
• Execution Unit: 8 bits
• Memory Unit: 16 bits

The instruction ROM used for the CISC was an 8Kx8 block repeated 3 times and organized as 24K by 8. The RISC uses the same original 8Kx8 ROM block but organizes it as 8Kx24, since the instruction is 24 bits wide. Every instruction fetch is a single-cycle 24-bit access.

Both the instruction ROM and any external RAM have 16-bit addressing, as indicated by the 16-bit memory unit width. This lets the RISC fit an 8-bit opcode and a 16-bit address for jump instructions, etc., into a single instruction fetch. As a result there are no relative-address jump calculations: the address for jump instructions is always immediate. It does not have to be calculated, only multiplexed with PC+1. ROM and RAM accesses have separate opcodes, so in effect there is 17-bit addressing for external memories.

The register file feeding the execution unit has 8-bit addressing, but the MSB of a register file address is always 0. Since the RISC is an embedded controller, the other 128 register addresses (MSB=1) are reserved for external user-defined registers. Within the 24-bit instruction width you have 8 bits for the opcode and two 8-bit addresses for the two operands of execution unit instructions. Immediate commands have an 8-bit opcode, an 8-bit address for the operand destination, and the 8-bit immediate value.

Having only an 8-bit execution unit is an obvious trade-off: 16-bit audio calculations take two instructions. For example, 16-bit values are added by two instructions: ADD followed by ADC (add with carry). The CISC worked the same way.

The assembler cross-assembled all of the previous CISC instructions directly to the RISC instruction set, using multiple RISC instructions where required. For example, a DJNZ (decrement and jump if not zero) command is assembled as two instructions: ADD Rnum, #%FF (add negative 1) and JP NZ, “label” (conditional jump if not zero). Fig 1 documents the RISC instruction set and decode table.

V. THE REGISTER FILE

The operands are read from and written to a 128-byte register file. All registers are general purpose. The register file is double pumped, meaning there are two accesses in the same time period as one ROM access (instruction fetch). It is also a true dual-port memory, meaning both source operands can be read at the same time. Since it is dual-ported, there are two 8-bit data buses, A and B, that connect the register file to the execution unit. The A bus is used for reading operand A and for writing the result. The B bus is used only for reading operand B. Fig 2 shows the memory cell for the register file.

VI. EXECUTION UNIT

The execution unit consists of two operand latches, a barrel shifter, and an ALU.

A. Operand Latches

Two operand buses (A and B) are required for single-cycle operation, because both input operands are read from the register file simultaneously. The RISC has two non-overlapping clocks, called CK1 and CK2. The operand latches are loaded from the register file during CK1, and the output of the ALU is stored back into the register file during CK2. Note that the operand data is allowed to ripple through the latches while it becomes valid; the latches are closed later to avoid corruption before the register file goes into pre-charge. Both operand latches are identical and can be independently reset or inverted (reset and invert together give a set). Some operations in both the ALU and the shifter require the operands to be cleared or inverted.

B. Barrel Shifter

This shifter can shift right or left, inserting zeroes or ones, or rotate by wrapping around LSB-to-MSB. The trick is all in the layout and in how the operand registers drive it. Logically the shifter is nothing more than an 8-to-1 multiplexer for every bit of the operand. Physically, the A bus drives into the top, and its wires shift down one bit to the left for each multiplexer input. The B bus drives into the bottom, and its wires shift up one bit to the right for each multiplexer input. If you drive both operands with the same value, you get a barrel shift. All eight shift possibilities are available at the multiplexer inputs, and the shift amount is determined by selecting one of the eight. By clearing or setting one of the operand registers you insert either leading or trailing 0’s or 1’s as you shift. Fig 4 shows the shifter wiring.
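The shifter description can be checked with a small behavioral model. This is a sketch that assumes the wiring behaves like a funnel shifter. For selection k (0 to 7), the output is operand A shifted left by k, and the vacated low bits are filled from the top of operand B. This interpretation of the wiring is an assumption, not the Fig 4 circuit. The helper names `latch`, `shifter`, `rotate_left`, `shift_left` and `shift_right` are hypothetical. The model shows how the operand-latch reset and invert controls produce rotates and zero-filled or one-filled shifts from a single selection of eight multiplexer positions.

```python
MASK = 0xFF  # 8-bit datapath


def latch(value: int, reset: bool = False, invert: bool = False) -> int:
    """Operand latch (assumed behavior): reset clears, invert complements,
    reset plus invert together yields 0xFF (a set)."""
    v = 0 if reset else value & MASK
    return v ^ MASK if invert else v


def shifter(a: int, b: int, k: int) -> int:
    """Select multiplexer position k (0-7) of the 16-bit window {A, B}.

    Output = A shifted left by k, with the low k bits taken from the top of B.
    """
    if not 0 <= k <= 7:
        raise ValueError("shift selection must be 0..7")
    window = ((a & MASK) << 8) | (b & MASK)
    return ((window << k) >> 8) & MASK


def rotate_left(x: int, n: int) -> int:
    # Same value on both buses gives a barrel rotate.
    return shifter(x, x, n)


def shift_left(x: int, n: int, fill_ones: bool = False) -> int:
    # B latch cleared (or set) supplies trailing 0s (or 1s).
    return shifter(x, latch(x, reset=True, invert=fill_ones), n)


def shift_right(x: int, n: int, fill_ones: bool = False) -> int:
    # A latch cleared (or set) supplies leading 0s (or 1s);
    # a right shift by n uses window selection 8 - n.
    if n == 0:
        return x & MASK
    return shifter(latch(x, reset=True, invert=fill_ones), x, 8 - n)


print(f"{rotate_left(0x81, 1):#04x}")                  # 0x03
print(f"{shift_right(0xB0, 4, fill_ones=True):#04x}")  # 0xfb
```

In this model, right shifts do not need a second multiplexer tree. A right shift by n is the same window as a left shift by 8 − n, with the data moved to the B operand. This is one way to read the observation above that all eight shift possibilities are available at the multiplexer inputs.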
ESS Design White Paper – RISC Embedded Controller 3
C. ALU

The Arithmetic Logic Unit can be described as an LFU (Logical Function Unit) with a generate-propagate, static CMOS Manchester carry chain. An LFU has a programmable truth table, so it can perform any Boolean function of the two input operands. Each of the four control lines, LFU0 to LFU3, specifies one bit of the truth table. Both the carry chain and the zero-detect logic are buffered every four bits. To generate the result for the 8-bit ALU, the carry passes through only four inverting stages in total, each of which is a simple inverter. This carry chain implementation is fully static and does not require any pre-charge clocks. Fig 3 shows the schematic for the ALU design.

VII. INSTRUCTION UNIT

The instruction unit consists of the Program Counter (PC), the ROM interface registers, and a hardware stack for subroutines and interrupts.

… file address generation, the A-B-C data bus multiplexing, and the flip-flops for the instruction word for data unpacking.

IX. OTHER FEATURES

This RISC was designed as an embedded controller for audio applications and has some unique features.

A. Indirect Addressing

Since the execution unit does not have a multiplier, audio compression and decompression are done by table look-up, which makes indirect addressing very important. Both register file addresses and external memory addresses can be supplied through another set of registers. Indirect register file addressing costs one extra instruction to set a unique page register. The external memory can be addressed using a register pair from the register file. The logic for register pair addressing can also be used for indirect addressing on subroutine calls or jump instructions.
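A short behavioral sketch shows why register-pair addressing suits table look-up on an 8-bit machine. Two 8-bit registers together form a 16-bit external address, so an 8-bit audio sample can index a table directly. The following are all assumptions made for illustration: the class and method names, the rule that the first register of the pair supplies the high byte, the table location, and its placeholder contents. None of them come from the paper.

```python
class RegisterPairModel:
    """Toy model: 128 internal registers plus a flat 16-bit external memory."""

    def __init__(self, memory: bytes) -> None:
        if len(memory) > 0x10000:
            raise ValueError("16-bit addressing covers at most 64 KB")
        self.regs = bytearray(128)  # internal register file, addresses 0-127
        self.memory = memory        # external memory image

    def pair_address(self, r_hi: int, r_lo: int) -> int:
        # Assumed: the first register of the pair supplies the high byte.
        return (self.regs[r_hi] << 8) | self.regs[r_lo]

    def load_indirect(self, dest: int, r_hi: int, r_lo: int) -> None:
        # dest <- memory[{regs[r_hi], regs[r_lo]}]
        self.regs[dest] = self.memory[self.pair_address(r_hi, r_lo)]


# Hypothetical 256-entry look-up table placed on a page boundary at 0x0300.
TABLE_PAGE = 0x03
image = bytearray(0x400)
for i in range(256):
    image[(TABLE_PAGE << 8) + i] = 255 - i  # placeholder contents, not a real codec table

m = RegisterPairModel(bytes(image))
m.regs[0] = TABLE_PAGE      # R0: table page, set once outside the sample loop
m.regs[1] = 0x10            # R1: 8-bit sample used directly as the table index
m.load_indirect(2, 0, 1)    # R2 <- table[0x10]
print(hex(m.regs[2]))       # 0xef
```

In this sketch the table is aligned on a 256-byte boundary. The high register therefore stays constant, and each new sample goes straight into the low register, so forming the address needs no 16-bit ADD/ADC sequence. This is a property of the sketch's assumed layout, not a stated feature of the design.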