Module 4 Final

The document provides an overview of ARM architecture, detailing its history, features, and instruction sets. It explains the ARM processor's architecture, including its registers, modes of operation, and pipelining mechanism, which enhances instruction execution speed. Additionally, it covers ARM assembly language programming, highlighting data processing, data transfer, and control flow instructions.

Uploaded by

mhday1812

ECT381- EMBEDDED SYSTEM DESIGN: MODULE-4

INTRODUCTION OF ARM:

The ARM was originally developed at Acorn Computers Limited of Cambridge, England, between
1983 and 1985. It was the first RISC microprocessor developed for commercial use and has some
significant differences from subsequent RISC architectures. In 1990 ARM Limited was established
as a separate company specifically to widen the exploitation of ARM technology, and it has since
become a market leader for low-power and cost-sensitive embedded applications. The ARM is supported
by a toolkit which includes an instruction set emulator for hardware modelling and software testing
and benchmarking, an assembler, C and C++ compilers, a linker and a symbolic debugger.

The 16-bit CISC microprocessors that were available in 1983 were slower than standard memory parts.
They also had instructions that took many clock cycles to complete (in some cases, many hundreds of
clock cycles), giving them very long interrupt latencies. As a result of these frustrations with the
commercial microprocessor offerings, the design of a proprietary microprocessor was considered and
the ARM chip was designed.

FEATURES OF ARM PROCESSORS

The ARM processors are based on a RISC architecture, which has made possible small
implementations with very low power consumption. Implementation size, performance, and very low
power consumption remain the key attributes in the development of ARM devices.

The typical RISC architectural features of ARM are:

• A large uniform register file
• A load/store architecture, where data-processing operations only operate on register contents,
not directly on memory contents
• Simple addressing modes, with all load/store addresses being determined from register
contents and instruction fields only
• Uniform and fixed-length instruction fields, to simplify instruction decode
• Control over both the Arithmetic Logic Unit (ALU) and shifter in most data-processing
instructions to maximize the use of an ALU and a shifter
• Auto-increment and auto-decrement addressing modes to optimize program loops
• Load and Store Multiple instructions to maximize data throughput
• Conditional execution of almost all instructions to maximize execution throughput
There are three basic instruction sets for ARM.

MBITS, Nellimattom Page 1


• A 32-bit ARM instruction set
• A 16-bit Thumb instruction set
• The 8-bit Java bytecode used in Jazelle state
The Thumb instruction set is a subset of the most commonly used 32-bit ARM instructions. Thumb
instructions operate with the standard ARM register configuration, enabling excellent
interoperability between ARM and Thumb states. Thumb code is typically about 65% of the size of
the equivalent ARM code and can provide 160% of the performance of ARM code when running from a
16-bit memory system. Thumb state is therefore used in embedded systems where memory resources
are limited. Jazelle state is used in ARM9 processors to execute 8-bit Java bytecode.

ARCHITECTURE OF ARM PROCESSORS:

The ARM7 processor is based on the von Neumann model, with a single bus for both data and
instructions. Although sharing one bus limits performance, the loss is offset by pipelining. ARM
uses the Advanced Microcontroller Bus Architecture (AMBA). AMBA defines a system bus, either the
Advanced High-performance Bus (AHB) or the Advanced System Bus (ASB), and a peripheral bus, the
Advanced Peripheral Bus (APB).
The ARM processor consists of:
• Arithmetic Logic Unit (32-bit)
• One Booth multiplier (32-bit)
• One barrel shifter
• One control unit
• Register file of 37 registers, each of 32 bits
In addition, the ARM contains a 32-bit program status register; some special registers such as the
instruction register, the memory data read and write registers and the memory address register; a
priority encoder, which is used by the load and store multiple instructions to indicate which
register in the register file is to be loaded or stored; and multiplexers.

ARM Registers: ARM has a total of 37 registers, each 32 bits wide: 31 general-purpose registers
and six status registers. Not all of these registers are visible at once. The processor state and
operating mode decide which registers are available to the programmer. At any time, only 16 of the
31 general-purpose registers are visible. The remaining 15 registers are banked copies used to
speed up exception processing. The status registers are of two kinds: the CPSR and the SPSR (the
current and saved program status registers, respectively).
In ARM state the registers r0 to r13 are orthogonal: any instruction that you can apply to r0 you
can equally well apply to any of the other registers.

The main bank of 16 registers is used by all unprivileged code. These are the User mode registers.
User mode differs from all other modes in that it is unprivileged. In addition to this register
bank, there is also one 32-bit Current Program Status Register (CPSR).

Of the 16 registers, r13 acts as the stack pointer (SP), r14 as the link register (LR) and r15 as
the program counter (PC). The SP holds the address of the top of the stack and is used by the PUSH
and POP instructions.

Register 14 is the Link Register (LR). This register holds the address of the next instruction after a
Branch and Link (BL or BLX) instruction, which is the instruction used to make a subroutine call. It
is also used for return address information on entry to exception modes. At all other times, R14 can be
used as a general-purpose register.

Register 15 is the Program Counter (PC). It can be used in most instructions as a pointer to the
instruction which is two instructions after the instruction being executed (in ARM state, the
address of the current instruction plus 8).

The remaining 13 registers have no special hardware purpose.

CPSR: The ARM core uses the CPSR to monitor and control internal operations. The CPSR is a
dedicated 32-bit register and resides in the register file. The CPSR is divided into four fields,
each 8 bits wide: flags, status, extension, and control. The extension and status fields are
reserved for future use. The control field contains the processor mode, state, and interrupt mask
bits. The flags field contains the condition flags. The 32-bit CPSR register is shown below.

Processor Modes: There are seven processor modes: six privileged modes (abort, fast interrupt
request, interrupt request, supervisor, system, and undefined) and one non-privileged mode called
user mode.

Banked Registers: Of the 37 registers, 20 are hidden from a program at different times.
These registers are called banked registers and are identified by the shading in the diagram. They are
available only when the processor is in a particular mode; for example, abort mode has banked
registers r13_abt, r14_abt and spsr_abt. Banked registers of a particular mode are denoted by an
underscore followed by the mode mnemonic (_mode).

When the T bit of the CPSR is 1, the processor is in Thumb state; when T = 0, the processor is in
ARM state and executes ARM instructions. To change state the core executes a specialized branch
instruction. There are two interrupt request levels available on the ARM processor core:
interrupt request (IRQ) and fast interrupt request (FIQ).

V, C, Z and N are the condition flags:

V (Overflow) : Set if the result causes a signed overflow
C (Carry) : Set when the result causes an unsigned carry
Z (Zero) : Set when the result of an arithmetic operation is zero; frequently used to
indicate equality
N (Negative) : Set when bit 31 of the result is a binary 1
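
The CPSR figure did not survive in these notes, so the following is a minimal Python sketch of how the fields described above can be picked out of a 32-bit CPSR value. The bit positions (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0) and the mode encodings are the standard ARM ones rather than something stated in the text, and `decode_cpsr` and `MODES` are illustrative names only.

```python
# Mode field encodings (CPSR bits [4:0]); standard ARM values, not from these notes.
MODES = {
    0b10000: "usr", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
    0b10111: "abt", 0b11011: "und", 0b11111: "sys",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its condition flags and control bits."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,   # 1 = IRQ masked
        "F": (cpsr >> 6) & 1,   # 1 = FIQ masked
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state, 0 = ARM state
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


# Example value chosen for illustration: Z and C set, both interrupts masked,
# ARM state, Supervisor mode.
print(decode_cpsr(0x600000D3))
```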

PIPELINE: Pipelining is the mechanism used by a RISC processor to execute instructions at an
increased rate. The pipeline speeds up execution by fetching the next instruction while earlier
instructions are being decoded and executed. Each instruction passes through three steps: the
processor fetches the instruction (loads it from memory), decodes it (identifies the instruction
to be executed), and finally executes it and writes the result back to a register.
The ARM7 processor has a three-stage pipeline: Fetch, Decode and Execute. The ARM9 has a
five-stage pipeline. The three-stage pipeline is explained below.

To explain pipelining, consider three instructions: Compare (CMP), Subtract (SUB) and Add (ADD).
The ARM7 processor fetches the first instruction, CMP, in the first cycle. During the second
cycle it decodes the CMP instruction and at the same time fetches the SUB instruction. During
the third cycle it executes the CMP instruction while decoding the SUB instruction and fetching
the third instruction, ADD. Because the three stages work in parallel on different instructions,
the speed of operation improves. This pipeline example is shown in the following diagram.

As the pipeline length increases, the amount of work done at each stage is reduced, which allows the
processor to attain a higher operating frequency. This in turn increases the performance. One important
property of this pipeline is that executing a branch instruction, or branching by directly
modifying the PC, causes the ARM core to flush its pipeline.
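
The CMP, SUB, ADD walk-through above can be reproduced with a short Python sketch. It is an idealised model that assumes every instruction spends exactly one cycle in each stage and that nothing stalls or flushes the pipeline; `pipeline_schedule` is an illustrative name, not part of any ARM tool.

```python
def pipeline_schedule(instructions, stages=("Fetch", "Decode", "Execute")):
    """Return one row per clock cycle mapping each stage to its instruction.

    Idealised model: one cycle per stage, no stalls, no flushes.
    """
    n_cycles = len(instructions) + len(stages) - 1
    rows = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            idx = cycle - depth  # instruction idx was fetched in cycle idx
            row[stage] = instructions[idx] if 0 <= idx < len(instructions) else None
        rows.append(row)
    return rows


for cycle, row in enumerate(pipeline_schedule(["CMP", "SUB", "ADD"]), start=1):
    print(cycle, row)
```

In the third row CMP is in Execute, SUB in Decode and ADD in Fetch, which is the situation described in the text. Three instructions finish in five cycles rather than nine.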

PROGRAMMER'S MODEL
The programmer's model describes the core components of the processor that are visible to the
programmer and how they are used to program it.
Our ARM has a 32-bit data bus and a 32-bit address bus. The data types the processor supports are
words (32 bits), which must be aligned to four-byte boundaries. Instructions are exactly one
word long, and data operations (e.g. ADD) are only performed on word quantities. Load and store
operations can transfer words.

Our ARM supports seven modes of operation:

(1) User mode (usr): the normal program execution state
(2) FIQ mode (fiq): designed to support a data transfer or channel process
(3) IRQ mode (irq): used for general-purpose interrupt handling
(4) Supervisor mode (svc): a protected mode for the operating system
(5) Abort mode (abt): entered after a data or instruction prefetch abort
(6) Undefined mode (und): entered when an undefined instruction is executed
(7) System mode (sys): a privileged mode that uses the same registers as User mode

Registers

The processor has a total of 37 registers, made up of 31 general 32-bit registers and 6 status
registers. At any one time 16 general registers (R0 to R15) and one or two status registers are
visible to the programmer. The visible registers depend on the processor mode; the other registers
(the banked registers) are switched in to support IRQ, FIQ, Supervisor, Abort and Undefined mode
processing. The register bank organization is shown in the figure below. The banked registers are
shaded in the diagram.

In all modes 16 registers, R0 to R15, are directly accessible. All registers except R15 are general
purpose and may be used to hold data or address values. Register R15 holds the Program Counter
(PC). When R15 is read, bits [1:0] are zero and bits [31:2] contain the PC. A seventeenth register
(the CPSR - Current Program Status Register) is also accessible. It contains condition code flags and
the current mode bits and may be thought of as an extension to the PC.
R14 is used as the subroutine link register and receives a copy of R15 when a Branch and Link
instruction is executed. It may be treated as a general-purpose register at all other times. R14_svc,
R14_irq, R14_fiq, R14_abt and R14_und are used similarly to hold the return values of R15 when
interrupts and exceptions arise, or when Branch and Link instructions are executed within interrupt
or exception routines.

FIQ mode has seven banked registers mapped to R8-R14 (R8_fiq-R14_fiq). Many FIQ programs will
therefore not need to save any registers. User mode, IRQ mode, Supervisor mode, Abort mode and
Undefined mode each have their own two registers mapped to R13 and R14. These two registers
allow each of these modes to have a private stack pointer and link register. Supervisor, IRQ, Abort
and Undefined mode programs which require more than these two banked registers are expected to
save some or all of the caller's registers (R0 to R12) on their respective stacks. They are then free to
use these registers, which they restore before returning to the caller. In addition there are
five SPSRs (Saved Program Status Registers), which are loaded with the CPSR when an exception
occurs. There is one SPSR for each exception mode (FIQ, IRQ, Supervisor, Abort and Undefined).
(Also explain CPSR Register)
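
The register counts quoted above (31 general registers plus 6 status registers, 37 in total) can be checked with a small Python sketch of the banking scheme. The `BANKED` table and `physical_register` function are illustrative names, and the `R13_svc` style of naming follows the convention used in the text.

```python
# Which of R0..R15 each mode replaces with a private (banked) copy.
BANKED = {
    "usr": [], "sys": [],                 # System mode shares the User registers
    "fiq": list(range(8, 15)),            # R8_fiq .. R14_fiq
    "irq": [13, 14], "svc": [13, 14], "abt": [13, 14], "und": [13, 14],
}


def physical_register(mode, n):
    """Name of the physical register that is visible as Rn in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be in 0..15")
    return f"R{n}_{mode}" if n in BANKED[mode] else f"R{n}"


# Collect every distinct physical register over all modes.
general = {physical_register(mode, n) for mode in BANKED for n in range(16)}
# One CPSR plus one SPSR per mode that has banked registers.
status = {"CPSR"} | {f"SPSR_{mode}" for mode in BANKED if BANKED[mode]}

print(len(general), len(status), len(general) + len(status))
```

The set `general` contains the 16 unbanked names plus 7 FIQ copies and 2 copies for each of the other four exception modes.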

ARM Assembly Language Programming


➢ Data Processing instructions
– operate on values in registers
➢ Data Transfer Instructions
– move values between memory and registers
➢ Control Flow Instructions
– change the program counter (PC)
DATA PROCESSING INSTRUCTIONS:
The data processing instructions manipulate data within registers. They are move instructions,
arithmetic instructions, logical instructions, comparison instructions and multiply instructions. Most
data processing instructions can process one of their operands using the barrel shifter. All operands
are 32-bits wide and either come from registers or are literals (‘immediate’ values). The result, if any,
is 32-bits wide and goes into a register. Long multiplies generate 64-bit results. All operand and result
registers are independently specified.
Arithmetic Operations:
ADD r0, r1, r2 ; r0 = r1 + r2
ADC r0, r1, r2 ; r0 = r1 + r2 + C
SUB r0, r1, r2 ; r0 = r1 - r2
SBC r0, r1, r2 ; r0 = r1 - r2 + C - 1
RSB r0, r1, r2 ; r0 = r2 - r1
RSC r0, r1, r2 ; r0 = r2 - r1 + C - 1
➢ C is the C bit in the CPSR
➢ Operation may be viewed as unsigned or 2’s complement signed
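
The arithmetic definitions above, including the role of the C bit, can be modelled in Python as a single 32-bit adder. This is a sketch under stated assumptions: register values are unsigned integers in the range 0 to 2^32 - 1, the function names are illustrative, and the flags dictionary shows what the S-suffixed forms (ADDS, SUBS and so on) would write to the CPSR.

```python
MASK = 0xFFFFFFFF


def add_with_carry(a, b, carry_in):
    """Compute a + b + carry_in on 32 bits; return (result, flags)."""
    total = a + b + carry_in
    result = total & MASK
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(total > MASK),  # unsigned carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags


def add(r1, r2):
    return add_with_carry(r1, r2, 0)


def adc(r1, r2, c):
    return add_with_carry(r1, r2, c)


def sub(r1, r2):
    # r1 - r2 is computed as r1 + NOT(r2) + 1
    return add_with_carry(r1, ~r2 & MASK, 1)


def sbc(r1, r2, c):
    # r1 - r2 + C - 1: the carry flag takes the place of the "+ 1"
    return add_with_carry(r1, ~r2 & MASK, c)


def rsb(r1, r2):
    return sub(r2, r1)


def rsc(r1, r2, c):
    return sbc(r2, r1, c)


print(add(0x7FFFFFFF, 1))  # signed overflow: V is set
print(sub(5, 5))           # zero result: Z set, C set (no borrow)
```

Treating subtraction as addition of the complemented operand is why C = 1 after a subtraction means "no borrow", which is what makes the `+ C - 1` term in SBC and RSC work for multi-word arithmetic.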
Bit-wise Logical Operations
AND r0, r1, r2 ; r0 = r1 and r2
ORR r0, r1, r2 ; r0 = r1 or r2
EOR r0, r1, r2 ; r0 = r1 xor r2
BIC r0, r1, r2 ; r0 = r1 and not r2
• Specified Boolean logic operation is performed on each bit from 0 to 31
• BIC stands for ‘bit clear’
– each ‘1’ in r2 clears the corresponding bit in r1
Register Movement Operations
MOV r0, r2 ; r0 = r2
MVN r0, r2 ; r0 = not r2
• MVN stands for ‘move negated’
• no first operand (r1) specified as these are unary operations
Comparison Operations
CMP r1, r2 ; set cc on r1 - r2
CMN r1, r2 ; set cc on r1 + r2
TST r1, r2 ; set cc on r1 and r2
TEQ r1, r2 ; set cc on r1 xor r2
• Instructions just affect the condition codes (N, Z, C, V) in the CPSR
• There is no result register (r0)
Immediate Operands
Second source operand may be replaced by a constant.
ADD r3, r3, #1 ; r3 = r3 + 1
AND r8, r7, #&ff ; r8 = r7[7:0]
➢ # indicates an immediate value
– & indicates hexadecimal notation
– C-style notation (#0xff) is also supported
➢ allowed immediate values are (in general): (0 → 255) × 2^(2n), where 0 <= n <= 12
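
As a sketch of the rule just stated, the Python function below tests whether a constant can be written as an 8-bit value multiplied by an even power of two. It implements only the simplified rule given in the text; the hardware encoding is an 8-bit value rotated right by an even amount, which also admits a few wrap-around patterns that this sketch rejects. `encode_immediate` is an illustrative name.

```python
def encode_immediate(value):
    """Return (imm8, n) such that value == imm8 * 2**(2*n), or None.

    Follows the simplified rule (0..255) * 2^(2n) with 0 <= n <= 12.
    Assumes 0 <= value < 2**32.
    """
    for n in range(13):
        shift = 2 * n
        imm8 = value >> shift
        if imm8 <= 0xFF and (imm8 << shift) == value:
            return imm8, n
    return None


for constant in (0xFF, 0x104, 0x101, 0xFF000000):
    print(hex(constant), encode_immediate(constant))
```

A constant such as 0x101 needs nine significant bits and so has no encoding; a compiler or assembler would have to build it in two instructions or load it from memory.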
Shifted Register Operands
Second source operand may be shifted by a constant number of bit positions:
ADD r3, r2, r1, LSL #3 ; r3 = r2 + (r1 << 3)
– or by a register-specified number of bits:
ADD r5, r5, r3, LSL r2 ; r5 = r5 + (r3 << r2)
– LSL, LSR mean ‘logical shift left, right’
– ASL, ASR mean ‘arithmetic shift left, right’ (ASL is the same operation as LSL)
– ROR means ‘rotate right’
– RRX means ‘rotate right extended’ by 1 bit
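
The shift types listed above differ only in what fills the vacated bits. The Python sketch below models them on 32-bit values. It assumes shift amounts of 0 to 31 (register-specified shifts of 32 or more have special-case behaviour that is not modelled), and the function names simply mirror the mnemonics.

```python
MASK = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK) >> n


def asr(x, n):
    """Arithmetic shift right: copies of bit 31 enter from the left."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK


def rrx(x, carry):
    """Rotate right by one bit through the carry; returns (result, new_carry)."""
    return ((carry << 31) | (x >> 1)) & MASK, x & 1


# ADD r3, r2, r1, LSL #3 with illustrative values r2 = 100, r1 = 5
r3 = (100 + lsl(5, 3)) & MASK
print(r3, hex(asr(0x80000000, 4)), hex(ror(0xFF, 8)), rrx(0x3, 1))
```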

Multiplication
MUL r4, r3, r2 ; r4 = (r3 x r2)[31:0]
– only the bottom 32 bits are returned
– immediate operands are not supported
➢ Multiplication by a constant is usually best done with a short series of adds and subtracts with
shifts
‘Multiply-Accumulate’ form:
MLA r4, r3, r2, r1; r4 = (r3xr2+r1)[31:0]
➢ 64-bit result forms are also supported.
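
A brief Python sketch of the points above: MUL and MLA keep only the low 32 bits, and a multiplication by a constant can be rewritten as shifts and adds. The constant 10 and the helper name `times_10` are chosen purely for illustration; the two steps correspond to `ADD t, x, x, LSL #2` followed by `MOV r, t, LSL #1`.

```python
MASK = 0xFFFFFFFF


def mul(r3, r2):
    """MUL: low 32 bits of the product."""
    return (r3 * r2) & MASK


def mla(r3, r2, r1):
    """MLA: low 32 bits of product plus accumulator."""
    return (r3 * r2 + r1) & MASK


def times_10(x):
    """x * 10 using two shifted-operand steps instead of a multiply."""
    t = (x + (x << 2)) & MASK   # t = x + 4x = 5x
    return (t << 1) & MASK      # result = 2t = 10x


print(hex(mul(0x10000, 0x10000)))  # the product 2**32 does not fit in 32 bits
print(mla(6, 7, 8), times_10(7))
```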
DATA TRANSFER INSTRUCTIONS:
Data transfer instructions transfer data between registers and memory.
➢ Memory to register or LOAD from memory to register
➢ Register to memory or STORE from register to memory
The ARM has three sets of instructions which interact with main memory:
❖ Single register data transfer (LDR/STR)
❖ Block data transfer (LDM/STM)
❖ Single Data Swap (SWP)
The basic load and store instructions are Load and Store Word, Byte or Halfword:
LDR / STR / LDRB / STRB / LDRH / STRH
Single register data transfer
LDR     STR     Word
LDRB    STRB    Byte
LDRH    STRH    Halfword
LDRSB           Signed byte load
LDRSH           Signed halfword load
❖ Memory system must support all access sizes
❖ Syntax:
LDR{<cond>}{<size>} Rd, <address>
STR{<cond>}{<size>} Rd, <address>
e.g. LDREQB
Data Transfer: Memory to Register


To transfer a word of data, we need to specify two things:
➢ Register: r0-r15
➢ Specify Memory address
Memory can be specified in the following Addressing Modes:
❖ There are many ways in ARM to specify the address; these are called addressing modes.
❖ Two basic classification
1. Base register Addressing
▪ Register holds the 32 bit memory address
▪ Also called the base address
2. Base Displacement Addressing mode
▪ An effective address is calculated :
Effective address = < Base address +offset>
▪ Base address in a register as before
▪ Offset can be specified in different ways
Base Register Addressing Modes
❖ Specify a register which contains the memory address
❖ In case of the load instruction (LDR) this is the memory address of the data that we want to
retrieve from memory
❖ In case of the store instruction (STR), this is the memory address where we want to write the
value which is currently in a register
❖ Example: [r0]
❖ specifies the memory address pointed to by the value in r0
Data Transfer: Memory to Register
❖ Load Instruction Syntax: 1 2, [3]
where
1) operation name
2) register that will receive value
3) register containing pointer to memory
❖ ARM Instruction Name: LDR (meaning Load Register, so 32 bits, or one word, are loaded at a time)
Memory to Register:
❖ LDR r2,[r1]
This instruction will take the address in r1, and then load a 4 byte value from the memory
pointed to by it into register r2
❖ Note: r1 is called the base register

Register to Memory:
❖ STR r2,[r1]
This instruction will take the address in r1, and then store a 4 byte value from the register r2 to the
memory pointed to by r1.
❖ Note: r1 is called the base register

Base Displacement Addressing Mode


❖ To specify a memory address to copy from, specify two things:
❖ A register which contains a pointer to memory
❖ A numerical offset (in bytes)
❖ The effective memory address is the sum of these two values.
❖ Example: [r0,#8]
❖ specifies the memory address pointed to by the value in r0, plus 8 bytes
LDR/STR <dest_reg>, [<base_reg>, <offset>]

Examples:
LDR/STR r1, [r2, #4] ; offset: immediate 4
; The effective memory address is calculated as r2 + 4
LDR/STR r1, [r2, r3] ; offset: value in register r3
; The effective memory address is calculated as r2 + r3
LDR/STR r1, [r2, r3, LSL #3] ; offset: register value * 2^3
; The effective memory address is calculated as r2 + r3 * 2^3
❖ Example: LDR r0,[r1,#12]
This instruction will take the pointer in r1, add 12 bytes to it, and then load the value from the memory
pointed to by this calculated sum into register r0
❖ Example: STR r0,[r1,#-8]
This instruction will take the pointer in r1, subtract 8 bytes from it, and then store the value from
register r0 into the memory address pointed to by the calculated sum
❖ Notes:
❖ r1 is called the base register
❖ #constant is called the offset
❖ offset is generally used in accessing elements of array or structure: base reg points to
beginning of array or structure
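
The effective-address calculation for the addressing modes above can be sketched in Python, with a `bytearray` standing in for memory. The sketch assumes little-endian byte order and word-aligned addresses (ARM cores can be configured either way for endianness); `effective_address`, `ldr` and `str_` are illustrative names.

```python
import struct


def effective_address(base, offset=0, shift=0):
    """Base plus (offset << shift), wrapped to 32 bits.

    offset may be negative, as in [r1, #-8]; shift models LSL #shift.
    """
    return (base + (offset << shift)) & 0xFFFFFFFF


def ldr(memory, address):
    """Load a 32-bit word (little-endian assumed)."""
    return struct.unpack_from("<I", memory, address)[0]


def str_(memory, address, value):
    """Store a 32-bit word (little-endian assumed)."""
    struct.pack_into("<I", memory, address, value & 0xFFFFFFFF)


memory = bytearray(64)
regs = {"r0": 0xCAFEF00D, "r1": 16, "r3": 2}

str_(memory, effective_address(regs["r1"], -8), regs["r0"])   # STR r0, [r1, #-8]
regs["r2"] = ldr(memory, effective_address(regs["r1"], -8))   # LDR r2, [r1, #-8]
scaled = effective_address(regs["r1"], regs["r3"], 3)         # [r1, r3, LSL #3]
print(hex(regs["r2"]), scaled)
```

The scaled form is what makes array indexing cheap: with the base register pointing at the start of an array of 8-byte elements, `[r1, r3, LSL #3]` addresses element r3 directly.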
FLOW CONTROL INSTRUCTIONS:
ARM's flow control instructions modify the default sequential execution. They control the operation
of the processor and the sequencing of instructions by determining which instruction is executed next.

Branch instruction
B label
label: …
Conditional branches:
MOV R0, #0 ; R0 = 0
loop:
ADD R0, R0, #1 ; R0 = R0 + 1
CMP R0, #10 ; compare R0 with 10
BNE loop ; repeat while R0 != 10
Branch conditions:

Conditional Branches:
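
The branch-condition table was a figure and is not reproduced in these notes. As a stand-in, the Python sketch below evaluates the standard ARM condition mnemonics from the N, Z, C and V flags and then replays the counting loop shown earlier. The mnemonics and their flag tests are the standard architectural ones rather than content recovered from the missing figure; `CONDITIONS` and `cmp_flags` are illustrative names.

```python
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,                       # equal
    "NE": lambda f: f["Z"] == 0,                       # not equal
    "CS": lambda f: f["C"] == 1,                       # carry set / unsigned >=
    "CC": lambda f: f["C"] == 0,                       # carry clear / unsigned <
    "MI": lambda f: f["N"] == 1,                       # negative
    "PL": lambda f: f["N"] == 0,                       # positive or zero
    "VS": lambda f: f["V"] == 1,                       # overflow
    "VC": lambda f: f["V"] == 0,                       # no overflow
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,       # unsigned >
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,        # unsigned <=
    "GE": lambda f: f["N"] == f["V"],                  # signed >=
    "LT": lambda f: f["N"] != f["V"],                  # signed <
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],  # signed >
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],   # signed <=
    "AL": lambda f: True,                              # always
}


def cmp_flags(a, b):
    """Flags produced by CMP a, b, i.e. by computing a - b on 32 bits."""
    total = a + (~b & 0xFFFFFFFF) + 1
    result = total & 0xFFFFFFFF
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(total > 0xFFFFFFFF),
        "V": ((a ^ b) & (a ^ result)) >> 31,
    }


r0 = 0                                       # MOV R0, #0
iterations = 0
while True:
    r0 += 1                                  # ADD R0, R0, #1
    iterations += 1
    flags = cmp_flags(r0, 10)                # CMP R0, #10
    if not CONDITIONS["NE"](flags):          # BNE loop
        break
print(r0, iterations)
```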

Branch and link


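• The BL instruction saves the return address in R14 (lr)
Calling a subroutine from the main program: the link register alone is enough

Calling a subroutine from another subroutine: stack memory access is needed, because the inner BL
overwrites R14 and the outer return address must first be saved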

Conditional execution

3 STAGE PIPELINE:
The organization of an ARM with a 3-stage pipeline is illustrated in Figure.

The principal components are:


• The register bank, which stores the processor state. It has two read ports and one write port which
can each be used to access any register, plus an additional read port and an additional write port that
give special access to r15, the program counter. (The additional write port on r15 allows it to be
updated as the instruction fetch address is incremented and the read port allows instruction fetch to
resume after a data address has been issued.)
• The barrel shifter, which can shift or rotate one operand by any number of bits.
• The ALU, which performs the arithmetic and logic functions required by the instruction set.

• The address register and incrementer, which select and hold all memory addresses and generate
sequential addresses when required.
• The data registers, which hold data passing to and from memory.
• The instruction decoder and associated control logic.
In a single-cycle data processing instruction, two register operands are accessed, the value on the B
bus is shifted and combined with the value on the A bus in the ALU, then the result is written back
into the register bank. The program counter value is in the address register, from where it is fed into
the incrementer, then the incremented value is copied back into r15 in the register bank and also into
the address register to be used as the address for the next instruction fetch.

ARM processors up to the ARM7 employ a simple 3-stage pipeline with the following pipeline stages:
• Fetch:
the instruction is fetched from memory and placed in the instruction pipeline.
• Decode:
the instruction is decoded and the datapath control signals prepared for the next cycle. In this stage
the instruction 'owns' the decode logic but not the datapath.
• Execute:
the instruction 'owns' the datapath; the register bank is read, an operand shifted, the ALU result
generated and written back into a destination register.
At any one time, three different instructions may occupy each of these stages, so the hardware in each
stage has to be capable of independent operation.

When the processor is executing simple data processing instructions the pipeline enables one
instruction to be completed every clock cycle. An individual instruction takes three clock cycles to
complete, so it has a three-cycle latency, but the throughput is one instruction per cycle. The 3-stage
pipeline operation for single-cycle instructions is shown in Figure:

5 STAGE PIPELINING:
Instruction Fetch (IF):
Function: Fetches the next instruction from memory.
Explanation: In this stage, the program counter (PC) is used to fetch the instruction from the memory
address pointed to by the PC. The fetched instruction is then passed to the next stage.
Instruction Decode (ID):
Function: Decodes the instruction to determine the operation to be performed and the operands.
Explanation: The fetched instruction is decoded to understand its opcode and any associated operands.
This stage determines what operation the instruction is requesting and what data it needs.
Execute (EX):
Function: Performs the actual operation or calculation specified by the instruction.
Explanation: This stage executes the operation indicated by the decoded instruction. For example, if
the instruction is an arithmetic operation, the actual computation takes place in this stage.
Memory Access (MEM):
Function: Accesses memory if the instruction involves a memory operation (e.g., load or store).
Explanation: In this stage, the processor interacts with the memory subsystem. If the instruction
involves reading from or writing to memory, the necessary data is transferred between the processor
and memory.
Write Back (WB):
Function: Writes the result of the executed instruction back to the register file.
Explanation: The final stage involves writing the results of the executed instruction back to the register
file. This stage updates the processor's internal registers with the results of the computation.

The advantage of a pipeline is that it allows multiple instructions to be processed simultaneously,
improving overall throughput and performance. Each stage of the pipeline can work on a different
instruction at the same time, making the processor more efficient. However, designing an effective
pipeline requires addressing issues such as data hazards and control hazards to ensure correct and
efficient execution of instructions. This 5-stage pipeline has been used for many RISC processors and
is considered to be the 'classic' way to design such a processor. Although the ARM instruction set was
not designed with such a pipeline in mind, it maps onto it relatively simply. The principal concessions
to the ARM instruction set architecture in the organization shown in the figure are the three source
operand read ports and two write ports in the register file (where a 'classic' RISC has two read ports
and one write port), and the inclusion of address incrementing hardware in the execute stage to support
load and store multiple instructions.

ARM Instruction Execution

Data Processing Instructions:


A data processing instruction requires two operands, one of which is always a register and the other is
either a second register or an immediate value. The second operand is passed through the barrel shifter
where it is subject to a general shift operation, then it is combined with the first operand in the ALU
using a general ALU operation. Finally, the result from the ALU is written back into the destination
register (and the condition code register may be updated). All these operations take place in a single
clock cycle. The PC value in the address register is incremented and copied back into both the address
register and r15 in the register bank, and the next instruction is loaded into the bottom of the instruction
pipeline. The immediate value, when required, is extracted from the current instruction at the top of
the instruction pipeline. For data processing instructions only the bottom eight bits (bits [7:0]) of the
instruction are used in the immediate value.

Data Transfer Instructions

A data transfer (load or store) instruction computes a memory address in a manner very similar to the
way a data processing instruction computes its result. A register is used as the base address, to which
is added an offset which again may be another register or an immediate value. A 12-bit immediate
value is used without a shift operation rather than a shifted 8-bit value. The address is sent to the
address register, and in a second cycle the data transfer takes place. Rather than leave the datapath
largely idle during the data transfer cycle, the ALU holds the address components from the first cycle
and is available to compute an auto-indexing modification to the base register if it is required.

Branch Instructions
Branch instructions compute the target address in the first cycle. A 24-bit immediate field is extracted
from the instruction and then shifted left two bit positions to give a word-aligned offset which is added
to the PC. The result is issued as an instruction fetch address, and while the instruction pipeline refills
the return address is copied into the link register (r14) if this is required. The third cycle, which is
required to complete the pipeline refilling, is also used to make a small correction to the value stored
in the link register in order that it points directly at the instruction which follows the branch.
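
The target-address calculation described above can be sketched in Python. The sketch assumes ARM state, where the PC reads as the branch instruction's own address plus 8, and treats the 24-bit field as a signed value; `branch_target` and `link_value` are illustrative names.

```python
def branch_target(instr_address, imm24):
    """Target of a B or BL located at instr_address, given its 24-bit field."""
    if imm24 & 0x800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    pc = instr_address + 8        # PC is two instructions ahead in ARM state
    return (pc + (imm24 << 2)) & 0xFFFFFFFF


def link_value(instr_address):
    """Value left in r14 by BL: the address of the instruction after the branch."""
    return (instr_address + 4) & 0xFFFFFFFF


# A branch whose field is -2 (0xFFFFFE) jumps back to itself.
print(hex(branch_target(0x8000, 0xFFFFFE)), hex(link_value(0x8000)))
```

Shifting the field left by two bits gives a reach of roughly 32 MB in either direction from the branch, at the cost of being able to target only word-aligned addresses.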

ARM IMPLEMENTATION
The design is divided into a datapath section that is described in register transfer level (RTL) notation
and a control section that is viewed as a finite state machine (FSM).

Clocking Scheme
The design is based around 2-phase non-overlapping clocks, as shown in Figure, which are generated
internally from a single input clock signal. This scheme allows the use of level-sensitive transparent
latches. Data movement is controlled by passing the data alternately through latches which are open
during phase 1 and latches which are open during phase 2. The non-overlapping property of the phase
1 and phase 2 clocks ensures that there are no race conditions in the circuit.

Datapath Timing
The normal datapath timing is a 3-stage pipeline. The ALU has input latches which are open during
phase 1, allowing the operands to begin combining in the ALU as soon as they are valid, but they close
at the end of phase 1 so that the phase 2 precharge does not get through to the ALU. The ALU then
continues to process the operands through phase 2, producing a valid output towards the end of the
phase which is latched in the destination register at the end of phase 2.

The minimum datapath cycle time is therefore the sum of:


• the register read time;
• the shifter delay;
• the ALU delay;
• the register write set-up time;
• the phase 2 to phase 1 non-overlap time.
Adder Design

The first ARM processor prototype used a simple ripple-carry adder as shown in Figure 4.10. Using a CMOS
AND-OR-INVERT gate for the carry logic and alternating AND/OR logic so that even bits use the circuit
shown and odd bits use the dual circuit with inverted inputs and outputs and AND and OR gates swapped
around, the worst-case carry path is 32 gates long. In order to allow a higher clock rate, ARM2 used a 4-bit
carry look-ahead scheme to reduce the worst-case carry path length.
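
To illustrate why the carry path of a ripple-carry adder grows with the word length, here is a logic-level Python sketch. It models each bit as a plain full adder and does not reproduce the AND-OR-INVERT gate structure or the alternating polarity described above; `full_adder` and `ripple_carry_add` are illustrative names.

```python
def full_adder(a, b, carry_in):
    """One bit position: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(x, y, carry_in=0, width=32):
    """Add two width-bit values bit by bit, least significant bit first.

    Each stage must wait for the carry from the stage below it, so in the
    worst case the carry passes through all `width` stages in sequence.
    """
    result = 0
    carry = carry_in
    for bit in range(width):
        sum_bit, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= sum_bit << bit
    return result, carry


# Worst case: a carry generated at bit 0 ripples through every stage.
print(ripple_carry_add(0xFFFFFFFF, 1))
```

A carry look-ahead scheme shortens this chain by computing, for each group of bits, whether the group generates or propagates a carry, so the carry can skip over the group instead of rippling through it.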

ALU Functions
The ALU does not only add its two inputs. It must perform the full set of data operations defined by
the instruction set, including address computations for memory transfers, branch calculations, bit-wise
logical functions, and so on.

The Barrel Shifter


The ARM architecture supports instructions which perform a shift operation in series with an ALU
operation. The shifter performance is critical since the shift time contributes directly to the datapath
cycle time.

Multiplier Design
All ARM processors apart from the first prototype have included hardware support for integer
multiplication. Two styles of multiplier have been used:

• Older ARM cores include low-cost multiplication hardware that supports only the 32-bit result
multiply and multiply-accumulate instructions.
• Recent ARM cores have high-performance multiplication hardware and support the 64-bit result
multiply and multiply-accumulate instructions.

The Register Bank


The last major block on the ARM datapath is the register bank. This is where all the user-visible state
is stored in 31 general-purpose 32-bit registers, amounting to around 1 Kbits of data altogether. Since
the basic 1-bit register cell is repeated so many times in the design, it is worth putting considerable
effort into minimizing its size.

Datapath Layout
The ARM datapath is laid out to a constant pitch per bit. The pitch will be a compromise between the
optimum for the complex functions (such as the ALU) which are best suited to a wide pitch and the
simple functions (such as the barrel shifter) which are most efficient when laid out on a narrow pitch.
Control Structures
The control logic on the simpler ARM cores has three structural components which relate to each
other.
1. An instruction decoder PLA (programmable logic array). This unit uses some of the instruction bits
and an internal cycle counter to define the class of operation to be performed on the datapath in the
next cycle.
2. Distributed secondary control associated with each of the major datapath function blocks. This logic
uses the class information from the main decoder PLA to select other instruction bits and/or processor
state information to control the datapath.
3. Decentralized control units for specific instructions that take a variable number of cycles to complete
(load and store multiple, multiply and coprocessor operations). Here the main decoder PLA locks into
a fixed state until the remote control unit indicates completion.

Physical Design
There are two principal mechanisms used to implement an ARM processor core (or any other core, for
that matter) on a particular process:
• a hard macrocell is delivered as physical layout ready to be incorporated into the final design;
• a soft macrocell is delivered as a synthesizable design expressed in a hardware description language
such as VHDL.

SIMPLE ASSEMBLY LANGUAGE PROGRAMS


Program For Moving Immediate Value 30 to R0

Program For Moving Data From Memory

Program for Sum of a List

Program For Arithmetic Operation [Addition]

Program For Logical Operation [AND]

Program For Move Negated

Program For Finding Greatest of a Number

Program For Adding “R2+1” iff “R0<R1”

Program For Returning Back To Calling Function

Program For Storing Values to Stack

Printing “Hello World”
