ARM Processor
ARM = Advanced RISC Machines, Ltd.
ARM licenses IP to other companies (ARM does not fabricate chips).
2005: ARM had 75% of the embedded RISC market, with 2.5 billion processors.
ARM is available as microcontrollers, IP cores, etc.
www.arm.com
Based on Lecture Notes by Marilyn Wolf
ARM instruction set - outline
ARM versions. ARM assembly language. ARM programming model. ARM memory organization. ARM data operations. ARM flow of control.
ARM versions
ARM architecture has been extended over several versions:
ARM7TDMI
ARM9: includes Thumb instruction set
ARM10: for multimedia (graphics, video, etc.)
ARM11: high performance + Jazelle (Java)
SecurCore: for security apps (smart cards)
Cortex-M: optimized for microcontrollers
Cortex-A: high performance (multimedia systems)
Cortex-R: optimized for real-time apps
StrongARM: portable communication devices
ARM Architecture versions
(From arm.com)
RISC CPU Characteristics
32-bit load/store architecture
Fixed instruction length
Fewer/simpler instructions than CISC CPU
Limited addressing modes, operand types
Simple design is easier to speed up, pipeline & scale
ARM assembly language
Fairly standard assembly language:

label   LDR r0,[r8]     ; a comment
        ADD r4,r0,r1    ; r4 = r0 + r1

In ADD r4,r0,r1: r4 is the destination, r0 is the source/left operand, r1 is the source/right operand.
ARM Register Set
(16 32-bit general-purpose registers)
(change during exceptions)
ARM Cortex register set
Changes from standard ARM architecture:
Stack-based exception model
Only two processor modes:
    Thread Mode for user tasks*
    Handler Mode for OS tasks and exceptions*
Vector table contains addresses
*Only SP changes between modes
CPSR Current Processor Status Register
Bit layout:
bits 31-28   N Z C V           ALU flags
bit 7        I                 IRQ disable
bit 6        F                 FIQ disable
bit 5        T                 Thumb/ARM mode
bits 4-0     M4 M3 M2 M1 M0    Processor mode**
Processor modes:
10000   User
10001   FIQ
10010   IRQ
10011   Supervisor (SWI)
10111   Abort (data/instruction memory)
11011   Undefined instr.
11111   System
**2 modes in Cortex: Thread & Handler
Must be in a privileged mode to change the CPSR:
    MRS rn,CPSR
    MSR CPSR,rn
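To make the field positions concrete, here is a minimal C sketch that pulls the flags and mode bits out of a 32-bit CPSR value. The type and function names (cpsr_fields, cpsr_decode, cpsr_mode_name) are hypothetical helpers for illustration, not part of any ARM toolchain; the mode numbers are the classic ARM encodings (Undefined is 11011).

```c
#include <stdint.h>

/* Hypothetical helper type: one member per CPSR field named on the slide. */
typedef struct {
    unsigned n, z, c, v;  /* ALU flags, bits 31..28 */
    unsigned irq_off;     /* I, bit 7: IRQ disable */
    unsigned fiq_off;     /* F, bit 6: FIQ disable */
    unsigned thumb;       /* T, bit 5: Thumb/ARM mode */
    unsigned mode;        /* M4..M0, bits 4..0 */
} cpsr_fields;

/* Split a raw CPSR word into its fields. */
cpsr_fields cpsr_decode(uint32_t cpsr) {
    cpsr_fields f;
    f.n = (cpsr >> 31) & 1u;
    f.z = (cpsr >> 30) & 1u;
    f.c = (cpsr >> 29) & 1u;
    f.v = (cpsr >> 28) & 1u;
    f.irq_off = (cpsr >> 7) & 1u;
    f.fiq_off = (cpsr >> 6) & 1u;
    f.thumb = (cpsr >> 5) & 1u;
    f.mode = cpsr & 0x1Fu;
    return f;
}

/* Map the 5-bit mode field to the mode names used on the slide. */
const char *cpsr_mode_name(unsigned mode) {
    switch (mode) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "unknown";
    }
}
```

For example, the value 0x600000D3 decodes to Z=1, C=1, both interrupt types disabled, ARM state, Supervisor mode.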
Endianness
Relationship between bit and byte/word ordering defines endianness:

little-endian (default):  bit 31 [ byte 3 | byte 2 | byte 1 | byte 0 ] bit 0
big-endian:               bit 0  [ byte 0 | byte 1 | byte 2 | byte 3 ] bit 31
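The two byte orderings can be illustrated with a short C sketch. The function names are hypothetical; byte_at_offset models which byte of a word sits at a given byte offset from the word's address under each convention, and swap_endianness converts a word between the two.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the machine running this code stores words little-endian. */
int host_is_little_endian(void) {
    uint32_t word = 0x03020100u;
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0x00;
}

/* Byte stored at (word address + offset), offset in 0..3.
   Little-endian: offset 0 holds the least significant byte.
   Big-endian:    offset 0 holds the most significant byte. */
uint8_t byte_at_offset(uint32_t word, unsigned offset, int little_endian) {
    unsigned byte_no = little_endian ? offset : 3u - offset;
    return (uint8_t)(word >> (8u * byte_no));
}

/* Reverse the four bytes of a word (convert between the two orderings). */
uint32_t swap_endianness(uint32_t word) {
    return (word >> 24) | ((word >> 8) & 0x0000FF00u) |
           ((word << 8) & 0x00FF0000u) | (word << 24);
}
```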
ARM data types
Word is 32 bits long.
Word can be divided into four 8-bit bytes.
ARM addresses can be 32 bits long.
Address refers to byte: address 4 starts at byte 4.
Can be configured at power-up in either little- or big-endian mode.
ARM status bits
Every arithmetic, logical, or shifting operation can set CPSR bits:
    N (negative), Z (zero), C (carry), V (overflow)
Examples:
    -1 + 1 = 0: NZCV = 0110.
    2^{31} - 1 + 1 = -2^{31}: NZCV = 1001.
Setting status bits must be explicitly enabled on each instruction:
    ex. adds sets status bits, whereas add does not
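The two flag examples can be checked with a small C model of a 32-bit flag-setting add. This is a sketch only; the function name adds_nzcv is a hypothetical helper, and the result is packed as a 4-bit value in N,Z,C,V order.

```c
#include <stdint.h>

/* Model of the flags produced by a 32-bit ADDS.
   Returns a 4-bit value: bit 3 = N, bit 2 = Z, bit 1 = C, bit 0 = V. */
unsigned adds_nzcv(uint32_t op1, uint32_t op2) {
    uint32_t result = op1 + op2;            /* wraps modulo 2^32 */
    unsigned n = (result >> 31) & 1u;       /* sign bit of the result */
    unsigned z = (result == 0u);            /* result is zero */
    unsigned c = (result < op1);            /* unsigned carry out of bit 31 */
    /* Signed overflow: operands have the same sign, result sign differs. */
    unsigned v = ((~(op1 ^ op2) & (op1 ^ result)) >> 31) & 1u;
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

With these definitions, adds_nzcv(0xFFFFFFFF, 1) gives 0110 and adds_nzcv(0x7FFFFFFF, 1) gives 1001, matching the slide.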
ARM Instruction Code Format
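Data-processing instruction format (32 bits):
bits 31-28   cond      condition for execution
bits 27-26   00
bit 25       X         selects format of 3rd operand
bits 24-21   opcode
bit 20       S         force update of CPSR
bits 19-16   Rn        source reg
bits 15-12   Rd        dest reg
bits 11-0    3rd operand (format determined by X bit)

X = 0: 3rd operand is Rm
bits 11-7    # shifts
bits 6-5     shift
bit 4        0
bits 3-0     Rm

X = 1: 3rd operand is immediate
bits 11-8    alignment (scale factor)
bits 7-0     8-bit literal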
ARM data instructions
Basic format:
    ADD r0,r1,r2     ; computes r1+r2, stores in r0
Immediate operand (8-bit constant can be scaled by 2^k):
    ADD r0,r1,#2     ; computes r1+2, stores in r0
Set condition flags based on operation:
    ADDS r0,r1,r2    ; S suffix: set status flags
Recently added assembler translation (but not for MUL):
    ADD r1,r2 = ADD r1,r1,r2
Flexible 2nd operand
2nd operand = constant or register
Constant with optional shift (#8bit_value):
    8-bit value, shifted left any number of bits (up to 32)
    0x00ab00ab, 0xab00ab00, 0xabababab (a, b hex digits)
Register with optional shift: Rm,shift_type,#nbits
    shift_type = ASR, LSL, LSR, ROR, with nbits < 32
    shift_type RRX (rotate right extended, through the carry bit) by 1 bit
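As a rough illustration of the constant rule, the following C sketch tests whether a 32-bit constant fits the classic ARM-state immediate encoding, in which an 8-bit value is rotated right by an even number of bit positions (0 to 30). The function names are hypothetical, and the sketch deliberately does not cover the Thumb2-only repeated-byte patterns such as 0x00ab00ab listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* True if value == imm8 rotated right by an even amount, i.e. if some
   even left rotation of value fits in 8 bits (classic ARM-state rule). */
bool arm_imm_encodable(uint32_t value) {
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        if (rotl32(value, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

Under this rule 0xFF, 0x104 and 0xFF000000 are encodable, while 0x101, 0x102 and 0x55555555 are not; an assembler would have to build the latter with more than one instruction or load them from memory.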
Barrel shifter for 2nd operand
ARM arithmetic instructions
ADD, ADC : add (w. carry)
    [Rd] <= Op1 + Op2 + C            (C is added for ADC only)
SUB, SBC : subtract (w. carry)
    [Rd] <= Op1 - Op2 + (C - 1)      (carry term for SBC only)
RSB, RSC : reverse subtract (w. carry)
    [Rd] <= Op2 - Op1 + (C - 1)      (carry term for RSC only)
MUL : multiply (32-bit product, no immediate for Op2)
    [Rd] <= Op1 x Op2
MLA : multiply and accumulate (32-bit result)
    MLA Rd,Rm,Rs,Rn : [Rd] <= (Rm x Rs) + Rn
ARM logical instructions
AND, ORR, EOR : bit-wise logical ops
BIC : bit clear
    [Rd] <= Op1 AND (NOT Op2)
LSL, LSR : logical shift left/right (combine with data ops)
    ADD r1,r2,r3,LSL #4 : [r1] <= r2 + (r3 x 16)
    Vacated bits filled with 0s
ASL, ASR : arithmetic shift left/right (ASR maintains sign)
ROR : rotate right
RRX : rotate right extended with C from CPSR
    33-bit rotate: old C enters bit 31, old bit 0 moves into C
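The shift and rotate behaviours can be sketched in C as follows. The arm_ prefixed names are hypothetical helpers; shift amounts are assumed to be in the range 0..31, and the carry for RRX is passed in and out through a pointer.

```c
#include <stdint.h>

/* Logical shifts: vacated bits are filled with 0s (n in 0..31). */
uint32_t arm_lsl(uint32_t v, unsigned n) { return v << n; }
uint32_t arm_lsr(uint32_t v, unsigned n) { return v >> n; }

/* Arithmetic shift right: vacated bits are filled with the sign bit.
   Written with unsigned operations to avoid implementation-defined
   behaviour of >> on negative signed values. */
uint32_t arm_asr(uint32_t v, unsigned n) {
    uint32_t shifted = v >> n;
    if ((v & 0x80000000u) && n > 0u) {
        shifted |= ~(0xFFFFFFFFu >> n);
    }
    return shifted;
}

/* Rotate right: bits leaving bit 0 re-enter at bit 31. */
uint32_t arm_ror(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Rotate right extended by one bit through the carry (33-bit rotate):
   the old carry enters bit 31 and the old bit 0 becomes the new carry. */
uint32_t arm_rrx(uint32_t v, unsigned *carry) {
    uint32_t result = (v >> 1) | ((uint32_t)(*carry & 1u) << 31);
    *carry = v & 1u;
    return result;
}
```

For instance, arm_lsl(3, 4) is 48, which is the r3 x 16 scaling in the ADD example above.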
New Thumb2 bit operations
Bit field insert/clear (to pack/unpack data within a register)
    BFC r0,#5,#4       ; clear 4 bits of r0, starting with bit #5
    BFI r0,r1,#5,#4    ; insert low 4 bits of r1 into r0, starting at bit #5
Bit reversal (RBIT): reverse order of bits within a register
    Bit [n] moved to bit [31-n], for n = 0..31
    Example:
    RBIT r0,r1         ; reverse order of bits in r1 and put in r0
Note: REV is a different instruction; it reverses the order of the bytes in a word.
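A C sketch of what the bit-field and bit-reversal operations compute may help. The t2_ prefixed names are hypothetical helpers; lsb and width follow the #lsb,#width operand order of BFC/BFI, with lsb + width assumed to be at most 32, and t2_rbit models the bit-reversal operation (RBIT in Thumb2).

```c
#include <stdint.h>

/* Mask with 'width' ones starting at bit 'lsb' (width in 1..32). */
static uint32_t field_mask(unsigned lsb, unsigned width) {
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFC rd,#lsb,#width: clear 'width' bits of rd starting at bit 'lsb'. */
uint32_t t2_bfc(uint32_t rd, unsigned lsb, unsigned width) {
    return rd & ~field_mask(lsb, width);
}

/* BFI rd,rn,#lsb,#width: copy the low 'width' bits of rn into rd at 'lsb'. */
uint32_t t2_bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t mask = field_mask(lsb, width);
    return (rd & ~mask) | ((rn << lsb) & mask);
}

/* Bit reversal: bit n of the input becomes bit 31-n of the result. */
uint32_t t2_rbit(uint32_t v) {
    uint32_t r = 0;
    for (unsigned i = 0; i < 32u; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}
```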
ARM comparison instructions
These instructions only set the NZCV bits of CPSR; no other result is saved. (Set Status is implied.)
CMP : compare          : Op1 - Op2
CMN : negated compare  : Op1 + Op2
TST : bit-wise AND     : Op1 AND Op2
TEQ : bit-wise XOR     : Op1 XOR Op2
ARM move instructions
MOV, MVN : move (negated), constant = 8 or 16 bits
    MOV r0,r1           ; sets r0 to r1
    MVN r0,r1           ; sets r0 to NOT r1 (bit-wise complement)
    MOV r0,#55          ; sets r0 to 55
    MOV r0,#0x5678      ; Thumb2: r0[15:0]
    MOVT r0,#0x1234     ; Thumb2: r0[31:16]
Use shift modifier to scale a value:
    MOV r0,r1,LSL #6    ; [r0] <= r1 x 64
Special pseudo-op:
    LSL rd,rn,shift = MOV rd,rn,LSL shift
ARM load/store instructions
Load operand from memory into target register
    LDR   : load 32 bits
    LDRH  : load halfword (16-bit unsigned #) & zero-extend to 32 bits
    LDRSH : load signed halfword & sign-extend to 32 bits
    LDRB  : load byte (8-bit unsigned #) & zero-extend to 32 bits
    LDRSB : load signed byte & sign-extend to 32 bits
Store operand from register to memory
    STR   : store 32-bit word
    STRH  : store 16-bit halfword (right-most 16 bits of register)
    STRB  : store 8-bit byte (right-most 8 bits of register)
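The difference between zero-extension and sign-extension can be sketched in C over a toy byte array standing in for memory. The model_ prefixed names are hypothetical, and the halfword loads assume the default little-endian byte order.

```c
#include <stdint.h>

/* LDRB: load a byte and zero-extend it to 32 bits. */
uint32_t model_ldrb(const uint8_t *mem, uint32_t addr) {
    return mem[addr];
}

/* LDRSB: load a byte and copy its bit 7 into bits 31..8. */
uint32_t model_ldrsb(const uint8_t *mem, uint32_t addr) {
    uint32_t v = mem[addr];
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}

/* LDRH: load a little-endian halfword and zero-extend it. */
uint32_t model_ldrh(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1u] << 8);
}

/* LDRSH: load a little-endian halfword and copy bit 15 into bits 31..16. */
uint32_t model_ldrsh(const uint8_t *mem, uint32_t addr) {
    uint32_t v = model_ldrh(mem, addr);
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}
```

With memory bytes 0x80, 0xFF at addresses 0 and 1, the byte at address 0 loads as 0x00000080 with LDRB but as 0xFFFFFF80 with LDRSB.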
ARM load/store addressing
Addressing modes: base address + offset
    register indirect :     LDR r0,[r1]
    with second register :  LDR r0,[r1,-r2]
    with constant :         LDR r0,[r1,#4]
    pre-indexed :           LDR r0,[r1,#4]!
    post-indexed :          LDR r0,[r1],#8
Immediate #offset = 12-bit magnitude, added to or subtracted from the base (range -4095 to +4095)
ARM Load/Store Code Format
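Load/store instruction format (32 bits):
bits 31-28   cond      condition for execution
bits 27-26   01
bit 25       I         selects format of offset
bit 24       P         post/pre-indexed
bit 23       U         add/sub offset
bit 22       B         unsigned byte/word
bit 21       W         update base reg
bit 20       L         load/store
bits 19-16   Rn        source (base) reg
bits 15-12   Rd        dest reg
bits 11-0    offset (format determined by I bit)

I = 0: offset is immediate
bits 11-0    12-bit offset

I = 1: offset is Rm
bits 11-7    # shifts
bits 6-5     shift
bit 4        0
bits 3-0     Rm

Note: the I bit polarity is the opposite of the data-processing format, where a 1 selects the immediate form.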
ARM load/store examples
    ldr r1,[r2]              ; address = (r2)
    ldr r1,[r2,#5]           ; address = (r2)+5
    ldr r1,[r2,#-5]          ; address = (r2)-5
    ldr r1,[r2,r3]           ; address = (r2)+(r3)
    ldr r1,[r2,-r3]          ; address = (r2)-(r3)
    ldr r1,[r2,r3,LSL #2]    ; address = (r2)+(r3 x 4)  (scaled index)
Base register r2 is not altered in these instructions.
ARM load/store examples
(base register updated by auto-indexing)
    ldr r1,[r2,#4]!    ; use address = (r2)+4;    r2 <= (r2)+4     (pre-index)
    ldr r1,[r2,r3]!    ; use address = (r2)+(r3); r2 <= (r2)+(r3)  (pre-index)
    ldr r1,[r2],#4     ; use address = (r2);      r2 <= (r2)+4     (post-index)
    ldr r1,[r2],r3     ; use address = (r2);      r2 <= (r2)+(r3)  (post-index)
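The three addressing behaviours (plain offset, pre-index, post-index) differ only in which address is used and whether the base register is written back. A minimal C model, with hypothetical function names and a toy word array as memory (byte addresses assumed to be multiples of 4):

```c
#include <stdint.h>

/* Read the word at a byte address from a toy word-array memory. */
static uint32_t load_word(const uint32_t *mem, uint32_t addr) {
    return mem[addr / 4u];
}

/* ldr r1,[r2,#off]: use base+offset, base register unchanged. */
uint32_t ldr_offset(const uint32_t *mem, uint32_t base, int32_t offset) {
    return load_word(mem, base + (uint32_t)offset);
}

/* ldr r1,[r2,#off]!: update the base first, then use the new address. */
uint32_t ldr_pre(const uint32_t *mem, uint32_t *base, int32_t offset) {
    *base += (uint32_t)offset;
    return load_word(mem, *base);
}

/* ldr r1,[r2],#off: use the old base address, then update the base. */
uint32_t ldr_post(const uint32_t *mem, uint32_t *base, int32_t offset) {
    uint32_t value = load_word(mem, *base);
    *base += (uint32_t)offset;
    return value;
}
```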
Additional addressing modes
Base-plus-offset addressing:
    LDR r0,[r1,#16]      ; loads from location [r1+16]
Auto-indexing increments base register:
    LDR r0,[r1,#16]!     ; loads from location [r1+16], then sets r1 = r1 + 16
Post-indexing fetches, then does offset:
    LDR r0,[r1],#16      ; loads r0 from [r1], then sets r1 = r1 + 16
Recent assembler addition:
    SWP{cond} rd,rm,[rn] ; swap mem & reg: M[rn] -> rd, rm -> M[rn]
ARM ADR pseudo-op
Cannot refer to an address directly in an instruction (with only a 32-bit instruction).
Assembler will try to translate:
    LDR Rd,label = LDR Rd,[pc,#offset]
Generate address value by performing arithmetic on PC (if address is in the code section).
ADR pseudo-op generates the instruction required to calculate the address (in code section ONLY):
    ADR r1,LABEL    (uses MOV, MVN, ADD, SUB ops)
ARM 32-bit load pseudo-op
    LDR r3,=0x55555555
Produces a MOV if the constant can be encoded as an immediate.
Otherwise puts the constant in a literal pool and loads it PC-relative:
    LDR r3,[PC,#offset]    ; offset is a 12-bit immediate
    ..
    DCD 0x55555555         ; in literal pool following code
Example: C assignments
C:
    x = (a + b) - c;
Assembler:
    ADR r4,a        ; get address for a
    LDR r0,[r4]     ; get value of a
    ADR r4,b        ; get address for b, reusing r4
    LDR r1,[r4]     ; get value of b
    ADD r3,r0,r1    ; compute a+b
    ADR r4,c        ; get address for c
    LDR r2,[r4]     ; get value of c
    SUB r3,r3,r2    ; complete computation of x
    ADR r4,x        ; get address for x
    STR r3,[r4]     ; store value of x
Example: C assignment
C:
    y = a*(b+c);
Assembler:
    LDR r4,=b       ; get address for b
    LDR r0,[r4]     ; get value of b
    LDR r4,=c       ; get address for c
    LDR r1,[r4]     ; get value of c
    ADD r2,r0,r1    ; compute partial result
    LDR r4,=a       ; get address for a
    LDR r0,[r4]     ; get value of a
    MUL r2,r2,r0    ; compute final value for y
    LDR r4,=y       ; get address for y
    STR r2,[r4]     ; store y
Example: C assignment
C:
    z = (a << 2) | (b & 15);
Assembler:
    LDR r4,=a           ; get address for a
    LDR r0,[r4]         ; get value of a
    MOV r0,r0,LSL #2    ; perform shift
    LDR r4,=b           ; get address for b
    LDR r1,[r4]         ; get value of b
    AND r1,r1,#15       ; perform AND
    ORR r1,r0,r1        ; perform OR
    LDR r4,=z           ; get address for z
    STR r1,[r4]         ; store value for z
ARM flow control operations
All operations can be performed conditionally, testing CPSR (only branches in Thumb/Thumb2):
    EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
Branch operation:
    B label          ; target < 32M (ARM), 2K (Thumb), 16M (Thumb2)
Conditional branch:
    BNE label        ; target < 32M (ARM), -252..+258 (Thumb), 1M (Thumb2)
Thumb2 additions (compare & branch if zero/nonzero):
    CBZ r0,label     ; branch if r0 == 0
    CBNZ r0,label    ; branch if r0 != 0
Example: if statement
C:
    if (a > b) { x = 5; y = c + d; } else x = c - d;
Assembler:
    ; compute and test condition
    LDR r4,=a      ; get address for a
    LDR r0,[r4]    ; get value of a
    LDR r4,=b      ; get address for b
    LDR r1,[r4]    ; get value for b
    CMP r0,r1      ; compare a with b
    BLE fblock     ; if a <= b, branch to false block
If statement, contd.
    ; true block
    MOV r0,#5       ; generate value for x
    LDR r4,=x       ; get address for x
    STR r0,[r4]     ; store x
    LDR r4,=c       ; get address for c
    LDR r0,[r4]     ; get value of c
    LDR r4,=d       ; get address for d
    LDR r1,[r4]     ; get value of d
    ADD r0,r0,r1    ; compute y
    LDR r4,=y       ; get address for y
    STR r0,[r4]     ; store y
    B after         ; branch around false block
If statement, contd.
    ; false block
fblock  LDR r4,=c       ; get address for c
        LDR r0,[r4]     ; get value of c
        LDR r4,=d       ; get address for d
        LDR r1,[r4]     ; get value for d
        SUB r0,r0,r1    ; compute c-d
        LDR r4,=x       ; get address for x
        STR r0,[r4]     ; store value of x
after   ...
Example: Conditional instruction implementation
The flags are set by CMP r0,r1 as in the if statement example (a in r0, b in r1), so the true block of if (a > b) uses GT and the false block uses LE.
    ; true block
    MOVGT r0,#5       ; generate value for x
    ADRGT r4,x        ; get address for x
    STRGT r0,[r4]     ; store x
    ADRGT r4,c        ; get address for c
    LDRGT r0,[r4]     ; get value of c
    ADRGT r4,d        ; get address for d
    LDRGT r1,[r4]     ; get value of d
    ADDGT r0,r0,r1    ; compute y
    ADRGT r4,y        ; get address for y
    STRGT r0,[r4]     ; store y
Conditional instruction implementation, contd.
    ; false block
    ADRLE r4,c        ; get address for c
    LDRLE r0,[r4]     ; get value of c
    ADRLE r4,d        ; get address for d
    LDRLE r1,[r4]     ; get value for d
    SUBLE r0,r0,r1    ; compute c-d
    ADRLE r4,x        ; get address for x
    STRLE r0,[r4]     ; store value of x
Thumb2 conditional execution
The IF-THEN instruction, IT, supports conditional execution in Thumb2 of up to 4 instructions in a block.
Designate instructions to be executed for THEN and ELSE.
Format: ITxyz condition, where x,y,z are T/E/blank

Pseudo-C:
    if (r0 > r1) {
        add r2,r3,r4
        sub r3,r4,r5
    } else {
        and r2,r3,r4
        orr r3,r4,r5
    }

Thumb2 code:
    cmp r0,r1         ; set flags
    ITTEE GT          ; condition for the next 4 instructions
    addgt r2,r3,r4    ; do if r0 > r1
    subgt r3,r4,r5    ; do if r0 > r1
    andle r2,r3,r4    ; do if r0 <= r1
    orrle r3,r4,r5    ; do if r0 <= r1
Example: switch statement
C:
    switch (test) { case 0: ... break; case 1: ... }
Assembler:
    LDR r2,=test              ; get address for test
    LDR r0,[r2]               ; load value for test
    ADR r1,switchtab          ; load switch table address
    LDR r15,[r1,r0,LSL #2]    ; index switch table
switchtab   DCD case0
            DCD case1
            ...
Example: switch statement with new Table Branch instruction
Branch address = PC + 2*offset from table of offsets
Offset = byte (TBB) or half-word (TBH)
C:
    switch (test) { case 0: ... break; case 1: ... }
Assembler:
    LDR r2,=test    ; get address for test
    LDR r0,[r2]     ; load value for test
    TBB [pc,r0]     ; add offset byte to PC
switchtab   DCB (case0 - switchtab) >> 1    ; byte offset
            DCB (case1 - switchtab) >> 1    ; byte offset
case0       instructions
case1       instructions
(TBH similar, but with 16-bit offsets/DCI)
Finite impulse response (FIR) filter
[Figure: filter taps, samples x1..x4 weighted by coefficients c1..c4]
f = \sum_{1 \le i \le n} c_i x_i
The x_i are data samples; the c_i are constants.
Example: FIR filter
C:
    for (i=0, f=0; i<N; i++)
        f = f + c[i]*x[i];
Assembler:
    ; loop initiation code
    MOV r0,#0      ; use r0 for i
    MOV r8,#0      ; use separate index for arrays
    LDR r2,=N      ; get address for N
    LDR r1,[r2]    ; get value of N
    MOV r2,#0      ; use r2 for f
    LDR r3,=c      ; load r3 with base of c
    LDR r5,=x      ; load r5 with base of x
FIR filter, contd.
    ; loop body
loop    LDR r4,[r3,r8]    ; get c[i]
        LDR r6,[r5,r8]    ; get x[i]
        MUL r4,r4,r6      ; compute c[i]*x[i]
        ADD r2,r2,r4      ; add into running sum f
        ADD r8,r8,#4      ; add word offset to array index
        ADD r0,r0,#1      ; add 1 to i
        CMP r0,r1         ; exit?
        BLT loop          ; if i < N, continue
FIR filter with MLA & auto-index
        AREA TestProg, CODE, READONLY
        ENTRY
        mov r0,#0          ; accumulator
        mov r1,#3          ; number of iterations
        ldr r2,=carray     ; pointer to constants
        ldr r3,=xarray     ; pointer to variables
loop    ldr r4,[r2],#4     ; get c[i] and move pointer
        ldr r5,[r3],#4     ; get x[i] and move pointer
        mla r0,r4,r5,r0    ; sum = sum + c[i]*x[i]
        subs r1,r1,#1      ; decrement iteration count
        bne loop           ; repeat until count=0
here    b here
carray  dcd 1,2,3
xarray  dcd 10,20,30
        END
Also, need time delay to prepare x array for next sample.
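For comparison, the pointer-walking, count-down structure of the MLA version can be written in C roughly as follows. This is a sketch, not the original source for the listing; the function name fir is hypothetical, and unlike the assembly loop (which always runs at least once) it also handles a count of zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of c[i]*x[i] over n taps, walking both arrays with
   post-incremented pointers and counting n down to zero. */
int32_t fir(const int32_t *c, const int32_t *x, size_t n) {
    int32_t sum = 0;              /* accumulator (r0 in the listing) */
    while (n-- > 0) {             /* iteration count (r1) */
        sum += (*c++) * (*x++);   /* multiply-accumulate, then advance pointers */
    }
    return sum;
}
```

With the data from the listing (constants 1,2,3 and samples 10,20,30) the result is 1*10 + 2*20 + 3*30 = 140.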
ARM subroutine linkage
Branch and link instruction:
    BL foo         ; copies return address (address of the next instruction) to r14
To return from subroutine:
    BX r14         ; branch to address in r14
or:
    MOV r15,r14    ; not recommended for Cortex
May need subroutine to be reentrant:
    interrupt it, with interrupting routine calling the subroutine (2 instances of the subroutine)
    support by creating a stack (not supported directly)
Branch instructions (B, BL)
bits 31-28   Cond      condition field
bits 27-25   1 0 1
bit 24       L         link bit: 0 = Branch, 1 = Branch with link
bits 23-0    Offset
The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC.
32 Mbyte range (ARM), 16 Mbyte range (Thumb/Thumb2)
How to perform longer branches?
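The target calculation for an ARM-state B/BL can be sketched in C. The function name is hypothetical, and the sketch assumes the usual ARM-state convention that the PC value seen by the branch is the address of the branch instruction plus 8.

```c
#include <stdint.h>

/* Compute the target of an ARM-state B/BL from its encoding.
   instr_addr: address of the branch instruction itself
   instr:      the 32-bit instruction word */
uint32_t arm_branch_target(uint32_t instr_addr, uint32_t instr) {
    uint32_t offset = (instr & 0x00FFFFFFu) << 2;  /* 24-bit field, shifted left by 2 */
    if (offset & 0x02000000u) {                    /* bit 25 is now the sign bit */
        offset |= 0xFC000000u;                     /* sign-extend to 32 bits */
    }
    return instr_addr + 8u + offset;               /* PC reads as instr_addr + 8 */
}
```

For example, the word 0xEAFFFFFE (offset field -2) placed at address 0x1000 branches to 0x1000, a branch-to-self like the "here b here" idiom. The 24-bit signed field shifted left by 2 is what gives the roughly 32 Mbyte forward and backward range.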
Nested subroutine calls
Nested function calls in C:
    void f1(int a) { f2(a); }
    void f2(int r) { int g; g = r+5; }
    main() { f1(xyz); }
Nested subroutine calls (1)
Nesting/recursion requires a coding convention to save/pass parameters:
        AREA Code1,CODE
Main    LDR r13,=StackEnd    ; r13 points to last element on stack
        MOV r1,#5            ; pass value 5 to func1
        STR r1,[r13,#-4]!    ; push argument onto stack
        BL func1             ; call func1()
here    B here
Nested subroutine calls (2)
; Function func1()
func1   LDR r0,[r13]          ; load arg into r0 from stack
        ; call func2()
        STR r14,[r13,#-4]!    ; store func1's return adrs
        STR r0,[r13,#-4]!     ; store arg to f2 on stack
        BL func2              ; branch and link to f2
        ; return from func1()
        ADD r13,#4            ; "pop" func2's arg off stack
        LDR r15,[r13],#4      ; restore register and return
Nested subroutine calls (3)
; Function func2()
func2   BX r14                ; preferred return instruction

; Stack area
        AREA Data1,DATA
Stack   SPACE 20              ; allocate stack space
StackEnd
        END
Register usage conventions
Reg   Usage*    Reg   Usage*
r0    a1        r8    v5
r1    a2        r9    v6
r2    a3        r10   v7
r3    a4        r11   v8
r4    v1        r12   ip (intra-procedure scratch reg.)
r5    v2        r13   sp (stack pointer)
r6    v3        r14   lr (link register)
r7    v4        r15   pc (program counter)
* Alternate register designation
a1-a4 : argument/result/scratch
v1-v8 : variables
Saving/restoring multiple registers
LDM/STM : load/store multiple registers
    LDMIA : increment address after xfer
    LDMIB : increment address before xfer
    LDMDA : decrement address after xfer
    LDMDB : decrement address before xfer
    LDM/STM default to LDMIA/STMIA
Examples:
    ldmia r13!,{r8-r12,r14}    ; r13 updated at end
    stmda r13,{r8-r12,r14}     ; r13 not updated at end
ARM assembler new additions
    PUSH {reglist} = STMDB sp!,{reglist}
    POP {reglist}  = LDMIA sp!,{reglist}
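The effect of this pairing (decrement before storing on push, increment after loading on pop) can be modelled in C. The function names are hypothetical, and the stack pointer is represented as a pointer into an array of words rather than a byte address.

```c
#include <stddef.h>
#include <stdint.h>

/* PUSH / STMDB sp!,{...}: the stack grows downward; after the push,
   sp points at the lowest-numbered register that was stored. */
void push_regs(uint32_t **sp, const uint32_t *regs, size_t count) {
    *sp -= count;                       /* net effect of decrement-before */
    for (size_t i = 0; i < count; i++) {
        (*sp)[i] = regs[i];             /* lowest register at lowest address */
    }
}

/* POP / LDMIA sp!,{...}: load upward from sp, then move sp past the data. */
void pop_regs(uint32_t **sp, uint32_t *regs, size_t count) {
    for (size_t i = 0; i < count; i++) {
        regs[i] = (*sp)[i];
    }
    *sp += count;                       /* write-back after the transfer */
}
```

A push followed by a pop of the same register list restores both the register values and the original stack pointer.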
Mutual exclusion support
Test and set a lock/semaphore for shared data access
    Lock=0 indicates the shared resource is unlocked (free to use)
    Lock=1 indicates the shared resource is locked (in use)
LDREX Rt,[Rn{,#offset}]
    Read lock value into Rt from memory to request exclusive access to a resource
    Cortex notes that LDREX has been performed, and waits for STREX
STREX Rd,Rt,[Rn{,#offset}]
    Write Rt value to memory and return status to Rd
    Rd=0 if successful write, Rd=1 if unsuccessful write
CLREX
    Force next STREX to return status of 1 to Rd (cancels LDREX)
Mutual exclusion example
Location Lock is 0 if a resource is free, 1 if not free
        ldr     r0,=Lock      ; point to lock
        mov     r1,#1         ; prepare to lock the resource
try     ldrex   r2,[r0]       ; read Lock value
        cmp     r2,#0         ; is resource unlocked/free?
        itt     eq            ; next 2 ops if resource free
        strexeq r2,r1,[r0]    ; store 1 in Lock
        cmpeq   r2,#0         ; was store successful?
        bne     try           ; repeat loop if lock unsuccessful
LDREXB/LDREXH - STREXB/STREXH for byte/halfword Lock
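At the C level, the same lock protocol can be sketched with C11 atomics. The function names are hypothetical; on ARM cores that provide LDREX/STREX, compilers typically implement the compare-and-exchange below with a loop of that kind, though the exact instructions generated depend on the compiler and target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try once to change the lock from 0 (free) to 1 (in use).
   Returns true if this caller obtained the lock. */
bool try_lock(atomic_int *lock) {
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Spin until the lock is obtained (mirrors the "bne try" retry loop). */
void acquire(atomic_int *lock) {
    while (!try_lock(lock)) {
        /* busy-wait */
    }
}

/* Mark the resource free again. */
void release(atomic_int *lock) {
    atomic_store(lock, 0);
}
```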
Common assembler directives
Allocate storage and store initial values (CODE area)
    Label   DCD value1,value2    ; allocate word
    Label   DCW value1,value2    ; allocate half-word
    Label   DCB value1,value2    ; allocate byte
Allocate storage without initial values (DATA area)
    Label   SPACE n              ; reserve n bytes (uninitialized)
Summary
Load/store architecture
Most instructions are RISCy, operate in single cycle.
Some multi-register operations take longer.
All instructions can be executed conditionally.