This document provides an introduction to the ARM (Acorn/Advanced RISC Machines) architecture. It discusses the history and development of ARM, the different ARM architecture versions, key features like registers and instruction cycles. It also outlines the tools that will be used like the GCC compiler and GAS assembler. The schedule includes labs on topics like Fibonacci, atomic operations and interrupts. QEMU will be used to emulate an ARM Cortex-A9 board for hands-on exercises including booting Linux with u-boot.
Introduction To ARM Systems-11!17!2012
Introduction to ARM (Acorn/Advanced RISC Machines)
Gananand Kini
June 18 2012

Acknowledgements
- Prof. Rajeev Gandhi, Dept. ECE, Carnegie Mellon University
- Prof. Dave O'Hallaron, School of CS, Carnegie Mellon University
- Xeno Kovah
- Dana Hutchinson
- Dave Keppler
- Jim Irving
- Dave Weinstein
- Geary Sutterfield
- ...

Co-requisites
- Intro x86
- Intermediate x86 - would be very helpful

Book(s)
- "ARM System Developer's Guide: Designing and Optimizing System Software" by Andrew N. Sloss, Dominic Symes, and Chris Wright

Schedule
- Day 1 Part 1: Intro to ARM basics; Lab 1 (Fibonacci Lab)
- Day 1 Part 2: More of ARM's features; Lab 2 (BOMB Lab)
- Day 2 Part 1: ARM hardware features; Lab 3 (Interrupts lab)
- Day 2 Part 1.5: GCC optimization; Lab 4 (Control Flow Hijack Lab)
- Day 2 Part 2: Inline and Mixed assembly; Atomic instructions; Lab 5 (Atomic Lab)

DAY 1 PART 1

Introduction
- Started as a hobby in microcontrollers in high school with robotics
- Background in software development and electrical engineering in school, took many courses related to microcontrollers and computer architecture
- Small amount of experience with assembly

Obligatory XKCD
Source: http://xkcd.com/676/

Short Review
- short ByteMyShorts[2] = {0x3210, 0x7654} in little endian?
  Answer: 0x10325476
- int NibbleMeInts = 0x4578 in binary, in octal? (no endianness involved)
  Answers: 0b0100 0101 0111 1000
           0b0 100 010 101 111 000
           0o42570 (Take 3 bits of binary and represent in decimal)
- Two's complement of 0x0113
  Answer: 0xFEED
- What does the following code do? (Part of output from gcc at -O3)
    movl (%rsi), %edx
    movl (%rdi), %eax
    xorl %edx, %eax
    xorl %eax, %edx
    xorl %edx, %eax
    movl %edx, (%rsi)
    movl %eax, (%rdi)
    ret
- How can we optimize the above for code size?
- Could this macro be used for atomic operations?

We'll learn how and why
    int main(void)
    {
      printf("Hello world!\n");
      return 0;
    }
This turns into...

And then into the following
(Generated using objdump)

Introduction to ARM
- Acorn Computers Ltd. (Cambridge, England) Nov. 1990
- First called Acorn RISC Machine, then Advanced RISC Machine
- Based on RISC architecture work done at UC Berkeley and Stanford
- ARM only sells licenses for its core architecture design
- Optimized for low power & performance
- VersatileExpress board with Cortex-A9 (ARMv7) core will be "emulated" using Linaro builds.
- This also means some things may not work. You've been warned.
ARM architecture versions
Architecture | Family
ARMv1 | ARM1
ARMv2 | ARM2, ARM3
ARMv3 | ARM6, ARM7
ARMv4 | StrongARM, ARM7TDMI, ARM9TDMI
ARMv5 | ARM7EJ, ARM9E, ARM10E, XScale
ARMv6 | ARM11, ARM Cortex-M
ARMv7 | ARM Cortex-A, ARM Cortex-M, ARM Cortex-R
ARMv8 | Not available yet. Will support 64-bit addressing + data
"ARM Architecture." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 3 March 2012. Web. 3 March 2012.

ARM Extra Features
- Similar to RISC architecture (not purely RISC)
  - Variable cycle instructions (LD/STR multiple)
  - Inline barrel shifter
  - 16-bit (Thumb) and 32-bit instruction sets combined called Thumb2
  - Conditional execution (reduces number of branches)
  - Auto-increment/decrement addressing modes
  - Changed to a Modified Harvard architecture since ARM9 (ARMv5)
  - Extensions (not covered in this course):
    - TrustZone
    - VFP, NEON & SIMD (DSP & Multimedia processing)

Registers
- Total of 37 registers available (including banked registers):
  - 30 general purpose registers
  - 1 PC (program counter)
  - 1 CPSR (Current Program Status Register)
  - 5 SPSR (Saved Program Status Register): the saved CPSR for each of the five exception modes
- Several exception modes
- For now we will refer to "User" mode

Registers
r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, R10 (SL), r11 (FP), r12 (IP), r13 (SP), r14 (LR), r15 (PC), CPSR
- Stack Pointer (SP) - The address of the top element of stack.
- Link Register (LR) - Register used to save the PC when entering a subroutine.
- Program Counter (PC) - The address of next instruction. (ARM mode points to current+8 and Thumb mode points to current+4)
- Current Program Status Register (CPSR) - Results of most recent operation including Flags, Interrupts (Enable/Disable) and Modes
- R12 or IP is not the instruction pointer; it is the intra-procedure-call scratch register
Instruction cycle
- Fetch - fetch next instruction from memory
- Decode - decode fetched instruction
- Execute - execute fetched instruction
(diagram labels: Start, End)

ARM vs. x86
- Endianness (Bi-Endian)
  - Instructions are little endian (except on the -R profile for ARMv7 where it is implementation defined)
  - Data endianness can be mixed (depends on the E bit in CPSR)
- Fixed length instructions
  - Instruction operand order is generally: OP DEST, SRC (destination first, as in Intel x86 syntax; AT&T syntax puts the source first)
- Short instruction execution times
- Register differences (CPSR, SPSR...)
  - Has a few extra registers
  - Operations only on registers, not memory (Load/Store architecture)
- Pipelining & Interrupts
- Exceptions
- Processor Modes
- Code & Compiler optimizations due to the above differences

ARM Data sizes and instructions
- ARMs mostly use 16-bit (Thumb) and 32-bit instruction sets
- 32-bit architecture
- Byte = 8 bits (Nibble is 4 bits) [byte or char in x86]
- Half word = 16 bits (two bytes) [word or short in MS x86]
- Word = 32 bits (four bytes) [Doubleword or int/long in MS x86]
- Double Word = 64 bits (eight bytes) [Quadword or double/long long in MS x86]
Source: http://stackoverflow.com/questions/39419/visual-c-how-large-is-a-dword-with-32-and-64-bit-code

The Life of Binaries
- Starts with c or cpp source code written by us
- A compiler takes the source code and generates assembly instructions
- An assembler takes the assembly instructions and generates objects or .o files with machine code
- The linker takes objects, arranges them for execution and generates an executable. (A dynamic linker will insert object code during runtime in memory)
- A loader prepares the binary code and loads it into memory for the OS to run

The tools we will use
- Compiler - gcc for ARM
- Assembler - gcc or as (gas) for ARM
- Linker - gcc for ARM or gold
- Loader - gcc for ARM and ld-linux for ARM

At Power on...
- ROM has code that has been burned in by SoC vendor (similar to BIOS but not the same)
- Use of memory mapped IO
  - different memory components (can be a mix of ROM, SRAM, SDRAM etc.)
- Contains
  - Code for memory controller setup
  - Hardware and peripheral init (such as clock and timer)
  - A boot loader such as Fastboot, U-boot, X-Loader etc.

U-Boot process
Source: Balducci, Francesco. http://balau82.wordpress.com/2010/04/12/booting-linux-with-u-boot-on-qemu-arm/

U-boot exercise on a Versatile PB
- Run the following in ~/projects/uboot-exercise:
    qemu-system-arm -M versatilepb -m 128M -kernel flash.bin -serial stdio
- flash.bin contains:
  - U-boot binary (at 0x10000 in image)
  - a root filesystem (at 0x210000 in image)
  - the linux kernel (at 0x410000 in image)
- U-boot has bootm <address> to boot code
Source: Balducci, Francesco. http://balau82.wordpress.com/2010/04/12/booting-linux-with-u-boot-on-qemu-arm/

U-boot exercise
- U-boot was patched in the earlier example b/c it did not support ramdisk usage with the bootm command. Good 'nough for simulation.
- U-boot uses bootm <kernel address> <rootfs image address> to boot
- U-boot relocates itself to a specific address (0x1000000) before loading the kernel.
Source: Balducci, Francesco. http://balau82.wordpress.com/2010/04/12/booting-linux-with-u-boot-on-qemu-arm/

PBX w/ Cortex-A9 Memory Map
Source: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0440b/Bbajihec.html

Cortex M3 Memory Map
Source: http://www.joral.ca/blog/wp-content/uploads/2009/10/CortexPrimer.pdf

ARM Architecture
Source: http://www.arm.com/files/pdf/armcortexa-9processors.pdf

Instruction cycle
- Fetch - fetch next instruction from memory
- Decode - decode fetched instruction
- Execute - execute fetched instruction
(diagram labels: Start, End)

Behavior of the PC/R15
- PC - Program counter (like the x86 EIP) has the address of the next instruction to execute
- When executing an ARM instruction, PC reads as the address of current instruction + 8
- When executing a Thumb instruction, PC reads as the address of current instruction + 4
- When PC is written to, it causes a branch to the written address
- Thumb instructions cannot access/modify PC directly

That means...
00008380 <add>:
8380: b480       push {r7}
8382: b083       sub sp, #12
8384: af00       add r7, sp, #0
8386: 6078       str r0, [r7, #4]
8388: 6039       str r1, [r7, #0]
838a: 687a       ldr r2, [r7, #4]
838c: 683b       ldr r3, [r7, #0]
838e: 18d3       adds r3, r2, r3
8390: 4618       mov r0, r3
8392: f107 070c  add.w r7, r7, #12
8396: 46bd       mov sp, r7
8398: bc80       pop {r7}
839a: 4770       bx lr
When executing instruction @ x8382, PC=0x00008386

ARM Assembly and some conventions
- Now uses Unified Assembly Language (combines ARM & Thumb instruction sets; code is allowed to have intermixed instructions)
- General form (there are exceptions to this):
    <Instruction><Conditional>{S bit} <destination> <source> <Shift/operand/immediate value>
- Load/Store architecture means instructions only operate on registers, NOT memory
- Most of the instructions expect destination first followed by source, but not all.

ARM Assembly and some conventions contd.
- <dst> will be destination register
- <src> will be source register
- <reg> will be any specified register
- <imm> will be immediate value
- <reg|cxfz..> whatever follows '|' means with the specified flag enabled

Conditional Flags
- Indicate information about the result of an operation
- N - Negative result received from ALU (Bit 31 of the result if it is a two's complement signed integer)
- Z - Zero flag (1 if result is zero)
- C - Carry generated by ALU
- V - overflow generated by ALU (1 means overflow)
- Q - overflow or saturation generated by ALU (Sticky flag; set until CPSR is overwritten manually)
- Flags are in a special register called CPSR (Current Program Status Register)
- Flags are not updated unless used with a suffix of S on instruction

Current/Application Program Status Register (CPSR/APSR)
Bit layout: 31 N | 30 Z | 29 C | 28 V | 27 Q | 26-10 (not labeled) | 9 E | 8 A | 7 I | 6 F | 5 T | 4-0 _MODE
N      Negative flag
Z      Zero flag
C      Carry flag
V      Overflow flag
Q      Sticky overflow
I      1: Disable IRQ mode
F      1: Disable FIQ mode
T      0: ARM state, 1: Thumb state
_MODE  Mode bits

Push and Pop operations
- PUSH <reg list> - decrements the SP and stores the value in <reg list> at that location
- POP <reg list> - Stores the value at SP into <reg list> and increments the SP
- Both operations only operate on SP

PUSH operation
[Figure: stack contents and SP before and after INSTRUCTION: push {r7, lr}, with R7 = 0x0A0B0C0D and LR = 0x00008010]

Arithmetic operations
- ADD: add
  <dst> = <src> + <imm> or <src> + <reg>
- ADC: add with carry
  <dst> = <src|c> + <imm> or <src|c> + <reg>
- SUB: subtract
  <dst> = <src> - <imm> or <src> - <reg>
- SBC: subtract with carry
  <dst> = <src|c> - <imm> or <src|c> - <reg>
- RSB: reverse subtract
  <dst> = <imm> - <src> or <reg> - <src>
- RSC: reverse subtract with carry
  <dst> = <imm|c> - <src> or <reg|c> - <src>

Closer look at Example 1.c
    int main(void)
    {
      int a, b, c;
      a=10;
      b=12;
      c=add(a,b);
      return 0;
    }
000083c0 <multiplyadd>:
83c0: fb01 2000  mla r0, r1, r0, r2
83c4: 4770       bx lr
83c6: bf00       nop

int main(void)
{
  int a, b, c, d;
  a=2; b=3; c=4;
  d = multiply(a,b);
  printf("a * b is %d\n", d);
  d = multiplyadd(a,b,c);
  printf("a * b + c is %d\n", d);
  return 0;
}
int multiply(int a, int b) { return (a*b); }
int multiplyadd(int a, int b, int c) { return ((a*b)+c); }

MLA & MLS operations
INSTRUCTION: mla r0, r0, r1, r2
             mls r0, r0, r1, r2
MEANS: r0 = r0 * r1 + r2
       r0 = r2 - (r0 * r1)
(no flags updated)
[Figure: registers before and after each operation]
Before Operation: R0 = 0x0000000A, R1 = 0x0000000E, R2 = 0x00000003, CPSR = 0x20000010
After mla: R0 = 0x0000008F (R1, R2 and CPSR unchanged)
After mls: R0 = 0xFFFFFF77 (R1, R2 and CPSR unchanged)

Arithmetic operations part 3
- SDIV - Signed divide
- UDIV - Unsigned divide
- On the Cortex-A profile there is no divide operation
PLEASE NOTE: These instructions are only available on Cortex-R profile

Example x.s
000083e4 <divide>:
83e4: e710f110  sdiv r0, r0, r1
83e8: e12fff1e  bx lr
83ec: e1a00000  nop ; (mov r0, r0)
000083f0 <unsigneddivide>:
83f0: e730f110  udiv r0, r0, r1
83f4: e12fff1e  bx lr
83f8: e1a00000  nop ; (mov r0, r0)

Using the emulator
- cd ~/projects/linaro
- ./startsim
- Password is passw0rd
- To copy <localfile> to </path/to/file> on emulator:
    scp -P 2200 <localfile> root@localhost:</path/to/file>
- To copy </path/to/file> from emulator to <localfile>:
    scp -P 2200 root@localhost:</path/to/file> <localfile>

objdump introduction
- Dumps the objects in an ELF (Executable and Linkable Format) file.
- Objects are in the form they have before they are linked.
- The -g option for gcc adds debug symbols (for gdb) that objdump can read
- The -d option for objdump is used for disassembling (getting assembly code from the ELF format)

objdump usage
helloworld.c:
    int main(void)
    {
      printf("Hello world!\n");
      return 0;
    }
objdump -d helloworld | less

Try dividing now on the emulator
- Goto ~/projects/examples
- Copy example1 to divexample
- Replace the add() function in example1.c with divide and return (a/b)
- Run make clobber && make
- Disassemble...
    objdump -d example1 | less
- What do you see?

NOP Instruction
- A most interesting instruction considering it does nothing
- The ARM Reference Manual mentions that this instruction does not relate to code execution time (it can increase, decrease or leave the execution time unchanged). Why?
- Primary purpose is for instruction alignment. (ARM and Thumb instructions together. What could go wrong?)
- Can also be used as part of vector tables
- In some microcontrollers, it is also used for synchronization of the pipeline.

Barrel Shifter
- Hardware optimization inline with the ALU allows for a multiplier (power of 2) within the same instruction cycle
- Allows for shifting a register value by either an unsigned integer (MAXVAL of 32) or a value specified in the bottom byte of another register.
- ASR - Arithmetic Shift Right (MSB copied at left, last bit off right is Carry)
- LSL - Logical Shift Left (0s at right, last bit off left is Carry)
  - MOV R7, R3, LSL 2 means (R7=R3*4) or (R3<<2)
  - ADD R0, R1, R1, LSL 1 means R0=R1+(R1<<1)
- LSR - Logical Shift Right (0s at left, last bit off right is Carry)
- ROR - Rotate Right (bits popped off the right end are pushed directly into the left, last bit off right is Carry)
- RRX - Rotate Right with Extend (bits popped off the right end first go into Carry, Carry is shifted in to left, last bit off right is Carry)

Hints on how to RTFM
- {S} - updates flags in the CPSR
- <c> - allows mnemonic of conditional to be added
- <q> - instruction suffix with either:
  - .N Narrow, assembler must use 16-bit encoding for the instruction
  - .W Wide, assembler must use 32-bit encoding for the instruction
- Do not use the .N or .W in your assembly code.
  - As per manual, it will throw errors.
  - GNU Assembler decides on encoding depending on options selected.

Example 3.1.c
int main(void)
{
  int a, b, d;
  a = 6;
  b = 8;
  d = multiplybytwo(a) * multiplybytwo(b);
  printf("2a * 2b is %d\n", d);
return 0; }
int multiplybytwo(int a)
{
  return a*2;
}

00008318 <main>:
8318: b508       push {r3, lr}
831a: 2001       movs r0, #1
831c: 22c0       movs r2, #192 ; 0xc0
831e: f248 4100  movw r1, #33792 ; 0x8400
8322: f2c0 0100  movt r1, #0
8326: f7ff efec  blx 8300 <_init+0x3c>
832a: 2000       movs r0, #0
832c: bd08       pop {r3, pc}
832e: bf00       nop

000083a8 <multiplybytwo>:
83a8: 0040       lsls r0, r0, #1
83aa: 4770       bx lr

Example 3.2.c
int main(void)
{
  int a, b, d;
  a = -6;
  b = 8;
  d = dividebytwo(a) / dividebytwo(b);
  printf("a/2 / b/2 is %d\n", d);
  return 0;
}

Reversing byte order
- REV - reverses byte order (& endianness) of value in register and stores into destination register
- REV16 - reverses byte order of each 16-bit halfword in register and stores into destination register
- REVSH - reverses byte order of lower 16-bit halfword in register, sign extends to 32 bits and stores into destination register

REV & REV16 operations
INSTRUCTION: rev r0, r0
             rev16 r0, r0
[Figure: registers before and after each operation]
Before Operation: R0 = 0xABCDDEFF, CPSR = 0x20000010
After rev: R0 = 0xFFDECDAB (CPSR unchanged)
After rev16: R0 = 0xCDABFFDE (CPSR unchanged)

Current Program Status Register
Bit layout: 31 N | 30 Z | 29 C | 28 V | 27 Q | 26-10 (not labeled) | 9 E | 8 A | 7 I | 6 F | 5 T | 4-0 _MODE
N      Negative flag
Z      Zero flag
C      Carry flag
V      Overflow flag
Q      Sticky overflow
I      1: Disable IRQ mode
F      1: Disable FIQ mode
T      0: ARM state, 1: Thumb state
_MODE  Mode bits

Logical & Comparison operations
- AND - Bitwise AND
- BIC - Bitwise bit clear
- EOR - Bitwise Exclusive OR
- ORR - Bitwise OR
- ORN - Bitwise OR NOT
- CMP - Compare. SUB but with NO destination. (Same as SUBS)
- CMN - Compare Negative. ADD but with NO destination. (Same as ADDS)
- TEQ - Test Equivalence. Like EOR but with NO destination.
- TST - Test. Like AND but with NO destination.

Example 7.1.c
000083d0 <and>:
83d0: 4008  ands r0, r1
83d2: 4770  bx lr

int main(void)
{
  int a, b, d;
  a = 221412523;
  b = 374719560;
d = and(a,b);
printf("a & b is %d\n", d);
return 0; }
int and(int a, int b)
{
  return (a&b);
}

Example 7.2.c
000083d0 <orr>:
83d0: 4308  orrs r0, r1
83d2: 4770  bx lr

int main(void)
{
  int a, b, d;
  a = 221412523;
  b = 374719560;
if((a ^ b) > 0) d = add(a,b); else d = subtract(b,a);
printf("a & b is %d\n", d);
return 0; }
int add(int a, int b) { return (a+b); }
int subtract(int a, int b)
{
  return (a-b);
}

BIC
- BIC clears the bits specified in a mask
- For example,
  R0 = 0x57 or 0b0101 0111
  R1 = 0x24 or 0b0010 0100
  BIC <R2> <R0> <R1>
  means R2 = R0 & ~(R1) = 0b0101 0011 or 0x53
- Mask can also be a shifted value (using Shift operations)

Memory operations Part I
- LDR - Load data from memory into registers
- STR - Store data from registers to memory
- Caveat: LDR/STR can load/store data on a boundary alignment that is the same as the data type size being loaded/stored.
  - LDR can only load 32-bit words from a memory address that is a multiple of 4 bytes.

Memory Operations Part I contd.
- LDR r0, [r1] loads r0 with the contents of the memory address pointed to by r1
- STR r0, [r1] stores the contents of r0 to the memory address pointed to by r1.
  - Warning: This can be confusing since the destination is actually specified in the second argument
- Also LDR r0, [r1, #4] means r0 = [r1 + 4] and the r1 value remains unchanged
- Similarly STR r0, [r1, #4] means [r1+4] = r0 and the r1 value remains unchanged
- The addressing mode of the above two instructions is called pre-indexed addressing

Example 8.c
0000838c <main>:
838c: b580       push {r7, lr}
838e: b084       sub sp, #16
8390: af00       add r7, sp, #0
8392: f04f 0308  mov.w r3, #8
8396: 607b       str r3, [r7, #4]
8398: f04f 0309  mov.w r3, #9
839c: 60fb       str r3, [r7, #12]
839e: f107 0304  add.w r3, r7, #4
83a2: 60bb       str r3, [r7, #8]
83a4: 68bb       ldr r3, [r7, #8]
83a6: 681b       ldr r3, [r3, #0]
83a8: f103 0302  add.w r3, r3, #2
83ac: 60fb       str r3, [r7, #12]
83ae: f248 4330  movw r3, #33840 ; 0x8430
83b2: f2c0 0300  movt r3, #0
83b6: 4618       mov r0, r3
83b8: 68b9       ldr r1, [r7, #8]
83ba: f7ff ef92  blx 82e0 <_init+0x20>
83be: f248 434c  movw r3, #33868 ; 0x844c
83c2: f2c0 0300  movt r3, #0
83c6: 4618       mov r0, r3
83c8: 68f9       ldr r1, [r7, #12]
83ca: f7ff ef8a  blx 82e0 <_init+0x20>
83ce: f04f 0300  mov.w r3, #0
83d2: 4618       mov r0, r3
83d4: f107 0710  add.w r7, r7, #16
83d8: 46bd       mov sp, r7
83da: bd80       pop {r7, pc}

int main(void)
{
  int a, b;
  int *x;
  a = 8;
  b = 9;
  x = &a;
  b = *x + 2;
  printf("The address of a is 0x%x\n",x);
  printf("The value of b is now %d\n",b);
  return 0;
}

Memory operations Part I contd.
- R7 in the previous example is known as the base address register, where the base address register can be any one of R0-R12, SP, or LR
- We will cover consecutive multiple loads in one instruction later

Control Flow operations (Table A4-1)
Instruction | Description | Thumb mode range | ARM mode range
B <label> | Branch to target address | +/- 16 MB | +/- 32 MB
BL, BLX <imm> | Call a subroutine; call a subroutine, change instruction set | +/- 16 MB | +/- 32 MB
BLX <reg> | Call a subroutine, optionally change instruction set | Any | Any
BX | Branch to target address, change instruction set | Any | Any
CBZ | Compare and Branch on Zero | 0-126 bytes | Does not exist
CBNZ | Compare and Branch on Nonzero | 0-126 bytes | Does not exist
TBB | Table Branch (byte offsets) | 0-510 bytes | Does not exist
TBH | Table Branch (halfword offsets) | 0-131070 bytes | Does not exist

Conditional Branching
- BLE: Branch if less than or equal
  - Z=1 OR N!=V
- BGT: Branch if greater than
  - Z=0 AND N=V
- BEQ: Branch if equal
  - Z=1
- BNE: Branch if not equal
  - Z=0
- How do the N and V flags tell us if something is less or greater than?
- Generally there is a CMP or TST instruction before the branch
- CMP <r0> <r1> means perform <r0> - <r1>

Example 9.s
0000835c <__libc_csu_init>:
835c: e92d 43f8  stmdb sp!, {r3, r4, r5, r6, r7, r8, r9, lr}
8360: 4606       mov r6, r0
8362: f8df 9034  ldr.w r9, [pc, #52] ; 8398 <__libc_csu_init+0x3c>
8366: 460f       mov r7, r1
8368: 4d0c       ldr r5, [pc, #48] ; (839c <__libc_csu_init+0x40>)
836a: 4690       mov r8, r2
836c: 44f9       add r9, pc
836e: f7ff ff91  bl 8294 <_init>
8372: 447d       add r5, pc
8374: ebc5 0909  rsb r9, r5, r9
8378: ea5f 09a9  movs.w r9, r9, asr #2
837c: d009       beq.n 8392 <__libc_csu_init+0x36>
837e: 2400       movs r4, #0
8380: f855 3b04  ldr.w r3, [r5], #4
8384: 4630       mov r0, r6
8386: 4639       mov r1, r7
8388: 4642       mov r2, r8
838a: 3401       adds r4, #1
838c: 4798       blx r3
838e: 454c       cmp r4, r9
8390: d1f6       bne.n 8380 <__libc_csu_init+0x24>
8392: e8bd 83f8  ldmia.w sp!, {r3, r4, r5, r6, r7, r8, r9, pc}
8396: bf00       nop
8398: 00008ba0   .word 0x00008ba0
839c: 00008b96   .word 0x00008b96

Current Program Status Register
Bit layout: 31 N | 30 Z | 29 C | 28 V | 27 Q | 26-10 (not labeled) | 9 E | 8 A | 7 I | 6 F | 5 T | 4-0 _MODE
N      Negative flag
Z      Zero flag
C      Carry flag
V      Overflow flag
Q      Sticky overflow
I      1: Disable IRQ mode
F      1: Disable FIQ mode
T      0: ARM state, 1: Thumb state
_MODE  Mode bits

Hello, World in ARM Assembly
.text
_start: .global _start
Syscall invoked with SWI/SVC instruction (supervisor mode)
Source: http://peterdn.com/post/e28098Hello-Worlde28099-in-ARM-assembly.aspx

Instructions covered so far...
- NOP
- ADD, ADC, SUB, SBC, RSB, RSC
- ASR, LSL, LSR, ROR, RRX
- MOV, MVN
- REV, REVSH, REV16
- AND, EOR, ORR, ORN, CMP, CMN
- BIC, TEQ, TST
- B, BL, BLX, BLE, BGT
- SWI

Hints on how to RTFM
- {S} - updates flags in the CPSR
- <c> - allows mnemonic of conditional to be added
- <q> - instruction suffix with either:
  - .N Narrow, assembler must use 16-bit encoding for the instruction
  - .W Wide, assembler must use 32-bit encoding for the instruction
- Do not use the .N or .W in your assembly code.
  - As per manual, it will throw errors.
  - Assembler decides on encoding depending on options selected.

Lab 1
- Again, commands given below for copying files into and out of the simulator
    scp -P 2200 <localfile> root@localhost:/path/to/file
    scp -P 2200 root@localhost:/path/to/file <localfile>
    Password is passw0rd
- Fibonacci program
  - Write an assembly function to calculate the fibonacci value at a given position x
  - R0 has x
  - For example:
      [0, 1, 2, 3, 4, 5, 6 ...] x
      [0, 1, 1, 2, 3, 5, 8 ...] fibonacci(x)
  - Only modify fib.s

Sample algorithms
// Non-recursive
int fibonacci(int x)
{
  int previous = -1;
  int result = 1;
  int i=0;
  int sum=0;
  for (i = 0; i <= x; i++) {
    sum = result + previous;
    previous = result;
    result = sum;
  }
  return result;
}

// Recursive
int fibonacci(int x)
{
  if(x<=0) return 0;
  if(x==1) return 1;
  return fibonacci(x-1) + fibonacci(x-2);
}
NOTE: Filler code follows the Recursive algorithm.

Possible solution
fibonacci:
  push {r3, r4, r5, lr} ; function prolog
  subs r4, r0, #0       ; r4 = r0 - 0
  ble .L3               ; if (r0 <= 0) goto .L3
  cmp r4, #1            ; Compare r4 to 1
  beq .L4               ; if (r4 == 1) goto .L4
A note on LDR/STR
- For loading large constants into registers, the assembler generally prefers using MVN <Rd>, <~large constant> (~ is Bitwise NOT)
- Assembler likes to use values between 0 and 255 along with barrel shifts to arrive at the value
- Example: instead of
    LDR R0, =0xffffff23
  the assembler uses
    MVN R0, #0xDC
Other Instructions
- SSAT <reg1> <imm> <reg2> - Signed Saturate
- USAT <reg1> <imm> <reg2> - Unsigned Saturate
- QADD <reg1> <reg2> <reg3> - Add & saturate the result (<reg1> = sat(<reg2> + <reg3>))
- QSUB - Subtract & saturate the result
  <reg1> = sat(<reg2> - <reg3>)
- QDADD - Saturate Double & Add
  <reg1> = sat(<reg2> + 2*<reg3>)
- QDSUB - <reg1> = sat(<reg2> - 2*<reg3>)

Control Flow operations (Table A4-1)
Instruction | Description | Thumb mode range | ARM mode range
B <label> | Branch to target address | +/- 16 MB | +/- 32 MB
BL, BLX <imm> | Call a subroutine; call a subroutine, change instruction set | +/- 16 MB | +/- 32 MB
BLX <reg> | Call a subroutine, optionally change instruction set | Any | Any
BX | Branch to target address, change instruction set | Any | Any
CBZ | Compare and Branch on Zero (16-bit). Permitted offsets are even from 0 - 126 | +4 to +130 bytes | Does not exist
CBNZ | Compare and Branch on Nonzero (16-bit). Permitted offsets are even from 0 - 126 | +4 to +130 bytes | Does not exist
TBB | Table Branch (byte offsets) (32-bit) | 0-510 bytes | Does not exist
TBH | Table Branch (halfword offsets) (32-bit) | 0-131070 bytes | Does not exist

Conditional execution
- Most instructions can be made conditional by adding a two letter mnemonic from table A8-1 to the end of an existing instruction
- It increases performance by reducing the # of branches
- Example:
    ADDEQ r0, r1, r2 ; If zero flag is set then r0=r1+r2

Conditional operations (Table A8-1)
Suffix | Description | Flags tested
EQ | Equal | Z=1
NE | Not Equal | Z=0
CS/HS | Unsigned higher or same | C=1
CC/LO | Unsigned lower | C=0
MI | Minus | N=1
PL | Positive or Zero | N=0
VS | Overflow | V=1
VC | No overflow | V=0
HI | Unsigned Higher | C=1 AND Z=0
LS | Unsigned lower or same | C=0 OR Z=1
GE | Greater or equal | N=V
LT | Less than | N!=V
GT | Greater Than | Z=0 AND N=V
LE | Less than or equal | Z=1 OR N!=V
AL | Always |

Current Program Status Register
Bit layout: 31 N | 30 Z | 29 C | 28 V | 27 Q | 26-10 (not labeled) | 9 E | 8 A | 7 I | 6 F | 5 T | 4-0 _MODE
N      Negative flag
Z      Zero flag
C      Carry flag
V      Overflow flag
Q      Sticky overflow
I      1: Disable IRQ mode
F      1: Disable FIQ mode
T      0: ARM state, 1: Thumb state
_MODE  Mode bits

Pipelining
- Does not decrease instruction execution time
- Increases throughput
- Time allocated is dependent on the longest cycle instruction
- Fetches and decodes instructions in parallel while executing the current instruction.
Source: http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/risc/pipelining/index.html
Also see http://www.cse.unsw.edu.au/~cs9244/06/seminars/08-leonidr.pdf

Pipelining in action
Source: http://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMArchitectureOverview.pdf
Issues associated with pipelining
- Branch instructions
  - Conditional execution reduces the number of branches, which reduces the # of pipeline flushes
- Instructions dependent on previous instructions (data-dependency)
- Interrupts in the beginning/middle/end of cycle?
- How code is optimized for pipelining is compiler & processor dependent
Source: http://bnrg.eecs.berkeley.edu/~randy/Courses/CS252.S96/Lecture08.pdf

Other ways of branching
- LDR PC, [PC, #offset]
- Value written has to be aligned for mode
- Earlier processors (armv4 and earlier) used to have prefetch
  - PC points two instructions ahead
  - Programmer has to account for PC+8
  - Store address of branch location at current address + offset + 8
- Same tradition continues for all arm architectures so far
Source: http://en.wikipedia.org/wiki/List_of_ARM_microprocessor_cores

Example 12.s
0x10000000  add r0, r1, r2
0x10000004  ldr pc, [pc, #4]
0x10000008  sub r1, r2, r3
0x1000000c  cmp r0, r1
0x10000010  0x20000000        <- Branch target
...
0x20000000  str r5, [r13, #-4]!

ONE Instruction to rule them all...
- LDM/STM - Load multiple/Store multiple
- Used in conjunction with a suffix (called mode) for how to move consecutively
- Lowest register uses the lowest memory address

LDM/STM modes
Mode | Short description | LDM synonym | STM synonym | Start Address | End Address | Rn!
IA | Increment After | P=0, U=1 | P=0, U=1 | Rn | Rn+4N-4 | Rn+4N
IB | Increment Before | P=1, U=1 | P=1, U=1 | Rn+4 | Rn+4N | Rn+4N
DA | Decrement After | P=0, U=0 | P=0, U=0 | Rn-4N+4 | Rn | Rn-4N
DB | Decrement Before | P=1, U=0 | P=1, U=0 | Rn-4N | Rn-4 | Rn-4N
FA | Full Ascending | DA | IB
EA | Empty Ascending | DB | IA
FD | Full Descending | IA | DB
ED | Empty Descending | IB | DA
N is the number of registers; n goes from 1..N

Stack operations
- Instead of POP, we use Load-Multiple
- Instead of PUSH, we use Store-Multiple
- Stacks can be
  - (A)scending - stack grows to higher memory addresses
  - (D)escending - stack grows to lower memory addresses

LDM/STM pairs
Store Multiple | Load Multiple
STMIA | LDMDB
STMIB | LDMDA
STMDA | LDMIB
STMDB | LDMIA

STMDB operation
INSTRUCTION: STMDB sp!, {r3, r4, r5, r7}
[Figure: registers and memory 0x8000-0x8018 before and after the operation]
R3 = 0xFEED0000, R4 = 0x0000CAFE, R5 = 0xABCDDEFF, R7 = 0xF00D0000
SP = 0x00008018 before, 0x00008008 after

LDMIA operation
INSTRUCTION: LDMIA sp!, {r3, r4, r5, r7}
[Figure: registers and memory 0x8000-0x8018 before and after the operation]
R3 = 0xFEED0000, R4 = 0x0000CAFE, R5 = 0xABCDDEFF, R7 = 0xF00D0000
SP = 0x00008008 before, 0x00008018 after

Example 13.s
0000835c <__libc_csu_init>:
835c: e92d 43f8  stmdb sp!, {r3, r4, r5, r6, r7, r8, r9, lr}
8360: 4606       mov r6, r0
8362: f8df 9034  ldr.w r9, [pc, #52] ; 8398 <__libc_csu_init+0x3c>
8366: 460f       mov r7, r1
8368: 4d0c       ldr r5, [pc, #48] ; (839c <__libc_csu_init+0x40>)
836a: 4690       mov r8, r2
836c: 44f9       add r9, pc
836e: f7ff ff91  bl 8294 <_init>
8372: 447d       add r5, pc
8374: ebc5 0909  rsb r9, r5, r9
8378: ea5f 09a9  movs.w r9, r9, asr #2
837c: d009       beq.n 8392 <__libc_csu_init+0x36>
837e: 2400       movs r4, #0
8380: f855 3b04  ldr.w r3, [r5], #4
8384: 4630       mov r0, r6
8386: 4639       mov r1, r7
8388: 4642       mov r2, r8
838a: 3401       adds r4, #1
838c: 4798       blx r3
838e: 454c       cmp r4, r9
8390: d1f6       bne.n 8380 <__libc_csu_init+0x24>
8392: e8bd 83f8  ldmia.w sp!, {r3, r4, r5, r6, r7, r8, r9, pc}
8396: bf00       nop
8398: 00008ba0   .word 0x00008ba0
839c: 00008b96   .word 0x00008b96
SwlLchlng beLween A8M and 1humb sLaLes A processor ln 1humb can enLer A8M sLaLe by execuung any of Lhe followlng: 8x, 8Lx, or Lu8/LuM operauon on C (813) A processor ln A8M can enLer 1humb sLaLe by execuung any of Lhe followlng: AuC, Auu, Anu, AS8, 8lC, LC8, LSL, LS8, MCv, Mvn, C88, 8C8, 88x, 8S8, 8SC, S8C, or Su8 operauon on C (813) and whlch does noL seL Lhe condluon ags. 107 1humb2 lnsLrucuon seL means . 1he lnsLrucuons ln 1humb2 lLself are a mlx of 16-blL and 32- blL lnsLrucuons and are run ln 1humb-mode Compller opuon Lo mlx A8M-mode and 1humb-mode lnsLrucuons: -m-Lhumb-lnLerwork uefaulL ls -mno-Lhumb-lnLerwork 1he xeno Cuesuon - So how can we Lell Lhe dlerence? Menuoned ln Lhe A1CS manual (lncluded ln Lhe references) 1he LS8 (rlghLmosL blL) of branch address has Lo be 1 lf Lhe lnsLrucuons aL LhaL address are Lo be lnLerpreLed as 1humb2 lf you wanL Lo [ump Lo address conLalnlng a mlx of 16-blL and 32-blL lnsLrucuons make sure Lhe address ls odd. 108 Pow does 1humb mode dlerenuaLe b/w 16-blL and 32-blL lnsLrucuons? ln 1humb mode A8M processor only reads halfword-allgned halfwords Looks aL lnsLrucuon encodlng: lf blLs 13:11 of Lhe halfword belng decoded ls one of followlng, Lhen lL ls Lhe rsL halfword of a 32 blL lnsLrucuon 0b11101 0b11110 0b11111 CLherwlse, lL ls lnLerpreLed as 16-blL lnsLrucuon 109 A8M-1humb rocedure Call SLandard lollowed by compllers Caller saved reglsLers: 1he caller subrouune musL preserve Lhe conLenLs of 80 - 83 lf lL needs Lhem before calllng anoLher subrouune Callee saved reglsLers: 1he called subrouune musL preserve Lhe conLenLs of 84 - 811 (usually on Lhe sLack ln memory) and musL resLore Lhe values before reLurnlng (lf used). WhaL abouL lnLerrupLs? 110 A1CS '.?,9-.) O457542 O8.*,13 '73. ,5 -+. 8)7*.</). *133 9-15<1)< r13 C 1he rogram CounLer. (x86 Ll) r14 L8 1he Llnk 8eglsLer. (x86 saved Ll) r13 S 1he SLack olnLer. (x86 LS) r12 l 1he lnLra-rocedure-call scraLch reglsLer. 
r11      | v8                | Variable-register 8 / Frame Pointer (x86 EBP)
r10      | v7                | Variable-register 7 / Stack Limit
r9       | v6 / SB / TR      | Platform specific register.
r8       | v5                | Variable-register 5.
r7       | v4                | Variable-register 4. (can also be x86 EBP)
r6       | v3                | Variable-register 3.
r5       | v2                | Variable-register 2.
r4       | v1                | Variable-register 1.
r3       | a4                | Argument/scratch register 4.
r2       | a3                | Argument/scratch register 3.
r1       | a2                | Argument/result/scratch register 2.
r0       | a1                | Argument/result/scratch register 1.

ATPCS
[Figure: register file r0..r9, r10 (SL), r11 (FP), r12 (IP), r13 (SP), r14 (LR), r15 (PC) and CPSR, marked as caller saved or callee saved.]
- Stack Pointer should be the same upon callee return as it was upon callee entry
- So should the Link Register
- FP is neither mandated nor precluded from use. If it is used, it must be callee saved. In ARM state, R11 is used. In Thumb state, R4-R7 can be used.

ATPCS in action

The C source:

    int main(void)
    {
        one();
        return 0;
    }

    void zero(void)
    {
        return;
    }

What happens at each step:

    int main(void)
    {
        /* r0-r3 saved. */
        /* call to one() made. */
    }

    void one(void)
    {
        /* r4-r11 saved. */
        /* lr saved */
        /* fp saved to point to one above lr in stack */
        /* use r0-r3 as arguments */
        two();
        /* r4-r11, lr restored */
        /* bx lr (branches to lr) */
    }
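Going back to the Thumb rules stated earlier (bits 15:11 select 16-bit versus 32-bit decoding, and the low bit of a branch target selects the state), both are small enough to check in plain C. The sketch below is illustrative only: thumb_is_32bit, bx_decode and branch_target are names invented here, not part of any ARM library.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if hw is the first halfword of a 32-bit Thumb-2 instruction,
 * i.e. bits 15:11 are 0b11101, 0b11110 or 0b11111. */
bool thumb_is_32bit(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Result of an interworking branch such as BX Rm. */
typedef struct {
    uint32_t pc;     /* Rm with the low bit cleared */
    bool     thumb;  /* low bit of Rm: true means Thumb state */
} branch_target;

/* Split a BX/BLX register operand into target address and state. */
branch_target bx_decode(uint32_t rm)
{
    branch_target t;
    t.pc = rm & 0xFFFFFFFEu;
    t.thumb = (rm & 1u) != 0;
    return t;
}
```

Feeding it halfwords from the Example 13.s listing gives the expected split: 0xe92d and 0xf8df start 32-bit instructions, while 0x4606, 0xd009 and 0xbf00 are complete 16-bit instructions.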
So, how does this stack up? (pun intended)
[Stack diagram drawn against an "Increasing Memory" arrow. The main() "frame" holds local variables, caller-save registers and args to one(); the remaining slots are undefined.]

Branch with Link occurs to one()
[Stack diagram unchanged: only the main() "frame" is populated.]
- Processor copies PC into LR
- Sets PC = one() in memory

ARM now executing first instruction in one()
[Stack diagram: main() "frame", then the start of the one() "frame".]
- Callee-save registers are pushed onto stack using STMFD sp!, {registers} along with R14 (LR)
- Local variables are also added to the stack
- The one() "frame" now holds: callee-save registers, LR = PC of main(), local variables

PC now about to branch to two()
- Caller-save registers for one() are saved
- Arguments to two() are also pushed
- The one() "frame" now holds: callee-save registers, LR = PC of main(), local variables, caller-save registers, args to two()

Branch with Link occurs to two()
[Stack diagram: main() "frame", one() "frame", and an empty two() "frame".]
- Processor copies PC into LR
- Sets PC = two() in memory

ARM now executes first instruction in two()
- Saves the callee-save registers
- Also saves the R14 (Link Register)
- The two() "frame" now holds: callee-save registers, LR = PC of one()

So, how did it stack up?
- Similar to x86 in some ways
- However, R11 (FP) is not really used much
- SP is updated using STMFD and LDMFD
- Despite the return address being saved in the LR, most often it is put on the stack and then restored later directly into PC
- Which may help you in Lab 3

Current Program Status Register
[Figure: CPSR bit layout. Bit 31 = N, 30 = Z, 29 = C, 28 = V, 27 = Q; bit 9 = E, 8 = A, 7 = I, 6 = F, 5 = T; bits 4:0 = MODE.]
N     | Negative flag
Z     | Zero flag
C     | Carry flag
V     | Overflow flag
Q     | Sticky overflow
I     | 1: Disable IRQ mode
F     | 1: Disable FIQ mode
T     | 0: ARM state, 1: Thumb state
_MODE | Mode bits

Mode [4:0] | Mode
10000      | User
10001      | FIQ
10010      | IRQ
10011      | SVC (Supervisor)
10111      | Abort
11011      | Undefined
11111      | System

Generic ARM Modes
- User: normal program execution mode
- FIQ: used for handling a high priority (fast) interrupt
- IRQ: used for handling a low priority (normal) interrupt
- Supervisor: entered on board reset and when a Software Interrupt instruction is executed
- Abort: used for handling memory access violations
- System: a privileged mode using the same registers as User mode

Banked Registers
[Figure: register banks. User/System: r0-r12, r13 (SP), r14 (LR), r15 (PC), CPSR. FIQ: its own r8-r12, r13 (SP), r14 (LR), SPSR. Each of the other exception modes (IRQ, Supervisor, Abort, Undefined): its own r13 (SP), r14 (LR), SPSR.]
- Banked registers are preserved across mode changes

Arm Processor modes
- User: normal program execution mode
- FIQ: used for handling a high priority (fast) interrupt
- IRQ: used for handling a low priority (normal) interrupt
- Supervisor: entered on reset and when SWI (software interrupt instruction) is executed
- Abort: used for handling memory access violations
- Undefined: used for handling undefined instructions
- System: a privileged mode that uses the same registers as the User mode

ARMv7 Processor modes (Table B1-1)

Processor mode   | Encoding | Privilege level | Implemented                     | Security state  | Instruction/Condition (if available)
User (usr)       | 10000    | PL0             | Always                          | Both            |
FIQ (fiq)        | 10001    | PL1             | Always                          | Both            | INTERRUPT
IRQ (irq)        | 10010    | PL1             | Always                          | Both            | INTERRUPT
Supervisor (svc) | 10011    | PL1             | Always                          | Both            | SVC/SWI
Monitor (mon)    | 10110    | PL1             | Security Extensions (TrustZone) | Secure only     | SMC/Secure Monitor Call, EXCEPTION
Abort (abt)      | 10111    | PL1             | Always                          | Both            | Data/Prefetch Abort EXCEPTION
Hyp (hyp)        | 11010    | PL2             | Virtualization Extensions       | Non-secure only | HVC/EXCEPTION
Undefined (und)  | 11011    | PL1             | Always                          | Both            | UNDEFINED
System (sys)     | 11111    | PL1             | Always                          | Both            |

Mode changing instructions
- SVC (Supervisor Call) or SWI (Software Interrupt)
  - Changes mode to Supervisor mode
- SMC (Secure Monitor Call)
  - Changes mode to Monitor mode in the Secure state (with TrustZone)
- HVC (Hypervisor Call)
  - Changes mode to Hyp mode (with hardware virtualization extensions)

Switching modes
- Specific instructions for switching between processor modes (SVC/SWI etc.)
- HVC (Hypervisor call) only available with specific hardware support
- SMC (Secure Monitor call) also only available with specific hardware support (TrustZone)
- MOVS PC, LR (copies SPSR to CPSR/APSR)
- Linux kernel and other RTOS ("rich featured" OS) run in Supervisor mode generally
- Remember the SWI from Hello World?
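The mode encodings and flag positions above can be turned into a small decoder, which is handy when reading CPSR values out of a debugger. This is a sketch under the bit positions listed on these slides; cpsr_mode_name and cpsr_bit are hypothetical helper names.

```c
#include <stdint.h>

/* Return the processor mode name for the low five bits of a CPSR value,
 * using the encodings from Table B1-1 as listed above. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x16: return "Monitor";
    case 0x17: return "Abort";
    case 0x1A: return "Hyp";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "reserved";
    }
}

/* Return a single CPSR bit: 31 = N, 30 = Z, 29 = C, 28 = V, 27 = Q,
 * 7 = I, 6 = F, 5 = T. */
unsigned cpsr_bit(uint32_t cpsr, unsigned bit)
{
    return (cpsr >> bit) & 1u;
}
```

For example, the value 0x60000010 decodes to User mode with Z and C set, and 0x000000D3 decodes to Supervisor mode with both IRQs and FIQs disabled.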
Special instructions
- SUBS PC, LR, <imm>
  - Subtracts <imm> value from LR and branches to the resulting address
  - It also copies SPSR to CPSR
- MOVS PC, LR
  - Same as above but branches to the address in LR and also copies SPSR to CPSR
- For use in returning to User/System mode from exception/interrupt modes

How to read/write Status registers
- CPSR and APSR value can be saved into a register
- MSR: Move to Special register from ARM core register
  - Example: msr <cpsr/apsr>, <r0>
- MRS: Move to ARM core register from Special register
  - Example: mrs <r0>, <cpsr/apsr>

SCTLR Register
- System Control Register: part of the Coprocessor CP15 registers
- Allows controlling system wide settings such as:
  - Mode (ARM/Thumb) for exceptions
  - Base address for exception vector table
- Not fully emulated in kvm/qemu
- Different for different processor profiles
- Controls exception handling configurations
  - Whether exceptions should be handled in ARM state or Thumb state

SCTLR Register
- These settings are only available on Cortex-R and not on any others
  - SCTLR.DZ = 0 means a Divide-by-Zero returns a zero result
  - SCTLR.DZ = 1 means a Divide-by-Zero generates an undefined instruction exception
  - IE bit gives instruction endianness as implemented and is READ ONLY

GNU Debugger (GDB) intro
- The GNU debugger is a command line debugging tool
- A graphical frontend is also available, called ddd

GNU Debugger (GDB) intro
- Start gdb using:
  - gdb <binary>
- Pass initial commands for gdb through a file:
  - gdb <binary> -x <initfile>
- For help:
  - help
- To start running the program:
  - run or r <argv>

GDB initial commands
One possible set of initial commands:

    b main
    run
    display/10i $pc
    display/x $r0
    display/x $r1
    display/x $r2
    display/x $r3
    display/x $r4
    display/x $r5
    display/x $r6
    display/x $r7
    display/x $r11
    display/32xw $sp
    display/x $cpsr

- display/{format string} <expression>: prints the expression following the command every time the debugger stops
- {format string} is made up of:
  - Count: repeat the specified number of {size} elements
  - Format: how the value is displayed: x (hexadecimal), o (octal), d (decimal), u (unsigned decimal), t (binary), f (float), a (address), i (instruction), c (char) and s (string)
  - Size letters: b (byte), h (halfword), w (word), g (giant, 8 bytes)
- These commands can be entered into the init file, and help to see the values in the registers after executing each statement or set of statements

GDB Breakpoints
- To put breakpoints (stop execution on a certain line):
  - b <function name>
  - b *<instruction address>
  - b <filename:line number>
  - b <line number>
- To show breakpoints:
  - info b
- To remove breakpoints:
  - clear <function name>
  - clear *<instruction address>
  - clear <filename:line number>
  - clear <line number>

GDB examining variables/memory
- Similar to display, to look at the contents of memory use the "examine" or "x" command
- x/32xw <memory location> to see memory contents at the memory location, showing 32 hexadecimal words
- x/3s <memory location> to show 3 strings (null terminated) at a particular memory location
- x/10i <memory location> to show 10 instructions at a particular memory location
GDB disassembly & listing things
- Can see disassembly if compiled with the gdb symbols option in gcc (-ggdb)
  - disass <function name>
- Can see breakpoints
  - info breakpoints
- Can see registers
  - info reg

GDB stepping
- To step one instruction
  - stepi or si
- To continue till next breakpoint
  - continue or c
- To see backtrace
  - backtrace or bt

Lab 2
- Use of gdb and your knowledge of ARM assembly to stop Dr. Evil
- gdb -x <initfile> bomb (can optionally specify an initial commands file using -x)
- b explode_bomb (breakpoint at explode_bomb)
- disass phase_1 (to see phase_1 code)
- info reg to see all registers
- Find the right inputs to defuse it
- GDB cheat sheet on /home/arm/Desktop
- Shift + PgUp to scroll up and Shift + PgDown to scroll down
DAY 2 PART 1

Control Flow operations (Table A4-1)

Instruction     | Description                        | Meaning
B <label>       | Branch to label                    | PC = &label
BL <label>      | Branch to label with link register | LR = & of instr. after BL instr.; PC = &label
BLX <Rm or imm> | Branch exchange with link register | LR = & of instr. after BLX instr.; PC = Rm & 0xFFFFFFFE; T bit = Rm & 1
BX <Rm>         | Branch exchange                    | PC = Rm & 0xFFFFFFFE; T bit = Rm & 1

Source: http://www.slideshare.net/guest56d1b781/arm-fundamentals

Control Flow operations (Table A4-1)

Instruction   | Description                                                                                 | Thumb mode range  | ARM mode range
B <label>     | Branch to target address                                                                    | +/- 16 MB         | +/- 32 MB
BL, BLX <imm> | Call a subroutine; call a subroutine, change instruction set                                | +/- 16 MB         | +/- 32 MB
BLX <reg>     | Call a subroutine, optionally change instruction set                                        | Any               | Any
BX            | Branch to target address, change instruction set                                            | Any               | Any
CBZ           | Compare and Branch on Zero (16-bit); permitted offsets are even from 0 - 126                | +4 to +130 bytes  | Does not exist
CBNZ          | Compare and Branch on Nonzero (16-bit); permitted offsets are even from 0 - 126             | +4 to +130 bytes  | Does not exist
TBB           | Table Branch (byte offsets) (32-bit)                                                        | 0-510 bytes       | Does not exist
TBH           | Table Branch (halfword offsets) (32-bit)                                                    | 0-131070 bytes    | Does not exist

More LDR/STR instructions
- LDRB Rd, [Rm]: load byte at memory address in Rm into Rd
- STRB Rd, [Rm]: store byte from Rd into memory address in Rm
- LDRH Rd, [Rm]: load halfword at memory address in Rm into Rd
- STRH Rd, [Rm]: store halfword from Rd into memory address in Rm
- LDRSB Rd, [Rm]: load signed byte at memory address in Rm into Rd (sign extend to 32 bits)
- LDRSH Rd, [Rm]: load signed halfword at memory address in Rm into Rd (sign extend to 32 bits)

Other "Misc." Instructions - Hints
- PLD, PLDW [<reg>, <imm>]: preload data from memory at address in <reg> with offset of <imm>
- PLI [<reg>, <imm>]: preload instructions from memory
- DMB: Data Memory Barrier, ensures order of memory operations
- DSB: Data Synchronization Barrier, ensures completion of memory access operations
- ISB: Instruction Synchronization Barrier, flushes the pipeline

More Misc. instructions
- SETEND BE/LE: sets the endianness to Big Endian or Little Endian for memory access (only applies to data)
- SRS{DA|DB|IA|IB}: Save Return State, saves the LR and SPSR of one mode onto the stack of another mode

Banked Registers
[Figure: the register banks shown earlier, repeated.]
- Banked registers are preserved across mode changes

Is timing important?
Source: http://xkcd.com/612/

PBX-A9 Memory Map
Source: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0440b/Bbajihec.html

Watchdog timer
- Usually used in embedded systems scenarios
- A hardware timer that hard resets the system when it reaches zero
- Up to the system designer to make sure the counter does not reach zero
- Timer accessible through a register
- Reset at critical code sections where deadlocks can occur
Source: http://www.eetimes.com/discussion/beginner-s-corner/4023849/Introduction-to-Watchdog-Timers

Interrupts & Watchdog timers
- Is it worth it?
- Meant mainly for RTOS
- Helps recover from an inconsistent state
- However the system designer has to specify the "consistent state"
Source: http://catless.ncl.ac.uk/Risks/19.49.html

Interrupts introduction
- Interrupts
  - can be synchronous (software generated)
  - can be asynchronous (hardware generated)
  - literally interrupt the control flow of the program
- Generated when
  - System power off/reset
  - Undefined instruction
  - Non-aligned memory access
  - Non-readable memory access
  - Page faults
  - …
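The watchdog behaviour described a few slides back (a down-counter that resets the system at zero unless software reloads it in time) can be modelled in a few lines of C. This is a toy software model, not a driver for any real watchdog peripheral; the watchdog type and the wdt_* function names are invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a watchdog: a down-counter plus a latched reset flag. */
typedef struct {
    uint32_t reload;       /* value written back on every kick */
    uint32_t count;        /* current counter value */
    bool     reset_fired;  /* set once the counter reaches zero */
} watchdog;

/* Arm the watchdog with a timeout measured in ticks. */
void wdt_init(watchdog *w, uint32_t reload)
{
    w->reload = reload;
    w->count = reload;
    w->reset_fired = false;
}

/* "Kick" the watchdog: healthy code does this before the timeout. */
void wdt_kick(watchdog *w)
{
    w->count = w->reload;
}

/* One timer tick: count down and latch the reset at zero. */
void wdt_tick(watchdog *w)
{
    if (w->reset_fired)
        return;
    if (w->count > 0)
        w->count--;
    if (w->count == 0)
        w->reset_fired = true;
}
```

A main loop that calls wdt_kick more often than every reload ticks never trips the reset; a loop stuck in a deadlock stops kicking and the flag latches.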
Interrupt handlers
- Referred to as ISR or Interrupt Service Routine
- Use masks in registers to enable/disable interrupts
- Section in memory that has addresses to ISRs is called an Interrupt Vector table (usually located at 0x00000000)
- Wire the handler by writing code directly at the location in memory, or roll your own lookup table code and insert it into the vector table

Interrupt Wiring

Exception type               | Mode       | Vector address | Priority
Reset                        | Supervisor | 0x00000000     | 1 (highest)
Data Abort                   | Abort      | 0x00000010     | 2
FIQ (Fast interrupt)         | FIQ        | 0x0000001C     | 3
IRQ (Normal interrupt)       | IRQ        | 0x00000018     | 4
Prefetch Abort               | Abort      | 0x0000000C     | 5
Software Interrupt (SWI/SVC) | Supervisor | 0x00000008     | 6
Undefined instruction        | Undefined  | 0x00000004     | 6 (lowest)

Interrupt vector table
[Figure: vector table entries at 0x00 RESET, 0x04 UNDEFINED, 0x08 SWI, 0x0C PREFETCH ABORT, 0x10 DATA ABORT, 0x14 RESERVED, 0x18 IRQ, 0x1C FIQ. The SWI entry holds an "LDR PC, [PC, #100]" style instruction that loads the SWI handler address from a slot further up in memory (shown at 0x6C/0x70), followed by "SWI Handler code here".]

Current Program Status Register
[Figure: CPSR bit layout. Bit 31 = N, 30 = Z, 29 = C, 28 = V, 27 = Q; bit 7 = I, 6 = F, 5 = T; bits 4:0 = MODE.]
I     | 1: Disable IRQ mode
F     | 1: Disable FIQ mode
T     | 0: ARM state, 1: Thumb state
_MODE | Mode bits

Mode [4:0] | Mode
10000      | User
10001      | FIQ
10010      | IRQ
10011      | SVC (Supervisor)
10111      | Abort
11011      | Undefined
11111      | System

Interrupt handlers II
- When an exception occurs, the processor
  - Copies CPSR into SPSR_<mode>
  - Changes CPSR bits to reflect the new mode and (ARM/Thumb) state
  - Disables further interrupts if appropriate
  - Stores PC + 4 (ARM) or PC + 2 (Thumb) in LR_<mode>
  - Sets PC to the address from the vector table corresponding to the exception
- When returning from an ISR
  - System developer needs to restore CPSR from SPSR_<mode>
  - Restore PC from LR_<mode>
  - Both can be done in one instruction: MOVS PC, LR

Interrupt handlers III
- When an IRQ exception occurs, only IRQs are disabled
- When an FIQ exception occurs, both IRQs and FIQs are disabled
- Generally each exception mode's LR has previous PC + 4 (except for the Data abort exception)
- Data abort exception mode's LR has previous PC + 8 (ARM & Thumb)
Source: http://www.csie.nctu.edu.tw/~wjtsai/EmbeddedSystemDesign/Ch3-1.pdf

Sample IRQ Handler

    IRQHandler:                         @ ARM mode
        SUB   lr, lr, #4                @ adjust LR_irq once so it holds the resume address
        STMFD sp!, {r0-r12, lr}         @ save working registers and the return address
        BL    ISR_IRQ                   @ go to second level IRQ handler
        LDMFD sp!, {r0-r12, pc}^        @ restore registers and return; ^ copies SPSR to CPSR

Sample FIQ Handler

    FIQHandler:
        SUB   lr, lr, #4                @ adjust LR_fiq once so it holds the resume address
        STMFD sp!, {r0-r7, lr}          @ r8-r12 are banked in FIQ mode
        @ Re-enable any interrupts needed here
        MRS   r0, CPSR
        AND   r1, r0, #0x1F             @ extract the mode bits
        CMP   r1, #0x12                 @ test for IRQ mode
        BICNE r0, r0, #0x80             @ optionally re-enable IRQs here
        MSRNE CPSR_c, r0
        @ Handle FIQ event here
        LDMFD sp!, {r0-r7, pc}^         @ restore registers and return; ^ copies SPSR to CPSR

SWI (Software interrupt) handler wiring
- Most hardware defines vector tables indexed by exception type
- SWI handler address usually at 0x08
- As was seen earlier, Linux syscalls use SWI
- SWI encoding allows for a 24-bit comment, which is generally ignored
- Can be used for differentiating b/w types of SWI
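To make the 24-bit comment field concrete, the sketch below pulls it out of an ARM-state SWI/SVC encoding (condition in bits 31:28, 0b1111 in bits 27:24, comment in bits 23:0). It assumes ARM state only; the Thumb encoding carries a smaller immediate and is not covered. The function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an ARM-state instruction word is SWI/SVC: bits 27:24 == 0b1111. */
bool is_swi_arm(uint32_t instr)
{
    return ((instr >> 24) & 0xFu) == 0xFu;
}

/* Extract the 24-bit comment field of an ARM-state SWI/SVC instruction. */
uint32_t swi_number_arm(uint32_t instr)
{
    return instr & 0x00FFFFFFu;
}

/* In ARM state LR_svc holds the address of the instruction after the SWI,
 * so the SWI instruction itself sits one word earlier. */
uint32_t swi_instr_addr_arm(uint32_t lr_svc)
{
    return lr_svc - 4u;
}
```

A first-level handler would typically read the word at swi_instr_addr_arm(lr), mask it with swi_number_arm, and pass the result to a C dispatcher.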
    void C_SWI_Handler(int swi_num, …)
    {
        switch (swi_num) {
        case 0x00: service_SWI1(); break;
        case 0x01: service_SWI2(); break;
        …
        }
    }

[Figure: SWI dispatch flow]
- The SWI instruction is encoded with the 24-bit value
- LR_<mode> holds the address of the instruction after the SWI, so the SWI instruction itself can be read back relative to LR_<mode>
- Mask that 24-bit value into r0
- Branch to SWI Handler
- Run the appropriate handler based on that value

Lab 3
- Interrupts lab
- Emulating a serial driver using UART
- In order to see something interesting in this lab, we take the input character and add 1 to it
- Modify inter.c and vectors.S files
- Add one or more lines where it says /* ADD CODE HERE */

inter.c

    void __attribute__((interrupt)) irq_handler()
    {
        /* echo the received character + 1 */
        UART0_DR = UART0_DR + 1;
    }

vectors.S

    reset_handler:
        /* set Supervisor stack */
        LDR sp, =stack_top
        /* copy vector table to address 0 */
        BL copy_vectors
        /* get Program Status Register */
        MRS r0, cpsr
        /* go in IRQ mode */
        BIC r1, r0, #0x1F
        ORR r1, r1, #0x12
        MSR cpsr, r1
        /* set IRQ stack */
        LDR sp, =irq_stack_top
        /* Enable IRQs */
        BIC r0, r0, #0x80
        /* go back in Supervisor mode */
        MSR cpsr, r0
        /* jump to main */
        BL main
        B .

Current Program Status Register
[Figure: CPSR bit layout. Bit 31 = N, 30 = Z, 29 = C, 28 = V, 27 = Q; bit 7 = I, 6 = F, 5 = T; bits 4:0 = MODE.]
I     | 1: Disable IRQ mode
F     | 1: Disable FIQ mode
T     | 0: ARM state, 1: Thumb state
_MODE | Mode bits

Mode [4:0] | Mode
10000      | User
10001      | FIQ
10010      | IRQ
10011      | SVC (Supervisor)
10111      | Abort
11011      | Undefined
11111      | System

ARM ELF Format
[Figure: ELF file layout. ELF header; .init, .text, .rodata (read-only code segment); .data, .bss (read/write data segment); .symtab, .rel.text, .rel.data, .debug, .line, .strtab (symbol table and debugging info, NOT loaded into memory); section header table.]

ARM ELF Format
- .text: has your code
- .rodata: has constants and read-only data
- .data: has your global and static variables
- .bss: contains uninitialized variables
- Heap starts after the .bss section in memory and grows towards increasing memory
- Stack starts at the opposite end and grows toward the heap

ARM ELF Format

Section   | Description
.text     | Program instructions and data
.rodata   | Read-only data like format strings for printf
.data     | Initialized global data
.bss      | Un-initialized global data
.symtab   | Symbol information such as global variables and functions
.rel.text | List of locations in .text that the linker needs to determine when combining .o files
.rel.data | Relocation information for global variables
.debug    | Debugging information (such as the one turned on with gcc -g)
.line     | Mapping b/w line numbers in C program and machine code (debug)
.strtab   | String table for symbols in .symtab and .debug

How to perform a control hijack
- We can write to the SP given a vulnerable function (strcpy or memcpy with no bounds check into a local variable)
- ATPCS, as we saw, requires args to be passed in through R0-R3
- For popping a shell we can make a system() call with arguments containing the string "/bin/sh"

ARM now executing first instruction in one()
[Stack diagram drawn against an "Increasing Memory" arrow: main() "frame" with local variables, caller-save registers and args to one(); one() "frame" with callee-save registers and LR = PC of main(); the rest undefined.]
- Callee-save registers are pushed onto stack using STMFD sp!, {registers} along with R14 (LR)
- And R11/R7/R3 (FP) can also be updated relative to (R13) SP

Itzhak Avraham's approach
- Use a return to libc style method
- We can overwrite LR in stack
- Return to a function that contains instructions to pop values from the stack into R0 (containing our "/bin/sh" address) and another LR in stack pointing to system()
- The above function that contains this code for us is erand48()
Source: https://media.blackhat.com/bh-dc-11/Avraham/BlackHat_DC_2011_Avraham-Popping_Android_Devices-Slides.pdf

Stack
[Stack diagram drawn against an "Increasing Memory" arrow.]
- Stack slots: Buf[3], register val (callee saved register(s)), LR, R0 for system(), R1 (junk), &system(), string
- Values written into them: point to erand48()+x; R0: point to /bin/sh; R1: can be junk; point to system(); junk value; put "/bin/sh" string here
- Code at erand48()+x: LDRD R0, R1; SUB SP, 12; POP PC, LR

Lab 4
- Control flow hijack lab
  - Objective: get a shell using a return to libc style attack
  - Itzhak Avraham's paper included
- Other useful links:
  - http://research.shell-storm.org/files/research-4-en.php

Lab 4 notes
- IMPORTANT:
  - echo 0 > /proc/sys/kernel/randomize_va_space
- In gdb you can breakpoint and run
  - p str // Gets address of /bin/sh string
  - p erand48 // Gets address of erand48 method
  - p system // Gets address of the system method
  - Remember to add 1 to the erand48 address (thumb2 instruction set requires LSB to be 1)
- To verify run x/s <enter address from previous>

Lab 4 notes contd.
- To craft your exploit string run:
  - perl -e 'print "ABCD"x3 . "\xAB\xCD\xDE\xEF" . "EFGH"' > solution
  - gdb ./boverflow
  - "b stage1" or whatever is in your init commands file
  - run `cat solution`

Possible Solution
- My erand48+x located at 0x76F28E56 + 1
- My system located at 0x76F2D768 + 1
- My "/bin/sh" passed in through string located at 0x7EFFF6E8
- As per the stack diagram I need "ABCD"x3 + 0x578EF276 + 0xE8F6FF7E + "EFGH" + "IJKL" + 0x69D7F276 + "/bin/sh"

DAY 2 PART 1.5

Code Optimization
- Ok, we can write assembly and C programs
- However, do we know what really happens to that C program once we give it to the compiler?
- We assume certain things happen for us
- For example, dead code is removed
- However, with great compilers comes great responsibility…

GCC Optimizations
- Can optimize for code size, memory usage
- Usually the compiler knows best, however the result can also be NOT what the system designer has in mind
- We can help the compiler decide
- For more evil C, check out http://www.steike.com/code/useless/evil-c/

    int func1(int *a, int *b)
    {
        *a += *b;
        *a += *b;
    }

    int func2(int *a, int *b)
    {
        *a += ((*b)<<1);
    }

Source: Bryant, R., O'Hallaron, D. "Computer Systems: A Programmer's Perspective"

GCC optimizations 2
- Common sub-expression elimination
- Dead code removal
  - Using #ifdefs helps the compiler eliminate dead code
- Induction variables & strength reduction
- Loop unrolling
  - Increases code size, but reduces the number of branches
- Function inlining
  - Again can reduce the number of branches
  - In C code, add inline before the function spec
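The func1/func2 pair shown earlier illustrates why the compiler may not be allowed to make an "obvious" rewrite: the two functions agree only when a and b point to different objects. The sketch below restates them with a void return type (an adjustment made here so that they compile without warnings) to show where they diverge.

```c
/* Adds *b to *a twice, re-reading *b after the first store to *a. */
void func1(int *a, int *b)
{
    *a += *b;
    *a += *b;
}

/* Adds 2 * (*b) to *a with a single read of *b. */
void func2(int *a, int *b)
{
    *a += (*b) << 1;
}
```

With distinct pointers both produce the same result. With func1(&p, &p) and p = 3, the first statement makes p = 6 and the second makes p = 12, whereas func2(&p, &p) yields 3 + 6 = 9. Because the compiler generally cannot prove that a and b never alias, it cannot turn func1 into func2 on its own; that is one place where the programmer can help the compiler decide.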
Source: http://gcc.gnu.org/onlinedocs/

ARM specific optimizations
- Use of constants using the barrel shifter:
  - Instead of 5*x, use (x<<2) + x
- Use of conditional execution to reduce code size and execution cycles
- Count down loops
  - Counting upwards produces ADD, CMP and B<cond> instructions
  - Counting downwards produces SUBS & BGE
- Use 32-bit data types as much as possible
- Avoid divisions or the remainder operation (%)
- Register accesses are more efficient than memory accesses
  - Avoid register spilling (more parameters than registers end up in memory on the stack)
- Use pure functions when possible, and only if they do not have side effects

ARM specific optimization: Count down loops

Counting up:

    int checksum(int *data)
    {
        unsigned i;
        int sum = 0;

        for (i = 0; i < 64; i++)
            sum += *data++;

        return sum;
    }

        MOV  r2, r0          ; r2 = data
        MOV  r0, #0          ; sum = 0
        MOV  r1, #0          ; i = 0
    L1  LDR  r3, [r2], #4    ; r3 = *(data++)
        ADD  r1, r1, #1      ; i = i + 1
        CMP  r1, #0x40       ; cmp r1, 64
        ADD  r0, r3, r0      ; sum += r3
        BCC  L1              ; if i < 64, goto L1
        MOV  pc, lr          ; return sum

Counting down:

    int checksum(int *data)
    {
        int i;               /* signed, so that i >= 0 can become false */
        int sum = 0;

        for (i = 63; i >= 0; i--)
            sum += *data++;

        return sum;
    }

        MOV  r2, r0          ; r2 = data
        MOV  r0, #0          ; sum = 0
        MOV  r1, #0x3f       ; i = 63
    L1  LDR  r3, [r2], #4    ; r3 = *(data++)
        ADD  r0, r3, r0      ; sum += r3
        SUBS r1, r1, #1      ; i--, set flags
        BGE  L1              ; if i >= 0, goto L1
        MOV  pc, lr          ; return sum

ARM specific optimization: 32-bit data types

    void t3(void)
    {
        char c;
        int x = 0;
        for (c = 0; c < 63; c++)
            x++;
    }

    void t4(void)
    {
        int c;
        int x = 0;
        for (c = 0; c < 63; c++)
            x++;
    }

Code generated for t3 (the char counter needs an extra AND on every iteration):

        MOV  r0, #0          ; x = 0
        MOV  r1, #0          ; c = 0
    L1  CMP  r1, #0x3f       ; cmp c with 63
        BCS  L2              ; if c >= 63, goto L2
        ADD  r0, r0, #1      ; x++
        ADD  r1, r1, #1      ; c++
        AND  r1, r1, #0xff   ; c = (char) r1
        B    L1              ; branch to L1
    L2  MOV  pc, r14

ARM specific optimization: function calls

    int test(int x)
    {
        return(square(x*x) + square(x*x));
    }

    int test(int x)
    {
        return(2*square(x*x));
    }

The following case shows square() has a side effect:

    int square(int x)
    {
        counter++; /* counter is a global variable */
        return(x*x);
    }

If there is no side effect, declare it as a pure function for the compiler to optimize (__pure is the armcc keyword; with GCC the equivalent is __attribute__((pure))):

    __pure int square(int x);
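To see why the compiler must not merge the two calls when square() touches a global, the snippet below restates the slide's functions under distinct names (test_twice and test_once are names chosen for this sketch) so both variants can live in one file.

```c
/* Global call counter, as on the slide. */
int counter = 0;

/* square() with a side effect: it bumps the global counter. */
int square(int x)
{
    counter++;
    return x * x;
}

/* Calls square() twice with the same argument. */
int test_twice(int x)
{
    return square(x * x) + square(x * x);
}

/* The "optimized" form: one call, result doubled. */
int test_once(int x)
{
    return 2 * square(x * x);
}
```

Both return the same value for the same x, but test_twice leaves counter increased by 2 and test_once by 1. Only when square() has no such side effect is the rewrite safe, which is what the pure declaration promises to the compiler.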
ARM specific optimization: code alignment
- Structure/code alignment
- 12 bytes vs. 8 bytes
- Could use the packed keyword to remove padding
- However ARM emulates unaligned load/store by using several aligned byte accesses (inefficient)

    struct {            /* 12 bytes */
        char a;
        int b;
        char c;
        short d;
    }

    struct {            /* 8 bytes */
        char a;
        char c;
        short d;
        int b;
    }

DAY 2 PART 2

Writing assembly in whatever your editor may be…
Source: http://xkcd.com/378/

Inline assembly (using butterflies)
- Follows the following form:

    asm(code : output operand list : input operand list : clobber list);

- The input/output operand list includes C and assembly variables
- Example:

    /* Rotating bits example */
    asm("mov %[result], %[value], ror #1" : [result] "=r" (y) : [value] "r" (x));

- In "=r", r is referred to as a constraint and = is referred to as a modifier
Source: http://www.ethernut.de/en/documents/arm-inline-asm.html

Possible constraints for inline assembly

Constraint | Usage in ARM state | Usage in Thumb state
f | Floating point registers f0..f7 | Not available
h | Not available | Registers r8..r15
G | Immediate floating point constant | Not available
H | Same as G, but negated | Not available
I | Immediate value in data processing instructions, e.g. ORR R0, R0, #operand | Constant in the range 0 .. 255, e.g. SWI operand
J | Indexing constants -4095 .. 4095, e.g. LDR R1, [PC, #operand] | Constant in the range -255 .. -1, e.g. SUB R0, R0, #operand
K | Same as I, but inverted | Same as I, but shifted
L | Same as I, but negated | Constant in the range -7 .. 7, e.g. SUB R0, R1, #operand
l | Same as r | Registers r0..r7, e.g. PUSH operand
M | Constant in the range of 0 .. 32 or a power of 2, e.g. MOV R2, R1, ROR #operand | Constant that is a multiple of 4 in the range of 0 .. 1020, e.g. ADD R0, SP, #operand
m | Any valid memory address | Any valid memory address
N | Not available | Constant in the range of 0 .. 31, e.g. LSL R0, R1, #operand
O | Not available | Constant that is a multiple of 4 in the range of -508 .. 508, e.g. ADD SP, #operand
r | General register r0 .. r15, e.g. SUB operand1, operand2, operand3 | Not available
w | Vector floating point registers s0 .. s31 | Not available
X | Any operand | Any operand

Source: http://www.ethernut.de/en/documents/arm-inline-asm.html

Modifiers
- = is a write-only operand, usually for all output operands
- + is a read-write operand, must be listed as an output operand
- & is a register that should be used for output only
Source: http://www.ethernut.de/en/documents/arm-inline-asm.html

Example 6.c

    0000838c <main>:
        838c: b590        push   {r4, r7, lr}
        838e: b085        sub    sp, #20
        8390: af00        add    r7, sp, #0
        8392: f04f 0306   mov.w  r3, #6
        8396: 60fb        str    r3, [r7, #12]
        8398: f3ef 8400   mrs    r4, CPSR
        839c: 60bc        str    r4, [r7, #8]
        839e: 68fa        ldr    r2, [r7, #12]
        83a0: f243 535d   movw   r3, #13661   ; 0x355d
        83a4: f6cf 73fd   movt   r3, #65533   ; 0xfffd
        83a8: 18d3        adds   r3, r2, r3
        83aa: 607b        str    r3, [r7, #4]
        83ac: f3ef 8400   mrs    r4, CPSR
        83b0: 603c        str    r4, [r7, #0]
        83b2: f248 4344   movw   r3, #33860   ; 0x8444
        83b6: f2c0 0300   movt   r3, #0
        83ba: 4618        mov    r0, r3
        83bc: 6879        ldr    r1, [r7, #4]
        ...

    #include <stdio.h>

    int main(void)
    {
        int a, b;
        int x, y;   /* APSR snapshots before and after the subtraction */
        a = 6;
        asm("mrs %[result], apsr" : [result] "=r" (x) : );
        b = a - 182947;
        asm("mrs %[result], apsr" : [result] "=r" (y) : );
        printf("a's negatory is %d\n", b);

        return 0;
    }

- Before the subtraction operation: APSR = 0x60000010
- After the subtraction operation: APSR = 0x80000010

Writing C functions in assembly
- In the C file, say it is called isawesome.c, declare the function:

    extern int mywork(int arg1, char arg2, …);

- In the assembly file include:

        .syntax unified        @ For UAL
        .arch armv7-a
        .text
        .align 2
        .thumb
        .thumb_func
        .global mywork
        .type mywork, function
        @ CODE HERE
        .size mywork, .-mywork
        .end

- In the make file use gcc -c -o mywork.o mywork.s
- Finally gcc -o awesomeprogram mywork.o isawesome.o
Source: http://omappedia.org/wiki/Writing_ARM_Assembly

Event handling
- WFE: Wait for Event, wakes up when either of the following happens:
  - SEV is called
  - A physical IRQ interrupt
  - A physical FIQ interrupt
  - A physical asynchronous abort
- SEV: Send Event
- See B1.8.13 in the manual for more details
- Used with spin-locks

Exclusive instructions
- LDREX{B|D|H} <reg1> <Rm>
  - Load exclusive from <Rm> into <reg1>
- STREX{B|D|H} <reg1> <reg2> <Rm>
  - Store exclusive from <reg2> into <Rm> and write 0 to <reg1> if successful or 1 if unsuccessful
- Both introduced since ARMv6
- SWP & SWPB: used on ARMv6 and earlier, now deprecated
  - It is read-locked-write
  - However it does not allow for operations between the read lock and the write
  - At that point you use LDREX/STREX

Exclusive instructions contd.
- No memory references allowed between LDREX and STREX instructions
- However, after starting exclusive access using LDREX, you can disengage using the CLREX instruction
- Use of DMB (Data Memory Barrier) in between exclusive accesses
  - Ensures correct ordering of memory accesses
  - Ensures all explicit memory accesses before the DMB finish or complete before any explicit memory access after the DMB instruction

Lab 5
- Atomic lab
  - Implement a simple mutex in assembly with threads in C
- Given code that uses libpthread to do threading
- Creates two threads which use dosomething() to do work
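The load-exclusive/store-exclusive retry pattern described above has a portable analogue in C11 atomics: a compare-and-exchange that only succeeds if the lock word still holds the unlocked value. On ARMv7 a compiler typically lowers this to an LDREX/STREX loop with barriers, though that mapping is an assumption about the toolchain and not something this sketch depends on. The names toy_mutex, toy_trylock, toy_lock and toy_unlock are invented for this example and are not the lab's API.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { UNLOCKED = 0, LOCKED = 1 };

typedef atomic_int toy_mutex;

/* One attempt: succeed only if the lock word currently reads UNLOCKED.
 * This plays the role of LDREX + compare + STREX. */
bool toy_trylock(toy_mutex *m)
{
    int expected = UNLOCKED;
    return atomic_compare_exchange_strong_explicit(
        m, &expected, LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the attempt succeeds ("if not successful goto loop"). */
void toy_lock(toy_mutex *m)
{
    while (!toy_trylock(m)) {
        /* busy-wait; real code might use WFE here */
    }
}

/* Release: a plain store of the unlocked value with release ordering. */
void toy_unlock(toy_mutex *m)
{
    atomic_store_explicit(m, UNLOCKED, memory_order_release);
}
```

Two threads would wrap their shared-data updates in toy_lock(&m) ... toy_unlock(&m); the acquire/release orderings stand in for the DMB that the assembly version needs.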
Lab 5
- Pseudocode for mutex lock:
  - Load the locked value into a temp register
  - Loop:
    - LDREX from [r0] and compare to the unlocked value
    - If [r0] contents have the unlocked value
      - STREX the value in the temp register into [r0]
    - If not successful, goto loop
- To load the locked value, you can use ldr r2, =locked
- Pseudocode for mutex unlock:
  - Load the =unlocked value into a temp register
  - Store the value from the temp register into [r0]

Possible solution

        .equ locked, 1
        .equ unlocked, 0

        .global unlock_mutex
        .type unlock_mutex, function
    unlock_mutex:
        ldr r1, =unlocked
        str r1, [r0]
        bx lr
        .size unlock_mutex, .-unlock_mutex

Assembly on iPhone
- For iPhone:
  - Can use inline assembly as we saw above in Objective-C code
  - Include the assembly source file in Xcode
  - Have not experimented with Xcode and assembly
  - iPhone ABI link: http://developer.apple.com/library/ios/documentation/Xcode/Conceptual/iPhoneOSABIReference/iPhoneOSABIReference.pdf

Assembly on Android
- For Android:
  - Need to use the Android Native Development Kit (NDK)
  - Write stub code in C that calls the assembly method and uses JNI types
  - Write a make file or copy a template and include the new assembly file and the stub-code C file
  - Use the NDK tool ndk-build to build
  - In the Android application declare the method using the same signature using Java types and mark it as native
    - public native int myasmfunc(int param1);
  - Also load the assembly jni-library
    - System.loadLibrary("library-name-here");
Source: http://www.eggwall.com/2011/09/android-arm-assembly-calling-assembly.html

Summary
- We covered:
  - How boot is handled on ARM platforms
  - Some mechanics of ARM assembly and how to debug it using GDB
  - How programs are converted to assembly and run, including ATPCS, along with control flow hijack vulnerabilities
  - Other features of ARM platforms including interrupts and atomic instructions
  - How to write inline assembly in C and how to write C functions in assembly (for use in C source)