[go: up one dir, main page]

0% found this document useful (0 votes)
4 views65 pages

02 Architecture

The lecture by Professor Ken Birman covers the evolution and architecture of modern computers, emphasizing the transition from single-core to multicore and NUMA architectures. It discusses the complexities of modern systems, the role of operating systems in managing hardware, and the impact of parallelism on software design. Key topics include the history of Intel processors, machine programming basics, and the importance of understanding architecture for effective programming.

Uploaded by

Mubarek Ramazan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views65 pages

02 Architecture

The lecture by Professor Ken Birman covers the evolution and architecture of modern computers, emphasizing the transition from single-core to multicore and NUMA architectures. It discusses the complexities of modern systems, the role of operating systems in managing hardware, and the impact of parallelism on software design. Key topics include the history of Intel processors, machine programming basics, and the importance of understanding architecture for effective programming.

Uploaded by

Mubarek Ramazan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 65

THE EVOLUTION AND ARCHITECTURE Professor Ken Birman

OF MODERN COMPUTERS CS4414 Lecture 2

CORNELL CS4414 - FALL 2021. 1


IDEA MAP FOR TODAY

Computers are multicore


Individual CPUs don’t make this NUMA Compiled languages are
NUMA machines capable
dimension obvious. The whole idea is translated to machine language.
of many forms of parallelism.
that if you don’t want to know, you can Understanding this mapping will allow us to
They are extremely complex
ignore the presence of parallelism make far more effective use of the machine.
and sophisticated.

CORNELL CS4414 - FALL 2021. 2


WHAT’S INSIDE? ARCHITECTURE = COMPONENTS
OF A COMPUTER + OPERATING SYSTEM
Registers Registers
CPU (L1 cache) CPU (L1 cache)

L2 Cache L2 Cache
Core A BIGL3PILE
Cache
OF Core
HARDWARE
Memory Bus
REQUIRING A(DRAM)
Memory Unit LOT OF
HIGHLY SKILLED
SSD CARE AND FEEDING! 100G
storage Ethernet

PCIe Bus
CORNELL CS4414 - FALL 2021. 3
WHAT’S INSIDE? ARCHITECTURE = COMPONENTS
OF A COMPUTER + OPERATING SYSTEM
Registers Registers
CPU (L1 cache) CPU (L1 cache)

L2 Cache L2 Cache
Core L3 Cache Core
Memory Bus
Memory Unit (DRAM)

SSD 100G
storage Ethernet

PCIe Bus
CORNELL CS4414 - FALL 2021. 4
WHAT’S INSIDE? ARCHITECTURE = COMPONENTS
OF A COMPUTER + OPERATING SYSTEM

Operating System Process you


launched by
File System running some
program
Network
Bash shell

Job of the operating system (e.g. Linux) is to manage the


hardware and offer easily used, efficient abstractions that hide
details where feasible
CORNELL CS4414 - FALL 2021. 5
ARCHITECTURES ARE CHANGING RAPIDLY!
As an undergraduate (in the late 1970’s) I programmed a DEC
PDP 11/70 computer:
 A CPU (~1/2 MIPS), main memory (4MB)
 A storage device (8MB rotational magnetic disk), tape drive
 I/O devices (mostly a keyboard with a printer).

At that time this cost about $100,000


CORNELL CS4414 - FALL 2021. 6
ARCHITECTURES ARE CHANGING RAPIDLY!
Bill Gates:
“640K ought to be
As an undergraduate (in the late 1970’s) I programmed a DEC
enough computer:
PDP 11/70 for anybody.”
 A CPU (~1/2 MIPS), main memory (4MB)
 A storage device (8MB rotational magnetic disk), tape drive
 I/O devices (mostly a keyboard with a printer).

At that time this cost about $100,000


CORNELL CS4414 - FALL 2021. 7
TODAY: MACHINE PROGRAMMING I: BASICS
History of Intel processors and architectures
Assembly Basics: Registers, operands, move
Arithmetic & logical operations
C/C++, assembly, machine code

CORNELL CS4414 - FALL 2021. 8


MODERN COMPUTER: DELL R-740: $2,600
2 Intel Xenon chips with 28 “hyperthreaded” cores running at 1GIPS
(clock rate is 3Ghz)

Up to 3 TB of memory, multiple levels of memory caches

All sorts of devices accessible directly or over the network

NVIDIA Tesla T4 GPU: adds $6,000, peaks at 269 TFLOPS

CORNELL CS4414 - FALL 2021. 9


One CPU core actually
MODERN COMPUTER: DELL R-740: $2,600
runs two programs at
the same time
2 Intel Xenon chips with 28 “hyperthreaded” cores running at 1GIPS
(clock rate is 3Ghz)

Up to 3 TB of memory, multiple levels of memory caches

All sorts of devices accessible directly or over the network

NVIDIA Tesla T4 GPU: adds $6,000, peaks at 269 TFLOPS

CORNELL CS4414 - FALL 2021. 10


INTEL XENON NVIDIA TESLA

The GPU has so many cores that a photo of the chip is


pointless. Instead they draw graphics like these to help
you visualize ways of using hundreds of cores to process
a tensor (the “block” in the middle) in parallel!

Each core is like a little computer, talking to the others


over an on-chip network (the CMS)
CORNELL CS4414 - FALL 2021. 11
HOW DID WE GET HERE?
In the early years of computing, we went from machines built from
distinct electronic components (earliest generations) to ones built
from integrated circuits with everything on one chip.

Quickly, people noticed that each new generation of computer


had roughly double the capacity of the previous one and could run
roughly twice as fast! Gordon Moore proposed this as a “law”.

CORNELL CS4414 - FALL 2021. 12


BUT BY 2006 MOORE’S LAW
SEEMED TO BE ENDING

CORNELL CS4414 - FALL 2021. 13


WHAT ENDED MOORE’S LAW?
To run a chip at higher and higher speeds, we If you overclock your
use a faster clock rate and keep more of the desktop this can happen…
circuitry busy.

Computing is a form of “work” and work generates heat… as


roughly the square of the clock rate.

Chips began to fail. Some would (literally) melt or catch fire!


CORNELL CS4414 - FALL 2021. 14
BUT PARALLELISM SAVED US!
A new generation of computers emerged in which we ran the
clocks at a somewhat lower speed (usually around 2 GHz, which
corresponds to about 1 billion instructions per second), but had
many CPUs in each computer.

A computer needs to have nearby memory, but applications


needed access to “all” the memory. This leads to what we call a
“non-uniform memory access behavior”: NUMA.
CORNELL CS4414 - FALL 2021. 15
MOORE’S LAW WITH NUMA

Graph from prior slide

CORNELL CS4414 - FALL 2021. 16


… MAKING MODERN MACHINES COMPLICATED!

Prior to 2006, a good program


 Used the best algorithm: computational complexity, elegance
 Implemented it in a language like C++ that offers efficiency
 Ran on one machine

But the past decade has been disruptive! Suddenly even a single
computer might have the ability to do hundreds of parallel tasks!
CORNELL CS4414 - FALL 2021. 17
THE HARDWARE SHAPES THE
APPLICATION DESIGN PROCESS

We need to ask how a NUMA architecture impacts our designs.

If not all variables are equally fast to access, how can we


“code” to achieve the fastest solution?

And how do we keep all of this hardware “optimally busy”?


CORNELL CS4414 - FALL 2021. 18
DEFINITIONS OF TERMS WE OFTEN USE
Architecture: (also ISA: instruction set architecture)
The parts of a processor design that one needs to understand for
writing correct machine/assembly code
 Examples: instruction set specification, registers
 Machine Code: Byte-level programs a processor executes
 Assembly Code: Readable text representation of machine code

CORNELL CS4414 - FALL 2021. 19


DEFINITIONS OF TERMS WE OFTEN USE
Microarchitecture: “drill down”.
Details or implementation of the architecture
 Examples: memory or cache sizes, clock speed (frequency)

Example ISAs:
 Intel: x86, IA32, Itanium, x86-64
 ARM: Used in almost all mobile phones
 RISC V: New open-source ISA
CORNELL CS4414 - FALL 2021. 20
TODAY: MACHINE PROGRAMMING I: BASICS
History of Intel processors and architectures
Assembly Basics: Registers, operands, move
Arithmetic & logical operations
C/C++, assembly, machine code

CORNELL CS4414 - FALL 2021. 21


HOW A SINGLE THREAD COMPUTES
Common way to
depict a single thread

In CS4414 we think of each computation in terms of a “thread”

A thread is a pointer into the program instructions. The CPU


loads the instruction that the “PC” points to, fetches any operands
from memory, does the action, saves the results back to memory.

Then the PC is incremented to point to the next instruction


CORNELL CS4414 - FALL 2021. 22
ASSEMBLY/MACHINE
CODE VIEW
Programmer-Visible State Memory
 PC: Program counter Byte addressable array
 Address of next instruction Code and user data
 Called “RIP” (x86-64) Stack to support procedures

 Register file
 Heavily used program data Puzzle:
 Condition codes  On a NUMA machine, a CPU is near a fast
 Store status information about most recent memory but can access all memory.
arithmetic or logical operation  How does this impact software design?
 Used for conditional branching

CORNELL CS4414 - FALL 2021. 23


ASSEMBLY/MACHINE
Example: With 6 on-board DRAM modules and 12 NUMA CPUs, each pair of
CODE VIEW
CPUs has one nearby DRAM module. Memory in that range of addresses will be
very fast. The other 5 DRAM modules are further away. Data in those address This memory is
ranges is visible and everything looks identical, but access is slower! slower to access!
Programmer-Visible State Memory
 PC: Program counter Byte addressable array Same with this one…
 Address of next instruction Code and user data
 Called “RIP” (x86-64) Stack to support procedures …

 Register file
 Heavily used program data Puzzle: …

 Condition codes  On a NUMA machine, a CPU is near a fast


 Store status information about most recent memory but can access all memory.

arithmetic or logical operation  How does this impact software design?
 Used for conditional branching

CORNELL CS4414 - FALL 2021. 24


LINUX TRIES TO HIDE MEMORY DELAYS
If it runs thread t on core k, Linux tries to allocate memory for t
(stack, malloc…) in the DRAM close to that k.

Yet all memory operations work identically even if the thread is


actually accessing some other DRAM. They are just slower.

Linux doesn’t even tell you which parts of your address space are
mapped to which DRAM units.
CORNELL CS4414 - FALL 2021. 25
(We’ll cover what we
MACHINE LANGUAGE can but probably
won’t have time for
all of this)
CORNELL CS4414 - FALL 2021. 26
THE HARDWARE UNDERSTANDS “PRIMITIVE”
DATA TYPES
“Integer” data of 1, 2, 4, or 8 bytes (SIMD vector data types of 8, 16, 32
Data values or 64 bytes)
Addresses (untyped pointers)

No aggregate types such as arrays or


Floating point data of 4, 8, or 10 structures
bytes (new: 4-bit, 8-bit, 16-bit)  Just contiguously allocated bytes in memory
 Example: Raw images are arrays in a
format defined by the camera or video,
such as RGB, jpeg, mpeg. The camera
Code: Byte sequences encoding understands the format. The host computer
series of instructions the camera is attached to just sees bytes

CORNELL CS4414 - FALL 2021. 27


THE HARDWARE UNDERSTANDS “PRIMITIVE”
DATA TYPES
“Integer” data of 1, 2, 4, or 8 bytes (SIMD vector data types of 8, 16, 32
Data values or 64 bytes)
Addresses (untyped pointers)

No aggregate types such as arrays or


Floating point data of 4, 8, or 10 structures
bytes (new: 4-bit, 8-bit, 16-bit)  Just contiguously allocated bytes in memory
 Example: Raw images are arrays in a
format defined by the camera or video,
such as RGB, jpeg, mpeg. The camera
Code: Byte sequences encoding understands the format. The host computer
series of instructions the camera is attached to just sees bytes

CORNELL CS4414 - FALL 2021. 28


X86-64 INTEGER REGISTERS

Can reference low-order 4 bytes (also low-order 1 & 2 bytes)


Not part of memory (or cache)
CORNELL CS4414 - FALL 2021. 29
SOME HISTORY: IA32 REGISTERS

CORNELL CS4414 - FALL 2021. 30


ASSEMBLY CHARACTERISTICS: OPERATIONS
Transfer data between memory and register
Load data from memory into register
Store register data into memory

Perform arithmetic function on register or memory data

Transfer control
Unconditional jumps to/from procedures
Conditional branches
Indirect branches
CORNELL CS4414 - FALL 2021. 31
Carnegie Mellon

Moving Data %rax


 Moving Data %rcx
movq Source, Dest %rdx
 Operand Types %rbx
 Immediate: Constant integer data %rsi
Example: $0x400, $-533
 %rdi
 Like C constant, but prefixed with ‘$’
%rsp
 Encoded with 1, 2, or 4 bytes
 Register: One of 16 integer registers
%rbp
 Example: %rax, %r13
%rN
 But %rsp reserved for special use
 Others have special uses for particular instructions
 Memory: 8 consecutive bytes of memory at address given by register
 Simplest example: (%rax)
Warning: Intel docs use
 Various other “addressing modes”
mov Dest, Source
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 32
Carnegie Mellon

movq Operand Combinations

Source Dest Src,Dest C/C++ Analog

Reg movq $0x4,%rax temp = 0x4;


Imm
Mem movq $-147,(%rax) *p = -147;

movq Reg movq %rax,%rdx temp2 = temp1;


Reg
Mem movq %rax,(%rdx) *p = temp;

Mem Reg movq (%rax),%rdx temp = *p;

Cannot do memory-memory transfer with a single instruction


Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 33
Carnegie Mellon

Simple Memory Addressing Modes


 Normal (R) Mem[Reg[R]]
 Register R specifies memory address
 Aha! Pointer dereferencing in C

movq (%rcx),%rax

 Displacement D(R) Mem[Reg[R]+D]


 Register R specifies start of memory region
 Constant displacement D specifies offset

movq 8(%rbp),%rdx

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 34


Carnegie Mellon

Example of Simple Addressing Modes


void
whatAmI(<type> a, <type> b)
{
????
whatAmI:
}
movq (%rdi), %rax
movq (%rsi), %rdx
movq %rdx, (%rdi)
movq %rax, (%rsi)
ret

%rsi
%rdi

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 35


Carnegie Mellon

Example of Simple Addressing Modes

void swap
(long *xp, long *yp)
{ swap:
long t0 = *xp; movq (%rdi), %rax
long t1 = *yp; movq (%rsi), %rdx
*xp = t1; movq %rdx, (%rdi)
*yp = t0; movq %rax, (%rsi)
} ret

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 36


Carnegie Mellon

Understanding swap()
Memory Addr
void swap Registers
(long *xp, long *yp) xp
{ %rdi
long t0 = *xp;
%rsi
long t1 = *yp;
*xp = t1; %rax
*yp = t0; yp
} %rdx

Register Value
%rdi xp
%rsi yp swap:
%rax t0 movq (%rdi), %rax # t0 = *xp
%rdx t1 movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 37
Carnegie Mellon

Understanding swap()
Memory
Registers Address
123 0x120
%rdi 0x120
0x118
%rsi 0x100
0x110
%rax 0x108
%rdx 456 0x100

swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 38


Carnegie Mellon

Understanding swap()
Memory
Registers Address
123 0x120
%rdi 0x120
0x118
%rsi 0x100
0x110
%rax 123 0x108
%rdx 456 0x100

swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 39


Carnegie Mellon

Understanding swap()
Memory
Registers Address
123 0x120
%rdi 0x120
0x118
%rsi 0x100
0x110
%rax 123 0x108
%rdx 456 456 0x100

swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 40


Carnegie Mellon

Understanding swap()
Memory
Registers Address
456 0x120
%rdi 0x120
0x118
%rsi 0x100
0x110
%rax 123 0x108
%rdx 456 456 0x100

swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 41


Carnegie Mellon

Understanding swap()
Memory
Registers Address
456 0x120
%rdi 0x120
0x118
%rsi 0x100
0x110
%rax 123 0x108
%rdx 456 123 0x100

swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 42


Carnegie Mellon

Simple Memory Addressing Modes


 Normal (R) Mem[Reg[R]]
 Register R specifies memory address
 Aha! Pointer dereferencing in C

movq (%rcx),%rax

 Displacement D(R) Mem[Reg[R]+D]


 Register R specifies start of memory region
 Constant displacement D specifies offset

movq 8(%rbp),%rdx

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 43


Carnegie Mellon

Complete Memory Addressing Modes


 Most General Form
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D]
 D: Constant “displacement” 1, 2, or 4 bytes
 Rb: Base register: Any of 16 integer registers
 Ri: Index register: Any, except for %rsp
 S: Scale: 1, 2, 4, or 8 (why these numbers?)

 Special Cases
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 44


Carnegie Mellon

Address Computation Examples


%rdx 0xf000

%rcx 0x0100

Expression Address Computation Address


0x8(%rdx) 0xf000 + 0x8 0xf008

(%rdx,%rcx) 0xf000 + 0x100 0xf100

(%rdx,%rcx,4) 0xf000 + 4*0x100 0xf400

0x80(,%rdx,2) 2*0xf000 + 0x80 0x1e080

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 45


Carnegie Mellon

Address Computation Examples


%rdx 0xf000

%rcx 0x0100

Expression Address Computation Address


0x8(%rdx) 0xf000 + 0x8 0xf008

(%rdx,%rcx) 0xf000 + 0x100 0xf100

(%rdx,%rcx,4) 0xf000 + 4*0x100 0xf400

0x80(,%rdx,2) 2*0xf000 + 0x80 0x1e080

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 46


Carnegie Mellon

Today: Machine Programming I: Basics


 History of Intel processors and architectures
 Assembly Basics: Registers, operands, move
 Arithmetic & logical operations
 C/C++, assembly, machine code

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 47


Carnegie Mellon

Address Computation Instruction


 leaq Src, Dst
 Src is address mode expression
 Set Dst to address denoted by expression

 Uses
 Computing addresses without a memory reference
E.g., translation of p = &x[i];

 Computing arithmetic expressions of the form x + k*y
 k = 1, 2, 4, or 8

 Example
long m12(long x)
{ Converted to ASM by compiler:
return x*12; leaq (%rdi,%rdi,2), %rax # t = x+2*x
} salq $2, %rax # return t<<2

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 48


Carnegie Mellon

Some Arithmetic Operations


 Two Operand Instructions:
Format Computation
addq Src,Dest Dest = Dest + Src
subq Src,Dest Dest = Dest − Src
imulq Src,Dest Dest = Dest * Src
shlq Src,Dest Dest = Dest << Src Synonym: salq
sarq Src,Dest Dest = Dest >> Src Arithmetic
shrq Src,Dest Dest = Dest >> Src Logical
xorq Src,Dest Dest = Dest ^ Src
andq Src,Dest Dest = Dest & Src
orq Src,Dest Dest = Dest | Src
 Watch out for argument order! Src,Dest
(Warning: very old Intel docs use “op Dest,Src”)
 No distinction between signed and unsigned int (why?)
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 49
Carnegie Mellon

Some Arithmetic Operations


 One Operand Instructions
incq Dest Dest = Dest + 1
decq Dest Dest = Dest − 1
negq Dest Dest = − Dest
notq Dest Dest = ~Dest

 See book for more instructions

 Depending how you count, there are 2,034 total x86 instructions

 (If you count all addr modes, op widths, flags, it’s actually 3,683)

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 50


Carnegie Mellon

Arithmetic Expression Example


arith:
leaq (%rdi,%rsi), %rax
long arith addq %rdx, %rax
(long x, long y, long z) leaq (%rsi,%rsi,2), %rdx
{ salq $4, %rdx
long t1 = x+y; leaq 4(%rdi,%rdx), %rcx
long t2 = z+t1; imulq %rcx, %rax
long t3 = x+4; ret
long t4 = y * 48;
long t5 = t3 + t4; Interesting Instructions
long rval = t2 * t5;  leaq: address computation
return rval;  salq: shift
}
 imulq: multiplication
 Curious: only used once…

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 51


Carnegie Mellon

Understanding Arithmetic Expression Example


arith:
leaq (%rdi,%rsi), %rax # t1
long arith addq %rdx, %rax # t2
(long x, long y, long z) leaq (%rsi,%rsi,2), %rdx
{ salq $4, %rdx # t4
long t1 = x+y; leaq 4(%rdi,%rdx), %rcx # t5
long t2 = z+t1; imulq %rcx, %rax # rval
long t3 = x+4; ret
long t4 = y * 48;
long t5 = t3 + t4; Register Use(s)
long rval = t2 * t5;
return rval; %rdi Argument x
} %rsi Argument y
%rdx Argument z,
t4
%rax t1, t2, rval
%rcx t5

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 52


Carnegie Mellon

Evolution of Intel Instruction Set


 The Intel instruction set has changed over the decades since it was first introduced.

 Intel is a believer in the “CISC” model: complex instructions that are highly optimized

 Modern example: vector parallel instructions (also called SIMD: Single instruction,
multiple data). Introduced to make the x86 more competitive with GPU accelerators
 Such as “Multiply these two vectors and put the result in this third vector”, or “sum up the elements
in this vector, and put the result here.”
 The underlying hardware uses parallel processing to do the job faster.
 The C++ compiler can recognize many of these patterns and will emit vector parallel instructions (if
the target computer supports them). You can also provide “hints” to the compiler, to do so.

 There are many more examples; we will see a few later in the semester
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 53
Carnegie Mellon

Today: Machine Programming I: Basics


 History of Intel processors and architectures
 Assembly Basics: Registers, operands, move
 Arithmetic & logical operations
 C/C++, assembly, machine code

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 54


Carnegie Mellon

Turning C/C++ into Object Code


 Code in files p1.cpp p2.c
 Compile with command: c++ pp1.cpp p2.c -o p
 There are often additional arguments such as –O3, -pg, -g…
 Put resulting binary in file p

text C/C++ program (p1.cpp p2.c)

Compiler (c++)

text Asm program (p1.s p2.s)

Assembler (c++ or as)

binary Object program (p1.o p2.o) Static libraries


(.a)
Linker (c++ or ld)

binary Executable program (p)

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 55


Carnegie Mellon

Compiling Into Assembly


C/C++ Code Generated x86-64 Assembly
long(sum.c)
plus(long x, long y); sumstore:
pushq %rbx
void sumstore(long x, long y, movq %rdx, %rbx
long *dest) call plus
{ movq %rax, (%rbx)
long t = plus(x, y); popq %rbx
*dest = t; ret
}

Obtain with command This uses the “indirect” addressing mode: dest holds
a memory address and *dest is a long integer at that
C++ sum.c address. We are using that location as a variable here!

Produces file sum.s

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 56


Carnegie Mellon

What it really looks like


.globl sumstore
.type sumstore, @function
sumstore:
.LFB35:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdx, %rbx
call plus
movq %rax, (%rbx)
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE35:
.size sumstore, .-sumstore

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 57


Carnegie Mellon

What it really looks like


.globl sumstore
Things that look weird
.type sumstore, @function and are preceded by a ‘.’
sumstore: are generally directives.
.LFB35:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16 sumstore:
.cfi_offset 3, -16 pushq %rbx
movq %rdx, %rbx
movq %rdx, %rbx
call plus
call plus
movq %rax, (%rbx)
movq %rax, (%rbx)
popq %rbx
popq %rbx
ret
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE35:
.size sumstore, .-sumstore

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 58


Carnegie Mellon

Assembly Characteristics: Data Types


 “Integer” data of 1, 2, 4, or 8 bytes
 Data values
 Addresses (untyped pointers)

 Floating point data of 4, 8, or 10 bytes

 (SIMD vector data types of 8, 16, 32 or 64 bytes)

 Code: Byte sequences encoding series of instructions

 No aggregate types such as arrays or structures


 Just contiguously allocated bytes in memory

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 59


Carnegie Mellon

Assembly Characteristics: Operations


 Transfer data between memory and register
 Load data from memory into register
 Store register data into memory

 Perform arithmetic function on register or memory data

 Transfer control
 Unconditional jumps to/from procedures
 Conditional branches

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 60


Carnegie Mellon

Object Code
Code for sumstore
 Assembler
0x0400595:
0x53
 Translates .s into .o
0x48  Binary encoding of each instruction
0x89  Nearly-complete image of executable code
0xd3
0xe8
 Missing linkages between code in different
0xf2 files
0xff  Linker
0xff
0xff  Resolves references between files
• Total of 14 bytes
0x48  Combines with static run-time libraries
0x89 • Each instruction
e.g., code for malloc, printf

0x03 1, 3, or 5 bytes
0x5b • Starts at address
 Some libraries are dynamically linked
0xc3 0x0400595  Linking occurs when program begins
execution

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 61


Carnegie Mellon

Machine Instruction Example


 C Code
*dest = t;
 Store value t where designated by
dest
 Assembly
movq %rax, (%rbx)
 Move 8-byte value to memory
Quad words in x86-64 parlance
 Operands:
t: Register %rax
dest: Register %rbx
*dest: Memory M[%rbx]
 Object Code
0x40059e: 48 89 03
 3-byte instruction
 Stored at address 0x40059e

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 62


Carnegie Mellon

Disassembling Object Code


Disassembled
0000000000400595 <sumstore>:
400595: 53 push %rbx
400596: 48 89 d3 mov %rdx,%rbx
400599: e8 f2 ff ff ff callq 400590 <plus>
40059e: 48 89 03 mov %rax,(%rbx)
4005a1: 5b pop %rbx
4005a2: c3 retq

 Disassembler
objdump –d sum
 Useful tool for examining object code
 Analyzes bit pattern of series of instructions
 Produces approximate rendition of assembly code
 Can be run on either a.out (complete executable) or .o file
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 63
Carnegie Mellon

Alternate Disassembly
Disassembled

Dump of assembler code for function sumstore:


0x0000000000400595 <+0>: push %rbx
0x0000000000400596 <+1>: mov %rdx,%rbx
0x0000000000400599 <+4>: callq 0x400590 <plus>
0x000000000040059e <+9>: mov %rax,(%rbx)
0x00000000004005a1 <+12>:pop %rbx
0x00000000004005a2 <+13>:retq

 Within gdb Debugger


 Disassemble procedure
gdb sum
disassemble sumstore

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 64


Carnegie Mellon

Warning!

 Disassembly is useful when debugging but prohibited in many situations.


A common and valid use is to understand what caused your own code
to crash. With a complex piece of code knowing the line number isn’t always enough.

 Hackers disassemble programs to look for coding errors that they can leverage to
steal passwords or even take control by sending malformed inputs.
This is why it is illegal to disassemble things like Microsoft Word.

 Cornell has harsh penalties for people who engage in hacking activities
while enrolled in the university. A hacker could be suspended or expelled!

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 65

You might also like