
Computer Architecture - Chapter I

Computer Abstractions and Performance


By Bao Huynh Kim Gia from USTH Learning Support
June 2025

Contents

1 Introduction to Computer Architecture
  1.1 Classes of Computers
  1.2 The PostPC Era
  1.3 Layers of Computer Operation
  1.4 Understanding Performance

2 Seven Great Ideas
  2.1 Key Ideas in Computer Architecture

3 Below Your Program
  3.1 Levels of programming

4 Under the cover
  4.1 Components of a Computer
  4.2 Touchscreen
  4.3 Through the Looking Glass
  4.4 Opening the Box
  4.5 Abstraction
  4.6 A Safe Place for Data
  4.7 Networks

5 Technologies for Building Processors and Memory
  5.1 Technology Trends
  5.2 Semiconductor technology
  5.3 Manufacturing process for integrated circuits
  5.4 IC cost

6 Performance
  6.1 Response Time and Throughput
  6.2 Relative Performance
  6.3 Measuring Execution Time
  6.4 CPU Clocking
  6.5 Instruction Count and CPI

7 The Power Wall
  7.1 Power Trend
  7.2 Problems

8 The Sea Change
  8.1 Multiprocessors
  8.2 SPEC CPU Benchmark
  8.3 SPEC Power Benchmark

9 Fallacies and Pitfalls
  9.1 Pitfalls
  9.2 Fallacy: Low Power at Idle
  9.3 Pitfall: MIPS as a Performance Metric

1 Introduction to Computer Architecture
1.1 Classes of Computers
There are several main classes of computers, each serving different needs:
• Personal Computers (PCs): General-purpose machines used for a wide range of tasks, balancing
cost and performance for individual users.
• Server Computers: Designed for high capacity, reliability, and performance. They run applications
accessed by multiple users, from small office servers to large data centers.
• Supercomputers: The most powerful servers, focused on scientific and engineering calculations. Although they offer the highest computational capability, they constitute only a small part of the market.
• Embedded Computers: The most numerous, these are hidden within other devices and often face
strict constraints on power, cost, and performance.

1.2 The PostPC Era


• Personal Mobile Devices (PMDs): Devices such as smartphones, tablets, and wearables, which
are battery-powered and connect wirelessly to the Internet.
• Cloud Computing: Services and applications increasingly delivered over the Internet using massive
“Warehouse Scale Computers” (WSCs) managed by companies such as Amazon and Google. Software
as a Service (SaaS) allows parts of applications to run in the cloud while others run on user devices.

1.3 Layers of Computer Operation


• Application Software: Written in high-level programming languages, focused on productivity and
portability.
• System Software: The operating system and compilers, which translate high-level code to machine
instructions, manage hardware resources, handle input/output, and schedule tasks.
• Hardware: The processor, memory, I/O controllers, and other physical components responsible for
executing instructions.

1.4 Understanding Performance


Performance is a central concern in computer architecture. It is affected by:
• Algorithm: Determines the total number of operations
• Programming Language, Compiler, and Architecture: Determine the number of machine instructions executed per operation
• Processor and Memory System: Determine how fast instructions are executed
• I/O System (including OS): Determines how fast I/O operations are executed

Performance is measured primarily by response time (how long it takes to do a task) and throughput
(amount of work completed per unit time). Key performance equations involve instruction count, cycles per
instruction (CPI), and clock rate.

2 Seven Great Ideas
2.1 Key Ideas in Computer Architecture
• Use abstraction to simplify design
• Make the common case fast
• Performance via parallelism
• Performance via pipelining
• Performance via prediction
• Hierarchy of memories
• Dependability via redundancy

3 Below Your Program

Figure 1: Layers of computer system abstraction

• Application Software: Written in high-level programming languages. Focused on productivity and user-level tasks.
• System Software: Includes compilers (which translate high-level code to machine code) and the operating system (which manages input/output, memory, and processes).
• Hardware: Consists of the processor, memory, and I/O controllers.

3.1 Levels of programming

• High-Level Language: Productive, portable, abstract.
• Assembly Language: Textual, 1-to-1 mapping to machine code.
• Hardware (Binary): Actual instructions/data as 0s and 1s.

4 Under the cover


4.1 Components of a Computer
• All computers have similar components, whether desktop, server, or embedded
• I/O (Input/Output) includes User-interface devices (Display, keyboard, mouse), Storage devices
(Hard disk, CD/DVD, flash) and Network adapters (For communicating with other computers).

4.2 Touchscreen
• Tablets and smartphones replace the keyboard and mouse with a touchscreen
• Resistive (pressure-based), Capacitive (current-based, multi-touch, most widely used)

* Since people are electrical conductors, if an insulator like glass is covered with a transparent conductor, touching distorts the electrostatic field of the screen, which results in a change in capacitance.

4.3 Through the Looking Glass


• Most personal mobile devices use liquid crystal displays (LCDs)
• LCDs show pixels based on frame buffer content

4.4 Opening the Box


• Chips: devices that drive our advancing technology
• CPU (central processor unit): the active part of the computer; following the instructions of a program to the letter, it adds numbers, tests numbers, signals I/O devices to activate, and so on
• Datapath: performs the arithmetic operations

• Control: sequences the datapath, memory, and I/O devices according to the wishes of the instructions
of the program
• Memory: where the programs are kept when they are running, which is built from DRAM (dynamic
random access memory) chips

• Cache memory: consists of a small, fast memory that acts as a buffer for the DRAM memory.
• SRAM (static random access memory): faster and less dense, hence more expensive than DRAM

4.5 Abstraction
• Abstraction helps us deal with complexity by hiding lower-level details
• ISA (instruction set architecture) is the hardware/software interface, includes anything programmers
need to know to make a binary machine language program work correctly

• ABI (application binary interface) is the ISA and the operating system interface
• Implementation is the hardware that obeys the architecture abstraction

4.6 A Safe Place for Data


• Volatile main memory: Loses instructions and data when power off
• Non-volatile secondary memory: Magnetic disk, Flash memory, Optical disk (CDROM, DVD)

4.7 Networks
• Communication: Information is exchanged between computers at high speeds.
• Resource sharing: Rather than each computer having its own I/O devices, computers on the network
can share I/O devices.

• Nonlocal access: By connecting computers over long distances, users need not be near the computer
they are using.
• Local area network (LAN): Ethernet
• Wide area network (WAN): the Internet

• Wireless network: WiFi, Bluetooth

5 Technologies for Building Processors and Memory


5.1 Technology Trends
Electronics technology continues to evolve:
• Increased capacity and performance
• Reduced cost

Figure 2: Relative performance per unit cost of technologies used in computers over time

Figure 3: Growth of capacity per DRAM chip over time

5.2 Semiconductor technology


• Semiconductor: Silicon (a substance found in sand which does not conduct electricity well)
• Add materials to transform properties: conductors, insulators, and switches

5.3 Manufacturing process for integrated circuits


• Die: The individual rectangular sections that are cut from a wafer
• Yield: The percentage of working dies from the total number of dies on the wafer

Figure 4: The chip manufacturing process

After being sliced from the silicon ingot, blank wafers are put through 20 to 40 steps to create patterned
wafers. These patterned wafers are then tested with a wafer tester, and a map of the good parts is made.
Then, the wafers are diced into dies. In this figure, one wafer produced 20 dies, of which 17 passed testing.
(X means the die is bad.) The yield of good dies in this case was 17/20, or 85%. These good dies are then
bonded into packages and tested one more time before shipping the packaged parts to customers. One bad
packaged part was found in this final test.

5.4 IC cost
Depending on the defect rate and the size of the die and wafer, costs are generally not linear in the die area.
• Wafer cost and area are fixed
• Defect rate determined by manufacturing process
• Die area determined by architecture and circuit design

Cost per die = Cost per wafer / (Dies per wafer × Yield)

Dies per wafer ≈ Wafer area / Die area

Yield = 1 / [1 + (Defects per area × Die area / 2)]²
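As a minimal sketch of these formulas in Python (the wafer cost, wafer area, die area, and defect rate below are invented for illustration; the notes give no real values):

```python
# Hypothetical numbers for illustration only; not real process data.
wafer_cost = 5000.0        # dollars per wafer (assumed)
wafer_area = 70000.0       # mm^2, roughly a 300 mm wafer (assumed)
die_area = 100.0           # mm^2 (assumed)
defects_per_area = 0.02    # defects per mm^2 (assumed)

dies_per_wafer = wafer_area / die_area                           # approximation from the notes
yield_rate = 1.0 / (1.0 + defects_per_area * die_area / 2) ** 2  # yield formula from the notes
cost_per_die = wafer_cost / (dies_per_wafer * yield_rate)

print(f"Dies per wafer ~ {dies_per_wafer:.0f}")   # ~700
print(f"Yield          ~ {yield_rate:.2%}")       # 25.00%
print(f"Cost per die   ~ ${cost_per_die:.2f}")    # ~$28.57
```

Note how the squared term in the yield formula makes cost grow faster than linearly in die area.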

6 Performance
6.1 Response Time and Throughput
• Response time (Execution time): The total time required for the computer to complete a task
• Throughput: The total work done per unit time

Question: Do the following changes to a computer system increase throughput, decrease response time, or
both?

• Replacing the processor in a computer with a faster version
• Adding additional processors to a system that uses multiple processors for separate tasks (for example, searching the web)

Answer:

• In case 1, both response time and throughput improve
• In case 2, no single task gets done faster, so only throughput increases

6.2 Relative Performance


To maximize performance, we want to minimize response time or execution time for some task. We can relate performance and execution time for a computer X:

Performance_X = 1 / Execution time_X

This means that for two computers X and Y, if the performance of X is n times faster than the performance of Y, or the execution time on Y is n times as long as it is on X, we have:

Performance_X / Performance_Y = Execution time_Y / Execution time_X = n

Question: If computer A runs a program in 10 seconds and computer B runs the same program in 15
seconds, how much faster is A than B?
Answer:
We know that A is n times as fast as B if

Performance_A / Performance_B = Execution time_B / Execution time_A = n

Thus the performance ratio is 15 / 10 = 1.5, and A is therefore 1.5 times as fast as B.

6.3 Measuring Execution Time


• Elapsed time: The total time to complete a task, including disk accesses, memory accesses, input/output (I/O) activities, and operating system overhead; it determines system performance

• CPU time (CPU execution time): The time the CPU spends computing for this task, not including time spent waiting for I/O or running other programs. It can be divided into user CPU time and system CPU time. Different programs are affected differently by CPU and system performance

6.4 CPU Clocking


• Operation of digital hardware governed by a constant-rate clock
• Clock period: duration of a clock cycle

• Clock frequency (rate): cycles per second

Performance is improved by:

• Reducing the number of clock cycles
• Increasing the clock rate

Hardware designers must often trade off clock rate against cycle count.

A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time:

CPU execution time for a program = CPU clock cycles for a program × Clock cycle time

Alternatively, because clock rate and clock cycle time are inverses,
CPU execution time for a program = CPU clock cycles for a program / Clock rate

Question: Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?
Answer:
Let's first find the number of clock cycles required for the program on A:

CPU time_A = CPU clock cycles_A / Clock rate_A

10 seconds = CPU clock cycles_A / (2 × 10^9 cycles/second)

CPU clock cycles_A = 10 seconds × 2 × 10^9 cycles/second = 20 × 10^9 cycles

CPU time for B can be found using this equation:

CPU time_B = 1.2 × CPU clock cycles_A / Clock rate_B

6 seconds = (1.2 × 20 × 10^9 cycles) / Clock rate_B

Clock rate_B = (1.2 × 20 × 10^9 cycles) / 6 seconds = (4 × 10^9 cycles) / 1 second = 4 GHz

To run the program in 6 seconds, B must have twice the clock rate of A.
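The same arithmetic as a quick Python sanity check, using only the numbers already in the example:

```python
# Values from the worked example above.
time_a = 10.0            # seconds on computer A
clock_rate_a = 2e9       # Hz
cycle_ratio = 1.2        # B needs 1.2x as many cycles as A
target_time_b = 6.0      # seconds

cycles_a = time_a * clock_rate_a                    # 20e9 cycles on A
clock_rate_b = cycle_ratio * cycles_a / target_time_b

print(f"Required clock rate for B: {clock_rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```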

6.5 Instruction Count and CPI


• Instruction Count for a program: Determined by program, ISA and compiler
• Clock cycles per instruction (CPI): Average number of clock cycles per instruction for a program
or program fragment.
• Average CPI: Determined by CPU hardware; if different instructions have different CPIs, then the average CPI is affected by the instruction mix

CPU clock cycles = Instructions for a program × Average clock cycles per instruction

Clock Cycles = Instruction Count × Cycles per Instruction

CPU Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate
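As a small illustration, the performance equation can be wrapped in a function (the program parameters below are hypothetical):

```python
# The classic performance equation as a function; result in seconds.
def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 10 billion instructions, CPI 1.5, 3 GHz clock.
print(f"{cpu_time(10e9, 1.5, 3e9):.1f} s")  # 5.0 s
```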

Question: Suppose we have two implementations of the same instruction set architecture. Computer A has
a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of
500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program and by how much?
Answer:
We know that each computer executes the same number of instructions for the program; let’s call this number
I. First, find the number of processor clock cycles for each computer:

CPU clock cyclesA = I × 2.0


CPU clock cyclesB = I × 1.2
Now we can compute the CPU time for each computer:

CPU timeA = CPU clock cyclesA × Clock cycle time


= I × 2.0 × 250 ps = 500 × I ps
Likewise, for B:

CPU timeB = I × 1.2 × 500 ps = 600 × I ps


Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:
CPU performanceA Execution timeB 600 × I ps
= = = 1.2
CPU performanceB Execution timeA 500 × I ps

10
We can conclude that computer A is 1.2 times as fast as computer B for this program.
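A quick Python check of this comparison, using only the CPIs and cycle times from the question (the instruction count I cancels out of the ratio, so any positive value works):

```python
# CPU time = instruction count x CPI x clock cycle time (in ps).
I = 1.0                      # arbitrary; cancels in the ratio
cpu_time_a = I * 2.0 * 250   # 500 * I ps
cpu_time_b = I * 1.2 * 500   # 600 * I ps

print(f"A is {cpu_time_b / cpu_time_a:.1f}x as fast as B")  # 1.2x
```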
If different instruction classes take different numbers of cycles:

Clock Cycles = Σ_{i=1}^{n} (CPI_i × Instruction Count_i)

Weighted average CPI:

CPI = Clock Cycles / Instruction Count = Σ_{i=1}^{n} (CPI_i × Instruction Count_i / Instruction Count)
Question: A compiler designer is trying to decide between two code sequences for a particular computer.
The hardware designers have supplied the following facts:

CPI for each instruction class:

                         A    B    C
            CPI          1    2    3

For a particular high-level language statement, the compiler writer is considering two code sequences that require the following instruction counts:

Instruction counts for each instruction class:

        Code sequence    A    B    C
              1          2    1    2
              2          4    1    1

Which code sequence executes the most instructions? Which will be faster? What is the CPI for each
sequence?
Answer:
Sequence 1 executes 2 + 1 + 2 = 5 instructions. Sequence 2 executes 4 + 1 + 1 = 6 instructions. Therefore,
sequence 1 executes fewer instructions.
We can use the equation for CPU clock cycles based on instruction count and CPI to find the total number
of clock cycles for each sequence:
CPU clock cycles = Σ_{i=1}^{n} (CPI_i × C_i)

This yields

CPU clock cycles_1 = (2 × 1) + (1 × 2) + (2 × 3) = 2 + 2 + 6 = 10 cycles

CPU clock cycles_2 = (4 × 1) + (1 × 2) + (1 × 3) = 4 + 2 + 3 = 9 cycles

So code sequence 2 is faster, even though it executes one extra instruction. Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI. The CPI values can be computed by

CPI = CPU clock cycles / Instruction count

CPI_1 = CPU clock cycles_1 / Instruction count_1 = 10 / 5 = 2.0

CPI_2 = CPU clock cycles_2 / Instruction count_2 = 9 / 6 = 1.5
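The same computation as a short Python sketch, using the CPI and instruction-count tables from the question:

```python
# Per-class CPIs and the two candidate code sequences from the tables above.
cpi = {"A": 1, "B": 2, "C": 3}
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}

for seq, counts in sequences.items():
    cycles = sum(cpi[cls] * n for cls, n in counts.items())  # sum of CPI_i x C_i
    instructions = sum(counts.values())
    print(f"Sequence {seq}: {instructions} instructions, {cycles} cycles, "
          f"CPI = {cycles / instructions:.1f}")
# Sequence 1: 5 instructions, 10 cycles, CPI = 2.0
# Sequence 2: 6 instructions,  9 cycles, CPI = 1.5
```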

Summary:
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI

• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc
CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

7 The Power Wall


7.1 Power Trend

Figure 5: Clock rate and Power for Intel x86 microprocessors over eight generations and 25 years

Power = Capacitive load × Voltage² × Frequency


Question: Suppose we developed a new, simpler processor that has 85% of the capacitive load of the more complex older processor. Further, assume that it has adjustable voltage so that it can reduce voltage 15% compared to the older processor, which results in a 15% shrink in frequency. What is the impact on dynamic power?
Answer:

Power_new / Power_old = ((Capacitive load × 0.85) × (Voltage × 0.85)² × (Frequency switched × 0.85)) / (Capacitive load × Voltage² × Frequency switched)

Thus the power ratio is 0.85⁴ ≈ 0.52.
Hence, the new processor uses about half the power of the old processor.
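A one-line check of the ratio in Python:

```python
# Capacitive load, voltage (squared), and frequency each scale by 0.85.
scale = 0.85
power_ratio = scale * scale**2 * scale  # = 0.85**4
print(f"Power_new / Power_old = {power_ratio:.2f}")  # ~0.52
```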

7.2 Problems
Computer designers have slammed into a power wall:
• We can't reduce voltage further
• We can't remove more heat

They needed a new way forward.

8 The Sea Change
8.1 Multiprocessors
• Multicore microprocessors, which have more than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
∗ Hardware executes multiple instructions at once
∗ Hidden from the programmer
– Hard to do
∗ Programming for performance
∗ Load balancing
∗ Optimizing communication and synchronization

8.2 SPEC CPU Benchmark


• Benchmark is a program used to measure performance (Supposedly typical of actual workload)
• Standard Performance Evaluation Corp (SPEC) develops benchmarks for CPU, I/O, Web, ...
• SPEC CPU2006
– Elapsed time to execute a selection of programs (Negligible I/O, so focuses on CPU performance)
– Normalize relative to reference machine
– Summarize as geometric mean of performance ratios (CINT2006 (integer) and CFP2006 (floating-
point))
• The formula for the geometric mean is:

Geometric mean = (Π_{i=1}^{n} Execution time ratio_i)^(1/n)
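A minimal sketch of the geometric mean in Python (the ratios are made up; real SPECratio values come from measured runs):

```python
import math

# Hypothetical execution time ratios relative to the reference machine.
ratios = [12.5, 9.8, 15.1, 11.2]

# n-th root of the product of the n ratios.
geo_mean = math.prod(ratios) ** (1 / len(ratios))
print(f"Geometric mean = {geo_mean:.2f}")
```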

8.3 SPEC Power Benchmark


SPEC Power Benchmark reports power consumption of server at different workload levels where
• Performance: ssj_ops
• Power: Watts (Joules/sec)
The formula for overall performance per watt is

overall ssj_ops per watt = (Σ_{i=0}^{10} ssj_ops_i) / (Σ_{i=0}^{10} power_i)

where ssj_ops_i is performance at each 10% increment and power_i is power consumed at each performance level.
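As a sketch, assuming eleven invented (ssj_ops, watts) measurements at the 0%, 10%, ..., 100% load levels:

```python
# All measurements below are invented purely for illustration.
ssj_ops = [0, 50_000, 100_000, 150_000, 200_000, 250_000,
           300_000, 350_000, 400_000, 450_000, 500_000]
power_w = [120, 125, 135, 145, 155, 170, 185, 200, 220, 240, 258]

# Ratio of summed throughput to summed power across the 11 levels.
overall = sum(ssj_ops) / sum(power_w)
print(f"Overall ssj_ops per watt = {overall:.0f}")
```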

9 Fallacies and Pitfalls


9.1 Pitfalls
Pitfall: Improving one aspect of a computer and expecting a proportional improvement in overall performance. The execution time of the program after making the improvement is given by the following simple equation, known as Amdahl's Law:
T_improved = T_affected / improvement factor + T_unaffected
Example: Suppose a program runs in 100 seconds on a computer, with multiply operations responsible for
80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to
run five times faster?

Execution time after improvement = 80 seconds / n + (100 − 80) seconds

Since we want the performance to be five times faster, the new execution time should be 20 seconds, giving:

20 seconds = 80 seconds / n + 20 seconds

0 = 80 seconds / n

⇒ There is no amount by which we can enhance multiply to achieve a fivefold increase in performance
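A small Python sketch of Amdahl's Law makes the limit visible, using the example's split of 80 s of multiplies and 20 s of everything else:

```python
# Amdahl's Law: T_improved = T_affected / factor + T_unaffected.
def improved_time(t_affected, t_unaffected, factor):
    return t_affected / factor + t_unaffected

for factor in (2, 10, 100, 1_000_000):
    t = improved_time(80.0, 20.0, factor)
    print(f"multiply sped up {factor:>9}x -> total {t:8.2f} s, "
          f"overall speedup {100.0 / t:.2f}x")
# The overall speedup approaches the 100/20 = 5x bound but never reaches it
# (the last line prints 5.00 only because of rounding).
```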

9.2 Fallacy: Low Power at Idle


• Look back at i7 power benchmark
– At 100% load: 258W
– At 50% load: 170W (66%)
– At 10% load: 121W (47%)
• Google data center
– Mostly operates at 10% – 50% load
– At 100% load less than 1% of the time

• Consider designing processors to make power proportional to load

9.3 Pitfall: MIPS as a Performance Metric


MIPS (Millions of Instructions Per Second): A measure of program execution speed based on the number of millions of instructions executed per second. It doesn't account for:
• Differences in ISAs between computers
• Differences in complexity between instructions

MIPS = Instruction count / (Execution time × 10^6)

By substituting for execution time, we see the relationship between MIPS, clock rate, and CPI:

MIPS = Instruction count / ((Instruction count × CPI / Clock rate) × 10^6) = Clock rate / (CPI × 10^6)
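A short sketch showing the two forms agree (all numbers hypothetical):

```python
# MIPS computed directly and via the clock rate / CPI form.
instruction_count = 20e9   # assumed
cpi = 2.0                  # assumed
clock_rate = 4e9           # Hz, assumed

execution_time = instruction_count * cpi / clock_rate    # seconds
mips_direct = instruction_count / (execution_time * 1e6)
mips_from_rate = clock_rate / (cpi * 1e6)
print(mips_direct, mips_from_rate)  # both 2000.0
```

Note that the instruction count cancels: MIPS depends only on clock rate and CPI, which is exactly why it hides differences in how much work each instruction does.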
