Chapter 1 Computer Architecture and Organization
[ECEg - 4163]
Chapter One:
Overview of Computer Architecture and
Organization
4
Cont’d...
IBM System/370 Architecture
Was introduced in 1970
Included a number of models
Could upgrade to a more expensive, faster model without having to
abandon original software
New models are introduced with improved technology, but retain the same
architecture so that the customer’s software investment is protected
Architecture has survived to this day as the architecture of IBM’s
mainframe product line
5
1.1.2 Structure and Function
A computer is a complex system; contemporary computers contain
millions of elementary electronic components.
How can one clearly describe them?
The key is to recognize the hierarchical nature of most complex systems,
including the computer [SIMO96].
A hierarchical system is a set of interrelated subsystems, each of the latter,
in turn, hierarchical in structure until we reach some lowest level of
elementary subsystem.
The hierarchical nature of complex systems is essential to both their design
and their description.
6
Cont’d…
The designer need only deal with a particular level of the system at a time.
At each level, the system consists of a set of components and their
interrelationships.
The behavior at each level depends only on a simplified, abstracted
characterization of the system at the next lower level.
At each level, the designer is concerned with structure and function:
➔
Structure: The way in which the components are interrelated.
➔
Function: The operation of each individual component as part of the
structure.
7
Cont’d…
In terms of description, we have two choices:
➔
Starting at the bottom and building up to a complete description, or
➔
Beginning with a top view and decomposing the system into its subparts.
Evidence from a number of fields suggests that the top-down approach is the
clearest and most effective.
8
Cont’d...
Function
There are four basic functions that a computer can perform:
Data processing: Data may take a wide variety of forms and the range of processing
requirements is broad
Data storage: Short-term/Long-term
Data movement
➔
Input-output (I/O) - when data are received from or delivered to a device (peripheral)
that is directly connected to the computer
➔
Data communications – when data are moved over longer distances, to or from a
remote device
Control
➔
A control unit manages the computer’s resources and orchestrates the performance of
its functional parts in response to instructions
9
Cont’d…
Operating environment (source and destination of data)
16
Cont’d…
Multicore Computer
Structure
19
1.1.3 A Brief History of Computers
The First Generation:Vacuum Tubes
Vacuum tubes were used for digital logic elements and memory
IAS computer
➔
Fundamental design approach was the stored program concept
✔
Attributed to the mathematician John von Neumann
✔
First publication of the idea was in 1945 for the EDVAC
➔
In 1946 design began at the Princeton Institute for Advanced Studies
➔
Completed in 1952
➔
Prototype of all subsequent general-purpose computers
20
Cont’d…
Figure 1.6 IAS structure
Instruction register (IR)
• Contains the 8-bit opcode of the instruction being executed
Accumulator (AC) and multiplier quotient (MQ)
• Employed to temporarily hold operands and results of ALU operations
23
Cont’d…
25
Cont’d…
Second Generation: Transistors
Smaller
Cheaper
Dissipates less heat than a vacuum tube
Is a solid state device made from silicon
Was invented at Bell Labs in 1947
It was not until the late 1950s that fully transistorized computers were
commercially available
26
Cont’d…
Table 1.2 Computer Generations
27
Cont’d…
Second Generation
Introduced:
More complex arithmetic and logic units and control units
The use of high-level programming languages
Provision of system software which provided the ability to:
➔
Load programs
➔
Move data to peripherals
➔
Use libraries that perform common computations
28
Cont’d…
30
Cont’d…
Third Generation: Integrated Circuits
1958 – the invention of the integrated circuit
Microelectronics
➔
Small electronics
The two most important members of the third generation were the IBM
System/360 and the DEC PDP-8
31
Cont’d…
33
Cont’d…
Integrated Circuits
A computer consists of gates, memory cells, and interconnections among these
elements
The gates and memory cells are constructed of simple digital electronic
components
Exploits the fact that such components as transistors, resistors, and conductors
can be fabricated from a semiconductor such as silicon
Many transistors can be produced at the same time on a single wafer of silicon
Transistors can be connected with a process of metallization to form circuits
34
Cont’d…
Packaged chip
37
Cont’d…
IBM System/360
Announced in 1964
Product line was incompatible with older IBM machines
Was the success of the decade and cemented IBM as the overwhelmingly
dominant computer vendor
The architecture remains to this day the architecture of IBM’s mainframe
computers
Was the industry’s first planned family of computers
➔
Models were compatible in the sense that a program written for one model
should be capable of being executed by another model in the series
38
Cont’d…
Family Characteristics
Similar or identical instruction set
Similar or identical operating system
Increasing speed
Increasing number of I/O ports
Increasing memory size
Increasing cost
Later Generations
LSI – Large Scale Integration
VLSI – Very Large Scale Integration
ULSI – Ultra Large Scale Integration
Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor
memory
➔
Chip was about the size of a single core
➔
Could hold 256 bits of memory
➔
Non-destructive
➔
Much faster than core
41
Cont’d…
In 1974 the price per bit of semiconductor memory dropped below the price per bit of
core memory
➔
There has been a continuing and rapid decline in memory cost accompanied by a
corresponding increase in physical memory density
➔
Developments in memory and processor technologies changed the nature of
computers in less than a decade
Since 1970 semiconductor memory has been through 13 generations
➔
1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of this
writing, 8 Gb on a single chip (1K = 2^10, 1M = 2^20, 1G = 2^30).
➔
Each generation has provided four times the storage density of the previous
generation, accompanied by declining cost per bit and declining access time.
42
Cont’d…
Microprocessors
The density of elements on processor chips continued to rise
➔
More and more elements were placed on each chip so that fewer and
fewer chips were needed to construct a single computer processor
1971 Intel developed 4004
➔
First chip to contain all of the components of a CPU on a single chip
➔
Birth of microprocessor
43
Cont’d…
1972 Intel developed 8008
➔
First 8-bit microprocessor
1974 Intel developed 8080
➔
First general purpose microprocessor
➔
Faster, with a richer instruction set and a larger addressing capability
44
Cont’d…
Table 1.3 Evolution of Intel Microprocessors (a) 1970s Processors
45
Cont’d…
Table 1.3 Evolution of Intel Microprocessors (c) 1990s Processors
46
Cont’d…
The Evolution of the Intel x86 Architecture
Two processor families are the Intel x86 and the ARM architectures
Current x86 offerings represent the results of decades of design effort on
complex instruction set computers (CISCs)
An alternative approach to processor design is the reduced instruction set
computer (RISC)
ARM architecture is used in a wide variety of embedded systems and is one
of the most powerful and best-designed RISC-based systems on the market
47
Cont’d…
Highlights of the Evolution of the Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)
8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and was used in IBM’s first personal computer (securing the success of Intel)
80286
• Extension of the 8086 enabling addressing a 16-MB memory instead of just 1 MB
80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking
80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor
48
Cont’d…
Highlights of the Evolution of the Intel Product Line:
Pentium
Intel introduced the use of superscalar techniques, which allow multiple
instructions to execute in parallel
Pentium II
Incorporated Intel MMX technology, which is designed specifically to process
video, audio, and graphics data efficiently
Pentium III
Incorporated additional floating-point instructions
Streaming SIMD Extensions (SSE)
49
Cont’d…
Highlights of the Evolution of the Intel Product Line:
Pentium 4
Includes additional floating-point and other enhancements for multimedia
Core
First Intel x86 microprocessor with a dual core (two processors implemented on a single chip)
Core 2
Extends the Core architecture to 64 bits
Core 2 Quad provides four cores on a single chip
More recent Core offerings have up to 10 cores per chip
An important addition to the architecture was the Advanced Vector Extensions
instruction set
50
Cont’d…
ARM
Refers to a processor architecture that has evolved from RISC design principles
and is used in embedded systems
Family of RISC-based microprocessors and microcontrollers designed by ARM
Holdings, Cambridge, England.
Chips are high-speed processors that are known for their small die size and low
power requirements.
Probably the most widely used embedded processor architecture and indeed the
most widely used processor architecture of any kind in the world.
Acorn RISC Machine/Advanced RISC Machine.
51
Cont’d…
ARM Products
Cortex-M
• Cortex-M0
• Cortex-M0+
• Cortex-M3
• Cortex-M4
Cortex-R
Cortex-A/Cortex-A50
52
1.2 Performance Issues
Designing for Performance
➔ Microprocessor Speed
➔ Performance Balance
53
1.2.1 Designing for Performance
The cost of computer systems continues to drop dramatically, while the performance and capacity of
those systems continue to rise equally dramatically.
Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago.
Processors are so inexpensive that we now have microprocessors we throw away.
Desktop applications that require the great power of today’s microprocessor-based systems include:
➔ Image processing
➔ Three-dimensional rendering
➔ Speech recognition
➔ Video conferencing
➔ Multimedia authoring
➔ Voice and video annotation of files
➔ Simulation modeling
Businesses are relying on increasingly powerful servers to handle transaction and database processing
and to support massive client/server networks that have replaced the huge mainframe computer centers
of yesteryear.
Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-
transaction-rate applications for a broad spectrum of clients.
54
Microprocessor Speed
Techniques built into contemporary processors include:
Pipelining: Processor moves data or instructions into a conceptual pipe with all stages of
the pipe processing simultaneously
Branch prediction: Processor looks ahead in the instruction code fetched from memory
and predicts which branches, or groups of instructions, are likely to be processed next
Superscalar execution: This is the ability to issue more than one instruction in every
processor clock cycle. (In effect, multiple parallel pipelines are used.)
Data flow analysis: Processor analyzes which instructions are dependent on each other’s
results, or data, to create an optimized schedule of instructions
Speculative execution: Using branch prediction and data flow analysis, some processors
speculatively execute instructions ahead of their actual appearance in the program
execution, holding the results in temporary locations, keeping execution engines as busy as
possible
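To make the data flow analysis idea concrete, here is a minimal, hypothetical Python sketch (not any real processor's logic): instructions are checked for register dependences, and independent instructions are grouped as if they could issue in the same clock cycle, superscalar-style.

```python
# Hypothetical illustration of data flow analysis: group register-level
# independent instructions into issue slots (superscalar-style), assuming
# a toy 3-address format (dest, src1, src2). Not a real scheduler.

def schedule(instructions):
    """Greedy grouping: an instruction may join the current issue group
    only if none of its registers are written by instructions already in it."""
    groups, current, written = [], [], set()
    for dest, src1, src2 in instructions:
        if {src1, src2} & written or dest in written:
            groups.append(current)        # a dependence was found: close the group
            current, written = [], set()
        current.append((dest, src1, src2))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

program = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r5", "r6"),   # independent of the first -> same issue group
    ("r7", "r1", "r4"),   # depends on r1 and r4 -> must wait for the next cycle
]
for cycle, group in enumerate(schedule(program)):
    print(f"cycle {cycle}: issue {group}")
```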
55
Performance Balance
Adjust the organization and architecture to compensate for the mismatch among the
capabilities of the various components
Architectural examples include:
Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather
than “deeper” and by using wide bus data paths
Change the DRAM interface to make it more efficient by including a cache or other
buffering scheme on the DRAM chip.
Reduce the frequency of memory access by incorporating increasingly complex and
efficient cache structures between the processor and main memory. This includes the
incorporation of one or more caches on the processor chip as well as on an off-chip cache
close to the processor chip.
Increase the interconnect bandwidth between processors and memory by using higher-speed
buses and a hierarchy of buses to buffer and structure data flow.
56
Cont’d…
61
Cont’d…
Many Integrated Core (MIC)
Leap in performance as well as the challenges in developing software to exploit
such a large number of cores
The multicore and MIC strategy involves a homogeneous collection of general
purpose processors on a single chip
63
1.2.3 Two Laws that Provide Insight: Amdahl’s Law and Little’s Law
Amdahl’s Law
Gene Amdahl
Deals with the potential speedup of a program using multiple processors
compared to a single processor
Illustrates the problems facing industry in the development of multi-core
machines
➔
Software must be adapted to a highly parallel execution environment to
exploit the power of parallel processing
Can be generalized to evaluate and design technical improvement in a
computer system
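For reference, the standard formulation of Amdahl’s Law (not spelled out on the slide), where f is the fraction of a program that benefits from parallelization and N is the number of processors, and of Little’s Law, named in the section title, where L is the average number of items in a queuing system, λ is the average arrival rate, and W is the average time an item spends in the system:

$$\text{Speedup} = \frac{1}{(1 - f) + \dfrac{f}{N}}, \qquad \lim_{N \to \infty} \text{Speedup} = \frac{1}{1 - f}$$

$$L = \lambda W \qquad \text{(Little's Law)}$$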
64
Cont’d…
70
1.2.5 Calculating the Mean
• Arithmetic
• Geometric
• Harmonic
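For n measurements x_1, …, x_n, the three means compared in Figure 1.19 (next slide) are defined as:

$$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \mathrm{GM} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}, \qquad \mathrm{HM} = \frac{n}{\sum_{i=1}^{n} 1/x_i}$$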
71
Cont’d…
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
MD = median
AM = arithmetic mean
GM = geometric mean
HM = harmonic mean
Figure 1.19 Comparison of Means on Various Data Sets (each set has a maximum data point value of 11)
72
Cont’d…
An Arithmetic Mean (AM) is an appropriate measure if the sum of all the measurements
is a meaningful and interesting value
The AM is a good candidate for comparing the execution time performance of several
systems
For example, suppose we were interested in using a system for large-scale simulation studies and
wanted to evaluate several alternative products. On each system we could run the simulation
multiple times with different input values for each run, and then take the average execution time
across all runs. The use of multiple runs with different inputs should ensure that the results are not
heavily biased by some unusual feature of a given input set. The AM of all the runs is a good
measure of the system’s performance on simulations, and a good number to use for system
comparison.
The AM used for a time-based variable, such as program execution time, has the
important property that it is directly proportional to the total time
➔ If the total time doubles, the mean value doubles
73
Cont’d…
Table 1.5 A Comparison of Arithmetic and Harmonic Means for Rates
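As a hedged illustration of the point behind Table 1.5 (the workload size and rates below are invented, not values from the table): taking the arithmetic mean of execution rates overstates aggregate throughput, while the harmonic mean of the rates matches the rate obtained from total work divided by total time.

```python
# Illustrative only: two benchmarks of equal instruction count run at
# different rates (e.g., MIPS). Values are invented for the example.
instructions = 1e9            # instructions per benchmark (same workload size)
rates = [100e6, 400e6]        # measured execution rates in instructions/second

times = [instructions / r for r in rates]            # per-benchmark execution times
true_rate = len(rates) * instructions / sum(times)   # total work / total time

arithmetic_mean = sum(rates) / len(rates)
harmonic_mean = len(rates) / sum(1 / r for r in rates)

print(f"AM of rates:  {arithmetic_mean / 1e6:.0f} MIPS")   # 250 MIPS (misleading)
print(f"HM of rates:  {harmonic_mean / 1e6:.0f} MIPS")     # 160 MIPS
print(f"overall rate: {true_rate / 1e6:.0f} MIPS")         # 160 MIPS (matches HM)
```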
74
Cont’d…
Table 1.6 A Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
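A property worth noting alongside the table (a fact about the means, not text from the slide): when each system’s result x_i is divided by the reference machine’s result r_i for the same benchmark, the geometric mean factors, so system rankings under the GM do not depend on which machine is chosen as the reference, whereas the arithmetic mean of normalized results offers no such guarantee:

$$\mathrm{GM}\!\left(\frac{x_1}{r_1}, \ldots, \frac{x_n}{r_n}\right) = \left(\prod_{i=1}^{n} \frac{x_i}{r_i}\right)^{1/n} = \frac{\mathrm{GM}(x_1,\ldots,x_n)}{\mathrm{GM}(r_1,\ldots,r_n)}$$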
75
Cont’d…
Table 1.7 Another Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
76
1.2.6 Benchmarks and SPEC
Benchmark Principles
Desirable characteristics of a benchmark program:
1. It is written in a high-level language, making it portable across different
machines
2. It is representative of a particular kind of programming domain or paradigm,
such as systems programming, numerical programming, or commercial
programming
3. It can be measured easily
4. It has wide distribution
77
System Performance Evaluation Corporation (SPEC)
Benchmark suite
➔ A collection of programs, defined in a high-level language
➔ Together attempt to provide a representative test of a computer in a
particular application or system programming area
SPEC
➔ An industry consortium
➔ Defines and maintains the best known collection of benchmark suites
aimed at evaluating computer systems
➔ Performance measurements are widely used for comparison and
research purposes
78
Cont’d…
SPEC CPU2006
Best known SPEC benchmark suite
Industry standard suite for processor intensive applications
Appropriate for measuring performance for applications that spend most of their
time doing computation rather than I/O
Consists of 17 floating point programs written in C, C++, and Fortran and 12
integer programs written in C and C++
Suite contains over 3 million lines of code
Fifth generation of processor intensive suites from SPEC
79
Cont’d…
Table 1.8 SPEC CPU2006
Integer Benchmarks
80
Cont’d…
Table 1.9 SPEC CPU2006
Floating-Point Benchmarks
81
Cont’d…
Terms Used in SPEC Documentation
Benchmark
➔ A program written in a high-level language that can be compiled and executed on any computer that implements the compiler
System under test
➔ This is the system to be evaluated
Reference machine
➔ This is a system used by SPEC to establish a baseline performance for all benchmarks
➔ Each benchmark is run and measured on this machine to establish a reference time for that benchmark
Base metric
➔ These are required for all reported results and have strict guidelines for compilation
Peak metric
➔ This enables users to attempt to optimize system performance by optimizing the compiler output
Speed metric
➔ This is simply a measurement of the time it takes to execute a compiled benchmark
➔ Used for comparing the ability of a computer to complete single tasks
Rate metric
➔ This is a measurement of how many tasks a computer can accomplish in a certain amount of time
➔ This is called a throughput, capacity, or rate measure
➔ Allows the system under test to execute simultaneous tasks to take advantage of multiple processors
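Tying the reference machine, speed metric, and rate metric together (this is the general SPEC scheme as commonly described; the SPEC documentation remains the authoritative source): each benchmark i yields a ratio of the reference machine’s run time to the system under test’s run time, and a suite-level result is the geometric mean of those ratios:

$$r_i = \frac{T_{\mathrm{ref},i}}{T_{\mathrm{sut},i}}, \qquad \text{overall metric} = \left(\prod_{i=1}^{n} r_i\right)^{1/n}$$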
82
Cont’d…
86
Cont’d…
At a top level, a computer consists of CPU (central processing unit),
memory, and I/O components, with one or more modules of each type.
These components are interconnected in some fashion to achieve the basic
function of the computer, which is to execute programs.
Thus, at a top level, we can characterize a computer system by describing
1. The external behavior of each component, that is, the data and control signals that it exchanges with other components, and
2. The interconnection structure and the controls required to manage the use of the interconnection structure.
87
1.3.1 Computer Components
Contemporary computer designs are based on concepts developed by John von
Neumann at the Institute for Advanced Studies, Princeton
Referred to as the von Neumann architecture and is based on three key concepts:
➔
Data and instructions are stored in a single read-write memory
➔
The contents of this memory are addressable by location, without regard to the
type of data contained there
➔
Execution occurs in a sequential fashion (unless explicitly modified) from one
instruction to the next
Hardwired program
➔
The result of the process of connecting the various components in the desired
configuration
88
Cont’d…
90
Cont’d…
Major components:
CPU
➔
Instruction interpreter
➔
Module of general-purpose arithmetic and logic functions
I/O Components
➔
Input module
✔
Contains basic components for accepting data and instructions and
converting them into an internal form of signals usable by the system
➔
Output module
✔
Means of reporting result
91
Memory address register (MAR)
• Specifies the address in memory for the next read or write
Memory buffer register (MBR)
• Contains the data to be written into memory or receives the data read from memory
I/O address register (I/OAR)
• Specifies a particular I/O device
I/O buffer register (I/OBR)
• Used for the exchange of data between an I/O module and the CPU
92
Cont’d…
95
Cont’d…
These actions fall into four categories:
Processor-memory: Data transferred from processor to memory or from
memory to processor.
Processor-I/O: Data transferred to or from a peripheral device by
transferring between the processor and an I/O module.
Data processing: The processor may perform some arithmetic or logic
operation on data.
Control: An instruction may specify that the sequence of execution be
altered.
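A minimal fetch-decode-execute loop in Python makes the fetch and execute cycles and three of the four categories (processor-memory, data processing, control) concrete. The opcodes and memory layout below are invented for illustration; they are not the machine of Figure 1.24.

```python
# Toy stored-program machine: each instruction word is (opcode, address).
# Hypothetical opcodes: "LOAD" AC <- M[addr], "ADD" AC <- AC + M[addr],
# "STORE" M[addr] <- AC, "HALT" stop. Data words are plain integers.
memory = {
    0: ("LOAD", 10),    # processor-memory: fetch operand into AC
    1: ("ADD", 11),     # data processing: arithmetic on AC
    2: ("STORE", 12),   # processor-memory: write the result back
    3: ("HALT", 0),     # control: end of sequential execution
    10: 3, 11: 4, 12: 0,
}

pc, ac = 0, 0                       # program counter and accumulator
while True:
    opcode, addr = memory[pc]       # fetch cycle: read instruction at PC
    pc += 1                         # control: default is sequential execution
    if opcode == "LOAD":
        ac = memory[addr]
    elif opcode == "ADD":
        ac += memory[addr]
    elif opcode == "STORE":
        memory[addr] = ac
    elif opcode == "HALT":
        break

print(memory[12])                   # 7: the stored sum of M[10] and M[11]
```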
96
Cont’d…
Figure 1.24 Example of Program Execution (contents of memory and registers in hexadecimal)
98
Cont’d…
100
Cont’d…
(a) No interrupts (b) Interrupts; short I/O wait (c) Interrupts; long I/O wait
= interrupt occurs during course of execution of user program
Figure 1.25 Program Flow of Control without and with Interrupts
101
Cont’d…
Processor reads an instruction or a unit of data from memory
Processor writes a unit of data to memory
Processor reads data from an I/O device via an I/O module
Processor sends data to the I/O device
An I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access
111
1.3.4 Bus Interconnection
A communication pathway connecting two or more devices
➔
Key characteristic is that it is a shared transmission medium
Signals transmitted by any one device are available for reception by all other devices
attached to the bus
➔
If two devices transmit during the same time period their signals will overlap and
become garbled
Typically consists of multiple communication lines
➔
Each line is capable of transmitting signals representing binary 1 and binary 0
Computer systems contain a number of different buses that provide pathways between
components at various levels of the computer system hierarchy
System bus
➔
A bus that connects major computer components (processor, memory, I/O)
The most common computer interconnection structures are based on the use of one or
more system buses
112
Cont’d…
Data Bus
Data lines that provide a path for moving data among system modules
May consist of 32, 64, 128, or more separate lines
The number of lines is referred to as the width of the data bus
The number of lines determines how many bits can be transferred at a time
The width of the data bus is a key factor in determining overall system
performance
113
Address Bus
Used to designate the source or destination of the data on the data bus
➔
If the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines
Width determines the maximum possible memory capacity of the system
Also used to address I/O ports
➔
The higher order bits are used to select a particular module on the bus and the lower order bits select a memory location or I/O port within the module
Control Bus
Used to control the access and the use of the data and address lines
Because the data and address lines are shared by all components, there must be a means of controlling their use
Control signals transmit both command and timing information among system modules
Timing signals indicate the validity of data and address information
Command signals specify operations to be performed
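A back-of-the-envelope sketch of how the two widths matter (the widths chosen below are examples, not a claim about any particular system): the address-bus width bounds the addressable memory, and the data-bus width fixes how many bytes move per transfer.

```python
# Example only: relate bus widths to addressable memory and per-transfer size.
address_lines = 32            # width of the address bus, in lines (bits)
data_lines = 64               # width of the data bus, in lines (bits)

max_addressable_locations = 2 ** address_lines   # distinct addresses
bytes_per_transfer = data_lines // 8             # bits moved per bus cycle / 8

print(f"{max_addressable_locations:,} addressable locations "
      f"({max_addressable_locations / 2**30:.0f} GiB if byte-addressed)")
print(f"{bytes_per_transfer} bytes moved per data-bus transfer")
```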
Cont’d…