CS60003: High Performance Computer Architecture
Memory Hierarchy Design – 1: Fundamentals
Instructor:
Prof. Rajat Subhra Chakraborty
Professor
Dept. of Computer Science and Engineering
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India 721302
E-mail: rschakraborty@cse.iitkgp.ac.in
Levels in the Memory Hierarchy
The typical levels in the hierarchy slow down and get larger as we move away from the processor for a large workstation or
small server. Embedded computers might have no disk storage and much smaller memories and caches. Increasingly, FLASH is
replacing magnetic disks, at least for first level file storage. The access times increase as we move to lower levels of the hierarchy,
which makes it feasible to manage the transfer less responsively. The implementation technology shows the typical technology used
for these functions. The access time is given in nanoseconds for typical values in 2017; these times will decrease over time.
Bandwidth is given in megabytes per second between levels in the memory hierarchy. Bandwidth for disk/FLASH storage includes
both the media and the buffered interfaces.
H&P CA:A QA (6th. Ed.)
Typical Memory Hierarchy
The levels in a typical memory hierarchy in a personal mobile
device (PMD), such as a cell phone or tablet (A), in a laptop or
desktop computer (B), and in a server (C). As we move farther away
from the processor, the memory in the level below becomes slower and
larger. Note that the time units change by a factor of 10^9 from
picoseconds to milliseconds in the case of magnetic disks and that the
size units change by a factor of 10^10 from thousands of bytes to tens of
terabytes. If we were to add warehouse-sized computers, as opposed to
just servers, the capacity scale would increase by three to six orders of
magnitude. Solid-state drives (SSDs) composed of Flash are used
exclusively in PMDs, and heavily in both laptops and desktops. In many
desktops, the primary storage system is SSD, and expansion disks are
primarily hard disk drives (HDDs). Likewise, many servers mix SSDs and
HDDs.
Processor-Memory Performance Gap
Starting with 1980 performance as a baseline, the gap in performance, measured as the difference in the time between
processor memory requests (for a single processor or core) and the latency of a DRAM access, is plotted over time. In mid-
2017, AMD, Intel and Nvidia all announced chip sets using versions of HBM technology. Note that the vertical axis must be on
a logarithmic scale to record the size of the processor-DRAM performance gap. The memory baseline is 64 KiB DRAM in 1980, with
a 1.07 per year performance improvement in latency (see Figure 2.4 on page 88). The processor line assumes a 1.25 improvement
per year until 1986, a 1.52 improvement until 2000, a 1.20 improvement between 2000 and 2005, and only small improvements in
processor performance (on a per-core basis) between 2005 and 2015. As you can see, until 2010 memory access times in DRAM
improved slowly but consistently; since 2010 the improvement in access time has reduced, as compared with the earlier periods,
although there have been continued improvements in bandwidth.
Disk Storage
• Nonvolatile, rotating magnetic storage
• Important concepts: sector, track, cylinder
Disk Access Example
Given:
512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms
controller overhead, idle disk
Average read time:
4 ms seek time
+ ½ rotation / (15,000/60 rotations per second) = 2 ms rotational latency
+ 512 B / (100 MB/s) = 0.005 ms transfer time
+ 0.2 ms controller delay
= 6.2 ms
If actual average seek time is 1 ms:
Average read time = 3.2 ms
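The same arithmetic as a minimal C sketch (values taken from this example; the program is only illustrative):

```c
#include <stdio.h>

/* Average disk read time = seek + rotational latency + transfer + controller.
   All values below are the ones given in the example on this slide. */
int main(void) {
    double seek_ms       = 4.0;                      /* average seek time        */
    double rpm           = 15000.0;
    double rotation_ms   = 0.5 * 60000.0 / rpm;      /* half a revolution: 2 ms  */
    double transfer_ms   = 512.0 / 100e6 * 1000.0;   /* 512 B at 100 MB/s        */
    double controller_ms = 0.2;

    double total_ms = seek_ms + rotation_ms + transfer_ms + controller_ms;
    printf("average read time = %.3f ms\n", total_ms);   /* ~6.2 ms */
    return 0;
}
```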
Disk Performance Issues
Manufacturers quote average seek time
Based on all possible seeks
Locality and OS scheduling lead to smaller actual average seek times
Smart disk controllers allocate physical sectors on disk
Present logical sector interface to host
SCSI, ATA, SATA
Disk drives include caches
Pre-fetch sectors in anticipation of access
Avoid seek and rotational delay
Memory Technology
Static RAM (SRAM)
0.5ns – 2.5ns, $2000 – $5000 per GB
Dynamic RAM (DRAM)
50ns – 70ns, $20 – $75 per GB
Ideal memory should have:
Access time of SRAM
Capacity and cost/GB of disk
Main Memory: DRAM
Data stored as a charge in a capacitor
Single transistor used to access the charge
Must periodically be refreshed (read content and write again)
DRAM Size
Capacity and access times for DDR SDRAMs by year of production. Access time is for a random
memory word and assumes a new row must be opened. If the row is in a different bank, we assume the
bank is precharged; if the row is not open, then a precharge is required, and the access time is longer. As
the number of banks has increased, the ability to hide the precharge time has also increased. DDR4
SDRAMs were initially expected in 2014, but did not begin production until early 2016.
DRAM Performance Improvement
Row buffer
Allows several words to be read and refreshed in parallel
Synchronous DRAM
Allows for consecutive accesses in bursts without needing to send each address
Improves bandwidth
DRAM banking:
Allows simultaneous access to multiple DRAMs
Improves bandwidth
Cache Memory
Cache memory: the level of memory closest to the CPU
SRAM-type circuits, faster than DRAM, but larger and more power-hungry!
Given accesses X1, X2, …, Xn-1, Xn, decide:
Is data present at all?
Where do we look for it?
Direct Mapped Cache
Location determined by address
Cache Block (or Cache Line): unit of data storage in cache
Direct mapped: only one choice
Cache Index = (Block address) % (No. of blocks in cache)
No. of blocks is a power of 2
Use low-order bits of block address as index
e.g. 8 blocks => lower 3 bits are index bits
Block size is a very important factor in calculations!
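A minimal C sketch of the index calculation (the 8-block figure follows the example above; the 4-byte block size and the address are assumptions chosen only for illustration):

```c
#include <stdio.h>

/* Direct-mapped placement: index = (block address) mod (number of blocks).
   With a power-of-two number of blocks, the index is just the low-order
   bits of the block address. */
#define BLOCK_BYTES 4u
#define NUM_BLOCKS  8u

int main(void) {
    unsigned addr       = 0x2Cu;                   /* example byte address */
    unsigned block_addr = addr / BLOCK_BYTES;      /* = 11                 */
    unsigned index      = block_addr % NUM_BLOCKS; /* = 3 (low 3 bits)     */
    printf("block address %u -> cache index %u\n", block_addr, index);
    return 0;
}
```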
Direct Mapped Cache Organization: Tags and Valid Bits
Assume: Block size = 1 word
Assume: 1 word = 4 bytes
2 low-order bits of the address: byte offset within the word (no separate block offset, since each block is one word)
Tag bits: uniquely identify the block
Valid bit: is the data valid?
Valid bit: 1 = present, 0 = not present
Initial value of valid bit: 0
Example: Larger Block Size
64 blocks, 16 bytes/block
To what block number does address 1200 map?
Block address = 1200/16 = 75
Block number = 75 (mod 64) = 11
Address fields: Tag = bits 31–10 (22 bits), Index = bits 9–4 (6 bits), Offset = bits 3–0 (4 bits)
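The same split done with shifts and masks, as a small C sketch (field widths as above; it just verifies the mapping of byte address 1200):

```c
#include <stdio.h>

/* 64 blocks, 16 bytes/block: 4-bit offset, 6-bit index, 22-bit tag. */
int main(void) {
    unsigned addr   = 1200;
    unsigned offset = addr & 0xF;           /* bits 3..0   */
    unsigned index  = (addr >> 4) & 0x3F;   /* bits 9..4   */
    unsigned tag    = addr >> 10;           /* bits 31..10 */
    printf("block address %u -> index %u (tag %u, offset %u)\n",
           addr / 16, index, tag, offset);  /* block address 75 -> index 11 */
    return 0;
}
```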
Cache Operation Example
Total 8 blocks in cache, 1 word/block, direct mapped
Initial state: all blocks invalid (valid bit = 0); the original slides then step through the cache contents after each access in the sequence
Block Size Considerations
Larger blocks should reduce miss rate
Due to spatial locality
But in a fixed-sized cache
Larger blocks ⇒ fewer of them
More competition ⇒ increased miss rate
Larger blocks ⇒ pollution
Larger miss penalty
Can override benefit of reduced miss rate
Early restart and critical-word-first can help
Example: Cache in Intrinsity FastMATH
Direct mapped
32-bit addresses
8-bit index => 256 blocks
Word size: 4 bytes (32 bits)
Block size: 16 words => 64 bytes (512 bits)
Cache size (data blocks only): 256 blocks × 64 bytes/block = 16 kB
Block offset (4 bits) selects one of the 16 words in a block and returns it to the processor
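A small C sketch of how a 32-bit address splits for this cache; the 18-bit tag width is derived here (32 − 8 − 4 − 2), not quoted from the slide, and the address is an arbitrary example:

```c
#include <stdio.h>

/* FastMATH-style address split: 2-bit byte offset, 4-bit word (block)
   offset, 8-bit index, remaining 18 bits of tag. */
int main(void) {
    unsigned addr     = 0x0003C0A4u;
    unsigned byte_off = addr & 0x3u;          /* bits 1..0   */
    unsigned word_off = (addr >> 2) & 0xFu;   /* bits 5..2   */
    unsigned index    = (addr >> 6) & 0xFFu;  /* bits 13..6  */
    unsigned tag      = addr >> 14;           /* bits 31..14 */
    printf("tag=%u index=%u word=%u byte=%u\n", tag, index, word_off, byte_off);
    return 0;
}
```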
Cache Misses
On cache hit, CPU proceeds normally
On cache miss
Stall the CPU pipeline
Fetch block from next level of hierarchy (~100 clock cycles penalty)
Instruction cache miss
Restart instruction fetch
Data cache miss
Complete data access
Associative Caches
Basic idea: give more options to map a memory block to the cache
Fully Associative
Allow a given block to go in any cache entry
Requires all entries to be searched at once
Tag comparator per entry => expensive
n-way Set Associative
Each set contains n entries
Block number determines set:
Set no. = (Block no.) % (No. of sets)
Search all entries in a given set at once
n tag comparators => less expensive than fully associative cache
Note: Direct mapped cache is 1-way set associative!
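A minimal C sketch of the set-number calculation; the 8-block, 2-way configuration and the block number are assumptions chosen only to illustrate the formula:

```c
#include <stdio.h>

/* n-way set-associative placement: set no. = (block no.) mod (no. of sets). */
#define TOTAL_BLOCKS 8u
#define WAYS         2u
#define NUM_SETS     (TOTAL_BLOCKS / WAYS)

int main(void) {
    unsigned block_no = 13;
    unsigned set_no   = block_no % NUM_SETS;   /* the block may occupy any of
                                                  the WAYS entries in this set */
    printf("block %u -> set %u (any of %u ways)\n", block_no, set_no, WAYS);
    return 0;
}
```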
Associative Cache Examples
Spectrum of Associativity (cache with total 8 blocks)
Index is basically set no.!
As associativity increases
=> # of sets decreases
=> # of index bits decreases
Fully associative cache: no index!
Example: 4-way Set Associative Cache Organization
How Much Associativity?
Observation: increasing associativity decreases the miss rate
But with diminishing returns!
Simulation of a system with 64 kB D-cache, block size = 16 words, SPEC2000
benchmark suite, gives miss rates:
1-way (direct mapped): 10.3%
2-way: 8.6%
4-way: 8.3%
8-way: 8.1%
Associativity > 8 is rare
Fully associative caches are usually small, and usage is limited!
Which Block to Replace?
(Replacement Policies)
Direct-mapped => no choice!
Set associative:
Prefer non-valid entry (valid bit =0), if there is one
Otherwise, have to choose among entries in the set
Three main policies: Random, Least-Recently-Used (LRU), First-In First-Out (FIFO)
Random replacement: randomly select the block to replace
Simple to implement
Uses hardware pseudo-random number generator
LRU replacement: replace the one unused for the longest time
Difficult to implement, especially for higher associativity
Usually approximations are used
FIFO: simple to implement (an approximation of LRU)
Random replacement gives approximately the same performance as LRU for high
associativity!
Associativity Example
Total no. of blocks: 4
Block address access sequence: 0, 8, 0, 6, 8
Direct mapped:
Associativity Example (contd.)
2-way set associative => # of sets = (# of blocks) / 2 = 4 / 2 = 2
LRU replacement policy
Note: set index = (block address) % (# of sets)
Associativity Example (contd.)
Fully associative: cache still has space for 4 blocks
Note: no index bits!
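The three cases above can be replayed with a small LRU cache simulator in C; this is only a sketch, but for the access sequence 0, 8, 0, 6, 8 it produces 5 misses (direct mapped), 4 misses (2-way, LRU), and 3 misses (fully associative):

```c
#include <stdio.h>

#define CAP 4                       /* total blocks in the cache */

/* LRU cache with 'sets' sets of 'ways' entries each (sets * ways == CAP).
   Returns the number of misses for the given block-address sequence. */
static int count_misses(int sets, int ways, const int *seq, int n) {
    int tag[CAP], valid[CAP], last_use[CAP];
    int miss_count = 0;
    for (int i = 0; i < CAP; i++) { valid[i] = 0; last_use[i] = -1; tag[i] = -1; }

    for (int t = 0; t < n; t++) {
        int base = (seq[t] % sets) * ways;   /* first entry of this block's set */
        int hit = -1, victim = base;
        for (int w = base; w < base + ways; w++) {
            if (valid[w] && tag[w] == seq[t]) hit = w;
            /* victim selection: prefer an empty entry, else least recently used */
            if (!valid[victim]) continue;
            if (!valid[w] || last_use[w] < last_use[victim]) victim = w;
        }
        if (hit < 0) {                       /* miss: fill or replace the victim */
            miss_count++;
            hit = victim;
            valid[hit] = 1;
            tag[hit] = seq[t];
        }
        last_use[hit] = t;                   /* update LRU information */
    }
    return miss_count;
}

int main(void) {
    const int seq[] = {0, 8, 0, 6, 8};
    const int n = 5;
    printf("direct mapped   (4 sets x 1 way):  %d misses\n", count_misses(4, 1, seq, n));
    printf("2-way set assoc (2 sets x 2 ways): %d misses\n", count_misses(2, 2, seq, n));
    printf("fully assoc     (1 set  x 4 ways): %d misses\n", count_misses(1, 4, seq, n));
    return 0;
}
```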
Types of Misses
Compulsory Miss (a.k.a. cold-start miss / first-reference miss):
The very first access to a block cannot be in the cache
Capacity Misses:
In addition to compulsory misses
The cache cannot contain all the blocks needed during execution
Repeatedly, blocks will be discarded and later retrieved after miss
Conflict Misses (a.k.a. collision misses):
In addition to compulsory and capacity misses
Occur only in direct-mapped and set-associative caches
Too many blocks (> n) map to the same set of an n-way set-associative cache
Write-Hit Policies
Simple option: on data-write hit, could just update the block in cache
But then cache and memory would be inconsistent
Write-through: also update memory
But makes each memory write take (much) longer!
e.g., if base CPI = 1, 10% of instructions are stores, write to memory takes 100
cycles
Effective CPI = 1 + 0.1×100 = 11
Solution: write buffer
Holds data waiting to be written to memory
CPU continues immediately
Only stalls on write if write buffer is already full
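The CPI arithmetic from this example, as a one-line C sketch (figures as given above):

```c
#include <stdio.h>

/* Effective CPI for write-through with no write buffer: base CPI 1,
   10% of instructions are stores, 100-cycle memory write. */
int main(void) {
    double base_cpi     = 1.0;
    double store_frac   = 0.10;
    double write_cycles = 100.0;
    printf("effective CPI = %.1f\n", base_cpi + store_frac * write_cycles); /* 11.0 */
    /* With a write buffer that rarely fills, most of this average 10-cycle
       per-instruction penalty is hidden and the CPI stays close to 1. */
    return 0;
}
```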
Write-Hit Policies (contd.)
Alternative: Write-Back: On data-write hit, just update the block in cache
Keep track of whether each block is dirty
When a dirty block is replaced
Write it back to memory
Can use a write buffer so that the new block can be read first
Step-1: Place the evicted dirty block in the write buffer
Step-2: Read the new block from memory into the cache
Step-3: Then write the buffered dirty block back to memory
More difficult to implement!
What Happens on Write-Miss?
Two options (based on cache block update policies):
Write-Allocate:
A block is allocated on a write miss
If block to be replaced is dirty, first it is written to lower levels
Similar to how read misses are handled
Subsequent writes to that block then hit in the cache
No-Write-Allocate:
Write misses do not affect the contents of cache!
Instead, the block is modified only in the lower-level memory!
Commonly used combinations of (write-hit policy, write-miss policy):
Write-Back-with-Write-Allocate
Write-Through-with-No-Write-Allocate
Write-Back-with-Write-Allocate
Observation: blocks that are written repeatedly but never read remain in the cache and keep being updated by processor writes, until they are evicted because of a conflict!
Source: Wikipedia
Write-Through-with-No-Write-Allocate
Reasoning behind this strategy: even if the block were brought into the cache on a write miss, it would still have to be written immediately to lower memory (because of write-through), so there is little to gain
Observation: blocks that are written repeatedly will keep causing write misses, until a read miss finally brings the block into the cache
Source: Wikipedia
Virtual Memory
Use main memory as a “cache” for secondary (disk) storage
Managed jointly by CPU hardware and OS
Programs share main memory
Each gets a private virtual address space holding its frequently used code and data
Protection from other programs
CPU and OS translate V.A. to P.A.
The unit of virtual memory is called a “page”
The corresponding unit of physical memory is called a “frame”
“Page table” (in main memory): contains information about V.A. to P.A. translation
“Page fault”: the required page is not in main memory => millions of clock cycles required to
handle the fault (OS-assisted)
Example: Address Translation
Assume fixed-sized (4 kB) pages, 32-bit V.A., 30-bit P.A.
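A hedged C sketch of how the bits split under these assumptions; the frame number below is a made-up stand-in for the page-table lookup, and the virtual address is arbitrary:

```c
#include <stdio.h>

/* 4 kB pages => 12-bit page offset; 32-bit VA => 20-bit virtual page number;
   30-bit PA => 18-bit physical frame number. */
int main(void) {
    unsigned va     = 0x12345678u;
    unsigned offset = va & 0xFFFu;          /* low 12 bits             */
    unsigned vpn    = va >> 12;             /* upper 20 bits           */

    unsigned pfn = 0x00321u;                /* hypothetical PTE result */
    unsigned pa  = (pfn << 12) | offset;    /* 30-bit physical address */
    printf("VA 0x%08X -> VPN 0x%05X + offset 0x%03X -> PA 0x%08X\n",
           va, vpn, offset, pa);
    return 0;
}
```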
Example: Mapping Pages to Storage
Fast Address Translation
Address translation would appear to require an extra memory reference
One to access the PTE, and then another to access the actual data in memory!
Observation: access to page tables also follows principle of locality!
So, use a fast cache of PTEs within the CPU
Called a Translation Look-aside Buffer (TLB)
Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles for miss, 0.01%–1% miss
rate
One type of cache in which a fully associative organization is common!
Misses could be handled by hardware or software
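A toy fully associative TLB lookup in C; the entry count and contents are invented for illustration, and a real miss handler would walk the page table and refill the TLB:

```c
#include <stdio.h>

/* Fully associative lookup: every entry is searched for the VPN. */
#define TLB_ENTRIES 4

struct tlb_entry { int valid; unsigned vpn; unsigned pfn; };

static struct tlb_entry tlb[TLB_ENTRIES] = {
    {1, 0x12345u, 0x00321u}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}
};

int main(void) {
    unsigned va  = 0x12345678u;
    unsigned vpn = va >> 12;                         /* 4 kB pages assumed */
    for (int i = 0; i < TLB_ENTRIES; i++) {          /* search every entry */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            unsigned pa = (tlb[i].pfn << 12) | (va & 0xFFFu);
            printf("TLB hit: PA = 0x%08X\n", pa);
            return 0;
        }
    }
    printf("TLB miss: walk the page table, refill the TLB, restart the access\n");
    return 0;
}
```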
Example: Using TLB
TLB Misses
TLB miss indicates:
Case-(1): page present in memory, but PTE not in TLB
case-(2): page not present in memory
Must recognize TLB miss before destination register is overwritten
Raise (hardware) exception
Handler copies PTE from memory to TLB
Then restarts instruction
If page is not present, page fault will occur!
Page Fault Handler
Use faulting virtual address to find PTE
Locate page on disk
Choose page to replace
If dirty, write to disk first
Read page into memory and update page table
Make process runnable again (bring to “Ready” state)
Restart from faulting instruction
TLB and Cache Interaction
Virtually-addressed caches: “Virtual cache”
Have many shortcomings (size limitations, “aliasing”, etc.)
CPU generates virtual (logical) addresses
If the L1 cache is indexed and tagged with physical addresses:
Every address must be translated before the cache lookup can begin
This makes it difficult to keep the L1 cache fast!
Alternative: index the cache with the virtual address, but tag it with the physical address
Combines the best of both worlds!
Virtually Indexed, Physically Tagged (VIPT) Cache
Very commonly adopted
Example-1: VIPT Cache
(L1, Direct Mapped, 4 bytes per word, 16 words per block, i.e. 64 bytes per block)
32-bit Virtual Addresses
32-bit Physical Addresses
Note: multiplexor to select 1
out of 16 words in a block is
not shown here!
Example-2: VIPT Cache (L1 and L2)
16 kB pages
64-byte blocks
TLB: 2^7 sets, 2-way set associative
L1: direct mapped, 2^8 blocks, 64 bytes per block => 16 kB L1
L2: 2^14 sets, 4-way set associative, 64 bytes per block => 4 MB L2
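A quick check, using the L1 parameters above, that the index and block-offset bits fit inside the page offset; this is what lets the L1 index come from the untranslated virtual address while the TLB translates the rest (only a sketch of the constraint, not of the hardware):

```c
#include <stdio.h>

/* VIPT constraint: index bits + block-offset bits <= page-offset bits.
   Figures from the example above: 16 kB pages (14-bit offset), 64-byte
   blocks (6-bit offset), 2^8-block direct-mapped L1 (8 index bits). */
int main(void) {
    int page_offset_bits  = 14;   /* 16 kB pages               */
    int block_offset_bits = 6;    /* 64-byte blocks            */
    int index_bits        = 8;    /* 256 blocks, direct mapped */

    if (index_bits + block_offset_bits <= page_offset_bits)
        printf("L1 lookup can start in parallel with the TLB access\n");
    else
        printf("index needs translated bits: raise associativity or shrink the cache\n");
    return 0;
}
```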
Managing Large V.A.: Multi-level Page Tables (AMD Opteron)
48-bit V.A. used (out of 64 address bits)
Upper 16 bits are the sign-extension of bit 47
4 levels of page tables
512 entries in each page table
Each PTE: 8 bytes
CR3 (a control register) is also known as the Page Table Base Register (PTBR)
12-bit page offset => 4 kB page size
Each page table fits in exactly one page!
Dropping the last level allows 2 MB pages => 3-level page table
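A small C sketch of how a 48-bit virtual address decomposes into the four 9-bit table indices and the 12-bit page offset in this scheme (the address value itself is an arbitrary example):

```c
#include <stdint.h>
#include <stdio.h>

/* Each 9-bit index selects one of 512 eight-byte PTEs
   (512 * 8 B = 4 kB, so each table fills exactly one page). */
int main(void) {
    uint64_t va = 0x00007F12345678ABULL;
    unsigned offset = (unsigned)(va & 0xFFF);          /* bits 11..0  */
    unsigned l1     = (unsigned)((va >> 12) & 0x1FF);  /* bits 20..12 */
    unsigned l2     = (unsigned)((va >> 21) & 0x1FF);  /* bits 29..21 */
    unsigned l3     = (unsigned)((va >> 30) & 0x1FF);  /* bits 38..30 */
    unsigned l4     = (unsigned)((va >> 39) & 0x1FF);  /* bits 47..39 */
    printf("level-4=%u level-3=%u level-2=%u level-1=%u offset=0x%03X\n",
           l4, l3, l2, l1, offset);
    return 0;
}
```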