Module 1 & 2


Indian Institute of Information Technology Sri City

High Performance Computing

Dr. Bheemappa H
IIIT Sri City
bheemappa.h@iiit.in


Outline
2

● Course Details
● Syllabus
● Evaluation Plan
● Reference Materials
● SUPERCOMPUTERS


● What is a computer? How is data represented?

● Program
  ○ Algorithm and data structure.
  ○ Programming language: a standard notation for writing programs.
    ■ Example: C, Java, Intel assembly language.
  ○ Program translators: compiler - GCC
    ■ What does GCC do?
● Data representation
  ■ Binary
    ● Base 2 / 0-1 / bit
  ● Machine language (IA32).


Improving Computer Performance:

4

● Innovative manufacturing and advances in VLSI.
● Exploitation of some form of parallelism:
  ● ILP (instruction-level parallelism)
    ○ Pipelining (instruction execution: Fetch/Decode/Execute/Memory)
    ○ Out-of-order execution
    ○ Superscalar execution
  ● Thread-level parallelism
    ● Multiple threads on one processor
    ● SMT: hardware multithreading
    ● Multithreading on multiple processors
    ● Example: a loop (see the sketch below)
  ● Process-level parallelism
    ○ Different processes can be executed in parallel on multiple processors
      ■ SMP
      ■ DSM
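A minimal sketch of the loop example in C with OpenMP (OpenMP is introduced later, in Module 4); the array size N and the vector-add body are illustrative assumptions:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* The iterations are independent, so OpenMP can split them
         * across the available threads (thread-level parallelism). */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %f (max threads: %d)\n", c[N - 1], omp_get_max_threads());
        return 0;
    }

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.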


Syllabus
5

Module 1: Introduction: Motivating Parallelism, Scope of Parallel Computing, Introduction to HPC: Parallel Programming
Platforms; Implicit Parallelism: Trends in Microprocessor Architectures, Limitations of Memory System Performance, Dichotomy
of Parallel Computing Platforms, Physical Organization of Parallel Platforms. (6 Hrs)
Module 2: Parallel Computer Memory Architectures: Measures of Parallel Algorithms; Analytical Modeling of Parallel Programs:
Sources of Overhead in Parallel Programs, Performance Metrics for Parallel Systems, the Effect of Granularity on Performance;
Parallel Platforms: Models (SIMD, MIMD, SPMD), Communication (Shared Address Space vs. Message Passing). (7 Hrs)
Module 3: Thread Basics: Why Threads? The POSIX Thread API, Thread Creation and Termination, Synchronization Primitives in
Pthreads, Controlling Thread and Synchronization Attributes, Thread Cancellation, Composite Synchronization Constructs. (6 Hrs)
Module 4: Distributed Memory Parallel Programming – Tips for Designing Asynchronous Programs, OpenMP: A Standard for
Directive-Based Parallel Programming. (6 Hrs)
Module 5: The Building Blocks: Send and Receive Operations, MPI: the Message Passing Interface, Topologies and Embedding,
Overlapping Communication with Computation, Collective Communication and Computation Operations, Groups and
Communicators. (7 Hrs)
Module 6: The Age of Parallel Processing, Central Processing Units, The Rise of GPU Computing, A Brief History of GPUs, Early
GPU Computing; CUDA: What is the CUDA architecture, Using the CUDA architecture, Applications of CUDA, Medical Imaging,
Computational Fluid Dynamics, Environmental Science; Introduction to CUDA C: A First Program, Hello World, A Kernel Call,
Passing Parameters, Querying Devices, Using Device Properties; Parallel Programming in CUDA C: CUDA Parallel Programming,
Summing Vectors, A Fun Example. (7 Hrs)

TEXT BOOKS
6

● John Hennessy and David Patterson, Computer Architecture: A Quantitative
  Approach, 5th ed., Morgan Kaufmann.
● Ananth Grama, Vipin Kumar, Anshul Gupta, George Karypis, Introduction to
  Parallel Computing, Addison-Wesley, 2003.
● Wen-mei W. Hwu, David B. Kirk, Programming Massively Parallel Processors: A
  Hands-on Approach, 3rd ed., Morgan Kaufmann.
● William Dally and Brian Towles, Principles and Practices of Interconnection
  Networks, Morgan Kaufmann, 2004.


EVALUATION PLAN
7

Tentative Evaluation Plan:

● Mid Exam = 25%
● End Semester Exam = 45%
● Project = 20%
● Scheduled Quizzes = 10%


Week 1: Class 2


M1:
9

● Introduction
● Introduction to HPC
● Motivating Parallelism
● Scope of Parallel Computing
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


M1:
10

● Introduction to HPC:
● Motivating Parallelism
● Scope of Parallel Computing
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


Historical Background
11

● Two Major Stages of Development

  ○ Mechanical: before 1945
  ○ Electronic: after 1945
    ■ 1st Gen (1945-54): vacuum tubes / assembly language / IBM 701
    ■ 2nd Gen (1955-64): transistors / compilers / IBM 7090
    ■ 3rd Gen (1965-74): ICs (SSI, MSI) / multiprogramming, time sharing / IBM 360/370
    ■ 4th Gen (1975-90): ICs (VLSI) / multiprocessors / HLLs / parallel processing (home)
    ■ 5th Gen (1991-present): ICs (ULSI) / parallel processing (heterogeneous) / Cray MPP


Introduction to HPC
12

● Elements of a Modern Computer

  ○ Hardware
  ○ System Software
  ○ Application Programs
    ■ The functionality of a processor is characterised by its instruction set.
    ■ What are architecture and organization?
      ● Architecture: short form for ISA (the programmer's view of the processor).
        ○ Data transfer, add, mul, sub, etc.; registers; instruction set.
      ● Organization: the high-level design: caches, pipelines, control unit.


13

● What is high performance computing (HPC)?

  ○ The use of the most efficient algorithms on computers capable of the highest
    performance to solve the most demanding problems.
  ○ The main HPC resources are: CPU, RAM, GPU.
● Why HPC?
  ○ Large problems - spatially/temporally
    ● Weather forecasting; protein folding; turbulence simulations/CFD;
      aerospace structures; full-body simulation; earthquake simulation.


Key characteristics of HPC


14

● Parallel Processing
● Handling Large Datasets
● Complex Simulations
“HPC refers to techniques, algorithms, and methodologies used to achieve high
computational performance”
● HPC systems include:
  ● supercomputers, distributed computing, and specialized hardware.


SUPERCOMPUTERS
15

● The term is commonly applied to the fastest high-performance systems
  available at any given time.
● Such computers are used primarily for scientific and engineering work requiring
  exceedingly high-speed computations.
● Top 500 supercomputers: https://en.wikipedia.org/wiki/TOP500
● Floating-point operations per second (FLOPS, flops or flop/s) is a measure of
  computer performance; supercomputer speed is commonly measured in terms
  of FLOPS.
● Today's leading systems are measured in quadrillions of 64-bit floating-point
  operations per second.


How do HPC systems differ from a regular computer?

16

● HPC
  ○ Major components: CPU, RAM, GPU.
  ○ Several computers connected together in a network form an HPC cluster;
    each of these computers is referred to as a node in the network.
  ○ HPC clusters are used to run resource-intensive and/or parallel tasks.
● Nodes in HPC
  ○ There are two types of nodes on a cluster:
    ■ login nodes (also known as head or submit nodes)
    ■ compute nodes (also known as worker nodes)
● Job Scheduler
  ○ Users do not have direct access to the compute nodes; instead they submit
    jobs via a job scheduler.
  ○ The scheduler’s role is to manage all these jobs.


High Performance Computers


17

● In the 1980s
  ○ 1x10^6 floating-point ops/sec (Mflop/s)
  ○ Scalar based
● In the 1990s
  ○ 1x10^9 floating-point ops/sec (Gflop/s)
  ○ Vector & shared-memory computing
● Today
  ○ 1x10^12 floating-point ops/sec (Tflop/s)
  ○ Highly parallel, distributed processing, message passing


Computer Technology
18

● Performance improvements:
○ Improvements in semiconductor technology
■ Feature size, clock speed
○ Improvements in computer architectures
■ Enabled by HLL compilers, UNIX
■ Led to RISC architectures
○ Together have enabled:
■ Lightweight computers
■ Productivity-based managed/interpreted programming languages
■ SaaS, Virtualization, Cloud
○ Applications evolution:
■ Speech, sound, images, video, “augmented/extended reality”, “big data”

Moore’s Law
19

Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor


density of semiconductor chips would double roughly every 18 months.


M1:
20

● Introduction to HPC:
● Motivating Parallelism, Scope of Parallel Computing,
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


Scope of Parallel Computing Applications


21

● Applications in Engineering and Design.


● Scientific Applications
○ evolution of galaxies, thermonuclear processes, and the analysis of
extremely large datasets from telescopes
● Commercial Applications
○ optimizing business and marketing decisions
● Applications in Computer Systems
○ Network intrusion detection, cryptography
● Applications in Data Science
○ Google File System, Map-reduce


Introduction to Parallel Computing


22

● What is Parallel Computing?


○ “Processing of multiple tasks simultaneously on multiple processors is
  called parallel processing”.
● Concurrency is when two tasks can start, run, and complete in
overlapping time periods.
● Parallelism refers to techniques to make programs faster by
performing several computations at the same time.

● Advantages of Parallel Computing


Motivating Parallelism
23

● The role of parallelism in accelerating computing speeds has been


recognized for several decades.
● Development of parallel software has traditionally been thought of
as time and effort intensive.
● Its role in providing multiplicity of datapaths and increased access
to storage elements has been significant in commercial applications


Motivating Parallelism
24

● Computational Speed Argument


○ The logical recourse is to rely on parallelism, both implicit and explicit
○ Most serial (or seemingly serial) processors rely extensively on implicit
parallelism.
● Memory/Disk Speed Argument
○ Mismatch in speeds causes significant performance bottlenecks.
○ Parallel platforms provide increased bandwidth to the memory system
■ Principles of locality of data reference and bulk access,
● Data Communication Argument :
○ Vision of the Internet as one large computing platform has emerged.
○ Any analyses on this data must be performed over the network using
parallel techniques
The emergence of standardized parallel programming environments, libraries, and hardware
has significantly reduced the time to (parallel) solution.


Implicit Parallelism: Trends in Microprocessor Architectures


25

● Conventional architectures coarsely comprise a processor, a memory system,
  and the datapath.
  ○ Each presents significant performance bottlenecks.
● Microprocessor clock-speed gains and high levels of device integration.
● How best to utilize these resources?
  ○ Multiple functional units that execute multiple instructions in the same cycle:
    ○ Pipelining
    ○ VLIW processors
    ○ etc.

All microprocessors currently available rely on parallelism to varying degrees;
this implicit parallelism is generally hidden from the programmer.

A program is implicitly parallel if its compiler or interpreter can recognize
opportunities for parallelization and implement them without being told to do so.

Pipelining
26

1. Pipelining overlaps the various stages of instruction execution to achieve
   higher performance.

Superscalar Execution
27


Superscalar Execution
28

• True Data Dependency: the result of one operation is an input to the next.
• Resource Dependency: two operations require the same resource.
• Branch Dependency: scheduling instructions across conditional branch statements
  cannot be done deterministically a priori.
• The scheduler, a piece of hardware, looks at a large number of instructions in an
  instruction queue and selects an appropriate number of instructions to execute
  concurrently based on these factors (see the sketch below).
• The complexity of this hardware is an important constraint on superscalar processors.

• Due to limited parallelism in typical instruction traces, dependencies, or the inability of the
  scheduler to extract parallelism, the performance of superscalar processors is eventually
  limited.
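A small C illustration of the dependency kinds listed above; the function names are hypothetical. The first loop forms a chain of true data dependencies, while the second keeps two independent accumulators that a superscalar scheduler could issue in the same cycle:

    /* Dependent chain: every add needs the previous value of s,
     * so consecutive adds cannot issue in the same cycle. */
    static int dependent_sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s = s + a[i];              /* true data dependency on s */
        return s;
    }

    /* Two independent accumulators: the adds to s0 and s1 do not
     * depend on each other, so they can execute concurrently. */
    static int independent_sums(const int *a, int n) {
        int s0 = 0, s1 = 0;
        for (int i = 0; i + 1 < n; i += 2) {
            s0 += a[i];                /* independent of s1 */
            s1 += a[i + 1];            /* independent of s0 */
        }
        if (n % 2) s0 += a[n - 1];     /* leftover element for odd n */
        return s0 + s1;
    }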


29
Very Long Instruction Word (VLIW) Processors
1. VLIW processors rely on compile time analysis to identify and bundle
together instructions that can be executed concurrently
2. The instructions are packed and dispatched together, and thus the name
very long instruction word (Intel IA64).
3. Compiler has a bigger context from which to select co-scheduled
instructions.
4. Compilers, however, do not have runtime information such as cache
misses. Scheduling is, therefore, inherently conservative
VLIW performance is highly dependent on the compiler.


Current Trends in Architecture


30

● Cannot continue to leverage Instruction-Level parallelism (ILP)


○ Single processor performance improvement ended in 2003
● New models for performance:
○ Data-level parallelism (DLP)
○ Thread-level parallelism (TLP)
○ Request-level parallelism (RLP)
● These require explicit restructuring of the application


Limitations of Memory System Performance


31

Typical computations run at only 10-50% of peak CPU utilization
because of memory bottlenecks. The key question here is how to
connect a memory with 50 ns latency to a processor that runs on a 0.5 ns clock.

Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM
with a latency of 100 ns (no caches). Assume that the processor has two
multiply-add units and is capable of executing four instructions in each cycle
of 1 ns.
The following observations follow:
• The peak CPU rating is 4 GFLOPS (floating-point operations per second).
• What is the peak speed of this computation?
  10 MFLOPS: since the memory latency is 100 cycles, the processor waits 100 ns
  for each operand it fetches from memory, so it completes at most one
  floating-point operation every 100 ns (see the calculation below).
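The arithmetic behind these two numbers, written out (a short check using only the figures stated above):

    \begin{align*}
    \text{Peak rate} &= 4\ \text{FLOPs/cycle} \times 10^{9}\ \text{cycles/s} = 4\ \text{GFLOPS},\\
    \text{Memory-bound rate} &= \frac{1\ \text{FLOP}}{100\ \text{ns}} = 10^{7}\ \text{FLOPs/s} = 10\ \text{MFLOPS}.
    \end{align*}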

32


Limitations of Memory System Performance


33

● Limitations of Memory System Performance
  ○ Bandwidth and latency

● Improving Effective Memory Latency Using Caches
  ○ The cache acts as low-latency, high-bandwidth storage.
  ○ If a piece of data is repeatedly used, the effective latency of the memory system
    is reduced by the cache.
  ○ The cache hit ratio achieved by a code on a memory system often determines its
    performance.
  ○ Exploiting spatial and temporal locality in applications is critical for amortizing
    memory latency and increasing effective memory bandwidth (see the sketch below).
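A minimal C sketch of spatial locality: C stores the matrix row-major, so the row-order loop walks memory sequentially and reuses each cache line, while the column-order loop jumps N doubles per access and misses far more often. The size N is an arbitrary illustrative choice:

    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    /* Row-major traversal: consecutive accesses fall in the same
     * cache line, so spatial locality is exploited. */
    static double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column-order traversal of the same data: each access jumps
     * N doubles ahead, so nearly every access misses in the cache. */
    static double sum_col_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_row_major(), sum_col_major());
        return 0;
    }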


HW
34

• Find the historical trends in processor clock speeds.


• Plot the data and compare the observed growth to the limits imposed by thermal and physical
constraints.
• Investigate the relationship between the number of cores in processors and their peak FLOPS rate
over time. Plot and analyze the trend, discussing its implications for parallel computing.
• Compare the computational requirements of modern machine learning models, such as GPT or
image generation systems, over the past five years. Plot the growth in FLOPS required for training
and inference.
• Explore the trends in power consumption of supercomputers over the past decade. Compare the
energy efficiency improvements of the Green500 leaders with those of the Top500 leaders.
• Investigate the challenges of cooling in high-performance computing systems. How do modern
  techniques like liquid cooling and immersion cooling compare in terms of efficiency and cost?


35

1. Compare the capabilities of existing quantum computing platforms


(e.g., IBM Quantum, Google Sycamore, D-Wave) in terms of qubits,
error rates, and computational performance


M1:
36

● Motivating Parallelism, Scope of Parallel Computing,


● Introduction to HPC:
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


Explicitly Parallel Platforms


37

● Dichotomy of Parallel Computing Platforms


○ An explicitly parallel program must specify concurrency and interaction
between concurrent subtasks.
■ Control structure
■ Parallelism can be expressed at various levels of granularity - from instruction
level to processes
● Control Structure of Parallel Programs
○ Processor units either operate under the centralized control of a single
control unit or work independently.
○ Single control unit that dispatches the same instruction to various
processors (that work on different data)


Parallelism
38

● Classes of parallelism in applications:


○ Data-Level Parallelism (DLP)
○ Task-Level Parallelism (TLP)
● Classes of architectural parallelism:
○ Instruction-Level Parallelism (ILP)
○ Vector architectures/Graphic Processor Units (GPUs)
○ Thread-Level Parallelism
○ Request-Level Parallelism


Control Structure of Parallel Programs


39

○ Single instruction stream, multiple data stream (SIMD)
  ■ A single control unit dispatches the same instruction to the various
    processors (which work on different data).
○ Multiple instruction stream, multiple data stream (MIMD)
  ■ Each processor has its own control unit and can execute different
    instructions on different data items.


40

Parallelism from a single instruction on multiple processors (SIMD)

41

➢ A single control unit dispatches instructions to each processing unit.
➢ The same instruction is executed synchronously by all processing units.
➢ Examples: MMX units in Intel processors and DSP chips.
➢ Example: an add instruction is dispatched to all processors and executed
  concurrently by them (see the sketch below).
➢ In structured computations on parallel data structures such as arrays, it is often
  necessary to selectively turn off operations on certain data items.
➢ Conditional execution can be detrimental to the performance of SIMD processors
  and therefore must be used with care.
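A minimal C sketch of the data-parallel "add" example above: the same operation is applied to every array element, which a vectorizing compiler can map onto SIMD units. The second loop shows per-element conditional execution, which SIMD hardware typically handles by evaluating both branches under a mask, one reason conditionals can hurt SIMD performance:

    #include <stddef.h>

    /* Same "add" applied to every element: ideal SIMD work. */
    void vec_add(const float *a, const float *b, float *c, size_t n) {
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];          /* one instruction, many data items */
    }

    /* Per-element condition: on SIMD hardware both branches are typically
     * evaluated and the result selected with a mask, so heavy use of
     * conditionals reduces the benefit of SIMD execution. */
    void vec_safe_div(const float *a, const float *b, float *c, size_t n) {
        for (size_t i = 0; i < n; i++)
            c[i] = (b[i] != 0.0f) ? a[i] / b[i] : a[i];
    }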

Multiple Instruction stream, Multiple Data stream (MIMD)


42

➢ Each processing element is capable of executing a different program
  independent of the other processing elements.
➢ Each of the multiple programs can be inserted into one large if-else block
  with conditions specified by the task identifiers (see the sketch below).
  ■ A simple variant of this model is called the single program,
    multiple data (SPMD) model.
  ■ An example of an MIMD system is the Intel Xeon Phi.
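A minimal SPMD sketch in C using MPI (MPI itself is covered in Module 5): a single program runs on every processing element, and the process rank acts as the task identifier selecting which branch of the if-else block each copy executes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* task identifier */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* one copy runs the coordinator branch... */
            printf("rank 0 of %d: coordinating\n", size);
        } else {
            /* ...all other copies run the worker branch. */
            printf("rank %d of %d: working\n", rank, size);
        }

        MPI_Finalize();
        return 0;
    }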


43

● SIMD computers require less hardware than MIMD computers
  because they have only one global control unit.
● SIMD computers require less memory because only one copy of the
  program needs to be stored.
● MIMD computers are more complex than SIMD computers.
● MIMD computers are more efficient in terms of performance than SIMD computers.


M1:
44

● Motivating Parallelism, Scope of Parallel Computing,


● Introduction to HPC:
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


Communication Model of Parallel Platforms


45

● Data exchange between parallel tasks:
  ○ accessing a shared data space (shared-address-space machines or
    multiprocessors)
  ○ exchanging messages (message-passing platforms or multicomputers)
● Shared-Address-Space Platforms
  ○ A common data space is accessible to all processors.
  ○ Processors interact by modifying data objects stored in this shared address
    space.
  ○ If the time taken by a processor to access any memory word in the system
    (global or local) is identical, the platform is classified as a uniform memory
    access (UMA) machine; otherwise it is a non-uniform memory access (NUMA) machine.


46


NUMA and UMA Shared-Address-Space Platforms


47

● NUMA and UMA architectures are defined only in terms of memory access
  times, not cache access times.
● Sun Ultra HPC servers are a class of NUMA multiprocessors.


NUMA and UMA Shared-Address-Space Platforms


48

Presence of global memory (RAM):
● The presence of a global memory space makes (parallel) programming such
  platforms much easier.
● Operations on shared data require mutual exclusion for concurrent accesses
  (see the sketch below).
● Read/write interactions are, however, harder to program than read-only
  interactions.

Presence of caches on processors:
● Caches raise the issue of multiple copies of a single memory word being
  manipulated by two or more processors at the same time.
● Ensuring that concurrent operations on multiple copies of the same memory
  word have well-defined semantics is the cache coherence problem.
● An address translation mechanism is needed to locate a memory word in the
  system.
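A minimal Pthreads sketch (Pthreads are covered in Module 3) of the mutual exclusion mentioned above: two threads update a shared counter in the shared address space, and a mutex serializes the concurrent read-modify-write:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                       /* shared data object */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);             /* enter critical section */
            counter++;                             /* read-modify-write on a shared word */
            pthread_mutex_unlock(&lock);           /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);        /* 2000000 with the mutex */
        return 0;
    }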


Message-Passing Platforms
49

● These platforms comprise a set of processing nodes, each with its own
  (exclusive) memory.
● Instances of such a view come naturally from clustered workstations and
  non-shared-address-space multicomputers.
● These platforms are programmed using (variants of) send and receive
  primitives (see the sketch below).
● Libraries such as MPI and PVM provide such primitives.
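A minimal sketch of the send and receive primitives using MPI: rank 0 sends one integer to rank 1, which receives it. The tag value and the data sent are arbitrary illustrative choices; run with at least two processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Send one int to rank 1 with message tag 0. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int received;
            MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", received);
        }

        MPI_Finalize();
        return 0;
    }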


Message Passing vs. Shared Address Space Platforms


50

● Message passing requires little hardware support, other than a


network.
● Shared address space platforms can easily emulate message
passing. The reverse is more difficult to do (in an efficient manner).


M1:
51

● Motivating Parallelism, Scope of Parallel Computing,


● Introduction to HPC:
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


Physical Organization of Parallel Platforms


52

Architecture of an Ideal Parallel Computer

• Parallel Random Access Machine (PRAM)
  • A PRAM consists of p processors and a global memory of unbounded size
    that is uniformly accessible to all processors.
  • Processors share a common clock but may execute different instructions
    in each cycle.
  • Depending on how simultaneous memory accesses are handled, PRAMs can
    be divided into four subclasses:
    • Exclusive-read, exclusive-write (EREW) PRAM
    • Concurrent-read, exclusive-write (CREW) PRAM
    • Exclusive-read, concurrent-write (ERCW) PRAM
    • Concurrent-read, concurrent-write (CRCW) PRAM

Concurrent read access does not create any semantic discrepancies, but
concurrent write access to a memory location requires arbitration.

Architecture of an Ideal Parallel Computer


53

What does concurrent write mean?

• The most frequently used protocols are as follows:


• Common, in which the concurrent write is allowed if all the values that the
processors are attempting to write are identical.
• Arbitrary, in which an arbitrary processor is allowed to proceed with the write
operation and the rest fail.
• Priority, in which all processors are organized into a predefined prioritized
list, and the processor with the highest priority succeeds and the rest fail.
• Sum, in which the sum of all the quantities is written (the sum-based write
conflict resolution model can be extended to any associative operator defined
on the quantities being written)


Architectural Complexity of the Ideal Model


54

• Processors and memories are connected via switches.
• For a system of p processors and m memory words, the switch complexity is O(mp).
• For a reasonable memory size, constructing a switching network of this
  complexity is very expensive.
• Thus, PRAM models of computation are impossible to realize in practice.
• Solution:
  • Interconnection networks for parallel computers.
  • A black-box view of an interconnection network consists of n inputs and m outputs.


M1:
55

● Motivating Parallelism, Scope of Parallel Computing,


● Introduction to HPC:
● Parallel Programming Platforms
○ Implicit Parallelism: Trends in Microprocessor Architectures
○ Limitations of Memory System Performance
● Dichotomy of Parallel Computing Platforms
○ Control Structure of Parallel Platforms
○ Communication Model of Parallel Platforms
● Physical Organization of Parallel Platforms
○ Architecture of an Ideal Parallel Computer
○ Interconnection Networks for Parallel Computers


Interconnection Networks for Parallel Computers


56
• Interconnects are made of switches and links.
• Interconnects are classified as static or dynamic.
• Static networks consist of point-to-point communication links among
  processing nodes and are also referred to as direct networks.
• Dynamic networks are built using switches and communication links;
  dynamic networks are also referred to as indirect networks.


Interconnection Networks Components


57

• Switches:
  • A switch maps a fixed number of inputs to outputs; the degree of the switch is
    the total number of its ports.
• Network Interfaces:
  • Processors talk to the network via a network interface.
  • The interface provides connectivity between the nodes and the network.
• Network Topologies:
  • Bus-Based Networks:
    • Buses are also ideal for broadcasting information among nodes.
    • Since the transmission medium is shared, there is little overhead associated with a
      broadcast compared to point-to-point message transfer.
    • Caches can be used to reduce the load on the shared bus bandwidth.


Interconnection Networks Components


58

• Network Topologies:
  • Crossbar Networks:
    • A non-blocking network: the connection of a processing node to a
      memory bank does not block the connection of any other processing
      node to any other memory bank.
  • Multistage Networks
  • Tree-Based Networks


59


60

M2: Measures of Parallel Algorithms


M2 : Measures of Parallel Algorithms


61

• Analytical Modeling of Parallel Programs:


• Sources of Overhead in Parallel Programs
• Performance Metrics for Parallel Systems,
• The Effect of Granularity on Performance;
• Parallel Platforms: Models (SIMD,MIMD, SPMD)
• Communication (Shared Address Space vs. Message Passing)


M2 : Measures of Parallel Algorithms


62

• Analytical Modeling of Parallel Programs:


• Sources of Overhead in Parallel Programs
• Performance Metrics for Parallel Systems,
• The Effect of Granularity on Performance;
• Parallel Platforms: Models (SIMD,MIMD, SPMD)
• Communication (Shared Address Space vs. Message Passing)


Sources of Overhead in Parallel Programs


63

• If I use two processors, shouldn't my program run twice as fast?
  • No: a number of overheads, including wasted computation, communication, idling, and
    contention, cause degradation in performance.

Total time collectively spent by all processing elements: T_all = p * T_P

• Performance Metrics for Parallel Systems: Execution Time
  • The time that elapses from the moment the first processor starts to the moment the last
    processor finishes execution.

M2 : Measures of Parallel Algorithms


64

• Analytical Modeling of Parallel Programs:


• Sources of Overhead in Parallel Programs
• Performance Metrics for Parallel Systems,
• The Effect of Granularity on Performance;
• Parallel Platforms: Models (SIMD,MIMD, SPMD)
• Communication (Shared Address Space vs. Message Passing)


Five Performance Metrics for Parallel Systems


65

• Execution Time
• Overhead
• Speedup
• Efficiency
• Cost.
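The slides that follow define these metrics one by one; collected here for reference, with T_S the serial time, T_P the parallel time, and p the number of processing elements (the overhead formula follows the course textbook by Grama et al.):

    \begin{align*}
    \text{Overhead:}   \quad & T_o = p\,T_P - T_S \\
    \text{Speedup:}    \quad & S = \frac{T_S}{T_P} \\
    \text{Efficiency:} \quad & E = \frac{S}{p} \\
    \text{Cost:}       \quad & C = p\,T_P
    \end{align*}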


Measures of Parallel Algorithms


66

• Execution Time
  • Sequential algorithm: evaluated in terms of its execution time, expressed as
    a function of the size of its input.
  • Parallel algorithm: execution time depends not only on the input size but also
    on the number of processing elements used, and on their relative computation
    and interprocess communication speeds.
  • An algorithm must therefore be analyzed in the context of the underlying platform.
• Interprocess Interaction
  • Processing elements need to interact and communicate data (e.g., intermediate results).
• Idling
  • Due to many reasons such as load imbalance, synchronization, and the presence
    of serial components in a program.
• Excess Computation
  • Some computations are performed multiple times on different processing elements.

A parallel system is a combination of a parallel algorithm and an underlying platform.


Sources of Overhead in Parallel Programs


67

• A number of overheads, including wasted computation, communication, idling, and
  contention, cause degradation in performance.

Total time collectively spent by all processing elements: T_all = p * T_P


Performance Metrics for Parallel Systems


68

• Performance Metric #3: Speedup
  • What is the benefit from parallelism?
  • Speedup (S) is the ratio of the time taken to solve a problem on a
    single processor to the time required to solve the same problem on a
    parallel computer with p identical processing elements.
  • When computing speedup, we always consider the best sequential program
    as the baseline.


69

• Speedup Example: Parallel bubble sort
  • The serial time for bubble sort is 150 seconds.
  • The parallel time for odd-even sort (an efficient parallelization of bubble sort)
    is 40 seconds.
  • The speedup would appear to be 150/40 = 3.75. (?)
  • But is this really a fair assessment of the system?

  • What if serial quicksort only took 30 seconds? In that case, the speedup is
    30/40 = 0.75. This is a more realistic assessment of the system.


Performance Metrics for Parallel Systems


70

• Performance Metrics #4:Efficiency


• Efficiency is a measure of the fraction of time for which a processing
element is usefully employed; it is defined as the ratio of speedup to
the number of processing elements.
• In an ideal parallel system, speedup is equal to p and efficiency is
equal to one.


71

• Performance Metric #5: Cost
  • The product of parallel execution time and the number of PEs.
  • The total amount of time spent by all PEs to solve the problem.
  • Cost-optimal: parallel cost ≅ serial cost.

Tutorial: Edge detection on images

72

1. Given an n x n pixel image, the problem of detecting edges
   corresponds to applying a 3 x 3 template to each pixel (see the sketch below).
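A minimal serial C sketch of the computation to be parallelized: each output pixel is a weighted sum of its 3 x 3 neighbourhood. The particular template values (a simple Laplacian-style edge mask) and the border handling are illustrative assumptions, not specified on the slide:

    /* Apply a 3x3 edge-detection template to an n x n image.
     * img and out are n*n arrays in row-major order; the one-pixel
     * border is left untouched for simplicity. */
    void apply_template(const float *img, float *out, int n) {
        static const float t[3][3] = {
            { -1.0f, -1.0f, -1.0f },
            { -1.0f,  8.0f, -1.0f },
            { -1.0f, -1.0f, -1.0f },
        };
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                float acc = 0.0f;
                for (int di = -1; di <= 1; di++)
                    for (int dj = -1; dj <= 1; dj++)
                        acc += t[di + 1][dj + 1] * img[(i + di) * n + (j + dj)];
                out[i * n + j] = acc;   /* each pixel is independent */
            }
        }
    }

Because every output pixel depends only on its input neighbourhood, the n x n pixels can be distributed across processing elements independently.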


M2 : Measures of Parallel Algorithms


73

• Analytical Modeling of Parallel Programs:


• Sources of Overhead in Parallel Programs
• Performance Metrics for Parallel Systems,
• The Effect of Granularity on Performance;
• Parallel Platforms: Models (SIMD,MIMD, SPMD)
• Communication (Shared Address Space vs. Message Passing)


What is Granularity
74

• The number and size of the tasks into which a problem is decomposed
  determine the granularity of the decomposition.
  • Smaller granularity
  • Larger granularity
  • Scaling down

What is the cost of adding n numbers serially?
What is the cost of adding n numbers in parallel?


The Effect of Granularity on Performance


75

• Scaling down:
  • Often, using fewer processors improves the performance of parallel
    systems.
  • Using fewer than the maximum possible number of processing
    elements to execute a parallel algorithm is called scaling down.
  • If p physical processing elements simulate n virtual processing elements, the
    number of processing elements decreases by a factor of n/p and the
    computation at each processing element increases by a factor of n/p.


Adding n numbers on p processing elements


76

• Adding n numbers on p processing elements, for n = 16 and p = 4.
• Virtual processing element i is simulated by the physical processing
  element labeled i mod p.


77

1. The first log p of the log n steps of the original algorithm are simulated
   in (n/p) log p steps on p processing elements.


78

Each addition and communication step takes constant time (t_s + t_w).

Hence, on the n-processor machine, the parallel time is T_P = Θ(log n).


79

What are the speedup, cost, and efficiency?

S = T_S / T_P
E = S / p
Cost = p * T_P
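Evaluating these for the case just analyzed (adding n numbers on n processing elements, with T_S = Θ(n) and T_P = Θ(log n)), following the course textbook's analysis:

    \begin{align*}
    S &= \frac{T_S}{T_P} = \Theta\!\left(\frac{n}{\log n}\right), \\
    E &= \frac{S}{n} = \Theta\!\left(\frac{1}{\log n}\right), \\
    \text{Cost} &= n \cdot T_P = \Theta(n \log n) > \Theta(n) = T_S
      \quad\Rightarrow\quad \text{not cost-optimal.}
    \end{align*}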


Cost: An Example
80


81

1. Adding n numbers cost-optimally:
   • The parallel runtime is Θ(n/p + log p), so the cost is Θ(n + p log p). As long
     as n = Ω(p log p), the cost is Θ(n), which is the same as the serial runtime.
     Hence, this parallel system is cost-optimal.


Scaling Characteristics of Parallel Programs


82

• For a given problem size (i.e., the value of T_S remains constant), as
  we increase the number of processing elements, T_o increases.
• The overall efficiency of the parallel program goes down. This is the
  case for all parallel programs.


83

1. Granularity of Task Decomposition


• Fine-grained decomposition: large number of small tasks
• Coarse-grained decomposition: small number of large tasks
• Degree of Concurrency: # of tasks that can execute in parallel


84


85

• Fine-grained parallelism
  • A program is broken down into a large number of small tasks.
  • Tasks are assigned individually to many processors.
  • The work is evenly distributed among the processors.
  • This increases the communication and synchronization overhead.
  • Fine-grained parallelism is best exploited in architectures that support fast
    communication.
  • Shared-memory architectures are most suitable for fine-grained parallelism.
    Why?


86

• Coarse-grained parallelism
  • A program is split into large tasks; a large amount of computation takes
    place in each processor.
  • Load imbalance: certain tasks process the bulk of the data while others
    might be idle.
  • Low communication and synchronization overhead.


87

• Example: Consider a 10 x 10 image that needs to be processed, given that the
  processing of the 100 pixels is independent of each other.
  • Fine-grained parallelism:
    • 100 processors are responsible for processing the 10 x 10 image.
    • The 100 processors can process the image in 1 clock cycle.
    • Each processor works on 1 pixel of the image.
  • Medium-grained parallelism:
    • 25 processors process the 10 x 10 image.
    • The processing of the image now takes 4 clock cycles.
  • Coarse-grained parallelism:
    • If we reduce the number of processors to 2, processing takes 50 clock cycles.


88

Finding the best grain size depends on a number of factors and varies
greatly from problem to problem.

89

• Building Granularity: Example


• Consider the problem of adding n numbers on p processing elements
such that p < n and both n and p are powers of 2.


M2 : Measures of Parallel Algorithms


90

• Analytical Modeling of Parallel Programs:


• Sources of Overhead in Parallel Programs
• Performance Metrics for Parallel Systems,
• The Effect of Granularity on Performance;
• Taxonomy Parallel Platforms: Models (SIMD,MIMD, SPMD)
• Communication (Shared Address Space vs. Message Passing)


Taxonomy of Parallel computer


91

• SISD: single instruction and single data stream; a uniprocessor.
• MISD: no commercial multiprocessor; imagine data going through a
  pipeline of execution engines.
• SIMD: vector architectures; lower flexibility.
• MIMD: most multiprocessors today; easy to construct with off-the-shelf
  computers, most flexibility.


92

• MIMD machines are broadly categorized based on the way PEs are coupled to the
  main memory.
  • Shared-memory MIMD
    • Tightly coupled multiprocessor systems.
    • All PEs are connected to a single global memory and they all have access to it.
    • Symmetric Multi-Processing (SMP).
    • Less likely to scale.
  • Distributed-memory MIMD
    • Loosely coupled multiprocessor systems.
    • All PEs have a local memory.
    • Communication between PEs in this model takes place through the interconnection
      network.

93


M2 : Measures of Parallel Algorithms


94

• Analytical Modeling of Parallel Programs:


• Sources of Overhead in Parallel Programs
• Performance Metrics for Parallel Systems,
• The Effect of Granularity on Performance;
• Taxonomy Parallel Platforms: Models (SIMD,MIMD, SPMD)
• Communication (Shared Address Space vs. Message Passing)


Shared Memory(UMA/NUMA)
95

• A global address space provides a user-friendly programming
  perspective on memory.
• Data sharing between tasks is both fast and uniform due to the
  proximity of memory to CPUs.
• The primary disadvantage is the lack of scalability between memory
  and CPUs.
• Cache-coherent systems see geometrically increasing traffic associated
  with cache/memory management as processors are added.
• The programmer is responsible for synchronization constructs that
  ensure "correct" access to global memory.

Images: https://cvw.cac.cornell.edu/parallel/shared

Distributed Memory
96

• Requires a communication network to connect inter-processor memory.
• There is no concept of a global address space across all processors.
• Changes a processor makes to its local memory have no effect on the memory
  of other processors.
• Memory is scalable with the number of processors.
• The programmer is responsible for many of the details associated
  with data communication between processors.
• Non-uniform memory access (NUMA) times.


Scalability of Parallel Systems


97

• Programs are designed and tested for smaller problems on fewer
  processing elements. However, the real problems these programs
  are intended to solve are much larger, and the machines contain a
  larger number of processing elements.

Scalability is a reflection of the system's ability to efficiently utilize
increased processing resources.


Scaling Characteristics of Parallel Programs


98

• Scaling Characteristics of Parallel Programs
  • For a given problem size (i.e., the value of T_S remains constant), as we
    increase the number of processing elements, T_o increases.
  • The overall efficiency of the parallel program goes down. This is the case
    for all parallel programs.

• The total overhead function T_o is an increasing function of p.


Example
99
Consider the problem of adding n numbers on p processing
elements.

Assuming unit time for adding two numbers:
• The first phase (local summations) of the algorithm takes roughly n/p time.
• The second phase involves log p steps, with a communication and an addition
  at each step. If a single communication takes unit time as well, the time for
  this phase is 2 log p.
Therefore, we can derive the parallel time, speedup, and efficiency as follows:
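Reconstructing the formulas implied by the two phases above (the original slide showed them as an image):

    \begin{align*}
    T_P &= \frac{n}{p} + 2\log p, \\
    S &= \frac{T_S}{T_P} = \frac{n}{\frac{n}{p} + 2\log p}, \\
    E &= \frac{S}{p} = \frac{1}{1 + \frac{2p\log p}{n}}.
    \end{align*}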


Speedup and Efficiency of adding n numbers on p processing elements

100

What are the speedup and efficiency?


101

• A parallel architecture is said to be scalable if it can be


expanded (reduced) to a larger (smaller) system with a linear
increase (decrease) in its performance (cost)

A scalable system is capable of increasing its speed in proportion to the increase


in the number of processors

Effect of increasing the problem size keeping the number of processing elements constant

102

• For a given problem size, as we increase the number of processing elements,
  the overall efficiency of the parallel system goes down.

• Observation 1: efficiency increases if the problem size is increased while
  keeping the number of processing elements constant.

• The total overhead function T_o is a function of both the problem size T_S
  and the number of processing elements p.

• The ability to maintain efficiency at a fixed value by simultaneously increasing
  the number of processing elements and the size of the problem is exhibited by
  many parallel systems. We call such systems scalable parallel systems.

Observation on Scaling Characteristics of Parallel Programs


103

• For a given problem size, as we increase the number of processing


elements, the overall efficiency of the parallel system goes down. This
phenomenon is common to all parallel systems.
• In many cases, the efficiency of a parallel system increases if the problem
size is increased while keeping the number of processing elements
constant.
• Scalability of a parallel system is a measure of its capacity to increase
speedup in proportion to the number of processing elements


T-02: Speedup and Efficiency


104

1. Suppose a task has:
   • f = 0.75 (75% of the task is parallelizable),
   • P = 4 processors.
   What is the speedup? Answer: 2.29 (see the calculation below).
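The calculation behind the 2.29 figure uses Amdahl's law (named on a later slide):

    S(P) = \frac{1}{(1 - f) + f/P} = \frac{1}{0.25 + 0.75/4} = \frac{1}{0.4375} \approx 2.29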


105

1. Example 1: Effect of Increasing Processors
   • A task is 80% parallelizable and there are P = 2, 4, and 8 processors.
     Calculate the speedup for each case.
2. Maximum Speedup
   • A task is 90% parallelizable. What is the maximum possible speedup?
3. Parallel Efficiency
4. Identify the Bottleneck
   • A task shows a speedup of 3.23 when P = 4. Determine what fraction of the
     task is parallelizable.


106


• Amdahl’s Law
107


108
