
BCS702 – Module 1 Notes

Parallel Hardware and Parallel Software


Table of Contents
1. Parallel Hardware
1.1 Classifications of Parallel Computers
1.2 SIMD Systems
1.3 MIMD Systems
1.4 Interconnection Networks
1.5 Cache Coherence
1.6 Shared vs Distributed Memory
2. Parallel Software
2.1 Caveats
2.2 Coordinating Processes/Threads
2.3 Shared Memory
2.4 Distributed Memory
2.5 GPU Programming
1. Parallel Hardware
Parallel hardware enables multiple computations to run simultaneously. It is essential in
modern computing to improve performance, especially in data-heavy tasks like simulations
and image processing.
1.1 Classifications of Parallel Computers
Flynn’s Taxonomy classifies computers based on the number of instruction and data
streams:

- SISD: Single instruction, single data

- SIMD: Single instruction, multiple data

- MIMD: Multiple instruction, multiple data

Another classification is based on memory access:

- Shared memory: Cores access common memory

- Distributed memory: Each core has its own memory and uses messages to communicate
1.2 SIMD Systems
SIMD (Single Instruction, Multiple Data) systems apply one instruction to many data points
simultaneously.

Used in image processing, simulations, and vector math operations.

Limitation: All datapaths must apply the same instruction to their own data (or sit idle), so operations that diverge across data elements cannot run concurrently.

Vector processors and GPUs are examples of SIMD systems.
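
A minimal sketch of SIMD-style data parallelism (the function name vec_add, the array size N, and the values are arbitrary assumptions): the loop applies one addition across every element of two arrays, and a compiler can map such a loop onto vector instructions, for example via auto-vectorization at -O3 or the OpenMP "#pragma omp simd" hint when built with -fopenmp-simd.

/* One instruction ("add") applied element-wise to many data elements. */
#include <stdio.h>

#define N 8

void vec_add(const float *a, const float *b, float *c, int n) {
    #pragma omp simd                 /* hint: vectorize this loop */
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];          /* same operation, many data elements */
}

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];
    vec_add(a, b, c, N);
    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);       /* prints eight 9s */
    printf("\n");
    return 0;
}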


1.3 MIMD Systems
MIMD systems use independent processors with separate instruction streams and data sets.

Processors run asynchronously, suitable for complex and diverse tasks.

MIMD includes:

- Shared-memory systems (e.g., multicore CPUs)

- Distributed-memory systems (e.g., clusters of computers)


1.4 Interconnection Networks
These connect processors and memory in a parallel system.

Shared-memory systems use buses and crossbars.

Distributed-memory systems use rings, meshes, hypercubes, and omega networks.

Key terms:

- Latency: Delay before data starts transferring

- Bandwidth: Rate at which data transfers
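
A common rough cost model combines the two terms: time to transmit a message ≈ latency + (message size / bandwidth). The sketch below evaluates this model; the latency, bandwidth, and message-size values are illustrative assumptions, not measurements of any real network.

/* Rough message-transmission cost model: time = latency + size/bandwidth. */
#include <stdio.h>

int main(void) {
    double latency_s = 2e-6;     /* assumed 2 microsecond startup delay */
    double bandwidth = 1e9;      /* assumed 1 GB/s link */
    double msg_bytes = 1e6;      /* assumed 1 MB message */

    double time_s = latency_s + msg_bytes / bandwidth;
    printf("Estimated transfer time: %.6f s\n", time_s);   /* ~0.001002 s */
    return 0;
}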


1.5 Cache Coherence
Problem: When multiple processors cache the same variable, updates by one processor may
not be seen by others.

Solutions:

- Snooping: Cores monitor a shared bus for updates

- Directory-based: Each memory block tracks which cores have cached it

False Sharing: A performance problem in which different threads repeatedly update different variables that happen to lie in the same cache line; the coherence protocol then forces the line to bounce between caches even though no data is actually shared.
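
One common fix is sketched below: pad each thread's counter so that it occupies its own cache line. The sketch assumes a 64-byte cache line, uses OpenMP with two threads (compile with -fopenmp), and the iteration count is arbitrary.

/* Avoiding false sharing by padding each counter to its own cache line. */
#include <stdio.h>
#include <omp.h>

#define CACHE_LINE 64

struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];   /* keep neighbors in separate lines */
};

int main(void) {
    struct padded_counter counters[2] = {{0}, {0}};

    /* Each thread updates its own counter. Without the padding, both
       counters could share one cache line, and every write by one thread
       would invalidate the line in the other thread's cache. */
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < 10000000; i++)
            counters[id].value++;
    }

    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}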
1.6 Shared vs Distributed Memory
Shared-memory: Easier to program, but harder to scale due to bus limitations.

Distributed-memory: Harder to program, but scales better with more processors.


2. Parallel Software
Writing software for parallel systems is more complex than writing serial programs.

Programmers must manage synchronization and communication, and guard against errors such as race conditions.
2.1 Caveats
Not all problems can be parallelized.

Some are 'embarrassingly parallel' (e.g., processing independent images), while others
require complex coordination.
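
A minimal sketch of an embarrassingly parallel loop, assuming a hypothetical per-image routine process_image standing in for real work: every iteration is independent, so the iterations can be handed to different threads with no coordination (compile with -fopenmp; without it the pragma is ignored and the loop simply runs serially).

/* Embarrassingly parallel: independent iterations, no coordination needed. */
#include <stdio.h>

#define N_IMAGES 8

static void process_image(int i) {        /* hypothetical independent task */
    printf("processed image %d\n", i);
}

int main(void) {
    #pragma omp parallel for
    for (int i = 0; i < N_IMAGES; i++)
        process_image(i);                 /* iterations do not interact */
    return 0;
}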
2.2 Coordinating Processes/Threads
Threads/processes must be synchronized to avoid errors.

Key concerns are balancing the work evenly across threads/processes (load balancing) and minimizing the communication between them.

Parallelizing is the act of converting a serial program to run in parallel.


2.3 Shared Memory
Uses threads that access a common memory space.

Thread types:

- Dynamic (created/destroyed as needed)

- Static (created once and reused)

Issues:

- Nondeterminism: Output may vary by run

- Race conditions: Two or more threads access the same variable concurrently, at least one access is a write, and the result depends on the timing of the accesses

Solutions: Mutexes (locks), semaphores, and monitors.
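
The sketch below shows the classic shared-memory race and one of the solutions listed above, a mutex: several POSIX threads increment a shared counter, and the lock makes each increment atomic with respect to the others. The thread and iteration counts are arbitrary; compile with -pthread.

/* A shared counter protected by a pthread mutex. */
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
#define N_INCREMENTS 100000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        pthread_mutex_lock(&lock);     /* only one thread updates at a time */
        counter++;                     /* without the lock, updates can be lost */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[N_THREADS];
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("counter = %ld (expected %d)\n", counter, N_THREADS * N_INCREMENTS);
    return 0;
}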


2.4 Distributed Memory
Processes use message-passing to share data.

Message-passing APIs include send() and receive() functions.

MPI (Message Passing Interface) is the most popular API.

Drawback: Requires significant program restructuring.
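
A minimal MPI sketch of the send()/receive() pattern described above: process (rank) 0 sends one integer to rank 1. The value 42 and the message tag 0 are arbitrary; run with at least two processes, e.g. mpirun -np 2 ./a.out.

/* Point-to-point message passing with MPI. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* receive from rank 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}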


2.5 GPU Programming
GPUs are used for parallel tasks like image processing and simulations.

GPU programming is heterogeneous:

- Code runs on both CPU (host) and GPU (device)

- Data must be transferred between host and device

Popular platforms: CUDA and OpenCL
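
The sketch below shows only the host-side data movement described above, written as plain C against the CUDA runtime API; it assumes a CUDA toolkit is installed and the file is linked against the runtime library (e.g. -lcudart). The kernel launch itself is omitted, since launching a kernel needs CUDA rather than plain C, and error checking is left out for brevity.

/* Host <-> device data transfer with the CUDA runtime API. */
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

int main(void) {
    float host_data[N];
    for (int i = 0; i < N; i++)
        host_data[i] = (float)i;

    float *device_data = NULL;
    cudaMalloc((void **)&device_data, N * sizeof(float));    /* allocate on the GPU */

    /* Host -> device copy before the (omitted) kernel would run on the GPU. */
    cudaMemcpy(device_data, host_data, N * sizeof(float), cudaMemcpyHostToDevice);

    /* ... a kernel launch would go here (requires CUDA, not plain C) ... */

    /* Device -> host copy to bring results back. */
    cudaMemcpy(host_data, device_data, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(device_data);
    printf("first element after round trip: %.1f\n", host_data[0]);
    return 0;
}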
