BCS702 – Module 1 Notes
Parallel Hardware and Parallel Software
Table of Contents
1. Parallel Hardware
1.1 Classifications of Parallel Computers
1.2 SIMD Systems
1.3 MIMD Systems
1.4 Interconnection Networks
1.5 Cache Coherence
1.6 Shared vs Distributed Memory
2. Parallel Software
2.1 Caveats
2.2 Coordinating Processes/Threads
2.3 Shared Memory
2.4 Distributed Memory
2.5 GPU Programming
1. Parallel Hardware
Parallel hardware enables multiple computations to run simultaneously. It is essential for
improving performance in modern computing, especially in data-heavy tasks such as
simulations and image processing.
1.1 Classifications of Parallel Computers
Flynn’s Taxonomy classifies computers based on the number of instruction and data
streams:
- SISD: Single instruction, single data
- SIMD: Single instruction, multiple data
- MIMD: Multiple instruction, multiple data
Another classification is based on memory organization:
- Shared memory: All cores can access a common memory (a shared address space)
- Distributed memory: Each processor has its own private memory; processors communicate
by sending messages over a network
1.2 SIMD Systems
SIMD (Single Instruction, Multiple Data) systems apply one instruction to many data points
simultaneously.
Used in image processing, simulations, and vector math operations.
Limitation: All ALUs must apply the same instruction to their own data items (or sit idle), so
code in which different items need different operations, such as conditional branches, is
handled inefficiently.
Vector processors and GPUs are examples of SIMD systems.
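As an illustration, a minimal C sketch of a loop that a vectorizing compiler can map onto SIMD
instructions (the function and array names are only illustrative):

    #include <stddef.h>

    /* Element-wise vector addition: the same "add" is applied to every pair of
     * elements, so a vectorizing compiler can emit SIMD instructions that
     * process several elements per instruction. */
    void vec_add(const float *a, const float *b, float *c, size_t n) {
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }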
1.3 MIMD Systems
MIMD systems use independent processors with separate instruction streams and data sets.
Processors run asynchronously, suitable for complex and diverse tasks.
MIMD includes:
- Shared-memory systems (e.g., multicore CPUs)
- Distributed-memory systems (e.g., clusters of computers)
1.4 Interconnection Networks
These connect processors and memory in a parallel system.
Shared-memory systems use buses and crossbars.
Distributed-memory systems use rings, meshes, hypercubes, and omega networks.
Key terms:
- Latency: Delay before data starts transferring
- Bandwidth: Rate at which data transfers
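As a rough model (the numbers here are only illustrative), the time to send a message is about
latency + message size / bandwidth. For example, with a latency of 5 microseconds and a
bandwidth of 1 GB/s, a 1 MB message takes roughly 5 µs + 1 ms ≈ 1.005 ms, so latency
dominates the cost of small messages and bandwidth dominates the cost of large ones.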
1.5 Cache Coherence
Problem: When multiple processors cache the same variable, updates by one processor may
not be seen by others.
Solutions:
- Snooping: Cores monitor a shared bus for updates
- Directory-based: Each memory block tracks which cores have cached it
False Sharing: A performance problem that arises when threads repeatedly update distinct
variables that happen to lie in the same cache line, so the line bounces between caches even
though no value is actually shared.
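A minimal C sketch (using POSIX threads; the thread count, iteration count, and 64-byte cache
line size are assumptions for illustration) of how false sharing arises and one common fix,
padding each thread's data onto its own cache line:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define ITERS    10000000L

    /* Each thread updates its own counter, but the counters are packed into one
     * array, so several of them share a cache line: every update forces the line
     * to bounce between cores (false sharing). */
    long packed[NTHREADS];

    /* Fix: pad each counter so it occupies its own cache line
     * (64 bytes is a typical line size; this is an assumption, not portable). */
    struct padded { long value; char pad[64 - sizeof(long)]; };
    struct padded spread[NTHREADS];

    void *work_packed(void *arg) {
        long id = (long) arg;
        for (long i = 0; i < ITERS; i++) packed[id]++;
        return NULL;
    }

    void *work_spread(void *arg) {
        long id = (long) arg;
        for (long i = 0; i < ITERS; i++) spread[id].value++;
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];

        /* Run the falsely-sharing version, then the padded version; timing the
         * two loops (omitted here) typically shows a large difference. */
        for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, work_packed, (void *) i);
        for (int i = 0; i < NTHREADS; i++)  pthread_join(t[i], NULL);

        for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, work_spread, (void *) i);
        for (int i = 0; i < NTHREADS; i++)  pthread_join(t[i], NULL);

        printf("done: packed[0]=%ld spread[0]=%ld\n", packed[0], spread[0].value);
        return 0;
    }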
1.6 Shared vs Distributed Memory
Shared-memory: Easier to program, but harder to scale because contention on the shared
interconnect (e.g., a bus) grows as cores are added.
Distributed-memory: Harder to program, but scales to much larger numbers of processors.
2. Parallel Software
Writing software for parallel systems involves more complexity than serial programs.
Programmers need to manage synchronization, communication, and potential errors like
race conditions.
2.1 Caveats
Not all problems can be parallelized.
Some are 'embarrassingly parallel' (e.g., processing independent images), while others
require complex coordination.
2.2 Coordinating Processes/Threads
Threads/processes must be coordinated (synchronized) so that they produce correct results.
Two key concerns are load balancing (giving each thread/process roughly equal work) and
minimizing communication; a simple partitioning sketch follows below.
Parallelizing is the act of converting a serial program so that it runs in parallel.
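A minimal C sketch of block partitioning, one simple way to balance n loop iterations across p
threads or processes (the function and variable names are only illustrative):

    /* Compute the block of iterations [first, last) owned by thread `rank` out
     * of `p` threads, spreading the remainder of n/p over the first n % p
     * threads so that no thread gets more than one extra iteration. */
    void block_range(long n, int p, int rank, long *first, long *last) {
        long q = n / p;   /* base block size     */
        long r = n % p;   /* leftover iterations */
        *first = rank * q + (rank < r ? rank : r);
        *last  = *first + q + (rank < r ? 1 : 0);
    }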
2.3 Shared Memory
Uses threads that access a common memory space.
Thread types:
- Dynamic (worker threads are forked and joined as needed during the run)
- Static (all threads are created at start-up and kept until the work is done)
Issues:
- Nondeterminism: The output can vary from run to run because thread scheduling is unpredictable
- Race conditions: Two or more threads access the same variable, at least one access is a write,
and the result depends on the order in which the accesses happen
Solutions: Mutexes (locks), semaphores, and monitors.
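A minimal C sketch using POSIX threads (the thread and iteration counts are illustrative):
without the mutex, the updates to the shared counter form a race condition and the final value
can vary from run to run.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define ITERS    100000

    long counter = 0;                                   /* shared variable  */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;   /* protects counter */

    void *increment(void *arg) {
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);    /* only one thread updates at a time */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, increment, NULL);
        for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
        /* With the mutex the result is always NTHREADS * ITERS; without it,
         * lost updates make the value vary from run to run. */
        printf("counter = %ld\n", counter);
        return 0;
    }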
2.4 Distributed Memory
Processes use message-passing to share data.
Message-passing APIs provide at least send and receive operations (e.g., MPI_Send and MPI_Recv).
MPI (Message Passing Interface) is the most widely used message-passing API.
Drawback: Requires significant program restructuring.
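A minimal MPI sketch in C (the message value and tag are illustrative): process 1 sends an
integer to process 0. Run it with at least two processes, e.g. mpiexec -n 2 ./a.out.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, x = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */

        if (rank == 1) {
            x = 42;                             /* data to send (illustrative) */
            MPI_Send(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            MPI_Recv(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 0 received %d from process 1\n", x);
        }

        MPI_Finalize();
        return 0;
    }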
2.5 GPU Programming
GPUs are used for parallel tasks like image processing and simulations.
GPU programming is heterogeneous:
- Code runs on both CPU (host) and GPU (device)
- Data must be transferred between host and device
Popular platforms: CUDA and OpenCL
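A minimal CUDA C sketch of the host/device pattern (the array size, block size, and kernel are
illustrative): the host allocates device memory, copies input data over, launches a kernel, and
copies the result back.

    #include <stdio.h>
    #include <cuda_runtime.h>

    #define N 1024

    /* Kernel: runs on the GPU (device), one thread per array element. */
    __global__ void vec_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main(void) {
        float h_a[N], h_b[N], h_c[N];           /* host (CPU) arrays   */
        float *d_a, *d_b, *d_c;                 /* device (GPU) arrays */

        for (int i = 0; i < N; i++) { h_a[i] = i; h_b[i] = 2 * i; }

        cudaMalloc(&d_a, N * sizeof(float));    /* allocate on the GPU */
        cudaMalloc(&d_b, N * sizeof(float));
        cudaMalloc(&d_c, N * sizeof(float));

        cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);  /* host -> device */
        cudaMemcpy(d_b, h_b, N * sizeof(float), cudaMemcpyHostToDevice);

        vec_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);              /* launch kernel  */

        cudaMemcpy(h_c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost);  /* device -> host */

        printf("c[10] = %f\n", h_c[10]);
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }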