
Parallel and Distributed Systems Assignment

The document outlines an assignment brief for a student at Cavendish University, focusing on Parallel and Distributed Systems. It includes instructions for submission, assessment criteria, and specific topics to cover, such as Google's distribution model, MapReduce, and Apache Hadoop. Additionally, it discusses methods for message routing in parallel machines, emphasizing the importance of efficient communication in distributed computing.

Uploaded by

bumdaddush


CAVENDISH UNIVERSITY – ZAMBIA

ASSIGNMENT BRIEF AND FEEDBACK FORM

STUDENT No. 110 174

LECTURER:

MODULE: Parallel and Distributed system

MODULE CODE: IT 222

ASSIGNMENT NUMBER: One

DATE HANDED OUT:

DATE DUE IN: 2/04/2025

ASSIGNMENT BRIEF / STUDENT INSTRUCTIONS

1. This form must be attached to the front of your assignment.

2. The assignment must be handed in without fail by the submission date (see the assessment schedule for your course).

3. Ensure that the submission is date-stamped by the reception staff when you hand it in.

4. Late submissions will not be accepted without prior agreement with the tutor.

5. All assessable assignments must be word processed.

This assignment is intended to assess the student’s knowledge in all of the following areas.
However, greater emphasis should be given to those items marked with a

(Tutor: - please tick as applicable)

SL No   ASSESSMENT SKILLS   Please Tick

1 Good and adequate interpretation of the question

2 Knowledge and application of the relevant theories

3 Use of relevant and practical examples to back up theories

4 Ability to transfer and relate subject topic to each other

5 Application and use of appropriate models

6 Evidence of library research

7 Knowledge of theories

8 Written business English communication skills

9 Use of visual (graphs) communication

10 Self-assessed ‘time management’

11 Evidence of field research

Tutor’s Marks contribution

(Administrative only)
LECTURER’S FEEDBACK

1. Describe Google distribution model and Map Reduce.

2. Explain Apache Hadoop and its ecosystem.

3. Briefly discuss the methods for message routing in parallel machines.

GOOGLE DISTRIBUTION MODEL & MAP REDUCE

GOOGLE DISTRIBUTION

The Google Distribution Model describes the company's approach to distributing work, data, and
processing resources across its extensive infrastructure. The approach combines several essential
principles. Data sharding divides enormous datasets into smaller, more manageable chunks, known as
shards, that can be stored and processed on many servers. Load balancing spreads workloads evenly
across available resources, maximizing utilization and avoiding overload. Fault tolerance is achieved
through techniques such as replication and checkpointing, which keep services running despite
hardware failures. Finally, scalability is key: capacity can be scaled up or down in response to
changing workloads, which is essential for services such as Google Search and Google Ads.
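
To make sharding and replication concrete, the following is a minimal sketch in Python. It is not Google's actual implementation: the hash choice, the server names, and the functions shard_for and replica_servers are assumptions made for illustration only.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def replica_servers(shard, servers, replicas):
    """Pick distinct servers for a shard by walking the server list in order."""
    count = min(replicas, len(servers))
    return [servers[(shard + i) % len(servers)] for i in range(count)]

# Hypothetical cluster of four servers, with every shard stored three times.
servers = ["server-a", "server-b", "server-c", "server-d"]
placement = {}
for key in ["user:1", "user:2", "user:3"]:
    shard = shard_for(key, num_shards=8)
    placement[key] = replica_servers(shard, servers, replicas=3)
```

Because each shard lives on three different servers in this sketch, the data stays reachable if any single server fails, which is the fault-tolerance property described above.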

MAP REDUCE

MapReduce is a programming model for processing huge datasets across distributed computers. It
simplifies large-scale data processing by expressing a computation as two user-defined phases, Map
and Reduce, connected by a Shuffle step that the framework performs. In the Map phase, the input data
is split and read as key-value pairs, and a map function turns each pair into intermediate key-value
pairs. The Shuffle step then groups the intermediate pairs by key and redistributes them, ensuring
that all values associated with the same key reach the same reducer. In the Reduce phase, the grouped
data is processed to create the final output, which usually involves aggregating or summarizing
information. A key component of big data processing, this scalable and adaptable framework makes it
possible to process massive datasets efficiently across clusters of machines.
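
The three steps can be sketched on a single machine with a word count, the customary introductory example. This is a simplified illustration in plain Python, not a distributed implementation; the function names and the sample documents are assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(intermediate)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
# word_counts["the"] is 3 and word_counts["fox"] is 2
```

In a real cluster, many map and reduce tasks would run in parallel on different machines, and the shuffle would move data across the network.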

APACHE HADOOP & ITS ECOSYSTEM

Apache Hadoop is an important open-source framework for the distributed processing and
storage of large datasets across multiple computers. Hadoop, developed by the Apache
Software Foundation, is widely recognised for its scalability, fault tolerance, and low cost,
which comes from its use of commodity hardware. This makes it an affordable option for big data
applications. Hadoop is built around a few key components. The Hadoop Distributed File
System (HDFS) is the storage layer that distributes and replicates data across multiple nodes to
ensure high availability and throughput. YARN (Yet Another Resource Negotiator) is the resource
management and job scheduling layer, allocating cluster resources to applications
and supervising task completion. MapReduce is Hadoop's programming model for processing
large datasets in parallel, dividing tasks into "map" and "reduce" operations to allow for efficient
computation.
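
How HDFS distributes and replicates a file can be illustrated with some simple arithmetic. The sketch below assumes a block size of 128 MB and a replication factor of 3; these are common default settings rather than figures given in this assignment, and both are configurable. The function name hdfs_footprint is hypothetical.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate blocks, block replicas, and raw storage for one file.

    Assumes the last block only occupies the space it actually needs.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    block_replicas = blocks * replication
    stored_mb = file_size_mb * replication
    return blocks, block_replicas, stored_mb

# A 1000 MB file under the assumed defaults.
blocks, block_replicas, stored_mb = hdfs_footprint(1000)
# blocks is 8, block_replicas is 24, stored_mb is 3000
```

The replicas are spread over different nodes, so losing one node leaves other copies of every block available.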

Hadoop has several advantages. It can scale from a single node to thousands of nodes. It handles
a broad range of data types, including unstructured, semi-structured, and structured data. Its
fault-tolerant design replicates data among nodes, reducing the risk of data loss when hardware
fails. Its compatibility with commodity hardware also keeps costs low for companies. Hadoop
remains a key tool in the big data world, offering a strong basis for sophisticated analytics
and data-driven insights, even in the face of competition from more recent technologies such as
Apache Spark. Its design continues to influence contemporary data processing frameworks,
which keeps it relevant as the field evolves.

The Hadoop ecosystem is a collection of frameworks and tools that build on Hadoop's core
features to support data processing, storage, and analysis in distributed computing
settings. Each tool targets a particular big data task. On
top of HDFS, Apache HBase functions as a distributed NoSQL database that offers high
scalability for sparse data and real-time access to large datasets. With its SQL-like interface,
Apache Hive makes data warehousing easier by converting queries into MapReduce jobs.
For processing huge datasets without the complexity of Java-based MapReduce programming,
Apache Pig provides a scripting language called Pig Latin.
Apache Spark, on the other hand, supports a variety of workloads such as machine learning,
real-time streaming, and graph processing, and improves performance through in-memory
processing.

Additional key components are Apache Sqoop, which is intended for bulk data transfers
between relational databases and Hadoop, and Apache Flume, which makes it easier to ingest
streaming data, such as logs, into HDFS. Apache Oozie serves as a workflow
scheduler that manages complex job sequences. Furthermore, Apache ZooKeeper provides
centralised coordination for distributed systems, supporting synchronisation and reliability.
These technologies work together to form a strong ecosystem that enables developers and
analysts to manage massive amounts of data effectively while using Hadoop's distributed
architecture.

BRIEFLY DISCUSS THE METHODS FOR MESSAGE ROUTING IN PARALLEL MACHINES

Message routing in parallel machines plays a vital role in distributed computing by ensuring efficient
communication between processors or nodes. This process involves transferring data throughout the
system to support parallel computation, which is crucial for tackling complex challenges in areas such as
scientific simulations, data analytics, and machine learning. Message routing techniques can generally be
divided into direct communication, indirect communication, dynamic routing, and specialized methods,
with each offering distinct advantages and use cases.

METHODS FOR MESSAGE ROUTING

ADAPTIVE ROUTING

Adaptive routing changes message paths dynamically in response to current network
conditions, such as failures or congestion. Static routing techniques rely on fixed paths; adaptive
algorithms improve on them by selecting alternate routes when necessary. This
makes more effective use of network resources and enhances fault tolerance. In large-scale parallel
systems, where network conditions can vary significantly, adaptive routing is particularly
advantageous.
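
One simple way to picture an adaptive decision is a node choosing, among several candidate next hops, the one whose link is working and currently least loaded. The sketch below is an illustrative assumption, not a specific published algorithm; choose_next_hop, the hop names, and the load figures are hypothetical.

```python
def choose_next_hop(candidates, link_load, failed_links):
    """Return the least-loaded usable next hop, or None if every link is down."""
    usable = [hop for hop in candidates if hop not in failed_links]
    if not usable:
        return None
    # Links with no recorded load are treated as idle.
    return min(usable, key=lambda hop: link_load.get(hop, 0))

candidates = ["east", "north"]
link_load = {"east": 7, "north": 2}  # hypothetical queued messages per link

normal = choose_next_hop(candidates, link_load, failed_links=set())
after_failure = choose_next_hop(candidates, link_load, failed_links={"north"})
# normal is "north"; after_failure is "east"
```

A static scheme would keep sending through the same link regardless of its load or state, which is the contrast drawn in the paragraph above.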

DIRECT COMMUNICATION

Direct communication means sending messages straight from the source node to the destination node
without intermediaries. The simplest method is point-to-point routing, in which every node
communicates with its target directly. This approach is easy to implement, but it may not scale well
to bigger systems because each node can support only a limited number of direct connections. Static
routing is an additional kind of direct communication in which messages are transferred over
predetermined paths. This lowers overhead by removing the need for dynamic path determination, but
it is less flexible in settings where network conditions change often.
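
A predetermined path can be illustrated with dimension-order (XY) routing, a deterministic scheme in which the route depends only on the source and the destination. The 2D mesh topology and the function name xy_route are assumptions for this sketch and do not come from the assignment text.

```python
def xy_route(source, destination):
    """Return the fixed path on a 2D mesh: move along X first, then along Y."""
    x, y = source
    dest_x, dest_y = destination
    path = [(x, y)]
    while x != dest_x:
        x += 1 if dest_x > x else -1
        path.append((x, y))
    while y != dest_y:
        y += 1 if dest_y > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
# route is [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The same source and destination always produce the same path, so no routing decision has to be computed from network state at run time.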

INDIRECT COMMUNICATION

On the other hand, indirect communication techniques entail sending messages via intermediary nodes
before arriving at their destination. Store-and-forward routing is a popular method in this category, in
which the message is momentarily stored by each intermediate node before being forwarded to the
subsequent node. This technique allows messages to reach their destination even in the absence of a
direct channel, making it particularly helpful in networks with limited direct connectivity.
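
The cost of storing the whole message at every intermediate node can be estimated with a simplified model: each hop must receive the complete message before forwarding it, so the transfer time is paid once per hop. The sketch ignores propagation and queuing delays, and the message size, bandwidth, and hop count are hypothetical values.

```python
def store_and_forward_time(message_bits, bandwidth_bps, hops):
    """Estimate end-to-end time when each hop receives the full message first."""
    per_hop_seconds = message_bits / bandwidth_bps
    return hops * per_hop_seconds

# An 8,000,000-bit message over 1 Gbit/s links across 4 hops.
total_seconds = store_and_forward_time(8_000_000, 1_000_000_000, hops=4)
# total_seconds is about 0.032
```

Under this model, latency grows linearly with the number of hops, which is the main drawback of the technique for long paths.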

Packet switching is another indirect technique, in which messages are divided into smaller
packets that are routed independently. Packets may travel along different paths, which increases
flexibility and efficiency, but the destination needs a reliable mechanism to reassemble them in
order.
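
Splitting and reassembly can be sketched with sequence numbers. In this illustration the out-of-order arrival is simulated by shuffling the packets; the function names, the payload size, and the sample message are assumptions.

```python
import random

def packetize(message, payload_size):
    """Split a message into (sequence_number, payload) packets."""
    offsets = range(0, len(message), payload_size)
    return [(seq, message[start:start + payload_size])
            for seq, start in enumerate(offsets)]

def reassemble(packets):
    """Restore the original message by ordering packets on sequence number."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)

message = b"parallel machines exchange messages"
packets = packetize(message, payload_size=8)
random.shuffle(packets)  # simulate packets arriving out of order
restored = reassemble(packets)
# restored equals message
```

The sequence number is what allows the destination to rebuild the message even though each packet may have taken a different path.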

