Parallel and Distributed Systems Assignment

The document outlines an assignment brief for a student at Cavendish University, focusing on Parallel and Distributed Systems. It includes instructions for submission, assessment criteria, and specific topics to cover, such as Google's distribution model, MapReduce, and Apache Hadoop. Additionally, it discusses methods for message routing in parallel machines, emphasizing the importance of efficient communication in distributed computing.

Uploaded by

bumdaddush

CAVENDISH UNIVERSITY – ZAMBIA

ASSIGNMENT BRIEF AND FEEDBACK FORM

STUDENT No. 110 174

LECTURER:

MODULE: Parallel and Distributed Systems

MODULE CODE: IT 222

ASSIGNMENT NUMBER: One

DATE HANDED OUT:

DATE DUE IN: 2/04/2025

ASSIGNMENT BRIEF/

STUDENT INSTRUCTIONS

1. This form must be attached to the front of your assignment.

2. The assignment must be handed in without fail by the submission date (see the assessment schedule for your course).

3. Ensure that the submission is date-stamped by the reception staff when you hand it in.

4. Late submissions will not be accepted without prior agreement with the tutor.

5. All assessable assignments must be word processed.


This assignment is intended to assess the student's knowledge in all of the following areas.
However, greater emphasis should be given to those items marked with a tick.

(Tutor: - please tick as applicable)

SL No    ASSESSMENT SKILLS    Please Tick

1 Good and adequate interpretation of the question

2 Knowledge and application of the relevant theories

3 Use of relevant and practical examples to back up theories

4 Ability to transfer and relate subject topic to each other

5 Application and use of appropriate models

6 Evidence of library research

7 Knowledge of theories

8 Written business English communication skills

9 Use of visual (graphs) communication

10 Self-assessed ‘time management’

11 Evidence of field research


Tutor’s Marks contribution

(Administrative only)
LECTURER’S FEEDBACK

1. Describe the Google distribution model and MapReduce.

2. Explain Apache Hadoop and its ecosystem.

3. Briefly discuss the methods for message routing in parallel machines.


GOOGLE DISTRIBUTION MODEL & MAPREDUCE

GOOGLE DISTRIBUTION MODEL

The Google distribution model describes the company's approach to distributing work, data, and
processing resources across its large infrastructure. It combines several principles. Data sharding
divides very large datasets into smaller, more manageable pieces called shards, which can be stored and
processed on many servers. Load balancing spreads workloads evenly across the available resources, which
improves utilization and prevents any single server from being overloaded. Fault tolerance relies on
techniques such as replication and checkpointing to keep services running despite hardware failures.
Finally, scalability lets Google add or remove capacity as workloads change, which is essential for
services such as Google Search and Google Ads.
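
The sharding and replication ideas above can be illustrated with a minimal sketch. It is not Google's actual implementation: the server names, the hash-based placement rule, and the helper names `shard_for` and `replica_servers` are assumptions made for illustration only.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def replica_servers(shard: int, servers: list, copies: int) -> list:
    """Place `copies` replicas of a shard on consecutive servers (round-robin)."""
    return [servers[(shard + i) % len(servers)] for i in range(copies)]

# Hypothetical cluster of four servers, one shard per server, two copies of each shard.
servers = ["s0", "s1", "s2", "s3"]
placement = {}
for key in ["user:1", "user:2", "user:3", "user:4", "user:5"]:
    shard = shard_for(key, num_shards=len(servers))
    placement[key] = replica_servers(shard, servers, copies=2)
    print(key, "-> shard", shard, "stored on", placement[key])
```

Because every shard is stored on two different servers, the loss of one server leaves a copy of each shard available, which is the basic idea behind replication for fault tolerance.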

MAPREDUCE

MapReduce is a programming model for processing very large datasets across a cluster of machines. It
simplifies large-scale data processing by expressing a job as two user-defined phases, Map and Reduce,
joined by an intermediate Shuffle step. In the Map phase, the input is split into records, and a map
function turns each record into intermediate key-value pairs. The Shuffle step then groups these pairs
by key and redistributes them so that all values associated with the same key reach the same reducer.
In the Reduce phase, each group is processed to produce the final output, usually an aggregation or
summary. Because map and reduce tasks run independently on different machines, the model scales to
massive datasets and has become a key component of big data processing.
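
The Map, Shuffle, and Reduce steps can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not the API of any real MapReduce framework; the function names and sample documents are assumptions.

```python
from collections import defaultdict

def map_phase(document: str):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

In a real cluster, the map calls would run on the machines that hold the input splits, and the shuffle would move data over the network so that each reducer receives all values for its keys.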

APACHE HADOOP & ITS ECOSYSTEM

Apache Hadoop is an open-source framework for the distributed storage and processing of
large datasets across clusters of computers. Hadoop, developed by the Apache Software
Foundation, is widely recognised for its scalability, fault tolerance, and low cost, since it
runs on commodity hardware. This makes it an affordable option for big data applications.
Hadoop is built around a few key components. The Hadoop Distributed File System (HDFS) is
the storage layer, which splits files into blocks and replicates them across multiple nodes to
provide high availability and throughput. YARN (Yet Another Resource Negotiator) is the
resource management and job scheduling layer, which allocates cluster resources to
applications and monitors their tasks. MapReduce is Hadoop's programming model for
processing large datasets in parallel, dividing work into "map" and "reduce" operations.
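
The storage cost of replication in HDFS can be estimated with simple arithmetic. The sketch below assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are configurable per cluster; the function name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # assumed block size (a common HDFS default)
REPLICATION = 3       # assumed replication factor (a common HDFS default)

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how many blocks a file occupies and how much raw disk it uses."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                              # logical blocks in the file
        "block_replicas": blocks * replication,        # physical copies across the cluster
        "raw_storage_mb": file_size_mb * replication,  # the last block is not padded
    }

print(hdfs_footprint(1024))  # a 1 GB file
```

Under these assumptions a 1 GB file is stored as 8 blocks and 24 block replicas, which is the price paid for surviving node failures.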

Hadoop has several advantages. It scales from a single node to thousands of nodes. It
handles structured, semi-structured, and unstructured data, which makes it adaptable to a
wide variety of datasets. Its fault-tolerant design replicates data across nodes, which
reduces the risk of data loss when hardware fails. Because it runs on commodity hardware, it
is also an affordable option for companies. Even with competition from newer technologies
such as Apache Spark, Hadoop remains a key tool in the big data world and a strong basis for
advanced analytics. Its design continues to influence contemporary data processing
frameworks.

The Hadoop ecosystem is a collection of frameworks and tools that build on Hadoop's core
features to support data storage, processing, and analysis in distributed computing
environments. Each tool targets a particular big data task. Apache HBase is a distributed
NoSQL database that runs on top of HDFS and offers high scalability for sparse data and
real-time access to large datasets. Apache Hive simplifies data warehousing with a SQL-like
interface, translating queries into MapReduce jobs. Apache Pig provides a scripting language
called Pig Latin for processing huge datasets without the complexity of Java-based MapReduce
programming. Apache Spark supports a variety of workloads, such as machine learning,
real-time streaming, and graph processing, and improves performance through in-memory
processing.

Other key components include Apache Sqoop, which performs bulk data transfers between
relational databases and Hadoop, and Apache Flume, which ingests streaming data such as
logs into HDFS. Apache Oozie is a workflow scheduler that manages complex sequences of
Hadoop jobs. Apache ZooKeeper provides centralised coordination for distributed systems,
which supports synchronisation and reliability. Together, these tools form an ecosystem that
lets developers and analysts manage massive amounts of data while using Hadoop's
distributed architecture.
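
The idea of a workflow scheduler ordering dependent jobs can be sketched as a dependency graph. This is only an illustration of the concept using Python's standard `graphlib` module; it is not Oozie's own workflow format or API, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "import_with_sqoop": set(),
    "collect_logs_with_flume": set(),
    "clean_with_pig": {"import_with_sqoop", "collect_logs_with_flume"},
    "aggregate_with_hive": {"clean_with_pig"},
    "export_report": {"aggregate_with_hive"},
}

# A valid execution order runs every job only after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

The two ingestion jobs have no dependencies, so a scheduler could run them in parallel before the cleaning step starts.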

BRIEFLY DISCUSS THE METHODS FOR MESSAGE ROUTING IN PARALLEL MACHINES

Message routing in parallel machines plays a vital role in distributed computing by ensuring efficient
communication between processors or nodes. Routing moves data through the system to support parallel
computation, which is crucial for complex problems in areas such as scientific simulation, data
analytics, and machine learning. Message routing techniques can generally be divided into direct
communication, indirect communication, dynamic (adaptive) routing, and specialized methods, each with
distinct advantages and use cases.

METHODS FOR MESSAGE ROUTING

ADAPTIVE ROUTING

Adaptive routing changes message paths dynamically in response to current network conditions, such as
failures or congestion. Static routing techniques rely on fixed paths, whereas adaptive algorithms
select alternate routes when necessary. This makes better use of network resources and improves fault
tolerance. Adaptive routing is particularly advantageous in large-scale parallel systems, where network
conditions can vary significantly.
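
A minimal sketch of the adaptive idea follows: the router searches for a shortest path but skips any link currently reported as failed or congested. The four-node topology and the function name `route` are assumptions for illustration; real adaptive routers make this decision per hop in hardware, not with a global search.

```python
from collections import deque

def route(links, source, target, unusable=frozenset()):
    """Breadth-first search for a shortest path that avoids unusable links."""
    blocked = {frozenset(link) for link in unusable}
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back from target to source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links[node]:
            if neighbour not in previous and frozenset((node, neighbour)) not in blocked:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no usable path exists

# Hypothetical four-node ring: A-B-D and A-C-D are the two routes from A to D.
links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(route(links, "A", "D"))                          # normal conditions
print(route(links, "A", "D", unusable={("B", "D")}))   # link B-D failed or congested
```

When the B-D link is marked unusable, the message is redirected through C, whereas a static scheme with a fixed A-B-D path would fail to deliver it.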

DIRECT COMMUNICATION

Direct communication sends a message straight from the source node to the destination node without
passing through intermediate nodes. The simplest method is point-to-point routing, in which every node
communicates with its target directly. This approach is easy to implement, but it may not scale well to
larger systems because each node can support only a limited number of direct connections. Static
routing is another form of direct communication, in which messages travel over predetermined paths. It
lowers overhead by removing the need to compute paths dynamically, but it is less flexible where
network conditions change often.
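
The scaling limit of point-to-point communication can be made concrete by counting links. The sketch assumes a fully connected network in which every pair of nodes has its own dedicated link; the function name `full_mesh_cost` is hypothetical.

```python
def full_mesh_cost(nodes: int) -> dict:
    """Links needed when every node is wired directly to every other node."""
    return {
        "links_per_node": nodes - 1,                # each node connects to all others
        "total_links": nodes * (nodes - 1) // 2,    # each link is shared by two nodes
    }

for n in (4, 64, 1024):
    print(n, "nodes:", full_mesh_cost(n))
```

The number of links per node grows linearly and the total grows quadratically with the number of nodes, which is why large machines route most messages through intermediate nodes instead.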

INDIRECT COMMUNICATION

Indirect communication techniques, by contrast, send messages through intermediate nodes before they
reach their destination. Store-and-forward routing is a popular method in this category: each
intermediate node receives and temporarily stores the whole message before forwarding it to the next
node. This allows messages to reach their destination even when no direct channel exists, which is
particularly useful in networks with limited direct connectivity.
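
The cost of store-and-forward routing can be estimated with a simple latency model. The sketch assumes that every hop must receive the entire message before forwarding it and that all links have the same bandwidth; the parameter names and example figures are illustrative, not measurements.

```python
def store_and_forward_time(message_bytes, bandwidth_bytes_per_s, hops, per_hop_delay_s=0.0):
    """Transfer time when each hop stores the full message before forwarding it."""
    transfer = message_bytes / bandwidth_bytes_per_s   # time to cross one link
    return hops * (transfer + per_hop_delay_s)

# Hypothetical 1 MB message on 100 MB/s links.
for hops in (1, 2, 4):
    seconds = store_and_forward_time(1_000_000, 100_000_000, hops)
    print(hops, "hop(s):", round(seconds * 1000, 3), "ms")
```

Under this model the delay grows in proportion to the number of hops, which is the main drawback of storing the whole message at every intermediate node.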

Packet switching is another indirect technique, in which messages are divided into smaller packets that
are routed independently. It requires a reliable mechanism to reassemble the packets at the
destination, but it increases flexibility and efficiency by allowing packets to travel along different
paths.
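
Packet splitting and reassembly can be sketched as follows. Sequence numbers are one common way to restore the original order; the helper names, the 8-byte payload size, and the random shuffle that stands in for packets taking different paths are assumptions for illustration.

```python
import random

def packetize(message: bytes, payload_size: int):
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[start:start + payload_size])
            for seq, start in enumerate(range(0, len(message), payload_size))]

def reassemble(packets):
    """Sort packets by sequence number, then join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"parallel machines exchange messages"
packets = packetize(message, payload_size=8)
random.shuffle(packets)   # simulate out-of-order arrival over different paths
restored = reassemble(packets)
print(restored == message)
```

The destination does not need to know which path each packet took, only its sequence number.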

