MapReduce is a programming model used in cloud computing to process large data sets with a
distributed algorithm on a cluster. It was introduced by Google and has since become a standard
approach for processing vast amounts of data in a parallel and efficient manner. The model
simplifies data processing across massive clusters of servers.
Key Concepts
1. Map Function: The map function takes an input pair and produces a set of intermediate
key/value pairs. A typical map function processes the input data into smaller, manageable
chunks, which are then distributed across different nodes in the cluster for parallel
processing.
2. Reduce Function: The reduce function takes the intermediate key/value pairs produced
by the map function and merges these pairs to produce the final result. This involves
aggregating the results of the map phase, such as summing values or combining data sets.
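The two functions above can be sketched as plain Python (a word-count job; the function names are illustrative, not any specific framework's API):

```python
# A minimal word-count sketch of the two MapReduce primitives.
# These are ordinary Python functions, not tied to a real framework.

def map_fn(_key, line):
    """Emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Merge all intermediate values for one key into a final pair."""
    return word, sum(counts)
```

A framework would call map_fn once per input record and reduce_fn once per distinct key, with the grouping done by the shuffle phase in between.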
Workflow
1. Splitting: The input data is split into chunks that can be processed in parallel by multiple
map tasks.
2. Mapping: Each map task processes a chunk of data and outputs key/value pairs. These
pairs are then shuffled and sorted by key.
3. Shuffling and Sorting: The intermediate key/value pairs are shuffled so that all values
associated with the same key are grouped together. This is necessary for the reduce
phase.
4. Reducing: Reduce tasks take the grouped key/value pairs and process them to produce
the final output.
5. Output: The results from the reduce tasks are combined and written to storage,
completing the MapReduce job.
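The five workflow steps can be simulated in a single process (a sketch only; a real framework runs the map and reduce tasks on many machines):

```python
from collections import defaultdict

def run_job(lines, map_fn, reduce_fn, num_splits=2):
    """Single-process simulation of the MapReduce workflow."""
    # 1. Splitting: divide the input into chunks for parallel map tasks.
    splits = [lines[i::num_splits] for i in range(num_splits)]

    # 2. Mapping: each map task emits intermediate key/value pairs.
    intermediate = []
    for chunk in splits:
        for line in chunk:
            intermediate.extend(map_fn(None, line))

    # 3. Shuffling and sorting: group all values by key.
    groups = defaultdict(list)
    for key, value in sorted(intermediate):
        groups[key].append(value)

    # 4. Reducing: merge each key's values into one output pair.
    # 5. Output: collect the reducers' results.
    return [reduce_fn(key, values) for key, values in groups.items()]
```

Here run_job, map_fn, and reduce_fn are hypothetical names; the point is the order of the phases, not the API.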
Advantages of MapReduce
Scalability: Handles petabytes of data by distributing the processing across many
servers.
Fault Tolerance: Recovers from failures by rerunning a failed node's tasks on other
nodes.
Simplicity: Provides a simple model for writing distributed applications without needing
to manage the details of parallelization, fault tolerance, or data distribution.
Real-World Applications
MapReduce is used in a variety of real-world applications, including:
Indexing web pages: Used by search engines to create an index of the web.
Analyzing social media: Processing large volumes of social media data to extract trends
and insights.
In the context of MapReduce, a key-value pair is a fundamental data structure used to represent
and process data. Here's a breakdown:
Key: A unique identifier or label associated with a piece of data. It's used to categorize,
sort, and aggregate data.
Value: The actual data or information associated with the key.
Together, the key-value pair provides a way to store, process, and retrieve data efficiently. In
MapReduce, key-value pairs are used throughout the processing pipeline:
1. Input data: The initial data is split into key-value pairs, where each pair represents a single
record or observation.
2. Map phase: The mapper processes each key-value pair, transforming it into a new key-value
pair (or pairs) as output.
3. Shuffle phase: The output key-value pairs from the map phase are partitioned and transferred
to nodes for reduction, based on the key.
4. Reduce phase: The reducer aggregates the key-value pairs with the same key, producing a
final output key-value pair.
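The shuffle phase's partitioning step can be sketched as follows. Hash partitioning is a common default, but the exact scheme is framework-specific, and the function names here are illustrative:

```python
# Sketch of hash partitioning in the shuffle phase: each intermediate
# key is assigned to one of R reducers, so that all pairs sharing a
# key land on the same reducer. (Hash partitioning is a common
# default; details vary by framework.)

def partition(key, num_reducers):
    return hash(key) % num_reducers

def shuffle(intermediate_pairs, num_reducers):
    """Route every intermediate pair to its reducer's bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in intermediate_pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets
```

The guarantee that matters is not which reducer a key lands on, but that every pair with the same key lands on the same one.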
Example
Input data: A log file of website traffic, with one record per visit
Key: URL (unique identifier)
Value: The data recorded for that visit
For each record, the mapper outputs a key-value pair that counts a single visit:
URL (key) : 1 (value)
The reducer would then aggregate all the key-value pairs with the same URL key, producing a
final output:
URL (key) : Total number of visits (value)
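The URL example can be run end to end with a few lines of plain Python standing in for the map and reduce phases (the log data is made up for illustration):

```python
from collections import Counter

# Count visits per URL from a list of log records (one URL per hit).
log = ["/home", "/about", "/home", "/home", "/about"]

# Map phase: emit (URL, 1) for every hit.
mapped = [(url, 1) for url in log]

# Shuffle + reduce phase: sum the 1s for each URL.
totals = Counter()
for url, one in mapped:
    totals[url] += one

print(totals)  # prints Counter({'/home': 3, '/about': 2})
```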
Key-value pairs are a simple yet powerful concept in MapReduce, enabling efficient data
processing and scalability.