In the name of Allah, the Most Gracious, the Most Merciful
Our Agenda
Today, we’re going to cover the following points:
MapReduce general view
Why do we need MapReduce?
How does MapReduce work?
Is MapReduce outdated?
References
So, what is MapReduce?
• MapReduce is the main batch processing framework from the Apache Hadoop project.
• Originally developed at Google and described in its 2004 research paper.
• Later implemented in Apache Hadoop by Doug Cutting (2006) with the first release in
2007.
• It is a distributed framework that processes large datasets across multiple machines
(commodity hardware).
• It has three main phases (a worked example follows this list):
o Map: Filters or transforms the input data.
o Shuffle: Organizes data to send related pieces together.
o Reduce: Aggregates or processes the grouped data into results.
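For example, a word-count job over the tiny, hypothetical input “to be or not to be” would flow through the three phases like this:

    Input:    to be or not to be
    Map:      (to,1) (be,1) (or,1) (not,1) (to,1) (be,1)
    Shuffle:  (be,[1,1]) (not,[1]) (or,[1]) (to,[1,1])
    Reduce:   (be,2) (not,1) (or,1) (to,2)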
Why do we need MapReduce?
• Handles massive data: Can process datasets ranging from megabytes to petabytes with
the same code.
• Runs on normal hardware: No need for expensive specialized servers.
• Foundation for Big Data evolution: Enabled the rise of Big Data technologies and
adoption by major companies like Yahoo, Facebook, IBM, LinkedIn, etc.
• Inspired easier frameworks: Tools like Apache Hive (developed at Facebook) made it easier for people with SQL knowledge to harness the power of MapReduce.
Now let’s see how MapReduce works.
Map Side (Input → Intermediate Output)
1. Input Splitting
o The input file is split into blocks (e.g., 64 MB in Hadoop 1.x, 128 MB in later versions), and each block is assigned to a Map task.
2. Map Execution
o Each Map task reads its input and applies the Map function, producing key-value pairs (e.g., <word, 1>); see the Mapper sketch after this list.
3. Buffering & Spilling
o Intermediate outputs are stored in memory (a 100 MB buffer by default).
o When ~80% full, data is sorted, optionally combined, and spilled to disk.
4. Merging Spill Files
o All spill files are merged into a single sorted and partitioned output file (one partition per reducer).
o Optional compression can reduce disk usage and network transfer.
5. Map Completion
o The Map task notifies the system that output is ready for reducers.
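As a concrete sketch of what a single Map task runs, here is a minimal word-count Mapper written against the standard Hadoop API (org.apache.hadoop.mapreduce); the class and field names are our own illustration, not part of Hadoop:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Runs once per input split: emits an intermediate <word, 1> pair
    // for every whitespace-separated token it reads.
    public class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE); // buffered, sorted, and spilled as in steps 3-4
                }
            }
        }
    }

The buffer size and spill threshold from step 3 are configurable; in Hadoop 2.x and later the relevant properties are mapreduce.task.io.sort.mb (default 100) and mapreduce.map.sort.spill.percent (default 0.80), and the optional compression from step 4 is controlled by mapreduce.map.output.compress.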
Shuffle Phase (Data Movement)
• Reducers fetch and copy intermediate data from all relevant Map tasks.
• This includes sorting, merging, and optional combiner steps.
• Data is grouped by key and routed to the appropriate reducer.
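The key-to-reducer routing in the last bullet is done by a partitioner. By default Hadoop uses hash partitioning, whose logic is essentially the following simplified sketch (the class name here is illustrative):

    import org.apache.hadoop.mapreduce.Partitioner;

    // Simplified version of Hadoop's default hash partitioning:
    // every occurrence of a key goes to the same reducer, so all
    // values for that key end up grouped together.
    public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            // Mask the sign bit so the partition index is non-negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }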
Reduce Side (Aggregation → Final Output)
1. Fetching & Buffering
o Reducers collect outputs from multiple Mappers (stored in memory or on disk, depending on size).
2. Merging
o Intermediate files are merged, maintaining sorted order by key.
3. Reduce Function Execution
o The reduce function runs once per unique key, aggregating all values (e.g.,
summing counts).
4. Final Output
o Results are written to HDFS or another output destination.
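Continuing the word-count sketch from the Map side, a matching Reducer might look like this (again, the names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Called once per unique word with an iterator over all of its 1s;
    // writes the final <word, total> record.
    public class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get(); // aggregate all values for this key
            }
            total.set(sum);
            context.write(word, total); // final output record (step 4)
        }
    }

A small driver ties the pieces together and submits the job. Note that the same Reducer can also serve as the optional combiner mentioned on the Map side, because summing partial groups gives the same result as summing everything at once:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);
            job.setCombinerClass(WordCountReducer.class); // optional partial aggregation
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input path in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }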
Now let’s ask: Is MapReduce outdated?
1. Performance:
• It is slower than modern technologies like Apache Spark.
• MapReduce writes intermediate data to disk, which slows down processing, while Spark processes data in memory, making it much faster.
2. Use Cases:
• Best suited for:
o Large batch processing (e.g., massive log analysis, ETL jobs).
• Not ideal for:
o Real-time analytics.
o Applications requiring fast, near-instant responses or interactive machine
learning.
3. Popularity:
• Its popularity has significantly declined with the rise of Apache Spark, Apache Flink, and
other modern Big Data technologies.
• Most new projects favor faster, more flexible frameworks.
4. Flexibility:
• Limited flexibility:
o Difficult to handle iterative or real-time data processing.
o Strict processing flow (Map ➔ Shuffle ➔ Reduce) that is hard to adapt or
optimize dynamically.
5. Current Use:
• Still actively used in:
o Older Hadoop-based infrastructures.
o Cost-sensitive environments where upgrading to faster solutions is not yet
necessary.
So we can say:
➔ MapReduce is not completely outdated, but it is no longer the first choice for modern Big Data processing needs.